Using ExternalProject to download a header-only library

PaulMoore · March 30, 2021, 12:39pm

My code is extremely simple: it’s a single C file, built with this CMakeLists.txt:

cmake_minimum_required(VERSION 3.19)
project(MyApp)
add_executable(MyApp myapp.c)

But I want to #include a header from an external library (specifically SQLite). There’s nothing I need to link to, I just want the include. (This is a stripped down version of a project where I’m writing a SQLite extension, which is a shared library target that needs the SQLite header).

I’m extremely new to CMake, so I floundered around a bit trying to find out where to start (this isn’t a scenario that gets covered in tutorials!), but I discovered ExternalProject_Add, which seems like it should work for me:

include(ExternalProject)

ExternalProject_Add(SQLite
    URL https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    CONFIGURE_COMMAND ""
    BUILD_COMMAND ""
    INSTALL_COMMAND ""
)

I’ve given empty commands for everything as I don’t need to configure, build or install (or if I do, I don’t know how to, as this is just a zipfile with a couple of .c and .h files).

That does everything I want (downloads and unpacks the file) but I don’t know how to tell my main project where to find the include directory (which is buried in the build directory somewhere CMake chooses to put it). I think this is communicated via properties on the external project, but I don’t know what properties I need or how to find them (if they are automatically set) or set them (if I need to do that myself via a configure command).

The simpler tutorial examples using a library in a subdirectory don’t help much here, as they assume the library is built with CMake, and don’t explain how the sub-project communicates with the main project.

It’s possible this is covered in the documentation, but I find it awfully hard to find “how things work” information like this in the manuals - so a pointer to where to look, or better still a summary of the important points, would be very much appreciated.

ben.boeckel · March 30, 2021, 7:20pm

I think that FetchContent is likely more in line with what you’re looking for. @craig.scott is way more familiar with it than I am.

To continue using ExternalProject, you could perhaps look at extracting the SOURCE_DIR property via ExternalProject_get_property and using that as the basis to get the include directory.
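A minimal sketch of that suggestion, assuming the SQLite amalgamation zip unpacks so that sqlite3.h sits at the top of the external project's source directory. ExternalProject moves the contents of an archive's single top-level directory up into SOURCE_DIR, and the FetchContent solution later in this thread relies on the same layout. Because ExternalProject downloads at build time rather than configure time, MyApp also needs a dependency on the SQLite target so the header exists before myapp.c is compiled.

```cmake
cmake_minimum_required(VERSION 3.19)
project(MyApp C)

include(ExternalProject)

# Download and unpack only; the amalgamation needs no configure/build/install.
ExternalProject_Add(SQLite
    URL https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    CONFIGURE_COMMAND ""
    BUILD_COMMAND ""
    INSTALL_COMMAND ""
)

# Copy the SOURCE_DIR property of the SQLite external project into a
# variable that is also named SOURCE_DIR in the current scope.
ExternalProject_Get_Property(SQLite SOURCE_DIR)

add_executable(MyApp myapp.c)
target_include_directories(MyApp PRIVATE "${SOURCE_DIR}")

# The download happens at build time, so make sure it runs before MyApp compiles.
add_dependencies(MyApp SQLite)
```

PRIVATE is used here because nothing consumes an executable's include directories; PUBLIC, as used elsewhere in the thread, also works.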

PaulMoore · March 30, 2021, 9:30pm

Thanks, I’ll look at FetchContent as well. You suggest extracting SOURCE_DIR - that sounds like a possibility too, but how did you know that there’s a SOURCE_DIR property on the external project? I’ve not been able to find anywhere that documents what properties exist, or any way of finding out…

ben.boeckel · March 30, 2021, 9:46pm

Ah, the property names are basically the arguments that are available. It’s how ExternalProject stores its information for use in various bits of its internals. Documentation could be improved to that effect.
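To illustrate that property names mirror the ExternalProject_Add() keywords, the sketch below assumes the ExternalProject_Add(SQLite ...) call from the original post appears above it and prints a few properties. ExternalProject fills in defaults for the directory keywords even when you don't pass them (SOURCE_DIR defaults to `<prefix>/src/<name>`, with `<prefix>` being `<name>-prefix` in the current binary directory), which is why they can be queried here. Asking for a property that was never set should stop the configure with an error.

```cmake
# Assumes ExternalProject_Add(SQLite ...) from the original post appears above.
# Each listed name becomes a variable of the same name in the current scope.
ExternalProject_Get_Property(SQLite SOURCE_DIR BINARY_DIR DOWNLOAD_DIR URL)

message(STATUS "SQLite URL:          ${URL}")
message(STATUS "SQLite source dir:   ${SOURCE_DIR}")
message(STATUS "SQLite binary dir:   ${BINARY_DIR}")
message(STATUS "SQLite download dir: ${DOWNLOAD_DIR}")
```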

PaulMoore · March 31, 2021, 10:50am

Thanks. So is reading properties the “correct” way of doing this? You seem to be saying it’s internal details - or am I misunderstanding you?

Taking a step back here, am I approaching my problem in the wrong way altogether? Basically, all I want to do is:

Download and unpack a file from a URL - I don’t care where it gets unpacked, but see the next point.
Add a reference to the unpacked location of a particular .h file in the archive to my project.

Doing this in a shell script would be easy, but the result wouldn’t be portable, and I’d have to deal with a bunch of admin like picking somewhere to download and unpack the file, and dealing with errors. I get the feeling that I’m getting caught in a process of making more and more complex solutions, simply because I’m missing something basic - but I don’t know what. Unfortunately, none of the tutorials I’ve read really cover this sort of situation.

It sounds like FetchContent might be closer to what I want, but trying to work out what to do based on the documentation, I got as far as:

include(FetchContent)
FetchContent_Populate(SQLite
    URL https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
)

target_include_directories(MyApp PUBLIC "${SQLite_SOURCE_DIR}")

and I’m still getting “Cannot open include file ‘sqlite3.h’”. The library did get downloaded (it’s in build/sqlite-src), and it was downloaded at configure time, but if I add a message() call, it looks like the variable SQLite_SOURCE_DIR is empty… (And adding FetchContent_GetProperties(SQLite) made no difference, either.)

(By the way, another reason I want to properly understand all this is that I have another situation where I think ExternalProject is what I need, but I need a custom command because the project isn’t CMake-based. I don’t know how to communicate information back from that command to my CMake script, so that it knows what to link into my project and where to find headers, etc. I think that’s again because I don’t really understand what’s going on - but I’ll ask that question separately, once I better understand how to handle the example here, as I think that’ll make it easier to formulate the next question).

fenrir · March 31, 2021, 11:33am

It looks like you’ve misunderstood the documentation as you’re missing the FetchContent_Declare and using the _Populate wrong. I think what you want is this (writing from memory):

FetchContent_Declare(SQLite
    URL  https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
)
FetchContent_Populate(SQLite)
target_include_directories(MyApp PUBLIC "${sqlite_SOURCE_DIR}")
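For reference, before FetchContent_MakeAvailable() was added in CMake 3.14, the FetchContent documentation recommended guarding the populate step so it only runs once per configure, even if several parts of a project ask for the same dependency. A sketch of that pattern applied to this example, assuming the MyApp target is already defined above it:

```cmake
include(FetchContent)

FetchContent_Declare(SQLite
    URL https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
)

# Sets sqlite_POPULATED (note the lowercased name), plus sqlite_SOURCE_DIR and
# sqlite_BINARY_DIR, if this dependency was already populated in this run.
FetchContent_GetProperties(SQLite)
if(NOT sqlite_POPULATED)
    # Download and unpack now; this also sets the sqlite_* variables.
    FetchContent_Populate(SQLite)
endif()

target_include_directories(MyApp PRIVATE "${sqlite_SOURCE_DIR}")
```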

PaulMoore · March 31, 2021, 12:23pm

Thank you. That seems to work!

For clarification:

I was trying to do the fetch in one step because it looks cleaner to me, and the documentation says “The FetchContent_Populate() command also supports a syntax allowing the content details to be specified directly rather than using any saved details”, so I thought it was allowed. I don’t need any of the “fancy” stuff around declaring first and populating later. Clearly my mistake, but it would be good if the documentation were more specific about why you need the “declare then populate” pattern even in trivial cases. What’s the logic here? There’s a lot of stuff in the docs about scope, and global properties, which I haven’t been able to find good explanations for yet, so it’s still very hazy to me why I need to care about them.
I was using SQLite_SOURCE_DIR rather than sqlite_SOURCE_DIR. I thought CMake was generally case insensitive? Is there any documentation on what is case sensitive and what isn’t?

ben.boeckel · March 31, 2021, 3:44pm

Yes, it’s the correct way. Knowing that it’s the correct way involved knowing the internal details. The documentation should be updated to mention (or be clearer about) what properties are available.

It’s probably mentioned in the syntax documentation. Generally, the only case-insensitive things are command names (the part before the first open parenthesis on each line). Variable names, properties, genex names, targets, test names, etc. are certainly case-sensitive.
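A small script illustrating the distinction, runnable with `cmake -P case.cmake` (the file name is just an example). The commands are spelled in different cases and still work, while the two variable names differ only in case and hold separate values.

```cmake
# Run with: cmake -P case.cmake
# Command names are case-insensitive: set/SET and message/MESSAGE are the same commands.
set(SQLite_SOURCE_DIR "upper")
SET(sqlite_SOURCE_DIR "lower")

# Variable names are case-sensitive: these are two distinct variables.
message(STATUS "SQLite_SOURCE_DIR = ${SQLite_SOURCE_DIR}")
MESSAGE(STATUS "sqlite_SOURCE_DIR = ${sqlite_SOURCE_DIR}")
```

This should print two status lines, one ending in `upper` and one ending in `lower`.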

craig.scott · April 1, 2021, 3:49am

Here’s the FetchContent code I’d use for what you’re trying to do:

FetchContent_Declare(SQLite
    URL      https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    URL_HASH MD5=edfc21b8f1a6ea506b0a54f707634a75
)
FetchContent_MakeAvailable(SQLite)
target_include_directories(MyApp PUBLIC "${sqlite_SOURCE_DIR}")
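Putting the pieces together, a complete CMakeLists.txt for the original example might look like the sketch below. It assumes CMake 3.14 or later (when FetchContent_MakeAvailable() was added) and reuses the URL and MD5 hash quoted above. Since the amalgamation has no CMakeLists.txt of its own, FetchContent_MakeAvailable() only downloads and unpacks it; if the dependency did have one, it would also be brought into the build with add_subdirectory().

```cmake
cmake_minimum_required(VERSION 3.19)
project(MyApp C)

include(FetchContent)

# Record where SQLite comes from. If a parent project already declared
# SQLite, its details take precedence over these.
FetchContent_Declare(SQLite
    URL      https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    URL_HASH MD5=edfc21b8f1a6ea506b0a54f707634a75
)

# Download and unpack at configure time, unless already done in this run.
FetchContent_MakeAvailable(SQLite)

add_executable(MyApp myapp.c)

# sqlite3.h sits at the top of the unpacked archive. The variable name is
# built from the lowercased dependency name.
target_include_directories(MyApp PRIVATE "${sqlite_SOURCE_DIR}")
```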

This is similar to the code from @fenrir, but with the following differences:

Use FetchContent_MakeAvailable() rather than FetchContent_Populate() so that population only happens if nothing has already done it earlier in the configure run.
Use a URL_HASH to avoid re-downloading every time CMake is run.

Calling FetchContent_Populate() directly with all the details would be suitable for use only in a script where you always want to perform the download every time. Don’t use it in an actual project because there you want to re-use the download from a previous run if available.
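For the script case, a minimal sketch run with `cmake -P` might look like the following. The file name fetch_sqlite.cmake and the sqlite-src destination are assumptions for illustration. SOURCE_DIR is given explicitly so the location does not depend on the directory the script is run from.

```cmake
# fetch_sqlite.cmake (hypothetical name); run with: cmake -P fetch_sqlite.cmake
include(FetchContent)

# Direct form: the content details are given inline rather than via
# FetchContent_Declare(), so no earlier declaration is consulted.
FetchContent_Populate(sqlite
    URL        https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    URL_HASH   MD5=edfc21b8f1a6ea506b0a54f707634a75
    SOURCE_DIR "${CMAKE_CURRENT_LIST_DIR}/sqlite-src"
)

message(STATUS "SQLite unpacked to ${sqlite_SOURCE_DIR}")
```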

Think of someone who later decides they want to use your project as a dependency in their own project that you know nothing about. They might want to use a slightly different version of sqlite to the one you specified (there may have been a bug fix in sqlite since the code you put in your project, but your project hasn’t been updated for it yet or they don’t want to wait for you to update your project just for that). They want to have the opportunity to override your details for the sqlite dependency. Separating out the “declare” and “do it” parts allows a parent project to override the details by declaring them first (a key behavior of FetchContent is to honour the first declared details for a dependency), without requiring the parent to be concerned about the “do it” part, which can still be left to the dependency. In some cases, you might declare details but based on other logic, you may decide to skip actually populating that dependency (e.g. some CMake option turns off the feature that needs it).
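A sketch of the two scenarios described above, shown as two files in one listing. The child directory name MyApp, the option MYAPP_WITH_SQLITE and the newer sqlite-amalgamation URL are all hypothetical, chosen only for illustration; check sqlite.org for a real release URL (and add a matching URL_HASH) before using anything like this.

```cmake
# ---- Parent/CMakeLists.txt (hypothetical parent project) ----
cmake_minimum_required(VERSION 3.19)
project(Parent C)

include(FetchContent)

# Declared before the child is added, so FetchContent honours these details
# and ignores the child's later FetchContent_Declare(SQLite ...).
FetchContent_Declare(SQLite
    URL https://sqlite.org/2021/sqlite-amalgamation-3350500.zip  # hypothetical newer release
)

# The child still performs the "do it" step itself.
add_subdirectory(MyApp)

# ---- Parent/MyApp/CMakeLists.txt (the dependency) ----
include(FetchContent)

option(MYAPP_WITH_SQLITE "Build the SQLite extension" ON)  # hypothetical option

# Always declare, so details are recorded for anyone who needs them...
FetchContent_Declare(SQLite
    URL      https://sqlite.org/2021/sqlite-amalgamation-3350300.zip
    URL_HASH MD5=edfc21b8f1a6ea506b0a54f707634a75
)

# ...but only populate when the feature that needs SQLite is enabled.
if(MYAPP_WITH_SQLITE)
    FetchContent_MakeAvailable(SQLite)
    # Targets that need sqlite3.h would be defined here.
endif()
```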

The FetchContent documentation includes the following under the docs for the FetchContent_Populate() command:

FetchContent_Populate() will set three variables in the scope of the caller; <lcName>_POPULATED, <lcName>_SOURCE_DIR and <lcName>_BINARY_DIR, where <lcName> is the lowercased <name>.
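For example, with the dependency declared as SQLite, those variables come out as sqlite_POPULATED, sqlite_SOURCE_DIR and sqlite_BINARY_DIR. A quick way to inspect them, assuming the FetchContent_Declare() and FetchContent_MakeAvailable() calls quoted earlier in the thread have already run:

```cmake
# Refresh the sqlite_* variables in this scope (harmless if already set).
FetchContent_GetProperties(SQLite)

message(STATUS "sqlite_POPULATED  = ${sqlite_POPULATED}")
message(STATUS "sqlite_SOURCE_DIR = ${sqlite_SOURCE_DIR}")
message(STATUS "sqlite_BINARY_DIR = ${sqlite_BINARY_DIR}")

# The mixed-case spelling is a different variable and is normally empty.
message(STATUS "SQLite_SOURCE_DIR = ${SQLite_SOURCE_DIR}")
```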

The decision to force the dependency name to lowercase for this came from my experiences working with the precursor to FetchContent. Users wouldn’t always think of the dependency name with the same upper/lowercase conventions. It became clear that FetchContent needed to consider the dependency name in a case-insensitive way. However, variable names in CMake are case sensitive, so in order to ensure predictable behavior, it was decided to use the lowercased dependency name as part of the variable name.

FetchContent also provides some other cache variables for each dependency. Cache variables have a pretty strong convention of being fully uppercase, so that convention was followed for the cache variables. Examples are things like FETCHCONTENT_SOURCE_DIR_MYDEPNAME, where MYDEPNAME is the uppercased dependency name. This is also documented further down for the FetchContent_Populate() command.
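As an example of those cache variables, FETCHCONTENT_SOURCE_DIR_SQLITE lets whoever builds the project point FetchContent at an existing local copy of the sources instead of downloading them. One way to set it is an initial-cache file passed with `cmake -C`; the file name local-deps.cmake and the path below are placeholders.

```cmake
# local-deps.cmake (hypothetical); use with: cmake -C local-deps.cmake -S . -B build
# Use an already-unpacked SQLite amalgamation instead of downloading it.
set(FETCHCONTENT_SOURCE_DIR_SQLITE "/path/to/sqlite-amalgamation-3350300"
    CACHE PATH "Local SQLite sources used instead of the download")
```

Passing `-DFETCHCONTENT_SOURCE_DIR_SQLITE=/path/to/...` on the cmake command line has the same effect for a one-off build.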

PaulMoore · April 1, 2021, 8:23am

Thanks for the very useful explanation, I see the logic now. I wasn’t thinking in terms of others reusing my project, but when looking at it like that the approach makes sense.

Apologies for the places where I missed stuff in the documentation - there’s a lot in there, and it’s easy to miss bits when skimming over parts that don’t seem relevant.

craig.scott · April 1, 2021, 8:45am

I agree the documentation is not all that well structured at the moment. It grew over time with some new functionality, but I’m not all that happy with the end result. I will need to give it a working over in the future, as it doesn’t currently lead the reader through the right sequence of ideas (you find out about FetchContent_MakeAvailable() much too late, for example).

PaulMoore · April 1, 2021, 9:02am

IMO (and I very definitely only offer this comment as a newcomer’s perspective, I understand how hard writing documentation is) it’s not so much the FetchContent documentation itself, but rather the higher level “how this all hangs together” context, which is there, but buried in a variety of pages without an obvious roadmap that someone should take when starting out. So it’s fine as reference, but more of a struggle if you don’t have the basic concepts internalised yet.