How to achieve good dependency granularity AND allow source globbing AND monkeypatching?

james_brown · December 5, 2020, 10:15pm

What I am doing

I am using a CMake-based build system where code is organized into libraries to allow monkeypatching and source globbing. Basically, some directories are translated into libraries.

Creating libraries rather than compiling the sources directly into each executable keeps executables small when they use only a subset of the sources. It also allows monkeypatching by playing with the link order.

So in my project, one folder contains the normal sources that form code_lib. Another one forms test_lib.

The test executables are linked against test_lib and then code_lib, while a normal executable links to code_lib only.
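A minimal sketch of this layout, in case it helps the discussion. The directory names (`src/`, `test_support/`, `tests/`, `app/`) and the targets `app` and `test_foo` are made up for illustration; only code_lib and test_lib come from my project.

```cmake
cmake_minimum_required(VERSION 3.12)  # CONFIGURE_DEPENDS needs 3.12+
project(granularity_example CXX)

# Hypothetical layout: src/ holds the normal sources, test_support/ holds the
# replacement (monkeypatch) definitions, tests/ holds one file per test.
file(GLOB CODE_SOURCES CONFIGURE_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp")
file(GLOB TEST_SOURCES CONFIGURE_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/test_support/*.cpp")

add_library(code_lib STATIC ${CODE_SOURCES})
add_library(test_lib STATIC ${TEST_SOURCES})

# A normal executable links code_lib only.
add_executable(app app/main.cpp)
target_link_libraries(app PRIVATE code_lib)

# A test links test_lib first, so the linker resolves symbols from it
# before it looks into code_lib.
add_executable(test_foo tests/test_foo.cpp)
target_link_libraries(test_foo PRIVATE test_lib code_lib)
```

With static archives the linker only pulls in the members it needs, which is where both the size economy and the link-order trick come from.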

Problem

The thing I don’t like about this is that a minor change in one file of code_lib now makes the whole library out of date, so every test gets rebuilt, since they all depend on the library. This is often unnecessary, because each test executable only uses one or two source files from the library (unit testing).

It’s not just a minor matter of performance. My build system exposes a target that runs only changed/failed tests. I put a lot of effort into this, but now it’s kind of useless, since the slightest change makes all the tests rebuild and rerun, when it should run one test instead.

Solution?

During the first build of any executable, CMake could make the linker output the files that were really used (i.e. those that defined at least one referenced symbol). This may require the first build to be compiled with additional flags. CMake would then automatically replace the executable’s dependencies with that output.

During my search I bumped into another user asking for such a feature: https://stackoverflow.com/questions/51069613/detecting-unnecessary-libraries-in-target-link-libraries

Other alternatives:
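For the "linker outputs the files that were really used" part, here is a rough sketch of where that information could come from. It assumes a GNU-compatible linker that accepts `-Map`, an existing executable target called `test_foo` (hypothetical name), and CMake 3.13 or newer. It only produces the information; it does not do the dependency replacement I am asking about.

```cmake
# Requires CMake 3.13+ for target_link_options() and the LINKER: prefix.
# Assumes test_foo is an existing executable target and that the linker
# understands -Map (GNU ld does; other linkers may not).
target_link_options(test_foo PRIVATE
  "LINKER:-Map=${CMAKE_CURRENT_BINARY_DIR}/test_foo.map")
```

With GNU ld, the resulting map file lists the archive members that were included to satisfy references, which is the per-executable list of really used object files.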

I would like to know if someone sees an easier way. I thought of object libraries, but if I understand the concept correctly, I would lose the monkeypatching and the size savings. I also quickly looked at LINK_WHAT_YOU_USE because of the promising name, but I don’t think it fits my needs.

Can someone think of something else to achieve what I am trying to do? I am sure I am not the first one to bump into this. If CMake cannot do this, do you know of a build system that would allow it?
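To show why I think object libraries don't fit, here is a sketch of what that variant would look like as I understand it. `code_objs`, `CODE_SOURCES` and `test_foo` are hypothetical names.

```cmake
# Hypothetical object-library variant of code_lib.
# CODE_SOURCES is assumed to hold the list of source files.
add_library(code_objs OBJECT ${CODE_SOURCES})

# Every object file of code_objs is passed to the link line of test_foo,
# whether the test references it or not.
add_executable(test_foo tests/test_foo.cpp $<TARGET_OBJECTS:code_objs>)
```

Since all the objects are linked unconditionally, the executable no longer stays small, and a replacement definition in a test would typically clash with the original as a duplicate symbol instead of overriding it.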

Thanks for your time.

ben.boeckel · December 6, 2020, 6:44pm

Do they rebuild or just relink?

You may know this, but the build tool only sees “the library changed, this test executable requires it, so I will rerun the link command”. There’s no better granularity than that available in CMake.

This is a large undertaking. AFAIK, the IDE generators just wouldn’t even have a way to specify this kind of thing. Makefiles could do it, but it’d be messy. Ninja also can’t replace dependencies with another set (it can only add them via depfiles).

I know of no build tool that provides this kind of “here’s my best guess at dependencies, I’ll give you a better list later” logic. I mean you could hack it up with Makefiles, but you’re in hand-crafted Make at that point.

james_brown · December 7, 2020, 10:31am

They relink. I mean that the target is rebuilt. And when a test target rebuilds, it marks the tests as needing to be run again, via a custom command on the target.

In my mind, CMake could just monitor the first build so that the compilers output the file information (via DWARF compiler flags), and then trigger a reconfiguration of the project that keeps only the real dependencies by parsing the DWARF output. This would not rely on generator-specific features, but rather on DWARF support in the linker flags and a tool to parse the output: either an existing one (nm on Linux and Windows) or a very basic homemade cross-platform DWARF parser.

This could be a target property that the user sets on specific targets, thus avoiding the extra work if the developer has already done it, and not affecting older projects.

This looks complex, but the result is really appealing to me. It would greatly reduce the interaction with CMake that developers need to get correct dependencies. From my point of view, it would make globbing work as expected, modulo CONFIGURE_DEPENDS not working on some platforms.

ben.boeckel · December 7, 2020, 3:02pm

“could just” is carrying a lot of weight here. There’s no mechanism for CMake to run at build time except as a command, just like the compiler, to run a script or regenerate the build files.

This would need to happen on every build, since a change in source can cause new objects to be used in a link step. This is not something I foresee any build system being willing to add.

The backing tools would need some way to hook this work into the build graph. I can’t see how ninja wouldn’t complain about a cycle here, and I’m not familiar enough with VS or Xcode to know.

I don’t think this is feasible work for any build tool available today. Maybe the monorepo projects could add it, but it sounds very fragile and they tend to do tree pruning via hashes anyways.

Cc: @brad.king

james_brown · December 7, 2020, 7:18pm

Well, isn’t CMake able to make the generators put a MARKER_FILE dependency on anything it wants (a symbolic target or a .o file), which would run a process doing whatever it wants (reconfiguring the project) and create the MARKER_FILE afterward? This would give total control over the first build after a modification.

Why? You think it’s too slow? Reconfiguring with the cache has always been negligible for me compared to building and testing. Also, this would happen only if the sources changed (detected automatically when some sources are newer than MARKER_FILE).

If you make the process more complex, I am sure there must be some clever way to avoid re-parsing the whole executable information again, by using some kind of external state. Imagine that you apply the marker trick to every .o. You parse the DWARF of the modified .o files only, and you update some kind of global database of dependencies according to the ones that appear or disappear.

Then you just need a way to override the dependencies of a target at the end of your CMakeLists.txt. If the database has a result for a target, you override its dependencies; otherwise, you don’t touch them.

I would do that by forcing a reconfiguration of the project with the marker trick, which would rerun the generator but override the dependencies with the ones present in the database, when there are any. This would still use the previous cache, so it would not be slow.

Or maybe what I don’t get is that some generators don’t allow custom file dependencies? Is that the problem?

Sorry, after rereading, I think it’s pretty hard to understand what I mean. I hope you get the overall picture.
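Roughly what I have in mind for the marker trick, as a sketch only. The stamp file name, the `cmake/update_deps.cmake` script, the `refresh_deps` target and `test_foo` are all hypothetical, and `CODE_SOURCES` is assumed to hold the list of source files.

```cmake
# Hypothetical stamp file standing in for MARKER_FILE.
set(MARKER_FILE "${CMAKE_CURRENT_BINARY_DIR}/deps.stamp")

add_custom_command(
  OUTPUT "${MARKER_FILE}"
  # Placeholder for "a process doing whatever it wants" (hypothetical script).
  COMMAND "${CMAKE_COMMAND}" -P "${CMAKE_CURRENT_SOURCE_DIR}/cmake/update_deps.cmake"
  # Create or refresh the marker only after the process has succeeded.
  COMMAND "${CMAKE_COMMAND}" -E touch "${MARKER_FILE}"
  # The command reruns whenever a source is newer than the marker.
  DEPENDS ${CODE_SOURCES}
  COMMENT "Refreshing dependency marker"
  VERBATIM)

# Symbolic target that other targets can depend on.
add_custom_target(refresh_deps DEPENDS "${MARKER_FILE}")
add_dependencies(test_foo refresh_deps)
```

This only shows the marker mechanism itself; whether the process behind it can safely reconfigure the project during the build is exactly the open question.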

Thanks for your answer.

ben.boeckel · December 7, 2020, 7:39pm

There could only be one of these during any given build (because doing these in parallel is not going to end well at all), so all targets would need to synchronize on one “we’re rerunning cmake now” step. This is at odds with generated sources (library → executable used during the build → sources used in another executable).