Interfaces, implementations, and unresolved symbols

ingolf · March 24, 2022, 2:32pm

Hi,

how should we model the following situation in CMake?

There is an interface specification (e.g. set of C language header files): H
There are two (static) libraries L1 and L2, each of them implementing this interface H
There is a (static) library U which uses the H interface but does not care whether L1 or L2 is used to implement that interface
There is another (static) library M which uses U
Finally, there are two executables E1 and E2 which both use M and additionally L1 or L2, respectively.

The goal is that E1 pulls in M which in turn pulls in U and that the unresolved symbols (from the H interface) are resolved using L1. (Similarly for E2.)

Our current approach is to

define an INTERFACE library H
define “normal” libraries L1 and L2
define a “normal” library U and specify target_link_libraries(U PRIVATE H)
define a “normal” library M and specify target_link_libraries(M PRIVATE U)
define executables E1 and E2 and specify target_link_libraries(E1 PRIVATE M L1) and target_link_libraries(E2 PRIVATE M L2)
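
For reference, a minimal sketch of this setup. The source file names, the `include` directory, and the project name are assumptions for illustration and are not taken from the real project; the sketch also assumes that L1 and L2 consume the H headers themselves.

```cmake
cmake_minimum_required(VERSION 3.13)
project(LinkOrderDemo C)  # hypothetical project name

# Interface specification: headers only.
add_library(H INTERFACE)
target_include_directories(H INTERFACE include)  # assumed header location

# Two implementations of the H interface.
add_library(L1 STATIC l1.c)
target_link_libraries(L1 PRIVATE H)
add_library(L2 STATIC l2.c)
target_link_libraries(L2 PRIVATE H)

# U uses H but names no implementation.
add_library(U STATIC u.c)
target_link_libraries(U PRIVATE H)

# M uses U.
add_library(M STATIC m.c)
target_link_libraries(M PRIVATE U)

# Each executable picks its implementation by linking it directly.
add_executable(E1 e.c)
target_link_libraries(E1 PRIVATE M L1)
add_executable(E2 e.c)
target_link_libraries(E2 PRIVATE M L2)
```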

Unfortunately, the resulting sequence of libraries linked with the executables is

M
one of L1 and L2
U

This is not surprising, as U does not depend on L1/L2. Nevertheless, U ends up too late on the link line (after L1/L2), resulting in unresolved symbols.

Is there some mechanism which supports the following specifications?

“Whenever some item is used which depends on H, a library which implements H must be linked afterwards”
"L1 implements the H interface"

(Of course, for each executable, there may only be one library which implements H.)

An alternative could potentially be to have CMake link depth first. However, I cannot currently think of what drawbacks this approach might have.

Kind regards
Ingolf

Edit: Added suggestion for depth first link order.

JohannesWilde · March 28, 2022, 8:46am

It seems someone already encountered this problem some time ago, without finding a solution:

https://stackoverflow.com/questions/38205119/cmake-link-ordering-when-using-multiple-implementations-of-interface-libraries

In addition, I created a simple test project:

https://github.com/JohannesWilde/TestingCMakeInterfacing

ingolf · March 29, 2022, 8:39am

I’ve finally opened a ticket in the CMake GitLab for this problem.

ingolf · March 29, 2022, 12:20pm

@brad.king could you please elaborate on your comment in the gitlab ticket?

Why do you expect the headers to depend on the implementation? For sure, the implementation depends on the headers; but I do not see a reason for the other direction. In my understanding, it is the very nature of an interface to not depend on its implementation.
Can you suggest an alternative for the original problem? How would we specify that U only needs the H interface (i.e. that it is irrelevant whether that interface is implemented in L1 or L2) and that the actual selection of implementation is done at a higher level (here: when linking E1 / E2)?

Kind regards
Ingolf

ben.boeckel · March 29, 2022, 1:09pm

One possible solution:

set(is_exe "$<STREQUAL:$<TARGET_PROPERTY:TYPE>,EXECUTABLE>")
set(use_l1 "$<BOOL:$<TARGET_PROPERTY:H_IS_IMPL_BY_L1>>")
set(use_l2 "$<BOOL:$<TARGET_PROPERTY:H_IS_IMPL_BY_L2>>")
target_link_libraries(H
  INTERFACE
    # Sanity checks
    # Cannot use both.
    "$<$<AND:${use_l1},${use_l2}>:error::cannot_use_l1_and_l2>"
    # Must choose one for executables
    "$<$<AND:${is_exe},$<NOT:${use_l1}>,$<NOT:${use_l2}>>:error::must_choose_an_implementation_of_H>"

    # Choose the implementation
    "$<${use_l1}:L1>" # maybe include ${is_exe} here?
    "$<${use_l2}:L2>"
    )

brad.king · March 29, 2022, 1:12pm

U does not depend on L1/L2.

U does depend on one of those, or linking without them wouldn’t get unresolved symbols. You need to tell CMake about this dependency. U depends both on the interface of H and an implementation of it. The latter needs to be specified by U too.

Try the following:

# Interface H headers.
add_library(H INTERFACE)
target_include_directories(H INTERFACE ...)

# Interface H implementation L1.
add_library(L1 STATIC l1.c)
target_link_libraries(L1 PRIVATE H)

# Interface H implementation L2.
add_library(L2 STATIC l2.c)
target_link_libraries(L2 PRIVATE H)

# Consumer of interface H.
# Implementation selected based on a target property of the final executable.
add_library(U STATIC u.c)
target_link_libraries(U PRIVATE H "$<TARGET_PROPERTY:H_IMPL>")

# Intermediate library to match original example.
add_library(M STATIC m.c)
target_link_libraries(M PRIVATE U)

# Executable that requests H be implemented by L1.
add_executable(E1 e.c)
target_link_libraries(E1 PRIVATE M)
set_property(TARGET E1 PROPERTY H_IMPL L1)

# Executable that requests H be implemented by L2.
add_executable(E2 e.c)
target_link_libraries(E2 PRIVATE M)
set_property(TARGET E2 PROPERTY H_IMPL L2)

ingolf · March 29, 2022, 1:15pm

@ben.boeckel, thanks for this suggestion. Does it really support having both executables E1 (using L1) and E2 (using L2) in parallel?

ben.boeckel · March 29, 2022, 1:17pm

Yes, it would; without an explicit target name, $<TARGET_PROPERTY:prop> reads the property from whichever target is evaluating the genex, so E1 and E2 each see their own setting. Brad’s suggestion to just use the value directly works too, but probably doesn’t degrade as nicely when there are errors (though the error checking is verbose too). It also supports an open set of impls; mine is rather more closed-set.
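
As a rough sketch of how the two executables could coexist with the property-based snippet above: each executable sets only its own selection property. This assumes U links H and the executables link M as in the original question; the `e.c` source name is illustrative.

```cmake
# E1 asks for L1 as the implementation of H.
add_executable(E1 e.c)
target_link_libraries(E1 PRIVATE M)
set_property(TARGET E1 PROPERTY H_IS_IMPL_BY_L1 ON)

# E2 asks for L2 as the implementation of H.
add_executable(E2 e.c)
target_link_libraries(E2 PRIVATE M)
set_property(TARGET E2 PROPERTY H_IS_IMPL_BY_L2 ON)
```

Because the generator expressions on H are evaluated separately for each executable, setting both properties on one executable, or neither, should trip the corresponding sanity check for that executable only.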

ingolf · March 29, 2022, 1:19pm

@brad.king, I agree that eventually U needs to be linked with one implementation of the H interface. My goal is to have the executable choose the implementation and, furthermore, to be able to have different executables that select different implementations.

brad.king · March 29, 2022, 1:20pm

My example does that. The executable sets the H_IMPL target property to specify which implementation it wants for H.

craig.scott · March 30, 2022, 4:59am

@brad.king I think the approach @ingolf is looking for is basically a variant of the “link seaming” problem I was seeking to use the new INTERFACE_LINK_LIBRARIES_DIRECT property for (bumped from the 3.23 release and pushed back to 3.24). The library defining the interface wants to say “an implementation will be linked by the top level executable, and that target’s XXX property tells you what to link for that”. The executable then needs to define the XXX property to provide an object file or object library to link to. By making it an object file or object library, the linker won’t be discarding its symbols and so it should resolve the symbols required by the library defining the interface. Sorry if that’s a bit vague, I mostly just wanted to make the association in case it expands the discussion to include related recent work.

ingolf · March 30, 2022, 10:41am

Thanks, @ben.boeckel and @brad.king. I believe we are converging… gradually.

The downside of small test cases is that they often lack the complexity of real life. In the real project, L1 and L2 are themselves linked into larger (static) libraries, sometimes even indirectly. In consequence, E1 and E2 do not “know” of the existence of the L1 and L2 libraries (and thus cannot set_property() the H_IMPL property themselves).

Maybe it is helpful to see a concrete real-life use case. Consider an embedded application whose main functionality is implemented in the M library which shall support different hardware environments. Furthermore, this application uses some library U for UDP based communication.

The actual communication medium is, say, Ethernet, but a generic implementation of U is only possible down to a certain layer of the communication protocol as different hardware environments are equipped with different Ethernet controllers (L1 and L2, respectively). However, both Ethernet controllers implement the same H interface, so U does not need to know which controller type is actually used.

Each of the Ethernet controllers is part of a specific board (B1 and B2) which typically has additional components driven by different libraries. As each Ethernet driver needs different initialization, the L1 and L2 libraries provide individual functions (in addition to H) which are used by B1 / B2 at board initialization time.

| item | purpose |
|------|---------|
| `M`  | main application library |
| `U`  | library for UDP based communication |
| `H`  | generic interface of the Ethernet drivers |
| `L1` | driver library implementing `H` for controller type #1 |
| `L2` | driver library implementing `H` for controller type #2 |
| `B1` | library which handles boards of type #1 (including an Ethernet controller of type #1) |
| `B2` | library which handles boards of type #2 (including an Ethernet controller of type #2) |
| `E1` | executable which implements the application for board type #1 |
| `E2` | executable which implements the application for board type #2 |

So, actually, the executables do not link directly with L1 or L2 but with B1 and B2, respectively. Is it possible to let each of L1 and L2 define the H_IMPL property (referring to themselves) and have CMake forward that property upwards to B1 / B2 and eventually to E1 / E2?

I tried in vain to fully understand the concept of properties in the context of Transitive User Requirements. Furthermore, I have difficulties determining the actual target referred to by $<TARGET_PROPERTY:prop> in Target-Dependent Queries: Apparently, U can query the H_IMPL property of the executable (with several layers in between?). Is there some similar mechanism by which L1 and L2 could set a property up-stream?

(In case you were wondering… Yes, the whole flexibility could also be implemented with run-time polymorphism, i.e. linking all drivers into one large executable and then setting up arrays of function pointers to the implementation that is to be enabled. However, this would significantly increase the memory footprint of the application, not to speak of slower execution due to indirect function calls and other drawbacks. This is why we chose link-time polymorphism.)

Kind regards
Ingolf

brad.king · March 30, 2022, 2:05pm

When $<TARGET_PROPERTY:prop> appears in INTERFACE_LINK_LIBRARIES, it evaluates to the value of the property on the target whose transitive link closure is currently being computed. That can be a different value for every downstream consumer through an entire chain, but one might only set the property on a subset of the downstream targets (such as just the final executables).

Apparently, U can query the H_IMPL property of the executable (with several layers in between?).

Yes.

Is there some similar mechanism by which L1 and L2 could set a property up-stream?

No. Since usage requirements can be defined in terms of properties, allowing them to also set properties would create a computationally intractable problem.

You just need to make each executable’s decision about what board to use based on a property, rather than by what other libraries it links. Rather than setting H_IMPL to L1 or L2 directly, define a logical BOARD property, and link executables to an intermediate board selection interface instead of B1 or B2 directly:

add_library(H INTERFACE)
target_include_directories(H INTERFACE ...)

add_library(L1 STATIC l1.c)
target_link_libraries(L1 PRIVATE H)

add_library(L2 STATIC l2.c)
target_link_libraries(L2 PRIVATE H)

add_library(U STATIC u.c)
target_link_libraries(U PRIVATE H
  "$<$<STREQUAL:$<TARGET_PROPERTY:BOARD>,Type1>:L1>"
  "$<$<STREQUAL:$<TARGET_PROPERTY:BOARD>,Type2>:L2>"
)

add_library(B1 STATIC b1.c)
target_link_libraries(B1 PRIVATE L1)

add_library(B2 STATIC b2.c)
target_link_libraries(B2 PRIVATE L2)

# Executables link this and set their BOARD property to select a board type.
add_library(B INTERFACE)
target_link_libraries(B INTERFACE
  "$<$<STREQUAL:$<TARGET_PROPERTY:BOARD>,Type1>:B1>"
  "$<$<STREQUAL:$<TARGET_PROPERTY:BOARD>,Type2>:B2>"
)

add_library(M STATIC m.c)
target_link_libraries(M PRIVATE U)

add_executable(E1 e.c)
target_link_libraries(E1 PRIVATE M B)
set_property(TARGET E1 PROPERTY BOARD Type1)

add_executable(E2 e.c)
target_link_libraries(E2 PRIVATE M B)
set_property(TARGET E2 PROPERTY BOARD Type2)
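
A possible addition, sketched under the assumption that B is only ever linked by executables: a sanity check that fails generation when an executable forgets to set BOARD or sets an unknown value. It borrows the `error::` pseudo-target trick from ben.boeckel's snippet earlier in the thread; the name after `error::` is arbitrary and hypothetical.

```cmake
# Shorthand for the BOARD property of the consuming target.
set(board "$<TARGET_PROPERTY:BOARD>")

# If BOARD is neither Type1 nor Type2, link a non-existent namespaced
# target so that CMake reports an error at generate time.
target_link_libraries(B INTERFACE
  "$<$<NOT:$<OR:$<STREQUAL:${board},Type1>,$<STREQUAL:${board},Type2>>>:error::BOARD_must_be_Type1_or_Type2>"
)
```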

ingolf · April 27, 2022, 8:22am

Thanks again for your support.

I’ve finally managed to find a user friendly way based on @brad.king’s approach which basically works as follows:

all add_library() and add_executable() invocations need to be wrapped
the wrapper for add_library()…
- (in case of an INTERFACE library): allows for specification that the library denotes an abstract interface and thus needs an implementation; the corresponding name of the target property to be used in executables is derived from the name of the library
- (in case of a “normal” library): allows for specification which abstract interfaces (if any) are implemented by this library
- (again for “normal” libraries): recursively collects abstract interfaces implemented in this library and all subordinates (specified via target_link_libraries()) and stores this list in a custom target property
- scans direct subordinates (specified via target_link_libraries()); in case of abstract interfaces, it adds additional dependencies using generator expressions similar to Brad’s suggestion
the wrapper for add_executable()…
- scans the libraries which are linked into the executable and
- defines the target properties which define the chosen implementation for the abstract interfaces.
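
A heavily simplified sketch of what such helpers might look like. All function names (`my_declare_implements`, `my_use_interface`, `my_select_implementations`) and the `MY_IMPLEMENTS` property are hypothetical and not part of CMake or of the actual project. The sketch assumes CMake 3.19 or newer (so that arbitrary properties can be queried on INTERFACE libraries), an acyclic link graph, and that the scan runs after all target_link_libraries() calls.

```cmake
# Hypothetical: record that library <impl> implements abstract interface <iface>.
function(my_declare_implements impl iface)
  set_property(TARGET ${impl} APPEND PROPERTY MY_IMPLEMENTS ${iface})
endfunction()

# Hypothetical: <target> consumes <iface>. Link the headers plus whatever the
# final executable names in its <iface>_IMPL property (as in Brad's example).
function(my_use_interface target iface)
  target_link_libraries(${target} PRIVATE
    ${iface}
    "$<TARGET_PROPERTY:${iface}_IMPL>")
endfunction()

# Hypothetical: walk the libraries linked (directly or indirectly) into
# <target> and set <iface>_IMPL on <exe> for every implementation found.
function(my_select_implementations exe target)
  get_target_property(libs ${target} LINK_LIBRARIES)
  if(NOT libs)
    return()
  endif()
  foreach(lib IN LISTS libs)
    # Skips generator expressions and plain (non-target) library names.
    if(TARGET "${lib}")
      get_target_property(ifaces "${lib}" MY_IMPLEMENTS)
      if(ifaces)
        foreach(iface IN LISTS ifaces)
          set_property(TARGET ${exe} PROPERTY ${iface}_IMPL "${lib}")
        endforeach()
      endif()
      my_select_implementations(${exe} "${lib}")
    endif()
  endforeach()
endfunction()
```

With these, a project would call something like `my_declare_implements(L1 H)`, `my_use_interface(U H)`, and, once E1 has been linked to M and B1, `my_select_implementations(E1 E1)`. Detecting two competing implementations of the same interface is left out of this sketch.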

Obviously, this needs significant “magic” behind the scenes. Would you consider this something which should eventually be integrated in CMake? In that case, I’d try and re-write the ticket in the CMake GitLab (improvement/extension rather than bug).

Kind regards
Ingolf

ben.boeckel · April 27, 2022, 10:20am

This is very similar to VTK’s autoinit system, but I don’t think this is something that has enough common ground in how it actually mechanically works in practice to be something CMake could reliably perform (there are just too many project-specific details that matter IMO).