Linking against CUDA::cuda_driver not working right with libcuda stub, wants libcuda.so.1 not libcuda.so

Artem-B · March 7, 2023, 11:40pm

Does anyone know what is going on here?

I can provide a bit of context. libcuda.so is a bit of a special snowflake.

Unlike most of the other libraries that ship with the CUDA SDK, libcuda.so is provided by the NVIDIA driver, which is only installed on machines where NVIDIA GPUs are present. That is often not the case for the machines where CUDA apps get built.

So, in order to build a functional CUDA app that uses the driver API on such a machine, the executable has to be linked with stubs/libcuda.so. The stub is essentially an interface library: it only provides the symbols, which lets the linker finish linking the executable w/o complaining about missing symbols. The stub's DT_SONAME=libcuda.so.1 intentionally does not match its file name libcuda.so, because we do not want the dynamic linker to ever load stubs/libcuda.so if we were to run the executable linked with it. The linker records the SONAME, not the file name, as the executable's DT_NEEDED entry, which is why the resulting binary asks for libcuda.so.1.

Instead, when the executable is run, the dynamic linker will go searching for libcuda.so.1 among the shared libraries in the standard search path. On machines where the NVIDIA driver is installed, it will find the real libcuda.so.1, normally a symlink to the libcuda.so.X.Y provided by driver vX.Y. On machines w/o the driver, execution is expected to fail due to the missing libcuda.so.1.

If one needs the application to run on machines where libcuda.so.1 is unavailable, then the standard approach is to not link with libcuda.so (stub or real) but instead dlopen("libcuda.so.1") and use dlsym to find the pointers to the appropriate driver API functions. This is how libcudart.so operates under the hood.
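
To make the dlopen/dlsym approach concrete, here is a minimal sketch in C. It is only an illustration, not how libcudart.so is actually implemented. The struct cuda_driver type and the cuda_driver_load / cuda_driver_unload helpers are hypothetical names made up for this example, and the typedefs are stand-ins for the real declarations in cuda.h so that the sketch builds without the CUDA SDK. On older glibc you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for declarations that normally come from cuda.h (assumption:
 * we avoid including cuda.h so this builds without the CUDA SDK).
 * In the driver API, CUresult is an enum and CUDA_SUCCESS is 0. */
typedef int CUresult;
typedef CUresult (*cuInit_fn)(unsigned int flags);
typedef CUresult (*cuDriverGetVersion_fn)(int *version);

/* Hypothetical helper type: the library handle plus resolved entry points. */
struct cuda_driver {
    void *handle;
    cuInit_fn cuInit;
    cuDriverGetVersion_fn cuDriverGetVersion;
};

/* Tries to load the real driver library by its SONAME at run time.
 * Returns 0 on success, -1 if the driver (or a needed symbol) is missing.
 * On failure, all fields of *drv are left NULL. */
static int cuda_driver_load(struct cuda_driver *drv)
{
    drv->handle = NULL;
    drv->cuInit = NULL;
    drv->cuDriverGetVersion = NULL;

    /* Ask for libcuda.so.1, the same name the dynamic linker would look
     * for, and never for the stub's file name libcuda.so. */
    void *h = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (h == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "CUDA driver not available: %s\n",
                err ? err : "unknown error");
        return -1;
    }

    /* POSIX permits converting the void* returned by dlsym to a
     * function pointer. */
    cuInit_fn init = (cuInit_fn)dlsym(h, "cuInit");
    cuDriverGetVersion_fn get_version =
        (cuDriverGetVersion_fn)dlsym(h, "cuDriverGetVersion");
    if (init == NULL || get_version == NULL) {
        dlclose(h);
        return -1;
    }

    drv->handle = h;
    drv->cuInit = init;
    drv->cuDriverGetVersion = get_version;
    return 0;
}

/* Releases the library handle and clears the entry points. */
static void cuda_driver_unload(struct cuda_driver *drv)
{
    if (drv->handle != NULL) {
        dlclose(drv->handle);
    }
    drv->handle = NULL;
    drv->cuInit = NULL;
    drv->cuDriverGetVersion = NULL;
}
```

A caller would check the return value of cuda_driver_load and, on success, call drv.cuInit(0) before any other driver function; on failure it can fall back to a CPU path instead of dying at startup. One caveat: cuda.h maps some driver API names to versioned symbols via macros (e.g. cuCtxCreate to cuCtxCreate_v2), so the string passed to dlsym has to be the actual exported symbol name.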