Linking cufft_static with a CXX target

The cuFFT/1d_c2c sample by NVIDIA provides a CMakeLists.txt which links CUDA::cufft. Modifying it to link against CUDA::cufft_static causes many link errors. The cuFFT docs provide some guidance here, so I modified the CMakeLists.txt accordingly to link against CMAKE_DL_LIBS and pthreads (Threads::Threads) and turned on CUDA_SEPARABLE_COMPILATION. This still doesn’t work, as CMake invokes g++ for linking instead of nvcc. I would have thought that setting the LINKER_LANGUAGE property to CUDA would fix this, but it does not. Is that a bug or am I missing something?

The easy way around this is to change the source file name to use .cu instead of .cpp. One still needs the CUDA_SEPARABLE_COMPILATION property, but no explicit pthreads and CMAKE_DL_LIBS. But as the file doesn’t contain actual CUDA (i.e. device) code I see this only as a workaround.
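A possible variation on this workaround, sketched here as an assumption and not something tested in this thread: CMake has a LANGUAGE source file property, so the .cpp file could be marked as CUDA without renaming it. The file name below is the one used by the sample, and whether CUDA_SEPARABLE_COMPILATION is still needed in this setup is not verified.

```cmake
# Sketch: ask CMake to compile the unchanged .cpp file with the CUDA compiler.
# Must be called in the same directory scope that creates the target.
set_source_files_properties(
  ${PROJECT_SOURCE_DIR}/1d_c2c_example.cpp
  PROPERTIES LANGUAGE CUDA)
```

Like renaming the file, this only changes which compiler handles the source, so it is still a workaround and not a fix for the link step itself.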

My modified (and cleaned up) CMakeLists.txt:

cmake_minimum_required(VERSION 3.18)

set(ROUTINE 1d_c2c)

project(
  "${ROUTINE}_example"
  DESCRIPTION "GPU-Accelerated Fast Fourier Transforms"
  HOMEPAGE_URL "https://docs.nvidia.com/cuda/cufft/index.html"
  LANGUAGES CUDA CXX)

find_package(CUDAToolkit REQUIRED)
find_package(Threads REQUIRED)

add_executable(${ROUTINE}_example)

set_target_properties(${ROUTINE}_example
  PROPERTIES
    LINKER_LANGUAGE CUDA
    CUDA_SEPARABLE_COMPILATION ON
    RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)

target_compile_features(${ROUTINE}_example
  PRIVATE cxx_std_11)

target_sources(${ROUTINE}_example
  PRIVATE ${PROJECT_SOURCE_DIR}/${ROUTINE}_example.cpp)

target_include_directories(${ROUTINE}_example
  PRIVATE ${CMAKE_SOURCE_DIR}/../utils)

target_link_libraries(${ROUTINE}_example
  PRIVATE
    CUDA::cufft_static
    ${CMAKE_DL_LIBS}
    Threads::Threads)

I am using CMake 3.23.1, CUDA Toolkit 11.8, and gcc 11.3.

CUDA_SEPARABLE_COMPILATION is only needed when the code you compile yourself requires separable compilation. In your case you just need the device linking step to occur, so enabling CUDA_RESOLVE_DEVICE_SYMBOLS is sufficient (this works locally with a C++ source file).

When building locally I modified the target_link_libraries call to look like:

target_link_libraries(1d_c2c
  PRIVATE
    CUDA::cufft_static
    CUDA::cudart_static
    CUDA::culibos)

That compiles and links correctly. Since the example uses functions like cudaMallocAsync, we do need to link to the CUDA runtime (CUDA::cudart_static).

But you have identified an issue with CUDA::cufft_static: it isn’t expressing the proper link requirements on pthreads and dl, which I will fix.

CUDA::culibos should automatically come with CUDA::cufft_static according to the docs.

Replacing CUDA_SEPARABLE_COMPILATION with CUDA_RESOLVE_DEVICE_SYMBOLS and adding CUDA::cudart_static is enough for it to compile for me (even without setting LINKER_LANGUAGE to CUDA).
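For reference, the two changes described above could look roughly like this. It is a sketch against the target name from the original CMakeLists.txt (1d_c2c_example), with the rest of that file assumed unchanged.

```cmake
# Only the device link step is requested; no separable compilation.
set_target_properties(1d_c2c_example
  PROPERTIES
    CUDA_RESOLVE_DEVICE_SYMBOLS ON
    RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)

# The static runtime is linked explicitly next to static cuFFT.
target_link_libraries(1d_c2c_example
  PRIVATE
    CUDA::cufft_static
    CUDA::cudart_static)
```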

Are you saying that CUDA::cudart_static expresses these requirements for pthreads and dl, but CUDA::cufft_static should as well so that it could be used without CUDA::cudart_static (e.g. with the driver API, I guess)?

I was actually asking myself if it is expected that CUDA::cufft_static comes with culibos, but the cufft_static that becomes available just by using the CUDA language without find_package(CUDAToolkit) does not automatically include it.

Are these targets without the CUDA:: namespace a feature or just an implementation detail? In contrast to the CUDA:: ones, they don’t seem to be documented at all.

Mainly correct. This allows CUDA::cufft_static to be used with either the shared or the static cudart (CUDA::cudart or CUDA::cudart_static).
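Until CUDA::cufft_static carries those requirements itself, a combination with the shared runtime would presumably have to state them by hand. The following is an untested sketch under that assumption; the target name is the one from the original post.

```cmake
find_package(CUDAToolkit REQUIRED)
find_package(Threads REQUIRED)

target_link_libraries(1d_c2c_example
  PRIVATE
    CUDA::cufft_static
    CUDA::cudart        # shared runtime instead of CUDA::cudart_static
    Threads::Threads    # pthreads, stated explicitly
    ${CMAKE_DL_LIBS})   # libdl, stated explicitly
```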

No CUDA:: targets are brought in by enabling the CUDA language. These are created by find_package(CUDAToolkit), which documents all the user-facing targets (such as CUDA::cufft_static).
The non-namespaced names aren’t targets, but just libraries on disk. So target_link_libraries(my_target PUBLIC cufft) just expects cufft.so or cufft.a to exist in an implicit link directory of the system, which is the case for projects that have CUDA enabled.
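The difference between the two forms can be sketched as follows. my_target is the placeholder name from the explanation above and my_other_target is a second hypothetical target added only for contrast.

```cmake
# Plain library name: not a CMake target. The name is handed to the linker,
# which must find the library in an implicit link directory.
target_link_libraries(my_target PUBLIC cufft)

# Imported target: defined by find_package(CUDAToolkit) and able to carry
# its own link requirements.
find_package(CUDAToolkit REQUIRED)
target_link_libraries(my_other_target PUBLIC CUDA::cufft)
```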

Makes sense, thank you!