CUDA_SEPARABLE_COMPILATION is needed only when the code you are compiling itself requires separate compilation. In your case you only need the device linking step to occur, so enabling CUDA_RESOLVE_DEVICE_SYMBOLS is sufficient (this works locally with a C++ source file).
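As a rough sketch of what enabling that property could look like, assuming the CUDA language is enabled in the project (the device link step needs the CUDA compiler) and that the executable target is named 1d_c2c as in the call below. The project name and the source file name here are hypothetical placeholders, not taken from the original example:

```cmake
# FindCUDAToolkit (which provides the CUDA:: imported targets) needs CMake 3.17+.
cmake_minimum_required(VERSION 3.17)

# Hypothetical project name; CUDA is enabled so a device link step is available.
project(cufft_static_example LANGUAGES CXX CUDA)

find_package(CUDAToolkit REQUIRED)

# Hypothetical source file name; the source itself is plain C++.
add_executable(1d_c2c 1d_c2c_example.cpp)

# Request the device link step without turning on separable compilation.
set_target_properties(1d_c2c PROPERTIES
  CUDA_RESOLVE_DEVICE_SYMBOLS ON
)
```

CUDA_SEPARABLE_COMPILATION is left at its default here, since none of the target's own sources need relocatable device code.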
When building locally I modified the target_link_libraries call to look like:
target_link_libraries(1d_c2c
  PRIVATE
    CUDA::cufft_static
    CUDA::cudart_static
    CUDA::culibos
)
That compiles and links correctly. Since the example uses functions like cudaMallocAsync, we do need to link to the CUDA runtime (CUDA::cudart_static).
But you have identified an issue with CUDA::cufft, where it isn’t expressing the proper link requirements on pthreads and dl, which I will fix.
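Until that fix is available, one possible workaround is to state the missing requirements yourself. This is only a sketch, not the upstream fix: it assumes the target is still 1d_c2c, that find_package(CUDAToolkit) has already been called, and that the standard Threads package and CMAKE_DL_LIBS variable cover the pthreads and dl requirements on your platform. It should only matter when nothing else on the link line already brings those libraries in:

```cmake
# Provides the Threads::Threads imported target (pthreads on most Unix systems).
find_package(Threads REQUIRED)

target_link_libraries(1d_c2c
  PRIVATE
    CUDA::cufft_static
    CUDA::culibos
    Threads::Threads   # pthreads requirement, stated explicitly
    ${CMAKE_DL_LIBS}   # expands to the dl library where the platform needs it
)
```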