At the moment, CMAKE_CUDA_ARCHITECTURES works by generating a single target in the makefile, whose CUDA compiler invocation lists all the desired architectures. For example, if I put the following in my CMake file:
set_target_properties(tgt PROPERTIES CUDA_ARCHITECTURES "70;75;80")
then the build command for CUDA files for this target will look like:
nvcc ... --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] ...
The CUDA compiler then generates code serially for each given architecture.
We have a project with a couple of large CUDA files that are the main culprit for slow build times. Since CUDA 11.x, the CUDA compiler has had a flag for using multiple threads to build for different architectures in parallel. However, it’s hard to match the value of the make “-j” option with the value of this flag so as to avoid oversubscribing the CPU cores on the build machine. So I’m wondering: would it be possible for CMake to have a flag that changes the default behavior and, for the example above, generates three targets instead, one for each of the requested architectures? That way, building for different architectures would run in parallel automatically.
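For reference, here is a minimal sketch of how the nvcc thread flag could be passed through CMake today. It assumes the flag in question is --threads (as far as I know, available since CUDA 11.2) and reuses the tgt target from the example above; the value 3 is only an illustration, chosen to match the three architectures:

```cmake
# Assumption: the nvcc multi-threading flag referred to above is --threads (-t).
# The value 3 is illustrative: one thread per requested architecture.
# The generator expression restricts the option to CUDA sources of the target.
target_compile_options(tgt PRIVATE
  $<$<COMPILE_LANGUAGE:CUDA>:--threads=3>)
```

With this in place, running make -j8 could start up to 8 nvcc processes with up to 3 threads each, so up to 24 compilation threads in total. That is the oversubscription problem described above.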