Selecting the lto_70 CUDA architecture with CMake

The CUDA manual says that link-time optimization of device code is now possible if we compile for a special architecture, e.g. lto_70.

Until now I have been targeting compute capability 7.0 for all CUDA targets in my project with the CMake command:

set(CMAKE_CUDA_ARCHITECTURES 70)

It seems CMake also accepts “70-virtual” in the command above, but I have not found any way to select the lto_70 architecture described in the CUDA manual.

How can I use lto_70? I tried simply adding the -dlto flag to enable link-time optimization, but it conflicts with the -gencode flags that CMake generates, and nvcc fails with this error:

nvcc fatal : '-dlto' conflicts with '-gencode' to control what is generated; use 'code=lto_<arch>' with '-gencode' instead of '-dlto' to request lto intermediate
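For reference, here is a rough and untested sketch of the kind of setup the error message seems to point toward: stop CMake from generating its own -gencode flags and pass the lto_70 request by hand. The target name `app` and the source files `main.cu` and `kernels.cu` are hypothetical, and the sketch assumes CMake 3.18 or newer (for the `CUDA_ARCHITECTURES` property and the `DEVICE_LINK` generator expression) and an nvcc that accepts `code=lto_70`.

```cmake
cmake_minimum_required(VERSION 3.18)
project(lto_sketch LANGUAGES CXX CUDA)

# Hypothetical target and sources, for illustration only.
add_executable(app main.cu kernels.cu)

set_target_properties(app PROPERTIES
  # A false value tells CMake not to add any architecture flags itself,
  # so they cannot clash with the hand-written -gencode below.
  CUDA_ARCHITECTURES OFF
  # Device LTO works on relocatable device code, so compile with -dc.
  CUDA_SEPARABLE_COMPILATION ON)

# Compile step: request the LTO intermediate form, as the nvcc error suggests.
target_compile_options(app PRIVATE
  "$<$<COMPILE_LANGUAGE:CUDA>:SHELL:-gencode arch=compute_70,code=lto_70>")

# Device link step: ask nvcc to run link-time optimization for sm_70.
# Assumption: -dlto together with -arch=sm_70 is what the device link needs.
target_link_options(app PRIVATE
  "$<DEVICE_LINK:SHELL:-dlto -arch=sm_70>")
```

Setting `CUDA_ARCHITECTURES` to `OFF` also means that every architecture has to be listed by hand, so this reads more like a workaround than proper support for lto_70 in `CMAKE_CUDA_ARCHITECTURES`.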

Cc: @robert.maynard