[ROCm, HIP] architecture detection: keep track of failed command execution

Matteo_Concas · October 5, 2023, 5:02pm

When using ROCm, GPU architecture detection is driven by the
rocm_agent_enumerator executable.

The call to this executable can fail during a project's configuration
stage, e.g. because a different Python version is picked up at
configure time than the system one that works.
In that case the detection selects a default architecture and carries
on. The logic used to pick that default is not clear to me, but it does
not really matter at this point.

I might have a limited understanding of the context, but I wonder
whether it would make sense to propagate the result of this
(auto)detection failure/default pick in a dedicated variable.
That way CMake would give project maintainers a flag they could use to
override the default choice.

This would be especially useful because errors caused by targets built
for the wrong GPU architecture only show up later, during program
execution.
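
To make the idea concrete, here is a minimal project-side sketch of such a check. It is not something CMake provides today: all MYPROJ_* names are hypothetical, the /opt/rocm/bin hint is an assumed install prefix, and dropping gfx000 assumes the enumerator reports the CPU agent under that name. It would have to run before the HIP language is enabled.

```cmake
# Hypothetical project-side check; the MYPROJ_* names are not defined by CMake.
find_program(MYPROJ_ROCM_AGENT_ENUMERATOR
  NAMES rocm_agent_enumerator
  HINTS /opt/rocm/bin)  # assumed default ROCm install prefix

# Start pessimistic and clear the flag only when detection clearly worked.
set(MYPROJ_HIP_ARCH_DETECTION_FAILED TRUE)

if(MYPROJ_ROCM_AGENT_ENUMERATOR)
  execute_process(
    COMMAND "${MYPROJ_ROCM_AGENT_ENUMERATOR}"
    RESULT_VARIABLE _enum_result
    OUTPUT_VARIABLE _enum_output
    ERROR_VARIABLE _enum_error
    OUTPUT_STRIP_TRAILING_WHITESPACE)

  if(_enum_result EQUAL 0)
    # Collect every gfxNNN token printed by the tool.
    string(REGEX MATCHALL "gfx[0-9a-f]+" _enum_archs "${_enum_output}")
    # Assumption: gfx000 denotes the CPU agent and is not a usable GPU target.
    list(REMOVE_ITEM _enum_archs gfx000)
    if(_enum_archs)
      set(MYPROJ_HIP_ARCH_DETECTION_FAILED FALSE)
    endif()
  endif()
endif()

# Refuse to continue with an implicit default unless the user chose explicitly.
if(MYPROJ_HIP_ARCH_DETECTION_FAILED AND NOT DEFINED CMAKE_HIP_ARCHITECTURES)
  message(FATAL_ERROR
    "rocm_agent_enumerator failed or reported no GPU "
    "(result: '${_enum_result}', stderr: '${_enum_error}'). "
    "Set CMAKE_HIP_ARCHITECTURES explicitly, e.g. -DCMAKE_HIP_ARCHITECTURES=gfx90a.")
endif()
```

Stopping with FATAL_ERROR is only one possible policy; a project could instead emit a warning and leave MYPROJ_HIP_ARCH_DETECTION_FAILED for later logic to inspect.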

brad.king · October 6, 2023, 2:31pm

For CUDA we detect what nvcc selects as its default architecture and use that as the default value for CMAKE_CUDA_ARCHITECTURES. We also detect the host GPU’s native architectures, but we only use them if the user or project explicitly sets CMAKE_CUDA_ARCHITECTURES to native. If native is requested but no native architectures were detected, we issue a cmake-time error.
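
As an illustration of the CUDA behaviour described above, a minimal sketch of a project opting in to native detection might look like this. The project name is hypothetical, and the `native` value assumes CMake 3.24 or newer.

```cmake
cmake_minimum_required(VERSION 3.24)  # the "native" value needs CMake 3.24+

# Opt in to host GPU detection only if the user did not choose a value,
# e.g. via -DCMAKE_CUDA_ARCHITECTURES=80 on the command line.
# Set before project() so it is in effect when CUDA is enabled.
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES native)
endif()

project(native_arch_demo LANGUAGES CXX CUDA)  # hypothetical project name

message(STATUS "CUDA architectures: ${CMAKE_CUDA_ARCHITECTURES}")
```

Without the `set()` call, the variable defaults to whatever nvcc itself selects, so no GPU needs to be present on the build machine.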

In order to use a similar approach for ROCm+HIP we need a way to choose default CMAKE_HIP_ARCHITECTURES that does not rely on native hardware detection. Does anyone know if/how hipcc and/or ROCm’s clang selects a default AMD GPU architecture if no explicit --offload-arch= is given?

mconcas · October 7, 2023, 3:35pm

> Does anyone know if/how hipcc and/or ROCm’s clang selects a default AMD GPU architecture if no explicit --offload-arch= is given?

Looking at the hipcc.pl script, the available architectures are detected via rocm_agent_enumerator, which is the same approach CMake uses.
The question then is how CMake chooses the default architecture when the enumerator does not work.
E.g. see: here.
Regardless, exposing a state that reports the failed autodetection would help to handle these cases externally.

brad.king · October 11, 2023, 2:06pm

I’ve opened CMake Issue 25325 for this, thanks.