Hi,
We have encountered a regression in CMake starting with version 3.22 that affects MPI detection on Cray systems, specifically on Frontier (an HPE Cray EX supercomputer at OLCF). This issue is not present in earlier versions such as 3.20 or 3.21. Below are the details and steps to reproduce the problem.
[Summary of the issue]
CMake Versions:
Not reproducible: CMake 3.20, 3.21
Reproducible: CMake 3.22, 3.27, 3.31 (latest)
Scenario 1:
CMAKE_SYSTEM_NAME=Catamount
Modules: PrgEnv-cray or PrgEnv-gnu
Not reproducible if CMAKE_SYSTEM_NAME is unset.
Scenario 2:
CMAKE_SYSTEM_NAME unset
Modules: PrgEnv-cray, craype-accel-amd-gfx90a, rocm/5.4.0
LDFLAGS=-fopenmp
Not reproducible with PrgEnv-gnu.
Both scenarios yield the same error message when attempting to configure a project with multiple subdirectories invoking find_package(MPI REQUIRED COMPONENTS C).
[Simple reproducer]
To simulate the issue on a Frontier login node:
mkdir src1 src2
cat << EOF > CMakeLists.txt
project (MY_PROJECT C)
message(STATUS "Configuring src1")
add_subdirectory(src1)
message(STATUS "Configuring src2")
add_subdirectory(src2)
EOF
mkdir src1/src1_subdir1 src1/src1_subdir2
cat << EOF > src1/CMakeLists.txt
add_subdirectory(src1_subdir1)
add_subdirectory(src1_subdir2)
EOF
cat << EOF > src1/src1_subdir1/CMakeLists.txt
message(STATUS "Configuring src1_subdir1")
find_package(MPI REQUIRED COMPONENTS C)
EOF
cat << EOF > src1/src1_subdir2/CMakeLists.txt
message(STATUS "Configuring src1_subdir2")
find_package(MPI REQUIRED COMPONENTS C)
EOF
mkdir src2/src2_subdir
cat << EOF > src2/CMakeLists.txt
add_subdirectory(src2_subdir)
EOF
cat << EOF > src2/src2_subdir/CMakeLists.txt
message(STATUS "Configuring src2_subdir")
find_package(MPI REQUIRED COMPONENTS C)
EOF
mkdir build_scenario_1 && cd build_scenario_1
module load cmake/3.27.9
CC=cc cmake -Wno-dev -DCMAKE_SYSTEM_NAME=Catamount ..
cd ..
mkdir build_scenario_2 && cd build_scenario_2
module load craype-accel-amd-gfx90a rocm/5.4.0
CC=cc LDFLAGS="-fopenmp" cmake -Wno-dev ..
[Observed errors]
Both scenarios fail with the following error:
…
-- Configuring src1
-- Configuring src1_subdir1
-- Found MPI_C: /opt/cray/pe/craype/2.7.31.11/bin/cc (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: C
-- Configuring src1_subdir2
-- Configuring src2
-- Configuring src2_subdir
-- Could NOT find MPI_C (missing: MPI_C_WORKS)
CMake Error at …/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find MPI (missing: MPI_C_FOUND C)
Call Stack (most recent call first):
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
/autofs/nccs-svm1_sw/frontier/spack-envs/core-24.07/opt/gcc-7.5.0/cmake-3.27.9-pyxnvhiskwepbw5itqyipzyhhfw3yitk/share/cmake-3.27/Modules/FindMPI.cmake:1837 (find_package_handle_standard_args)
src2/src2_subdir/CMakeLists.txt:2 (find_package)
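In case it helps with triage: the compile check behind MPI_C_WORKS should leave details in the build directory's configure log. The following is only a sketch for locating that log; find_configure_log is a helper name made up for this post, and it assumes the usual layout in which CMake 3.26 and newer write CMakeFiles/CMakeConfigureLog.yaml while older releases write CMakeFiles/CMakeError.log.

```shell
# Sketch: print the path of the configure log inside a CMake build directory.
# find_configure_log is a hypothetical helper name, not a CMake command.
find_configure_log() {
    build_dir="${1:-.}"
    # Prefer the YAML log (CMake >= 3.26), fall back to the older error log.
    for log in CMakeConfigureLog.yaml CMakeError.log; do
        if [ -f "$build_dir/CMakeFiles/$log" ]; then
            printf '%s\n' "$build_dir/CMakeFiles/$log"
            return 0
        fi
    done
    echo "no configure log found under $build_dir/CMakeFiles" >&2
    return 1
}
```

For example, grep -n -i mpi "$(find_configure_log build_scenario_1)" would list the MPI-related lines of the failing configure run.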
[Observations]
- Not reproducible: /usr/bin/cmake (3.20.4) and cmake/3.21.3
- Reproducible: cmake/3.22.2, cmake/3.27.9, cmake 3.31.0 (custom build)
- Setting CMAKE_SYSTEM_NAME=Catamount (Scenario 1) or loading specific environment modules such as craype-accel-amd-gfx90a (Scenario 2) appears to trigger the issue.
- If we use mpicc instead of the Cray cc compiler wrapper, the configuration of src2_subdir hangs indefinitely.
- Likely regression introduced in 3.22 and persisting in 3.31.
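To compare a passing and a failing configuration, it may be useful to look at the MPI-related entries each CMake version leaves in CMakeCache.txt. The helper below is a minimal sketch; show_mpi_cache is a hypothetical name, and which MPI_ entries actually appear depends on the CMake version and on FindMPI internals that we have not verified.

```shell
# Sketch: list MPI-related entries from a build directory's CMakeCache.txt.
# show_mpi_cache is a hypothetical helper name, not part of CMake.
show_mpi_cache() {
    build_dir="${1:-.}"
    # Cache entries have the form NAME:TYPE=VALUE; keep names starting with MPI_.
    grep -E '^MPI_[A-Za-z0-9_]*:' "$build_dir/CMakeCache.txt"
}
```

Running show_mpi_cache once in a build directory configured with cmake/3.21.3 and once in one configured with cmake/3.27.9, then diffing the two outputs, would show whether the cached MPI state differs between the versions.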
[Request for assistance]
Could you please confirm whether this is a known regression, or provide guidance on addressing it? We believe this issue may stem from changes to FindMPI.cmake or from the way MPI state propagates between subdirectory configurations.
Thank you for your attention and support!
Best regards,
Danqing Wu