Recommendations for Python module inclusion

Hello,
I am researching current, modern ways to implement a CMake-based build system for my project, which is currently based on handwritten makefiles.

The main part is written in C and consists of several libraries and executables; some additional parts are written in C++, Bash, and Python. (Plus I would like to leave the door open for possible other languages in the future.) The Python part consists of Cython wrappers around the C code, a Python C extension library (which could in due time be rewritten in Cython), modules (libraries and executables) which build on these wrappers, and then some pure Python modules included just as examples or additional resources (all logically connected to the core project).

And now I am wondering how I should implement this in CMake so that I am using the latest Python advancements and procedures: testing in virtual environments against multiple versions, source layout, building wheels, source distributions, and native Linux distribution packages. All projects I have seen, like scikit-build or others on GitHub, wrap CMake in Python setuptools (or distutils). I want CMake to stay the main driving force, as the Python part should be only an option.

Is there any recommendation for PEP 517 + PEP 518 using CMake? I have seen something like it mentioned for Meson and SCons.

Should I just call the standard setuptools build from CMake and create the pyproject.toml and setup.cfg files as templates?

Has anybody else solved this?

I think that find_package(Python3) provides all of the pieces to make Python C extensions, but Cython might be something new; I expect setup.py is just better there. You can have setup.py drive the build for any non-Python bits using CMake. If you want pyproject.toml or other distutils/setuptools helpers to work optimally, I'd really suggest just using those tools directly rather than interposing CMake into it.
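For the plain C extension case, a minimal sketch of what find_package(Python3) gives you (the module name spam and its source file are hypothetical):

# Locate the interpreter plus the development headers and libraries.
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)

# Build spam.c as a loadable Python extension module; WITH_SOABI
# appends the interpreter's ABI tag to the file name (CMake >= 3.17).
Python3_add_library(spam MODULE WITH_SOABI spam.c)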

Cc: @craig.scott @marc.chevrier @jcfr for others with experience here.

What I decided to do in my project is just let the Python tooling do its thing.

Modern Python packaging pretty much requires the use of wheels for normal distribution and sdists for people who really need to rebuild locally or for some other special cases. Once you have a wheel, you can turn it into other formats. (You can use wheel2deb to turn a wheel into a deb, for example.)

So for pure Python packages, you create the structure in the CMAKE_BINARY_DIR tree with the module.py, pyproject.toml, __init__.py and any namespacing directory structure, and just call the PEP 517 frontend as a custom target.
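A sketch of what that can look like, assuming find_package(Python3 COMPONENTS Interpreter) has run, the build package as the PEP 517 frontend, and the hypothetical package name mypkg with a templated pyproject.toml.in:

# Stage the package sources and a templated pyproject.toml in the
# binary tree (runs at configure time).
file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/mypkg
     DESTINATION ${CMAKE_CURRENT_BINARY_DIR}/python)
configure_file(pyproject.toml.in
               ${CMAKE_CURRENT_BINARY_DIR}/python/pyproject.toml @ONLY)

# Run the PEP 517 frontend on the staged tree as a custom target.
add_custom_target(mypkg_wheel ALL
  COMMAND ${Python3_EXECUTABLE} -m build --wheel
          --outdir ${CMAKE_BINARY_DIR}/wheels
          ${CMAKE_CURRENT_BINARY_DIR}/python)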

For Cython you would use a CMake-to-PEP517-backend-to-CMake call structure. You generate the C files during the first CMake run (at configure time or build time, it does not really matter that much [maybe build time is better]), then you call the PEP 517 frontend. That calls the PEP 517 backend, which itself calls a new CMake build (with a CMake build extension or something similar), creating the binary wheel. That way it works both for wheels and for sdists (you will need to use exports to link against any library from the main CMake build). The source distribution distributes only the cythonized files, not the .pyx and .pxd files. A bit crazy, but I wasn't able to come up with anything better.
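The first half of that dance, generating the C files at build time, could look something like this (file names are hypothetical, and CYTHON_EXECUTABLE is assumed to be located beforehand, e.g. via find_program(CYTHON_EXECUTABLE cython)):

# Translate the Cython source to C at build time; the PEP 517
# backend later compiles the generated file into the wheel.
add_custom_command(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/wrapper.c
  COMMAND ${CYTHON_EXECUTABLE} -3
          ${CMAKE_CURRENT_SOURCE_DIR}/wrapper.pyx
          --output-file ${CMAKE_CURRENT_BINARY_DIR}/wrapper.c
  DEPENDS wrapper.pyx wrapper.pxd)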

You then create a local PEP 503 Python index in the main CMake binary tree that you can use to satisfy dependencies during tests.
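If all the wheels land in one directory, the simplest variant (a flat --find-links directory rather than a full PEP 503 index, which would additionally need per-project subdirectories with index.html files) can be consumed from a test virtual environment like this, reusing the hypothetical mypkg_wheel target from the sketch above:

# Create a test venv and install the freshly built wheels from the
# local directory only (mypkg is hypothetical; POSIX venv layout).
add_custom_target(test_venv
  COMMAND ${Python3_EXECUTABLE} -m venv ${CMAKE_BINARY_DIR}/venv
  COMMAND ${CMAKE_BINARY_DIR}/venv/bin/pip install
          --no-index --find-links ${CMAKE_BINARY_DIR}/wheels mypkg)
add_dependencies(test_venv mypkg_wheel)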

BTW, it would be great if CMake included its own PEP 517 frontend, but you can just use the official build package.


I'll just preface this by saying that most of my interactions with Python packaging have been through projects that additionally provide Python modules but are meant to be used primarily through some other means (e.g., C++ headers and libraries). For projects that are Python modules that just happen to be written in C++, everything is probably much rosier and the above works. But it is, AFAIK, not a solution for projects which are not "a Python package, but written in C++".

I've been doing this for VTK, but AFAICT, sdists are just impossible to do for complex projects (namely where configuring what you want from the build is not just a list of "please add this" options, i.e. whatever Python packaging calls the [name1,name2] extras). So VTK just can't provide one. I've also found wheels to be terrible artifacts because they basically assume that all that matters is a project's Python interface. This is generally not true, so they're quite unsuitable for many projects that happen to provide Python modules in addition to some C or C++ interface. As such, VTK's wheel is largely a dead end for any VTK-consuming projects that want something other than its Python interface[1]. Maybe someone will come up with a spec on how to ship headers, CMake package files, etc. in wheels, but I'm not holding my breath.

I don't think CMake fits here (though I've only glanced through the PEP). Maybe CPack could do it, but this is basically another "language" for CMake to handle, and CMake isn't going to link to Python anytime soon in order to read whatever configuration is needed for PEP 517 compliance.

<sidenote>Python packaging is quite a mess, and I don't think I'd like to see CMake add that to the large plate of problems it already has to deal with in the C and C++ world. I wish the Python community would take some of the Zen of Python (namely "There should be one-- and preferably only one --obvious way to do it.") and apply it to the packaging mess that exists. Maybe CMake could then find some way to interface with it. But that's also basically the same problem that CMake has with C and C++ compilers today (except CMake doesn't even have a handle on a C or C++ implementation, never mind the primary one, though that difference doesn't make it easy either).</sidenote>

[1] There is apparently a hack to build against the build tree that made a particular wheel, but this is definitely a hack and not something I would call a solution.


Yes, Python is a mess, period. However, by some fluke and chance, it became the premier language for quick and simple (and not-so-simple) scripts and executables. In many Linux-specific C and C++ projects, it is tightly interwoven with the source code and provides core functionality. (Well, the CPython implementation of it is.) This is the basis of my example: the C library provides an IPC mechanism for shared memory access, and then programs written in C or C++ link against it directly, while programs written in Python use the Python module wrapper to do the same thing.

So the project has the Python wrapper code written in C and a bunch of pure Python modules making use of those wrappers. (Plus a few additional project-specific modules with library-like functionality for these programs.)

Originally, it was all built in one monorepo. Now it has been split into two, but the core C/C++/Python functionality is still built from one repository. I am not saying this is the best overall solution, but it is the best solution at this time. And what I described is the sanest solution I could think of, and I published it here for others who might find it interesting. :man_shrugging:

You are right, you cannot define a non-Python dependency in a wheel. If the dynamic loader cannot find the linked shared library at runtime, it will fail, and there is no way to define any kind of dependency graph. That is why, the moment you translate the wheel to some other format like DEB or RPM, you need to add it.

But you need the wheel to support installation into virtual environments. For the Python world, this is an absolute must.

The source distribution build would depend on a find_package call and installed .cmake scripts. That would hopefully make it more robust.

Don't. Use wheels only for distributing the Python interface, nothing else. Headers, CMake scripts, and everything else that is not used solely from the Python code needs to be distributed some other way (DEBs, RPMs, Nix packages, etc.).

This is all fine, but the problem comes when some other C or C++ program that uses your library wants to make its own wheel too. What's the situation there?

Right, but that information isn't in the wheel, so… where does it come from? For VTK, vtk-config.cmake is all that will ever be able to provide find_package(VTK); no amount of find_library will be able to piece together its CMake API or the module properties that are required for further Python wrapping or the global static factory (yay) mechanisms.

I've seen Anaconda, and while it isn't perfect, it at least considers non-Python artifacts. Personally, I've been seeing that as a better road, but it also has the "all Anaconda or no Anaconda" problem to some extent.

Just FYI, here is the solution that I came up with (using setuptools):

add_custom_command(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/setup_timestamp
  # Stage the setuptools install into SETUP_OUTPUT, then touch a
  # timestamp file so the command reruns only when SETUP_DEPS change.
  COMMAND ${PYTHON_EXECUTABLE} ${SETUP_PY} install --root ${SETUP_OUTPUT}
  COMMAND ${CMAKE_COMMAND} -E touch ${CMAKE_CURRENT_BINARY_DIR}/setup_timestamp
  DEPENDS ${SETUP_DEPS})

followed by:

install(
  # trailing slash is important:
  DIRECTORY ${SETUP_OUTPUT}/
  # "." syntax is a reliable mechanism, see:
  # https://gitlab.kitware.com/cmake/cmake/-/issues/22616
  DESTINATION "."
  COMPONENT python)

Full reference:

Interesting, thank you for pointing it out to me!

What I ended up doing is using the Python build package in the build stage and the Python installer package in the installation stage. (Both were recently debianized, and installer looks like it will be used in the official .deb package building process for PEP 517/518. [I am planning on supporting RPM in the future too, but so far I have not gotten to it.])

Then I used CMake to create the filesystem structure needed for the Python package build in the binary tree. I found using CMake for direct Python package building too painful, as it would require me to create and maintain a new Python PEP 517 backend, so all I am doing is using somebody else's Python build system (this one supports PEP 621). From a CMake add_custom_target I call the pyproject-build executable, which in turn uses the backend to build the wheel and the source distribution. Then, in the install() command, I use the SCRIPT option to run the Python installer module to install the wheel into an arbitrary directory. (This happens using CPack COMPONENT groups, so I can use dh_cmake to distribute it to several binary packages.)
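Condensed into a sketch (the package name, wheel file name, and paths are all hypothetical, and install(CODE) stands in for the install(SCRIPT) hook used in the real setup):

# Build stage: run the PEP 517 frontend (pyproject-build) on the
# staged package tree, producing both the wheel and the sdist.
add_custom_target(mypkg_dist ALL
  COMMAND ${Python3_EXECUTABLE} -m build
          --outdir ${CMAKE_BINARY_DIR}/dist
          ${CMAKE_CURRENT_BINARY_DIR}/python)

# Install stage: unpack the wheel with the installer package;
# --destdir treats the directory as a DESTDIR-style install root,
# which fits the CPack/dh_cmake component packaging.
install(CODE "execute_process(COMMAND ${Python3_EXECUTABLE} -m installer
  --destdir ${CMAKE_INSTALL_PREFIX}
  ${CMAKE_BINARY_DIR}/dist/mypkg-0.1-py3-none-any.whl)"
  COMPONENT python)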

For the C extension modules (both pure C API and Cython), I use the same workflow, but from the Python backend I call another CMake build using cmake-build-extension.

What I found most annoying is the inability to specify when a custom target should be considered out of date. The CMake documentation says that add_custom_target is always considered out of date and is meant for targets that do not produce output files (or at least I think I read that somewhere), but there is no functionality (that I know of) to create an arbitrary target for an arbitrary build workflow; even adding a custom LANGUAGE seems to support only a very limited compilation workflow. (Truth is, I did not study the source, as that seems like a "mine the ore to produce the steel to build the car to get to work" kind of solution at the moment.)

To support this with examples (I know it looks quite rough around the edges; it's still a work in progress):

Example of pure python package
The wheel installation function
Wheel install executable
