Some of you may remember me. I work with YottaDB now on the M compiler.
We have a longstanding issue with our CMake toolchain located here: YottaDB / Tools / YDBCMake · GitLab. Let’s start with a quick introduction:
Our compiler needs to produce one or two objects. With one object, it is an ASCII-mode object (for legacy applications); with two, one is ASCII and one is UTF-8. When we have two objects, the install destination for each is different (the UTF-8 one goes into a utf8 folder). Whether one or two objects are produced depends on whether libicu is installed and whether YottaDB has a UTF-8 installation (it’s optional).
Currently our toolchain is designed so that you cannot compile both objects at the same time. Rather, you have to run cmake again with -DM_UTF8_MODE in order for it to produce a different object. That’s the theory anyways… not sure that it even works right now.
I don’t know much about the internals of CMake, and googling around for a tutorial on adding a new language is not yielding much. So I could sure use some help.
Does your compiler produce two objects in a single invocation, or are they coming from separate invocations? Are they from the same source file, or different source files? Is -DM_UTF8_MODE a CMake flag, or a compiler flag?
Does your compiler produce two objects in a single invocation, or are they coming from separate invocations?
That’s the core of our problem. We invoke it twice, not once. The invocations are with different environment variables. We want to invoke cmake/make once though, and that’s the core of my question.
Are they from the same source file, or different source files?
Same source file.
Is -DM_UTF8_MODE a CMake flag, or a compiler flag?
CMake flag. The compiler uses the $LC_ALL and $ydb_chset environment variables to figure out which object to output.
If the compiler is being invoked twice, then I would suggest creating two object libraries that both compile the same source file but with different flags. Would that work for your use case?
I tried that, but that doesn’t work. This line gets run again, and overrides the original flags, so in the end you either compile two M objects or two UTF-8 objects.
Is there a way to make your compiler use ASCII or UTF-8 depending on a command line flag instead of an environment variable?
If not, I would suggest creating a “superbuild” CMake project that calls a smaller subproject twice, once with the ASCII flags and once with the UTF-8 flags.
Is there a way to make your compiler use ASCII or UTF-8 depending on a command line flag instead of an environment variable?
No. Long history and existing customers and upstream code bases… we can’t change that.
If not, I would suggest creating a “superbuild” CMake project that calls a smaller subproject twice, once with the ASCII flags and once with the UTF-8 flags.
That sounds like it may work. Can you reference an example where this is done?
I suspect ExternalProject is the better fit here. You want to control the environment seen by the compiler, but FetchContent can’t do that for you. If you use ExternalProject, you can specify the build command in a way that sets or modifies the environment at build time. Sketching out a skeleton of the essential bits:
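A minimal sketch along these lines, where the project name, the target names `build_m` and `build_utf8`, and the `ydb_chset` values are assumptions for illustration rather than anything confirmed in this thread:

```cmake
cmake_minimum_required(VERSION 3.14)
# The wrapper project compiles nothing itself, so no languages are enabled.
project(MSuperbuild NONE)

include(ExternalProject)

# ASCII build: run the subproject's build with ydb_chset set in the environment.
ExternalProject_Add(build_m
  SOURCE_DIR /path/to/subproject
  BUILD_COMMAND ${CMAKE_COMMAND} -E env ydb_chset=M
                ${CMAKE_COMMAND} --build <BINARY_DIR>
  INSTALL_COMMAND ""   # no install step in this sketch
)

# UTF-8 build: same sources, different environment, separate build directory.
ExternalProject_Add(build_utf8
  SOURCE_DIR /path/to/subproject
  BUILD_COMMAND ${CMAKE_COMMAND} -E env ydb_chset=UTF-8
                ${CMAKE_COMMAND} --build <BINARY_DIR>
  INSTALL_COMMAND ""   # no install step in this sketch
)
```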
The directory pointed to by /path/to/subproject would need to be able to be built as a standalone CMake project. It could be a subdirectory within your source tree which may be considered the “meat” of the main project, with the main project effectively just being a wrapper around these two ExternalProject_Add() calls.
The BUILD_COMMAND lines can define whatever environment variables you need. I’ve just shown it setting ydb_chset as an example, but add whatever key=value items you require.
If you need to pass compiler definitions as well, you could use something like the CMAKE_ARGS keyword in the calls to ExternalProject_Add() to achieve that. Read up in the ExternalProject module documentation to see how to do that.
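As one hedged illustration of the CMAKE_ARGS route, the UTF-8 call could forward the M_UTF8_MODE option mentioned earlier in the thread to the subproject's configure step. The target name `build_utf8` and the `ON` value are assumptions:

```cmake
include(ExternalProject)

ExternalProject_Add(build_utf8
  SOURCE_DIR /path/to/subproject
  # Appended to the cmake command line used to configure the subproject.
  CMAKE_ARGS -DM_UTF8_MODE=ON
  BUILD_COMMAND ${CMAKE_COMMAND} -E env ydb_chset=UTF-8
                ${CMAKE_COMMAND} --build <BINARY_DIR>
  INSTALL_COMMAND ""   # no install step in this sketch
)
```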
I haven’t addressed the question of how to combine the results of the two ExternalProject_Add() calls. That is in part because I feel like this is all heading down the wrong path for what you ultimately want to achieve. It feels overly complex, but I can’t offer an alternative solution.
Perhaps you might be able to use a wrapper script to hide the details of this from CMake? Your wrapper script could take care of adjusting the environment settings and adding some extra compiler flags to the compile line before passing it along to the real compiler. Take a look at the <LANG>_COMPILER_LAUNCHER target property and its associated CMAKE_<LANG>_COMPILER_LAUNCHER variable for doing that. I’ll have to leave you to experiment with whether you can make that work.
Because of the complexity of the other solutions, we are entertaining passing flags to the compiler (something to be developed in the future) instead of using environment variables.
What’s the best way for us to create the two objects from a single cmake pass? I am thinking of setting CMAKE_M_COMPILE_OBJECT to run two commands rather than one.
More than a year later, I am happy to report that I have a resolution.
@craig.scott I tried the ExternalProject paradigm, and it works well, but it was too complex for my taste. PS: I have your CMake book; you are a very good writer!
I contacted Brad, and a couple of email exchanges later, he guided me to the solution:
Use target_compile_options() to pass <FLAGS> to CMAKE_M_COMPILE_OBJECT.
Use a macro or function for outside users to use this functionality which will create both objects and both install rules at the same time.
The new CMAKE_M_COMPILE_OBJECT looks like this: set(CMAKE_M_COMPILE_OBJECT "LC_ALL=C.utf-8 <FLAGS> <CMAKE_M_COMPILER> -object=<OBJECT>")
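Putting the two steps together, a helper along these lines could create both targets and both install rules in one call. This is only a sketch: the function name `add_m_library`, its `SOURCES` and `DESTINATION` keywords, the `utf8` subfolder layout, and the `ydb_chset=M` / `ydb_chset=UTF-8` option values are assumptions for illustration, not the actual YDBCMake interface. It assumes the M language is already enabled, and it relies on the flags being placed ahead of the compiler in the compile rule so that each "compile option" acts as an environment assignment.

```cmake
# Hypothetical helper: creates <name>M and <name>utf8 from the same sources.
function(add_m_library name)
  cmake_parse_arguments(ARG "" "DESTINATION" "SOURCES" ${ARGN})

  # ASCII ("M" mode) variant.
  add_library(${name}M SHARED ${ARG_SOURCES})
  target_compile_options(${name}M PRIVATE ydb_chset=M)        # assumed value
  install(TARGETS ${name}M DESTINATION ${ARG_DESTINATION})

  # UTF-8 variant, installed into a utf8 subfolder.
  # A real helper would likely skip this when UTF-8 support is unavailable.
  add_library(${name}utf8 SHARED ${ARG_SOURCES})
  target_compile_options(${name}utf8 PRIVATE ydb_chset=UTF-8) # assumed value
  install(TARGETS ${name}utf8 DESTINATION ${ARG_DESTINATION}/utf8)
endfunction()

# Usage (file names and destination are placeholders):
# add_m_library(myroutines SOURCES a.m b.m DESTINATION lib)
```

Because each target carries its own compile options, the two variants no longer overwrite each other's flags the way a single global setting did.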
Ah, nice. That’s much cleaner than going down the ExternalProject route. So clients of the library then have to choose which of the two libraries they link against? I guess that’s clear enough and makes explicit whether they are using the old legacy library or the new UTF-8 one. My initial reaction would be “I asked to create library <XXX>, but I got <XXX>M and <XXX>utf8”. But I think once you understand that two libraries are created, that would be easy enough to adjust to.
Actually, the .so files are the end products, and we don’t expect other people to link to them; but people writing CMake scripts may need to depend on <XXX>M and <XXX>utf8. This actually caught me out until I realized what I had done.