Why can't CMake functions return values?

OleksandrKvl · August 17, 2020, 1:59pm

Hi there! This question has bothered me for a while: why, historically, can’t CMake functions return values? I couldn’t resist and implemented this feature; you can read about it here: “Allowing CMake functions to return(value)”. Now I understand how much effort it would require, but I’m still wondering why it hasn’t been there from the beginning. Do you really think it’s not an important feature? Are there any plans to implement it in the future (maybe in CMake 4.0)?

alex · August 18, 2020, 4:47am

Your nomenclature is a bit misleading. Of course CMake functions can return values (via set(... PARENT_SCOPE)). The real complaint is twofold: (1) there isn’t a single function return mechanism and (2) function application is not an expression. Or really, that the expression language is exclusively string literals, variable lookup, and concatenation. Plus some special syntax for parameter pack expansions (unquoted ${}). There are no other value types in CMake, with functions/macros occupying their own space.
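For readers who have not used the idiom, here is a minimal sketch of the set(... PARENT_SCOPE) pattern mentioned above. The function name get_greeting and the variable names are made up for illustration; only set() and message() are real CMake commands.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper: writes its result into the variable the caller names.
function(get_greeting out_var name)
  set(${out_var} "Hello, ${name}!" PARENT_SCOPE)
endfunction()

get_greeting(GREETING "World")
message("${GREETING}")  # prints: Hello, World!
```

The caller has to know which positional argument is the output, which is exactly the lack of a single convention described in point (1).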

Regarding (1), I think it would be a good idea to pick a convention for output variables and then rewrite syntax like VAR := func(ARGS) into whatever ensures that VAR is populated. Maybe this can be the first argument, or an argument named RETVAR in cmake_parse_arguments. return() could take its arguments and assign them to ${RETVAR} as a convenience. You could even extend this to return multiple values and assign them with VAR1, VAR2 := whatever. This change would make absolutely routine CMake usage less awkward.
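A rough sketch of what the RETVAR convention could look like with today's CMake, assuming the keyword is parsed through cmake_parse_arguments. The function join_path and its DIR/FILE keywords are hypothetical; the VAR := func(ARGS) rewrite itself does not exist and is not shown.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical convention: every function accepts a RETVAR <name> keyword.
function(join_path)
  cmake_parse_arguments(ARG "" "RETVAR;DIR;FILE" "" ${ARGN})
  if(NOT DEFINED ARG_RETVAR)
    message(FATAL_ERROR "join_path: RETVAR <name> is required")
  endif()
  # Assign the result to whatever variable the caller named.
  set(${ARG_RETVAR} "${ARG_DIR}/${ARG_FILE}" PARENT_SCOPE)
endfunction()

join_path(RETVAR full DIR "src" FILE "main.cpp")
message("${full}")  # prints: src/main.cpp
```

With a fixed keyword like this, a front end could mechanically turn `full := join_path(DIR src FILE main.cpp)` into the call above.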

As to whether or not (2) is important, well… that would significantly extend the complexity of the average string interpolation. I don’t think the blog post makes a strong enough case to counter the general trend towards making CMake more declarative. I agree with that trend. I don’t want to read complicated variable assignments and deeply nested function calls. Programmers will always take the path of least resistance.

I think keeping imperative programming less appealing than declarative programming is probably a good thing for CMake. Also, the new cmake_language(CALL) function in 3.18 helps you create new declarative abstractions.

The no-argument examples testing features can just be replaced by variables (cached if they’re expensive to compute) and the one argument example can be replaced via variable-name lookup as in ${name_for_${id}}.
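A small sketch of the variable-name lookup trick, assuming a made-up table of name_for_<id> variables:

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical lookup table keyed by id.
set(name_for_alice "Alice Smith")
set(name_for_bob   "Bob Jones")

set(id bob)
# The inner ${id} is expanded first, producing the variable name name_for_bob.
message("${name_for_${id}}")  # prints: Bob Jones
```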

By the time the post gets to formatting a name with several levels of function composition, it’s pretty far from a typical build system task.

alex · August 18, 2020, 7:40am

Here’s a very basic, no-error-checking example of what I mean by creating new abstractions…

cmake_minimum_required(VERSION 3.18)

# Tiny library. Extend as desired.

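# Assigns its arguments to the output variable named by the enclosing
# function's ARGS parameter, then returns from that function. This works
# because a macro runs in its caller's scope.
macro(return_)
  set(${ARGS} ${ARGN} PARENT_SCOPE)
  return()
endmacro()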

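function(fun_expr ARGS)
  # args: value stack, with "(" marking where each call's arguments begin
  # fxns: stack of function names still waiting for their ")"
  unset(args)
  unset(fxns)
  while(ARGN)
    list(POP_FRONT ARGN tok)
    if (tok STREQUAL "(")
      list(APPEND args ${tok})
    elseif (tok STREQUAL ")")
      # Pop values back to the matching "(" to collect this call's arguments
      unset(call)
      unset(arg)
      list(POP_BACK args arg)
      while (args AND NOT arg STREQUAL "(")
        list(PREPEND call "${arg}")
        list(POP_BACK args arg)
      endwhile ()
      # Invoke the innermost pending function and push its result(s)
      list(POP_BACK fxns fn)
      cmake_language(CALL ${fn} RET ${call})
      list(APPEND args ${RET})
    else ()
      # Any other token is treated as a function name
      list(APPEND fxns ${tok})
    endif ()
  endwhile()
  return_("${args}")
endfunction()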

# Definitions from the blog post

function(format_name ARGS first last)
  return_("First: ${first}, last: ${last}")
endfunction()

function(get_first_name ARGS)
  return_("John")          # return quoted
endfunction()

function(get_last_name ARGS)
  return_(Doe)             # return unquoted
endfunction()

function(get_first_and_last ARGS)
  return_([[John]] Doe)    # return list
endfunction()

# Blog demo

fun_expr(NAME format_name(get_first_and_last()))
message(${NAME})

fun_expr(NAME format_name(get_first_name() get_last_name()))
message(${NAME})

Output:

alex@Alex-Desktop:~$ cmake -P fun_expr.cmake
First: John, last: Doe
First: John, last: Doe

You could push this quite far using Dijkstra’s shunting yard algorithm or another bottom-up operator precedence parser. And this is without CMake needing to natively support functions as expressions.

OleksandrKvl · August 18, 2020, 12:52pm

First of all, thanks for such a detailed answer.

Actually, it was my first experience with language design, so I used more “practical” terms from my day-to-day usage. I would like the VAR := func(ARGS) syntax, but it’s contrary to the existing “everything is a command()” design.

I was thinking about something like std::tie(), e.g. list(TIE list_var VAR1 VAR2).
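list(TIE ...) is not an existing CMake subcommand. As a sketch of the idea only, a user-level function could unpack a list into named variables like this; the name list_tie and its behavior for short lists are assumptions.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical tie(): assigns successive elements of a list to the named
# variables in the caller's scope. Names beyond the list's length are unset.
function(list_tie list_var)
  set(values "${${list_var}}")
  list(LENGTH values count)
  set(index 0)
  foreach(var_name IN LISTS ARGN)
    if(index LESS count)
      list(GET values ${index} value)
      set(${var_name} "${value}" PARENT_SCOPE)
    else()
      unset(${var_name} PARENT_SCOPE)
    endif()
    math(EXPR index "${index} + 1")
  endforeach()
endfunction()

set(full_name John Doe)
list_tie(full_name FIRST LAST)
message("${FIRST} / ${LAST}")  # prints: John / Doe
```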

What complexity are you talking about? Complexity of implementation or run-time cost? Regarding the implementation, yes, it requires some big changes, but I believe it’s worth it. In my implementation there’s no big difference between evaluating f(${g()}) and the combination of g(RETVAR) + f(${RETVAR}). There’s no big run-time cost either.
I don’t understand why it runs counter to the declarative style. It helps to hide implementation details; isn’t that the goal of declarative programming?
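To make the g(RETVAR) + f(${RETVAR}) form concrete, here is a sketch of the two-step version in today's CMake. The functions f and g and their out-variable parameters are hypothetical; the nested f(${g()}) form only exists in the patched CMake from the blog post and is not shown as runnable code.

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical g: produces a value through an out variable.
function(g out_var)
  set(${out_var} "world" PARENT_SCOPE)
endfunction()

# Hypothetical f: consumes a value and produces another.
function(f out_var input)
  set(${out_var} "hello, ${input}" PARENT_SCOPE)
endfunction()

# The manual equivalent of a nested call: evaluate g first, then feed f.
g(RETVAR)
f(RESULT "${RETVAR}")
message("${RESULT}")  # prints: hello, world
```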

Maybe it’s a matter of taste, but I think that function calls are better than direct value manipulations. Who should initially compute and set those variables? We can’t predict all use cases, so the user has to initialize them somewhere.

It’s there only for demonstration, I don’t like deeply nested calls either.

Your fun_expr example is great; the only detail I don’t like is the necessity of having that NAME var. I think it should be more flexible and let users decide whether they want to store the result in a variable. Again, isn’t it against the declarative style to force the user to create unneeded variables? Also, don’t you think that by extending it to handle more complex cases we’re actually doing the parser’s job? I mean, yes, maybe you can do it in your custom library, but one of my goals was to make CMake more user-friendly and convenient for people coming from other languages. That’s why I believe such a feature should be built in rather than provided by a third-party library.

alex · August 18, 2020, 6:14pm

Fun! Welcome to the party

Putting arbitrary function composition into the string syntax is also contrary to “everything is a command()” design. That doesn’t automatically mean it would make “CMake 4” a worse product.

This is all just syntax, though; the actual semantics are the important part.

If you give programmers a feature, they’re going to use it. And folding it into string interpolation means that its power increases substantially. Here’s why: before adding it, strings are (concatenations of) strings. After adding it, strings are equivalent to arbitrary CMake code, potentially with side effects. That could range from benign but annoying, like printing unneeded status messages, to quite bad, like unintentionally touching disk in a loop.

I could see programmers doing things like set(_ "${list(${get_args(...)})}") just for the side-effect on a local list. You wouldn’t even be able to see which list it was modifying.

To be fair, other languages have successfully included this feature (e.g. Python, Bash, C#), but I suspect this is because the greater richness of imperative features in those languages makes it less tempting to put real logic there. This particular implementation also treats list return values as parameter packs, which can lead to some unexpected bugs and subtle behavior.
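The parameter-pack behavior can already be seen with ordinary variables. The sketch below uses a hypothetical count_args helper to show how an unquoted list reference changes the number of arguments a function receives:

```cmake
cmake_minimum_required(VERSION 3.18)

# Hypothetical helper that reports how many arguments followed out_var.
function(count_args out_var)
  math(EXPR n "${ARGC} - 1")  # ARGC also counts out_var itself
  set(${out_var} "${n}" PARENT_SCOPE)
endfunction()

set(names "John;Doe")
set(empty "")

count_args(unquoted ${names})    # list elements become separate arguments
count_args(quoted   "${names}")  # the whole list stays one argument
count_args(vanished ${empty})    # an empty value contributes no argument

message("unquoted=${unquoted} quoted=${quoted} vanished=${vanished}")
# prints: unquoted=2 quoted=1 vanished=0
```

A function whose return value is spliced in unquoted would behave the same way, so the arity of the outer call would depend on what the inner call happened to return.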

As a counter-point, reading a variable is guaranteed to have no side-effects unlike a function call.

Yeah, it totally is! Giving CMake a unified function return mechanism would let you drop the explicit NAME var. I was working around the fact it doesn’t have one by imposing a convention. The implementation of fun_expr has to capture the function return values somehow.

alex · August 18, 2020, 6:22pm

Sorry, I thought you were talking about the ARGS abuse-of-notation I was using to hide the out variable. With a standard return mechanism, you could hypothetically write it as:

NAME := fun_expr(format_name(get_first_and_last()))
message(${NAME})

which is cleaner, but does not avoid the name binding, true. On the other hand, the name binding seems less bad when the message is more complex…

NAME := fun_expr(format_name(get_first_and_last()))
message("Hello, ${NAME}!")

OleksandrKvl · August 18, 2020, 8:52pm

Yeah, now I see your caution about its overuse.
Well, it’s an interesting dilemma: give the user a powerful feature and let them shoot themselves in the foot, or give them a weak but safe one. Freedom vs. safety. It also reminds me of the complaints C users have about C++. In C you can see all function calls, but in C++ any simple statement might result in a function call. C, in turn, has looser and more dangerous type-casting rules, and people successfully use both languages.
I think it involves both parties: language devs and language users. Both should do their best, understand that the world is not perfect, and use common sense.

brad.king · August 18, 2020, 9:48pm

As maintainer of CMake, I’ll jump in here to state that I don’t think return values should be added at this time. Once functions have return values, arbitrary expressions with side effects will be possible as discussed above. Then people will ask for inline operators, etc., and complexity will keep increasing. The language is not well-suited to such extension IMO.

For reference, CMake Issue 19891 has some discussion about alternative specification language design.

alex · August 18, 2020, 11:29pm

I agree with your assessment of function expressions (as I wrote at length above). But I wonder what your thoughts are on a uniform return variable syntax? The purely hypothetical one I suggested would enable easier usage of existing commands, like string:

search1 := string(FIND "foo bar" "o b")
search2 := string(REPLACE        "foo"    "bar" "foo baz")
search3 := string(REGEX MATCH    "[0-9]+"       "123foo")
search4 := string(REGEX MATCHALL "[0-9]+"       "123foo456")
search5 := string(REGEX REPLACE  "[0-9]+" "foo" "123 bar")

manip1 := string(CONCAT foo bar baz)
manip2 := string(JOIN ", " foo bar baz)
manip3 := string(TOLOWER "LOWER")
manip4 := string(TOUPPER "upper")
manip5 := string(LENGTH "foo")
manip6 := string(SUBSTRING "foo bar" 4 3)
manip7 := string(STRIP "  foo  ")
manip8 := string(GENEX_STRIP "foo $<CONFIG> bar")
manip9 := string(REPEAT "CMake is cool " 8)

cmp1 := string(COMPARE LESS "Alice" "Bob")

hash1 := string(MD5 "foo")

gen1 := string(ASCII 102 111 111)
gen2 := string(HEX "foo")
gen3 := string(CONFIGURE "@PROJECT_NAME@" @ONLY)
gen4 := string(MAKE_C_IDENTIFIER "h3110 \/\/0rld")
gen5 := string(RANDOM LENGTH 8)
gen6 := string(TIMESTAMP "%Y-%m-%dT%H:%M:%SZ" UTC)
gen7 := string(UUID TYPE SHA1)

Or list:

read1 := list(LENGTH my_list)
read2 := list(GET my_list 3)
read3 := list(JOIN my_list ", ")
read4 := list(SUBLIST my_list 3 2)

search1 := list(FIND my_list "foo")

# These will need to be special-cased to give 
# out-of-place results when used with this syntax
mod1 := list(APPEND my_list foo)
mod2 := list(FILTER my_list EXCLUDE REGEX "^[0-9]")
mod3 := list(INSERT my_list 2 "foo")
(mod4a, mod4b) := list(POP_BACK my_list)
(mod5a, mod5b) := list(POP_FRONT my_list)
mod6 := list(PREPEND my_list "foo")
mod7 := list(REMOVE_ITEM my_list "foo")
mod8 := list(REMOVE_AT my_list 5)
mod9 := list(REMOVE_DUPLICATES my_list)
mod10 := list(TRANSFORM my_list REPLACE "make" "CMake")

ord1 := list(REVERSE my_list)
ord2 := list(SORT my_list)
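For comparison, this is how a few of the calls above are spelled in current CMake, where the position of the output variable differs from subcommand to subcommand. The sample list contents are made up.

```cmake
cmake_minimum_required(VERSION 3.18)

string(FIND "foo bar" "o b" search1)           # search1 = 2
string(REPLACE "foo" "bar" search2 "foo baz")  # search2 = bar baz (out var in the middle)
string(TOUPPER "upper" manip4)                 # manip4 = UPPER
string(SUBSTRING "foo bar" 4 3 manip6)         # manip6 = bar

set(my_list a b c d e)
list(LENGTH my_list read1)                     # read1 = 5
list(GET my_list 3 read2)                      # read2 = d
list(FIND my_list "c" found)                   # found = 2
list(POP_BACK my_list last)                    # last = e, my_list = a;b;c;d

message("${search1} | ${search2} | ${manip4} | ${manip6}")
message("${read1} | ${read2} | ${found} | ${last} | ${my_list}")
```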

Custom functions could be written and used like:

# Outputs are mapped to the LHS of := at the call site
# and the values of those names in the function's scope
# are assigned to the corresponding names in the caller's
# scope.
function(example1 IN x y OUT z)
  if ("${x}${y}" STREQUAL "")
    set(z "both are empty")
  else ()
    set(z "x = ${x}, y = ${y}")
  endif ()
endfunction()

# This would even work with cmake_parse_arguments
function(example2 OUT z)
  cmake_parse_arguments(ARG "" "X;Y" "" ${ARGN})
  set(z "X = ${ARG_X}, Y = ${ARG_Y}")
endfunction()

var1 := example1("" "")
message(STATUS "${var1}")  # prints '-- both are empty'

var2 := example1("foo" "bar")
message(STATUS "${var2}")  # prints '-- x = foo, y = bar'

var3 := example2(X baz Y hello)
message(STATUS "${var3}")  # prints '-- X = baz, Y = hello'

I’ll add that if it needed to be a command, then any of a number of alternative syntaxes would work

set(VARS var1 TO example1(foo bar))
cmake_language(OUT var1 CALL example1 foo bar)
# etc.

The more important change is the function() modification that promotes local variables to parent scope variables.
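For reference, CMake 3.25 (released after most of this thread) added return(PROPAGATE <var>...), which copies selected local variables into the caller's scope on return. It is not the OUT syntax proposed above: the variable name is fixed by the function instead of being chosen at the call site. A minimal sketch, reusing the example1 body from above:

```cmake
cmake_minimum_required(VERSION 3.25)  # sets policy CMP0140 to NEW, required for return() arguments

function(example1 x y)
  if("${x}${y}" STREQUAL "")
    set(z "both are empty")
  else()
    set(z "x = ${x}, y = ${y}")
  endif()
  return(PROPAGATE z)  # copies the local z into the caller's scope
endfunction()

example1("foo" "bar")
message(STATUS "${z}")  # prints: -- x = foo, y = bar
```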

OleksandrKvl · August 20, 2020, 3:31pm

Thanks for the answer. Good to know that some work is going on to make it more declarative and easier to use.

Ericson2314 · July 28, 2022, 9:42pm

@brad.king

I don’t follow the logic behind your answer.

“Once functions have return values, arbitrary expressions with side effects will be possible as discussed above.”

Why does this matter? Statements already have side effects. Function bodies already have side effects. CMake is full of side effects.

The point of being more functional is to allow code that has fewer side effects. The people who dislike having to set(... PARENT_SCOPE) are also the people who will try to stay away from gratuitous side effects!

Languages like JavaScript, Java, hell, even Excel have added new “functional” syntaxes over time, and I don’t think there has been any complaint of code becoming harder to read because of more side effects! Quite the opposite, in fact!

“Then people will ask for inline operators, etc., and complexity will keep increasing. The language is not well-suited to such extension IMO.”

This is the slippery slope fallacy.

We know from other languages where this road leads, and it’s not to C++/Scala madness. It’s just about trying to recover the little Scheme hiding within every other language.

“For reference, CMake Issue 19891 has some discussion about alternative specification language design.”

Two years on, that issue is unsurprisingly a cesspit of random ideas, bikeshedding, and scope creep: the second-system effect. Let’s be real, is it really going to happen?

Besides, any grand new system will have to have pretty good interoperability with existing CMake modules to be a viable successor in practice. So even if we got such a system, having return values in “classic CMake” would be good, because they would offer a nicer FFI between the old world and the new world.

How else would one call legacy functions doing set(... PARENT_SCOPE) if the new “declarative” system doesn’t support mutation?