Introduction
I would like to add some functionality to CMake’s string() command, and would like to discuss these changes and get feedback before submitting a PR.
I would like to implement the following subcommands for string():
string(TOUPPERCAMELCASE ...)
string(TOLOWERCAMELCASE ...)
string(TOUPPERSNAKECASE ...)
string(TOLOWERSNAKECASE ...)
string(CAPITALIZE ...)
string(SPLIT ...)
This is functionality that I have found myself reaching for and re-implementing across projects. Below I’ll describe the inputs and functionality that I would propose for each.
Please let me know what you think.
Proposed function signatures
string(CAPITALIZE <INPUT_STRING> [<OUT_VAR>])
- INPUT_STRING – any given input string. If OUT_VAR is not specified, INPUT_STRING is mutated
- OUT_VAR – the variable in which to store the output. If OUT_VAR is specified, INPUT_STRING remains unchanged.
This would perform the following on the string:
- Convert all alphabetical characters to lower-case
- If the first character of the string is alphabetic, capitalize it.
- If the first character of the string is not alphabetic: should this be an error, or should it act like TOLOWER? (Open question.)
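For comparison, the steps above can be approximated today with existing string() subcommands. This is only a minimal sketch: the helper name capitalize() is hypothetical, and it assumes the "act like TOLOWER" behavior when the first character is not alphabetic.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper (not an existing CMake command): emulates the
# proposed string(CAPITALIZE) using SUBSTRING, TOUPPER and TOLOWER.
function(capitalize input out_var)
  if("${input}" STREQUAL "")
    set(${out_var} "" PARENT_SCOPE)
    return()
  endif()
  string(SUBSTRING "${input}" 0 1 first)   # first character
  string(SUBSTRING "${input}" 1 -1 rest)   # everything after it
  string(TOUPPER "${first}" first)         # no effect on non-alphabetic characters
  string(TOLOWER "${rest}" rest)
  set(${out_var} "${first}${rest}" PARENT_SCOPE)
endfunction()

capitalize("hELLO world" result)
message(STATUS "${result}")  # prints: -- Hello world
```

Having to write these five calls in every project is the boilerplate the proposed subcommand would replace.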
Rationale
This is useful for templating, and the result can be consumed by commands like configure_file(). Capitalizing the first token in a string is a commonly needed string operation.
string(TOUPPERCAMELCASE|TOLOWERCAMELCASE|TOUPPERSNAKECASE|TOLOWERSNAKECASE <INPUT_STRING> <FORMAT> [<OUT_VAR>])
- INPUT_STRING – any given input string. If OUT_VAR is not specified, INPUT_STRING is mutated
- FORMAT – The format of the input string (UPPERCAMELCASE, LOWERCAMELCASE, UPPERSNAKECASE, LOWERSNAKECASE)
- OUT_VAR – the variable in which to store the output. If OUT_VAR is specified, INPUT_STRING remains unchanged.
This would perform the following on the string:
- Validate that the input string is in the format specified by FORMAT.
- Convert the string from FORMAT to the target format named by the subcommand.
The FORMAT specifier is necessary because there is some overlap in these formats. For example, “foo” in lower_snake_case is also “foo” in lowerCamelCase, so the input format cannot always be inferred from the string alone.
Perhaps a version of this function could also take a list of tokens, and then assemble them in the correct format?
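To make the token-based idea concrete, here is a rough sketch of one direction of the conversion (lower_snake_case to either camel case) written with existing commands. The helper name snake_to_camel() and its UPPER/LOWER style argument are hypothetical, and unlike the proposal it does not validate the input format.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper (not an existing CMake command):
# converts lower_snake_case to UpperCamelCase or lowerCamelCase.
function(snake_to_camel input style out_var)
  string(REPLACE "_" ";" tokens "${input}")   # split into a list of tokens
  set(result "")
  foreach(token IN LISTS tokens)
    if("${token}" STREQUAL "")
      continue()                              # skip empty tokens, e.g. from "a__b"
    endif()
    string(SUBSTRING "${token}" 0 1 first)
    string(SUBSTRING "${token}" 1 -1 rest)
    string(TOUPPER "${first}" first)
    string(TOLOWER "${rest}" rest)
    string(APPEND result "${first}${rest}")   # reassemble with each token capitalized
  endforeach()
  if("${style}" STREQUAL "LOWER" AND NOT "${result}" STREQUAL "")
    # lowerCamelCase differs only in the very first character
    string(SUBSTRING "${result}" 0 1 first)
    string(SUBSTRING "${result}" 1 -1 rest)
    string(TOLOWER "${first}" first)
    set(result "${first}${rest}")
  endif()
  set(${out_var} "${result}" PARENT_SCOPE)
endfunction()

snake_to_camel("module_functionality" UPPER upper)  # ModuleFunctionality
snake_to_camel("module_functionality" LOWER lower)  # moduleFunctionality
snake_to_camel("foo" LOWER same)                    # foo (the overlap case)
message(STATUS "${upper} ${lower} ${same}")
```

The split, transform, and join structure suggests that a list-of-tokens variant could share most of its implementation with the string variant.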
Rationale
This is useful for templating with configure_file(), or for custom commands that generate artifacts requiring different naming formats. Conversion is somewhat complex in the general case (an arbitrary string), but limiting it to these common formats should keep it straightforward.
Surprisingly, I have often found myself wanting to convert between camelCase and snake_case when invoking external scripts as part of my buildsystem.
string(SPLIT <INPUT_STRING> <OUTPUT_VAR> [<SEPARATOR>])
- INPUT_STRING – The string to split.
- OUTPUT_VAR – The variable in which to store the resulting list of tokens
- SEPARATOR – An optional string specifying the separator to split on. If not provided, whitespace (\s) is assumed as the separator.
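The closest equivalent today is replacing the separator with ";" so that the result becomes a CMake list. The sketch below shows the intended semantics using a hypothetical split_string() helper. It assumes the input contains no literal semicolons, and the whitespace default is only my reading of "\s".

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical helper (not an existing CMake command): emulates the
# proposed string(SPLIT <INPUT_STRING> <OUTPUT_VAR> [<SEPARATOR>]).
function(split_string input out_var)
  if(ARGC GREATER 2)
    # Explicit separator given as the optional third argument
    string(REPLACE "${ARGV2}" ";" tokens "${input}")
  else()
    # Default: split on runs of whitespace
    string(REGEX REPLACE "[ \t\r\n]+" ";" tokens "${input}")
  endif()
  set(${out_var} "${tokens}" PARENT_SCOPE)
endfunction()

split_string("module_functionality_subfunctionality" parts "_")
list(LENGTH parts count)
message(STATUS "${count}: ${parts}")  # prints: -- 3: module;functionality;subfunctionality
```

A built-in SPLIT could also handle inputs that already contain semicolons, which this workaround cannot do safely.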
Rationale
Sometimes you want to split an input string into its individual components and then process them. This is often useful for extracting information from a file name programmatically. For example, given a header file like module_functionality_subfunctionality.h, you may want those three tokens when making decisions.
Currently you can kind of do this with separate_arguments, but you need to first process the string into one of the given command line formats.
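As an illustration of that detour, here is roughly what the separate_arguments route looks like for the header example. It is a sketch that assumes the tokens themselves contain no spaces, quotes, or backslashes, since UNIX_COMMAND mode would interpret those.

```cmake
cmake_minimum_required(VERSION 3.10)

set(header "module_functionality_subfunctionality.h")
get_filename_component(stem "${header}" NAME_WE)   # strip the extension

# Pre-process: turn "_" into spaces so the string looks like a command line
string(REPLACE "_" " " spaced "${stem}")
separate_arguments(tokens UNIX_COMMAND "${spaced}")

list(GET tokens 0 module)
message(STATUS "${module}")  # prints: -- module
```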