Status of CMake debugger

ben · February 4, 2020, 12:56am

I see that a 2-year-old closed MR attempted to add a debugger mode to CMake but was never merged. Has there been any activity toward adding a debugger to CMake since then (~2 years ago)? If the approach in the closed MR still looks reasonable, I’d be interested in doing the work necessary to get this functionality merged.

Alternatively, the Debug Adapter Protocol has gained some traction since 2017 with several tools supporting it like Visual Studio, Visual Studio Code, Vim, and Emacs. I could rework the existing debugger to support this protocol instead, which could make it easier for additional tools to add debugging support in the future.

brad.king · February 4, 2020, 12:08pm

AFAIK there has been no further progress. Furthermore, IIRC that MR was based on the server-mode protocol infrastructure. The server mode has been deprecated in favor of file-api and most of its infrastructure will be removed along with the server mode. Therefore the design space remains wide open.

ferdnyc · February 4, 2020, 4:45pm

Honest question (and let me disclaim, up front, that I haven’t seen the “discussion in the developer mailing list”) : What is the envisioned utility of such a debugger? What functionality is expected/proposed?

In fact, more generally: What, precisely, would be debugged with such a debugger?

Obviously not CMake’s output, since that’s just build files.
Perhaps the parsing of the CMakeLists.txt files could be debugged. --trace and --trace-expand already provide pretty good visibility into the line-by-line processing of CMake statements.

(However, a parser debugger would be useful for wiring up live syntax checking of CMakeLists.txt files in a code editor.)

Maybe the variable assignment and expansion that goes on during processing, but again --trace-expand covers that somewhat, plus there’s the contents of the CMakeCache.txt file. During execution, it’s possible to use CMakePrintHelpers to perform some rudimentary debug-printing of variable or property values, though it’s admittedly lacking as a debugging tool.

(Still, here again code editors can provide hinting and lookup functionality if they’re being fed parser data for the currently-open file.)

And then we come to the elephant(s) in the room, the major aspects of CMake’s functionality that currently don’t ever get stored on disk for convenient examination, nor are they able to be displayed with any of the current tools:
- The definition and interpretation of targets
- Processing/expansion of generator expressions

So, is that the goal here, to reveal more of those (currently-mysterious) aspects of CMake’s processing of the build system configuration to users/developers? Or is it more about wiring CMake into code editors so they can offer enhanced coding features?

Just trying to calibrate my anticipation and enthusiasm accordingly.

sjoubert · February 4, 2020, 6:49pm

You can find an early version of what I believe is the original attempt in this video from FOSDEM 2016: https://ftp.belnet.be/mirror/FOSDEM/2016/k4401/enabling-gui-tools-for-cmake-code.mp4

This ultimately led to the now-deprecated cmake-server mode for completion and code model (target list,…) but also had a debug mode for:

- reading variable values
- tracing their modifications
- exploring where these are modified,…

ben · February 5, 2020, 12:39am

Thanks for that perspective @ferdnyc! I work on C++ tooling professionally and was originally envisioning this as a way to use existing IDE debugging interfaces that are familiar to developers to debug the execution of CMake scripts. --trace and --trace-expand certainly already provide large pieces of this functionality, but I think there’s still benefit from interactive debug sessions because they can provide functionality like seeing all variable values in scope, modifying variable values during execution, or even changing the “instruction pointer” as the script executes by skipping certain commands or rerunning others.

Admittedly, I spend more time writing tooling for C++ and CMake than I spend actually writing CMake scripts, so if more experienced users think this kind of debugger tooling wouldn’t actually be very useful in practice, I’m 100% open to reconsidering the approach here. Please chime in if you have opinions or examples of bugs that were difficult to diagnose with existing tools!

I haven’t given a lot of thought to debugging generator expressions or targets, but that sounds like a great direction to investigate further. Given how open this design space currently is, I’ll spend some time putting together a more complete proposal that makes the expected utility and functionality of a debugger clear, then loop back to this thread.

ben · October 30, 2020, 11:47pm

Sorry for the long delay. I’ve got time now to work on a debugger proposal and Microsoft is interested in adding this support, but first I wanted to get some clarity on the future of libuv in the CMake codebase. Do maintainers think it would be reasonable to keep the libuv dependency and some of the associated server infrastructure to support a future debugger?

Assuming a debugger supports basic functionality like hitting breakpoints and resuming execution, we’ll need some kind of interactive communication mechanism so that the user can instruct the debugger (whether on the command line or via a tool) during script execution. It seems like the most natural fit would be to rework the infrastructure already used by cmake-server. This would yield a separate, asynchronous communication channel that can be used for debugger commands and responses. However, I know that cmake-server has been deprecated for a while, and if there are fundamental reasons for removing the libuv dependency I don’t want a future debugger to stand in the way of that.

Alternatively, the debugging protocol could be structured such that input is only expected at well defined points in script execution, like between command invocations. By removing the asynchronous requirement, this would allow for reading input directly from stdin (or perhaps a pipe), but I suspect this design would lead to a very chatty protocol and more difficult integration for vendors. I don’t see a system like the File API working well for debugging because unless the scope of the debugger is severely restricted, there’s a hard requirement for interactivity.

ben.boeckel · October 31, 2020, 1:48am

I believe libuv is going to replace the kwsys process management code at least (mainly seen in CTest). There’s an issue to do some async/await-like behavior with try_* and execute_process, but I’m not sure of its status. I assume it’ll use libuv as well.

The server infrastructure itself…I don’t see that surviving (though I’m certainly also not the one most in-the-know on that). Between commands is likely to be the most doable. Adding breakpoints (rather than just notifications) on variable changes/accesses might be a bit invasive, though variable_watch is already a thing, so maybe its tendrils are sufficient for such debugger hooks to follow? Property modifications might also be of interest for pausing execution. Modifying things at variable/property watch points is probably not a great idea since these things get poked in all kinds of random places inside CMake’s internals where behind-the-back changes really shouldn’t be happening. Asking to pause at the end of the current command on the firing of a watchpoint seems reasonable though.
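To make the "pause at the end of the current command" idea concrete, here is a minimal Python model of it. This is only a sketch and not CMake code: the watch hook merely records that a pause was requested, and the command loop acts on the request once the command has finished. Every name in it (`Interpreter`, `set_variable`, `pending_pause`, and so on) is hypothetical.

```python
class Interpreter:
    """Toy model of a command loop with deferred watchpoint pauses."""

    def __init__(self, watched):
        self.variables = {}
        self.watched = set(watched)   # variable names that have a watchpoint
        self.pending_pause = None     # set by the watch hook, acted on later
        self.pauses = []              # (command index, variable) pairs

    def set_variable(self, name, value):
        # Called from deep inside a command: only record the request here.
        self.variables[name] = value
        if name in self.watched and self.pending_pause is None:
            self.pending_pause = name

    def run(self, commands):
        for index, command in enumerate(commands):
            command(self)             # the command always runs to completion
            if self.pending_pause is not None:
                # Safe point: the command has finished, so pause now.
                self.pauses.append((index, self.pending_pause))
                self.pending_pause = None


def configure(interp):
    interp.set_variable("A", "1")
    interp.set_variable("B", "2")     # still runs after A's watchpoint fired


interp = Interpreter(watched=["A"])
interp.run([configure, lambda i: i.set_variable("C", "3")])
```

In this model the watchpoint on `A` fires in the middle of the first command, but the pause is only recorded after that command has also set `B`.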

For genexes, something like a $<TRACE:token,value> to trace expansion and evaluation of what is inside might be best? I don’t know how useful single-stepping through them would be beyond such an expansion trace anyways.

ben · November 3, 2020, 1:45am

Thanks for that insight Ben.

At a high level, I’m thinking an MVP would support:

- Setting breakpoints based on filename and line number or command name.
- Setting breakpoints for CMake errors or warnings being triggered (particularly fatal errors, if possible).
- Setting breakpoints when variable values are read or written.
- Stepping into (single step), over, and out of the currently executing macro or function.
- Exposing the state of variables in scope.
- Exposing currently defined targets and properties (this needs some more thought about what properties will be meaningfully defined during script execution).
- Information about the call stack (as much as is available).
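
As a rough illustration of the stepping item in the list above, one common technique is to compare call-stack depths. The Python sketch below is not tied to CMake internals, and the function name and mode strings are hypothetical.

```python
def should_pause(mode, start_depth, current_depth):
    """Decide whether to pause before the next command.

    mode          -- "step_in", "step_over", or "step_out"
    start_depth   -- call-stack depth when the step was requested
    current_depth -- call-stack depth of the command about to run
    """
    if mode == "step_in":
        return True                          # pause at the very next command
    if mode == "step_over":
        return current_depth <= start_depth  # skip commands in deeper frames
    if mode == "step_out":
        return current_depth < start_depth   # wait until the frame returns
    raise ValueError(f"unknown step mode: {mode}")


# Depths of the next three commands after a step requested at depth 1:
# two commands inside a called function (depth 2), then back at depth 1.
depths = [2, 2, 1]
first_stop = {
    mode: next(i for i, d in enumerate(depths) if should_pause(mode, 1, d))
    for mode in ("step_in", "step_over")
}
```

Here a step-in stops at the first command inside the function, while a step-over runs through the function body and stops at the first command back at the original depth.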

As mentioned above, initially I was thinking this could leverage the Debug Adapter Protocol (DAP). However, given that some of this functionality is CMake-specific (like exposing targets) and the desire to avoid a separate async communication channel, I’m now thinking the best approach is using a simple, custom protocol that performs all input and output at well-defined points between command executions. This could be based on JSON, like the File API and CMake server, and it should be relatively straightforward to layer DAP on top for tools that require it.
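
For context on what layering DAP on top might involve: DAP messages are JSON bodies preceded by a `Content-Length` header, and breakpoints arrive as `setBreakpoints` requests. The Python sketch below shows that framing plus one possible translation into a simpler message. The custom message shape (`set_breakpoint`, `file`, `line`) is purely an assumption for illustration, not something CMake defines.

```python
import json


def encode_dap(message):
    """Frame a message the way DAP does: header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def decode_dap(data):
    """Parse one framed message; return (message, remaining bytes)."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


def to_custom(request):
    """Translate a DAP setBreakpoints request into hypothetical custom messages."""
    args = request["arguments"]
    return [
        {"type": "set_breakpoint", "file": args["source"]["path"], "line": bp["line"]}
        for bp in args.get("breakpoints", [])
    ]


request = {
    "seq": 1,
    "type": "request",
    "command": "setBreakpoints",
    "arguments": {
        "source": {"path": "CMakeLists.txt"},
        "breakpoints": [{"line": 12}, {"line": 40}],
    },
}
decoded, leftover = decode_dap(encode_dap(request))
custom = to_custom(decoded)
```

An adapter along these lines would sit between the tool and CMake, so the core protocol could stay small while DAP-only clients still work.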

Roughly, this would look like adding a flag to enable debugging (say --debugger) to an existing CMake command line invocation. CMake would then stop before script execution and wait for messages (by default on stdin) to set breakpoints, do any other required configuration, and start execution. Once a breakpoint is hit, CMake would again block script execution, send a message (by default on stdout) that it’s ready for more input, and respond to any requests from the client for more information (variable values, call stack, etc.). Eventually the client will send a message indicating that execution should continue (run, single step, step out, etc.) and the cycle repeats until the script terminates. What this offers beyond the existing --trace, --trace-expand, and variable_watch functionality is better integration with tooling and interactive exploration of program state.
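
The cycle described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a proposed implementation: the script is a list of Python callables, messages are newline-delimited JSON, and every message name (`set_breakpoint`, `get_variable`, `continue`, `step`, `stopped`, `terminated`) is made up for the example.

```python
import io
import json


def run_debugged(commands, variables, requests, responses):
    """Run `commands`, talking to the client only between command invocations.

    commands  -- list of (file, line, callable) tuples standing in for a script
    requests  -- text stream of newline-delimited JSON from the client
    responses -- text stream that receives newline-delimited JSON replies
    """
    breakpoints = set()
    stepping = True  # stop before the first command so the client can configure

    def send(message):
        responses.write(json.dumps(message) + "\n")

    def converse(file, line):
        """Block until the client resumes; return True if it asked to step."""
        send({"event": "stopped", "file": file, "line": line})
        for raw in requests:
            msg = json.loads(raw)
            if msg["type"] == "set_breakpoint":
                breakpoints.add((msg["file"], msg["line"]))
            elif msg["type"] == "get_variable":
                send({"name": msg["name"], "value": variables.get(msg["name"])})
            elif msg["type"] in ("continue", "step"):
                return msg["type"] == "step"
        return False  # the client went away: run to completion

    for file, line, command in commands:
        if stepping or (file, line) in breakpoints:
            stepping = converse(file, line)
        command(variables)
    send({"event": "terminated"})


script = [
    ("CMakeLists.txt", 1, lambda v: v.update(FOO="a")),
    ("CMakeLists.txt", 2, lambda v: v.update(FOO="b")),
    ("CMakeLists.txt", 3, lambda v: v.update(BAR="c")),
]
client = io.StringIO(
    '{"type": "set_breakpoint", "file": "CMakeLists.txt", "line": 3}\n'
    '{"type": "continue"}\n'
    '{"type": "get_variable", "name": "FOO"}\n'
    '{"type": "continue"}\n'
)
out = io.StringIO()
run_debugged(script, {}, client, out)
events = [json.loads(line) for line in out.getvalue().splitlines()]
```

Because all reads happen inside `converse`, no separate asynchronous channel is needed; the trade-off is that the client cannot interrupt a command that is already running.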

I hadn’t given a lot of thought to genexs, but I like your suggestion of providing a way to trace their expansion. Ideally this tracing could be dynamically configured (like --trace) and not require source level changes, but that may require more invasive changes in CMake to track source positions that generator expressions were derived from (if they exist).

Assuming this high-level sketch sounds reasonable, what’s the right avenue to solicit feedback on a more detailed design? I can continue discussion here on the forums, or open an issue on GitLab if that’s a better venue for technical discussion.

ben.boeckel · November 3, 2020, 1:36pm

This looks like a reasonable overview to me, but thoughts from others would be appreciated (though I’m not familiar with DAP).

As for $<TRACE:> I think it will likely require a source-level change as tracing would otherwise be spotty as $<TRACE:> gets added across projects over time (also slamming their minimum CMake all the way to 3.2x in the process). An alternative might be --trace-genex-target ZLIB::zlib which would trace genexes when evaluated for the named target (HeadTarget in the source) to handle the likely-common case (this would not cause genexes mentioned in ZLIB::zlib properties to be traced when evaluated in the context of a depending target, but CurrentTarget could be matched too I suppose). This should likely be split off into a separate discussion though as it is largely an orthogonal feature.

Cc: @brad.king @craig.scott @robert.maynard @kyle.edwards

craig.scott · November 3, 2020, 9:18pm

We may need to think a bit about what this means for the various cmake_language() subcommands.

Another thing that users may want to have access to is the currently defined export sets. The install components would already be available through the COMPONENTS global pseudo-property.

An interesting idea might also be code injection. If we can find a way to easily inject CMake code to be executed upon a breakpoint, that opens up a whole range of flexible debugging capabilities. Maybe we can leverage the cmake_language() functionality somehow for this?

I would personally be happy with genex support coming later. The other debugging features seem easier and less invasive and would likely cover many already useful debugging situations. If we manage to provide code injection, that might offer more choices for this as well (but I haven’t thought that through very far).

robert.maynard · November 4, 2020, 2:43pm

I agree that genex support should come later as it would require some thought on how you would debug genex evaluation as it all happens ‘concurrently’ at the end of execution.

I agree that the ability to pretty-print the state of a target and any requested associated properties would be a great feature. This will need to be done carefully so as not to mislead the user, though. For example, printing the content of target_link_libraries would only show the direct dependencies and not anything that comes from an embedded generator expression.
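
One way a printer could avoid misleading output is to flag entries that still contain generator expressions. A small Python sketch of that idea follows; the function name and the wording of the note are hypothetical, and the check is a plain substring test for `$<` rather than a real genex parser.

```python
def describe_property(name, values):
    """Format a target property, flagging entries that contain genexes."""
    lines = [f"{name}:"]
    for value in values:
        # "$<" marks the start of a generator expression in the raw value.
        note = "  (generator expression, not yet evaluated)" if "$<" in value else ""
        lines.append(f"  {value}{note}")
    return "\n".join(lines)


report = describe_property(
    "LINK_LIBRARIES",
    ["ZLIB::ZLIB", "$<$<CONFIG:Debug>:debug_helpers>"],
)
```

The second entry would be printed with the note attached, making it clear that the final link line may differ from what is shown.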

ben · November 4, 2020, 6:52pm

Thanks for all your feedback. From skimming the cmake_language() implementation it looks like it would be relatively straightforward to support code injection.

I’ll do some prototyping and write up a more concrete debugger proposal, then loop back for review. At this point I’m planning to leave out genex support, but I will try to incorporate all the features in my original list as well as viewing the export set and some kind of code injection.

ben · November 25, 2020, 8:16pm

Further discussion can happen on the issue tracker.