This has a good overview, but it seems the ninja feature was never completed.
The general idea is that cl spawns a fixed pool of worker subprocesses. The pool gains speedup by eliminating per-file process startup/teardown overhead and by reducing duplicated work (such as parsing the same includes for every source file). Note there is a (simple) rule the build system must abide by: all source files passed on the same cl command line necessarily share the same command-line options. So it is advantageous for the build system to be aware of the parallelism cl is using, so that it can invoke multiple cl instances which each have some internal degree of parallelism. For example, if your CPU has 64 threads and you have 4 groups of source files with unique command lines, each group consisting of <= 16 files, you can reach maximum performance by scheduling all four groups simultaneously.
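A sketch of what that batching could look like in a hand-written build.ninja. This is not a mode Ninja supports today; the rule/variable names are made up, and I'm assuming the pool parallelism is exposed via cl's existing /MP flag (which compiles the inputs on one command line with up to N worker processes):

```ninja
# Hypothetical batched rule: one cl invocation per group of sources that
# share the same flags; cl parallelizes internally across the group.
rule cl_batch
  command = cl /nologo /c /MP16 $flags $in /Fo$outdir\

# Each group shares one command line, so it can go in one batched edge.
build obj\a1.obj obj\a2.obj: cl_batch src\a1.cpp src\a2.cpp
  flags = /O2 /DGROUP_A
  outdir = obj
```

With four such groups and -j4, all of them run concurrently while each cl instance fans out internally, which is the 64-thread scenario above.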
AFAIK the only mode of operation in Ninja is that it starts one cl process per input source file, with the number of concurrent processes controlled by -j (which defaults to the machine’s thread count).
p.s. I’m not sure why that feature writeup mentions /showIncludes (or exactly what Ninja is doing internally with the info), but more recent versions of MSVC have the /sourceDependencies option, which is a more robust implementation: it writes the dependency information to a JSON file instead of interleaving it with the compiler’s console output. I’ve used it to implement something like ccache’s direct mode and it works quite well.
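To make the direct-mode idea concrete, here is a minimal Python sketch of building a cache key from a /sourceDependencies JSON file. The JSON layout assumed here (a "Data" object holding "Source" and "Includes") matches what recent MSVC versions emit, but treat the exact shape as an assumption, and `read_file` is a stand-in for real file I/O:

```python
import hashlib
import json

def cache_key(cmdline: str, dep_json_text: str, read_file) -> str:
    """Direct-mode-style key: hash of the command line plus the path and
    contents of the source file and every include cl reported."""
    data = json.loads(dep_json_text)["Data"]  # assumed JSON layout
    h = hashlib.sha256()
    h.update(cmdline.encode("utf-8"))
    for path in [data["Source"]] + data.get("Includes", []):
        h.update(path.encode("utf-8"))
        h.update(read_file(path))  # hashing contents invalidates on edits
    return h.hexdigest()

# Usage with an in-memory "filesystem" standing in for the real one:
fs = {
    "src\\main.cpp": b"int main() { return 0; }",
    "include\\util.h": b"#pragma once",
}
dep_json = json.dumps({
    "Version": "1.1",
    "Data": {"Source": "src\\main.cpp", "Includes": ["include\\util.h"]},
})
key = cache_key("cl /c /O2 src\\main.cpp", dep_json, lambda p: fs[p])
```

On a cache hit for `key` you would copy the stored .obj out instead of invoking cl; any change to the flags, the source, or a transitively included header produces a different key.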