I noticed that we use
std::map quite a bit throughout the CMake source code. No doubt this stems from before the C++11 days, but now that we require C++11, I’m wondering if at least some of these uses might be more efficient if replaced with
std::unordered_map? If we have containers of a reasonable number of values, they may be good candidates for replacement. Could be some easy wins lurking in there, if someone is interested and knows where a few hot code paths are.
I noticed that we use
As with anything to do with performance, results from measured projects are the only thing that actually matter here. Speculation can guide what to test first, but these kinds of things are sensitive to the weirdest things that are hard to predict and hand-waving just doesn’t save time over actual measurements.
That said, there are even low-hanging fruits over
std::unordered_map if we’re going down that route. Of note is that, for some reason, the standard requires a
size_t n_buckets; behavior of the hashing containers. This leads to 64-bit integer modulus operations which are vastly slower than the 32-bit counterpart:
IDIV r64 Throughput 85-100 cycles per instruction IDIV r32 Throughput 20-26 cycles per instruction
A hashing container which just “limits” its number of buckets to 32-bit should get a nice perf boost. I don’t know if we can measure how “hot” each of our map instances are or not, but that might be the first thing to do here.
As for actual measuring, callgrind with KCacheGrind as a viewer has worked well for me.
perf can also find interesting things, but I find it harder to use (though usually because I get what I need from callgrind and haven’t used
perf that much).
Several historical uses of
std::map have been replaced with
std::unordered_map already. Such updates are fine with me. However, there may be some cases where we depend on the ordering guarantees of
std::map to produce deterministic results. That needs to be considered on a case-by-case basis.
The one that got my attention which led to this post was in
cmGeneratorExpressionNode::GetNode(). There’s a static
std::map object that seems like it could be a
std::unordered_map and I am curious if it is a factor in generation time (I hindsight, it probably isn’t a big enough map to lead to an appreciable difference). I don’t have any test cases at hand with which I can readily measure that though.