string(JSON ... GET ...) slow

Hi,

I’ve got a 3 MB JSON file consisting of a list of a few thousand items.
I need to parse them one by one and get the “name” member from each item.
So I’m doing

foreach(idx RANGE ${length})
  string(JSON name GET "${json}" ${idx} name)
  list(APPEND names "${name}")
endforeach()

This is quite slow, around 10 items per second, so for a few thousand items it takes a long time.
Is there a more efficient way to do this?

Or does string(JSON) need an additional mode that returns every item as a list, or something like that?

Probably because every call to string(JSON) re-parses ${json}.

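Instead, you could try building up a list of indices and passing them all in a single call. Note that string(JSON ... GET ...) documents its trailing arguments as a path of members/indices into nested elements, so check that this returns what you expect: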

set(idxs "")
foreach (idx RANGE ${length})
  list(APPEND idxs "${idx}")
endforeach ()
string(JSON names GET "${json}" ${idxs})

I vaguely recall some discussion about caching of such calls internally so that we could avoid having to reparse the entire JSON content on each call. I don’t think that idea was rejected, just not something to hold back the initial released implementation. I haven’t seen or heard any activity around returning to that idea, but it might help for cases like this.

It’s been an idea of mine to stuff variable values in a structure that “remembers” how the string has been parsed as persistent values. Things like:

struct views {
    cm::maybe_parsed<int> as_int;
    cm::maybe_parsed<std::string> as_path;
    cm::maybe_parsed<std::vector<std::string>> as_path_components;
    cm::maybe_parsed<json::value> as_json;
    cm::maybe_parsed<std::vector<std::string>> as_cmake_list; // `cm::string_view` may be possible
};

and so on. The cm::maybe_parsed would remember:

  • not tried
  • tried and failed
  • tried and cached

It would be cleared on any modification to the value (though possibly things like list(APPEND) could preserve specific entries).
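To make the three states concrete, here is a minimal sketch of what such a wrapper might look like. Everything in it is hypothetical: maybe_parsed, parse_state, parse_int and the variable class are illustration names only and are not part of CMake's code base. It assumes C++17 and caches a single int view.

```cpp
#include <charconv>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical sketch only; none of these names exist in CMake itself.
enum class parse_state { not_tried, failed, cached };

template <typename T>
class maybe_parsed {
public:
    parse_state state() const { return state_; }

    // Run `parse` at most once per value; later calls reuse the outcome,
    // whether the parse succeeded (cached) or not (failed).
    template <typename Parser>
    const std::optional<T>& get(const std::string& raw, Parser parse) {
        if (state_ == parse_state::not_tried) {
            value_ = parse(raw);
            state_ = value_ ? parse_state::cached : parse_state::failed;
        }
        return value_;
    }

    // Forget the cached outcome; call this when the string changes.
    void reset() {
        value_.reset();
        state_ = parse_state::not_tried;
    }

private:
    parse_state state_ = parse_state::not_tried;
    std::optional<T> value_;
};

// Example parser: succeeds only if the whole string is a base-10 int.
inline std::optional<int> parse_int(const std::string& s) {
    int out = 0;
    const char* first = s.data();
    const char* last = first + s.size();
    auto [ptr, ec] = std::from_chars(first, last, out);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt;
    }
    return out;
}

// The string stays the "official" value; the parsed view is only a cache.
class variable {
public:
    explicit variable(std::string v) : raw_(std::move(v)) {}

    // Any modification clears the cached view.
    void set(std::string v) {
        raw_ = std::move(v);
        as_int_.reset();
    }

    const std::string& str() const { return raw_; }
    const std::optional<int>& as_int() { return as_int_.get(raw_, parse_int); }
    parse_state int_state() const { return as_int_.state(); }

private:
    std::string raw_;
    maybe_parsed<int> as_int_;
};
```

With something like this, repeated reads of the same unmodified value would parse once and then hit the cache, and a failed parse would not be retried until the value changes. A JSON view would follow the same pattern with a JSON parser in place of parse_int.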

How about one call to parse it, and then further calls to access the parsed data?
Something like
string(JSON PARSE [ERROR_VARIABLE err] "json-string")
which would parse the JSON string and store the parsed data, e.g. in a json::value with the same scoping as variables.
Subsequent calls to access the parsed JSON could then refer to it, e.g.
string(JSON out_var GET PARSED_JSON …)
This would be a bit like accessing the matched groups of a regex match.

The “normal” JSON access functions could also put their result into this json::value, so that the following calls could access it.
Or the PARSE function could create a named JSON object, which could be referred to in following calls.

What do you think?

That’d require the same kind of design thinking as the other solution. Note that the “official” value is always the string representation, so manipulating the JSON directly may not be equivalent to the previous round-trip string(JSON) behavior (e.g., key ordering in objects) and would therefore require a policy.