string(JSON ... GET ...) slow

neundorf · March 21, 2023, 11:09pm

Hi,

I’ve got a 3 MB JSON file consisting of a list of a few thousand items.
I need to parse them one by one and get the “name” member from each item.
So I’m doing

foreach(idx RANGE ${length})
  string(JSON name GET "${json}" ${idx} "name")
  list(APPEND names ${name})
endforeach()

This is quite slow, around 10 items per second, so processing a few thousand items takes a long time.
Is there a more efficient way to do this?

Or does string(JSON) need an additional mode that returns each item in a list, or something similar?

ben.boeckel · March 22, 2023, 12:24pm

Probably because every call to string(JSON) re-parses ${json}.

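Instead, you could try building up a list of indices and fetching them all in one call, though the documentation describes multiple <member|index> arguments to GET as a path into nested elements, so this may not return a list: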

set(idxs "")
foreach (idx RANGE ${length})
  list(APPEND idxs "${idx}")
endforeach ()
string(JSON names GET "${json}" ${idxs})

craig.scott · March 22, 2023, 9:29pm

I vaguely recall some discussion about caching of such calls internally so that we could avoid having to reparse the entire JSON content on each call. I don’t think that idea was rejected, just not something to hold back the initial released implementation. I haven’t seen or heard any activity around returning to that idea, but it might help for cases like this.

ben.boeckel · March 22, 2023, 9:51pm

It’s been an idea of mine to stuff variable values in a structure that “remembers” how the string has been parsed as persistent values. Things like:

struct views {
    cm::maybe_parsed<int> as_int;
    cm::maybe_parsed<std::string> as_path;
    cm::maybe_parsed<std::vector<std::string>> as_path_components;
    cm::maybe_parsed<json::value> as_json;
    cm::maybe_parsed<std::vector<std::string>> as_cmake_list; // `cm::string_view` may be possible
};

and so on. The cm::maybe_parsed would remember one of three states:

- not tried
- tried and failed
- tried and cached

It would be cleared on any modification to the value (though possibly things like list(APPEND) could preserve specific entries).
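
To make the three states concrete, here is a minimal C++17 sketch of how such a cache might behave. It is not CMake's actual code: `maybe_parsed`, `variable`, `parse_int` and `example_usage` are hypothetical names chosen for illustration, only the integer view is shown, and a real design would also have to cover the list, path and JSON views.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical stand-in for the cm::maybe_parsed idea (not CMake's code).
template <typename T>
class maybe_parsed {
public:
    enum class state { not_tried, failed, cached };

    // Runs `parse` only on the first access after a reset.
    // `parse` must return std::optional<T>, empty on failure.
    // Returns a pointer to the cached value, or nullptr if parsing failed.
    template <typename Parser>
    const T* get(const std::string& text, Parser parse) {
        if (state_ == state::not_tried) {
            value_ = parse(text);
            state_ = value_ ? state::cached : state::failed;
        }
        return value_ ? &*value_ : nullptr;
    }

    // Forget any earlier parse attempt.
    void reset() {
        value_.reset();
        state_ = state::not_tried;
    }

    state current() const { return state_; }

private:
    state state_ = state::not_tried;
    std::optional<T> value_;
};

// Strict integer parser: the whole string must be consumed.
inline std::optional<int> parse_int(const std::string& text) {
    int out = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, out);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt;
    }
    return out;
}

// The string stays the "official" value; parsed views are only a cache.
class variable {
public:
    explicit variable(std::string text) : text_(std::move(text)) {}

    // Any modification invalidates the cached views.
    void set(std::string text) {
        text_ = std::move(text);
        as_int_.reset();
    }

    const std::string& str() const { return text_; }
    const int* as_int() { return as_int_.get(text_, parse_int); }
    maybe_parsed<int>::state int_state() const { return as_int_.current(); }

private:
    std::string text_;
    maybe_parsed<int> as_int_;
};

// Small usage example walking through all three states.
inline void example_usage() {
    variable v("42");
    assert(v.int_state() == maybe_parsed<int>::state::not_tried);
    assert(*v.as_int() == 42);  // first access parses and caches
    assert(v.int_state() == maybe_parsed<int>::state::cached);
    v.set("not a number");      // modification clears the cache
    assert(v.as_int() == nullptr);
    assert(v.int_state() == maybe_parsed<int>::state::failed);
}
```

Keeping the string as the source of truth and resetting the views in `set` is the simplest invalidation rule; preserving selected views across operations such as list(APPEND) would need per-operation logic on top of this.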

neundorf · March 23, 2023, 8:57pm

How about one call to parse it, and then further calls to access the parsed data?
Something like
string(JSON PARSE [ERROR_VARIABLE err] "json-string")
which would parse the JSON string and store the parsed data, e.g. in a json::value with the same scoping as variables,
and then calls to access the parsed JSON could refer to this, e.g.
string(JSON out_var GET PARSED_JSON …)
This would be a bit similar to accessing the matched groups from regex matches.

The “normal” JSON access functions could also put their result into this json::value, so that the following calls could access it.
Or the PARSE function could create a named JSON object, which could be referred to in following calls.

What do you think?

ben.boeckel · March 23, 2023, 9:28pm

That’d require the same kind of design thinking that’d be involved in the other solution. Note that the “official” value is always the string representation, so manipulating the JSON directly may not be equivalent to the previous round-trip string(JSON) behavior (e.g., key ordering in objects) and would therefore require a policy.