New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
isArrayParent inefficient #2281
Comments
Yes, the function is definitely not super efficient.
Nope, although I think you should put “just” under quotes here. Ideally checking for an array parent should be nothing more than searching for the meta key kdb mount config.json user/tests/yajl yajl
kdb set user/tests/yajl/array/#0 bla
#> Create a new key user/tests/yajl/array/#0 with string "bla"
kdb set user/tests/yajl/array/#1 blubb
#> Create a new key user/tests/yajl/array/#1 with string "blubb"
kdb lsmeta user/tests/yajl/array
#> . The commands above show a
Of course, but I guess that would mean more code and duplicated functionality. For now I modified |
This commit addresses issue ElektraInitiative#2281.
From a wild guess the duplication and modification of all keys causes the problem. Duplication of KeySets is quite fast (one memcopy and updating the ref counters). A solution that requires the presence of the "array" metadata is imho the way to go, see also #182. It will also be needed by the high-level API. From a quick look only yajl and csvstorage seem to be the current offenders? But even if we have the metadata, we nevertheless need to check if the structure and the metadata matches. So we need an efficient checking anyway. |
This commit addresses issue ElektraInitiative#2281.
When parsing large json files (>2MB, I used machinetalk-protobuf.min.map.json for the benchmark, see https://github.com/machinekit/machinetalk-protobuf/tree/master/dist) more than 70% of the CPU time is spent in isArrayParent. (I could not check for files > 20MB because then it does not terminate in reasonable time anymore.) Unfortunately I did not have debug symbols of the directoryvalue plugin, so I do not know at which line in that function the time is spent.
@sanssecours Do you see an obvious performance problem there? Is it necessary to create a new KeySet and to convert all of the keys to a direct child just to find out if it is an array parent? Shouldn't it be possible to do the check without modifying the keys? Or is the function doing more than it says it does?
The text was updated successfully, but these errors were encountered: