Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isArrayParent inefficient #2281

Closed
markus2330 opened this issue Oct 27, 2018 · 2 comments
Closed

isArrayParent inefficient #2281

markus2330 opened this issue Oct 27, 2018 · 2 comments
Assignees
Labels

Comments

@markus2330
Copy link
Contributor

When parsing large json files (>2MB, I used machinetalk-protobuf.min.map.json for the benchmark, see https://github.com/machinekit/machinetalk-protobuf/tree/master/dist) more than 70% of the CPU time is spent in isArrayParent. (I could not check for files > 20MB because then it does not terminate in reasonable time anymore.) Unfortunately I did not have debug symbols of the directoryvalue plugin, so I do not know at which line in that function the time is spent.

@sanssecours Do you see an obvious performance problem there? Is it necessary to create a new KeySet and to convert all of the keys to a direct child just to find out if it is an array parent? Shouldn't it be possible to do the check without modifying the keys? Or is the function doing more than it says it does?

@sanssecours
Copy link
Member

Do you see an obvious performance problem there?

Yes, the function is definitely not super efficient.

Is it necessary to create a new KeySet and to convert all of the keys to a direct child just to find out if it is an array parent?

Nope, although I think you should put “just” under quotes here. Ideally checking for an array parent should be nothing more than searching for the meta key array. In reality storage plugins might not add that meta key at all though. As far as I remember YAJL is one of those plugins:

kdb mount config.json user/tests/yajl yajl
kdb set user/tests/yajl/array/#0 bla
#> Create a new key user/tests/yajl/array/#0 with string "bla"
kdb set user/tests/yajl/array/#1 blubb
#> Create a new key user/tests/yajl/array/#1 with string "blubb"
kdb lsmeta user/tests/yajl/array
#> 

. The commands above show a kdb session after I removed directoryvalue from infos/needs. If we do not assume that the meta key array is set correctly, then we have to check if every child of a key is an array key. That is more or less what isArrayParent does.

Shouldn't it be possible to do the check without modifying the keys?

Of course, but I guess that would mean more code and duplicated functionality. For now I modified isArrayParent locally, so it does not create a new KeySet anymore. The function still duplicates and modifies child keys though…

sanssecours added a commit to sanssecours/elektra that referenced this issue Oct 27, 2018
@markus2330
Copy link
Contributor Author

From a wild guess the duplication and modification of all keys causes the problem. Duplication of KeySets is quite fast (one memcopy and updating the ref counters).

A solution that requires the presence of the "array" metadata is imho the way to go, see also #182. It will also be needed by the high-level API. From a quick look only yajl and csvstorage seem to be the current offenders?

But even if we have the metadata, we nevertheless need to check if the structure and the metadata matches. So we need an efficient checking anyway.

sanssecours added a commit to sanssecours/elektra that referenced this issue Oct 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants