Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document allows sequences of kdbGet & kdbSet #4520

Closed
kodebach opened this issue Oct 8, 2022 · 6 comments
Closed

Document allows sequences of kdbGet & kdbSet #4520

kodebach opened this issue Oct 8, 2022 · 6 comments
Labels

Comments

@kodebach
Copy link
Member

kodebach commented Oct 8, 2022

Problem

From #4514:

Key * keyA = ...;
Key * keyB = ...;
// assume that neither A below B nor B below A

KeySet * a = ...;
KeySet * b = ...;

kdbGet(kdb, a, keyA); // (A)
kdbGet(kdb, b, keyB); // (B)

kdbSet(kdb, a, keyA); // (C)

If this is a valid sequence of events. Then neither libelektra-kdb nor anything else called by it (i.e. all plugins, including hook plugins) will be able to correctly calculate the added/changed/removed diff set at (C).

Why? Simply put, because the KDB instance kdb is used with two separate KeySets (a & b) and parentKeys (keyA & keyB).

The detailed reason is:

  • at (A) we can remember the current state of the KDB, by storing the contents of a (EDIT: not the version that was passed in, but the one that will be returned by kdbGet) somewhere (again doesn't matter whether this is done by kdbGet or by a plugin)
  • at (C) we would then use this stored state to calculate the diff
  • but at (B) the state from (A) will be replaced, so we won't get a correct diff

Solution

In #4514 we decided to simply disallow such problematic sequences. We now need to document precisely which sequences of kdbGet/kdbSet calls we support.

@atmaxinger
Copy link
Contributor

Honestly I think this is a stupid idea. How would we disallow this? Just writing somewhere in the documentation that those sequences are disallowed won't help - programmers are lazy. We'd have to change the entire API so that all other sequences are disallowed by the compiler. If we can only do kdbGet and kdbSet once, then we can also say goodbye to kdbOpen and kdbClose.

Also, such a behaviour isn't something that I'd expect from a piece of software that describes itself as a 'global key database'. Imagine if you'd have to reopen a new connection for every query and insert on a SQL database!

As I see it, there are two options:

  1. Have a giant deep-duped KeySet or a hashmap (parent key -> KeySet) for everything that was returned by a kdbGet invocation of a single KDB instance. Calculate the changes on every kdbSet.

  2. Do the tracking on the KeySet level for added/removed keys, and on the Key level for value changes with copy-on-write. (This also includes meta data, as meta data is just a KeySet with keys in it). One downside is that you'd have to use the same KeySet in the kdbSet operation that was returned by the previous kdbGet operation - but this is something we could easily check for and generate an error.

We can also automatically enable/disable the tracking based on whether any hook with the notification/send/diff is registered.

Implementation wise option 1 is easier, but option 2 will consume less memory with the copy-on-write approach. As this would work completely transparent for the users of the API, we could do both and evaluate which is better. Remember, measuring memory impact is part of my thesis, so this would give me something nice to compare 😉

@kodebach
Copy link
Member Author

kodebach commented Oct 8, 2022

Honestly I think this is a stupid idea. How would we disallow this? Just writing somewhere in the documentation that those sequences are disallowed won't help - programmers are lazy.

I agree it's not a good solution, but this is C and the compiler and type system are limited. There already some preconditions you have to fulfil to call kdbGet/kdbSet. This would just be one where there are less checks inside kdbGet/kdbSet.

Also you have to keep in mind that kdbGets docs already have this warning on master:

* @note kdbGet() might retrieve more Keys than requested which are not
* below parentKey or even in a different namespace.
* These keys, except of `proc:/` keys, must be passed to calls of kdbSet(),
* otherwise they will be lost. This stems from the fact that the
* user has the only copy of the whole configuration and backends
* only write configuration that was passed to them.

And this one new-backend (since a few minutes ago):

* **Important**: The KDB @p handle DOES NOT store a copy of the retrieved @p ks internally.
* This means if you later call kdbSet() with the same @p handle you must make sure to
* pass all keys from @p ks, which you do not want to remove.

The both version could be a bit more clear what actually is allowed and needs to happen, but that's why we have this issue.

If we can only do kdbGet and kdbSet once

Of course you can do multiple kdbGet and kdbSet. You just can't arbitrarily mix them while using the same KDB * handle. Some examples of sequences that are not problematic:

kdbGet (handle, ksA, key);
kdbGet (handle, ksB, key);

kdbSet (handle, ksA, key); // OR kdbSet (handle, ksB, key);

// ---

kdbGet (handle, ks, keyA);
kdbGet (handle, ks, keyB);

kdbSet (handle, ks, keyA); // OR kdbSet (handle, ks, keyB);

// ---

kdbGet (handle, ksA, keyBelowKeyA);
kdbGet (handle, ksB, keyA);

kdbSet (handle, ksB, keyBelowKeyA); // OR kdbSet (handle, ksB, keyA);

Note: The sequences above all have different semantic meaning and there is always some connection between the kdbGet/kdbSet calls. The problem is when you try to interleave independent kdbGet & kdbSet sequences. That's because that is very similar to using one KDB handle on multiple threads and a KDB is explicitly not thread-safe.

Imagine if you'd have to reopen a new connection for every query and insert on a SQL database!

AFAIK (but I'm no expert), it is generally not possible to have simultaneous independent SQL transactions on a single connection. Nested transactions can be possible, but not unrelated ones.

In Elektra kdbGet is basically the start of a transaction that ends with kdbSet. So the fact that you can't arbitrarily mix them is quite similar.

The difference is of course that (SQL being a Domain Specific Language) in SQL it is not just an error, but impossible to create code that does this. But like I said, here we are limited by what C can do.

Implementation wise option 1 is easier, but option 2 will consume less memory with the copy-on-write approach.

Option 2 needs a lot more details to even be something we can consider. Key and KeySet are part of libelektra-core not libelektra-kdb. The API to track changes must work standalone and not rely on anything outside libelektra-core. I don't see much more options than callbacks for "key chagned" and "keyset changed" here, which can then be registered by libelektra-kdb.

As I see it, there are two options:

There are (at least) three other option:

  1. Enforce that the parentKey used in kdbSet is below the one used in the last kdbGet. This can be enforce by:

    1. Adding a Key * lastGetParent to struct _KDB.
    2. Doing keyCopy (handle->lastGetParent, parentKey, KEY_CP_NAME) in kdbGet
    3. And checking keyIsSameOrBelow (handle->lastGetParent, parentKey) == 1 in kdbSet

    This should be enough to avoid the problematic situation, because this makes it obvious that anything not present in ks will be removed, even if it was previously loaded into another ks.

    It is already the case that

    kdbGet (handle, ks1, parentKey);
    kdbGet (handle, ks2, belowParentKey);
    
    kdbSet (handle, ks2, parentKey);

    means "delete everything below parentKey that isn't also below belowParentKey".

  2. Enforce that ks used in kdbSet is the same as in the last kdbGet. This can be enforced similarly, by:

    1. Adding a Key * lastGetKs to struct _KDB.
    2. Doing handle->lastGetKs = ks in kdbGet
    3. And checking lastGetKs == ks in kdbSet
  3. Go one step beyond and change the API to

    // same as before but uses handle->data instead of the ks argument
    int kdbGet (KDB * handle, Key * parentKey);
    // same as before but uses handle->data instead of the ks argument
    int kdbSet (KDB * handle, Key * parentKey);
    
    KeySet * kdbData (KDB * handle) {
        return handle->data;
    }

    I prefer this option, because it gets rid of the problematic sequences entirely. You can now arbitrarily mix kdbGet and kdbSet, because it is obvious that the KeySet is linked to the KDB instance. This is, however, the most effort of these 3 options (but less effort than your copy-on-write option).

@atmaxinger
Copy link
Contributor

atmaxinger commented Oct 8, 2022

Key and KeySet are part of libelektra-core not libelektra-kdb. The API to track changes must work standalone and not rely on anything outside libelektra-core

Yes. The complete tracking would take place within libelektra-core. My idea is to extend KeySet with two more arrays - one for added and one for removed keys. And Key with another field for the original data. Both will get a new flag shouldTrack.

Then the altering methods (ksCut, ksAppend, keySetString, keySetBinary, ...) would check the flag, and if set do the tracking accordingly (the key... methods will just set the original data field if not set yet, the ks... methods will add or remove the key from the appropriate array).

kdbGet would then just set the shouldTrack flag when appropriate on the returned KeySet and the keys within. Of course, we would need some extra methods in libelektra-core to set those flags and get the original values and added/removed keys.

@kodebach
Copy link
Member Author

kodebach commented Oct 8, 2022

My idea is to extend

That could need a lot of extra memory, even when the tracking is turned off. The flag could be kept in the existing flags field, but adding a pointer for "original data" to struct _Key means 8 more bytes for every key on 64-bit architectures.

Storing a callback function pointer in e.g. meta:/callback/change and keeping the data elsewhere would be better. If you really want to keep the old value in a key, then storing in a metakey e.g. meta:/kdb/getvalue would also be better, since it only uses extra memory when in use.

However, a big concern that can't really be addresses by any of these solution is: libelektra-core is supposed to be minimal. Tracking changes is not really minimal IMO.

@github-actions
Copy link

github-actions bot commented Oct 9, 2023

I mark this stale as it did not have any activity for one year. I'll close it in two weeks if no further activity occurs. If you want it to be alive again, ping by writing a message here or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot added the stale label Oct 9, 2023
@github-actions
Copy link

I closed this now because it has been inactive for more than one year. If I closed it by mistake, please do not hesitate to reopen it or create a new issue with the remainder of this issue.
Thank you for your contributions 💖

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants