Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking changes to the configuration within KDB #4514

Closed
atmaxinger opened this issue Oct 6, 2022 · 13 comments
Closed

Tracking changes to the configuration within KDB #4514

atmaxinger opened this issue Oct 6, 2022 · 13 comments
Assignees
Labels

Comments

@atmaxinger
Copy link
Contributor

atmaxinger commented Oct 6, 2022

We currently cannot calculate added/changed/removed in kdbSet, because we don't have a copy of the KeySet from kdbGet. We only know the keyNeedSync state. But that doesn't tell us, if a key was removed (it just won't be there) and we cannot differentiate between added key and changed key.

That said, I actually do think that we should change that. kdbGet should store a copy (ksDeepDup) of ks inside the KDB * handle. Then we can actually calculate the real diff in kdbSet and provide that to notification plugins. The benefit here is that in the current situation every notification hook that needs a diff must create it's own ksDeepDup copy of the keyset, or at least of the part of the keyset which is being watched.

We could even use a separate hook name (e.g. hook/notification/send/diff), just in case some notification plugins don't want a diff and want the whole KeySet instead.

Note The ksDeepDup copy should only be created, if we need it, i.e. there is an enabled notification hook.

This could also be extended and we could let backend plugins access the diff for their mountpoint. The advantage there would be that non-file-based backends (like databases) could (in theory at least) take advantage of the diff. For file-based backends we don't need a diff, because we just write the whole anyway since everything is would be very complicated in most formats and not worth it. But a database backend could for example translate the added/changed/removed keys into INSERT/UPDATE/DELETE SQL statements.

Originally posted by @kodebach in #4504 (comment)


I thought about this too, but we'd also need to map it using the parent keys used to get the keysets. Otherwise we would only have the last retrieved keyset stored, and there is no way of enforcing that the user of the API will only perform kdbGet after a kdbSet of the previous read keyset.

Originally posted by @atmaxinger in #4504 (comment)


I thought about this too, but we'd also need to map it using the parent keys used to get the keysets. Otherwise we would only have the last retrieved keyset stored, and there is no way of enforcing that the user of the API will only perform kdbGet after a kdbSet of the previous read keyset.

I'm not entirely sure what you mean here. But regardless wouldn't any such issue exist in exactly the same way in a plugin?

It is already mandatory to call kdbGet before kdbSet and the kdbGet call must use the same or more backends as kdbSet needs.

There is also the KeySet * keys in each of the struct _BackendData in kdb->backends. I'd have to check, but maybe these already are or at least could be keyDup'ed and maybe they aren't cleared at the end of kdbGet. Then we could use that data at the start of kdbSet (before all the KeySet * keys are overwritten).

Originally posted by @kodebach in #4504 (comment)


Yes, the same problem persists in the plugin. What I mean is this:

Key * keyA = ...;
Key * keyB = ...;

KeySet * a = ...;
KeySet * b = ...;

kdbGet(kdb, a, keyA);
kdbGet(kdb, b, keyB);

// Change keys in a

kdbSet(kdb, a, keyA);

If we only store the last KeySet that has been retrieved, we would have stored b, but now we are updatding KeySet a.

Originally posted by @atmaxinger in #4504 (comment)


Yes, the same problem persists in the plugin.

i.e. calculating the diff in kdbSet by storing the keyset wouldn't make anything worse

What I mean is this: [...]

Ah, okay. The first solution for this is simply documentation. Simply state that in this sequence:

// previous uses of kdb handle possible
kdbGet (kdb, ks1, key1);
// no kdbGet/kdbSet inbetween
kdbSet (kdb, ks2, key2);

key2 must be the same as or below key1. The relationship of ks1 and ks2 follows from this.

Alternative solutions are:

  1. Totally change the KDB API:
    1. Store a KeySet * data inside struct _KDB and create a KeySet * kdbData (KDB * handle) that retrieves it.
    2. Change kdbOpen to take a parentKey instead of an errorKey, store that parentKey (just a copy of the same name) in struct _KDB
    3. Remove the KeySet * argument from kdbGet and kdbSet
    4. Mandate (and check) that the parentKey used with kdbGet/kdbSet is same or below the one from kdbOpen.
  2. Mandate that the KeySet * used in a kdbSet must be the same as in the last kdbGet (for the same KDB*). This can be checked, by storing the KeySet * inside the struct _KDB without any ksDup. Simply:
    // end of kdbGet
    handle->ks = ks;
    
    // start of kdbSet
    if (handle->ks != ks) return -1;

@markus2330 Any opinions on this issue? Was the sequence suggested by @atmaxinger even a supported use case at any point, or was this maybe always considered misuse of the API?

Originally posted by @kodebach in #4504 (comment)

@markus2330
Copy link
Contributor

We currently cannot calculate added/changed/removed in kdbSet

I don't understand this statement, what is doing src/plugins/logchange doing then?

I would prefer if all plugins would work with the same interface, even if this means that a KeySet needs to be duplicated. This is a quite cheap operation, so I do not see a problem with it.

(Didn't read the rest for now.)

@kodebach
Copy link
Member

kodebach commented Oct 6, 2022

(Didn't read the rest for now.)

Then don't comment... Especially not after just (half of) one sentence...

The actual problem is described in the example following this line:

Yes, the same problem persists in the plugin. What I mean is this:

The question I originally posed to you is at the very end:

Any opinions on this issue? Was the sequence suggested by atmaxinger even a supported use case at any point, or was this maybe always considered misuse of the API?


I would prefer if all plugins would work with the same interface

Hook plugins already use separately exported functions, so changing the arguments really isn't an issue.

even if this means that a KeySet needs to be duplicated. This is a quite cheap operation, so I do not see a problem with it

If you just care about the names of added/changed/removed keys, then a simple ksDup is enough:

  • "added": Key is in new, but not in old
  • "removed": Key is in old, but not in new
  • "changed": Key is in both and sync flag is set

A keyDup is indeed cheap enough that every plugin could do it, in a pinch, but it would still be waste, when libelektra-kdb could do it just once instead.

However, even if you also want to know the old and new values in kdbSet you need a ksDeepDup, because someone could have called keySetValue and changed the data in the ksDup'ed keyset. Metadata is a similar problem, just using ksDup means it can still be changed elsewhere.

Regardless, none of this actually addresses the issue discovered by atmaxinger. Multiple kdbGet and kdbSet can still mean you don't have the right keyset, unless you keep all of them.

@markus2330
Copy link
Contributor

Then don't comment...

IMHO issues should be understandable, at least for core developers, from the first sentence to the last.

Yes, the same problem persists in the plugin. What I mean is this:

Normal plugins should never see during kdbGet what the user passed. They should only see what retrieved from the KDB. Was this introduced in the new-backend branch? (Global plugins admittedly probably already had such problems before new-backend. But this was a bug.)

The question I originally posed to you is at the very end:

I didn't answer, as I already had problems understanding what this issue is about much earlier. If I tried to answer this question your hint would have been appropriate: one shouldn't comment about what someone didn't read.

Hook plugins already use separately exported functions, so changing the arguments really isn't an issue.

This was never decided that way? Is this implemented in the new-backend branch?

However, even if you also want to know the old and new values in kdbSet you need a ksDeepDup, because someone could have called keySetValue and changed the data in the ksDup'ed keyset.

Ok, this would need quite a lot of memory.. Is it a possibility that we remember old values of keys? (keySetString creates e.g. new metadata with old value iff the metadata doesn't exist and the value is actually different.)

Metadata is a similar problem, just using ksDup means it can still be changed elsewhere.

For metadata the problem shouldn't exist, as meta-keys are unchangeable. (This is also a requirement for the spec plugin to work.)

@kodebach
Copy link
Member

kodebach commented Oct 6, 2022

to the last.

That's the important part. The first sentence is often explained by a later one. If you stop reading somewhere, of course you may not understand it.

Normal plugins should never see during kdbGet what the user passed. They should only see what retrieved from the KDB. Was this introduced in the new-backend branch?

Once again, I regret that we use the term "plugin" for everything. In this case "persists in the plugin" is about hook plugins (previously global plugins).

Completely unrelated to that. The issue is about "what was retrieved from the KDB". It has nothing to do with what the user passed in.

I didn't answer, as I already had problems understanding what this issue is about much earlier.

What I wanted to say is: I posed the question at the end of the comment chain, therefore I considered the entire comment chain (i.e. the entire description of this issue) context for the question. In other words, the question being at the very end should have been the hint that you need to read it all.

This was never decided that way? Is this implemented in the new-backend branch?

It is implemented that way and was decided that way. I'm not sure why it's not in doc/decisions/hooks.md, but it is in doc/dev/hooks.md which was part of #4471. A PR which you yourself merged, so we all assumed you agreed with it. IIRC you also didn't comment on this fact in the PR.

Is it a possibility that we remember old values of keys? (keySetString creates e.g. new metadata with old value iff the metadata doesn't exist and the value is actually different.)

Sounds like the origvalue problem...

For metadata the problem shouldn't exist, as meta-keys are unchangeable.

The "problem" I meant is that a ksDup reuses all the existing meta keysets in all the keys (it doesn't keyDup). So when someone e.g. removes a metakey, it is also gone from the ksDuped version.


I'm not sure if it is needed, but here is a detailed explanation of the problem:

Key * keyA = ...;
Key * keyB = ...;
// assume that neither A below B nor B below A

KeySet * a = ...;
KeySet * b = ...;

kdbGet(kdb, a, keyA); // (A)
kdbGet(kdb, b, keyB); // (B)

kdbSet(kdb, a, keyA); // (C)

If this is a valid sequence of events. Then neither libelektra-kdb nor anything else called by it (i.e. all plugins, including hook plugins) will be able to correctly calculate the added/changed/removed diff set at (C).

Why? Simply put, because the KDB instance kdb is used with two separate KeySets (a & b) and parentKeys (keyA & keyB).

The detailed reason is:

  • at (A) we can remember the current state of the KDB, by storing the contents of a (EDIT: not the version that was passed in, but the one that will be returned by kdbGet) somewhere (again doesn't matter whether this is done by kdbGet or by a plugin)
  • at (C) we would then use this stored state to calculate the diff
  • but at (B) the state from (A) will be replaced, so we won't get a correct diff

So now our options are (roughly, more details in issue description) in order of ascending effort:

  1. Forbid doing what happens here. Under certain relations between keyA and keyB (and a and b) things work, these could be allowed.
  2. Document (and maybe enforce) that the KeySet * used for kdbSet must be the same as in the last kdbGet (for this KDB *)
  3. Severely change the KDB, so that KDB owns a KeySet * data instead of it being provided by the user.

@atmaxinger
Copy link
Contributor Author

This was never decided that way? Is this implemented in the new-backend branch?

I'm not sure why it's not in doc/decisions/hooks.md

It actually IS in there under constraints:

Plugin interface should be the same. Many plugins, where appropriate, e.g. dbus, should work as global plugins w/o any change in code (i.e. only changes in contract)

@atmaxinger
Copy link
Contributor Author

atmaxinger commented Oct 6, 2022

So now our options are (roughly, more details in issue description) in order of ascending effort:

I think there is actually another option:

  • Store a deep copy of all keys that were ever read in a big KeySet.
  • Do NOT replace this KeySet on further kdbGet calls -- only append a copy of those new keys into the existing KeySet
  • When calling kdbSet, append a copy of those keys also, and replace the existing keys with the new values.

This will use lots of memory, but theoretically it should work. After all, the only thing we want to know is what a single kdbSet changed to the current state. The main challenge there would be to find out which keys were deleted. But as we have to supply the parent key with each kdbSet, this should be doable by looking which keys in the big KeySet are below this parent key, but are missing in the keyset passed to kdbSet.

@kodebach
Copy link
Member

kodebach commented Oct 6, 2022

Hook plugins already use separately exported functions, so changing the arguments really isn't an issue.

This was never decided that way? Is this implemented in the new-backend branch?

I'm not sure why it's not in doc/decisions/hooks.md

It actually IS in there under constraints:

Okay, but "only changes in contract" doesn't really say that we do add new "exports" to the contract. I think that was the problem.

I think there is actually another option:

Isn't t "the big KeySet" just the same as we already have in the KeySet * keys in struct _BackendData (see #4487 (comment) for bit about this struct)?

But not sure if those keys are ksDeepDup'ed, but if that's the case, it would be pretty simple.

This will use lots of memory

If we actually need additional memory (e.g. a deep dup), to make this (A), (B), (C) sequence work, I would just go with the option that forbids this sequence. Such a sequence is probably quite rare in practice and when you need it you can always use two separate KDB instances.

@markus2330
Copy link
Contributor

The first sentence is often explained by a later one.

This is only done by someone who has troubles writing understandable text. Forward references are a terrible style. What is okay is that earlier sentences create a question that is (gradually) answered by the next sentence(s).

“Begin at the beginning," the King said, very gravely, "and go on till you come to the end: then stop.”

― Lewis Carroll, Alice in Wonderland

Once again, I regret that we use the term "plugin" for everything.

Elektra's plugins all need to implement the same interface. So your proposed hooks would actually not be plugins.

Sounds like the origvalue problem...

Only if there are several parties needing it. We need to say that this metadata is exclusively for recording.

I'm not sure if it is needed, but here is a detailed explanation of the problem

Of course, it is very much needed! The issue should have begun with this.

I am still not sure if the problem actually exists.

Consider

  • kdbGet as flow of KeySet from persistent storage to applications
  • kdbSet as flow of KeySet from applications to persistent storage

Then it actually does not matter what keyset is passed to kdbGet because the relevant KeySet is the one that gets emitted from the storage plugin. So IMHO the hook simply need to be at the right spot, where it sees the KeySet as emitted from the storage plugin (and not the one passed from the application).

I'm not sure why it's not in doc/decisions/hooks.md

Mistakes happen. Obviously the most important facts must be parts of decisions. And as @atmaxinger points out there is even a contradiction. Decisions always win over implementation details described somewhere else.

Anyway: if hooks not being plugins really is the better way, we can change the decision. It looks like there wasn't enough understanding about recording at the point when we created the decision.

Store a deep copy of all keys that were ever read in a big KeySet.

IMHO this would fit in the model that the mmap-cache allows.

Such a sequence is probably quite rare in practice and when you need it you can always use two separate KDB instances.

For me this is quite clear: If this sequence really is the problem, we should prohibit it. This shouldn't be a basis for a decision.

What matters is would we get permanent overhead for recording even if the recording plugin is disabled (both the recording of the oldvalue and calculating something in the core would have this problem). Memory overhead actually would be even worse.

@atmaxinger can you please write decision(s) with the current problems the framework has related to recording changes?

From the timeline, however, we should focus on getting new-backend merged, so let us please first concentrate on that. Let us simply fix the docu about hooks (describing how it is in new-backend) and let us move the decision to "In Discussion" as there are obviously misunderstandings.

@kodebach
Copy link
Member

kodebach commented Oct 8, 2022

  1. Please don't close issues (especially not as "completed"), where something clearly needs to be done (here: document what is prohibited). At the very least open a new issue and then close the old issue by linking to the new one.
  2. Either read the whole issue or don't complain when someone asks you to read at least the whole issue description. I also don't ask people "Where did the rabbit get a pocket watch?" after the first few sentences of Alice in Wonderland (to use your example). No, I read the rest of the story and if at the end things are still unclear, then I ask questions. In fact the thought "wait why is this a problem" should be reason to read on, not to stop and complain that the first sentence doesn't explain everything.

From the timeline, however, we should focus on getting new-backend merged, so let us please first concentrate on that.

The whole reason this issue exists, was so we could merge #4504 and not loose the comment thread.

This is only done by someone who has troubles writing understandable text.

I have to defend atmaxinger here. This issue is a copy of a comment thread from #4504. That could have been made a bit more clear yes, and the original context of the code in the PR is missing, but you really should read the whole issue before you complain about not understanding. When you read at least until the first "Originally posted by" it should also become clear that this issue exists to not loose a comment thread from a merged PR.

I should probably also apologize for my harsh response, but you have done this "I haven't read the whole thing, but I have some questions, which are actually already answered in the part I didn't read" many times. It is really annoying for the people who then have to waste time to point out that reading the whole thing would be a good idea.


can you please write decision(s) with the current problems the framework has related to recording changes?

Your obsession with decisions honestly baffles me. Why would you write a decision when the only thing we have is a problem? Without a solution what is there to decide...

This is exactly the case, where opening an issue to discuss a problem is much better IMO. Otherwise we just end up with "PR as discussion forum" again and that assumes atmaxinger even has an idea for a solution. If not we'll just have a PR half a decision file (with just "this is the problem") and end up with even broader discussions.

Note Don't answer this last part, I just wanted to give some context to my arguments in #4436.

@kodebach
Copy link
Member

kodebach commented Oct 8, 2022

New issue for remaining work: #4520

@markus2330
Copy link
Contributor

markus2330 commented Oct 9, 2022

I have to defend atmaxinger here.

No need to defend anyone. Nothing in "This is only done by someone who has troubles writing understandable text." refers to you or @atmaxinger. "The first sentence is often explained by a later one." is a bad style not done by either of you.

Your obsession with decisions honestly baffles me. Why would you write a decision when the only thing we have is a problem? Without a solution what is there to decide...

Because I unfortunately don't have the time to collect all the information from many discussion entries in PRs and/or issues myself. So I need to ask you to properly define the problem, assumptions etc. in a decision. There is no need to immediately write a solution if it is not known yet.

Otherwise we just end up with "PR as discussion forum" again

This is not needed if the decision is well done. If the problem is not really major, the assumptions flawed etc. then yes... its not easy to reach anywhere then.

If there are questions, we can use the issue tracker. But if we want to come to a technical sound decision, someone needs to do the work properly:

  1. clearly define the problem
  2. etc. etc. what is written in the decision template.

What I see as my task is that I explain the decisions better. It is explained when visiting the FLOSS course, in all good courses for software architects (like done by the arc42 creators), and also in many papers, e.g. by Uwe Zdun, but I agree that I cannot expect you to do this research. In the same way I won't do the research what the problem in this endless discussion is, so I needed to close this issue.

If anything essential is here, @atmaxinger can copy it to a decision.

@kodebach
Copy link
Member

kodebach commented Oct 9, 2022

I get the need to document the problem properly. What I don't get is how a half finished decision file in a PR is any better a normal issue.

If the problem is already well known and can be define the issue could look like #4521 with a clear definition of the problem and a summary of what was covered in previous discussions.

If the exact problem isn't clear yet, the decision file wold also just contain one or two sentences just like an issue would.

IMO as long as there is no concrete solution to propose there shouldn't be a PR.

@markus2330
Copy link
Contributor

See #4528

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants