Duplicate NETWORK.CREATE usage events resulting in duplicate usage records #10687
Comments
#9888 fixed this and some other issues on 4.20.1. I'll backport the changes to 4.19.3, and work on a normalization script. |
@winterhazel thanks for sharing. Is there any timeline for when you can raise the backport PR? Thanks. |
@Pearl1594 today 🙂 Regarding the normalization script, though, I hope to look into it within this week. |
Thanks @winterhazel! .. |
@winterhazel , I noticed a small issue in your solution that I want to discuss. Not sure if this is the place, but let me start here at least. What happens in your solution is that for all CREATE events a remove is forced if a DESTROY event is found for the same network. I think the issue is actually that CREATE events are created on implementation as well, which should be a separate type of event. I am not sure how harmful this is, but we are now hiding the bug instead of solving it. If you look at, for instance, VMs, these can have a sequence of events like, All that said, I am not against backporting your fix. I think in the network case, usage might not require the specifics of implement, though organisations with strict resource billing would. Future work? |
Hey @DaanHoogland, thanks for bringing this up. We're dealing with two different issues here. The first issue concerns existing environments. These environments already have multiple CREATE events for the same network, and are seeing (or will see after the next Usage job if we do not address this) duplicated usage record generation. This behavior impacts usage rating through Quota (because it uses the generated usage records) and through external tools based on the generated usage records. I don't usually see organizations performing resource usage rating by directly consuming the events from Also, judging by your description, I think you may have misunderstood something in my solution, so just making sure that we are aligned on this: my changes mark existing network usage helper entries for a network as removed whenever a CREATE event is processed, regardless of whether there is a DESTROY event, so that two consecutive CREATE events for the same network do not result in duplicated usage records being generated. The second issue is what you pointed out: network state changes are not tracked properly. We only have three usage events published for networks, and one of them is published for two different situations:
I agree with you that this should be improved. It would be good to have more event types so that we can properly identify when a network transitioned to |
@winterhazel , I largely agree with your previous remarks, except that, as you describe above, there might be an issue: older CREATE events shouldn’t be removed, but the newer ones should. The network was actually “created” on the first create. It may have to be “implemented” multiple times after that, which also leads to CREATE events, but those are not the creation. Removing older events instead of the newer ones will lead to missed usage data. (And yes, I either misunderstood slightly before, or I do now, it seems, but it doesn’t make the overall situation much different.) Make sense? |
@DaanHoogland my changes are not removing CREATE events. Modifying existing records in I'll try to explain things from the beginning, please tell me if you still see an issue in this: The way Usage works is around three things: events (from the
Let's use the volume event model as an example. a. Suppose a period between dates A (last time Usage executed) and D (next execution):
b. Then, the Usage job runs at D:
My changes implement b.1.2 for the network event processing. That is, Usage will mark previous helper entries (not events!) as removed when it sees a CREATE event for a network that already has existing helper entries. This does not lead to missing usage data. The usage records will be generated as intended, as exemplified in this comment, and events remain unchanged. |
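The behaviour described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary fields (`network_id`, `created`, `removed`), the hour-based timestamps, and both function names are hypothetical and do not mirror CloudStack's actual helper table or Java code.

```python
def on_network_create(helper_entries, network_id, hour):
    """Hypothetical handler: close any active helper entry, then open a new one."""
    for entry in helper_entries:
        if entry["network_id"] == network_id and entry["removed"] is None:
            entry["removed"] = hour  # mark the previous helper entry as removed
    helper_entries.append({"network_id": network_id, "created": hour, "removed": None})


def usage_hours(helper_entries, start, end):
    """Sum the hours each helper entry overlaps with the window [start, end)."""
    total = 0
    for entry in helper_entries:
        entry_end = end if entry["removed"] is None else min(entry["removed"], end)
        total += max(0, entry_end - max(entry["created"], start))
    return total


helper_entries = []
on_network_create(helper_entries, network_id=42, hour=0)   # first NETWORK.CREATE
on_network_create(helper_entries, network_id=42, hour=10)  # second NETWORK.CREATE, same network
total = usage_hours(helper_entries, start=0, end=24)
print(total)  # 24: the two entries cover hours 0-10 and 10-24 without overlapping
```

In this sketch the events themselves are never touched; only the helper entries change, which is the distinction the comment draws.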
@winterhazel , I understand all that and I am just talking about the
My point is that it should not. It should consider the oldest one as legitimate (for networks!) because of the bug we have that after creation of a network more I do not know what the consequences for usage are, but we should address this creation of events (at least) as well. |
@DaanHoogland ah, I think I now understood what you're proposing. It seems good for me. I'll change the current code on NETWORK.CREATE event processing to not create new helper entries for a network when an active one already exists, and make the normalization keep the first created entry as not removed instead of the last one. I'll also take a look at the duplicated NETWORK.CREATE event creation. |
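A rough sketch of the "keep the first entry" approach proposed in the comment above, in Python for brevity. Everything here is an assumption for illustration: the field names, the function names, and in particular the choice to close later duplicates at their own creation time during normalization. It is not the code from the actual PR or normalization script.

```python
def on_network_create_keep_first(helper_entries, network_id, hour):
    """Hypothetical handler: ignore a CREATE if the network already has an active entry."""
    has_active = any(
        e["network_id"] == network_id and e["removed"] is None for e in helper_entries
    )
    if not has_active:
        helper_entries.append({"network_id": network_id, "created": hour, "removed": None})


def normalize_keep_first(helper_entries):
    """Hypothetical normalization: per network, keep the earliest active entry.

    Later active duplicates are closed at their own creation time (an assumption),
    so they no longer contribute usage.
    """
    seen = set()
    for entry in sorted(helper_entries, key=lambda e: e["created"]):
        if entry["removed"] is not None:
            continue
        if entry["network_id"] in seen:
            entry["removed"] = entry["created"]
        else:
            seen.add(entry["network_id"])


# New events: the second CREATE for the same network is ignored.
fresh = []
on_network_create_keep_first(fresh, network_id=42, hour=0)
on_network_create_keep_first(fresh, network_id=42, hour=10)

# Existing data: two active entries for the same network, as left by the bug.
legacy = [
    {"network_id": 42, "created": 0, "removed": None},
    {"network_id": 42, "created": 10, "removed": None},
]
normalize_keep_first(legacy)
```

After this runs, `fresh` holds a single active entry and only the entry created at hour 0 remains active in `legacy`. Usage records that were already generated from the duplicates are outside the scope of this sketch.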
thanks @winterhazel , I will have a look as well. I think adding "NETWORK.ALLOCATION" and "NETWORK.DEALLOCATION" would be good (intuition) but not sure if that makes sense. |
@DaanHoogland I looked into the network Usage event model. Turns out that I was somewhat wrong in my previous comment regarding when
This means that:
I opened #10755 to fix these, and to track when the network goes from |
@DaanHoogland I think we can keep using |
This is not what I observed. I’ll have another look. I saw the |
correct @DaanHoogland, NETWORK.CREATE on create and implement. A NETWORK.CREATE event is also logged when the network is re-implemented when the first VM is created after GC (when the network.gc.interval config is set); multiple records are created when the network is GC-ed and re-implemented several times. |
@DaanHoogland @sureshanaparti just making sure:
I tested this multiple times on both the 4.19.2.0 and 4.20.0.0 releases, but nothing is inserted on Also, the only place I found
Line 1526 in 431e4f9
If you could confirm these and provide me where the event is being inserted on |
I do not see any usage event when I create an isolated network, i.e. when it is in the 'Allocated' state. I see the
|
ok, thanks @rajujith , I thought I had seen it one time extra, but won’t bother to reproduce. |
correct, NETWORK.CREATE is added on
correct
yes, I observed the same @winterhazel comments inline ^^^ My observations below: Created Isolated Network (Network in Allocated state) =>
Deployed VM in Isolated Network created above (Network in Implemented state) =>
Stopped the VM deployed above (Network back to Allocated state, after NetworkGarbageCollector triggered) =>
Started the VM (Network in Implemented state) =>
|
nice! @winterhazel , I think we can mark your PR #10755 as fixing this issue. Thanks a lot. |
fixed in #10755 |
problem
For isolated networks and VPC tiers, a usage event NETWORK.CREATE is created in the cloud.usage_event table whenever the network transitions from 'Allocated' to 'Implemented' state. A NETWORK.DELETE event is created when the network is deleted.
I suspect there is a bug in the code or a bad design causing this issue. These events result in duplicate usage records for the network for the same period.
Either the NETWORK.CREATE event should be generated only when the network is actually created, or, if we need to capture usage for Allocated and Implemented networks separately, there could be another event type.
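To illustrate why repeated NETWORK.CREATE events inflate usage, here is a minimal Python model. It assumes, purely for illustration, that every NETWORK.CREATE event opens a new open-ended entry and that usage is the sum of hours covered by all entries; the class, field, and function names are hypothetical and are not CloudStack's schema or code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelperEntry:
    network_id: int
    created: int                   # hour at which the entry starts
    removed: Optional[int] = None  # hour at which the entry ends, None while active


def process_naively(events):
    """Open a new entry for every NETWORK.CREATE, without checking for an active one."""
    entries = []
    for event_type, network_id, hour in events:
        if event_type == "NETWORK.CREATE":
            entries.append(HelperEntry(network_id, hour))
    return entries


def billed_hours(entries, start, end):
    """Sum the hours each entry overlaps with the window [start, end)."""
    total = 0
    for e in entries:
        entry_end = end if e.removed is None else min(e.removed, end)
        total += max(0, entry_end - max(e.created, start))
    return total


events = [
    ("NETWORK.CREATE", 42, 0),   # network implemented for the first time
    ("NETWORK.CREATE", 42, 10),  # same network re-implemented after garbage collection
]
entries = process_naively(events)
billed = billed_hours(entries, start=0, end=24)
print(billed)  # 38 hours billed for a 24-hour window
```

Under these assumptions, one network that existed for 24 hours is billed for 38, because both entries stay active and overlap from hour 10 onward.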
The network below is not deleted.
The network below got deleted:
Usage records for the same:
Workaround
update configuration set value='-1' where name='network.gc.interval';
Note that this has a side effect: guest VLANs will not be released even when there are no running instances on the network, and the VRs will continue to run.
Existing duplicate usage events and usage records need to be fixed separately.
versions
4.19.1.2, 4.19.2
The steps to reproduce the bug
...
What to do about it?
Review the issue. This looks like a bug that needs to be fixed soon.