Filebeat in memory states registry improvements #6346
- Make the States type easier to find by moving it into a separate states.go file.
- Ensure the provided States constructor is actually used.
- Add an ID -> array index map for faster lookups on update and find operations. When updating states in a big registry, the updates converged to quadratic complexity. The index helps keep the complexity about linear in the number of state updates.
- Debug output prints the number of states subject to future cleanups (if state TTL > 0).
- Add titles to the states unit tests.
filebeat/input/file/states.go
Outdated
return len(s.states)
}

// Returns a copy of the file states
comment on exported method States.GetStates should be of the form "GetStates ..."
}
}

func (s *States) FindPrevious(newState State) State {
exported method States.FindPrevious should have comment or be unexported
In-memory state registry optimizations only (original ticket: #6323).
// The number of states that were cleaned up and number of states that can be
// cleaned up in the future is returned.

func (s *States) Cleanup() (int, int) {
exported method States.Cleanup should have comment or be unexported
One test is still failing for filebeat?
ph
left a comment
Code looks good; one test is failing.
filebeat/input/file/states.go
Outdated
return s.states[i]
}

// findPrevious returns the previous state fo the file
s/fo/for
filebeat/input/file/states.go
Outdated
// Cleanup cleans up the state array. All states which are older then `older` are removed
// The number of states that were cleaned up and number of states that can be
// cleaned up in the future is returned.
Space?
filebeat/input/file/states.go
Outdated
// remove entry by relocating last entry + shrink states array
last := len(s.states) - 1
s.states[i] = s.states[last]
s.states = s.states[:last]
thinking out loud here, related to the case of a lot of files.
Could this generate a lot of eviction/resizing in the real world? I guess we are just moving pointers.
You mean CPU cache? No idea without testing/profiling. But we basically narrow down the array from beginning and end at the same time.
Also depends on how many entries need to be removed.
ruflin
left a comment
Definitely a huge +1 on using an index for the lookup, be it as part of the state map string or as a separate index. Left some minor questions.
states []State

// idx maps state IDs to state indexes for fast lookup and modifications.
idx map[string]int
What is the reason you changed your mind to the separate map string for keeping track of the ids?
Too many copies :)
The states mapping has these very frequent access patterns:
- access single entry by ID
- loop/copy all states
Performance-wise, both access patterns are equally important, as they run very often (basically per event).
When optimizing the access-by-ID case (using map[string]state), I found the profile dominated by extra copies, even if only one field is required. By introducing an index from ID to array index, we keep the well-working 'all-access loop' and make single-entry access about O(1).
It is kind of the best of both worlds, with a little constant overhead when actually removing entries from the registry. As state update/add + iteration + copy are the most frequently used operations, the added overhead on removal is well worth it.
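The trade-off described above can be sketched in a few lines of Go. This is a minimal illustration, not the code from this PR: the `State` fields and the names `NewStates`, `Update`, `Find`, and `Copy` are simplified, hypothetical stand-ins, and locking is left out. It keeps the slice for the loop/copy pattern and adds a map from ID to slice index for the access-by-ID pattern.

```go
package main

import "fmt"

// State is a trimmed-down, hypothetical stand-in for a file state.
type State struct {
	ID     string
	Offset int64
}

// States stores entries in a slice (cheap to iterate and bulk-copy) and
// keeps an ID -> slice index map next to it (cheap single-entry access).
type States struct {
	states []State
	idx    map[string]int
}

// NewStates returns an empty store with an initialized index.
func NewStates() *States {
	return &States{idx: map[string]int{}}
}

// Update overwrites the entry with the same ID, or appends a new one.
// The map lookup replaces a linear scan over the slice.
func (s *States) Update(st State) {
	if i, ok := s.idx[st.ID]; ok {
		s.states[i] = st
		return
	}
	s.idx[st.ID] = len(s.states)
	s.states = append(s.states, st)
}

// Find returns the state stored under id, if any.
func (s *States) Find(id string) (State, bool) {
	i, ok := s.idx[id]
	if !ok {
		return State{}, false
	}
	return s.states[i], true
}

// Copy returns a snapshot of all states using one bulk copy.
func (s *States) Copy() []State {
	out := make([]State, len(s.states))
	copy(out, s.states)
	return out
}

func main() {
	s := NewStates()
	s.Update(State{ID: "a", Offset: 10})
	s.Update(State{ID: "b", Offset: 20})
	s.Update(State{ID: "a", Offset: 30}) // overwrites the first entry

	st, _ := s.Find("a")
	fmt.Println(len(s.Copy()), st.Offset) // prints "2 30"
}
```

With a plain slice, each update would need a scan to find the matching ID, so a batch of n updates against n states costs on the order of n² comparisons; with the index, each update is a single map lookup.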
statesBefore := len(s.states)
numCanExpire := 0

L := len(s.states)
upper case on purpose?
yep. L for "length" :)
logp.Debug("registrar",
"Registrar states cleaned up. Before: %d, After: %d",
beforeCount, beforeCount-cleanedStates)
"Registrar states cleaned up. Before: %d, After: %d, Pending: %d",
I already see questions coming on what pending cleanup means :-)
We can call it 'active'. The number of 'pending' states is all states with TTL > 0.
Following the state changes and TTLs, I found TTL basically has 4 meanings:
- TTL = -2: zombie state (no prospector/input processing the file)
- TTL = -1: alive (prospector/input owning the file, but no TTL)
- TTL = 0: done (entry is marked as removable)
- TTL > 0: subject to timeout
Any better naming?
Currently, the 'pending' count is the number of states potentially subject to being timed out. But TTL values of -1 and >0 both indicate that the state is 'active'.
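To make the four TTL meanings and the 'pending' count concrete, here is a small sketch. The function names `classifyTTL` and `countPending` and the label strings are hypothetical and chosen only for illustration; the one thing taken from the thread is that 'pending' means TTL > 0.

```go
package main

import (
	"fmt"
	"time"
)

// classifyTTL maps a TTL value to one of the four meanings listed above.
// The returned labels are illustrative, not names used by Filebeat.
func classifyTTL(ttl time.Duration) string {
	switch {
	case ttl == -2:
		return "zombie" // no prospector/input processing the file
	case ttl == -1:
		return "alive" // owned by a prospector/input, but no TTL
	case ttl == 0:
		return "done" // entry is marked as removable
	case ttl > 0:
		return "timeout" // subject to timeout
	default:
		return "unknown"
	}
}

// countPending counts the states that are potentially subject to being
// timed out, that is, the ones with TTL > 0.
func countPending(ttls ...time.Duration) int {
	n := 0
	for _, ttl := range ttls {
		if ttl > 0 {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(classifyTTL(-1), classifyTTL(30*time.Second))
	fmt.Println(countPending(-2, -1, 0, time.Minute, time.Hour))
}
```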
}
} else {
i++
if canExpire {
What is the reason you added canExpire? If there is clean_inactive configured it applies to all states.
See my TTL comment. canExpire marks states with TTL > 0, i.e. states subject to timeout. If Cleanup returns and the number of states with TTL > 0 is 0, no cleanup needs to run until a new state with TTL >= 0 is added.
I also make use of this in my followup PR (#6347) by deferring/suspending states Cleanup in the registrar.
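The idea of skipping cleanup while nothing can expire could look roughly like the sketch below. This is not the code from #6347; `gcGate`, `onUpdate`, and `run` are hypothetical names, and the only rule encoded is the one stated above: once a cleanup pass reports zero states with TTL > 0, no further pass is needed until a state with TTL >= 0 is added.

```go
package main

import (
	"fmt"
	"time"
)

// gcGate is a hypothetical helper that remembers whether a cleanup pass
// could still find something to remove.
type gcGate struct {
	needed bool
}

// onUpdate records a state update. Only states with TTL >= 0 can ever be
// removed by a cleanup pass, so only those arm the gate.
func (g *gcGate) onUpdate(ttl time.Duration) {
	if ttl >= 0 {
		g.needed = true
	}
}

// run calls cleanup only if the gate is armed and reports whether it ran.
// cleanup must return the number of remaining states with TTL > 0; the
// gate stays armed while that number is non-zero.
func (g *gcGate) run(cleanup func() int) bool {
	if !g.needed {
		return false
	}
	g.needed = cleanup() > 0
	return true
}

func main() {
	g := &gcGate{}
	noPending := func() int { return 0 }

	fmt.Println(g.run(noPending)) // false: nothing was added yet
	g.onUpdate(time.Minute)
	fmt.Println(g.run(noPending)) // true: a state with TTL > 0 was added
	fmt.Println(g.run(noPending)) // false: last pass reported 0 pending
}
```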
delete(s.idx, state.ID())
logp.Debug("state", "State removed for %v because of older: %v", state.Source, state.TTL)

L--
Looking at the code with i++ and L--, I'm wondering whether the option of combining index and registry would actually bring less complexity.
Maybe it's not that clear as some extra work is done within the loop, but the loop basically sorts the list into alive and dead states. The L is the current length. Removal/sorting is in-place. That is, we're forced to check the current index again upon removal.
The i++ and L-- act as delimiters between processed and alive/dead states, so one or the other needs to be updated, depending on whether the state is alive or not. It is a very basic in-place removal loop without extra copies.
Would this be clearer for you:
func (s *States) Cleanup() (int, int) {
    s.Lock()
    defer s.Unlock()

    L := len(s.states)
    currentTime := time.Now()
    statesBefore := L
    numCanExpire := 0 // number of alive states with TTL > 0

    for i := 0; i < L; {
        state := &s.states[i]
        canExpire := state.TTL > 0
        expired := (canExpire && currentTime.Sub(state.Timestamp) > state.TTL)

        if (state.TTL == 0 || expired) && state.Finished {
            // remove by replacing entry at i with last entry and shrink 'alive' states length
            L--
            if L != i {
                // swap and update index
                s.states[i], s.states[L] = s.states[L], s.states[i]
                s.idx[s.states[i].ID()] = i
            }
        } else {
            i++
            if canExpire {
                numCanExpire++
            }
        }
    }

    alive := s.states[:L]
    dead := s.states[L:]
    for i := range dead {
        delete(s.idx, dead[i].ID())
    }

    s.states = alive
    return statesBefore - len(s.states), numCanExpire
}
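For readers who want to run the in-place removal loop in isolation, here is a reduced, self-contained sketch. It is not the PR's implementation: `entry`, `registry`, `newRegistry`, and `removeDead` are hypothetical names, the TTL/timestamp check is replaced by a plain `dead` flag, locking is omitted, and IDs are assumed to be unique.

```go
package main

import "fmt"

// entry is a hypothetical, minimal state: an ID plus a removal flag.
type entry struct {
	id   string
	dead bool
}

// registry pairs the entries slice with an ID -> slice index map.
type registry struct {
	entries []entry
	idx     map[string]int
}

func newRegistry(es ...entry) *registry {
	r := &registry{idx: make(map[string]int, len(es))}
	for _, e := range es {
		r.idx[e.id] = len(r.entries)
		r.entries = append(r.entries, e)
	}
	return r
}

// removeDead partitions the slice in place and returns the number of
// removed entries. Invariant: [0, i) is known alive, [n, len) has been
// removed, and [i, n) is still unchecked.
func (r *registry) removeDead() int {
	n := len(r.entries)
	for i := 0; i < n; {
		if !r.entries[i].dead {
			i++
			continue
		}
		n--
		delete(r.idx, r.entries[i].id)
		if i != n {
			// Fill the hole with the last unchecked entry and fix its
			// index. i is not advanced, so the moved entry is checked
			// in the next iteration.
			r.entries[i] = r.entries[n]
			r.idx[r.entries[i].id] = i
		}
	}
	removed := len(r.entries) - n
	r.entries = r.entries[:n]
	return removed
}

func main() {
	r := newRegistry(
		entry{id: "a"},
		entry{id: "b", dead: true},
		entry{id: "c"},
		entry{id: "d", dead: true},
	)
	fmt.Println(r.removeDead(), len(r.entries)) // prints "2 2"
}
```

Note that this kind of removal does not preserve the order of the surviving entries, which is why the index has to be updated for every entry that gets moved.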
When I looked at the code initially, I had to write pseudocode here to make sure the in-place removal was working correctly.
I prefer the first version.
Failing CI test is on auditbeat. Unrelated to this PR.
ph
left a comment
LGTM
@ph Please go ahead and merge if the Jenkins failure is not related. The most important part is the functionality, and there we agree. We can still do minor refactoring later.
- Defer registrar state gc until the registry needs to be written.
- When updating registrar states from a batch, ensure all updated states will have the same timestamp.

Depends on #6346
* Filebeat in memory states registry improvements
  - make States type easier to find by moving into separate states.go file
  - Ensure provides States constructor is actually used
  - Add ID->array index, index for faster lookups on update and find operations. When updating states in a big registry, the Updates converged to quadratic complexity. The index helps in keeping the complexity about linear in number of state updates.
  - Debug will print number of states subject to future cleanups (if state TTL > 0)
  - Add title to states unit tests
* Fix godocs
* update state index in Cleanup
* Refine index handling + reintroduce debug message
* review

(cherry picked from commit 490cbcd)
- Defer registrar state gc until the registry needs to be written.
- When updating registrar states from a batch, ensure all updated states will have the same timestamp.

Depends on elastic#6346 (cherry picked from commit 2d2400b)
Backport changes from elastic#6346 and elastic#6347, to improve state update performance on large registry files.
- make States type easier to find by moving into separate states.go file
- Add ID->array index, index for faster lookups on update and find operations. When updating states in a big registry, the Updates converged to quadratic complexity. The index helps in keeping the complexity about linear in number of state updates.
- Do not gc state if there are no 'pending' entries
…ic#6804)
* Filebeat in memory states registry improvements
  - make States type easier to find by moving into separate states.go file
  - Ensure provides States constructor is actually used
  - Add ID->array index, index for faster lookups on update and find operations. When updating states in a big registry, the Updates converged to quadratic complexity. The index helps in keeping the complexity about linear in number of state updates.
  - Debug will print number of states subject to future cleanups (if state TTL > 0)
  - Add title to states unit tests
* Fix godocs
* update state index in Cleanup
* Refine index handling + reintroduce debug message
* review

(cherry picked from commit 042b284)
- Defer registrar state gc until the registry needs to be written.
- When updating registrar states from a batch, ensure all updated states will have the same timestamp.

Depends on elastic#6346 (cherry picked from commit 0ae953c)