datas tuning fix #98743
Conversation
Tagging subscribers to this area: @dotnet/gc
Issue Details: will add description soon.
}

float mean (float* arr, int size)
{
Worth checking size > 0 as a precondition?
I've added an assert in slope which makes more sense.
Kind of similar to log_with_base: is the assertion/condition intended for mean or for callers of mean? If it's a precondition for mean, then I would expect the precondition check to be in mean (or in both mean and the callers).

Or, if mean is supposed to support some callers with a negative size, then the final return probably needs to be something like return (size > 0) ? (sum / size) : 0.
size_t gc_heap::get_num_completed_gcs ()

float log_with_base (float x, float base)
{
Is it worth asserting that x > 0 and base > 0?
It's actually meant to have x > base, and that should be enforced. But I can still add an assert.
log_b(x) is fine for x <= base (e.g., log_2(2) = 1, log_4(2) = 1/2).

I think you're saying (by "should be enforced") that current call site(s) expect x > base. log_with_base is a very reasonable helper function that could get used elsewhere without such a restriction. Or maybe you want to rename it to show that it is intended as a helper for a specific context rather than a general log helper?
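For reference, a sketch of what a general-purpose helper might assert instead (illustrative only; it checks the mathematical domain rather than x > base):

#include <cassert>
#include <cmath>

// Sketch only - a general log helper: any positive x is valid, including x <= base.
float log_with_base (float x, float base)
{
    assert (x > 0.0f);
    assert ((base > 0.0f) && (base != 1.0f));
    return (float)(std::log (x) / std::log (base));
}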
uint64_t elapsed_between_gcs; // time between gcs in microseconds (this should really be between_pauses)
uint64_t gc_pause_time; // pause time for this GC
uint64_t msl_wait_time;
size_t gc_survived_size;
Suggested change:
size_t gc_survived_size; // total survived size across all relevant generations for this GC
i.e., it's -not- gen0 to be consistent in what is being recorded
//
// We need to observe the history of tcp's so record them in a small buffer.
//
float recorded_tcp_rearranged[recorded_tcp_array_size];
You've mentioned this before, but this is doable without copying the data (though I think the real concern would be avoiding the additional concept of "rearranged" data rather than the copy of a small amount of data, which could easily be negligible in cost).

Encapsulating the data in a circular buffer with an iterator would probably accomplish this - it probably makes sense to do this as a follow-up PR, which I can do.
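A hypothetical shape for that follow-up (type and field names invented here, not from the PR): wrap the buffer so consumers walk it oldest-to-newest in place instead of through a rearranged copy.

// Sketch only - not the PR's code.
struct tcp_ring
{
    static const int capacity = 64;   // stand-in for recorded_tcp_array_size
    float data[capacity];
    int next_index;                   // slot the next sample will go into
    int total_count;                  // may exceed capacity

    int size () const
    {
        return (total_count < capacity) ? total_count : capacity;
    }

    // i = 0 is the oldest retained sample, i = size() - 1 the newest.
    float at (int i) const
    {
        int start = (total_count < capacity) ? 0 : next_index;
        return data[(start + i) % capacity];
    }
};

Callers that today index recorded_tcp_rearranged[i] would call at (i) on the ring instead.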
float recorded_tcp_rearranged[recorded_tcp_array_size];
float recorded_tcp[recorded_tcp_array_size];
int recorded_tcp_index;
int total_recorded_tcp;
Suggested change:
int total_recorded_tcp; // can exceed the array size
recorded_tcp_index++;
if (recorded_tcp_index == recorded_tcp_array_size)
{
    recorded_tcp_index = 0;
}
Suggested change:
recorded_tcp_index = (recorded_tcp_index + 1) % recorded_tcp_array_size;
if (total_recorded_tcp >= recorded_tcp_array_size)
{
    int earlier_entry_size = recorded_tcp_array_size - recorded_tcp_index;
    memcpy (recorded_tcp_rearranged, (recorded_tcp + recorded_tcp_index), (earlier_entry_size * sizeof (float)));
Can we use std::copy in this project to avoid the manual byte size computation?
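Possibly something like this sketch (assuming <algorithm> is acceptable in this file; element ranges instead of byte counts):

#include <algorithm>

// Same copy as the memcpy above, expressed over element ranges.
std::copy (recorded_tcp + recorded_tcp_index,
           recorded_tcp + recorded_tcp_array_size,
           recorded_tcp_rearranged);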
    return copied_count;
}

int highest_avg_recorded_tcp (int count, float avg, float* highest_avg)
This name is a bit confusing to me. It looks like it returns the average and count of the elements above a limit (which happens to be the average, given the name of the parameter, but it isn't relevant to this function that it's the average).
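Purely to illustrate the naming point, a hypothetical shape (names invented here, not a requested change), where the limit is passed in and just happens to be the average at the call site:

// Hypothetical rename - reports on the elements above a caller-supplied limit.
int count_and_avg_above_limit (float* values, int count, float limit, float* avg_above_limit)
{
    float sum_above = 0.0f;
    int count_above = 0;
    for (int i = 0; i < count; i++)
    {
        if (values[i] > limit)
        {
            sum_above += values[i];
            count_above++;
        }
    }
    *avg_above_limit = (count_above > 0) ? (sum_above / count_above) : 0.0f;
    return count_above;
}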
float highest_sum = 0.0;
int highest_count = 0;

for (int i = 0; i < count; i++)
I think this is using the count oldest elements in the buffer - should it be the newest?
Note - count is the entire buffer (as returned by the rearrange method and passed back in here), so there isn't a correctness issue here.
float recorded_tcp_rearranged[recorded_tcp_array_size];
float recorded_tcp[recorded_tcp_array_size];
int recorded_tcp_index;
int total_recorded_tcp;
recorded_tcp_count to be consistent with other naming?
// each time our calculation tells us to shrink.
int dec_failure_count;
int dec_failure_recheck_threshold;
For later - I think it would be interesting to share the increment/decrement cases to avoid some duplication. It would have to be parameterized in some way so that the behavior could be customized. Anyways, there's no requested change here right now.
float below_target_accumulation;
float below_target_threshold;

// Currently only used for dprintf.
#ifdef this?
// Recording the gen2 GC indices so we know how far apart they are. Currently unused
// but we should consider how much value there is if they are very far apart.
size_t gc_index;
// This is (gc_elapsed_time / time inbetween this and the last gen2 GC)
nit - "in between" or even just "between"
// at the beginning of a BGC and the PM triggered full GCs
// fall into this case.
PER_HEAP_ISOLATED_FIELD_DIAG_ONLY uint64_t suspended_start_time;
// Right now this is diag only but may be used functionally later.
I don't think this comment really adds anything
dynamic_heap_count_data.sample_index = (dynamic_heap_count_data.sample_index + 1) % dynamic_heap_count_data_t::sample_size;
(dynamic_heap_count_data.current_samples_count)++;
It bugs me a bit that the sample and recorded tcp handling are different (one inline here, the other in helper methods), but I think that's for another day.
    }
}

float avg_x = (float)sum_x / n;
Suggested change:
float avg_x = ((float)sum_x) / n;

Or use the static_cast<float>(sum_x) / n format, which requires parentheses.
also below, though I don't think those explicit casts are needed since avg_x is a float. fine to be careful though of course.
also this is just (n+1) / 2.0f, though the loop is still needed for dprintf
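For reference, the closed form being referred to (assuming the x values the loop walks are 1..n, which is what makes the average data-independent): sum_x = n * (n + 1) / 2, so

// Equivalent closed form for the average of x = 1..n.
float avg_x = (n + 1) / 2.0f;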
// Change it to a desired number if you want to print.
int max_times_to_print_tcp = 0;

// Return the slope, and the average values in the avg arg.
Is there a name for the slope that is being calculated here? I see that it's a weighted sum based on distance from the middle, but I'm not familiar with that. For example, I don't think this is the slope of a typical regression line? (which is fine, though I guess I'm a bit curious about the mathematical properties of this)
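For comparison only (not claiming this is what the PR computes): the textbook least-squares slope over equally spaced x values also reduces to a weighted sum of the y values, with weights proportional to the signed distance from the middle, so the two may agree up to a constant factor.

// Reference sketch: ordinary least-squares slope of y over x_i = i, i = 0..n-1.
float ols_slope (const float* y, int n)
{
    float mean_x = (n - 1) / 2.0f;
    float num = 0.0f;
    float den = 0.0f;
    for (int i = 0; i < n; i++)
    {
        // The y mean drops out because the (i - mean_x) weights sum to zero,
        // so this is a weighted sum of the y values.
        num += (i - mean_x) * y[i];
        den += (i - mean_x) * (i - mean_x);
    }
    return (den > 0.0f) ? (num / den) : 0.0f;
}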
}

float median_throughput_cost_percent = median_of_3 (throughput_cost_percents[0], throughput_cost_percents[1], throughput_cost_percents[2]);
float avg_throughput_cost_percent = (float)((throughput_cost_percents[0] + throughput_cost_percents[1] + throughput_cost_percents[2]) / 3.0);
nit - might be able to drop the (float) if you used 3.0f
if (dynamic_heap_count_data.dec_failure_count)
{
    (dynamic_heap_count_data.dec_failure_count)++;
}
else
{
    dynamic_heap_count_data.dec_failure_count = 1;
}
I don't think this if is necessary.
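A minimal sketch of the simplification (assuming dec_failure_count is reset to 0 elsewhere when the decrease streak ends):

// 0 -> 1 on the first failure, then keeps incrementing - same net effect as the if/else above.
(dynamic_heap_count_data.dec_failure_count)++;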
if (shrink_p && step_down_int && (new_n_heaps > step_down_int))
{
    // TODO - if we see that it wants to shrink by 1 heap too many times, we do want to shrink.
Also, if n_heaps is small, then 1 is significant (well, significant to the heap count; if the GC heap is a small fraction of overall memory, which it might be if the heap count is small, then the memory savings could still be insignificant).
My review is very late for the preview release. These aren't necessary right now and can be addressed in a future PR.
/cc @MichalStrehovsky @eerhardt Just for visibility, as people started asking about this: I believe this introduced a slight RPS regression in the native AOT benchmarks, on both Windows and Linux. And we can see an improvement in max working set. NB: the unstable results are unrelated and were tracked in #98021.
trending up/down (and if so how fast is that trend) and make a decision if we want to grow/shrink according to our calculation
There are a few issues with these that will be addressed in future checkins -