Introduce options to control profiling-specific optimizations #614
The `-profile-tail-call-opt {true|false}` option controls whether the
SSA{,2} shrinker optimizes tail calls in the presence of profiling.
`-profile-tail-call-opt false` is expected to have a significant
performance penalty, but can improve the accuracy of exception
history. It also likely worsens the accuracy of time profiling, since
the profiled program (without tail call optimizations) will be
significantly different from the non-profiled program (with tail call
optimizations).
The `-profile-intro-loops-opt {true|false}` option controls whether the
SSA IntroduceLoops optimization applies in the presence of profiling.
In particular, when `-profile-tail-call-opt false` is combined with
`-profile-intro-loops-opt true`, IntroduceLoops will recognize
self non-tail calls with eta return and handler continuations as tail
calls. This is expected to recover some of the performance penalty of
`-profile-tail-call-opt false`, at the expense of less accurate
exception history; the exception history will have only one entry for
the recursive function, even if the exception is raised by a deeply
nested recursive (tail) call.
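The trade-off described above can be sketched with a toy model. This is illustrative Python, not MLton code: the function `history_at_raise` and the list-of-names stack are assumptions made for this sketch, and it only models a single self tail-recursive function called `loop`.

```python
# Toy model (not MLton code): simulate the stack of source-function names
# that would back the exception history when a self tail-recursive
# function `loop` raises after `depth` recursive calls.

def history_at_raise(depth, tail_call_opt, intro_loops_opt):
    """Return the simulated stack at the point where `loop` raises."""
    stack = ["main", "loop"]  # main calls loop once
    for _ in range(depth):
        if tail_call_opt or intro_loops_opt:
            # The self call is treated as a tail call / loop back edge,
            # so the existing frame is reused and nothing is recorded.
            continue
        # The call is kept as a non-tail call: one more frame per call.
        stack.append("loop")
    return stack


if __name__ == "__main__":
    for tco, loops in [(True, True), (False, True), (False, False)]:
        print(tco, loops, history_at_raise(3, tco, loops))
```

In this model, only the `false`/`false` combination records one entry per recursive call; the other combinations collapse the recursion into a single `loop` entry, which is the loss of exception history accuracy described above.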
Legendary implementation speed. Also, if |
Excellent idea! It is especially nice, because we can distinguish self tail calls from non-self tail calls in the shrinker; that would allow us to revert the changes to |
Revise `-profile-tail-call-opt` to `{always|self-only|never}`
Thanks to @YawarRaza7349 for the suggestion (#614 (comment)).
Add `-profile-tail-call-opt {true|false}` expert compile-time option. The `-profile-tail-call-opt {true|false}` option controls whether the SSA{,2} shrinker optimizes tail calls in the presence of profiling.

Add `-profile-intro-loops-opt {true|false}` expert compile-time option. The `-profile-intro-loops-opt {true|false}` option controls whether the SSA IntroduceLoops optimization applies in the presence of profiling and `-profile-tail-call-opt false`, that is, when the SSA{,2} shrinker does not optimize tail calls in the presence of profiling. In particular, with `-profile-tail-call-opt false` and `-profile-intro-loops-opt true`, IntroduceLoops will recognize self non-tail calls with eta return and handler continuations as tail calls.

`-profile-tail-call-opt false` with `-profile-intro-loops-opt false` is expected to have a significant performance penalty, but can improve the accuracy of exception history. It likely worsens the accuracy of time profiling, since the profiled program (without tail call and introduce loops optimizations) will be significantly different from the non-profiled program (with tail call and introduce loops optimizations).

`-profile-tail-call-opt false` with `-profile-intro-loops-opt true` is expected to recover some of the performance penalty, at the expense of less accurate exception history; the exception history will have only one entry for the recursive function, even if the exception is raised by a deeply nested recursive (tail) call.

Profiling results:

For the benchmarks that run in all configurations (see below), `-profile-tail-call-opt false -profile-intro-loops-opt false` introduces considerable overhead, while `-profile-tail-call-opt false -profile-intro-loops-opt true` is not significantly worse than `-profile-tail-call-opt true`.

With `-profile-tail-call-opt false -profile-intro-loops-opt false`, `even-odd`, `output1`, `psdes-random`, `reduce`, and `tailfib` terminate with `Out of memory with max heap size 4Gb`, because tail-recursive functions that are normally turned into loops are executed as non-tail-recursive functions with explosive stack growth.

Interestingly, with `-profile-tail-call-opt false -profile-intro-loops-opt true`, `even-odd` also terminates with `Out of memory with max heap size 4Gb`.

With `-codegen llvm`, for `imp-for` and `tensor` (with `-profile-tail-call-opt true` or `-profile-tail-call-opt false -profile-intro-loops-opt true`), LLVM is able to completely optimize away the inner loops, leading to run times of 0 (and meaningless run time ratios).

It may be worth considering making `-profile-tail-call-opt false -profile-intro-loops-opt true` the default when `-const 'Exn.keepHistory true'` is used, in order to improve the accuracy of exception history. However, the fact that one benchmark (`even-odd`) exhausts the heap with explosive stack growth is worrisome.

See #609.
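One plausible reading of the `even-odd` result is sketched below. It is a toy Python model, not MLton code, and it assumes that the benchmark's hot path is a pair of mutually recursive functions, so that a loop optimization that recognizes only self calls never fires. The function name `live_frames` and its parameters are hypothetical.

```python
# Toy model (not MLton code): count simulated stack frames that stay live
# after `calls` tail calls, for either a self-recursive function or a pair
# of mutually recursive functions (the assumed shape of even-odd).

def live_frames(calls, tail_call_opt, intro_loops_opt, mutual):
    """Return the number of live frames after `calls` tail calls."""
    frames = 1  # the initial activation
    callee_is_self = not mutual
    for _ in range(calls):
        # A frame is reused if tail calls are optimized in general, or if
        # the loop optimization applies, which (in this model) requires
        # the callee to be the function itself.
        reuse = tail_call_opt or (intro_loops_opt and callee_is_self)
        if not reuse:
            frames += 1
    return frames


if __name__ == "__main__":
    n = 1_000_000
    print("self,   loops on :", live_frames(n, False, True, mutual=False))
    print("mutual, loops on :", live_frames(n, False, True, mutual=True))
    print("mutual, tail opt :", live_frames(n, True, False, mutual=True))
```

Under these assumptions, the self-recursive case stays at one frame with `-profile-intro-loops-opt true`, while the mutually recursive case grows by one frame per call unless tail calls are optimized, which would be consistent with the stack growth reported for `even-odd`.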