Fix slow lists:mapfoldl/3 #2594
Very interesting!
 mapfoldl(F, Acc0, [Hd|Tail]) ->
     Res1 = F(Hd, Acc0),
+    R = element(1, Res1),
     Res2 = mapfoldl(F, element(2, Res1), Tail),
-    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
+    {[R|element(1, Res2)],element(2, Res2)};
 mapfoldl(F, Acc, []) ->
     {[],Acc}.
How can we make the compiler attempt such reorderings more often? Here the call
Could we instruct the compiler on the semantics of some of the basic functions such as element/2, map_key/2, ... (maybe only guards even) so these reorderings can be attempted? Maybe there's even a heuristic similar to the number of stack slots mentioned here that can be used to decide when to reorder or better: which ordering to pick.
I guess these "basic functions" can be any function the compiler knows is pure. Guards are a good subset of these here for memory saving as their output is often smaller than their input, trading off a small amount of computation (except for
Should I open an issue on https://bugs.erlang.org/?
@fenollp I only have time for a quick answer.
No, that is not what my PR does. The compiler usually sinks tuple extraction instructions (that is, executes them later). When compiling
No. I don't think that your suggested transformation (hoisting tuple extraction instructions) would be generally beneficial.
The compiler already knows the semantics of many BIFs and uses that knowledge to do a multitude of optimizations. Especially in OTP 22 and 23.
Looks good to me!
lib/compiler/src/beam_ssa_opt.erl
%% Here is an heuristic to avoid harmful sinking in
%% lists:mapfold/3 and similar functions.
DefLocGC = case DefLocGC0 of
This looks a bit heavy, could you break it out into a separate function?
Will do.
lib/compiler/src/beam_trim.erl
  %% {[Instruction],[TrimInstruction]}.
  %% Try to renumber Y registers in the instruction stream. The
- %% first rececipe that works will be used.
+ %% first reccipe that works will be used.
- %% first reccipe that works will be used.
+ %% first recipe that works will be used.
Will fix.
The stack trimming used to be very conservative, avoiding stack trimming if the trimming instruction sequence was estimated to be slower than the original sequence. That could make recursive functions using a huge amount of stack slower if unused stack slots were kept. To avoid the cost of not trimming in recursive functions, adjust the cost calculation formula to trim more often. This commit is a partial solution to ERL-1216.
Since stack trimming preceded by moving of Y registers has become more common, combine two move instructions that move Y registers to Y registers.
Sinking (delaying) the extraction of tuple elements until the elements
are needed is often advantageous. However, in rare circumstances, the
"optimized" code could become much slower. As an example, take the
following function:
    mapfoldl(F, Acc0, [Hd|Tail]) ->
        {R,Acc1} = F(Hd, Acc0),
        {Rs,Acc2} = mapfoldl(F, Acc1, Tail),
        {[R|Rs],Acc2};
    mapfoldl(F, Acc, []) ->
        {[],Acc}.
If the compiler delays the extraction of tuple elements as long as
possible, the resulting code will be similar to the following:
    slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
        {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
    slow_mapfoldl(F, Acc, []) ->
        {[],Acc}.
Note that the tuple bound to the `Res1` variable will be kept alive
during the recursive call. That means that all intermediate accumulators
will be kept alive until `slow_mapfoldl/3` returns. In this case, it would
clearly be better to extract all tuple elements at once:
    fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        R = element(1, Res1),
        Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
        {[R|element(1, Res2)],element(2, Res2)};
    fast_mapfoldl(F, Acc, []) ->
        {[],Acc}.
`fast_mapfoldl/3` uses the same amount of stack space as `slow_mapfoldl/3`.
Thus, `slow_mapfoldl/3` has no advantages whatsoever over `fast_mapfoldl/3`.
To ensure that the compiler emits code similar to the `fast_mapfoldl/3`
example, make the sinking of `get_tuple_element` instructions more
conservative. Only sink when there is an advantage in terms of stack space
or if it can be shown that sinking is essentially harmless.
https://bugs.erlang.org/browse/ERL-1216
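As a rough illustration of the commit message above, the two hand-written variants can be put in one module and checked against `lists:mapfoldl/3`. This is only a sketch: the module name `mapfoldl_demo` and the `check/0` helper are hypothetical and not part of the PR, and the compiler is free to move the `element/2` calls itself, so the source-level shape does not guarantee which code is actually emitted.

```erlang
-module(mapfoldl_demo).
-export([slow_mapfoldl/3, fast_mapfoldl/3, check/0]).

%% Extraction is written as late as possible: Res1, and therefore the
%% accumulator stored inside it, is still referenced after the
%% recursive call returns.
slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
slow_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% R is extracted before the recursive call, so Res1 is no longer
%% needed once the call has been made.
fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    R = element(1, Res1),
    Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
    {[R|element(1, Res2)],element(2, Res2)};
fast_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% Hypothetical helper: both variants must agree with lists:mapfoldl/3.
check() ->
    F = fun(X, Sum) -> {X * 2, Sum + X} end,
    Expected = lists:mapfoldl(F, 0, [1,2,3]),
    Expected = slow_mapfoldl(F, 0, [1,2,3]),
    Expected = fast_mapfoldl(F, 0, [1,2,3]),
    ok.
```

Both functions return the same results; the difference discussed in the commit message is only in what stays live on the stack during the recursion. Compiling with `erlc -S mapfoldl_demo.erl` writes a `.S` listing in which the position of the `get_tuple_element` instructions relative to the recursive call can be inspected.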
d3c559c to 720afbc
This PR fixes two issues in the compiler that made lists:mapfoldl/3 slow. See the commit messages for further details.
https://bugs.erlang.org/browse/ERL-1216