bjorng commented Apr 14, 2020

This PR fixes two issues in the compiler that made lists:mapfoldl/3 slow. See the commit messages for further details.

https://bugs.erlang.org/browse/ERL-1216

bjorng added the team:VM and fix labels Apr 14, 2020
bjorng requested a review from jhogberg April 14, 2020 11:46
bjorng self-assigned this Apr 14, 2020

fenollp commented Apr 15, 2020

Very interesting!
So to recap fast-growing O(n) space can be saved thanks to a reordering of calls:

 mapfoldl(F, Acc0, [Hd|Tail]) ->
     Res1 = F(Hd, Acc0),
+    R = element(1, Res1),
     Res2 = mapfoldl(F, element(2, Res1), Tail),
-    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
+    {[R|element(1, Res2)],element(2, Res2)};
 mapfoldl(F, Acc, []) ->
     {[],Acc}.

How can we make the compiler attempt such reorderings more often?

Here the call element(1, Res1) can be moved to happen before element(2, Res1) as the former cannot fail if the latter didn't (Res1 is thus proven to be a tuple of size >= 2).
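To make that argument concrete, here is a small sketch. The module name `element_order` and both helper functions are made up for illustration and are not part of the PR or of OTP. It only relies on the documented behaviour of `element/2`, which raises `badarg` when the index is out of range or the argument is not a tuple.

```erlang
-module(element_order).
-export([both/1, fails_with_badarg/1]).

%% Reads element 2 first, as the argument of the recursive call does
%% in the example above. If element(2, T) succeeds, T is a tuple with
%% at least two elements, so element(1, T) cannot raise badarg.
both(T) ->
    Second = element(2, T),
    First = element(1, T),
    {First, Second}.

%% Returns true if both/1 raises badarg for T, false if it succeeds.
fails_with_badarg(T) ->
    try both(T) of
        _ -> false
    catch
        error:badarg -> true
    end.
```

With this, `element_order:both({r, acc})` succeeds, while `element_order:fails_with_badarg({only_one})` and `element_order:fails_with_badarg(not_a_tuple)` both report the failure on the first `element/2` call, before the second one is ever reached.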

Could we instruct the compiler on the semantics of some of the basic functions such as element/2, map_key/2, ... (maybe only guards even) so these reorderings can be attempted?

Maybe there's even a heuristic similar to the number of stack slots mentioned here that can be used to decide when to reorder or better: which ordering to pick.
I'll have a try (albeit probably overfitted to the case at hand):
Pick the reordering that minimizes the number of "basic function" calls happening after non-"basic function" calls.

I guess these "basic functions" can be any function the compiler knows is pure. Guards are a good subset of these here for memory saving as their output is often smaller than their input, trading off a small amount of computation (except for length/1).

Should I open an issue on https://bugs.erlang.org/?


bjorng commented Apr 15, 2020

@fenollp I only have time for a quick answer.

> So to recap fast-growing O(n) space can be saved thanks to a reordering of calls:

No, that is not what my PR does. The compiler usually sinks tuple extraction instructions, that is, it moves them so that they execute later. When compiling mapfoldl/3, that results in code like slow_mapfoldl/3. In this particular case, this is a pessimization because Res1 will be kept alive too long. So my PR uses a heuristic to disable the sinking of tuple extraction instructions when it would be potentially harmful.

> Should I open an issue on https://bugs.erlang.org/?

No. I don't think that your suggested transformation (hoisting tuple extraction instructions) would be generally beneficial.

> Could we instruct the compiler on the semantics of some of the basic functions such as element/2, map_key/2, ... (maybe only guards even) so these reorderings can be attempted?

The compiler already knows the semantics of many BIFs and uses that knowledge to do a multitude of optimizations. Especially in OTP 22 and 23.

jhogberg left a comment

Looks good to me!


%% Here is an heuristic to avoid harmful sinking in
%% lists:mapfold/3 and similar functions.
DefLocGC = case DefLocGC0 of

This looks a bit heavy, could you break it out into a separate function?

Will do.

 %% {[Instruction],[TrimInstruction]}.
 %% Try to renumber Y registers in the instruction stream. The
-%% first rececipe that works will be used.
+%% first reccipe that works will be used.

Suggested change:
-%% first reccipe that works will be used.
+%% first recipe that works will be used.

Will fix.

bjorng added 3 commits April 20, 2020 06:56
The stack trimming used to be very conservative, avoiding stack
trimming if the trimming instruction sequence was estimated to be
slower than the original sequence. That could make recursive functions
using a huge amount of stack slower if unused stack slots were kept.

To avoid the cost of not trimming in recursive functions, adjust the
cost calculation formula to trim more often.

This commit is a partial solution to ERL-1216.

Since stack trimming preceded by moving of Y registers has
become more common, combine two move instructions that move Y
registers to Y registers.

Sinking (delaying) the extraction of tuple elements until the elements
are needed is often advantageous. However, in rare circumstances, the
"optimized" code could become much slower. As an example, take the
following function:

    mapfoldl(F, Acc0, [Hd|Tail]) ->
        {R,Acc1} = F(Hd, Acc0),
        {Rs,Acc2} = mapfoldl(F, Acc1, Tail),
        {[R|Rs],Acc2};
    mapfoldl(F, Acc, []) ->
        {[],Acc}.

If the compiler delays the extraction of tuple elements as long as
possible, the resulting code will be similar to the following:

    slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
        {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
    slow_mapfoldl(F, Acc, []) ->
        {[],Acc}.

Note that the tuple bound to the `Res1` variable will be kept alive
during the recursive call. That means that all intermediate accumulators
will be kept alive until `slow_mapfoldl/3` returns. In this case, it would
clearly be better to extract all tuple elements at once:

    fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        R = element(1, Res1),
        Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
        {[R|element(1, Res2)],element(2, Res2)};
    fast_mapfoldl(F, Acc, []) ->
        {[],Acc}.

`fast_mapfoldl/3` uses the same amount of stack space as `slow_mapfoldl/3`.
Thus, `slow_mapfoldl/3` has no advantages whatsoever over `fast_mapfoldl/3`.

To ensure that the compiler emits code similar to the `fast_mapfoldl/3`
example, make the sinking of `get_tuple_element` instructions more
conservative. Only sink when there is an advantage in terms of stack space
or if it can be shown that sinking is essentially harmless.
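
For anyone who wants to try the two hand-written variants side by side, the following is a minimal sketch and not code from this PR. The module name `mapfoldl_demo` and the helpers `same_result/0` and `words_at_deepest/2` are hypothetical, and the 1000-element accumulator and the way the heap is sampled are assumptions chosen only to make the effect visible.

```erlang
-module(mapfoldl_demo).
-export([slow_mapfoldl/3, fast_mapfoldl/3,
         same_result/0, words_at_deepest/2]).

%% Res1 is still needed after the recursive call, so the accumulator
%% stored inside it stays reachable for the whole recursion.
slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
slow_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% R is extracted before recursing, so only R has to survive the call.
fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    R = element(1, Res1),
    Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
    {[R|element(1, Res2)],element(2, Res2)};
fast_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% Both variants must agree with lists:mapfoldl/3.
same_result() ->
    F = fun(X, Sum) -> {2 * X, Sum + X} end,
    L = lists:seq(1, 100),
    Expected = lists:mapfoldl(F, 0, L),
    Expected =:= slow_mapfoldl(F, 0, L) andalso
        Expected =:= fast_mapfoldl(F, 0, L).

%% Runs MapFoldl in a fresh process over lists:seq(1, N) with a fresh
%% 1000-element list as accumulator in every step. While handling the
%% last element (the deepest point of the recursion) it forces a GC
%% and reports total_heap_size, in words, back to the caller.
words_at_deepest(MapFoldl, N) when is_integer(N), N > 0 ->
    Parent = self(),
    Ref = make_ref(),
    F = fun(X, _Acc) ->
                case X of
                    N ->
                        erlang:garbage_collect(),
                        {total_heap_size, Words} =
                            process_info(self(), total_heap_size),
                        Parent ! {Ref, Words};
                    _ ->
                        ok
                end,
                {X, lists:seq(1, 1000)}
        end,
    spawn(fun() -> MapFoldl(F, [], lists:seq(1, N)) end),
    receive
        {Ref, Result} -> Result
    end.
```

Comparing `mapfoldl_demo:words_at_deepest(fun mapfoldl_demo:slow_mapfoldl/3, 1000)` with the same call for `fast_mapfoldl/3` should show the slow variant holding on to far more heap, although the exact numbers depend on the OTP release and on what the compiler does with each function.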

https://bugs.erlang.org/browse/ERL-1216
bjorng force-pushed the bjorn/compiler/fix-slow-mapfoldl branch from d3c559c to 720afbc April 20, 2020 04:57
bjorng merged commit 3dffe21 into erlang:master Apr 20, 2020
bjorng deleted the bjorn/compiler/fix-slow-mapfoldl branch June 5, 2020 04:11
jhogberg mentioned this pull request Oct 10, 2025