Fix slow lists:mapfoldl/3 #2594

bjorng · 2020-04-14T11:46:36Z

This PR fixes two issues in the compiler that made lists:mapfoldl/3 slow. See the commit messages for further details.

https://bugs.erlang.org/browse/ERL-1216

fenollp · 2020-04-15T08:44:07Z

Very interesting!
So to recap fast-growing O(n) space can be saved thanks to a reordering of calls:

 mapfoldl(F, Acc0, [Hd|Tail]) ->
     Res1 = F(Hd, Acc0),
+    R = element(1, Res1),
     Res2 = mapfoldl(F, element(2, Res1), Tail),
-    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
+    {[R|element(1, Res2)],element(2, Res2)};
 mapfoldl(F, Acc, []) ->
     {[],Acc}.

How can we make the compiler attempt such reorderings more often?

Here the call element(1, Res1) can be moved to happen before element(2, Res1) as the former cannot fail if the latter didn't (Res1 is thus proven to be a tuple of size >= 2).
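
A minimal sketch of that ordering argument, using a hypothetical module `element_order_demo` that is not part of the PR: once `element(2, T)` has succeeded, `T` is known to be a tuple with at least two elements, so `element(1, T)` cannot raise `badarg`.

```erlang
-module(element_order_demo).
-export([both/1]).

%% Hypothetical helper, for illustration only.
both(T) ->
    %% Raises badarg unless T is a tuple with at least two elements.
    Second = element(2, T),
    %% Cannot fail here: the call above already proved tuple_size(T) >= 2.
    First = element(1, T),
    {First, Second}.
```

Calling `element_order_demo:both({a})` fails on the first `element/2` call, so swapping the two calls would not change whether the function fails or which error class it raises.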

Could we instruct the compiler on the semantics of some of the basic functions such as element/2, map_key/2, ... (maybe only guards even) so these reorderings can be attempted?

Maybe there's even a heuristic similar to the number of stack slots mentioned here that can be used to decide when to reorder or better: which ordering to pick.
I'll have a try (albeit probably overfitted to the case at hand):
Pick the reordering that minimizes the number of "basic function" calls happening after non-"basic function" calls.

I guess these "basic functions" can be any function the compiler knows is pure. Guards are a good subset of these here for memory saving as their output is often smaller than their input, trading off a small amount of computation (except for length/1).

Should I open an issue on https://bugs.erlang.org/?

bjorng · 2020-04-15T11:15:04Z

@fenollp I only have time for a quick answer.

> So to recap fast-growing O(n) space can be saved thanks to a reordering of calls:

No, that is not what my PR does. The compiler usually sinks tuple extraction instructions (that is, it executes them later). When compiling mapfoldl/3, that results in slow_mapfoldl/3. In this particular case, this is a pessimization because Res1 will be kept alive too long. So my PR uses a heuristic to disable this particular optimization of sinking tuple extraction instructions when it would be potentially harmful.

> Should I open an issue on https://bugs.erlang.org/?

No. I don't think that your suggested transformation (hoisting tuple extraction instructions) would be generally beneficial.

> Could we instruct the compiler on the semantics of some of the basic functions such as element/2, map_key/2, ... (maybe only guards even) so these reorderings can be attempted?

The compiler already knows the semantics of many BIFs and uses that knowledge to do a multitude of optimizations. Especially in OTP 22 and 23.

jhogberg

Looks good to me!

jhogberg · 2020-04-17T08:16:51Z

lib/compiler/src/beam_ssa_opt.erl

+
+    %% Here is an heuristic to avoid harmful sinking in
+    %% lists:mapfold/3 and similar functions.
+    DefLocGC = case DefLocGC0 of


This looks a bit heavy, could you break it out into a separate function?

jhogberg · 2020-04-17T08:25:16Z

lib/compiler/src/beam_trim.erl

 %%           {[Instruction],[TrimInstruction]}.
 %%  Try to renumber Y registers in the instruction stream. The
-%%  first rececipe that works will be used.
+%%  first reccipe that works will be used.


Suggested change

-%% first reccipe that works will be used.
+%% first recipe that works will be used.

The stack trimming used to be very conservative, avoiding stack trimming if the trimming instruction sequence was estimated to be slower than the original sequence. That could make recursive functions using a huge amount of stack slower if unused stack slots were kept. To avoid the cost of not trimming in recursive functions, adjust the cost calculation formula to trim more often. This commit is a partial solution to ERL-1216.

Since stack trimming preceded by moving of Y registers has become more common, combine two move instructions that move Y registers to Y registers.

Sinking (delaying) the extraction of tuple elements until the elements are needed is often advantageous. However, in rare circumstances, the "optimized" code could become much slower.

As an example, take the following function:

    mapfoldl(F, Acc0, [Hd|Tail]) ->
        {R,Acc1} = F(Hd, Acc0),
        {Rs,Acc2} = mapfoldl(F, Acc1, Tail),
        {[R|Rs],Acc2};
    mapfoldl(F, Acc, []) ->
        {[],Acc}.

If the compiler delays the extraction of tuple elements as long as possible, the resulting code will be similar to the following:

    slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
        {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
    slow_mapfoldl(F, Acc, []) ->
        {[],Acc}.

Note that the tuple bound to the `Res1` variable will be kept alive during the recursive call. That means that all intermediate accumulators will be kept alive until `slow_mapfoldl/3` returns.

In this case, it would clearly be better to extract all tuple elements at once:

    fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
        Res1 = F(Hd, Acc0),
        R = element(1, Res1),
        Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
        {[R|element(1, Res2)],element(2, Res2)};
    fast_mapfoldl(F, Acc, []) ->
        {[],Acc}.

`fast_mapfoldl/3` uses the same amount of stack space as `slow_mapfoldl/3`. Thus, `slow_mapfoldl/3` has no advantages whatsoever over `fast_mapfoldl/3`.

To ensure that the compiler emits code similar to the `fast_mapfoldl/3` example, make the sinking of `get_tuple_element` instructions more conservative. Only sink when there is an advantage in terms of stack space or if it can be shown that sinking is essentially harmless.

https://bugs.erlang.org/browse/ERL-1216
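
For readers who want to experiment, here is a minimal sketch that puts the two variants from the commit message into a compilable module and checks that they agree with `lists:mapfoldl/3`. The module name `mapfoldl_demo` and the `check/0` helper are assumptions made for illustration and are not part of the PR. Note that these are source-level approximations: the compiler is free to move the `element/2` calls itself, so the code it actually emits may differ between OTP versions.

```erlang
-module(mapfoldl_demo).
-export([slow_mapfoldl/3, fast_mapfoldl/3, check/0]).

%% Late extraction: as written, Res1 is still referenced after the
%% recursive call, so the whole tuple returned by F stays reachable.
slow_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    Res2 = slow_mapfoldl(F, element(2, Res1), Tail),
    {[element(1, Res1)|element(1, Res2)],element(2, Res2)};
slow_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% Early extraction: only R is referenced after the recursive call.
fast_mapfoldl(F, Acc0, [Hd|Tail]) ->
    Res1 = F(Hd, Acc0),
    R = element(1, Res1),
    Res2 = fast_mapfoldl(F, element(2, Res1), Tail),
    {[R|element(1, Res2)],element(2, Res2)};
fast_mapfoldl(_F, Acc, []) ->
    {[],Acc}.

%% Hypothetical helper: true if both variants return the same result
%% as lists:mapfoldl/3 for a simple doubling-and-summing fun.
check() ->
    F = fun(X, Sum) -> {X * 2, Sum + X} end,
    L = lists:seq(1, 1000),
    Expected = lists:mapfoldl(F, 0, L),
    Expected =:= slow_mapfoldl(F, 0, L)
        andalso Expected =:= fast_mapfoldl(F, 0, L).
```

Compiling with `erlc -S mapfoldl_demo.erl` writes the BEAM assembly to `mapfoldl_demo.S`, which is one way to see where the `get_tuple_element` instructions end up for each variant.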

bjorng added the team:VM (Assigned to OTP team VM) and fix labels Apr 14, 2020

bjorng requested a review from jhogberg April 14, 2020 11:46

bjorng self-assigned this Apr 14, 2020

jhogberg approved these changes Apr 17, 2020


bjorng added 3 commits April 20, 2020 06:56

BEAM loader: Combine two stack shuffling move instructions

070a47d

Since stack trimming preceded by moving of Y registers has become more common, combine two move instructions that move Y registers to Y registers.

bjorng force-pushed the bjorn/compiler/fix-slow-mapfoldl branch from d3c559c to 720afbc Compare April 20, 2020 04:57

bjorng merged commit 3dffe21 into erlang:master Apr 20, 2020

josevalim mentioned this pull request May 6, 2020

Improve performance of Enum.with_index/2 elixir-lang/elixir#10020

Merged

bjorng deleted the bjorn/compiler/fix-slow-mapfoldl branch June 5, 2020 04:11

jhogberg mentioned this pull request Oct 10, 2025

optimize beam_trim #10272

Open
