270x and 36000x faster beam_ssa_bool and ssa_opt_sink passes from simple change #5686

@drathier


Describe the bug
This is a follow-up to #5140 (comment), namely:

The beam_bool and ssa_opt_sink passes are very fast when compiling dump@ps.erl, but slow for full@ps. If you want us to try to speed up those passes, we would need yet another version of your module.

This one-line change to the dump@ps.erl test file from #5140 triggers a pathological case in both the beam_ssa_bool and ssa_opt_sink passes:

- codecDump() -> fun (_@151) ->
+ codecDump() -> fun ({potato,_@151}) ->

The new file is available here: https://gist.github.com/drathier/709f7c5be3de1c1c345dc9fa93227dc2#file-booloptsink-ps-erl

To Reproduce
I ran this using the erlc from OTP 24.2, not any of the fixed versions from #5140, hence the disabled compiler pass (no_stack_trimming).

~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b$ date ; time ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump@ps.erl ; date
Sun Feb  6 17:20:25 CET 2022
Compiling dump@ps
 remove_file                   :      0.004 s       8.8 kB
 parse_module                  :      0.177 s    5464.8 kB
 transform_module              :      0.000 s    5464.8 kB
 lint_module                   :      0.018 s    5465.1 kB
 compile_directives            :      0.000 s    5465.1 kB
 expand_records                :      0.007 s    5465.1 kB
 core                          :     15.987 s   41614.2 kB
 sys_core_fold                 :      1.146 s   37987.0 kB
 sys_core_alias                :      0.016 s   37987.0 kB
 core_transforms               :      0.000 s   37987.0 kB
 sys_core_bsm                  :      0.013 s   37987.0 kB
 v3_kernel                     :      0.234 s   39501.6 kB
 beam_kernel_to_ssa            :      0.129 s   25512.6 kB
 beam_ssa_bool                 :      0.100 s   25512.6 kB
 beam_ssa_share                :      0.030 s   25512.6 kB
 beam_ssa_recv                 :      0.001 s   25512.6 kB
 beam_ssa_bsm                  :      0.113 s   25605.6 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.108 s  98 %
    annotate_context_parameters:      0.001 s   1 %
    combine_matches            :      0.001 s   1 %
    accept_context_args        :      0.000 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.039 s   25605.6 kB
 beam_ssa_opt                  :      2.939 s   25682.9 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_type_start         :      1.026 s  37 %
    ssa_opt_live               :      0.649 s  23 %
    ssa_opt_type_continue      :      0.354 s  13 %
    ssa_opt_dead               :      0.227 s   8 %
    ssa_opt_cse                :      0.103 s   4 %
    ssa_opt_merge_blocks       :      0.080 s   3 %
    ssa_opt_tail_phis          :      0.047 s   2 %
    ssa_opt_trim_unreachable   :      0.042 s   2 %
    ssa_opt_linearize          :      0.038 s   1 %
    ssa_opt_element            :      0.036 s   1 %
    ssa_opt_split_blocks       :      0.031 s   1 %
    ssa_opt_record             :      0.030 s   1 %
    ssa_opt_coalesce_phis      :      0.026 s   1 %
    ssa_opt_tail_calls         :      0.026 s   1 %
    ssa_opt_float              :      0.018 s   1 %
    ssa_opt_bsm                :      0.006 s   0 %
    ssa_opt_blockify           :      0.006 s   0 %
    ssa_opt_ne                 :      0.004 s   0 %
    ssa_opt_tuple_size         :      0.004 s   0 %
    ssa_opt_sink               :      0.003 s   0 %
    ssa_opt_get_tuple_element  :      0.003 s   0 %
    ssa_opt_bsm_shortcut       :      0.002 s   0 %
    ssa_opt_sw                 :      0.002 s   0 %
    ssa_opt_bc_size            :      0.002 s   0 %
    ssa_opt_try                :      0.001 s   0 %
    ssa_opt_bs_puts            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.080 s   25682.9 kB
 beam_ssa_pre_codegen          :     45.037 s   35647.5 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     30.430 s  68 %
    find_yregs                 :      7.902 s  18 %
    reserve_yregs              :      4.538 s  10 %
    live_intervals             :      0.983 s   2 %
    reserve_regs               :      0.203 s   0 %
    linear_scan                :      0.201 s   0 %
    use_set_tuple_element      :      0.154 s   0 %
    turn_yregs                 :      0.114 s   0 %
    frame_size                 :      0.113 s   0 %
    number_instructions        :      0.087 s   0 %
    sanitize                   :      0.084 s   0 %
    assert_no_critical_edges   :      0.057 s   0 %
    opt_get_list               :      0.050 s   0 %
    copy_retval                :      0.050 s   0 %
    fix_bs                     :      0.041 s   0 %
    match_fail_instructions    :      0.004 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.582 s   15231.0 kB
 beam_validator_strong         :      1.545 s   15231.0 kB
 beam_a                        :      0.009 s   15138.0 kB
 beam_block                    :      0.013 s   17779.6 kB
 beam_jump                     :      0.040 s   17779.6 kB
 beam_peep                     :      0.011 s   17779.6 kB
 beam_clean                    :      0.007 s   17779.6 kB
 beam_flatten                  :      0.003 s   15138.0 kB
 beam_z                        :      0.004 s   10328.9 kB
 beam_validator_weak           :      1.529 s   10328.9 kB
 beam_asm                      :      0.072 s       9.2 kB
 save_binary                   :      0.001 s       9.1 kB
dump@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump@ps.erl  66.78s user 2.34s system 95% cpu 1:12.14 total
Sun Feb  6 17:21:37 CET 2022
~/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b$ date ; time ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump2pattern@ps.erl ; date
Sun Feb  6 17:27:13 CET 2022
Compiling dump2pattern@ps
 remove_file                   :      0.000 s       9.3 kB
 parse_module                  :      0.180 s    5465.5 kB
 transform_module              :      0.000 s    5465.5 kB
 lint_module                   :      0.017 s    5465.9 kB
 compile_directives            :      0.000 s    5466.0 kB
 expand_records                :      0.007 s    5466.0 kB
 core                          :     15.958 s   58395.5 kB
 sys_core_fold                 :      0.889 s   53530.8 kB
 sys_core_alias                :      0.030 s   53530.8 kB
 core_transforms               :      0.000 s   53530.8 kB
 sys_core_bsm                  :      0.009 s   53530.8 kB
 v3_kernel                     :      0.214 s   55294.6 kB
 beam_kernel_to_ssa            :      0.143 s   29588.7 kB
 beam_ssa_bool                 :     27.002 s   29588.7 kB
 beam_ssa_share                :      0.023 s   29588.7 kB
 beam_ssa_recv                 :      0.001 s   29588.7 kB
 beam_ssa_bsm                  :      0.105 s   29681.7 kB
    %% Sub passes of beam_ssa_bsm from slowest to fastest:
    allow_context_passthrough  :      0.099 s  98 %
    annotate_context_parameters:      0.001 s   1 %
    combine_matches            :      0.001 s   1 %
    accept_context_args        :      0.000 s   0 %
    skip_outgoing_tail_extracti:      0.000 s   0 %
 beam_ssa_funs                 :      0.049 s   29681.7 kB
 beam_ssa_opt                  :    112.304 s   29757.8 kB
    %% Sub passes of beam_ssa_opt from slowest to fastest:
    ssa_opt_sink               :    107.902 s  96 %
    ssa_opt_type_start         :      0.961 s   1 %
    ssa_opt_type_continue      :      0.923 s   1 %
    ssa_opt_live               :      0.909 s   1 %
    ssa_opt_dead               :      0.688 s   1 %
    ssa_opt_cse                :      0.233 s   0 %
    ssa_opt_tail_phis          :      0.120 s   0 %
    ssa_opt_merge_blocks       :      0.096 s   0 %
    ssa_opt_trim_unreachable   :      0.051 s   0 %
    ssa_opt_record             :      0.041 s   0 %
    ssa_opt_tail_calls         :      0.034 s   0 %
    ssa_opt_split_blocks       :      0.033 s   0 %
    ssa_opt_element            :      0.033 s   0 %
    ssa_opt_linearize          :      0.032 s   0 %
    ssa_opt_coalesce_phis      :      0.023 s   0 %
    ssa_opt_float              :      0.018 s   0 %
    ssa_opt_ne                 :      0.009 s   0 %
    ssa_opt_tuple_size         :      0.007 s   0 %
    ssa_opt_blockify           :      0.006 s   0 %
    ssa_opt_bsm                :      0.004 s   0 %
    ssa_opt_try                :      0.004 s   0 %
    ssa_opt_get_tuple_element  :      0.004 s   0 %
    ssa_opt_bs_puts            :      0.002 s   0 %
    ssa_opt_sw                 :      0.002 s   0 %
    ssa_opt_bsm_shortcut       :      0.001 s   0 %
    ssa_opt_bc_size            :      0.001 s   0 %
    ssa_opt_type_finish        :      0.000 s   0 %
    ssa_opt_unfold_literals    :      0.000 s   0 %
 beam_ssa_throw                :      0.077 s   29757.8 kB
 beam_ssa_pre_codegen          :     44.902 s   39719.3 kB
    %% Sub passes of beam_ssa_pre_codegen from slowest to fastest:
    place_frames               :     30.782 s  69 %
    find_yregs                 :      7.563 s  17 %
    reserve_yregs              :      4.384 s  10 %
    live_intervals             :      0.980 s   2 %
    reserve_regs               :      0.197 s   0 %
    linear_scan                :      0.186 s   0 %
    use_set_tuple_element      :      0.148 s   0 %
    frame_size                 :      0.117 s   0 %
    sanitize                   :      0.116 s   0 %
    number_instructions        :      0.086 s   0 %
    turn_yregs                 :      0.085 s   0 %
    copy_retval                :      0.071 s   0 %
    opt_get_list               :      0.060 s   0 %
    assert_no_critical_edges   :      0.060 s   0 %
    fix_bs                     :      0.035 s   0 %
    match_fail_instructions    :      0.005 s   0 %
    fix_receives               :      0.001 s   0 %
    legacy_bs                  :      0.000 s   0 %
 beam_ssa_codegen              :      1.475 s   18807.2 kB
 beam_validator_strong         :      1.594 s   18807.2 kB
 beam_a                        :      0.011 s   18714.2 kB
 beam_block                    :      0.014 s   21355.8 kB
 beam_jump                     :      0.074 s   21355.7 kB
 beam_peep                     :      0.026 s   21355.7 kB
 beam_clean                    :      0.011 s   21355.7 kB
 beam_flatten                  :      0.003 s   18714.1 kB
 beam_z                        :      0.005 s   11580.8 kB
 beam_validator_weak           :      1.650 s   11580.8 kB
 beam_asm                      :      0.082 s       9.8 kB
 save_binary                   :      0.000 s      12.0 kB
/Users/drathier/Downloads/709f7c5be3de1c1c345dc9fa93227dc2-3cf560cf86da1ea4f9b1337e4f333cd787bd731b/dump2pattern@ps.erl: Module name 'dump@ps' does not match file name 'dump2pattern@ps'
dump2pattern@ps.erl:3:2: Warning: export_all flag enabled - all functions will be exported
%    3| -compile(export_all).
%     |  ^

ERL_COMPILER_OPTIONS="[time,no_stack_trimming]" erlc dump2pattern@ps.erl  199.84s user 2.99s system 97% cpu 3:27.81 total
Sun Feb  6 17:30:41 CET 2022
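
For reference, the 270x and 36000x figures in the title can be reproduced from the two timing dumps above. The following is a minimal Python sketch, not part of the original report: the helper names (parse_times, slowdowns) are made up for illustration, and only three timing lines from each run are copied in. Note that the ssa_opt_sink baseline of 0.003 s sits at the resolution of the timer, so that ratio is best read as an order of magnitude.

```python
import re

# Timing lines copied verbatim from the two erlc runs above
# (only the passes of interest, plus one unchanged pass as a control).
BASELINE = """
 beam_ssa_bool                 :      0.100 s   25512.6 kB
    ssa_opt_sink               :      0.003 s   0 %
 beam_ssa_pre_codegen          :     45.037 s   35647.5 kB
"""
WITH_PATTERN = """
 beam_ssa_bool                 :     27.002 s   29588.7 kB
    ssa_opt_sink               :    107.902 s  96 %
 beam_ssa_pre_codegen          :     44.902 s   39719.3 kB
"""

# Matches "<pass name> : <seconds> s" as printed by the compiler's time option.
LINE_RE = re.compile(r"^\s*(\w+)\s*:\s*([0-9.]+) s\b", re.MULTILINE)


def parse_times(log):
    """Map pass name -> seconds for every timing line in a log."""
    return {name: float(secs) for name, secs in LINE_RE.findall(log)}


def slowdowns(before, after):
    """Ratio after/before for passes present in both logs (nonzero baseline)."""
    b, a = parse_times(before), parse_times(after)
    return {p: a[p] / b[p] for p in b if p in a and b[p] > 0}


ratios = slowdowns(BASELINE, WITH_PATTERN)
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {ratio:10.1f}x")
```

beam_ssa_pre_codegen is included as a control: its time is essentially unchanged between the two runs, so the slowdown is specific to beam_ssa_bool and ssa_opt_sink rather than a general effect of the larger module.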

Affected versions
Only tested on 24.2.

Hope it's fun to debug!

Labels

bug, team:VM