Optimize printing sympy expressions during logging and cache key computation #151823

laithsakka · 2025-04-21T19:37:49Z

repro:


import torch
def _cumsum(o):
    ret = [0] * (len(o) + 1)
    for i in range(len(o)):
        ret[i + 1] = ret[i] + o[i]
    return ret

@torch.compile(dynamic=True)
def func(o):
    out = _cumsum(o)
    return out

func([i for i in range(2000)])

We have a fast print implementation used in inductor here:

pytorch/torch/_inductor/utils.py

Lines 652 to 667 in 625b4ed

    
def sympy_str(expr: sympy.Expr) -> str:
    """
    Normal sympy str is very slow, this is a lot faster.  The result are
    somewhat worse, as it doesn't do as much simplification.  So don't
    use this for final codegen.
    """
    if isinstance(expr, sympy.Symbol):
        return expr.name
    if isinstance(expr, sympy.Add):
        return " + ".join(map(sympy_str, expr.args))
    if isinstance(expr, sympy.Mul):
        return " * ".join(map(sympy_str, expr.args))
    if isinstance(expr, (ModularIndexing, CleanDiv, FloorDiv, Identity)):
        return f"{expr.func.__name__}({', '.join(map(sympy_str, expr.args))})"
    return str(expr)

Maybe we can reuse it?
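A rough way to compare the two printers outside of PyTorch is sketched below. It assumes that the repro ends up producing cumulative sums over many distinct symbols, which is an assumption about how the inputs are traced rather than something stated in this issue. `fast_str` is a trimmed copy of `sympy_str` without the inductor-only classes, and `cumulative_sums` and `time_printer` are hypothetical helpers introduced only for this illustration.

```python
import time

import sympy


def fast_str(expr):
    # Trimmed copy of sympy_str from the issue: the inductor-only classes
    # (ModularIndexing, CleanDiv, FloorDiv, Identity) are left out.
    if isinstance(expr, sympy.Symbol):
        return expr.name
    if isinstance(expr, sympy.Add):
        return " + ".join(map(fast_str, expr.args))
    if isinstance(expr, sympy.Mul):
        return " * ".join(map(fast_str, expr.args))
    return str(expr)


def cumulative_sums(n):
    # Hypothetical stand-in for the repro: the i-th entry is a sum of i symbols.
    syms = [sympy.Symbol(f"s{i}") for i in range(n)]
    sums = [sympy.Integer(0)]
    for s in syms:
        sums.append(sums[-1] + s)
    return sums


def time_printer(printer, exprs):
    # Returns (elapsed seconds, printed strings) for one pass over exprs.
    start = time.perf_counter()
    out = [printer(e) for e in exprs]
    return time.perf_counter() - start, out


if __name__ == "__main__":
    exprs = cumulative_sums(200)
    slow_s, _ = time_printer(str, exprs)
    fast_s, _ = time_printer(fast_str, exprs)
    print(f"str: {slow_s:.3f}s  fast_str: {fast_s:.3f}s")
```

The two printers are not guaranteed to emit terms in the same order or with the same parenthesization, so the timing is comparable but the strings may differ.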

profile:

https://fburl.com/scuba/pyperf_experimental/on_demand/vo6ru8ty

internal xref:
https://fb.workplace.com/groups/1075192433118967/permalink/23929961646604309/

Note that this part is currently disabled in the model compilation; we can enable it once this is fixed.

Even though it's not there, we still see a 10% cost for printing sympy expressions in full model compilation:
https://docs.google.com/document/d/1H-jueMz5VJuX6qVzyBl10OhlWWkxhAjp74JGtl7JhKg/edit?ouid=111904611073736927346&usp=docs_home&ths=true

cc @chauhang @penguinwu @ezyang @bobrenjc93


Teach the graph printer how to allow overriding printing SymTypes (`SymInt`, `SymFloat`, `SymBool`) and then use that to reuse the fast SymNode printing from `torch._inductor.utils.sympy_str()` to make computing the cache key faster. On my computer the repro from #151823 goes from 480s -> 80s (still terrible... but better). Fixes #151823 cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv voznesenskym penguinwu Guobing-Chen XiaobingSuper zhuhaozhe blzheng jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
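The override idea described above can be pictured with a small standalone sketch. Everything in it is hypothetical: `SymValue`, `GraphPrinter`, and `fast_sym_str` are stand-ins invented for illustration and are not the `torch.fx` API or the code from the linked PR. It only shows the shape of the approach, where a printer accepts an optional formatter for symbolic values and falls back to `str` when none is given.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SymValue:
    """Hypothetical stand-in for a symbolic value such as a SymInt."""

    terms: Tuple[str, ...]

    def __str__(self) -> str:
        # Pretend this is the slow default printer that canonicalizes first.
        return " + ".join(sorted(self.terms))


def fast_sym_str(value: SymValue) -> str:
    # Hypothetical fast path: print terms as stored, without extra work.
    return " + ".join(value.terms)


class GraphPrinter:
    """Hypothetical printer with an overridable symbolic-value formatter."""

    def __init__(self, sym_formatter: Optional[Callable[[SymValue], str]] = None):
        # Fall back to str() when no override is supplied.
        self._sym_formatter = sym_formatter or str

    def format_arg(self, arg) -> str:
        if isinstance(arg, SymValue):
            return self._sym_formatter(arg)
        return repr(arg)

    def format_node(self, name: str, target: str, args) -> str:
        rendered = ", ".join(self.format_arg(a) for a in args)
        return f"{name} = {target}({rendered})"


if __name__ == "__main__":
    value = SymValue(("s1", "s0"))
    print(GraphPrinter().format_node("add", "op", (value, 2)))
    print(GraphPrinter(fast_sym_str).format_node("add", "op", (value, 2)))
```

If the printed graph feeds a cache key, whichever formatter is plugged in has to be deterministic for equal expressions; readability matters much less than for final codegen.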

laithsakka assigned aorenste Apr 21, 2025

aorenste mentioned this issue Apr 22, 2025

Improve cache key graph printing performance #151928

Closed

bdhirsh added oncall: pt2 module: dynamic shapes labels Apr 22, 2025

eellison added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 28, 2025

pytorchmergebot closed this as completed in 7a0781e May 6, 2025
