Compile the following program with dalec a.dt -s ir -O4 --no-dale-stdlib (Dale was built against LLVM 6.0.1):
(import cstdio)
(def main (fn extern-c int (void)
(def sum (var auto \ 0))
(for (i \ 0) (< i 500000000) (incv i)
(setv sum (+ sum i)))
(printf "%d\n" sum)
0))
I get:
[...]
; Function Attrs: alwaysinline norecurse nounwind readonly
define weak_odr i32 @"_Z1$2bii"(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
%0 = add nsw i32 %b, %a
ret i32 %0
}
[...]
; Function Attrs: nounwind
define i32 @main() local_unnamed_addr #2 {
entry:
%0 = tail call i8 @"_Z1$3cii"(i32 0, i32 500000000)
%1 = and i8 %0, 1
%2 = icmp eq i8 %1, 0
br i1 %2, label %_dale_internal_label_breaklabel_2, label %_dale_internal_label_continuelabel_1.preheader
_dale_internal_label_continuelabel_1.preheader: ; preds = %entry
br label %_dale_internal_label_continuelabel_1
_dale_internal_label_continuelabel_1: ; preds = %_dale_internal_label_continuelabel_1.preheader, %_dale_internal_label_continuelabel_1
%.03 = phi i32 [ %3, %_dale_internal_label_continuelabel_1 ], [ 0, %_dale_internal_label_continuelabel_1.preheader ]
%.012 = phi i32 [ %4, %_dale_internal_label_continuelabel_1 ], [ 0, %_dale_internal_label_continuelabel_1.preheader ]
%3 = tail call i32 @"_Z1$2bii"(i32 %.03, i32 %.012)
%4 = tail call i32 @"_Z1$2bii"(i32 %.012, i32 1)
%5 = tail call i8 @"_Z1$3cii"(i32 %4, i32 500000000)
%6 = and i8 %5, 1
%7 = icmp eq i8 %6, 0
br i1 %7, label %_dale_internal_label_breaklabel_2, label %_dale_internal_label_continuelabel_1
_dale_internal_label_breaklabel_2: ; preds = %_dale_internal_label_continuelabel_1, %entry
%.0.lcssa = phi i32 [ 0, %entry ], [ %3, %_dale_internal_label_continuelabel_1 ]
%8 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_dvidrl0, i64 0, i64 0), i32 %.0.lcssa)
ret i32 0
}
[...]
Note the _Z1$2bii calls: the primitive operator functions are not inlined, despite being marked alwaysinline. This leads to very poor performance; on my machine the compiled program is 10x slower than an equivalent C program compiled with gcc.
After some investigation, I think this is caused by how dalec uses LLVM, not by LLVM itself: when run on the same IR, the LLVM command-line tools easily optimize the whole program into printing a constant. With the following commands,
> llvm-as a.dt.ll
> opt -O1 a.dt.bc -o O1.bc
> llvm-dis O1.bc
A file O1.ll is produced:
[...]
; Function Attrs: nounwind
define i32 @main() local_unnamed_addr #2 {
_dale_internal_label_breaklabel_2:
%0 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_dvidrl0, i64 0, i64 0), i32 1711656320)
ret i32 0
}
[...]
I also think --no-dale-stdlib should always be enabled. The Dale run-time functions are very small, and when they are not properly inlined, both the performance of compiled programs and compilation time (which includes running the macros) degrade significantly.