Speed up search by using parallel Glob and Binary Search for included-file checks #1122
967f290 to 8c20a3b
@yonaskolb is there anything I can do to help get this PR merged?
yonaskolb left a comment
Thanks for this @PaulTaykalo! Do you have before-and-after examples from a large project?
Could you please add a changelog entry as well?
Could you please add some tests for this?
8c20a3b to fcdb276
The SortedArray also sorts the elements it's initialised with; it would be good to test that too.
@keith are you still using this at Lyft? It would be great to see the results on your project, as it's the biggest I've heard of.
fcdb276 to 69353e0
I tested this out on the Instacart project and saw some modest gains: from around 11.64 seconds to 11.3 seconds. Certainly a move in the right direction, but perhaps not an example of the massive gains you may see elsewhere.
Here are some tests with a very large configuration of ours:
master xcodegen:
changed xcodegen:
So this definitely helps a bit, but maybe this just isn't a path we're hurt by much.
@keith it seems that your case is many configurations and many small projects. Am I right?
This test was 1 project, 2 configurations, a few thousand targets, tens of thousands of files |
😳 I wonder what is so different in my case.
Binary search when checking whether a file is contained in the included paths
In this PR, instead of linearly scanning the included paths for each file, we use binary search over a sorted collection.
This speeds up the checks, especially when many files are added (e.g. a few thousand).
Before this PR, checking whether files are included costs O(N * M), where N is the number of files to check and M is the number of included files.
This PR reduces that to O(N * log(M)), plus a one-time O(M * log(M)) sort of the included files, which significantly speeds things up on big projects.
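A minimal sketch of the idea, assuming a hypothetical generic `SortedArray` over `String` paths (the PR's actual type, and the path type it stores, may differ). The initializer sorts its input once, and `contains` does a lower-bound binary search instead of `Array.contains`'s linear scan.

```swift
// Hypothetical sketch of a sorted array with binary-search membership.
// Not XcodeGen's actual SortedArray implementation.
struct SortedArray<Element: Comparable> {
    private(set) var elements: [Element]

    // Sorts the elements it is initialised with, so lookups can use binary search.
    init<S: Sequence>(_ sequence: S) where S.Element == Element {
        elements = sequence.sorted()
    }

    // Returns the index of the first element that is not less than `value`.
    func lowerBound(of value: Element) -> Int {
        var low = 0
        var high = elements.count
        while low < high {
            let mid = low + (high - low) / 2
            if elements[mid] < value {
                low = mid + 1
            } else {
                high = mid
            }
        }
        return low
    }

    // O(log M) membership test instead of an O(M) linear scan.
    func contains(_ value: Element) -> Bool {
        let index = lowerBound(of: value)
        return index < elements.count && elements[index] == value
    }
}

// Usage: check N candidate files against M included paths in O(N * log(M)).
let included = SortedArray(["Sources/B.swift", "Sources/A.swift", "Sources/C.swift"])
let candidates = ["Sources/A.swift", "Sources/D.swift"]
let matches = candidates.filter { included.contains($0) }
print(matches) // ["Sources/A.swift"]
```

Because sorting happens once in the initializer, the O(M * log(M)) cost is paid a single time and then amortized across every lookup.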
Parallel glob invocations
This PR runs glob invocations in parallel, since calling them one after another on a single thread and waiting for each result wastes a lot of time.
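A rough sketch of running glob invocations concurrently. It assumes a hypothetical `matchFiles` stand-in that filters an in-memory file list with POSIX `fnmatch` rather than walking the disk, and a hypothetical `parallelMap` helper built on `DispatchQueue.concurrentPerform`. This is not necessarily how the PR implements it.

```swift
import Dispatch
import Foundation
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

// Hypothetical helper: runs `transform` for every item concurrently and
// returns the results in the original order.
func parallelMap<T, R>(_ items: [T], _ transform: (T) -> R) -> [R] {
    var results = [R?](repeating: nil, count: items.count)
    let lock = NSLock()
    DispatchQueue.concurrentPerform(iterations: items.count) { index in
        let value = transform(items[index])
        // Serialize writes to the shared results array.
        lock.lock()
        results[index] = value
        lock.unlock()
    }
    return results.map { $0! }
}

// Hypothetical stand-in for a real glob: matches an in-memory file list
// with POSIX fnmatch instead of scanning the file system.
let allFiles = ["Sources/App.swift", "Sources/Model.swift", "Resources/Info.plist", "Tests/AppTests.swift"]

func matchFiles(_ pattern: String) -> [String] {
    allFiles.filter { fnmatch(pattern, $0, 0) == 0 }
}

// Usage: expand each pattern on its own thread, then merge the results.
let patterns = ["Sources/*.swift", "Resources/*", "Tests/*.swift"]
let perPattern = parallelMap(patterns, matchFiles)
let includedFiles = Set(perPattern.joined())
print(perPattern)
// [["Sources/App.swift", "Sources/Model.swift"], ["Resources/Info.plist"], ["Tests/AppTests.swift"]]
```

Writing each result into its own index keeps the output in pattern order, so the merged file list stays deterministic regardless of which glob finishes first.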
Together, these two changes can give a significant speedup, especially on relatively big projects.
Before / After (benchmark screenshots)