Chunk better #45
base: master
Conversation
`parListChunk` previously split a list up into chunks, applied the given strategy to each chunk, and then put them all together again. This led to two extra copies of the list. We get very little benefit from actually splitting the list, because the parallel computations need to traverse their part anyway; we can instead just hand off the whole list and let them count out their chunk. We count each chunk twice, but that shouldn't cost enough to matter. Now that `Eval` has a `MonadFix` instance, we can avoid actually having to put together lists at the end; instead, we pass each parallel computation the (as-yet-uncomputed) result of calculating the rest of the list.
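The chunk-counting idea can be sketched without any parallelism, using only base. The names below are illustrative, not from the patch; both functions build the same chunks, but the second never materializes the `(prefix, suffix)` pairs that `splitAt` produces, which is the traversal scheme the patch hands to each worker:

```haskell
-- Old approach: physically split the list with splitAt.
chunksBySplitting :: Int -> [a] -> [[a]]
chunksBySplitting _ [] = []
chunksBySplitting n xs = as : chunksBySplitting n bs
  where (as, bs) = splitAt n xs

-- New approach: hand each "worker" the whole remaining list and let it
-- count out its own chunk; the rest is found again with drop, so each
-- chunk is counted twice but no intermediate pairs are built.
chunksByCounting :: Int -> [a] -> [[a]]
chunksByCounting _ [] = []
chunksByCounting n xs = countOut n xs : chunksByCounting n (drop n xs)
  where
    countOut :: Int -> [a] -> [a]
    countOut 0 _        = []
    countOut _ []       = []
    countOut k (a : as) = a : countOut (k - 1) as
```

In the patch itself the per-chunk results are additionally consed straight onto the (lazily demanded) result for the rest of the list, so no final concatenation is needed either.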
Great. Can we have some tests and benchmarks please?
@treeowl @simonmar for this benchmark on a 22-core Intel Core Ultra 7 Processor 155H:

```haskell
module Main where

import Control.Parallel.Strategies

totients :: Int -> Int -> [Int]
totients lower upper = map euler [lower, lower + 1 .. upper]

euler :: Int -> Int
euler n = length (filter (relprime n) [1 .. n - 1])

relprime :: Int -> Int -> Bool
relprime x y = hcf x y == 1

hcf :: Int -> Int -> Int
hcf x 0 = x
hcf x y = hcf y (rem x y)

main :: IO ()
main = do
  let result = sum (totients 1 100000 `using` parListChunk 100 rseq)
  putStrLn ("Sum of Totients between [1..100000] is " ++ show result)
```

With the current version of

With David's alternative implementation of

These max heap residency and total allocated results are consistent across multiple runs of this benchmark.

Eventlog file of current
Eventlog file of David's
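For reproduction, figures like these can be gathered with GHC's standard RTS switches. A sketch of a build recipe, assuming the benchmark above is saved as `Main.hs` and the `parallel` package is installed (on GHC ≥ 9.4 the `-eventlog` flag is a no-op, as eventlog support is always compiled in):

```shell
# Build with the threaded RTS and eventlog support
ghc -O2 -threaded -eventlog -rtsopts Main.hs -o totients

# Run on all capabilities; +RTS -s prints max residency and total
# allocation, and -l writes an eventlog for ThreadScope
./totients +RTS -N -s -l
```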
```haskell
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = as : chunk n bs where (as,bs) = splitAt n xs

parListChunk' :: Int -> (a -> Eval b) -> [a] -> Eval [b]
```
There's no point in defining a `parListChunk'`, just define `parListChunk` directly.
```haskell
parListChunk' n strat
  | n <= 1 = traverse strat
```
Suggested change:

```diff
-  | n <= 1 = traverse strat
+  | n <= 1 = evalList strat
```
`evalList` is defined as `traverse`, but it's clearer IMO.
```haskell
parListChunk' n0 strat = go n0
  where
    go !_n [] = pure []
    go n as = mdo
      -- Calculate the first chunk in parallel, passing it the result
      -- of calculating the rest
      bs <- rpar $ runEval $ evalChunk strat more n as
      -- Calculate the rest
      more <- go n (drop n as)
      return bs

-- | @evalChunk strat end n as@ uses @strat@ to evaluate the first @n@
-- elements of @as@ (ignoring the rest) and appends @end@ to the result.
evalChunk :: (a -> Eval b) -> [b] -> Int -> [a] -> Eval [b]
evalChunk strat = \end ->
  let
    go !_n [] = pure end
    go 0 _ = pure end
    go n (a:as) = (:) <$> strat a <*> go (n - 1) as
  in go
```
Is it possible to avoid the recursive do, by doing something like `go 0 more = pure $ runEval (parListChunk n strat more)`?
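For readers unfamiliar with `RecursiveDo`: the `mdo` block above relies on `MonadFix` to bring `more` into scope before the statement that computes it. A minimal base-only illustration of the same trick, with `Identity` standing in for `Eval` (everything here is illustrative, not from the patch):

```haskell
{-# LANGUAGE RecursiveDo #-}

import Control.Monad.Fix
import Data.Functor.Identity (Identity (..))

-- A binding may refer to its own result, as long as it is only
-- used lazily: here xs appears on both sides of its own binding.
ones :: [Int]
ones = runIdentity $ mdo
  xs <- Identity (1 : xs)  -- xs is in scope before it is "computed"
  return xs
```

`take 5 ones` yields `[1,1,1,1,1]`; the binding ties the knot just as `more` does in `go`.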
@robstewart57 thank you for the benchmarks! This change definitely improves memory usage.