Chunk better #45

Open: treeowl wants to merge 1 commit into master

Conversation

@treeowl (Contributor) commented Jun 21, 2018

parListChunk previously split a list up into chunks, applied
the given strategy to each chunk, and then put them all together
again. This led to two extra copies of the list.

We get very little benefit from actually splitting the list, because the
parallel computations need to traverse their part anyway; we can instead just
hand off the whole list and let them count out their chunk. We count each
chunk twice, but that shouldn't cost enough to matter.

Now that Eval has a MonadFix instance, we can avoid actually having
to put together lists at the end; instead, we pass each parallel
computation the (as-yet-uncomputed) result of calculating the rest
of the list.
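
To make the idea concrete, here is a minimal pure sketch (base only, no sparks) of handing each chunk the whole remaining list together with the lazily computed result for everything after it. The names `mapChunkOnto` and `chunkedMap` are hypothetical and not part of the PR. In pure code ordinary laziness is enough to pass along the not-yet-computed tail; the PR needs `MonadFix` because in `Eval` the rest of the list is produced by a later monadic step.

```haskell
module Main where

-- | Apply @f@ to the first @n@ elements of the list (ignoring the rest)
-- and put @end@ after them. No intermediate chunk list is built.
mapChunkOnto :: (a -> b) -> [b] -> Int -> [a] -> [b]
mapChunkOnto f end = go
  where
    go _ []       = end
    go 0 _        = end
    go n (a : as) = f a : go (n - 1) as

-- | Hypothetical pure analogue of the chunked traversal: every chunk
-- receives the whole remaining list plus the (lazy) result for the rest.
chunkedMap :: Int -> (a -> b) -> [a] -> [b]
chunkedMap n f
  | n <= 1    = map f
  | otherwise = go
  where
    go [] = []
    -- The chunk is counted twice: once by mapChunkOnto, once by drop.
    go as = mapChunkOnto f (go (drop n as)) n as

main :: IO ()
main = print (chunkedMap 3 (* 2) [1 .. 10 :: Int])
-- prints [2,4,6,8,10,12,14,16,18,20]
```

Each chunk is walked once by `mapChunkOnto` and once by `drop`, which is the double counting mentioned above.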

@simonmar
Member

Great. Can we have some tests and benchmarks please?

@robstewart57

@treeowl @simonmar Here are results for this benchmark, run on a 22 core Intel Core Ultra 7 Processor 155H:

module Main where

import Control.Parallel.Strategies

totients :: Int -> Int -> [Int]
totients lower upper = map euler [lower, lower + 1 .. upper]

euler :: Int -> Int
euler n = length (filter (relprime n) [1 .. n - 1])

relprime :: Int -> Int -> Bool
relprime x y = hcf x y == 1

hcf :: Int -> Int -> Int
hcf x 0 = x
hcf x y = hcf y (rem x y)

main :: IO ()
main = do
  let result = sum (totients 1 100000 `using` parListChunk 100 rseq)
  putStrLn ("Sum of Totients between [1..100000] is " ++ show result)

With the current version of parListChunk:

  • Max heap size: 122MB
  • Max heap residency: 2.9MB
  • Total allocated: 32.2MB
  • Total runtime: 26.51s
  • Sparks: 1000 created, all converted

With David's alternative implementation of parListChunk in 26b1f48:

  • Max heap size: 120MB
  • Max heap residency: 234KB
  • Total allocated: 14.8MB
  • Total runtime: 21.6s
  • Sparks: 1000 created, all converted

These max heap residency and total allocated results are consistent across multiple runs of this benchmark.
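
A rough worked comparison of the two runs, using only the figures reported above. `percentReduction` is a hypothetical helper, and the residency comparison assumes 1 MB = 1000 KB, since the report does not say which convention applies.

```haskell
module Main where

-- | Percentage by which @new@ is smaller than @old@.
percentReduction :: Double -> Double -> Double
percentReduction old new = 100 * (old - new) / old

main :: IO ()
main = do
  -- Total allocated: 32.2MB before, 14.8MB after.
  putStrLn ("Allocation reduced by (%): " ++ show (percentReduction 32.2 14.8))
  -- Total runtime: 26.51s before, 21.6s after.
  putStrLn ("Runtime reduced by (%):    " ++ show (percentReduction 26.51 21.6))
  -- Max residency: 2.9MB before, 234KB after (assuming 1 MB = 1000 KB).
  putStrLn ("Residency reduced by (%):  " ++ show (percentReduction 2900 234))
```

That works out to roughly 54% less allocation, about 18.5% less runtime, and about 92% lower maximum residency for this one benchmark.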

Eventlog file of current parListChunk implementation: https://www.dropbox.com/scl/fi/wb63qk0sshhv2rrc3dpnk/haskell-totient-upstream-1-100k.eventlog?rlkey=y085sy3j8ikid1znqdgvlnhy3&st=2yulnu9d&dl=0

Eventlog file of David's parListChunk implementation: https://www.dropbox.com/scl/fi/4nexsaenlp6hcglt8ev76/haskell-totient-DavidFeuer-1-100k.eventlog?rlkey=qut2znzoo8b5f36ydwbv7a18n&st=jm538eba&dl=0

chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = as : chunk n bs where (as,bs) = splitAt n xs
parListChunk' :: Int -> (a -> Eval b) -> [a] -> Eval [b]
Collaborator

There's no point in defining a parListChunk'; just define parListChunk directly.

chunk n xs = as : chunk n bs where (as,bs) = splitAt n xs
parListChunk' :: Int -> (a -> Eval b) -> [a] -> Eval [b]
parListChunk' n strat
  | n <= 1 = traverse strat
Collaborator

Suggested change
-  | n <= 1 = traverse strat
+  | n <= 1 = evalList strat

evalList is just traverse specialized to lists, but it is clearer IMO.
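
A quick way to see that the suggested change keeps behaviour the same is to run both forms on the same input. This is a minimal sketch assuming the `parallel` package is installed; it checks only the results, not evaluation order.

```haskell
module Main where

import Control.Parallel.Strategies (evalList, rseq, runEval)

main :: IO ()
main = do
  let xs          = [1 .. 10] :: [Int]
      -- Plain traverse over the list in the Eval monad.
      viaTraverse = runEval (traverse rseq xs)
      -- The named list strategy combinator from the library.
      viaEvalList = runEval (evalList rseq xs)
  print (viaTraverse == viaEvalList)
```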

Comment on lines +593 to +613
parListChunk' n0 strat = go n0
  where
    go !_n [] = pure []
    go n as = mdo
      -- Calculate the first chunk in parallel, passing it the result
      -- of calculating the rest
      bs <- rpar $ runEval $ evalChunk strat more n as

      -- Calculate the rest
      more <- go n (drop n as)
      return bs

-- | @evalChunk strat end n as@ uses @strat@ to evaluate the first @n@
-- elements of @as@ (ignoring the rest) and appends @end@ to the result.
evalChunk :: (a -> Eval b) -> [b] -> Int -> [a] -> Eval [b]
evalChunk strat = \end ->
  let
    go !_n [] = pure end
    go 0 _ = pure end
    go n (a:as) = (:) <$> strat a <*> go (n - 1) as
  in go
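
For anyone who wants to try the excerpt outside the library, here is a self-contained sketch that wraps it in a module. The pragmas, imports, and `main` are additions for illustration, not part of the diff, and it assumes a version of the `parallel` package in which `Eval` has a `MonadFix` instance.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE RecursiveDo #-}
module Main where

import Control.Parallel.Strategies (Eval, rpar, rseq, runEval)

parListChunk' :: Int -> (a -> Eval b) -> [a] -> Eval [b]
parListChunk' n strat
  | n <= 1 = traverse strat
parListChunk' n0 strat = go n0
  where
    go !_n [] = pure []
    go n as = mdo
      -- Spark the first chunk, handing it the (not yet computed) rest.
      bs <- rpar $ runEval $ evalChunk strat more n as
      -- Compute the rest; this is what ties the knot for 'more'.
      more <- go n (drop n as)
      return bs

-- | Evaluate the first @n@ elements with @strat@ and append @end@.
evalChunk :: (a -> Eval b) -> [b] -> Int -> [a] -> Eval [b]
evalChunk strat = \end ->
  let
    go !_n [] = pure end
    go 0 _ = pure end
    go n (a:as) = (:) <$> strat a <*> go (n - 1) as
  in go

main :: IO ()
main = do
  let xs = [1 .. 1000] :: [Int]
      ys = runEval (parListChunk' 100 (rseq . (* 2)) xs)
  -- Sanity check: chunked evaluation must not change the contents.
  print (ys == map (* 2) xs)
```

To see any parallelism, the program would need to be built with `-threaded` and run with `+RTS -N`.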
Collaborator

Is it possible to avoid the recursive do by doing something like go 0 more = pure $ runEval (parListChunk n strat more)?
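
One possible reading of this suggestion, as a sketch only: `parListChunkNoMdo` is a hypothetical name, the code is not from the PR, and it assumes the `parallel` package. It produces the same list without `mdo`, but the tail of each chunk becomes a lazy thunk, so the spark for the next chunk appears to be created only once a consumer reaches the chunk boundary. That may reduce parallelism compared with the `mdo` version, which creates every spark up front.

```haskell
module Main where

import Control.Parallel.Strategies (Eval, rpar, rseq, runEval)

-- | Hypothetical variant without recursive do (not part of the PR).
parListChunkNoMdo :: Int -> (a -> Eval b) -> [a] -> Eval [b]
parListChunkNoMdo n strat
  | n <= 1    = traverse strat
  | otherwise = start
  where
    -- Spark one chunk; the sparked computation continues into the rest.
    start [] = pure []
    start as = rpar (runEval (go n as))

    go _ []         = pure []
    -- Chunk finished: the remainder is left as an unevaluated thunk.
    go 0 more       = pure (runEval (start more))
    go k (a : rest) = (:) <$> strat a <*> go (k - 1) rest

main :: IO ()
main = do
  let xs = [1 .. 1000] :: [Int]
  -- Sanity check on the result only; this says nothing about sparks.
  print (runEval (parListChunkNoMdo 100 rseq xs) == xs)
```

Comparing spark counts and timings (for example with `+RTS -s`) against the `mdo` version would show whether the difference matters in practice.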

@konsumlamm
Collaborator

@robstewart57 thank you for the benchmarks! This change definitely improves memory usage.
