Description
I think it is, or soon will be, relatively common for some calculations to be too large to fit in memory. Sometimes an intermediate step is too large even though the final result is sparse. In these cases, partitioning the data to perform calculations in batches is probably a "good enough", pragmatic solution. I would like us to come up with ways to make chunkwise computations easier and clearer.
@ilya-antonov has a use case that involves matrix-matrix multiplies followed by a select. The matrix-matrix multiply (without masking) sometimes produces very large matrices, and the select makes the result sparse again, so this currently requires chunking. Perhaps someday GraphBLAS implementations will automatically detect this pattern and fuse the kernels to keep the intermediate sparse, but that does not exist today.
We sometimes need to do batching in graphblas-algorithms too, because an object can be too large. We have some helper functions for this; for example, they are used by square_clustering.
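For illustration only, here is a guess at the general shape of such a batching helper. This is a hypothetical sketch, not the actual graphblas-algorithms code; `chunk_ranges` is a made-up name:

```python
from graphblas import Matrix

def chunk_ranges(n, chunksize):
    """Yield (start, stop) pairs that cover range(n) in batches (illustrative)."""
    for start in range(0, n, chunksize):
        yield start, min(start + chunksize, n)

# Hypothetical usage: process the rows of a Matrix in batches
A = Matrix(int, 10_000, 10_000)
for start, stop in chunk_ranges(A.nrows, 1000):
    block = A[start:stop, :].new()  # extract one batch of rows
```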
For some applications, it may be best to chunk the data before creating GraphBLAS objects.
Other times, we may want to chunk existing objects into smaller pieces. Here is an example:
# Original
M_ll = select.valueeq(C_l @ A @ C_l.T, l * l).new(dtype=bool)
# Chunkwise
import itertools
from graphblas import binary, select, Matrix
concat = itertools.chain.from_iterable
N = 100000
k = 30
l = 5
# Create example matrices (pretend they have data :)
A = Matrix(int, N, N)
C_l = Matrix(int, k, N)
# Option 1: split A along rows and C_l along columns, and accumulate results in M_ll
chunksize = 1000
C_l_chunks = C_l.ss.split((None, chunksize))[0]
A_chunks = concat(A.ss.split((chunksize, None)))
M_ll = Matrix(bool, k, k)
for C_l_chunk, A_chunk in zip(C_l_chunks, A_chunks):
# Which matrix multiply should we do first: with C_l_chunk or C_l.T ?
M_ll(binary.any) << select.valueeq(C_l_chunk @ A_chunk @ C_l.T, l * l)
Can we make this chunking clearer? For example:
M_ll = Matrix(bool, k, k)
for C_l_chunk, A_chunk in Chunkwise(chunksize).chunk_columns(C_l).chunk_rows(A):
M_ll(binary.any) << select.valueeq(C_l_chunk @ A_chunk @ C_l.T, l * l)
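For concreteness, here is a minimal sketch of what such a `Chunkwise` helper could look like, built on `Matrix.ss.split`; the class and its methods are hypothetical, taken from the example above, not an existing API:

```python
from graphblas import Matrix

class Chunkwise:
    """Iterate over aligned chunks of several matrices (hypothetical sketch)."""

    def __init__(self, chunksize):
        self.chunksize = chunksize
        self._chunk_lists = []

    def chunk_columns(self, A):
        # ss.split returns a 2-d list of tiles; with no row split,
        # [0] is the single row holding the column chunks.
        self._chunk_lists.append(A.ss.split((None, self.chunksize))[0])
        return self

    def chunk_rows(self, A):
        # With no column split, each row of tiles has a single element.
        tiles = A.ss.split((self.chunksize, None))
        self._chunk_lists.append([row[0] for row in tiles])
        return self

    def __iter__(self):
        # Yield tuples of aligned chunks, in registration order.
        return iter(zip(*self._chunk_lists))
```

With this shape, the loop above works as written, and the methods could grow options, e.g., for alignment or for keeping the original dimensions.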
If operations use the indices of any of the chunks, then we would want each chunk to keep the shape of the original object so the indices still line up; how can we indicate this?
If we are multiplying two matrices together of sizes `m x n` and `n x p`, there are many ways to split up the calculation:
- Split on `n` (like above) and accumulate into a result that is the final size
- Split on `m` and concat rows into the final result (see the sketch after this list)
- Split on `p` and concat columns into the final result
- Split on `m` and `p`, and concat a 2-d list of lists of results into the final result
- etc.
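As a sketch of the split-on-`m` strategy, assuming SuiteSparse-backed objects so `Matrix.ss.split` and `gb.ss.concat` are available (the sizes here are made up):

```python
import graphblas as gb
from graphblas import Matrix

m, n, p = 4000, 3000, 2000
A = Matrix(float, m, n)  # pretend these have data :)
B = Matrix(float, n, p)

chunksize = 1000
row_tiles = []
for (A_chunk,) in A.ss.split((chunksize, None)):
    # Each A_chunk is (<= chunksize) x n, so each product is (<= chunksize) x p
    row_tiles.append([(A_chunk @ B).new()])
# Stack the row chunks back into the final m x p result
C = gb.ss.concat(row_tiles)
```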
Where is the "sweet spot" of functionality that is useful but not overly complicated?
An alternative approach would be to create our own functions that serve as "fused kernels"; for example, we could create `matrix_multiply_select(...)` that performs the chunking internally. I think it could be very handy to build up a library of useful functions like this.
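As a hedged sketch of that idea, reusing the row-split-and-concat strategy from above; the function and its signature are hypothetical:

```python
import graphblas as gb
from graphblas import select

def matrix_multiply_select(A, B, value, *, chunksize=1000):
    """Compute select.valueeq(A @ B, value) without materializing all of A @ B.

    Hypothetical sketch: chunk A along rows so the dense intermediate is at
    most chunksize x B.ncols, and apply the select immediately per chunk.
    """
    tiles = []
    for (A_chunk,) in A.ss.split((chunksize, None)):
        # select is elementwise, so selecting per row chunk and concatenating
        # gives the same result as selecting on the full product
        tiles.append([select.valueeq(A_chunk @ B, value).new()])
    return gb.ss.concat(tiles)
```

A library of such functions could also cover triple products like the `C_l @ A @ C_l.T` example above.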