"from components" to encapsulate common to_coo() / from_coo() recipes #253

Open
@eriknw

Description

Edit: note that to_values is deprecated; use to_coo instead

We have found that we sometimes call to_values() followed by from_values() to perform operations such as creating a diagonal matrix or using the values as indices.
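As a rough illustration of this round trip, here is a minimal plain-Python sketch that builds a diagonal matrix from a sparse vector by going through COO-style lists. The dict-based containers and the helper names `vector_to_coo` and `matrix_from_coo` are hypothetical stand-ins for illustration only; they are not the python-graphblas API.

```python
def vector_to_coo(vec):
    """Return (indices, values) for a sparse vector stored as {index: value}."""
    indices = sorted(vec)
    values = [vec[i] for i in indices]
    return indices, values


def matrix_from_coo(rows, cols, vals):
    """Build a sparse matrix stored as {(row, col): value} from COO triples."""
    if not (len(rows) == len(cols) == len(vals)):
        raise ValueError("rows, cols, and vals must have the same length")
    return dict(zip(zip(rows, cols), vals))


# Sparse vector with entries at positions 0, 2, and 3.
v = {0: 10, 2: 20, 3: 30}

# "to_coo" followed by "from_coo": reuse the indices as both the row and the
# column so that each value lands on the diagonal.
idx, vals = vector_to_coo(v)
diag = matrix_from_coo(idx, idx, vals)
```

The point of the sketch is only that the same extracted components get reused in a different role, which is the pattern this issue proposes to name.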

We may want to give this common(-ish) pattern a name. For example:

Matrix:
    @classmethod
    def from_components(
        cls,
        row_obj,
        col_obj,
        val_obj,
        dtype=None,
        *,
        row_how="row",
        col_how="col",
        val_how="val",
        dup_op=None,
        nrows=None,
        ncols=None,
        name=None,
    ):
        ...

Vector:
    @classmethod
    def from_components(
        cls,
        index_obj,
        val_obj,
        dtype=None,
        *,
        index_how="index",
        dup_op=None,
        size=None,
        name=None,
    ):
        ...

For example, in generalized_degrees (python-graphblas/graphblas-algorithms#11), we compute a number for each edge and then build a histogram of counts in which the column index is the count:

    rows, cols, vals = Tri.to_values()
    # The column index indicates the number of triangles an edge participates in.
    # The largest this can be is `A.ncols - 1`.  Each value is a count of edges.
    return Matrix.from_values(
        rows,
        vals,
        np.ones(vals.size, dtype=int),
        dup_op=binary.plus,
        nrows=A.nrows,
        ncols=A.ncols - 1,
        name="generalized_degree",
    )

would be written as

    return Matrix.from_components(
        Tri,
        Tri,
        1,
        col_how="val",
        dup_op=binary.plus,
        ncols=Tri.ncols - 1,
        name="generalized_degree",
    )
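To make the intended semantics concrete, here is a plain-Python sketch of what the proposed call is meant to compute: each stored entry (row, col, val) of Tri contributes 1 at position (row, val), and duplicates are summed. The dict-based matrix and the function name `generalized_degree_sketch` are assumptions made for illustration, not part of any existing library.

```python
from collections import defaultdict


def generalized_degree_sketch(tri):
    """tri maps (row, col) -> triangle count for that edge.

    Returns {(row, count): number_of_edges}, modeling col_how="val" with a
    scalar value of 1 and dup_op=plus.
    """
    result = defaultdict(int)
    for (row, _col), count in tri.items():
        # The edge's value becomes the new column index; duplicates add up.
        result[(row, count)] += 1
    return dict(result)


# Node 0 has two edges in 1 triangle each and one edge in 2 triangles.
tri = {(0, 1): 1, (0, 2): 1, (0, 3): 2, (1, 0): 1}
hist = generalized_degree_sketch(tri)
```

Here `hist[(0, 1)]` is 2 because two of node 0's edges each participate in exactly one triangle.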

row_how, col_how, and val_how should be one of {"index", "row", "col", "val"}. For Vectors, we treat "row" and "col" the same as "index". It is an error to use "index" for Matrix objects.

By default, nrows and ncols are inferred from the source object and the corresponding *_how argument. For example, for nrows:

  • row_how == "row": nrows = row_obj.nrows
  • row_how == "col": nrows = row_obj.ncols
  • row_how == "val": nrows = max(row_obj) + 1
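The rule above could be sketched as a small helper. This is only an illustration under stated assumptions: the matrix is modeled as a plain dict with `nrows`, `ncols`, and `entries` keys, the name `infer_nrows` is hypothetical, and returning 0 for an object with no stored values is a choice made here, not something the proposal specifies.

```python
def infer_nrows(row_obj, row_how):
    """Sketch of the default nrows rule for a matrix modeled as a dict.

    row_obj is assumed to look like
    {"nrows": int, "ncols": int, "entries": {(row, col): value}}.
    """
    if row_how == "row":
        return row_obj["nrows"]
    if row_how == "col":
        return row_obj["ncols"]
    if row_how == "val":
        values = row_obj["entries"].values()
        # Largest value used as an index, plus one.
        # Assumption: an object with no stored values gives 0.
        return max(values, default=-1) + 1
    # "index" is only meaningful for Vectors, so it is rejected here.
    raise ValueError(f"invalid row_how for a Matrix: {row_how!r}")
```

For a 3-by-5 matrix whose largest stored value is 7, the three modes give 3, 5, and 8 respectively.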

row_obj, col_obj, and val_obj must match in type and structure, but val_obj may also be a scalar. We will verify types and nvals, but not structure (behavior is undefined if the structures don't match). Note that a Matrix can be created from Vector components, and vice versa.

Values must have an integral dtype when used as indices. For signed integer dtypes, we should take the min to check for negative indices. If any are negative, should we raise (the easiest to start with), or treat them as negative offsets from the end?
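A minimal sketch of the "raise" option, which is only one of the two behaviors under discussion here, might look like the following. The function name `validate_indices` is hypothetical, and plain Python sequences stand in for typed arrays.

```python
import operator


def validate_indices(values):
    """Return values as a list of ints; raise on non-integral or negative input."""
    try:
        # operator.index accepts only integer-like objects (floats are rejected).
        indices = [operator.index(v) for v in values]
    except TypeError as exc:
        raise TypeError("values used as indices must be integral") from exc
    # Take the min once to detect any negative index.
    if indices and min(indices) < 0:
        raise ValueError("negative values cannot be used as indices")
    return indices
```

Supporting negative offsets from the end instead would replace the `ValueError` branch with a shift by the relevant dimension, which requires that dimension to be known before the check runs.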

Until this is implemented, let's try to gather examples here where we would use this function or similar functions. Maybe we'll find different variations, better APIs, better names, or better recipes.

A potential goal is to get this functionality implemented in C or added to the C spec (if it is sufficiently justified), which may need to be less flexible than our Python API. But we won't know whether such functionality is useful until we try to encapsulate it in a function in Python!

Alright, let's gather examples (from LAGraph, grblas-recipes, etc) to see if this function works...


Labels: discussion, feature
