Memory inefficient dense matrix compute in BaseSimilarity #770

@niekdejonge

Description

I started having a look at rewriting Scores and BaseSimilarity for a matchms 1.0 version.

Currently the BaseSimilarity.matrix method can return either a numpy array or a sparse array. A single method returning two data types quickly becomes confusing. In addition, the default compute method is a sparse calculation: for each score it stores the row and column, which is not memory efficient for a dense matrix. I understand that keep_score removes scores that are zero, but scores like ms2deepscore (and to some extent mod cosine) almost never predict 0, so in those cases this is a wasteful way of computing a matrix.
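To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It is not matchms code: the helper names and the dtypes (float64 scores, int64 indices) are assumptions for illustration, and the real overhead depends on the dtypes the sparse storage actually uses.

```python
import numpy as np


def dense_nbytes(n_rows, n_cols, score_dtype=np.float64):
    """Bytes needed to hold every score in one dense array."""
    return n_rows * n_cols * np.dtype(score_dtype).itemsize


def triplet_nbytes(n_stored, score_dtype=np.float64, index_dtype=np.int64):
    """Bytes needed when each stored score also keeps its row and column index."""
    per_entry = np.dtype(score_dtype).itemsize + 2 * np.dtype(index_dtype).itemsize
    return n_stored * per_entry


# Hypothetical case: 1000 x 1000 comparison where no score is exactly 0,
# so nothing is filtered out and every pair is stored.
n = 1000
print(dense_nbytes(n, n))      # dense storage
print(triplet_nbytes(n * n))   # row/column/score storage for the same scores
```

Under these assumed dtypes the row/column/score layout needs three times the memory of the dense array whenever no scores are dropped.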

Suggested fix:
Create two matrix compute methods:

matrix()

  • Should be a memory-efficient compute of the complete matrix (and always return a numpy matrix)
  • If for some reason you want a sparse array from this, you can use add_dense_matrix in the sparse_array class.
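A minimal sketch of what such a dense-only compute could look like. The function name compute_dense_matrix and the pair_score callable are hypothetical stand-ins, not the existing BaseSimilarity API; the point is only that the result array is preallocated and no row/column indices are stored.

```python
import numpy as np


def compute_dense_matrix(references, queries, pair_score, dtype=np.float64):
    """Fill a preallocated (n_references, n_queries) array, one score per pair."""
    scores = np.empty((len(references), len(queries)), dtype=dtype)
    for i, reference in enumerate(references):
        for j, query in enumerate(queries):
            scores[i, j] = pair_score(reference, query)
    return scores


def toy_score(a, b):
    """Toy similarity on plain floats, standing in for a real spectrum score."""
    return 1.0 / (1.0 + abs(a - b))


matrix = compute_dense_matrix([1.0, 2.0, 5.0], [1.0, 4.0], toy_score)
print(matrix.shape)
```

The return type is always a numpy array, so callers never need to check which kind of container they got back.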

sparse_array()

  • Is the old matrix compute, but with the option to set a filter range. In matrix, keep_score currently does the filtering at this step, but by default it only removes anything that is 0, so filter_by_range is applied only after a complete matrix has been created. This does not make sense from a memory standpoint. The filtering should instead happen during the matrix compute, by replacing keep_score with a within-range filter.
  • Should always return a sparse array
  • Should allow masking like current sparse_array, but if no mask is set it should compute a sparse array in the current matrix() way (still applying filtering).
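The proposal above could be sketched roughly as follows. Everything here is an assumption for illustration: the function name, the score_min/score_max parameters, the inclusive bounds, and the mask format (a pair of row and column index sequences) are not taken from the current matchms code. Scores outside the range are discarded as soon as they are computed, so a full matrix never exists in memory.

```python
import numpy as np


def compute_sparse_triplets(references, queries, pair_score,
                            score_min, score_max, mask=None):
    """Return (rows, cols, values) for scores with score_min <= score <= score_max.

    mask: optional (row_indices, col_indices) pair; if given, only those pairs
    are scored. Without a mask, every reference/query pair is scored.
    """
    if mask is None:
        pairs = ((i, j) for i in range(len(references))
                 for j in range(len(queries)))
    else:
        pairs = zip(*mask)

    rows, cols, values = [], [], []
    for i, j in pairs:
        score = pair_score(references[i], queries[j])
        # Filter during the compute instead of after building a full matrix.
        if score_min <= score <= score_max:
            rows.append(i)
            cols.append(j)
            values.append(score)
    return (np.array(rows, dtype=np.int64),
            np.array(cols, dtype=np.int64),
            np.array(values, dtype=np.float64))


def toy_score(a, b):
    """Toy similarity on plain floats, standing in for a real spectrum score."""
    return 1.0 / (1.0 + abs(a - b))


rows, cols, values = compute_sparse_triplets(
    [1.0, 2.0, 5.0], [1.0, 4.0], toy_score, score_min=0.5, score_max=1.0)
print(rows, cols, values)
```

The three returned arrays are the usual coordinate-format ingredients, so they could be handed to whichever sparse container is preferred.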

Labels: matchms_1_0 (Issues that lead to breaking changes and should be included in the matchms 1.0 release)