Open
Labels
matchms_1_0: Issues that lead to breaking changes and should be included in the matchms 1.0 release
Description
I started having a look at rewriting Scores and BaseSimilarity for a matchms 1.0 version.
Currently the BaseSimilarity.matrix method can return either a numpy array or a sparse array. Having a single method return two data types can quickly become confusing. In addition, the default compute method is a sparse calculation: for each score it stores the row and column index, which is not memory efficient for a dense matrix. I understand that keep_score removes scores that are zero, but scores like ms2deepscore (and to some extent modified cosine) almost never predict 0, so for these scores this is an odd way to compute a matrix.
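To put a rough number on the memory point, here is a minimal sketch that compares a dense float64 array with the same scores stored as (row, column, value) triplets. It assumes int64 indices and float64 values and uses random numbers in place of real scores; the actual layout in matchms may differ.

```python
import numpy as np

n_rows, n_cols = 1000, 1000
rng = np.random.default_rng(0)
# Stand-in scores with no exact zeros, as is typical for ms2deepscore-like scores.
scores = rng.uniform(0.01, 1.0, size=(n_rows, n_cols))

# Dense storage: one float64 per entry.
dense_bytes = scores.nbytes

# Triplet storage: row index, column index and value for every kept entry.
rows, cols = np.nonzero(scores)
rows = rows.astype(np.int64)  # assumed index dtype
cols = cols.astype(np.int64)
values = scores[rows, cols]
coo_bytes = rows.nbytes + cols.nbytes + values.nbytes

print(dense_bytes, coo_bytes, coo_bytes / dense_bytes)
```

When no score is filtered out, the triplet form needs three times the memory of the dense array under these assumptions.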
Suggested fix:
Create two matrix compute methods:
matrix()
- Should be a memory-efficient computation of the complete matrix (and always return a numpy array).
- If for some reason you want a sparse array from this, you can just use add_dense_matrix in the sparse_array class.
sparse_array()
- Should always return a sparse array.
- Is the old matrix compute, but with the option to set a filter range. In matrix, keep_score currently does the filtering during the compute, but by default it only removes scores that are exactly 0. filter_by_range is therefore applied only after a complete matrix has been created, which does not make sense from a memory standpoint. Instead, the filtering should happen during the matrix compute, by replacing keep_score with a within-range filter.
- Should allow masking like the current sparse_array, but if no mask is set it should compute a sparse array the way matrix() currently does (still applying the filtering).
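A rough sketch of how the two methods could be split is below. This is not the current matchms code: pair_score is a hypothetical stand-in for a similarity's pair computation, the score_min/score_max parameters and their inclusive bounds are assumptions, and the sparse result is returned as plain (rows, cols, values) arrays rather than the real sparse class.

```python
import numpy as np


def pair_score(reference, query):
    """Hypothetical stand-in for the pair score of a similarity measure."""
    return 1.0 / (1.0 + abs(reference - query))


def matrix(references, queries):
    """Dense compute: always returns a numpy array of shape (n_ref, n_query)."""
    scores = np.empty((len(references), len(queries)), dtype=np.float64)
    for i, reference in enumerate(references):
        for j, query in enumerate(queries):
            scores[i, j] = pair_score(reference, query)
    return scores


def sparse_array(references, queries, score_min=0.0, score_max=1.0, mask=None):
    """Sparse compute: filters while computing, never builds the full matrix.

    mask is an optional (row_indices, col_indices) pair selecting which
    pairs to compute. Returns (rows, cols, values) arrays.
    """
    if mask is None:
        pairs = ((i, j) for i in range(len(references))
                 for j in range(len(queries)))
    else:
        pairs = zip(*mask)
    rows, cols, values = [], [], []
    for i, j in pairs:
        score = pair_score(references[i], queries[j])
        if score_min <= score <= score_max:  # assumed inclusive range filter
            rows.append(i)
            cols.append(j)
            values.append(score)
    return (np.array(rows, dtype=np.int64),
            np.array(cols, dtype=np.int64),
            np.array(values, dtype=np.float64))


references = [100.0, 200.0, 300.0]
queries = [100.0, 250.0]
dense = matrix(references, queries)
rows, cols, values = sparse_array(references, queries, score_min=0.015)
masked = sparse_array(references, queries,
                      mask=(np.array([0, 2]), np.array([0, 0])))
```

With this split each method has one return type, and the range filter is applied per pair, so scores outside the range are never stored.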
florian-huber