Preparations for multivariate plotting #29877

trygvrad · 2025-04-06T14:08:18Z

PR summary

This PR continues the work of #28658, #28454 and #29876, aiming to close #14168 (Feature request: Bivariate colormapping).

This is part two of the former PR, #29221, and builds upon #29876. Please see #29221 for the previous discussion.

#29876 includes:

A MultiNorm class. This is a subclass of colors.Normalize and holds n_variate norms.
Testing of the MultiNorm class

This PR includes:

changes to colorizer.py needed to expose the MultiNorm class

Features not included in this PR:

Exposing the functionality provided by MultiNorm together with BivarColormap and MultivarColormap to the plotting functions axes.imshow(...), axes.pcolor(...), and axes.pcolormesh(...)
Testing of the new plotting methods
Examples in the docs

This commit introduces the MultiNorm class to prepare for the introduction of multivariate plotting methods.

oscargus · 2025-04-06T16:02:00Z

lib/matplotlib/cbook.py

+            return x
+        else:
+            # in case of a dtype with multiple fields:
+            try:


Would be good to get at least partial coverage for this branch.

I haven't really been involved in this work nor understand how it works, but there is quite a bit of introduced code to deal with multiple datatypes? If this will be covered by tests/functionality in later PRs, that is fine, if not, please add tests for (most of) it.

I was asked to split #29221 into multiple PRs, and this PR is one of them.
There are tests for this functionality in #29221 using the top-level plotting functions (axes.imshow() etc.).
In my mind it is better to test using the top-level API, but if you wish I could add dedicated testing to this PR.

anntzer · 2025-04-10T10:58:59Z

lib/matplotlib/colorizer.py

+            if self.norm.n_output != cmap_obj.n_variates:
+                raise ValueError(f"The colormap {cmap} does not support "
+                                 f"{self.norm.n_output} variates as required by "
+                                 f"the {type(self.norm)} on this Colorizer.")


Error messages typically have no end dot (same comment applies throughout).

Thanks, I'll need to change this in the other PR as well.

anntzer · 2025-04-10T11:01:54Z

lib/matplotlib/cbook.py

+                mask = np.empty(x.shape, dtype=np.dtype('bool, '*len(x.dtype.descr)))
+                for dd, dm in zip(x.dtype.descr, mask.dtype.descr):
+                    mask[dm[0]] = ~(np.isfinite(x[dd[0]]))
+                xm = np.ma.array(x, mask=mask, copy=False)


Do numpy masked arrays actually support struct arrays as mask, with possibly different masking of the fields?

I have found that this is the only way numpy supports masking dtypes with multiple fields, but I will see if [("mask", bool, len(x.dtype.descr))], as you suggest below, is a reasonable approach to using a single mask.
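
For readers following along, the per-field masking discussed in this thread can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in cbook.safe_masked_invalid(): the helper name `mask_invalid_per_field` and the field names `v0`/`v1` are made up for the example, and every field is assumed to hold floats.

```python
import numpy as np


def mask_invalid_per_field(x):
    """Hypothetical helper: mask non-finite values separately in each field."""
    # One boolean field per data field, mirroring the layout numpy uses
    # for the mask of a structured masked array.
    mask = np.zeros(x.shape, dtype=[(name, bool) for name in x.dtype.names])
    for name in x.dtype.names:
        mask[name] = ~np.isfinite(x[name])
    return np.ma.array(x, mask=mask)


# Two variates with invalid entries at different positions.
x = np.zeros(3, dtype=[("v0", float), ("v1", float)])
x["v0"] = [1.0, np.nan, 3.0]
x["v1"] = [np.inf, 2.0, 3.0]
xm = mask_invalid_per_field(x)
```

Indexing the result by field name (for example `xm["v0"]`) gives a masked array carrying only that field's mask, which is what makes the fields independently maskable.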

anntzer · 2025-04-10T11:02:47Z

lib/matplotlib/cbook.py

+        else:
+            # in case of a dtype with multiple fields:
+            try:
+                mask = np.empty(x.shape, dtype=np.dtype('bool, '*len(x.dtype.descr)))


Could the dtype be e.g. [("mask", bool, len(x.dtype.descr))] (with a slightly different API)?

This is an interesting idea. I'll make a prototype and see if this would add unnecessary complexity somewhere else.
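
As a rough illustration of what the suggested dtype looks like in plain numpy (the values of `n_variates` and `shape` are assumptions for the example, and nothing here is taken from the PR's implementation):

```python
import numpy as np

n_variates = 2   # assumed number of variates, for illustration only
shape = (4, 5)   # assumed data shape, for illustration only

# One structured field named "mask" holding a length-n boolean subarray.
shared = np.zeros(shape, dtype=[("mask", bool, n_variates)])

# The structured array itself keeps the data shape ...
outer_shape = shared.shape
# ... while the field gains a trailing axis of length n_variates.
field_shape = shared["mask"].shape
```

The sketch only shows the layout of such a dtype; whether it can serve as the mask of a masked array is the open question in this thread.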


trygvrad · 2025-04-13T13:04:29Z

@anntzer I think this is important, so I wanted to reply to this in the main thread.

Could the dtype be e.g. [("mask", bool, len(x.dtype.descr))] (with a slightly different API)?

The context here is that multivariate data is stored internally as an array whose data type has multiple fields.
This was chosen because it ensures that data.shape returns the same shape for both scalar and multivariate data.
If a numpy array with multiple fields is masked, it must have a separate mask for each channel. I read @anntzer's suggestion as letting the mask be another field, i.e. ['bool', 'float64', 'float64'] interpreted as [mask, variate0, variate1] when a dataset with two variates is masked.
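
The shape argument can be illustrated with a small numpy sketch. The field names `v0` and `v1` and the array sizes are assumptions chosen for the example:

```python
import numpy as np

# Scalar data: one float per pixel.
scalar = np.zeros((4, 5), dtype=float)

# Multivariate data: two floats per pixel, stored as fields of one dtype.
multivariate = np.zeros((4, 5), dtype=[("v0", float), ("v1", float)])

# Both report the same shape, so shape-based code needs no special casing.
same_shape = scalar.shape == multivariate.shape

# For contrast: stacking the variates along an extra axis changes the shape.
stacked = np.zeros((2, 4, 5), dtype=float)
```

Each variate is still reachable as an ordinary 2D array, e.g. `multivariate["v1"]`.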

It should be noted that when a regular np.array is masked and the mask is False for all values, only a single instance of np.ma.nomask is stored (instead of a full array of bools). This is not the case for structured arrays, where a full mask (with a separate bool for each field) is stored in all cases.
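
A short sketch of this difference, using only public numpy.ma functions (the field names are assumptions for the example):

```python
import numpy as np

# A plain float array with nothing masked keeps the nomask sentinel.
plain = np.ma.array(np.array([1.0, 2.0, 3.0]))
plain_uses_nomask = np.ma.getmask(plain) is np.ma.nomask

# A structured array with nothing masked still gets a full mask,
# with one boolean per field and per element.
structured = np.ma.array(np.zeros(3, dtype=[("v0", float), ("v1", float)]))
structured_mask = np.ma.getmask(structured)
structured_uses_nomask = structured_mask is np.ma.nomask
```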

I didn't actually get as far as to prototype this, but I did have a look around.

I have found that it will largely involve changes to colors.multi_norm._iterable_variates_in_data() and cbook.safe_masked_invalid()

I have tried to list the advantages/disadvantages of the two approaches below:

A: Use a masked array with a struct array.

Implication: each variate has a separate mask

Advantages:

1. It is easy to iterate over the channels (this is in any case handled by colors.multi_norm._iterable_variates_in_data())
2. Easy to parse masked input
3. np.ma.is_masked() will work for both multivariate and scalar data
3.1 I don't think this is actually used internally in the context of the data for a relevant plotting method, so this appears to be a minor issue.
4. Each variate may have a different mask, and we may implement a different blending mode in color for each.
4.1 i.e. instead of having the masked values be transparent, it is possible to map them to unique colors (colors that otherwise do not occur in the colormap, typically cyan, magenta or bright green) so that the user knows which channel has masked (invalid) data.
4.2 We will probably not support this initially, but choosing this route gives us the flexibility to do so in the future.

Disadvantages:

1. Need to store a separate mask for each channel.

B: store the mask as an additional dtype in the struct array i.e. [("mask", bool, len(x.dtype.descr))]

Implication: a shared mask for all channels

Advantages:

1. Only one mask
1.1 Less memory use
1.2 No ambiguity as to what data is masked

Disadvantages:

1. In order to iterate over the channels, a masked array must be created for each channel (i.e. slicing the array will not produce masked arrays; this can be handled in colors.multi_norm._iterable_variates_in_data()).
1.1 The data may be iterated over multiple times in order to produce a single plot [autoset limits(?) etc.]. One way to interpret option A is that it caches each variate in its masked form, whereas with option B the masked version of each variate is created only upon access.
2. The unmasked and masked versions of the same array have a different number of fields, which can lead to confusion.
3. Masked input must be parsed
3.1 With this implementation, practically all data will need to be formatted upon input, whereas with implementation A, data that is already structs [or is complex!] is interoperable with the internal workings of matplotlib.
4. I suspect it will be more difficult to onboard new developers with this approach.
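
To make the first disadvantage concrete, here is a minimal sketch of what iterating over the variates might look like under option B. It assumes the layout [mask, variate0, variate1] described above; the function name `iter_variates_shared_mask` and the field names are hypothetical and are not part of matplotlib.

```python
import numpy as np


def iter_variates_shared_mask(data):
    """Hypothetical: yield one masked array per variate (shared-mask layout)."""
    shared = data["mask"]
    for name in data.dtype.names:
        if name == "mask":
            continue
        # A new masked array is built on every access; nothing is cached.
        yield np.ma.array(data[name], mask=shared)


# Assumed layout: one shared boolean field followed by two variates.
data = np.zeros(3, dtype=[("mask", bool), ("v0", float), ("v1", float)])
data["mask"] = [False, True, False]
data["v0"] = [1.0, 2.0, 3.0]
data["v1"] = [4.0, 5.0, 6.0]
variates = list(iter_variates_shared_mask(data))
```

Every pass over the data repeats this construction, which is the cost noted in 1.1, whereas option A keeps each field masked in place.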

Having looked at this, my personal opinion is that option A is more suitable for matplotlib because I think it will be easier to maintain.

@anntzer let me know if I have interpreted your suggestion correctly, and if you agree with my assessment of approach A or B, or if you think I should make a full prototype to explore this further.


MultiNorm class

5e0266a

This commit introduces the MultiNorm class to prepare for the introduction of multivariate plotting methods.

github-actions bot added topic: color/color & colormaps topic: transforms and scales labels Apr 6, 2025

trygvrad changed the title ~~Multivariate plot prapare 2~~ Preparations for multivariate plotting Apr 6, 2025

trygvrad mentioned this pull request Apr 6, 2025

Multivariate plotting in imshow, pcolor and pcolormesh #29221

Open

5 tasks

oscargus reviewed Apr 6, 2025

anntzer reviewed Apr 10, 2025

anntzer reviewed Apr 10, 2025

updates based on feedback from review, @oscargus, @anntzer

6985111

trygvrad force-pushed the multivariate-plot-prapare-2 branch from 54a945c to eeb895c Compare April 10, 2025 20:42

github-actions bot removed the topic: transforms and scales label Apr 10, 2025

trygvrad force-pushed the multivariate-plot-prapare-2 branch from 41acef7 to 9c62126 Compare April 13, 2025 10:52

trygvrad mentioned this pull request Apr 17, 2025

MultiNorm class #29876

Open

trygvrad and others added 6 commits April 17, 2025 17:17

Apply suggestions from code review

f42d65b

Thank you @QuLogic Co-authored-by: Elliott Sales de Andrade <[email protected]>

Updates based on feedback from @anntzer

73713e7

change MultiNorm.n_intput to n_variables

78b173e

Changes to colorizer to prepare for multivariate plotting

3979e09

Updates based on feedback from review

adb4ee3

update following change to MultiNorm

a276d89

trygvrad force-pushed the multivariate-plot-prapare-2 branch from 9c62126 to a276d89 Compare May 7, 2025 17:53

github-actions bot added the status: needs rebase label May 13, 2025
