All categorical values are displayed in the legend when source dataframe is polars.DataFrame #977

@knl

The example below shows all categorical values in the legend, which is unexpected. I copied it from a notebook, so it might need tweaking to display elsewhere.

import polars as pl
import polars.selectors as cs
import datetime as dt
import random

import plotnine
from plotnine import *

import mizani
from mizani.formatters import log_format, custom_format, date_format, label_number, label_percent, label_bytes
from mizani.breaks import date_breaks

# turn on String Cache, otherwise we can't concat different dataframes into one
pl.enable_string_cache()


date = dt.datetime(2025, 8, 29)          # any date is fine
start = dt.datetime(date.year, date.month, date.day, 6, 0, 0)
end   = dt.datetime(date.year, date.month, date.day, 18, 0, 0)

# timestamps: every 5 minutes, inclusive of both ends (06:00 and 18:00)
timestamps = pl.datetime_range(
    start=start,
    end=end,
    interval="5m",
    closed="both",
    eager=True
)

random.seed(42)

sources = pl.Series("source", ["u", "w", "v"], dtype=pl.Categorical)
dirs = pl.Series("dir", ["incoming", "outgoing"], dtype=pl.Categorical)

df = (
    pl.DataFrame({"source": sources})
    .join(
        pl.DataFrame({"five_min_slot": timestamps}),
        how="cross"
    )
    .join(
        pl.DataFrame({"dir": dirs}),
        how="cross"
    )
    .with_columns(
        values=pl.lit(100.0),
    )
    .with_columns(
        pl.col('values').map_elements(lambda v: v*random.random(), return_dtype=pl.self_dtype()),
    )
)

(
    ggplot(
        df.filter(
            pl.col('source').eq('u')
        )
        .sort('five_min_slot'),
        aes(x='five_min_slot', y='values', color='factor(dir)')
    )
    + geom_line()
    + theme_linedraw()
    + theme(
        figure_size=(16, 5),
        axis_text_x=element_text(rotation=45, ha='right'),
    )
    + scale_x_datetime(
        labels=date_format("%H:%M"),
        date_breaks="15 minute",
        date_minor_breaks="5 minute",
        expand=(0, 0),
    )
    + scale_y_continuous(
        labels=label_percent(scale=1),
    )

)

For that code, I get the following output:

Image

I would expect the legend to contain only incoming and outgoing.
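One way to narrow this down is to check which categories the `dir` column carries once it reaches pandas. The sketch below is pandas-only and rests on an unverified assumption: that the converted column keeps `u`, `w` and `v` as unused categories alongside `incoming` and `outgoing`. The column contents and the `unused_categories` helper are hypothetical stand-ins, not taken from plotnine or polars.

```python
import pandas as pd

# Hypothetical stand-in for the `dir` column after conversion to pandas.
# Assumption: the categorical carries categories that never occur in the data.
dir_col = pd.Series(
    pd.Categorical(
        ["incoming", "outgoing", "incoming"],
        categories=["u", "w", "v", "incoming", "outgoing"],
    ),
    name="dir",
)


def unused_categories(col):
    """Return the categories of a categorical Series that never occur in it."""
    present = set(col.dropna().unique())
    return [c for c in col.cat.categories if c not in present]


unused = unused_categories(dir_col)
print(unused)  # ['u', 'w', 'v']

# Dropping the unused categories leaves only the values actually present.
trimmed = dir_col.cat.remove_unused_categories()
print(list(trimmed.cat.categories))  # ['incoming', 'outgoing']
```

If the real column looks like this after `df.to_pandas()`, then trimming the categories as above, or casting `dir` to `pl.String` before plotting, might be worth trying as a workaround. Neither has been checked against the plot in this report.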
