⚡️ Speed up function _facet_grid_color_categorical by 47% #116

Open: 1 commit to merge into base: main

@codeflash-ai (bot) commented May 24, 2025

📄 47% (0.47x) speedup for _facet_grid_color_categorical in plotly/figure_factory/_facet_grid.py

⏱️ Runtime: 3.26 seconds → 2.22 seconds (best of 5 runs)
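
The headline figure is consistent with measuring speedup as the time saved relative to the optimized runtime. The PR does not state the formula, so the definition below is an assumption that happens to reproduce the reported number:

```python
# Runtimes reported in this PR (best of 5 runs), in seconds.
original_runtime = 3.26
optimized_runtime = 2.22

# Assumed definition: time saved, relative to the optimized runtime.
speedup = (original_runtime - optimized_runtime) / optimized_runtime

print(f"{speedup:.2f}x ({speedup:.0%})")  # prints "0.47x (47%)"
```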

📝 Explanation and details

This is a rewrite of _facet_grid_color_categorical focused on runtime and memory, guided by the profiler output. The main costs were DataFrame slicing/filtering, repeated function calls, unnecessary list conversions, dictionary creation, and marker dict construction. See the detailed explanations inline.

Major optimizations applied:

  • Replace heavy DataFrame filters with lookups on precomputed masks/arrays, reducing repeated computation.
  • Cache results to reduce the number of .unique() and .groupby() calls.
  • Avoid unneeded lists (like list(df.groupby(...))).
  • Inline and reduce dict unpacking where keys are not dynamic.
  • Reduce object construction and string formatting overhead.
  • Vectorize, or use lookups instead of repeated logic, when possible.
  • Restructure loops to minimize calls to _make_trace_for_scatter/_return_label/_annotation_dict.
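
To illustrate the first two bullets, here is a minimal sketch of caching unique values and boolean masks instead of rebuilding a filter for every facet/color pair. It is not the code from this PR: the toy data, the column names, and the helper names `slices_naive` and `slices_cached` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "row": ["r1", "r1", "r2", "r2", "r1", "r2"],
    "color": ["a", "b", "a", "b", "a", "b"],
})


def slices_naive(df):
    # Rebuilds both comparisons (and a DataFrame slice) for every pair.
    out = {}
    for r in df["row"].unique():
        for c in df["color"].unique():
            out[(r, c)] = df[(df["row"] == r) & (df["color"] == c)]["x"].tolist()
    return out


def slices_cached(df):
    # Pull the columns out as NumPy arrays once.
    row_values = df["row"].to_numpy()
    color_values = df["color"].to_numpy()
    x_values = df["x"].to_numpy()
    # Compute each boolean mask once per unique value, then combine them.
    row_masks = {r: row_values == r for r in pd.unique(row_values)}
    color_masks = {c: color_values == c for c in pd.unique(color_values)}
    return {
        (r, c): x_values[rm & cm].tolist()
        for r, rm in row_masks.items()
        for c, cm in color_masks.items()
    }


print(slices_cached(df) == slices_naive(df))
```

Both helpers return the same mapping; the cached version performs one comparison per unique value rather than one per pair, and never constructs intermediate DataFrame slices.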

Below is the optimized code; docstrings and core API are untouched.

Key optimizations and effects:

  • No unnecessary listification of .groupby(). Instead, use for key, g in df.groupby(.., sort=False).
  • Cache .unique() calls (use only once per branch, not in every loop).
  • DataFrame slicing: Use boolean numpy arrays (values == val) and assign/lookup columns only for those rows; avoids repeated mask computation and duplicate DataFrame slicing.
  • Trace creation: Avoid unnecessary dict unpacking/concat unless it is necessary for dynamic keys.
  • Annotation/label: No logic change, but the fast path in _return_label skips the .format call and builds the string with an f-string only when one is required.
  • Empty facet handling: Create empty DataFrames only once, avoiding repeated creation and repeated [None, None, None] allocations.
  • Branching: Only perform operations required for each branch (e.g. only annotate rows/cols on the first iteration).
  • In-place marker dict construction.
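
The groupby point above can be sketched as follows. This is a toy illustration with hypothetical data, not the function's actual loop:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "color": ["b", "a", "b", "a"],
})

# Materializing the groups builds a list of (key, frame) tuples up front.
groups_listed = list(df.groupby("color"))
sorted_order = [key for key, _ in groups_listed]

# Iterating the groupby object directly avoids the intermediate list;
# sort=False yields groups in order of first appearance instead of
# sorting the keys.
first_seen_order = [key for key, _ in df.groupby("color", sort=False)]

print(sorted_order)      # ['a', 'b']
print(first_seen_order)  # ['b', 'a']
```

Note that sort=False changes the order in which groups are visited, so it is a drop-in replacement only where the calling code does not depend on sorted keys.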

You should observe less memory overhead, lower DataFrame slicing time, and a drop in trace-building and annotation time (compare against the profiler lines that previously showed high Per Hit values).

NOTE: If you use this function on very large DataFrames, you may get dramatic speedups due to reduced Python-loop time and fewer pandas DataFrame object constructions and slice copies. For smaller frames, the gains are still notable but less dramatic.
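
One way to check how the gain scales with frame size is a small timing harness like the sketch below. It compares two generic per-group strategies on synthetic data; the helpers `make_frame`, `filter_per_group`, `group_once`, and `best_of` are hypothetical and do not call _facet_grid_color_categorical itself.

```python
import timeit

import pandas as pd


def make_frame(num_rows, num_groups=3):
    # Synthetic frame with a cyclic categorical column.
    return pd.DataFrame({
        "x": range(num_rows),
        "color": [f"cat{i % num_groups}" for i in range(num_rows)],
    })


def filter_per_group(df):
    # One full boolean filter (and one DataFrame slice) per category.
    return {c: len(df[df["color"] == c]) for c in df["color"].unique()}


def group_once(df):
    # A single pass that partitions the rows by category.
    return {c: len(g) for c, g in df.groupby("color", sort=False)}


def best_of(func, df, repeat=5):
    # Best wall-clock time over several runs, in seconds.
    return min(timeit.repeat(lambda: func(df), number=1, repeat=repeat))


for num_rows in (1_000, 50_000):
    frame = make_frame(num_rows)
    print(
        num_rows,
        f"filter: {best_of(filter_per_group, frame):.5f}s",
        f"groupby: {best_of(group_once, frame):.5f}s",
    )
```

Absolute timings depend on the machine and the pandas version, so treat the output as a relative comparison only.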

Let me know if you want help optimizing special branches or want further Numba or Cython vectorization applied.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 33 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests Details
import random

import pandas as pd
import pytest
from plotly.figure_factory._facet_grid import _facet_grid_color_categorical

# function to test and dependencies are already defined above


# Helper function to generate a DataFrame for large scale tests
def generate_large_df(num_rows, num_colors=3, num_facet_rows=2, num_facet_cols=2):
    data = {
        "x": [random.random() for _ in range(num_rows)],
        "y": [random.random() for _ in range(num_rows)],
        "color": [f"cat{(i % num_colors)}" for i in range(num_rows)],
        "facet_row": [f"row{(i % num_facet_rows)}" for i in range(num_rows)],
        "facet_col": [f"col{(i % num_facet_cols)}" for i in range(num_rows)],
    }
    return pd.DataFrame(data)

# Basic Test Cases

def test_no_facets_basic_colorgrouping():
    # Test with no facet_row and no facet_col, just color grouping
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    names = set(trace.name for trace in fig.data)
    # Check that colors are correct
    for trace in fig.data:
        pass
    # Should be a single subplot (1,1)
    for trace in fig.data:
        pass

def test_facet_row_only():
    # Test with facet_row only
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"],
        "facet_row": ["r1", "r1", "r2", "r2"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Each facet row should have 2 traces (a and b)
    facet_counts = {}
    for trace in fig.data:
        facet_counts.setdefault(trace.name, 0)
        facet_counts[trace.name] += 1
    # Annotation text should include "r1" and "r2"
    texts = [ann["text"] for ann in annotations]

def test_facet_col_only():
    # Test with facet_col only
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"],
        "facet_col": ["c1", "c2", "c1", "c2"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    facet_counts = {}
    for trace in fig.data:
        facet_counts.setdefault(trace.name, 0)
        facet_counts[trace.name] += 1
    texts = [ann["text"] for ann in annotations]

def test_facet_row_and_col():
    # Test with both facet_row and facet_col
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"],
        "facet_row": ["r1", "r1", "r2", "r2"],
        "facet_col": ["c1", "c2", "c1", "c2"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Each trace should have correct color
    for trace in fig.data:
        pass
    texts = [ann["text"] for ann in annotations]

def test_custom_labels_dict():
    # Test with custom facet_row_labels and facet_col_labels as dicts
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"],
        "facet_row": ["r1", "r1", "r2", "r2"],
        "facet_col": ["c1", "c2", "c1", "c2"]
    })
    colormap = {"a": "red", "b": "blue"}
    facet_row_labels = {"r1": "Row One", "r2": "Row Two"}
    facet_col_labels = {"c1": "Col One", "c2": "Col Two"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Check that annotation texts match custom labels
    texts = [ann["text"] for ann in annotations]

def test_custom_labels_str():
    # Test with facet_row_labels and facet_col_labels as string
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["a", "b", "a", "b"],
        "facet_row": ["r1", "r1", "r2", "r2"],
        "facet_col": ["c1", "c2", "c1", "c2"]
    })
    colormap = {"a": "red", "b": "blue"}
    facet_row_labels = "facet_row"
    facet_col_labels = "facet_col"
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    texts = [ann["text"] for ann in annotations]

def test_empty_dataframe():
    # Edge: empty DataFrame
    df = pd.DataFrame({"x": [], "y": [], "color": []})
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )

def test_missing_color_group():
    # Edge: colormap missing a color group present in data
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [10, 20],
        "color": ["a", "b"]
    })
    colormap = {"a": "red"}  # missing "b"
    with pytest.raises(KeyError):
        _facet_grid_color_categorical(
            df=df,
            x="x",
            y="y",
            facet_row=None,
            facet_col=None,
            color_name="color",
            colormap=colormap,
            num_of_rows=1,
            num_of_cols=1,
            facet_row_labels=None,
            facet_col_labels=None,
            trace_type="scatter",
            flipped_rows=True,
            flipped_cols=True,
            show_boxes=False,
            SUBPLOT_SPACING=0.05,
            marker_color=None,
            kwargs_trace={},
            kwargs_marker={},
        )

def test_nan_in_facet():
    # Edge: NaN in facet_row or facet_col
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [10, 20, 30],
        "color": ["a", "b", "a"],
        "facet_row": ["r1", None, "r2"],
        "facet_col": ["c1", "c2", None]
    })
    colormap = {"a": "red", "b": "blue"}
    # Should not raise, should handle NaN as a group
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=3,
        num_of_cols=3,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )

def test_single_value_facets():
    # Edge: only one value in facet_row and facet_col
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [10, 20],
        "color": ["a", "b"],
        "facet_row": ["r1", "r1"],
        "facet_col": ["c1", "c1"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )

def test_no_x_or_y():
    # Edge: x or y is None (should not crash, but trace will lack x/y)
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [10, 20],
        "color": ["a", "b"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x=None,
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    for trace in fig.data:
        pass

def test_marker_kwargs():
    # Test that marker kwargs are passed through
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [10, 20],
        "color": ["a", "b"]
    })
    colormap = {"a": "red", "b": "blue"}
    fig, _ = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={"size": 12, "symbol": "circle"},
    )
    for trace in fig.data:
        pass

# Large Scale Test Cases

def test_large_number_of_colors():
    # Large scale: 100 colors, 200 rows
    n_colors = 100
    df = pd.DataFrame({
        "x": list(range(200)),
        "y": list(range(200)),
        "color": [f"c{i % n_colors}" for i in range(200)]
    })
    colormap = {f"c{i}": f"rgb({i},{i},{i})" for i in range(n_colors)}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Each trace should have a unique name and color
    names = set(trace.name for trace in fig.data)
    colors = set(trace.marker.color for trace in fig.data)

def test_large_number_of_facets_and_colors():
    # Large scale: 10 facet rows, 10 facet cols, 5 colors, 500 rows
    n_rows, n_cols, n_colors = 10, 10, 5
    df = pd.DataFrame({
        "x": list(range(500)),
        "y": list(range(500)),
        "color": [f"c{i%n_colors}" for i in range(500)],
        "facet_row": [f"r{i%n_rows}" for i in range(500)],
        "facet_col": [f"col{i%n_cols}" for i in range(500)],
    })
    colormap = {f"c{i}": f"rgb({i*10},{i*10},{i*10})" for i in range(n_colors)}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=n_rows,
        num_of_cols=n_cols,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.005,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )

def test_performance_large_df():
    # Large scale: 900 rows, 3 facet rows, 3 facet cols, 3 colors
    df = generate_large_df(900, num_colors=3, num_facet_rows=3, num_facet_cols=3)
    colormap = {f"cat{i}": f"rgb({i*40},{i*40},{i*40})" for i in range(3)}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=3,
        num_of_cols=3,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )

def test_large_marker_kwargs():
    # Large scale: test with many marker kwargs
    df = generate_large_df(100, num_colors=4, num_facet_rows=2, num_facet_cols=2)
    colormap = {f"cat{i}": f"rgb({i*50},{i*50},{i*50})" for i in range(4)}
    marker_kwargs = {"size": 8, "opacity": 0.7, "symbol": "square", "line_width": 2}
    fig, _ = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="facet_row",
        facet_col="facet_col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.02,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker=marker_kwargs,
    )
    for trace in fig.data:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pandas as pd
import pytest  # used for our unit tests
from plotly.figure_factory._facet_grid import _facet_grid_color_categorical

# function to test and all dependencies are defined above


# -------------------------------
# BASIC TEST CASES
# -------------------------------

def test_basic_single_facet_no_row_no_col():
    # Single facet, no row or col, two color groups
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": ["A", "B", "A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    names = set(trace.name for trace in fig.data)
    for trace in fig.data:
        pass

def test_basic_facet_row():
    # Facet by row, two rows, two color groups
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "row": ["R1", "R1", "R2", "R2"],
        "color": ["A", "B", "A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Each trace should have correct row/color
    seen = set()
    for i, trace in enumerate(fig.data):
        # traces are appended in row order, color order
        row = "R1" if i < 2 else "R2"
        color = trace.name
        group = df[(df["row"] == row) & (df["color"] == color)]
        seen.add((row, color))

def test_basic_facet_col():
    # Facet by col, two cols, two color groups
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "col": ["C1", "C2", "C1", "C2"],
        "color": ["A", "A", "B", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    seen = set()
    for i, trace in enumerate(fig.data):
        # traces are appended in col order, color order
        col = "C1" if i < 2 else "C2"
        color = trace.name
        group = df[(df["col"] == col) & (df["color"] == color)]
        seen.add((col, color))

def test_basic_facet_row_col():
    # Facet by row and col, 2x2, two color groups
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"],
        "color": ["A", "B", "A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Each trace should have correct row, col, color
    seen = set()
    for i, trace in enumerate(fig.data):
        # traces are appended row-major, then col, then color
        row = "R1" if i < 4 else "R2"
        col = "C1" if (i % 4) < 2 else "C2"
        color = trace.name
        group = df[(df["row"] == row) & (df["col"] == col) & (df["color"] == color)]
        # If group is empty, trace.x/y are empty
        if not group.empty:
            pass
        seen.add((row, col, color))
    # All possible combinations present
    expected = {("R1", "C1", "A"), ("R1", "C1", "B"), ("R1", "C2", "A"), ("R1", "C2", "B"),
                ("R2", "C1", "A"), ("R2", "C1", "B"), ("R2", "C2", "A"), ("R2", "C2", "B")}

def test_basic_trace_type_and_marker_kwargs():
    # Check marker kwargs and trace type are respected
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": ["A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, _ = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scattergl",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={"customdata": [1,2]},
        kwargs_marker={"size": 10}
    )
    # Each trace should have marker.size = 10, type scattergl, and customdata
    for trace in fig.data:
        pass

# -------------------------------
# EDGE TEST CASES
# -------------------------------

def test_edge_empty_dataframe():
    # Empty dataframe should produce no traces and no annotations
    df = pd.DataFrame(columns=["x", "y", "color"])
    colormap = {}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_edge_single_row_single_col_single_color():
    # Single row, col, color
    df = pd.DataFrame({"x":[1], "y":[2], "row":["A"], "col":["B"], "color":["C"]})
    colormap = {"C":"red"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    trace = fig.data[0]

def test_edge_missing_facet_combinations():
    # Some facet combinations missing
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [10, 20, 30],
        "row": ["R1", "R2", "R2"],
        "col": ["C1", "C1", "C2"],
        "color": ["A", "A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Check that the traces for missing combinations have empty x/y
    # R1,C2 is missing, so those traces should have x==[], y==[]
    # Indices 2 and 3 are expected to be the (R1, C2, A) and (R1, C2, B) traces
    for i in [2, 3]:
        trace = fig.data[i]

def test_edge_nonstring_labels_dict():
    # facet_row_labels and facet_col_labels as dicts
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "row": ["foo", "bar"],
        "col": ["baz", "qux"],
        "color": ["A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    facet_row_labels = {"foo": "FooRow", "bar": "BarRow"}
    facet_col_labels = {"baz": "BazCol", "qux": "QuxCol"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_edge_labels_as_string():
    # facet_row_labels and facet_col_labels as string
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "row": ["foo", "bar"],
        "col": ["baz", "qux"],
        "color": ["A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels="ROW",
        facet_col_labels="COL",
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_edge_flipped_rows_cols():
    # Test flipped_rows and flipped_cols
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"],
        "color": ["A", "B", "A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig1, ann1 = _facet_grid_color_categorical(
        df, "x", "y", "row", "col", "color", colormap, 2, 2,
        None, None, "scatter", False, False, False, 0.05, None, {}, {}
    )
    fig2, ann2 = _facet_grid_color_categorical(
        df, "x", "y", "row", "col", "color", colormap, 2, 2,
        None, None, "scatter", True, True, False, 0.05, None, {}, {}
    )

def test_edge_no_x_no_y():
    # No x or y (should not error, but traces have no x/y)
    df = pd.DataFrame({
        "foo": [1, 2],
        "bar": [3, 4],
        "color": ["A", "B"]
    })
    colormap = {"A": "red", "B": "blue"}
    fig, _ = _facet_grid_color_categorical(
        df=df,
        x=None,
        y=None,
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    for trace in fig.data:
        pass

# -------------------------------
# LARGE SCALE TEST CASES
# -------------------------------

def test_large_many_colors():
    # 50 color groups, single facet
    n = 50
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": [f"C{i}" for i in range(n)]
    })
    colormap = {f"C{i}": f"rgb({i},{i},{i})" for i in range(n)}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap=colormap,
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    for i, trace in enumerate(fig.data):
        pass

def test_large_many_facets():
    # 10x10 facets, 2 colors per facet
    n = 10
    df = pd.DataFrame({
        "x": list(range(n*2))*n,
        "y": list(range(n*2, n*2*2))*n,
        "row": [f"R{i}" for i in range(n) for _ in range(n*2)],
        "col": [f"C{j}" for _ in range(n) for j in range(n) for _ in range(2)],
        "color": ["A", "B"] * (n*n)
    })
    colormap = {"A": "red", "B": "blue"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.001,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # There should be n row and n col annotations
    row_ann = [ann for ann in annotations if "R" in ann["text"]]
    col_ann = [ann for ann in annotations if "C" in ann["text"]]

def test_large_sparse_facets():
    # 20x20 facets, but only fill diagonal
    n = 20
    data = []
    for i in range(n):
        data.append({"x": i, "y": i*2, "row": f"R{i}", "col": f"C{i}", "color": "A"})
    df = pd.DataFrame(data)
    colormap = {"A": "red"}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.0005,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    count_nonempty = sum(len(trace.x) > 0 for trace in fig.data)
    # There should be n row and n col annotations
    row_ann = [ann for ann in annotations if "R" in ann["text"]]
    col_ann = [ann for ann in annotations if "C" in ann["text"]]

def test_large_unique_color_per_facet():
    # Each facet has different color values
    n = 5
    data = []
    for i in range(n):
        for j in range(n):
            for k in range(2):
                data.append({"x": i*10 + j, "y": j*10 + i, "row": f"R{i}", "col": f"C{j}", "color": f"Z{k}_{i}_{j}"})
    df = pd.DataFrame(data)
    colormap = {f"Z{k}_{i}_{j}": f"rgb({i*10},{j*10},{k*100})" for i in range(n) for j in range(n) for k in range(2)}
    fig, annotations = _facet_grid_color_categorical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap=colormap,
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.002,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Each trace's color matches the colormap
    for trace in fig.data:
        pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_facet_grid_color_categorical-mb2evvn5` and push.
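
The comment at the end of the generated tests says that outputs of the original and optimized code are compared for equality. A minimal sketch of that idea follows. It is not Codeflash's actual harness: both counting functions and the `outputs_match` helper are hypothetical stand-ins, chosen only to show an original and an optimized implementation being checked on the same inputs.

```python
# Hypothetical stand-ins: neither function comes from the PR. They only
# illustrate checking an optimized implementation against the original.
def count_by_label_original(labels):
    counts = {}
    for label in set(labels):
        counts[label] = labels.count(label)  # rescans the list for every label
    return counts


def count_by_label_optimized(labels):
    counts = {}
    for label in labels:  # single pass over the data
        counts[label] = counts.get(label, 0) + 1
    return counts


def outputs_match(original, optimized, inputs):
    """Return True when both callables return equal results for every input."""
    return all(original(arg) == optimized(arg) for arg in inputs)


# Small, empty, and larger inputs, similar in spirit to the edge and
# large-scale test cases above.
test_inputs = [[], ["A"], ["A", "B", "A"], ["C"] * 50]
equivalent = outputs_match(
    count_by_label_original, count_by_label_optimized, test_inputs
)
```

A check like this only establishes equivalence on the inputs that were tried, so the breadth of the test cases matters more than their count.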


Here’s a comprehensive rewrite focused on **runtime and memory optimization**, based on the profiler output for `_facet_grid_color_categorical`. The main performance issues are in DataFrame slicing/filtering, repeated function calls, unnecessary list conversions, dictionary creation, and marker dict construction. See detailed explanations inline.

**Major optimizations applied:**
- Heavy DataFrame filters are replaced with calls that use precomputed masks/arrays, reducing repeated computation.
- Reduce the number of `.unique()` and `.groupby()` calls by caching results.
- Avoid unneeded lists (like `list(df.groupby(...))`).
- Inline and reduce dict unpacking where not dynamically needed.
- Reduce object construction and string formatting overhead.
- Vectorize, or use lookups instead of repeated logic when possible.
- Minimize calls to `_make_trace_for_scatter`/`_return_label`/`_annotation_dict` by restructuring loops.

Below is the optimized code; docstrings and core API are untouched.



### **Key optimizations and effects:**
- **No unnecessary listification of `.groupby()`**: Iterate directly with `for key, g in df.groupby(..., sort=False)`.
- **Cache `.unique()` calls**: Compute the result once per branch instead of in every loop iteration.
- **DataFrame slicing**: Use boolean NumPy arrays (`values == val`) and assign or look up columns only for the matching rows, which avoids repeated mask computation and duplicate DataFrame slicing.
- **Trace creation**: Avoid dict unpacking and concatenation unless dynamic keys require it.
- **Annotation/label**: No logic change, but the fast path in `_return_label` skips the unnecessary `.format` call and builds the string with an f-string only when required.
- **Empty facet handling**: Create the empty DataFrame only once, which avoids repeated creation and repeated `[None, None, None]` allocations.
- **Branching**: Perform only the operations each branch requires (e.g. annotate rows/cols only on the first iteration).
- **In-place marker dict construction.**
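
The first three items in the list above can be sketched on a toy DataFrame. This is an illustration under stated assumptions, not the code from the PR: the data and the names `group_sizes`, `masks`, `x_by_color`, and `y_by_color` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [10, 20, 30, 40],
    "row": ["R1", "R1", "R2", "R2"],
    "color": ["A", "B", "A", "B"],
})

# Iterate over the groupby object directly instead of materializing
# list(df.groupby(...)); sort=False keeps groups in first-appearance order.
group_sizes = {key: len(group) for key, group in df.groupby("row", sort=False)}

# Call .unique() once and reuse the result instead of recomputing it per loop.
color_values = df["color"].unique()

# Build one boolean NumPy mask per color value up front, then reuse it for
# every column lookup instead of re-filtering the DataFrame each time.
color_array = df["color"].to_numpy()
masks = {val: color_array == val for val in color_values}

x_array = df["x"].to_numpy()
y_array = df["y"].to_numpy()
x_by_color = {val: x_array[mask].tolist() for val, mask in masks.items()}
y_by_color = {val: y_array[mask].tolist() for val, mask in masks.items()}
```

Each mask is computed once and applied to plain arrays, so no intermediate DataFrame slice is created per column.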

The measured runtime dropped from 3.26 seconds to 2.22 seconds (best of 5 runs). Expect less memory overhead, lower DataFrame slicing time, and a drop in trace-building and annotation time, concentrated in the profiler lines that previously showed high `Per Hit` numbers.

**NOTE:** The speedup should be largest on *very large DataFrames*, where less time is spent in Python loops and fewer pandas DataFrame objects and slice copies are created. For smaller frames, the gains are still notable but less dramatic.
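
One way to check how the gain scales with input size is a small best-of-5 benchmark using the standard library's `timeit`. The sketch below is generic: `build_rows_slow`, `build_rows_fast`, and `best_of` are hypothetical helpers, not functions from plotly or from this PR, and the timings depend on the machine.

```python
import timeit


def build_rows_slow(n):
    rows = []
    for i in range(n):
        rows = rows + [i * 2]  # creates a new list copy on every iteration
    return rows


def build_rows_fast(n):
    return [i * 2 for i in range(n)]  # builds the list in one pass


def best_of(func, arg, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeat, number=1)
    return min(timings)


# Compare both versions at a small and a larger input size.
for size in (100, 2000):
    slow = best_of(build_rows_slow, size)
    fast = best_of(build_rows_fast, size)
    print(f"n={size}: slow={slow:.6f}s fast={fast:.6f}s")
```

Taking the minimum of several runs reduces noise from other processes, which is the usual reason for reporting a best-of-N figure.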

Let me know if you want help optimizing special branches or want further acceleration with *Numba* or *Cython*.
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 24, 2025
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 May 24, 2025 15:55