⚡️ Speed up function _facet_grid_color_numerical by 14% #117

Status: Open (1 commit to merge into base: main)

codeflash-ai bot commented on May 24, 2025

📄 14% (0.14x) speedup for _facet_grid_color_numerical in plotly/figure_factory/_facet_grid.py

⏱️ Runtime: 1.29 seconds → 1.13 seconds (best of 8 runs)
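
As a quick sanity check on the headline figure, the two runtimes can be plugged into a relative-speedup formula. The definition used here, (original - optimized) / optimized, is an assumption; the report does not state how the percentage is computed.

```python
original_s = 1.29   # runtime before the change, taken from the report
optimized_s = 1.13  # runtime after the change, taken from the report

# Assumed definition: relative speedup = (old - new) / new.
speedup = (original_s - optimized_s) / optimized_s

print(f"{speedup:.2f}x, i.e. about {speedup:.0%}")
```

Under that assumption the result rounds to the reported 0.14x (14%).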

📝 Explanation and details

Here’s an optimized version of the provided code, focused on:

  • Avoiding repeated DataFrame lookups and groupbys.
  • More efficient label lookups.
  • Reducing dict constructions per call (especially for marker, trace dicts, and annotation dicts).
  • Avoiding unnecessary .tolist() conversions and DataFrame creation.
  • Inlining fast paths and minimizing branching and overheads in hot loops.
  • Preallocating where possible.
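
Several of these points amount to hoisting loop-invariant lookups out of a hot loop. The sketch below shows the pattern in isolation. It is a simplification under stated assumptions: a plain dict of lists stands in for the DataFrame, and both function names are hypothetical, not part of plotly.

```python
# Illustration only: `table` is a plain dict of lists standing in for a DataFrame.

def colors_per_facet_repeated(table, color_name, facet_indices):
    """Look the column up again for every element (the pattern to avoid)."""
    return [[table[color_name][i] for i in idx] for idx in facet_indices]


def colors_per_facet_cached(table, color_name, facet_indices):
    """Look the column up once and reuse the local reference inside the loop."""
    colors = table[color_name]
    return [[colors[i] for i in idx] for idx in facet_indices]


table = {"x": [1, 2, 3, 4], "color": [10, 20, 30, 40]}
facet_indices = [[0, 1], [2, 3]]  # row positions belonging to each facet

repeated = colors_per_facet_repeated(table, "color", facet_indices)
cached = colors_per_facet_cached(table, "color", facet_indices)
print(cached)
```

Both functions return the same values; only the number of column lookups differs. With a real DataFrame each repeated lookup costs more than it does with a dict, which is where the saving would be expected to come from.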

Details:

  • Cache df[color_name] once instead of recomputing it inside loops.
  • Bind frequently accessed attributes to local variables.
  • Use .get() for dict-style label lookups, so a missing key falls back safely instead of raising KeyError.
  • make_subplots is not optimized (it forwards to plotly) and is kept as-is for compatibility.
  • For groupby-heavy code, use df.groupby(fields, sort=False) to avoid unnecessary sorting, and cache df[color_name].values.
  • Move dict construction (for example marker_dict) out of hot paths and reuse the objects.
  • Inline variable paths and reduce branching in _annotation_dict.
  • When an empty DataFrame is needed, create it once and reuse it.
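
The label-lookup and dict-reuse points can be sketched with plain dicts. `label_for`, `MARKER_TEMPLATE`, and `make_marker` are hypothetical names invented for this example; they are not the helpers used in plotly/figure_factory/_facet_grid.py.

```python
# Hypothetical helpers for illustration; not part of plotly.

def label_for(labels, key):
    """Return the custom label for `key`, falling back to str(key)."""
    if isinstance(labels, dict):
        # dict.get() avoids a KeyError when the key is missing.
        return labels.get(key, str(key))
    return str(key)


# Invariant marker settings are built once, outside any per-facet loop.
MARKER_TEMPLATE = {"colorscale": "Viridis", "showscale": True}


def make_marker(color_values, **extra):
    """Copy the shared template and add only the per-facet parts."""
    marker = dict(MARKER_TEMPLATE)  # shallow copy; the template is left untouched
    marker["color"] = color_values
    marker.update(extra)
    return marker


labels = {"foo": "Alpha"}
print(label_for(labels, "foo"))      # custom label is found
print(label_for(labels, "bar"))      # missing key falls back to the raw value
print(make_marker([5, 6], size=10))
```

One design point to keep in mind: a `.get()` fallback behaves differently from indexing with `labels[key]`, which raises KeyError for a missing key. Whether the silent fallback is acceptable depends on what callers expect.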

Below is the rewritten, faster version.


Summary of key changes:

  • Label lookup is O(1) and falls back safely when a key is missing.
  • Only one empty DataFrame is ever constructed for missing facets.
  • Colors and data are read through df[column].values for efficient NumPy array access.
  • Marker and trace dicts are built once per loop iteration from cached color arrays.
  • The .tolist()-based empty checks are replaced with isnull().all(axis=None) on the cached empty DataFrame.
  • Repeated dict construction is minimized by reusing colorbar_dict, the marker dict template, and similar objects.
  • Branching in _annotation_dict is flattened for speed.
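
The "one shared placeholder for missing facets" idea can be sketched without pandas. Everything here is an assumption made for illustration: `EMPTY_CELL`, `group_points`, and `cell_grid` are hypothetical names, and records are plain tuples rather than DataFrame rows.

```python
# Illustration only: plain tuples and dicts stand in for DataFrame groups.

# Built once and shared by every missing cell; tuples keep it read-only in spirit.
EMPTY_CELL = {"x": (), "y": ()}


def group_points(records):
    """Group (row_key, col_key, x, y) records by facet cell in first-seen order."""
    cells = {}
    for row_key, col_key, x, y in records:
        cell = cells.setdefault((row_key, col_key), {"x": [], "y": []})
        cell["x"].append(x)
        cell["y"].append(y)
    return cells


def cell_grid(cells, row_keys, col_keys):
    """Return a rows-by-cols grid, reusing EMPTY_CELL for missing combinations."""
    return [[cells.get((r, c), EMPTY_CELL) for c in col_keys] for r in row_keys]


records = [("A", "C", 1, 3), ("B", "C", 2, 4)]  # no data for column "D"
grid = cell_grid(group_points(records), ["A", "B"], ["C", "D"])
print(grid)
```

Because the placeholder is shared, callers should treat it as read-only; mutating it would affect every empty cell at once.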

For large DataFrames, these changes should reduce memory allocation and improve runtime, especially on the hot path (_facet_grid_color_numerical).

Let me know if you would like further micro-optimization or vectorization based on the nature of your DataFrames or inputs!

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 34 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import pandas as pd
# imports
import pytest  # used for our unit tests
from plotly.figure_factory._facet_grid import _facet_grid_color_numerical

# function to test and helpers are defined above


# Basic Test Cases
# ----------------

def test_no_faceting_basic_scatter():
    """Test with no facet_row or facet_col, basic scatter, color numerical."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # The trace should have correct x, y, and marker color
    trace = fig.data[0]

def test_facet_row_scatter():
    """Test with faceting on rows only."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [5, 6, 7, 8],
        "color": [10, 20, 30, 40],
        "row": ["A", "A", "B", "B"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Each trace should have correct x values
    xs = [list(trace.x) for trace in fig.data]
    # Annotation text should include "A" and "B"
    texts = [ann["text"] for ann in annotations]

def test_facet_col_scatter():
    """Test with faceting on columns only."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [5, 6, 7, 8],
        "color": [10, 20, 30, 40],
        "col": ["C", "D", "C", "D"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.1,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    xs = [list(trace.x) for trace in fig.data]
    texts = [ann["text"] for ann in annotations]

def test_facet_row_col_scatter():
    """Test with faceting on both rows and columns."""
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    texts = [ann["text"] for ann in annotations]

def test_trace_type_scattergl():
    """Test with trace_type scattergl instead of scatter."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scattergl",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_custom_facet_labels():
    """Test with custom facet labels as dict."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["foo", "bar"]
    })
    facet_row_labels = {"foo": "Alpha", "bar": "Beta"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=facet_row_labels,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # There should be two annotations, with custom label text
    texts = [ann["text"] for ann in annotations]

# Edge Test Cases
# ---------------

def test_empty_dataframe():
    """Test with empty dataframe."""
    df = pd.DataFrame(columns=["x", "y", "color"])
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    trace = fig.data[0]

def test_missing_facet_group():
    """Test with missing facet group (row/col combination does not exist)."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["A", "B"],
        "col": ["C", "C"]
    })
    # Only (A,C) and (B,C) exist, but grid is 2x2
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # Two traces should have data, two should be empty
    nonempty = [t for t in fig.data if len(t.x) > 0]
    empty = [t for t in fig.data if len(t.x) == 0]

def test_nan_in_color_column():
    """Test with NaN values in the color column."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, None, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # The color array should contain a None/NaN value
    colors = list(fig.data[0].marker.color)

def test_no_x_or_y_column():
    """Test with x=None or y=None."""
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    # No x
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x=None,
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    # No y
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y=None,
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_flipped_rows_and_cols():
    """Test with flipped_rows and flipped_cols True."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["foo", "bar"],
        "col": ["baz", "qux"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_custom_kwargs_marker_and_trace():
    """Test with custom kwargs_marker and kwargs_trace."""
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={"name": "TestTrace"},
        kwargs_marker={"size": 10}
    )

# Large Scale Test Cases
# ---------------------

def test_large_number_of_rows():
    """Test with a large number of rows (facets)."""
    n = 50
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
        "row": [f"R{i}" for i in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=n,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_number_of_cols():
    """Test with a large number of columns (facets)."""
    n = 40
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
        "col": [f"C{i}" for i in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_grid():
    """Test with a large grid of row and col facets."""
    n = 10
    df = pd.DataFrame({
        "x": list(range(n*n)),
        "y": list(range(n*n)),
        "color": list(range(n*n)),
        "row": [f"R{i}" for i in range(n) for _ in range(n)],
        "col": [f"C{j}" for _ in range(n) for j in range(n)]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=n,
        num_of_cols=n,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.005,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )

def test_large_data_per_facet():
    """Test with large data per facet (1000 points, 4 facets)."""
    n = 1000
    df = pd.DataFrame({
        "x": list(range(n))*4,
        "y": list(range(n))*4,
        "color": list(range(n))*4,
        "row": ["A"]*n + ["B"]*n + ["A"]*n + ["B"]*n,
        "col": ["C"]*n + ["C"]*n + ["D"]*n + ["D"]*n
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.02,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
    for trace in fig.data:
        pass

def test_large_unique_color_values():
    """Test with many unique color values."""
    n = 500
    df = pd.DataFrame({
        "x": list(range(n)),
        "y": list(range(n)),
        "color": list(range(n)),
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={}
    )
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

# We'll use pandas for DataFrame construction in tests
import pandas as pd
# imports
import pytest
from plotly.figure_factory._facet_grid import _facet_grid_color_numerical

# function to test and its dependencies are assumed to be defined above

# ------------------------
# Basic Test Cases
# ------------------------

def test_single_trace_no_faceting():
    # Test with no faceting, just a colored scatter
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, 20, 30]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # The trace should have correct x, y, and marker color
    trace = fig.data[0]

def test_facet_by_row():
    # Test faceting by row
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["A", "A", "B", "B"]
    })
    facet_row_labels = {"A": "Alpha", "B": "Beta"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=1,
        facet_row_labels=facet_row_labels,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Check that each trace's x, y correspond to the correct group
    xs = [list(trace.x) for trace in fig.data]
    ys = [list(trace.y) for trace in fig.data]
    texts = [a["text"] for a in annotations]

def test_facet_by_col():
    # Test faceting by column
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "col": ["C1", "C2", "C1", "C2"]
    })
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=2,
        facet_row_labels=None,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    xs = [list(trace.x) for trace in fig.data]
    texts = [a["text"] for a in annotations]

def test_facet_by_row_and_col():
    # Test faceting by both row and column
    df = pd.DataFrame({
        "x": [1, 2, 3, 4],
        "y": [10, 20, 30, 40],
        "color": [100, 200, 300, 400],
        "row": ["R1", "R1", "R2", "R2"],
        "col": ["C1", "C2", "C1", "C2"]
    })
    facet_row_labels = {"R1": "Row1", "R2": "Row2"}
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=True,
        flipped_cols=True,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Each trace should have one point
    for trace in fig.data:
        pass
    texts = [a["text"] for a in annotations]

def test_marker_and_trace_kwargs_are_applied():
    # Test that marker and trace kwargs are passed through
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={"name": "mytrace"},
        kwargs_marker={"size": 10, "opacity": 0.5},
    )
    trace = fig.data[0]

# ------------------------
# Edge Test Cases
# ------------------------

def test_empty_dataframe():
    # Test with an empty DataFrame
    df = pd.DataFrame(columns=["x", "y", "color"])
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_missing_facet_value():
    # Test with missing facet combinations
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [10, 20, 30],
        "color": [100, 200, 300],
        "row": ["A", "A", "B"],
        "col": ["C1", "C2", "C1"]
    })
    facet_row_labels = {"A": "Alpha", "B": "Beta"}
    facet_col_labels = {"C1": "Col1", "C2": "Col2"}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=2,
        num_of_cols=2,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # The missing cell should have an empty trace
    # Find the trace with empty x/y
    empty_traces = [t for t in fig.data if len(t.x) == 0 and len(t.y) == 0]

def test_nan_in_color_column():
    # Test with NaN values in color column
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": [10, float('nan'), 30]
    })
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_non_numeric_color_column():
    # Test with non-numeric color column (should raise or handle gracefully)
    df = pd.DataFrame({
        "x": [1, 2, 3],
        "y": [4, 5, 6],
        "color": ["red", "green", "blue"]
    })
    # Should not raise, but marker.color will be non-numeric
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_single_row_single_col_facet():
    # Test with only one unique value in facet_row and facet_col
    df = pd.DataFrame({
        "x": [1, 2],
        "y": [3, 4],
        "color": [5, 6],
        "row": ["A", "A"],
        "col": ["B", "B"]
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels={"A": "Alpha"},
        facet_col_labels={"B": "Bravo"},
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    texts = [a["text"] for a in annotations]

def test_no_x_or_y_column():
    # Test with None for x or y
    df = pd.DataFrame({
        "a": [1, 2, 3],
        "b": [4, 5, 6],
        "color": [7, 8, 9]
    })
    # No x
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x=None,
        y="b",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]
    # No y
    fig, _ = _facet_grid_color_numerical(
        df=df,
        x="a",
        y=None,
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_number_of_points():
    # Test with a large number of data points
    N = 900
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N))
    })
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row=None,
        facet_col=None,
        color_name="color",
        colormap="Viridis",
        num_of_rows=1,
        num_of_cols=1,
        facet_row_labels=None,
        facet_col_labels=None,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.05,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    trace = fig.data[0]

def test_large_number_of_facets():
    # Test with a large number of facets (rows and cols)
    nrows = 10
    ncols = 10
    N = nrows * ncols
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N)),
        "row": [f"R{i}" for i in range(nrows) for _ in range(ncols)],
        "col": [f"C{j}" for _ in range(nrows) for j in range(ncols)],
    })
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(nrows)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(ncols)}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=nrows,
        num_of_cols=ncols,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    # Check that all annotation texts are present
    texts = [a["text"] for a in annotations]
    for i in range(nrows):
        pass
    for j in range(ncols):
        pass

def test_large_facet_with_missing_cells():
    # Test with large number of facets but with some missing cells
    nrows = 8
    ncols = 8
    N = nrows * ncols
    # Remove some cells, e.g. only fill diagonal
    data = []
    for i in range(nrows):
        for j in range(ncols):
            if i == j:
                data.append({"x": i, "y": j, "color": i+j, "row": f"R{i}", "col": f"C{j}"})
    df = pd.DataFrame(data)
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(nrows)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(ncols)}
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=nrows,
        num_of_cols=ncols,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    nonempty = [t for t in fig.data if len(t.x) > 0]
    empty = [t for t in fig.data if len(t.x) == 0]

def test_performance_large_dataframe(monkeypatch):
    # This test ensures the function does not take too long for large data
    N = 999
    df = pd.DataFrame({
        "x": list(range(N)),
        "y": list(range(N)),
        "color": list(range(N)),
        "row": [f"R{i%10}" for i in range(N)],
        "col": [f"C{j%10}" for j in range(N)],
    })
    facet_row_labels = {f"R{i}": f"Row{i}" for i in range(10)}
    facet_col_labels = {f"C{j}": f"Col{j}" for j in range(10)}

    import time
    t0 = time.time()
    fig, annotations = _facet_grid_color_numerical(
        df=df,
        x="x",
        y="y",
        facet_row="row",
        facet_col="col",
        color_name="color",
        colormap="Viridis",
        num_of_rows=10,
        num_of_cols=10,
        facet_row_labels=facet_row_labels,
        facet_col_labels=facet_col_labels,
        trace_type="scatter",
        flipped_rows=False,
        flipped_cols=False,
        show_boxes=False,
        SUBPLOT_SPACING=0.01,
        marker_color=None,
        kwargs_trace={},
        kwargs_marker={},
    )
    t1 = time.time()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-_facet_grid_color_numerical-mb2f1y3j` and push.


Here’s an optimized version of the provided code, focused on:
- **Avoiding repeated DataFrame lookups and groupbys.**
- **More efficient label lookups.**
- **Reducing dict constructions per call (especially for marker, trace dicts, and annotation dicts).**
- **Avoiding unnecessary .tolist() conversions and DataFrame creation.**
- **Inlining fast paths and minimizing branching and overheads in hot loops.**
- **Preallocating where possible.**

### Details
- Cache `df[color_name]` once instead of recomputing it inside loops.
- Bind frequently accessed attributes to local variables.
- Use `.get()` for dict-style label lookups so that a missing key falls back safely instead of raising a `KeyError`.
- `make_subplots` isn’t optimized (it forwards to plotly) and is kept as-is for compatibility.
- For groupby-heavy code, use `df.groupby(fields, sort=False)` to avoid unnecessary sorting, and cache `df[color_name].values`.
- Move dict constructions (like `marker_dict`) out of hotspot functions and reuse the objects.
- Inline variable paths and reduce branching in `_annotation_dict`.
- Cache the empty DataFrame used for missing facets instead of re-creating it each time.
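The grouping, caching, and label-lookup points above can be illustrated with a minimal sketch. This is not the code from this PR: the sample DataFrame and the helper name `lookup_label` are hypothetical, and only standard pandas calls are assumed.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the tests in this PR.
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [4, 3, 2, 1],
    "color": [0.1, 0.2, 0.3, 0.4],
    "row": ["R1", "R0", "R1", "R0"],
    "col": ["C0", "C0", "C1", "C1"],
})

# Group once, without sorting the group keys, and keep the groups in a dict
# so each (row, col) cell is an O(1) lookup later on.
groups = {key: group for key, group in df.groupby(["row", "col"], sort=False)}

# Read the color column a single time and reuse the NumPy array.
color_values = df["color"].to_numpy()
cmin, cmax = color_values.min(), color_values.max()

# Hypothetical label mapping that is deliberately missing "R1".
facet_row_labels = {"R0": "Row 0"}


def lookup_label(key, labels):
    """Return the mapped label, falling back to the raw key if it is missing."""
    return labels.get(key, key)
```

With `sort=False`, the groups come back in order of first appearance rather than sorted by key. Note that the `.get()` fallback is a behavior change compared with plain indexing, which would raise a `KeyError` for a missing label.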

Below is the rewritten, faster version.



---

### Summary of key changes
- **Label lookup is O(1) and safely falls back if a label is missing.**
- **Only one empty DataFrame for missing facets is ever constructed.**
- **`df[<col>].values` is used for all colors/data for efficient NumPy array access.**
- **Marker and trace dicts are built once per loop iteration using cached color arrays.**
- **Removed `.tolist()`-based empty checks; replaced with `isnull().all(axis=None)` on the cached empty DataFrame.**
- **Minimized repeated dict constructions by reusing `colorbar_dict`, the marker dict template, etc.**
- **Branching in `_annotation_dict` is flattened for speed.**
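As a rough illustration of the cached empty DataFrame and the `isnull().all(axis=None)` check listed above, here is a small sketch. The names `EMPTY_FACET`, `is_empty_facet`, and `get_facet` are hypothetical and are not taken from the PR's code.

```python
import pandas as pd

# Hypothetical module-level placeholder, built once and reused for every
# facet cell that has no data.
EMPTY_FACET = pd.DataFrame([[None, None, None]], columns=["x", "y", "color"])


def is_empty_facet(group):
    """Return True when every value in the group is null."""
    # axis=None reduces over both rows and columns to a single boolean.
    return bool(group.isnull().all(axis=None))


def get_facet(groups, key):
    """Look up a facet cell, falling back to the shared empty placeholder."""
    return groups.get(key, EMPTY_FACET)


# Usage: one populated cell, one missing cell.
groups = {("R0", "C0"): pd.DataFrame({"x": [1], "y": [2], "color": [3]})}
present = get_facet(groups, ("R0", "C0"))
missing = get_facet(groups, ("R1", "C1"))
```

Because the placeholder is shared, callers in this sketch should treat it as read-only; mutating it would affect every missing cell.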

If your DataFrames are large, these changes should reduce memory allocation and improve runtime, especially on the hot path (`_facet_grid_color_numerical`).

Let me know if you would like further micro-optimization or vectorization based on the nature of your DataFrames or inputs!
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 24, 2025
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 May 24, 2025 16:00