Loading pandas dataframe with column type string[pyarrow] (ChunkedArray) into bigframes fails on AttributeError #2049

@tomer-lerer-immunai

Description

Environment details

OS type and version: macOS Sequoia 15.6
"pip" version: uv 0.8.13
Python: 3.10.11
bigframes==2.18.0
google-cloud-bigquery==3.36.0
pandas==2.3.2
pyarrow==15.0.2
sqlglot==27.11.0

Steps to reproduce

  1. Create a pandas DataFrame with a column backed by a pyarrow ChunkedArray.
    This can be done by concatenating columns of dtype string[pyarrow].

  2. Load the pandas DataFrame into bigframes,
    e.g. by passing it to the bpd.DataFrame constructor or to read_pandas.

Code example

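import bigframes.pandas as bpd
import pandas as pd

# A pyarrow-backed string column
s = pd.Series(['a', 'b'], dtype="string[pyarrow]")

df1 = pd.DataFrame({"col": s})
df2 = pd.DataFrame({"col": s})

# Concatenating the frames yields a column backed by a ChunkedArray
df = pd.concat([df1, df2])

# Raises AttributeError (see stack trace)
bpd.DataFrame(df)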

Stack trace

Traceback (most recent call last):
  File "/dir/example.py", line 11, in <module>
    bpd.get_global_session().read_pandas(df).to_pandas()
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/log_adapter.py", line 175, in wrapper
    return method(*args, **kwargs)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/session/__init__.py", line 1006, in read_pandas
    return self._read_pandas(pandas_dataframe, write_engine=write_engine)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/log_adapter.py", line 175, in wrapper
    return method(*args, **kwargs)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/session/__init__.py", line 1040, in _read_pandas
    return self._read_pandas_inline(pandas_dataframe)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/log_adapter.py", line 175, in wrapper
    return method(*args, **kwargs)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/session/__init__.py", line 1059, in _read_pandas_inline
    local_block = blocks.Block.from_local(pandas_dataframe, self)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/blocks.py", line 227, in from_local
    managed_data = local_data.ManagedArrowTable.from_pandas(pd_data)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/local_data.py", line 75, in from_pandas
    new_arr, bf_type = _adapt_pandas_series(col)
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/local_data.py", line 280, in _adapt_pandas_series
    return _adapt_arrow_array(pa.array(series))
  File "/dir/.venv/lib/python3.10/site-packages/bigframes/core/local_data.py", line 308, in _adapt_arrow_array
    if array.offset != 0:  # Offset arrays don't have all operations implemented
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'offset'

Workaround

Cast the affected column to object dtype before passing the pandas DataFrame to bigframes:

df = pd.concat([df1, df2])

df['col'] = df['col'].astype('object')

bpd.DataFrame(df)

Labels

api: bigquery (issues related to the googleapis/python-bigquery-dataframes API)
