Open
Description
How do I handle a field that holds an array of struct values, like the following? I will send a full example shortly:
df.with_columns(pl.struct('aggregate_ratings.sub_ratings').alias('aggregate_ratings.sub_ratings').map(to_json, return_dtype=pl.Utf8))
import pyarrow as pa

# Each row holds a list of {average_rating, crawled_date} structs.
sub_ratings_type = pa.list_(
    pa.struct([
        pa.field("average_rating", pa.float64()),
        pa.field("crawled_date", pa.large_string()),
    ])
)
ids = pa.array([1, 2, 3], type=pa.int32())
complicated = pa.array(
    [
        [{'average_rating': 4.9, 'crawled_date': '2023-06-06'}, {'average_rating': 4.7, 'crawled_date': '2023-06-04'}],
        [{'average_rating': 4.8, 'crawled_date': '2023-05-06'}, {'average_rating': 4.6, 'crawled_date': '2023-05-04'}],
        [{'average_rating': 4.7, 'crawled_date': '2023-04-06'}, {'average_rating': 4.5, 'crawled_date': '2023-04-04'}],
    ],
    type=sub_ratings_type,
)
# Note: array_to_json (a JSON-serializing helper, not defined in this snippet)
# is not applied here, so the column keeps its list<struct> type.
schema = pa.schema(
    [
        pa.field("id", pa.int32()),
        pa.field("complicated", sub_ratings_type),
    ]
)
# from_arrays accepts either schema or names, not both.
df = pa.RecordBatch.from_arrays([ids, complicated], schema=schema).to_pandas()
print(df)
I made an attempt with the following encoder, but it fails during the copy because the output type is Jsonb() instead of jsonb[]:
encoder = ArrowToPostgresBinaryEncoder.new_with_encoders(
    schema,
    {
        'main_key': Int32EncoderBuilder(schema.field('main_key')),
        'aggregate_ratings.sub_ratings': LargeStringEncoderBuilder.new_with_output(
            schema[schema.get_field_index('aggregate_ratings.sub_ratings')],
            Jsonb(),
        ),
    },
)
Error:
psycopg.errors.QueryCanceled: COPY from stdin failed: error from Python: PanicException - called `Result::unwrap()` on an `Err` value: ColumnTypeMismatch { field: "aggregate_ratings.sub_ratings", expected: "arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i64>>", actual: LargeList(Field { name: "item", data_type: Struct([Field { name: "average_rating", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "crawled_date", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "metric", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) }
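The error above says the encoder expected a large string array but received a LargeList of structs. One possible workaround, sketched here under assumptions, is to serialize each row's list to a JSON string before encoding, so the column really is a string column. The helper name `lists_to_json` is hypothetical and stands in for the `to_json` / `array_to_json` helpers mentioned earlier; it works on plain Python lists and uses only the standard library.

```python
import json
from typing import Any, Optional


def lists_to_json(rows: list[Optional[list[dict[str, Any]]]]) -> list[Optional[str]]:
    """Serialize each row (a list of dicts, or None) to one JSON string.

    Hypothetical helper: nulls are passed through as None so they can stay
    null in the resulting string column.
    """
    return [None if row is None else json.dumps(row) for row in rows]


rows = [
    [
        {"average_rating": 4.9, "crawled_date": "2023-06-06"},
        {"average_rating": 4.7, "crawled_date": "2023-06-04"},
    ],
    None,  # a row with no sub-ratings
]
encoded = lists_to_json(rows)
print(encoded[0])
```

With pyarrow, the input could come from `column.to_pylist()` and the result could be wrapped as `pa.array(encoded, type=pa.large_string())`. Note that this produces a single JSON array per row, which would match a jsonb column rather than jsonb[]; whether that fits the target table is an assumption.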