@gabalafou gabalafou commented Jun 4, 2025

Adds to_csv() method to ColumnDataSource (via base class ColumnarDataSource).

Example usage:

const cds = new ColumnDataSource({data: {x: [5, 10], y: [25, 100]}})

cds.to_csv()
// 'x,y\n5,25\n10,100\n'
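For readers unfamiliar with the column-to-row transformation this implies, here is a minimal sketch. It is not the PR's implementation: `Columns` and `columns_to_csv` are hypothetical names, the columns are assumed to have equal length, and no quoting or escaping is attempted.

```typescript
// Hypothetical stand-in for a column data source: column name -> values.
// Assumption: every column has the same length.
type Columns = Record<string, ArrayLike<unknown>>

function columns_to_csv(columns: Columns): string {
  const entries = Object.entries(columns)
  if (entries.length === 0) {
    return ""
  }
  // Header line: the column names, in insertion order.
  const lines: string[] = [entries.map(([name]) => name).join(",")]
  // Row count taken from the longest column (all are assumed equal).
  const n_rows = Math.max(...entries.map(([, column]) => column.length))
  for (let r = 0; r < n_rows; r++) {
    // One CSV line per row index, reading across the columns.
    lines.push(entries.map(([, column]) => String(column[r])).join(","))
  }
  return lines.join("\n") + "\n"
}

const csv = columns_to_csv({x: [5, 10], y: [25, 100]})
// csv === "x,y\n5,25\n10,100\n"
```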

@gabalafou

@mattpap - gentle ping. Have you had a chance to look into the TypeScript compile issue with this PR?

@gabalafou gabalafou requested a review from mattpap June 25, 2025 10:22
@gabalafou gabalafou force-pushed the gabalafou/cds-csv branch from 13217f0 to 136ce14 Compare July 23, 2025 15:46
gabalafou commented Jul 23, 2025

Notes from today's call with @mattpap

Todo in this PR:

  • Add licenses for code copied and modified from node-csv (it has an MIT license)
  • Fix eslint issues
  • Check how the CSV looks on the color scatter example

Leaving unsolved (or for a future PR):

  • How to represent 2d arrays (and higher dimensions) in CSV

@gabalafou gabalafou force-pushed the gabalafou/cds-csv branch from a32a01f to 0029a90 Compare July 29, 2025 18:16
@gabalafou

My latest code changes are a bit sloppy. Basically, I added a function, ndget, which is called when the column data source is transformed into rows. This fixes what felt like a broken CSV for the color_scatter example.

The color scatter example has 4,000 circles, each with a color that is a function of the (x, y) coordinates of the circle's center point. When this example is serialized and deserialized in the browser, the underlying column data source has a column called "fill_color", which contains a 12,000-element Uint8Array. Every three consecutive items in that array represent the red, green, and blue values of one color. The problem was that, before my latest code changes, serializing this to CSV gave the fill_color column only the first 4,000 of those 12,000 values. So I created a function that gets values from a flat representation of an ndarray, taking the shape of the array into account.
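The shape-aware lookup described above can be sketched roughly as follows. This is an illustration only, not the ndget in the PR: `ndget_sketch` and its signature are hypothetical, and it assumes a row-major flat buffer whose first dimension is the row count.

```typescript
// Hypothetical sketch: return the slice of a flat, row-major buffer that
// belongs to row `r`, given the array's shape (e.g. [4000, 3] for RGB).
function ndget_sketch(flat: Uint8Array, shape: readonly number[], r: number): Uint8Array {
  const n_rows = shape[0] ?? 0
  if (r < 0 || r >= n_rows) {
    throw new RangeError(`row ${r} is out of bounds for shape [${shape.join(", ")}]`)
  }
  // Number of flat items per row: the product of the trailing dimensions.
  const stride = shape.slice(1).reduce((a, b) => a * b, 1)
  // subarray() returns a view on the same buffer, so nothing is copied.
  return flat.subarray(r * stride, (r + 1) * stride)
}

// Two colors stored flat as [r, g, b, r, g, b].
const flat_colors = Uint8Array.from([133, 95, 150, 10, 20, 30])
const first_color = ndget_sketch(flat_colors, [2, 3], 0)   // items 133, 95, 150
const second_color = ndget_sketch(flat_colors, [2, 3], 1)  // items 10, 20, 30
```

Indexing the flat buffer directly with `flat[r]` is what yields only the first 4,000 values in the color scatter case; multiplying by the stride is the step that was missing.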

I also used the existing pretty printer class to stringify object types rather than use JSON.stringify.

Together this results in the following first two lines of CSV for the color_scatter example:

radius,x,y,fill_color
1.4252641788706195,41.7022004702574,32.664490177209615,"[Uint8Array([133, 95, 150])]"

This looks reasonable to me.
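The fill_color value above is wrapped in double quotes because it contains commas. A minimal sketch of that convention, following the common RFC 4180 rules, is shown below; `quote_field` is a hypothetical helper and may differ from the node-csv-derived code in this PR.

```typescript
// Hypothetical helper: quote a CSV field only when it contains a comma,
// a double quote, or a line break. Embedded quotes are doubled.
function quote_field(value: string): string {
  const needs_quotes = /[",\r\n]/.test(value)
  return needs_quotes ? `"${value.replace(/"/g, '""')}"` : value
}

const plain_field = quote_field("41.7022004702574")
// plain_field === '41.7022004702574'
const quoted_field = quote_field("[Uint8Array([133, 95, 150])]")
// quoted_field === '"[Uint8Array([133, 95, 150])]"'
```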

gabalafou commented Jul 29, 2025

So the code changes are a bit sloppy, but I want to know if this is at least in the right direction.

My sense is that there really should be some way to index the NDArray class so that I do not have to create a code fork, i.e.:

if (is_NDArray(column)) {
  row[c] = ndget(column, r)
} else {
  row[c] = column[r]
}

But I don't have a good idea of how to go about that.
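One possible direction, offered only as a sketch: resolve the branch once per column by building a row getter, so the row loop itself has no fork. Nothing here is an existing Bokeh API; `ShapedColumn`, `is_shaped`, and `row_getter` are hypothetical names, and `ShapedColumn` is a simplified stand-in for NDArray.

```typescript
// Hypothetical stand-in for an ndarray: a flat buffer plus its shape.
interface ShapedColumn {
  readonly data: ArrayLike<number>
  readonly shape: readonly number[]
}

type Column = ArrayLike<unknown> | ShapedColumn

function is_shaped(column: Column): column is ShapedColumn {
  return "shape" in column && "data" in column
}

// Decide once per column how a row is read; callers then just call get(r).
function row_getter(column: Column): (r: number) => unknown {
  if (is_shaped(column)) {
    const {data, shape} = column
    if (shape.length <= 1) {
      return (r) => data[r]
    }
    // Items per row: the product of the trailing dimensions.
    const stride = shape.slice(1).reduce((a, b) => a * b, 1)
    return (r) => Array.from({length: stride}, (_, i) => data[r * stride + i])
  }
  const plain: ArrayLike<unknown> = column
  return (r) => plain[r]
}

const example_columns: Record<string, Column> = {
  x: [1, 2],
  fill_color: {data: Uint8Array.from([133, 95, 150, 10, 20, 30]), shape: [2, 3]},
}
const getters = Object.values(example_columns).map((column) => row_getter(column))
const second_row = getters.map((get) => get(1))
// second_row is [2, [10, 20, 30]]
```

The branch still exists, but it runs once per column rather than once per cell, and the row-building code no longer needs to know about array shapes.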

@gabalafou gabalafou changed the title Add CSV and JSON methods to ColumnarDataSource Add CSV method to ColumnarDataSource Jul 31, 2025
gabalafou commented Aug 20, 2025

I want to add a few more test cases for the to_csv() method, specifically ones where the underlying test data source has columns that are NDArrays. Otherwise, this is ready for review, so I am going to take it out of draft mode.

@gabalafou gabalafou marked this pull request as ready for review August 20, 2025 17:46
@gabalafou gabalafou left a comment

finished self review

@gabalafou

I believe @mattpap said that the following failing test is irrelevant to this PR.

✗ Bug
 └┬─ in issue #12718
  └── doesn't render Legend correctly when LegendItem.index is filtered out by CDSView
diff --git a/bokehjs/test/baselines/linux/Bug__in_issue_#12718__doesn't_render_Legend_correctly_when_LegendItem.index_is_filtered_out_by_CDSView.blf b/bokehjs/test/baselines/linux/Bug__in_issue_#12718__doesn't_render_Legend_correctly_when_LegendItem.index_is_filtered_out_by_CDSView.blf
index dbd519a..de67416 100644
--- a/bokehjs/test/baselines/linux/Bug__in_issue_#12718__doesn't_render_Legend_correctly_when_LegendItem.index_is_filtered_out_by_CDSView.blf
+++ b/bokehjs/test/baselines/linux/Bug__in_issue_#12718__doesn't_render_Legend_correctly_when_LegendItem.index_is_filtered_out_by_CDSView.blf
@@ -5,4 +5,4 @@ Figure bbox=[0, 0, 200, 200]
     Scatter bbox=[37, 92, 150, 0]
   LinearAxis bbox=[29, 178, 166, 22]
   LinearAxis bbox=[0, 5, 29, 173]
-  Legend bbox=[31, 15, 163, 42]
+  Legend bbox=[55, 15, 115, 42]

But I don't know about the following failing test:

✗ Plot
 └┬─ should support windowed auto-ranging
  └── with window_axis='y' when data changes
images differ by 4px (0.00%)

@gabalafou gabalafou mentioned this pull request Aug 26, 2025
3 tasks
@mattpap mattpap added this to the 3.9 milestone Aug 27, 2025
@pavithraes pavithraes added the grant: CZI R6 Funded by CZI Round 6 grant label Aug 28, 2025
@mattpap mattpap changed the base branch from branch-3.8 to branch-3.9 August 29, 2025 13:11
Successfully merging this pull request may close these issues.

ColumnarDataSource (CDS) to CSV/JSON method