This repository was archived by the owner on Aug 29, 2025. It is now read-only.

Conversation

@briandennis
Contributor

@nicolaskruchten @n-riesco I'm still doing some manual testing, but I wanted to get this in front of you for review since I don't foresee the crux of it changing much.

Overview

This PR makes a core change to the way grids are updated in Falcon. Because of how central an alteration this is, I'm going to go into some detail about how it's implemented to try and catch any misunderstandings or issues up front! Please let me know if any of this sounds incorrect or contrary to your understanding 🙂

Before this PR, the update functionality worked as follows:

  • column uids are passed in as a parameter to POST /queries
  • after running the supplied query, the updateGrid function is called which writes up to uids.length columns of data
  • column names are never altered

This is problematic because it relies on clients (Chart Studio, Falcon scheduling UI) to handle column renaming, appending, and deleting by manually updating the grid themselves. The Falcon scheduling UI isn't doing any of this, which is what led to the problems in #507. Chart Studio does handle these updates, but only if the query is rerun before syncing with Falcon.

To eliminate this reliance, the PR expands the update grid functionality to take care of these operations. The algorithm it uses mimics the requests used in Chart Studio, which seem to optimize for preserving uids where possible (sensible, since charts rely on them).

After this PR, the update functionality works as follows:

  • after running the supplied query, format the column data to explicitly set column names and ordering
  • directly fetch the latest version of the grid and grab the uids
  • if there are now more columns than uids, update the existing columns in place and append any additional ones required
  • if there are now fewer columns than uids, delete surplus columns and update the required existing columns in place
  • otherwise the number of columns and uids are equal and the existing columns can be updated in place as in the older version
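
The three branches above can be sketched as a pure planning function. This is only an illustration under stated assumptions, not the PR's code: `planGridUpdate` and the shape of the object it returns are hypothetical names introduced here, and the real implementation issues requests against the grid rather than returning a plan.

```javascript
// Hypothetical helper: decide which column operations a grid update needs.
// `columnNames` are the columns returned by the query, in order;
// `uids` are the existing grid column uids, ordered left to right.
function planGridUpdate(columnNames, uids) {
    const shared = Math.min(columnNames.length, uids.length);

    return {
        // existing columns rewritten in place, so their uids are preserved
        update: uids.slice(0, shared).map((uid, i) => ({uid, name: columnNames[i]})),
        // columns appended when the query returns more columns than uids
        append: columnNames.slice(shared),
        // surplus uids deleted when the query returns fewer columns
        remove: uids.slice(shared)
    };
}

// more columns than uids: three updates in place, one append
console.log(planGridUpdate(['e', 'f', 'g', 'h'], ['u1', 'u2', 'u3']));
// fewer columns than uids: one update in place, two deletions
console.log(planGridUpdate(['e'], ['u1', 'u2', 'u3']));
```

When the counts are equal, both `append` and `remove` come back empty, which corresponds to the older in-place behaviour.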

Backwards Compatibility

Though not used directly anymore, the latest uids are still stored with scheduled queries to support Old Falcon loading New Falcon yaml files. In cases we've found where this PR breaks from existing functionality, it does so by providing more data, not less. Regardless, I want to confirm the two main ones we're aware of aren't an issue:

  • updates via POST /queries now respond with the updated scheduled query rather than returning an empty object
  • requests which pass an outdated number of uids (with respect to the number of columns returned by the current query) will still set the correct number of columns. For example, suppose you had an existing single-column query (select a) in Chart Studio, updated it to a multi-column query (select a, b), and hit sync with connector before rerunning the query. Previously the grid would incorrectly update with only one column; now it populates both.

closes #507

@briandennis briandennis requested a review from n-riesco August 8, 2018 13:01
@nicolaskruchten
Contributor

Thanks for this PR!

High-level questions:

  1. with this PR, does Falcon behave the same when handling API calls from the webapp and its own UI? I.e. is work being done twice in the webapp case, once by the webapp and once by Falcon? This isn't necessarily a bad thing, especially if it leads to crisper syncing between the two, but I wanted to know what your intention was.
  2. I assume the logic here has only changed during the conversation between the UI and Falcon, and not e.g. while executing a query on a schedule? It is possible to construct queries that return variable numbers of columns from run to run (e.g. https://www.postgresql.org/docs/9.1/static/tablefunc.html)... With this PR, what would happen in principle if, say, on save the query returns columns a,b,c but on the first scheduled run an hour later it returns columns e,f,g ?

@briandennis
Contributor Author

  1. Yes, it treats both calls the same. In the case where the webapp updates everything before posting to Falcon (query is updated and re-run before clicking sync with connector), Falcon intentionally repeats work. The appends/deletes won't happen again, but it will repopulate all of the columns. As you noted, the tradeoff is that it allows the sync implementation to be simpler and easier to understand.

  2. Under the hood, the same function is used by both the API handlers and the job scheduler. So this does impact how queries executing on a schedule are updated. It has the (truthfully unintentional) benefit of supporting variable column counts from the same query. In other words, it correctly handles your a,b,c -> e,f,g example as well as both a,b,c -> e and a,b,c -> e,f,g,h.

@nicolaskruchten
Contributor

OK, thanks for the clarifications. I'm OK with the principles behind both answers. The answer to the second question implies a potentially-breaking change, but in the direction of correctness, so I'm in support.

@n-riesco how do you feel about the code?

@nicolaskruchten
Contributor

I should note that the variable-columns-across-executions thing is actually much easier to do with non-SQL connection types like CSV and such, so it's not such an exotic case as I had first thought...

@n-riesco
Contributor

n-riesco commented Aug 8, 2018

@briandennis @nicolaskruchten I've just skimmed through the PR. I'll review it more carefully later.

The main thing that has caught my eye is that some of the tests in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-b5c6b550db1eb811e7de0d9a87f5a1eb have been updated to ignore the uids. Why is that? Would this affect the plots in Chart Studio that use scheduled queries?

return getGrid(fid, requestor);
}).then((res) => {
Logger.log(`Request to Plotly for fetching updated grid took ${process.hrtime(startTime)[0]} seconds`, 2);
return res.json();
Contributor

Test status before converting to JSON, otherwise the error will be reported as a failure to parse into JSON.
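
A minimal sketch of that ordering: check the status first, then parse. `parseJsonIfOk` and `fakeResponse` are hypothetical names used only for this illustration, and whether the non-200 branch should throw or return the response is a separate question raised further down in this review.

```javascript
// Hypothetical helper: fail on a non-200 status *before* parsing the body,
// so the error reports the HTTP status instead of a JSON parse failure.
function parseJsonIfOk(res) {
    if (res.status !== 200) {
        return res.text().then(body => {
            throw new Error(`Request failed with status ${res.status}: ${body}`);
        });
    }
    return res.json();
}

// Stand-in for a fetch-style response, used only in this example.
function fakeResponse(status, body) {
    return {
        status,
        text: () => Promise.resolve(body),
        json: () => Promise.resolve().then(() => JSON.parse(body))
    };
}

// The rejection carries the status and raw body rather than a parse error.
parseJsonIfOk(fakeResponse(400, 'Bad Request'))
    .catch(err => console.log(err.message));
```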

@briandennis
Contributor Author

@n-riesco re ignoring uids:

Most of the omissions attempt to correct for quirks that are no longer the case. For example, SELECT * from ebola_2014 will store all of the columns returned by the query rather than only the subset of uids that are passed.

Though, you're right that we should probably still be checking that uids aren't overwritten somewhere. I went ahead and added some unit tests to do that explicitly.

try {
return Object.keys(grid.cols)
.map(key => grid.cols[key])
.sort((a, b) => a.order - b.order)
Contributor

what's the content of order?

Contributor Author

I believe it's a zero-based index representing where the column lies from left to right.
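
For illustration, here is how sorting on order yields uids from left to right. The grid shape (a `cols` object keyed by column name, each entry carrying `uid` and `order`) and the final `.map(col => col.uid)` step are assumptions made for this sketch; only the `Object.keys` / `map` / `sort` chain comes from the diff above.

```javascript
// Sketch: order columns by their zero-based `order` field, then collect uids.
// The body is a reconstruction for this example, not the PR's exact function.
function extractOrderedUids(grid) {
    return Object.keys(grid.cols)
        .map(key => grid.cols[key])
        .sort((a, b) => a.order - b.order)
        .map(col => col.uid);
}

// Hypothetical grid whose keys are not in left-to-right order.
const grid = {
    cols: {
        b: {uid: 'u2', order: 1},
        a: {uid: 'u1', order: 0},
        c: {uid: 'u3', order: 2}
    }
};

console.log(extractOrderedUids(grid)); // [ 'u1', 'u2', 'u3' ]
```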

const baseParams = { username, apiKey, accessToken };

// fetch latest grid to get the source of truth
return getGrid(fid, requestor)
Contributor

Potentially, this is a very expensive request, since it returns all the grid data, when we are only interested in column names, uid and order.

@tarzzz Is there any other way to get this information?

Contributor Author

good point! I believe the API supports GET for the /col endpoint

Contributor

The API does support GET /col. I can make that update.

Contributor

btw, our API is documented here: https://api.plot.ly/v2/grids/ ..

return data;
}

const uids = extractOrderedUids(data);
Contributor

@n-riesco n-riesco Aug 8, 2018

This function orders uids by order.
Is this what we want?
Changing a query from select a,b to select b,a would break a plot.


Please, ignore this comment. I've checked and this is already the behaviour in master.

}
return res.json();
}).then(data => {
if (data.status) {
Contributor

@n-riesco n-riesco Aug 8, 2018

Elsewhere we throw an error, why not here?
This is done so that it behaves like plotlyAPIRequest.

Contributor

@mfix22 Judging by how we use updateGrid here, we shouldn't throw.

method: 'POST'
}).then((res) => {
if (res.status !== 200) {
return res;
Contributor

@n-riesco n-riesco Aug 8, 2018

throw?

@nicolaskruchten
Contributor

Since I'm cranking through testing scenarios, I saw this UI issue which hopefully can be slid into this PR:

  1. Create a query through the webapp, to run every 5min
  2. Go to Falcon and update it to run, say, every week
  3. The success screen still says "Runs every 5 minutes", even though if I close it, it's correct in the list view etc. and has been correctly saved.

@nicolaskruchten
Contributor

Second small UI thing: when creating a query from within Falcon, the initial "saved successfully" window doesn't include a link to the resulting dataset, which seems like a very natural thing to want to see there!


if (numColumns > uids.length) {
// repopulate existing columns
const putUrl = `${baseUrl}?uid=${uids.join(',')}`;
Contributor

do we need to slice uids here?

Contributor

No, UIDs are shorter in length than numColumns. Not sure what we would slice it to.
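
A small sketch of why slicing would be a no-op in this branch. `buildPutUrl` and the `baseUrl` value are placeholders invented for the example, not the real endpoint; only the `?uid=` query format comes from the diff above.

```javascript
// Hypothetical helper mirroring the template string in the diff.
function buildPutUrl(baseUrl, uids) {
    return `${baseUrl}?uid=${uids.join(',')}`;
}

// Placeholder values for illustration only.
const baseUrl = 'https://example.com/grids/some-fid/col';
const uids = ['u1', 'u2'];
const numColumns = 3;

// Inside the `numColumns > uids.length` branch, slicing to numColumns
// returns every uid, so the resulting URL is identical.
const sliced = uids.slice(0, numColumns);
console.log(buildPutUrl(baseUrl, uids) === buildPutUrl(baseUrl, sliced)); // true
```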

queryObject = {
fid,
uids,
refreshInterval,
Contributor

why?

Contributor

I believe Brian just wanted to test that a default refresh interval was set even if one isn't sent with the request.

I can add it back in and make the assertion:

assert.deepEqual(
    getQueries(),
    [queryObject],
    'Query has not been saved'
);

would you prefer that?

Contributor

Yes, please. Otherwise I won't know if this PR changes Falcon's behaviour.

@n-riesco
Contributor

n-riesco commented Aug 8, 2018

I have to take a break. I'm not done with the review yet.

I want to understand why test/backend/routes.queries.spec.js fails in master, but it succeeds in PR.

My worry is that this PR changes POST /queries and hides the error we're currently seeing in master.

}).then((res) => {
Logger.log(`Request to Plotly for fetching updated grid took ${process.hrtime(startTime)[0]} seconds`, 2);
if (res.status !== 200) {
return res.text();
Contributor

this is silencing the error we currently see in master

Contributor Author

Interesting 🤔 This wasn't added until dc40912, but that test was still passing beforehand. Any idea how that is, @n-riesco?

Contributor

@briandennis After more debugging, I have a better idea of what's happening.

Currently, master is failing because we get 400 Bad Request in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dR309

In master, this causes an exception in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dL384 because res is undefined.

In this PR, execution moves to https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dR390 as if the grid hadn't been deleted.

return POST('queries', queryObject)
.then(assertResponseStatus(201))
.then(getResponseJson).then(json2 => {
assert.deepEqual(json2, queryObject);
Contributor

This is the only line that needed updating in this test.
Let's revert everything else.

.then(getResponseJson).then(json => {
assert.deepEqual(json, [queryObject]);
.then(getResponseJson).then((json) => {
assert.deepEqual(omit('uids', json), omit('uids', queryObject));
Contributor

This is the only line that needed updating in this test.
Let's revert everything else.


Re the uids, is it possible to correct the const uids above?

@n-riesco
Contributor

n-riesco commented Aug 8, 2018

@briandennis At the moment, there are 3 tests failing in master:

user@host:~/github/plotly-database-connector$ yarn test-unit-queries
yarn run v1.6.0
$ cross-env NODE_ENV=test BABEL_DISABLE_CACHE=1 electron-mocha --full-trace --timeout 90000 --compilers js:babel-register test/backend/routes.queries.spec.js


  Routes:
    queries:
      ✓ can create a grid when it registers a query (7323ms)
      ✓ registers a query and returns saved queries (9327ms)
      1) can register queries if the user is a collaborator
      ✓ can't register queries if the user can't view it (799ms)
      ✓ can't register queries if the user isn't a collaborator (1061ms)
      2) gets individual queries
      3) deletes queries
      ✓ returns 404s when getting queries that don't exist
      ✓ returns 404s when deleting queries that don't exist
      ✓ fails when the user's API keys or oauth creds aren't saved (51ms)
      ✓ fails when the user's API keys aren't correct (906ms)
      ✓ fails when it can't connect to the plotly server
      ✓ fails when there is a syntax error in the query (4046ms)


  10 passing (54s)
  3 failing

These failures are caused by changes in the uids of 2 grids: plotly-database-connector:718 and plotly-database-connector:197

fid: plotly-database-connector:718
status: 400
body: {"errors":[{"code":"UNKNOWN","message":"The uids: d8ba6c, dfa411 do not belong to this grid.","path":null,"field":null}],"detail":"The uids: d8ba6c, dfa411 do not belong to this grid."}


fid: plotly-database-connector:197
status: 400
body: {"errors":[{"code":"UNKNOWN","message":"The uids: d5d91e, 89d77e do not belong to this grid.","path":null,"field":null}],"detail":"The uids: d5d91e, 89d77e do not belong to this grid."}

The changes in queryAndUpdateGrid and test/backend/routes.queries.spec.js in this PR hide these failures.

To convince myself that this PR doesn't hide these failures:

Logger.log(`Request to Plotly for creating a grid took ${process.hrtime(startTime)[0]} seconds`, 2);
Logger.log(`Grid ${fid} has been updated.`, 2);

if (res.status !== 200) {
Contributor Author

@n-riesco without that change, wouldn't the error be swallowed if patchGrid responds non-200, but does not throw?

@briandennis
Contributor Author

@n-riesco re errors on master:

So sorry for the headache from the uids changing! Those were accidentally altered while experimenting with the update algorithm during development. I should have realized that after reading your post about the failure; apologies for not connecting the dots sooner, that's on me.

@n-riesco
Contributor

n-riesco commented Aug 8, 2018

@briandennis No worries (one can't make an omelette without breaking some eggs 😄 ).

@mfix22 It's getting late for me. Tomorrow, I'll have a look again at the PR and test it a bit further.

@briandennis
Contributor Author

@n-riesco CI is failing on that same collaborator test. From the error message, it looks like it's responding with a 403 to the PATCH request. After manually authenticating with an HTTP client, I confirmed I can GET the query, but PATCHing still responds with 403.

Any idea why this might be? Do you think the way collaborator permissions are handled may have changed?

@n-riesco
Contributor

n-riesco commented Aug 9, 2018

@briandennis I'm taking that discussion to Slack.

7 participants