Rewrite grid updating functionality #511

briandennis · 2018-08-08T12:57:23Z

@nicolaskruchten @n-riesco I'm still doing some manual testing, but I wanted to get this in front of you for review since I don't foresee the crux of it changing much.

Overview

This PR makes a core change to the way grids are updated in Falcon. Because of how central an alteration this is, I'm going to go into some detail about how it's implemented to try and catch any misunderstandings or issues up front! Please let me know if any of this sounds incorrect or contrary to your understanding 🙂

Previous to this PR, the update functionality worked as follows:

column uid's are passed in as a parameter to POST /queries
after running the supplied query, the updateGrid function is called which writes up to uids.length columns of data
column names are never altered

This is problematic because it relies on clients (Chart Studio, Falcon scheduling UI) to handle column renaming, appending, and deleting by manually updating the grid themselves. The Falcon scheduling UI isn't doing any of this, which is what led to the problems in #507. Chart Studio does handle these updates, but only if the query is rerun before syncing with Falcon.

To eliminate this reliance, the PR expands the update grid functionality to take care of these operations. The algorithm it uses mimics the requests used in Chart Studio, which seem to optimize for preserving uid's where possible (sensible, since charts rely upon these).

After this PR, the update functionality works as follows:

after running the supplied query, format the column data to explicitly set column names and ordering
directly fetch the latest version of the grid and grab the uids
if there are now more columns than uids, update the existing columns in place and append any additional ones required
if there are now fewer columns than uids, delete surplus columns and update the required existing columns in place
otherwise the number of columns and uids are equal and the existing columns can be updated in place as in the older version
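
The branching described above can be sketched as a pure planning function. This is a minimal sketch rather than the code in this PR: `planGridUpdate` and its return shape are hypothetical, and it assumes the surplus columns to delete are the trailing uids, which the description does not specify.

```javascript
// Hypothetical helper: decides which grid column operations are needed.
// `uids` are the column uids currently on the grid, in column order;
// `numColumns` is the number of columns returned by the latest query run.
function planGridUpdate(uids, numColumns) {
    if (numColumns > uids.length) {
        // More columns than uids: reuse every existing uid, append the rest.
        return {
            deleteUids: [],
            updateUids: uids.slice(),
            appendCount: numColumns - uids.length
        };
    }
    if (numColumns < uids.length) {
        // Fewer columns than uids: drop the surplus (assumed to be the
        // trailing uids) and update the remaining columns in place.
        return {
            deleteUids: uids.slice(numColumns),
            updateUids: uids.slice(0, numColumns),
            appendCount: 0
        };
    }
    // Same count: every existing column is updated in place.
    return { deleteUids: [], updateUids: uids.slice(), appendCount: 0 };
}
```

Keeping the decision separate from the HTTP calls would let each branch be unit tested without touching the Plotly API.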

Backwards Compatibility

Though not used directly anymore, the latest uids are still stored with scheduled queries to support Old Falcon loading New Falcon yaml files. In cases we've found where this PR breaks from existing functionality, it does so by providing more data, not less. Regardless, I want to confirm the two main ones we're aware of aren't an issue:

updates via POST /queries now respond with the updated scheduled query rather than returning an empty object
requests which pass an outdated number of uids (with respect to the number of columns returned in the current query) will still set the correct number of columns. For example, previously if in Chart Studio you had an existing single columned query (select a) and updated it to a multi columned query (select a, b) and hit sync with connector before rerunning the query, the grid would incorrectly update with only one column whereas now it will populate both

closes #507

nicolaskruchten · 2018-08-08T13:22:14Z

Thanks for this PR!

High-level questions:

with this PR, does Falcon behave the same when handling API calls from the webapp and its own UI? I.e. is work being done twice in the webapp case, once by the webapp and once by Falcon? This isn't necessarily a bad thing, especially if it leads to crisper syncing between the two, but I wanted to know what your intention was.
I assume the logic here has only changed during the conversation between the UI and Falcon, and not e.g. while executing a query on a schedule? It is possible to construct queries that return variable numbers of columns from run to run (e.g. https://www.postgresql.org/docs/9.1/static/tablefunc.html)... With this PR, what would happen in principle if, say, on save the query returns columns a,b,c but on the first scheduled run an hour later it returns columns e,f,g ?

briandennis · 2018-08-08T13:41:20Z

Yes, it treats both calls the same. In the case where the webapp updates everything before posting to Falcon (query is updated and re-run before clicking sync with connector), Falcon does repeat work by intention. The appends/deletes won't happen again, but it will repopulate all of the columns. As you noted, the tradeoff here is that it allows the sync implementation to be simpler and easier to understand.
Under the hood, the same function is used by both the API handlers and the job scheduler. So this does impact how queries executing on a schedule are updated. It has the (truthfully unintentional) benefit of supporting variable column counts from the same query. In other words, it correctly handles your a,b,c -> e,f,g example as well as both a,b,c -> e and a,b,c -> e,f,g,h.

nicolaskruchten · 2018-08-08T13:45:35Z

OK, thanks for the clarifications. I'm OK with the principles behind both answers. The answer to the second question implies a potentially-breaking change, but in the direction of correctness, so I'm in support.

@n-riesco how do you feel about the code?

nicolaskruchten · 2018-08-08T14:18:05Z

I should note that the variable-columns-across-executions thing is actually much easier to do with non-SQL connection types like CSV and such, so it's not such an exotic case as I had first thought...

n-riesco · 2018-08-08T14:30:18Z

@briandennis @nicolaskruchten I've just skimmed through the PR. I'll review it more carefully later.

The main thing that has caught my eye is that some of the tests in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-b5c6b550db1eb811e7de0d9a87f5a1eb have been updated to ignore the uids. Why is that? Would this affect the plots in Chart Studio that use scheduled queries?

n-riesco · 2018-08-08T16:29:16Z

backend/persistent/QueryScheduler.js

+            return getGrid(fid, requestor);
+        }).then((res) => {
+            Logger.log(`Request to Plotly for fetching updated grid took ${process.hrtime(startTime)[0]} seconds`, 2);
+            return res.json();


Test status before converting to JSON, otherwise the error will be reported as a failure to parse into JSON.
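
A minimal sketch of what that check could look like, assuming `res` is a fetch-style response exposing `status`, `text()` and `json()`. The helper name `parseJsonIfOk` and the error message format are hypothetical, not part of this PR.

```javascript
// Hypothetical helper; `res` is assumed to be a fetch-style response
// exposing `status`, `text()` and `json()`.
function parseJsonIfOk(res) {
    if (res.status !== 200) {
        // Surface the HTTP failure together with the raw body instead of
        // letting res.json() fail with a misleading parse error.
        return res.text().then((body) => {
            throw new Error(`Request failed with status ${res.status}: ${body}`);
        });
    }
    return res.json();
}
```

Under those assumptions it would slot into the chain as `.then(parseJsonIfOk)` in place of the bare `res.json()` call.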

briandennis · 2018-08-08T16:43:36Z

@n-riesco re ignoring uids:

Most of the omissions attempt to correct for quirks that are no longer the case. For example, SELECT * from ebola_2014 will store all of the columns returned by the query rather than only the subset of uids that are passed.

Though, you're right that we should probably still be checking that uids aren't overwritten somewhere. I went ahead and added some unit tests to do that explicitly.

n-riesco · 2018-08-08T16:47:21Z

backend/utils/gridUtils.js

+  try {
+    return Object.keys(grid.cols)
+      .map(key => grid.cols[key])
+      .sort((a, b) => a.order - b.order)


what's the content of order?

I believe it's a zero-based index representing where the column lies from left to right.
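
To make the role of `order` concrete, here is a small sketch of sorting columns by it and reading off the uids. It assumes each entry in `grid.cols` carries a `uid` and a numeric `order`; the function name and the empty-array fallback are hypothetical and may differ from the helper in this PR.

```javascript
// Hypothetical sketch: returns column uids sorted left to right.
// Assumes `grid.cols` maps column names to objects with `uid` and `order`.
function extractOrderedUidsSketch(grid) {
    const cols = (grid && grid.cols) || {};
    return Object.keys(cols)
        .map((key) => cols[key])
        // `order` is treated as a zero-based left-to-right position.
        .sort((a, b) => a.order - b.order)
        .map((col) => col.uid);
}
```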

n-riesco · 2018-08-08T16:51:53Z

backend/persistent/plotly-api.js

+    const baseParams = { username, apiKey, accessToken };
+
+    // fetch latest grid to get the source of truth
+    return getGrid(fid, requestor)


Potentially, this is a very expensive request, since it returns all the grid data, when we are only interested in column names, uid and order.

@tarzzz Is there any other way to get this information?

good point! I believe the API supports GET for the /col endpoint

The API does support GET /col. I can make that update.

btw, our API is documented here: https://api.plot.ly/v2/grids/ ..

n-riesco · 2018-08-08T17:14:04Z

backend/persistent/plotly-api.js

+                return data;
+            }
+
+            const uids = extractOrderedUids(data);


This function orders uids by order.
Is this what we want?
Changing a query from select a,b to select b,a would break a plot.

Please, ignore this comment. I've checked and this is already the behaviour in master.

n-riesco · 2018-08-08T17:15:52Z

backend/persistent/plotly-api.js

+            }
+            return res.json();
+        }).then(data => {
+            if (data.status) {


~~Elsewhere we throw an error, why not here?~~
This is done so that it behaves like plotlyAPIRequest.

The status was being used by the tests so we just returned (https://github.com/plotly/falcon-sql-client/blob/dc4091249e48a10db12068abe1838900f36437a5/test/backend/QueryScheduler.spec.js#L629). Is it worth updating?

@mfix22 Judging by how we use updateGrid here, we shouldn't throw.

n-riesco · 2018-08-08T17:18:24Z

backend/persistent/plotly-api.js

+                    method: 'POST'
+                }).then((res) => {
+                    if (res.status !== 200) {
+                        return res;


nicolaskruchten · 2018-08-08T17:22:45Z

Since I'm cranking through testing scenarios, I saw this UI issue which hopefully can be slid into this PR:

Create a query through the webapp, to run every 5min
Go to Falcon and update it to run, say, every week
The success screen still says "Runs every 5 minutes", even though if I close it, it's correct in the list view etc. and has been correctly saved.

nicolaskruchten · 2018-08-08T17:23:39Z

Second small UI thing: when creating a query from within Falcon, the initial "saved successfully" window doesn't include a link to the resulting dataset, which seems like a very natural thing to want to see there!

n-riesco · 2018-08-08T17:26:13Z

backend/persistent/plotly-api.js

+
+            if (numColumns > uids.length) {
+                // repopulate existing columns
+                const putUrl = `${baseUrl}?uid=${uids.join(',')}`;


do we need to slice uids here?

No, UIDs are shorter in length than numColumns. Not sure what we would slice it to.

n-riesco · 2018-08-08T18:08:34Z

test/backend/QueryScheduler.spec.js

            queryObject = {
                fid,
                uids,
-                refreshInterval,


I believe Brian just wanted to test that a default refresh interval was set even if one isn't sent with the request.

I can add it back in and make the assertion:

assert.deepEqual( getQueries(), [queryObject], 'Query has not been saved' );

would you prefer that?

Yes, please. Otherwise I won't know if this PR changes Falcon's behaviour.

n-riesco · 2018-08-08T18:42:28Z

I have to take a break. I'm not done with the review yet.

I want to understand why test/backend/routes.queries.spec.js fails in master but succeeds in this PR.

My worry is that this PR changes POST /queries and hides the error we're currently seeing in master.

n-riesco · 2018-08-08T21:29:28Z

backend/persistent/QueryScheduler.js

+        }).then((res) => {
+            Logger.log(`Request to Plotly for fetching updated grid took ${process.hrtime(startTime)[0]} seconds`, 2);
+            if (res.status !== 200) {
+              return res.text();


this is silencing the error we currently see in master

Interesting 🤔 This wasn't added until dc40912, but that test was still passing beforehand. Any idea why that is, @n-riesco?

@briandennis After more debugging, I have a better idea of what's happening.

Currently, master is failing because we get 400 Bad Request in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dR309

In master, this causes an exception in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dL384 because res is undefined.

In this PR, execution moves to https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dR390 as if the grid hadn't been deleted.

n-riesco · 2018-08-08T22:49:29Z

test/backend/routes.queries.spec.js

+                    return POST('queries', queryObject)
+                    .then(assertResponseStatus(201))
+                    .then(getResponseJson).then(json2 => {
+                        assert.deepEqual(json2, queryObject);


This is the only line that needed updating in this test.
Let's revert everything else.

n-riesco · 2018-08-08T22:54:51Z

test/backend/routes.queries.spec.js

-                .then(getResponseJson).then(json => {
-                    assert.deepEqual(json, [queryObject]);
+                .then(getResponseJson).then((json) => {
+                    assert.deepEqual(omit('uids', json), omit('uids', queryObject));


This is the only line that needed updating in this test.
Let's revert everything else.

Re the uids, is it possible to correct the const uids above?

n-riesco · 2018-08-08T23:11:59Z

@briandennis At the moment, there are 3 tests failing in master:

user@host:~/github/plotly-database-connector$ yarn test-unit-queries
yarn run v1.6.0
$ cross-env NODE_ENV=test BABEL_DISABLE_CACHE=1 electron-mocha --full-trace --timeout 90000 --compilers js:babel-register test/backend/routes.queries.spec.js


  Routes:
    queries:
      ✓ can create a grid when it registers a query (7323ms)
      ✓ registers a query and returns saved queries (9327ms)
      1) can register queries if the user is a collaborator
      ✓ can't register queries if the user can't view it (799ms)
      ✓ can't register queries if the user isn't a collaborator (1061ms)
      2) gets individual queries
      3) deletes queries
      ✓ returns 404s when getting queries that don't exist
      ✓ returns 404s when deleting queries that don't exist
      ✓ fails when the user's API keys or oauth creds aren't saved (51ms)
      ✓ fails when the user's API keys aren't correct (906ms)
      ✓ fails when it can't connect to the plotly server
      ✓ fails when there is a syntax error in the query (4046ms)


  10 passing (54s)
  3 failing

These failures are caused by changes in the uids of 2 grids: plotly-database-connector:718 and plotly-database-connector:197

fid: plotly-database-connector:718
status: 400
body: {"errors":[{"code":"UNKNOWN","message":"The uids: d8ba6c, dfa411 do not belong to this grid.","path":null,"field":null}],"detail":"The uids: d8ba6c, dfa411 do not belong to this grid."}


fid: plotly-database-connector:197
status: 400
body: {"errors":[{"code":"UNKNOWN","message":"The uids: d5d91e, 89d77e do not belong to this grid.","path":null,"field":null}],"detail":"The uids: d5d91e, 89d77e do not belong to this grid."}

The changes in queryAndUpdateGrid and test/backend/routes.queries.spec.js in this PR hide these failures.

To convince myself that this PR doesn't hide these failures:

queryAndUpdateGrid should report failures in https://github.com/plotly/falcon-sql-client/pull/511/files#diff-a353f58a9179598892ca8c2a867f4e2dR309 and thereafter (this is done in https://github.com/plotly/falcon-sql-client/blob/7a7483790488f897362479c6280658ab24ee20e5/backend/persistent/QueryScheduler.js#L361-L364 )
changes to test/backend/routes.queries.spec.js should be limited to updating the list of uids used in the failing test and to the assertion checking the result returned by POST /queries.

n-riesco · 2018-08-08T23:41:33Z

backend/persistent/QueryScheduler.js

            Logger.log(`Request to Plotly for creating a grid took ${process.hrtime(startTime)[0]} seconds`, 2);
            Logger.log(`Grid ${fid} has been updated.`, 2);

+            if (res.status !== 200) {


I think the PR was OK without this change. The error is already thrown here.

Note that res here is undefined when the execution goes through https://github.com/plotly/falcon-sql-client/blob/7a7483790488f897362479c6280658ab24ee20e5/backend/persistent/QueryScheduler.js#L338 or https://github.com/plotly/falcon-sql-client/blob/7a7483790488f897362479c6280658ab24ee20e5/backend/persistent/QueryScheduler.js#L359

@n-riesco without that change, wouldn't the error be swallowed if patchGrid responds non-200, but does not throw?
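
One way to reconcile both points (a non-200 response that does not throw, and `res` being undefined on some paths) is a guard along these lines. This is a sketch with hypothetical names, not the check committed in this PR.

```javascript
// Hypothetical guard: true only when a response object exists, carries a
// status, and that status is not 200.
function isFailedResponse(res) {
    return Boolean(res) && typeof res.status !== 'undefined' && res.status !== 200;
}

// Hypothetical wrapper: throws on a non-200 response and passes everything
// else (including an undefined `res`) through unchanged.
function throwIfFailed(res) {
    if (isFailedResponse(res)) {
        throw new Error(`Unexpected response status: ${res.status}`);
    }
    return res;
}
```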

n-riesco · 2018-08-08T23:45:07Z

@mfix22 Since after this PR, POST /queries ignores uids, we should remove uids from test/backend/routes.queries.spec.js. E.g:

briandennis · 2018-08-08T23:56:12Z

@n-riesco re errors on master:

So sorry for the headache from the uids changing! Those were accidentally altered while experimenting with the update algorithm during development. I should have realized that after reading your post about the failure. Apologies for not connecting the dots sooner, that's on me.

n-riesco · 2018-08-08T23:59:26Z

@briandennis No worries (one can't make an omelette without breaking some eggs 😄 ).

@mfix22 It's getting late for me. Tomorrow, I'll have a look again at the PR and test it a bit further.

briandennis · 2018-08-09T02:55:55Z

@n-riesco CI is failing on that same collaborator test. From the error message, it looks like it's responding with a 403 to the PATCH request. After manually authenticating with an HTTP client, I confirmed I can GET the query, but PATCHing still responds with a 403.

Any idea why this might be? Do you think the way collaborator permissions are handled may have changed?

n-riesco · 2018-08-09T11:35:56Z

@briandennis I'm taking that discussion to Slack.

briandennis and others added 10 commits August 7, 2018 08:33

set update scheduled query payload correctly

c8c0b28

update backend to correctly set names and add/remove columns

50ce572

fix updateGrid signature in tests

df31eec

update scheduler tests for new updateGrid API

81381f2

fix updateGrid to GET source or truth

9c014ce

if GET is not 200 in updateGrid, return the response

5d3b558

omit unnecessary info in tests

dbdd654

remove uid parameters where possible, maintain falcon specific query …

51ea8cc

…params if updated from Chart Studio

call scheduled query job with correct parameters

61749cb

move DELETE before PUT to minimize risk of column naming collision

a3037b6

briandennis requested a review from n-riesco August 8, 2018 13:01

n-riesco reviewed Aug 8, 2018

add uid tests for updateGrid

02ab47c

n-riesco reviewed Aug 8, 2018

GET /grids/:fid/col

1a2bfd5

n-riesco reviewed Aug 8, 2018

check for status as the end of queryAndUpdateGrid

dc40912

n-riesco reviewed Aug 8, 2018

fix syncing issue after updating query cronInterval

9049db3

mfix22 added 2 commits August 8, 2018 10:55

extract getGridColumn into API function

b4f4f0a

rename preview-modal.jsx -> preview-modal.test.jsx

9351cc9

n-riesco reviewed Aug 8, 2018

mfix22 and others added 2 commits August 8, 2018 11:54

revert removing refreshInterval from test

a4cf0d4

clean up SuccessMessage, show datasetUrl in create-modal

400b5c2

n-riesco reviewed Aug 8, 2018

mfix22 added 2 commits August 8, 2018 15:59

clean up routes.queries.spec

75fd5fc

remove omit from routes.queries

005702a

mfix22 added 2 commits August 8, 2018 16:16

keep routes.spec as close to master as possible

7a74837

throw non 200's in QueryScheduler

ee413c4

n-riesco reviewed Aug 8, 2018

confirm res and res.status are not undefined

c526883

remove uids references from routes.queries.spec POST bodys

8f1d1dd

disable collaborator query update test

c39a0fd

briandennis merged commit 0b44821 into 3.0-onprem Aug 9, 2018

mfix22 deleted the gridUpdate branch August 9, 2018 21:34

nicolaskruchten mentioned this pull request Aug 10, 2018

Updating queries with different numbers and names of columns #507

Closed
