Add a hard upper limit to datastore_search(_sql) rows returned #4562
…kan.datastore.search.rows_max
…rcleci is 2.7.15 I think
ckan.datastore.search.rows_max is likely going to cause a problem with https://github.com/ckan/ckan/blob/master/ckanext/datastore/controller.py#L46
I guess the datastore dump should be exempt from the rows_max. It's not flexible with lots of options like datastore_search, so it is very cacheable. And it would be a bit odd to have an API for getting a straight dump of a file and have it turn up truncated. If so, dump_to could pass a context variable that tells datastore_search not to impose the rows_max.
-1 on special behaviours based on context variables; we just need to update dump to check for the results_truncated return value. If we try to be too clever we won't be able to support custom validation rules, because we end up repeating them in other parts of the code. Also, when choosing a default datastore_search limit, set it at least as large as the dump PAGINATE_BY value.
…mpler and more backwards compatible for clients.
…the rows_max is less than PAGINATE_BY then dump_as misses records.
Ok, makes sense.
Rather than disallow rows_max to be less than 32000, I've written some code that allows it to be lower. In that case it simply reduces the PAGINATE_BY value, so no rows are missed by the dumper. Ready for re-review @wardi
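The page-size adjustment described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `fake_datastore_search` and `dump_all` are hypothetical names, the search is simulated over an in-memory list, and the only values taken from the discussion are PAGINATE_BY = 32000 and the idea of capping the page size at rows_max.

```python
PAGINATE_BY = 32000  # dump page size mentioned in the discussion


def fake_datastore_search(records, offset, limit, rows_max):
    """Hypothetical stand-in for datastore_search: caps limit at rows_max."""
    effective_limit = min(limit, rows_max)
    return {
        "records": records[offset:offset + effective_limit],
        "limit": effective_limit,
    }


def dump_all(records, rows_max, paginate_by=PAGINATE_BY):
    """Yield every record, paging with a size that never exceeds rows_max."""
    # If the page size were larger than rows_max, each search would return
    # fewer rows than the offset advances by, and rows would be skipped.
    page_size = min(paginate_by, rows_max)
    offset = 0
    while True:
        result = fake_datastore_search(records, offset, page_size, rows_max)
        page = result["records"]
        for record in page:
            yield record
        if len(page) < page_size:
            break  # a short (or empty) page means we reached the end
        offset += page_size
```

Because the offset advances by the same reduced page size that the search honours, the dump stays complete even when rows_max is lower than the default page size.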
…complicated - not sure if it is worth it.
@wardi Adding a 'records_truncated' to So I think the best we could do for datastore_search is say if the limit specified by the user has been lowered to rows_max. This is slightly useful, because the user doesn't know what rows_max is configured to be. I've not coded this yet, but it's simple. For consistency, the 'Datastore dump' call should also warn the user if the records are curtailed. Because it uses datastore_search, it too cannot say if you are over the rows_max without that annoying extra query. But we can tell the user if we have returned rows up to the rows_max, even with the pagination going on. It's complicated, but I've managed to implement it with the help of a bunch of tests. I'm not sure it is worth it - let me know what you think.
If we're going to limit the number of records that the dump controller returns, that should be a separate option that defaults to unlimited. The whole point of the dump controller is to dump all the data requested so that users don't need to paginate with the API themselves. The controller uses a constant amount of memory; it just takes more time if there are more records.
Ok, I wasn't quite sure which way you were nudging, but that's clear now and helps!
Backport for 2.7 is on branch 4561-limit-datastore_search-2.7-backport
This is now ready for further comments @tino097 @smotornyuk
@tino097 @smotornyuk Just a friendly ping to remind you about this PR :)
Any more comments or is this good to merge @wardi @smotornyuk ?
A number of parameters from :meth:`~ckanext.datastore.logic.action.datastore_search` can be used:
``offset``, ``limit``, ``filters``, ``q``, ``distinct``, ``plain``, ``language``, ``fields``, ``sort``
…he docs & changelog.
this looks good to me
Thanks @wardi. @tino097 @smotornyuk did either of you want to look at this any more before I merge?
@davidread It looks great, click the button
Fixes #4561
Proposed fixes:
ckan.datastore.search.rows_max is the new config option which limits datastore_search and datastore_search_sql.
When datastore_search_sql returns results that have hit this configured limit, it includes records_truncated: True in the response.
It was too complicated to do the same for datastore_search. However, you can tell when the limit you requested is above the ckan.datastore.search.rows_max limit, because the response includes a limit value which is changed from what you specified in the request to ckan.datastore.search.rows_max.
This PR includes some work to ensure 'datastore dump' still works (and remains not limited by ckan.datastore.search.rows_max). I've modernized the related tests, and that work is also found in a separate PR, #4581, if you want to merge that separately (before this one).
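From a client's point of view, the two signals described in this PR could be checked along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the dicts are assumed to be the action's result payload, carrying the `records_truncated` and `limit` keys named in the description. No real request is made.

```python
def sql_results_truncated(result):
    """True if a datastore_search_sql result reports it hit rows_max.

    Assumes `result` is the action's result dict and that the key is
    simply absent when nothing was truncated.
    """
    return bool(result.get("records_truncated", False))


def search_limit_was_lowered(requested_limit, result):
    """True if datastore_search echoed back a lower limit than requested.

    A lowered limit indicates the request exceeded
    ckan.datastore.search.rows_max and was capped to it.
    """
    return result.get("limit", requested_limit) < requested_limit


# Hypothetical result payloads, for illustration only.
sql_result = {"records": [{"id": 1}], "records_truncated": True}
search_result = {"records": [{"id": 1}], "limit": 100}

if sql_results_truncated(sql_result):
    print("datastore_search_sql output was cut off at rows_max")
if search_limit_was_lowered(500, search_result):
    print("requested limit was lowered to", search_result["limit"])
```

Note that a lowered `limit` only says the request asked for more than the configured maximum; as discussed in the thread, it does not by itself say whether more matching rows exist beyond that maximum.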