#5190 - Add metadata_modified field to resource#5236

Merged
amercader merged 11 commits into
ckan:masterfrom
pdelboca:fix/last-modified-for-resources
Mar 10, 2020

@pdelboca
Member

Fixes #5190

Proposed fixes:

This PR adds a metadata_modified field to the Resource model to keep track of changes. It is updated in resource_dict_save (model_save.py) only if the new values being saved differ from the current ones.
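As a rough illustration of the "only update when something actually changed" idea, here is a minimal sketch. It is not the code from model_save.py: the `Resource` stand-in class, the `save_resource_fields` helper, and its `now` parameter are all hypothetical names made up for this example.

```python
import datetime


class Resource:
    """Hypothetical stand-in for the real model; holds arbitrary fields."""

    def __init__(self, **fields):
        self.metadata_modified = None
        for key, value in fields.items():
            setattr(self, key, value)


def save_resource_fields(obj, new_values, now=None):
    """Copy new_values onto obj and bump metadata_modified only on a real change.

    Returns True if at least one field differed from its current value.
    """
    now = now or datetime.datetime.utcnow()
    changed = False
    for key, value in new_values.items():
        # A missing attribute is treated as None, so new keys count as changes.
        if getattr(obj, key, None) != value:
            setattr(obj, key, value)
            changed = True
    if changed:
        obj.metadata_modified = now
    return changed
```

Saving identical values leaves metadata_modified untouched, while changing any single field sets it to the supplied timestamp.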

Features:

  • includes tests covering changes
  • includes updated documentation
  • includes user-visible changes
  • includes API changes
  • includes bugfix for possible backport

Please [X] all the boxes above that apply

@amercader amercader changed the title #5190 - Add metadata_modified field to resource (WIP) #5190 - Add metadata_modified field to resource Feb 25, 2020
@amercader amercader self-assigned this Feb 25, 2020
@pdelboca
Member Author

@amercader regarding new unit tests, I'm thinking about the best way to test this. Our codebase already has several ways of testing this field at the package level:

test_model_dictize.py::test_package_dictize_basic:

today = datetime.date.today().strftime("%Y-%m-%d")
assert result["metadata_modified"].startswith(today)

test_get.py::TestPackageShow::test_package_show_with_full_dataset

def replace_datetime(dict_, key):
    assert key in dict_
    dict_[key] = u"2019-05-24T15:52:30.123456"

replace_datetime(dataset2, "metadata_modified")

test_jobs.py::TestDictizeJob::test_dictize_job

assert abs((now - dt).total_seconds()) < 10

@smotornyuk proposed in a PR (can't find it) to use freezegun, but it mocks datetime.now() and we are using datetime.utcnow().

Creating a new mock is not trivial, since Python built-in types are immutable. I tried to create a fixture like the following one, proposed here:

import datetime
import pytest

FAKE_TIME = datetime.datetime(2020, 12, 25, 17, 5, 55)

@pytest.fixture
def patch_datetime_utcnow(monkeypatch):

    # Plain stand-in class: only utcnow() is provided.
    class mydatetime:
        @classmethod
        def utcnow(cls):
            return FAKE_TIME

    # Replaces datetime.datetime for every caller that goes through the module.
    monkeypatch.setattr(datetime, 'datetime', mydatetime)


def test_patch_datetime(patch_datetime_utcnow):
    assert datetime.datetime.utcnow() == FAKE_TIME

But it throws errors in other tests and in the codebase, because we are mocking the datetime.datetime object, which is widely used. For example:

ERROR    ckan.lib.search.common:common.py:63 mydatetime() takes no arguments
Traceback (most recent call last):
  File "/home/pdelboca/Repos/ckan/ckan/lib/search/common.py", line 61, in is_available
    conn.search(q="*:*", rows=1)
  File "/home/pdelboca/envs/ckan-py3/lib/python3.7/site-packages/pysolr.py", line 721, in search
    decoded = self.decoder.decode(response)
  File "/home/pdelboca/envs/ckan-py3/lib/python3.7/site-packages/simplejson/decoder.py", line 374, in decode
    obj, end = self.raw_decode(s)
  File "/home/pdelboca/envs/ckan-py3/lib/python3.7/site-packages/simplejson/decoder.py", line 404, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
  File "/home/pdelboca/Repos/ckan/ckan/lib/search/common.py", line 101, in solr_datetime_decoder
    date_values['second'])
TypeError: mydatetime() takes no arguments

I can keep testing as we are doing now, but I thought it was good to raise this and find a holistic approach for how we test this time-based logic.
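The traceback above comes from code that calls datetime.datetime(...) as a constructor while it has been swapped for a class that accepts no arguments. One possible workaround, sketched here under assumptions, is to substitute a subclass of datetime.datetime that only overrides utcnow(), so construction keeps working. The names `FrozenDateTime` and `frozen_utcnow` are hypothetical, and the sketch uses unittest.mock.patch.object instead of pytest's monkeypatch so that it runs standalone.

```python
import datetime
from unittest import mock

FAKE_TIME = datetime.datetime(2020, 12, 25, 17, 5, 55)


class FrozenDateTime(datetime.datetime):
    """Subclass of the real datetime, so the normal constructor still works."""

    @classmethod
    def utcnow(cls):
        return FAKE_TIME


def frozen_utcnow():
    """Context manager that swaps datetime.datetime for the subclass."""
    return mock.patch.object(datetime, "datetime", FrozenDateTime)


with frozen_utcnow():
    # utcnow() is frozen...
    frozen = datetime.datetime.utcnow()
    # ...and calling the class with arguments no longer raises TypeError.
    built = datetime.datetime(2019, 5, 24, 15, 52, 30)
```

This is still a blunt tool: modules that did `from datetime import datetime` keep their own reference to the original class, and isinstance checks against the patched name can behave unexpectedly for objects created before the patch.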

@pdelboca
Member Author

@amercader discard the last comment. Writing it just made me read the docs and issues more carefully, and it seems that freezegun already mocks utcnow(), as the README.md points out:

Once the decorator or context manager have been invoked, all calls to datetime.datetime.now(), datetime.datetime.utcnow(), datetime.date.today(), time.time(), time.localtime(), time.gmtime(), and time.strftime() will return the time that has been frozen.

Sorry, holidays are coming soon :)

I will add some tests using freezegun and see how it goes!

@wardi
Contributor

wardi commented Feb 28, 2020

If we add these fields, we can stop updating metadata_modified at the package level in the database and instead have package_show return max(package.metadata_modified, resource1.metadata_modified, ...) as the metadata_modified value. That might help reduce the contention raised in #5233.
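A minimal sketch of that idea, assuming a package_show-style dict in which metadata_modified values are ISO 8601 strings (or missing). The function name `effective_metadata_modified` is hypothetical and nothing like it is part of this PR.

```python
import datetime


def effective_metadata_modified(package):
    """Return the latest metadata_modified across a dataset and its resources.

    Missing or empty timestamps are skipped; returns None if none are set.
    """
    candidates = [package.get("metadata_modified")]
    candidates += [
        resource.get("metadata_modified")
        for resource in package.get("resources", [])
    ]
    stamps = [datetime.datetime.fromisoformat(c) for c in candidates if c]
    return max(stamps).isoformat() if stamps else None
```

Computing the value at read time like this would mean a resource edit no longer needs to write to the package row.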

Member

@amercader amercader left a comment

This looks great @pdelboca, only one minor change about previously missing attributes.

Comment thread ckan/model/resource.py
Column('size', types.BigInteger),
Column('created', types.DateTime, default=datetime.datetime.utcnow),
Column('last_modified', types.DateTime),
Column('metadata_modified', types.DateTime),
Member

The behaviour for metadata_modified at the package level is to have utcnow as the default, which I think makes sense if you are using this field to track changes in a resource.

Member

Ah, I see now that the field will be set to utcnow because its value will be different, but I'd add the default here anyway for consistency.

        obj.url_changed = True
    if key == 'url' and not new and obj.url != value:
        obj.url_changed = True
    if getattr(obj, key) != value:
Member

If the key was not present in the resource before, this will raise an AttributeError.

Member

Sorry, ignore me. I was reading the code wrong.

@amercader amercader merged commit 91cf220 into ckan:master Mar 10, 2020
@amercader
Member

@pdelboca I went ahead and added the missing bits to this before merging. More important ones:

@amercader
Member

@wardi this PR only adds the metadata_modified field at the resource level. The dataset one remains unchanged. I guess that to do what you propose we need similar logic in model_save at the package level to check whether the dataset really changed.

Development

Successfully merging this pull request may close these issues.

Last modified dates not updated for resources

3 participants