-
Notifications
You must be signed in to change notification settings - Fork 2
Snowballing with Semantic Scholar #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finally found the time to review your pull request. Thanks for your submission! It looks good in general.
I have addressed some concerns, mainly potential database inconsistency that should be addressed with django.db.transaction.atomic and provided additional albeit minor suggestions for further improvements.
|
|
||
| # Assign authors | ||
| for position, author in enumerate(authors): | ||
| publication_author, created = PublicationAuthor.objects.get_or_create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one of these operations fail, the database will be inconsistent: only some authors get assigned to the publication and the operations from here on will not be executed. Hence, the complete import of a single publication should happen inside a transaction.atomic() context:
with transaction.atomic():
…| title = "Reference" if is_reference else "Citation" | ||
| if 0 < len(objs): | ||
| self.echo(f"--- {title}s ---") | ||
| publications: List[Publication] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is assigned but never used. Only publications are added to the list.
|
|
||
| # Add to Semantic Scholar and link publications | ||
| if doi: | ||
| publication = Publication.objects.get(doi=doi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to query the database to fetch the publication, as the publication variable already holds it.
| if title.index(' ') > -1: | ||
| cite_key += title[:title.index(' ')] | ||
| else: | ||
| cite_key += title |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if title.index(' ') > -1: | |
| cite_key += title[:title.index(' ')] | |
| else: | |
| cite_key += title | |
| cite_key += title.split(' ')[0] |
| first = True | ||
| cite_key = '' | ||
| for author in data.get('authors', []): | ||
| name = author.get('name', '') | ||
| author, created = Author.objects.get_or_create(name=name) | ||
| if created: | ||
| self.echo(f"Added author: {author}") | ||
| else: | ||
| self.echo(f"Author '{author}' alreay known") | ||
| authors.append(author) | ||
| if first: | ||
| first = False | ||
| if name.rindex(' ') > -1: | ||
| cite_key = name[name.rindex(' '):] | ||
| else: | ||
| cite_key = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code seems a bit complex. The variable first is not really required, and a check whether cite_key is still empty would suffice. Since authors are collected anyway, one could simply extract the last name of the first author afterwards, e. g.:
| first = True | |
| cite_key = '' | |
| for author in data.get('authors', []): | |
| name = author.get('name', '') | |
| author, created = Author.objects.get_or_create(name=name) | |
| if created: | |
| self.echo(f"Added author: {author}") | |
| else: | |
| self.echo(f"Author '{author}' alreay known") | |
| authors.append(author) | |
| if first: | |
| first = False | |
| if name.rindex(' ') > -1: | |
| cite_key = name[name.rindex(' '):] | |
| else: | |
| cite_key = name | |
| for author in data.get('authors', []): | |
| name = author.get('name', '') | |
| author, created = Author.objects.get_or_create(name=name) | |
| if created: | |
| self.echo(f"Added author: {author}") | |
| else: | |
| self.echo(f"Author '{author}' alreay known") | |
| authors.append(author) | |
| cite_key = '' | |
| if authors: | |
| cite_key = authors[0].name.split(' ')[-1] |
| first = True | ||
| cite_key = '' | ||
| for author in data.get('authors', []): | ||
| name = author.get('name', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes no sense adding authors with empty names. A better approach might be:
| name = author.get('name', '') | |
| if name := author.get('name', None): | |
| … |
Addressing #2