Feat/support repo info #1483
Merge remote-tracking branch 'upstream' into feat/support_repo_info
Summary of Changes
Hello @alcholiclg, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the HubApi by introducing new functionalities to retrieve comprehensive repository information. It adds a unified repo_info method to fetch detailed metadata for both models and datasets, including their latest commit details. Furthermore, it provides a list_repo_commits method to access the full commit history of a repository. The changes also improve the robustness of ModelInfo and DatasetInfo classes by making them compatible with potentially undefined parameters and integrating the new commit information.
Highlights
- New repo_info API: A unified method HubApi.repo_info is introduced to fetch detailed metadata for both models and datasets, including their latest commit information.
- Commit History Retrieval: A new HubApi.list_repo_commits method allows fetching the complete commit history for any given model or dataset repository.
- Enhanced Model and Dataset Information: The ModelInfo and DatasetInfo classes are updated to include fields for the latest commit SHA (sha), the full latest commit object (last_commit), and the last_modified timestamp, providing richer repository metadata.
- Improved Data Class Robustness: ModelInfo and DatasetInfo constructors are made more resilient to handle missing or undefined parameters, ensuring backward compatibility.
- Timestamp Conversion Utility: A new convert_timestamp utility function is added to modelscope.hub.utils.utils to reliably convert various timestamp formats (UNIX, ISO strings) into timezone-aware datetime objects.
- New Data Structures: OrganizationInfo, DetailedCommitInfo, and CommitHistoryResponse dataclasses are introduced to better structure and manage the retrieved repository and commit data.
- Comprehensive Test Coverage: A new test file tests/hub/test_hub_repo_info.py is added, providing extensive unit tests for the new API methods and the enhanced data classes, including tests with real repository data.
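The timestamp conversion described in the highlights can be illustrated with a minimal sketch. This is not the code from the pull request: the function name convert_timestamp_sketch, the millisecond heuristic, and the choice to treat naive values as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def convert_timestamp_sketch(
        value: Union[int, float, str, datetime, None]) -> Optional[datetime]:
    """Hypothetical stand-in for a convert_timestamp-style helper.

    Accepts UNIX timestamps (seconds or milliseconds), ISO 8601 strings,
    or datetime objects, and returns a timezone-aware datetime.
    """
    if value is None:
        return None
    if isinstance(value, datetime):
        # Assumption: naive datetimes are interpreted as UTC.
        return value if value.tzinfo else value.replace(tzinfo=timezone.utc)
    if isinstance(value, (int, float)):
        # Assumption: values above 1e11 are milliseconds, not seconds.
        seconds = value / 1000 if value > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    if isinstance(value, str):
        text = value.strip()
        if text.isdigit():
            # Numeric strings are handled as UNIX timestamps.
            return convert_timestamp_sketch(int(text))
        # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11.
        if text.endswith('Z'):
            text = text[:-1] + '+00:00'
        parsed = datetime.fromisoformat(text)
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    raise TypeError(f'Unsupported timestamp type: {type(value).__name__}')
```

Returning timezone-aware values throughout avoids comparing naive and aware datetimes later, which raises a TypeError in Python.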
Code Review
This pull request introduces significant enhancements to the HubApi by adding support for retrieving repository information (repo_info) and commit history (list_repo_commits). The implementation includes new data classes for models, datasets, and organizations, which is a good approach. However, I've identified a few critical issues, such as type mismatches in dataclass initializers that could cause runtime errors, and some flawed error handling logic. Additionally, there are opportunities to improve code quality and robustness. The inclusion of comprehensive tests is a great practice.
modelscope/hub/api.py
    self.author = kwargs.pop('author', '')

    # backward compatibility
    self.__dict__.update(kwargs)
Using self.__dict__.update(kwargs) for backward compatibility can be risky. It makes the object's attributes dynamic and dependent on the API response, which can hide issues like typos in keys from the API. For improved robustness, it's better to explicitly handle all expected keys and perhaps log a warning for any unexpected keys remaining in kwargs.
modelscope/hub/api.py
    self.last_modified = None

    # backward compatibility
    self.__dict__.update(kwargs)
Using self.__dict__.update(kwargs) for backward compatibility can be risky. It makes the object's attributes dynamic and dependent on the API response, which can hide issues like typos in keys from the API. For improved robustness, it's better to explicitly handle all expected keys and perhaps log a warning for any unexpected keys remaining in kwargs.
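The alternative recommended in the two comments above could look roughly like the sketch below. RepoInfoSketch and its field names are hypothetical and stand in for ModelInfo or DatasetInfo; keeping unknown keys in a separate extra dict is one possible design, not something the pull request specifies.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


class RepoInfoSketch:
    """Hypothetical stand-in for a ModelInfo/DatasetInfo-style class."""

    def __init__(self, **kwargs: Any) -> None:
        # Pop every expected key explicitly, with a default for missing ones.
        self.id = kwargs.pop('id', None)
        self.author = kwargs.pop('author', '')
        self.last_modified = kwargs.pop('last_modified', None)

        # Whatever remains was not expected. Keep it in one dedicated dict
        # instead of turning each key into an attribute, and warn about it.
        self.extra: Dict[str, Any] = dict(kwargs)
        if kwargs:
            logger.warning('Unexpected fields in API response: %s',
                           sorted(kwargs))
```

With this shape a misspelled key such as athor ends up in extra and triggers a warning, rather than silently becoming an attribute while author keeps its default.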
    if is_relative_path(repo_id) and repo_id.count('/') == 1:
        _owner, _dataset_name = repo_id.split('/')
    else:
        raise ValueError(f'Invalid repo_id: {repo_id} !')
The variables _owner and _dataset_name are assigned but never used. They should be removed to improve code clarity. The validation logic can be simplified.
Suggested change:

    - if is_relative_path(repo_id) and repo_id.count('/') == 1:
    -     _owner, _dataset_name = repo_id.split('/')
    - else:
    -     raise ValueError(f'Invalid repo_id: {repo_id} !')
    + if not (is_relative_path(repo_id) and repo_id.count('/') == 1):
    +     raise ValueError(f'Invalid repo_id: {repo_id} !')
    except requests.exceptions.RequestException as e:
        raise Exception(f'Failed to get repository commits for {repo_id}: {str(e)}')
Raising a generic Exception can obscure the original error and make debugging more difficult. It's better to raise a more specific exception, like RequestError, and chain the original RequestException using from e to preserve the stack trace.
Suggested change:

    - except requests.exceptions.RequestException as e:
    -     raise Exception(f'Failed to get repository commits for {repo_id}: {str(e)}')
    + except requests.exceptions.RequestException as e:
    +     raise RequestError(f'Failed to get repository commits for {repo_id}: {str(e)}') from e
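The effect of chaining with from e can be seen in a small self-contained sketch. To keep it runnable without third-party packages, the built-in ConnectionError stands in for requests.exceptions.RequestException, and RequestError is defined locally rather than imported from the library. The function fetch_commits_sketch and its fetch parameter are hypothetical.

```python
from typing import Any, Callable


class RequestError(Exception):
    """Local stand-in for a library-specific request error type."""


def fetch_commits_sketch(repo_id: str, fetch: Callable[[str], Any]) -> Any:
    """Call fetch(repo_id) and translate transport failures.

    ConnectionError is used here in place of
    requests.exceptions.RequestException (an assumption for this sketch).
    """
    try:
        return fetch(repo_id)
    except ConnectionError as e:
        # 'from e' stores the original exception on __cause__, so the
        # traceback shows both the low-level failure and this error.
        raise RequestError(
            f'Failed to get repository commits for {repo_id}: {e}') from e
```

Callers can then catch the specific RequestError and still reach the underlying failure through its __cause__ attribute.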
…cope into feat/support_repo_info Merge branch 'feat/support_repo_info' of github.com:alcholiclg/modelscope into feat/support_repo_info
This PR adds features to HubApi:
- repo_info support for retrieving repository-level metadata (datasets, models) and latest commit info.
- list_repo_commits for fetching full repository commit history.
- Updates to ModelInfo and DatasetInfo.

Usage example: