Added Script to fetch all Gsoc orgs #4159
Walkthrough
The command for fetching Google Summer of Code (GSoC) organizations has been updated to retrieve data for multiple years instead of a single year. The script now accepts command-line arguments specifying the years, and iterates over each year to fetch and process its organizations. Deduplication is improved by checking for existing organizations by website URL, and each organization is tagged with both a general "gsoc" tag and a year-specific tag. Each organization's participation years are tracked, and log messages now include the year. Error handling and tag-assignment logic have also been refined.
Sequence Diagram(s)
sequenceDiagram
    participant User
    participant Command
    participant API
    participant DB
    User->>Command: Run fetch_gsoc_orgs with years specified
    loop For each year
        Command->>API: Fetch organizations for {year}
        API-->>Command: Return organization data
        loop For each organization
            Command->>DB: Check if org exists by URL
            alt Exists
                Command->>DB: Update existing org, add year/tag
            else Not exists
                Command->>DB: Create new org, add year/tag
            end
        end
    end
    Command->>User: Output summary/log for all years
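The per-year loop in the diagram can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the command's actual code: Org is a hypothetical in-memory stand-in for the Django model, fetch is a hypothetical callable standing in for the API request, the "gsoc<year>" tag format is assumed, and entries without a URL are simply skipped because there is nothing to deduplicate on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Org:
    # Hypothetical in-memory stand-in for the organization model.
    name: str
    url: str
    years: Set[int] = field(default_factory=set)
    tags: Set[str] = field(default_factory=set)


def sync_years(
    years: Iterable[int],
    fetch: Callable[[int], List[dict]],
    store: Dict[str, Org],
) -> Dict[str, Org]:
    """Fetch each year and upsert organizations keyed by website URL."""
    for year in years:
        for data in fetch(year):
            url = data.get("url", "")
            if not url:
                continue  # this sketch skips entries it cannot deduplicate
            org = store.get(url)
            if org is None:
                # "Not exists" branch: create a new record.
                org = store[url] = Org(name=data["name"], url=url)
            else:
                # "Exists" branch: refresh the existing record.
                org.name = data["name"]
            # Both branches record the participation year and its tags.
            org.years.add(year)
            org.tags.update({"gsoc", f"gsoc{year}"})
    return store
```

Keying the store by URL is what makes an organization that appears in several years end up as one record with several entries in years.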
Actionable comments posted: 4
🧹 Nitpick comments (5)
website/management/commands/fetch_gsoc_orgs.py (5)
33-41: CLI help and flag semantics drift from reality
- The help string and the --current-only description hard-code "2025", but the constant actually resolves to GSOC_YEARS[0] (currently 2024).
- Users have to remember to update two places whenever we roll over to a new year.
Tie the wording to LATEST_YEAR (see previous comment) so the help stays accurate automatically.
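A minimal sketch of that suggestion using plain argparse (Django hands a command's add_arguments(self, parser) method a parser of the same kind, so the add_argument call would move there). GSOC_YEARS below is a hypothetical stand-in list ordered newest-first, matching the review's note that GSOC_YEARS[0] is the latest year; build_parser is an illustrative name.

```python
import argparse

# Hypothetical stand-in for the module constant, assumed newest-first so
# that GSOC_YEARS[0] is the latest year.
GSOC_YEARS = [2024, 2023, 2022, 2021, 2020]
LATEST_YEAR = GSOC_YEARS[0]


def build_parser() -> argparse.ArgumentParser:
    # Every user-facing string that mentions a year is built from
    # LATEST_YEAR, so bumping the constant is the only change needed.
    parser = argparse.ArgumentParser(
        prog="fetch_gsoc_orgs",
        description=f"Fetch GSoC organizations (latest year: {LATEST_YEAR}).",
    )
    parser.add_argument(
        "--current-only",
        action="store_true",
        help=f"Only fetch organizations for {LATEST_YEAR}.",
    )
    return parser
```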
79-82: Error branch discards useful context
Returning 0 on exception means the caller cannot distinguish between "API returned 0 organisations" and "network failure". Propagate the exception or return None/-1 so callers (and automated jobs) can tell failures from empty results.
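A sketch of the "return None" option, assuming the real request raises OSError (requests' RequestException derives from it) or ValueError (JSON decoding errors do). fetch_year_count and the fetch callable are hypothetical names standing in for the command's own fetch logic.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


def fetch_year_count(year: int, fetch: Callable[[int], List[dict]]) -> Optional[int]:
    """Return how many organizations were fetched for `year`, or None on failure."""
    try:
        orgs = fetch(year)
    except (OSError, ValueError):
        # Log with traceback, then return None: a value callers can tell
        # apart from a genuine empty result (0).
        logger.exception("Failed to fetch GSoC organizations for %s", year)
        return None
    return len(orgs)
```

A caller can then check `count is None` and, inside a Django command, raise django.core.management.base.CommandError so scheduled jobs exit with a non-zero status.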
88-110: existing_orgs handling is fragile & N+1 prone
- When url is empty the variable is set to None, yet later we call existing_orgs and existing_orgs.exists(). This works, but obscures intent; prefer an always-queryset pattern.
- When several rows share the same URL, .first() silently picks one of them (by the model's default ordering, or by primary key if none is defined). Either enforce uniqueness on the url field or choose the winner explicitly.
- The update branch executes a cascade of attribute assignments; consider save(update_fields=[...]) or filter().update() to avoid an extra SELECT/UPDATE round-trip.
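For the update branch, one hedged option is to apply the incoming values and collect only the fields that actually changed. OrgRecord and apply_changes are hypothetical names; in Django the returned list could feed org.save(update_fields=changed), and the save can be skipped entirely when the list is empty. This trims the UPDATE only; filter(url=url).update(**fields) is the single-query alternative the review mentions, at the cost of bypassing save() and its signals.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class OrgRecord:
    # Hypothetical stand-in for an existing organization row.
    name: str = ""
    url: str = ""
    description: str = ""


def apply_changes(obj: Any, data: Dict[str, Any]) -> List[str]:
    """Copy values from `data` onto `obj`; return only the fields that changed."""
    changed: List[str] = []
    for field_name, value in data.items():
        # getattr without a default: an unknown field name raises
        # AttributeError instead of being silently added.
        if getattr(obj, field_name) != value:
            setattr(obj, field_name, value)
            changed.append(field_name)
    return changed
```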
129-133: Logo refresh never happens when the URL changes
The logo is skipped if org.logo already exists, even if the upstream logo_url changed. Comparing the stored file name or a hash before skipping would keep logos up to date.
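One way to act on the file-name suggestion is to derive the stored name from the upstream URL, so a changed logo_url no longer matches what is stored (for example, org.logo.name). The gsoc_logo_<hash> naming scheme and both helper names are hypothetical. The sketch assumes the storage backend keeps the name it is given; Django's storage may append a suffix when a name collides, which would make every run look like a change.

```python
import hashlib
import os
from typing import Optional
from urllib.parse import urlparse


def expected_logo_name(logo_url: str) -> str:
    # Hypothetical naming scheme: hash of the upstream URL plus the URL's
    # file extension, so a different logo_url yields a different name.
    digest = hashlib.sha256(logo_url.encode("utf-8")).hexdigest()[:16]
    ext = os.path.splitext(urlparse(logo_url).path)[1]
    return f"gsoc_logo_{digest}{ext}"


def needs_logo_refresh(stored_name: Optional[str], logo_url: str) -> bool:
    if not logo_url:
        return False  # nothing upstream to download
    if not stored_name:
        return True  # no logo stored yet
    # Compare base names so an upload_to prefix such as "logos/" is ignored.
    return os.path.basename(stored_name) != expected_logo_name(logo_url)
```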
177-187: Duplicate tag fetches can be avoided
org.tags.add(tag) triggers a DB hit for every tag even though Django ignores duplicates. Collect the tags first and use a single add(*tags) call to reduce chatter.
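A sketch of the batching idea. RecordingManager is an illustrative stand-in for org.tags that just records calls, and the "gsoc<year>" tag format is assumed. With a plain Django many-to-many field the batch would hold Tag instances (or their primary keys), for example gathered via get_or_create, rather than bare strings.

```python
from typing import Any, Iterable, List, Tuple


def tag_names(years: Iterable[int]) -> List[str]:
    # General tag first, then one tag per distinct year.
    return ["gsoc"] + [f"gsoc{year}" for year in sorted(set(years))]


class RecordingManager:
    """Illustrative stand-in for org.tags that records every add() call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def add(self, *objs: Any) -> None:
        self.calls.append(objs)


def apply_tags(manager: Any, years: Iterable[int]) -> List[str]:
    # Build the whole batch first, then pass it in a single add() call
    # instead of calling add() once per tag.
    names = tag_names(years)
    manager.add(*names)
    return names
```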
📜 Review details
📒 Files selected for processing (1)
website/management/commands/fetch_gsoc_orgs.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Run Tests
- GitHub Check: docker-test
Fixes #3264

To fetch organizations for a specific year, run:
    python manage.py fetch_gsoc_orgs --years 2020
To fetch organizations for all years, run:
    python manage.py fetch_gsoc_orgs
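How --years might resolve to a list of years, as a standalone argparse sketch. It assumes the flag takes one or more integers (nargs="+") and that omitting it means "every year in GSOC_YEARS"; the GSOC_YEARS values and the parse_years helper are hypothetical.

```python
import argparse
from typing import List, Optional, Sequence

# Hypothetical list of supported years, newest first.
GSOC_YEARS = [2024, 2023, 2022, 2021, 2020]


def parse_years(argv: Optional[Sequence[str]] = None) -> List[int]:
    parser = argparse.ArgumentParser(prog="fetch_gsoc_orgs")
    # One or more integer years; leaving the flag out means "all years".
    parser.add_argument("--years", nargs="+", type=int, default=None)
    args = parser.parse_args(argv)
    if args.years is None:
        return list(GSOC_YEARS)
    # Keep the requested order but drop repeats such as "--years 2020 2020".
    return list(dict.fromkeys(args.years))
```

With this sketch, `--years 2021 2020` would fetch both years in that order, and running with no arguments falls back to the full list.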
Summary by CodeRabbit
New Features
Bug Fixes
Enhancements