Conversation

@igennova
Contributor

@igennova igennova commented Apr 17, 2025

Fixes #3264
To fetch organizations for a specific year, run:
python manage.py fetch_gsoc_orgs --years 2020
To fetch organizations for all years, run:
python manage.py fetch_gsoc_orgs
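
A minimal sketch of how a `--years` option like the one above could be parsed. Django management commands declare their options through `argparse`, so plain `argparse` is used here to keep the example runnable on its own. `GSOC_YEARS`, its values, the `nargs="+"` choice, and the `parse_years` helper are illustrative assumptions, not the code in this PR.

```python
import argparse

# Assumed list of supported years; the real command may define this differently.
GSOC_YEARS = [2024, 2023, 2022, 2021, 2020]


def parse_years(argv):
    """Return the years to fetch: those passed via --years, or all known years."""
    parser = argparse.ArgumentParser(prog="fetch_gsoc_orgs")
    parser.add_argument(
        "--years",
        nargs="+",
        type=int,
        default=None,
        help="One or more years to fetch (default: all known years)",
    )
    options = parser.parse_args(argv)
    # Fall back to every known year when the flag is omitted.
    return options.years if options.years else list(GSOC_YEARS)
```

With this sketch, `parse_years(["--years", "2020"])` yields a single-year list, and an empty argument list falls back to every entry in `GSOC_YEARS`.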

Summary by CodeRabbit

  • New Features

    • Added support for fetching Google Summer of Code (GSoC) organizations data across multiple years.
    • Organizations now display year-specific participation tags and a general GSoC tag.
    • Participation history for each organization is tracked and updated.
  • Bug Fixes

    • Improved deduplication to update existing organizations by website URL, reducing duplicate entries.
  • Enhancements

    • Enhanced logging with clearer year context for better traceability.

@coderabbitai
Contributor

coderabbitai bot commented Apr 17, 2025

Walkthrough

The command for fetching Google Summer of Code (GSoC) organizations has been updated to allow retrieval of data for multiple years, not just a single year. The script now supports command-line arguments for specifying years, and iterates through each year to fetch and process organizations. Deduplication is improved by checking for existing organizations via website URL, and organizations are tagged both with a general "gsoc" tag and a year-specific tag. The organizations' participation years are tracked, and logging now includes year context. Error handling and tag assignment logic have also been refined.

Changes

File(s) | Change Summary
website/management/commands/fetch_gsoc_orgs.py | Refactored to support multi-year fetching, added CLI arguments, improved deduplication by URL, year tagging, and participation tracking. Added/modified methods for argument parsing, per-year fetching, and organization processing. Logging and error handling enhanced.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Command
    participant API
    participant DB

    User->>Command: Run fetch_gsoc_orgs with years specified
    loop For each year
        Command->>API: Fetch organizations for {year}
        API-->>Command: Return organization data
        loop For each organization
            Command->>DB: Check if org exists by URL
            alt Exists
                Command->>DB: Update existing org, add year/tag
            else Not exists
                Command->>DB: Create new org, add year/tag
            end
        end
    end
    Command->>User: Output summary/log for all years
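
The update-or-create branch in the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not the PR's implementation: a dict keyed by website URL stands in for the database, and the `Org` class, the `upsert_org` helper, and the `gsoc{year}` tag format are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Org:
    name: str
    url: str
    years: set = field(default_factory=set)
    tags: set = field(default_factory=set)


def upsert_org(db, name, url, year):
    """Update the org stored under `url` if present, otherwise create it."""
    org = db.get(url)
    created = org is None
    if created:
        org = Org(name=name, url=url)
        db[url] = org
    # Both branches record the participation year and the tags.
    org.years.add(year)
    org.tags.update({"gsoc", f"gsoc{year}"})  # tag format is an assumption
    return org, created
```

Calling `upsert_org` twice with the same URL and different years leaves one entry whose `years` set contains both values, which is the deduplication behaviour the walkthrough describes.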

Assessment against linked issues

Objective | Addressed | Explanation
Add all previous GSOC organizations to organizations (#3264) | Yes |
Add all the projects to projects (#3264) | No | No logic for fetching/adding projects is present.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (5)
website/management/commands/fetch_gsoc_orgs.py (5)

33-41: CLI help and flag semantics drift from reality

  1. The help string and --current-only description hard‑code “2025”, but the constant actually resolves to GSOC_YEARS[0] (currently 2024).
  2. Users have to remember to update two places whenever we roll over to a new year.

Tie the wording to LATEST_YEAR (see previous comment) so the help stays accurate automatically.
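
One way to keep the help text in sync, sketched under assumptions: the values in `GSOC_YEARS` are placeholders, `LATEST_YEAR` is derived with `max()` here rather than by index, and the two help-string names are hypothetical.

```python
GSOC_YEARS = [2024, 2023, 2022]  # assumed values, newest first
LATEST_YEAR = max(GSOC_YEARS)  # stays correct even if the list order changes

# Help strings are built from the constant, so a year rollover touches one place.
YEARS_HELP = f"Years to fetch (default: all; latest is {LATEST_YEAR})"
CURRENT_ONLY_HELP = f"Fetch only the latest year ({LATEST_YEAR})"
```

Adding a new year to `GSOC_YEARS` then updates both help strings without further edits.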


79-82: Error branch discards useful context

Returning 0 on exception means the caller cannot distinguish between “API returned 0 organisations” and “network failure”. Propagate the exception or return None/-1 so callers (and automated jobs) can tell failures from empty results.
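
A minimal sketch of the `None`-on-failure option. The `count_orgs_for_year` helper and its injected `fetch` callable are hypothetical; the real command's function names and signatures may differ.

```python
import logging

logger = logging.getLogger(__name__)


def count_orgs_for_year(fetch, year):
    """Return the number of orgs fetched for `year`, or None if the fetch failed."""
    try:
        orgs = fetch(year)
    except Exception:
        logger.exception("Failed to fetch GSoC organizations for %s", year)
        return None  # distinct from 0, which means "fetched, but empty"
    return len(orgs)
```

Callers can then test `result is None` to detect a failure, while `0` keeps its meaning of an empty but successful response.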


88-110: existing_orgs handling is fragile & N+1 prone

  1. When url is empty the variable is set to None, yet the code later evaluates existing_orgs and existing_orgs.exists(). This works, but it obscures intent; prefer a pattern where the variable is always a queryset.
  2. Multiple rows with the same URL return arbitrary .first(). Either enforce uniqueness on the url field or pick a deterministic winner.
  3. The update branch executes a cascade of attribute assignments; consider update_fields=[...] or filter().update() to avoid an extra SELECT/UPDATE round‑trip.
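
Points 1 and 2 can be illustrated without the ORM. In this sketch, dicts stand in for model rows and `find_existing` is a hypothetical helper; in Django, a comparable effect could come from ordering the queryset by `id` before calling `.first()`.

```python
def find_existing(orgs, url):
    """Return the lowest-id org whose URL matches, or None.

    An empty URL never matches, so callers need no separate None check.
    """
    if not url:
        return None
    matches = [org for org in orgs if org["url"] == url]
    # Picking the smallest id makes the winner deterministic among duplicates.
    return min(matches, key=lambda org: org["id"], default=None)
```

The same call handles the empty-URL, no-match, and duplicate-match cases through one code path.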

129-133: Logo refresh never happens when the URL changes

The logo is skipped if org.logo already exists, even if the upstream logo_url changed. Comparing the stored file name or a hash before skipping would keep logos up‑to‑date.
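
A sketch of one possible check, assuming the organization record keeps the URL its logo was last downloaded from. That stored field and the `needs_logo_refresh` helper are hypothetical; comparing file names or content hashes, as suggested above, would follow the same shape.

```python
def needs_logo_refresh(stored_source_url, upstream_logo_url):
    """Return True when the logo should be downloaded or re-downloaded."""
    if not upstream_logo_url:
        return False  # nothing upstream to download
    # Refresh when no logo was stored yet or the upstream URL has changed.
    return stored_source_url != upstream_logo_url
```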


177-187: Duplicate tag fetches can be avoided

org.tags.add(tag) triggers a DB hit for every tag even though Django ignores duplicates. Collect tags first and use a single add(*tags) call to reduce chatter.
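
The batching idea, sketched with a stand-in object so it runs without Django. `CountingTagManager` only mimics the `add(*objs)` shape of a related manager, and the tag names are assumptions.

```python
class CountingTagManager:
    """Stand-in for a many-to-many manager that counts add() calls."""

    def __init__(self):
        self.items = set()
        self.calls = 0

    def add(self, *tags):
        self.calls += 1
        self.items.update(tags)


def tag_org(manager, year):
    """Collect all tags first, then attach them in a single add() call."""
    tags = ["gsoc", f"gsoc{year}"]  # assumed tag names
    manager.add(*tags)
```

Here `tag_org` results in one `add()` call regardless of how many tags are collected.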

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4840a and 0cda44d.

📒 Files selected for processing (1)
  • website/management/commands/fetch_gsoc_orgs.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Run Tests
  • GitHub Check: docker-test

@DonnieBLT DonnieBLT added this pull request to the merge queue Apr 18, 2025
Merged via the queue into OWASP-BLT:main with commit ec032ea Apr 18, 2025
17 checks passed
