
fix: forward ticket_detection_config from GitAnalyzer to TicketExtractor#50

Merged
bobmatnyc merged 1 commit into bobmatnyc:main from maui314159:fix/analyzer-ticket-detection-config
Apr 27, 2026

Conversation

@maui314159
Contributor

Summary

`GitAnalyzer.__init__` builds its in-memory `TicketExtractor` without forwarding `ticket_detection_config`. As a result, the analyze-stage re-extraction (the "Analyzing commits for tickets" pass) silently falls back to the hard-coded default platform patterns (jira / github / clickup / linear), ignoring whatever the user supplied under `analysis.ticket_detection.patterns`, `exclude_patterns`, or `position` in the YAML config.

`GitDataFetcher` already forwards the config correctly (`core/data_fetcher.py:84`), so the bug only surfaces on cache-hit re-runs, where data fetching is skipped and the in-memory extractor's defaults take over. For any user with custom patterns, coverage numbers and platform attribution diverge between a fresh fetch and a cached re-run.

Buggy call site

src/gitflow_analytics/core/analyzer.py:63-68:

```python
self.ticket_extractor = build_ticket_extractor(
    allowed_platforms=allowed_ticket_platforms,
    ml_config=ml_categorization_config,
    llm_config=llm_config,
    cache_dir=cache.cache_dir / "ml_predictions",
    # ticket_detection_config is missing here
)
```

Real-world impact

Analyzing a 26-repo Azure DevOps codebase with custom `AB#NNNN` / `NNNNNN-` patterns, the official report shows 39.8% ticket coverage, but a direct call to `TicketExtractor` instantiated from the same config produces 60.7%. The user-supplied config is silently ignored on every cached run.

Reproduction

  1. Set `analysis.ticket_detection.patterns: { github: '\\bABCD-(\\d+)\\b' }` and `ticket_platforms: [github]` in config.yaml.
  2. Run `gitflow-analytics analyze` against a repo whose commits contain `ABCD-NN` references but no `#NN` references.
  3. Fresh run (`--clear-cache`): tickets extracted via the custom regex; coverage > 0%.
  4. Cached re-run (no flags): the same config produces default-regex coverage instead — `#NN` matches everywhere, custom regex appears unused.
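The gap between steps 3 and 4 can be illustrated with plain `re`. This is a minimal sketch, not the project's extractor: `#(\d+)` is an assumed stand-in for the built-in GitHub pattern (the real default may differ), and the commit messages and the `coverage` helper are made up for illustration.

```python
import re

# Custom pattern from step 1, written as a Python raw string.
CUSTOM_GITHUB = re.compile(r"\bABCD-(\d+)\b")
# Assumed stand-in for the built-in GitHub default; the real one may differ.
DEFAULT_GITHUB = re.compile(r"#(\d+)")

# Hypothetical commit messages: ABCD-NN references, no #NN references.
COMMITS = [
    "ABCD-12 fix login redirect",
    "refactor cache layer",
    "ABCD-7: bump dependencies",
    "update README",
]


def coverage(pattern: re.Pattern, messages: list[str]) -> float:
    """Return the percentage of messages with at least one ticket match."""
    hits = sum(1 for msg in messages if pattern.search(msg))
    return 100.0 * hits / len(messages)


fresh = coverage(CUSTOM_GITHUB, COMMITS)    # config honored
cached = coverage(DEFAULT_GITHUB, COMMITS)  # config dropped, defaults used
print(f"fresh run: {fresh:.1f}%, cached re-run: {cached:.1f}%")
```

With these sample messages the custom pattern covers half the commits and the assumed default covers none, which is the kind of divergence the reproduction is meant to expose.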

Fix

  • Thread `ticket_detection_config` through `GitAnalyzer.__init__` to `build_ticket_extractor()`, matching the existing pattern in `GitDataFetcher`.
  • Update the five `GitAnalyzer(...)` call sites (`pipeline_report`, `cli_analysis_orchestrator`, `cli_identity_commands` ×2, `training.pipeline`) to pass `cfg.analysis.ticket_detection`.
  • Add two regression tests in `tests/core/test_analyzer.py`: one verifying custom patterns reach the extractor, one confirming default-config backward compatibility.
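The shape of the change can be sketched with toy stand-ins. Everything below is hypothetical: `TicketDetectionConfig`, the one-argument `build_ticket_extractor`, the merge-over-defaults behavior, and the `#(\d+)` default are assumptions for illustration, and the real constructor takes more arguments than shown here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed stand-in defaults; the real built-in patterns may differ.
DEFAULT_PATTERNS = {"github": r"#(\d+)"}


@dataclass
class TicketDetectionConfig:
    """Hypothetical stand-in for cfg.analysis.ticket_detection."""
    patterns: dict = field(default_factory=dict)


@dataclass
class TicketExtractor:
    """Hypothetical stand-in that only records the patterns it was given."""
    patterns: dict


def build_ticket_extractor(
    ticket_detection_config: Optional[TicketDetectionConfig] = None,
) -> TicketExtractor:
    """Toy factory: user patterns override the defaults (an assumption)."""
    patterns = dict(DEFAULT_PATTERNS)
    if ticket_detection_config is not None:
        patterns.update(ticket_detection_config.patterns)
    return TicketExtractor(patterns=patterns)


class GitAnalyzer:
    """Toy analyzer showing only the forwarding; other arguments omitted."""

    def __init__(self, ticket_detection_config=None):
        # The fix: pass the config through instead of dropping it.
        self.ticket_extractor = build_ticket_extractor(
            ticket_detection_config=ticket_detection_config,
        )


cfg = TicketDetectionConfig(patterns={"github": r"\bABCD-(\d+)\b"})
custom = GitAnalyzer(ticket_detection_config=cfg)
default = GitAnalyzer()  # omitting the config keeps the old behavior
print(custom.ticket_extractor.patterns["github"])
print(default.ticket_extractor.patterns["github"])
```

Keeping the new parameter optional with a `None` default is what lets existing callers keep working, which is the backward-compatibility case the second regression test targets.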

Test plan

  • `pytest tests/core/test_analyzer.py -v` — 11 passed (2 new + 9 existing)
  • `pytest tests/core/ tests/extractors/` — 255 passed
  • Verified the new regression test fails on unfixed code with `TypeError: GitAnalyzer.__init__() got an unexpected keyword argument 'ticket_detection_config'`
  • `ruff check` clean on changed files
  • `black --check` clean on changed files (pre-existing formatting drift in unrelated lines was deliberately not touched, to keep the diff focused on the fix)
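The `TypeError` check above relies on ordinary Python behavior: a constructor that does not declare a keyword rejects it at call time. A minimal sketch, using hypothetical `UnfixedGitAnalyzer` and `FixedGitAnalyzer` stand-ins in place of the real class (the exact message prefix can vary by Python version, so only the stable part is matched):

```python
class UnfixedGitAnalyzer:
    """Hypothetical stand-in for the pre-fix constructor signature."""

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir


class FixedGitAnalyzer:
    """Hypothetical stand-in for the post-fix constructor signature."""

    def __init__(self, cache_dir=None, ticket_detection_config=None):
        self.cache_dir = cache_dir
        self.ticket_detection_config = ticket_detection_config


def accepts_ticket_detection_config(analyzer_cls) -> bool:
    """Return True if the class accepts the ticket_detection_config keyword."""
    try:
        analyzer_cls(ticket_detection_config={"patterns": {}})
    except TypeError as exc:
        # Only treat the "unexpected keyword" failure as a rejection;
        # any other TypeError is re-raised.
        if "unexpected keyword argument" in str(exc):
            return False
        raise
    return True


print(accepts_ticket_detection_config(UnfixedGitAnalyzer))
print(accepts_ticket_detection_config(FixedGitAnalyzer))
```

A regression test built this way fails loudly on the old signature, so the forwarding cannot be removed again without a test noticing.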

Notes

The bug only surfaces on cache-hit re-runs because `GitDataFetcher` (which runs on cache miss) already forwards the config correctly. First-time analyses see correct coverage; the divergence appears on the second run unless `--clear-cache` is used. This makes the bug easy to miss in development but consistently misleading in long-lived deployments.

GitAnalyzer.__init__ silently dropped the user's ticket_detection settings
when building its in-memory TicketExtractor. The analyze-stage re-extraction
("Analyzing commits for tickets") therefore fell back to hard-coded default
platform patterns (jira/github/clickup/linear), ignoring any
analysis.ticket_detection.patterns / exclude_patterns / position from
the config.

GitDataFetcher already forwards this config correctly, so the bug only
surfaced on cache-hit re-runs where the data-fetch step is skipped and
the in-memory extractor's defaults take over. Coverage numbers and
platform attribution between fresh fetches and cached runs diverged for
any user with custom patterns.

Fix: thread ticket_detection_config through GitAnalyzer.__init__ to
build_ticket_extractor(), matching the existing pattern in
GitDataFetcher. Update all five callers (pipeline_report,
cli_analysis_orchestrator, cli_identity_commands x2, training.pipeline)
to pass it.

Adds a regression test that fails on unfixed code with TypeError and
verifies that custom patterns actually reach the extractor when supplied.
Owner

@bobmatnyc bobmatnyc left a comment


Clean fix. The bug is real — TicketExtractor in GitAnalyzer was silently falling back to defaults on cached re-runs while GitDataFetcher already forwarded the config correctly. The fix is surgical and consistent with the existing pattern. Tests cover both the regression case and backward compatibility. LGTM.

@bobmatnyc bobmatnyc merged commit c06e5e3 into bobmatnyc:main Apr 27, 2026