feat: support fasterp in fastp module#3392
Conversation
|
@multiqc-bot fix linting |
|
Starting off with an automated review: /review |
SummaryGreat work on adding the fasterp module! The implementation closely follows the fastp module pattern, which is appropriate given fasterp's compatible interface. However, there are two critical issues blocking this PR:
Detailed ReviewCritical Issues ❌1. Missing Test Data FilesThe tests expect data files at Action Required: Add test data files to 2. Type Annotation Errors (mypy)The module has several missing type annotations that cause mypy failures on Python 3.14: Lines 59-66 - Missing type annotations for class attributes: # Current (fails mypy):
self.fasterp_data = dict()
self.fasterp_duplication_plotdata = dict()
# ... etc
# Should be:
self.fasterp_data: Dict[str, Dict[str, float]] = {}
self.fasterp_duplication_plotdata: Dict[str, Dict[int, float]] = {}
self.fasterp_insert_size_data: Dict[str, Dict[int, float]] = {}
self.fasterp_all_data: Dict[str, Any] = {}
self.fasterp_qual_plotdata: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_gc_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_n_content_data: Dict[str, Dict[str, Dict[int, float]]] = {}
self.fasterp_overrepresented_sequences: Dict[str, Dict[str, Dict[str, int]]] = {}Lines 530-532 in # Current (fails mypy):
cnt_by_sample = defaultdict(Counter)
cnt_by_seq = Counter()
pct_by_seq = Counter()
# Should be:
cnt_by_sample: defaultdict[str, Counter[str]] = defaultdict(Counter)
cnt_by_seq: Counter[str] = Counter()
pct_by_seq: Counter[str] = Counter()Line 400 - Type error in Note: These same issues exist in the fastp module (multiqc/modules/fastp/fastp.py:70, 404, 534-536), so you're in good company! But since this is new code, it should meet current standards. Code Quality ✅Excellent Aspects:
Minor Issues & Suggestions 💡1. Module Ordering (multiqc/modules/fasterp/fasterp.py:81-83)The # Add software version
self.add_software_version(None)
# General Stats Table
self.fasterp_general_stats_table(self.fasterp_data)
# ... sections ...
# Save entire original parsed JSON
self.write_data_file(self.fasterp_all_data, "multiqc_fasterp")This is actually correct! However, the fastp module has these in a different order (write_data_file before sections), which may be worth noting for consistency. 2. Version Detection (Line 393-396)The code checks for both if "fastp_version" in parsed_json.get("summary", {}):
self.add_software_version(parsed_json["summary"]["fastp_version"], s_name)
elif "fasterp_version" in parsed_json.get("summary", {}):
self.add_software_version(parsed_json["summary"]["fasterp_version"], s_name)Question: Does fasterp actually output a 3. Config Option Documentation (Line 23)The docstring mentions config option: fasterp:
s_name_filenames: trueHowever, the code properly uses the modern # Recommended (modern approach):
use_filename_as_sample_name: ["fasterp"]
# Deprecated (still works but shows warning):
fasterp:
s_name_filenames: true4. Search Pattern Collision (multiqc/search_patterns.yaml:361-365)The search pattern for fasterp is identical to fastp: fasterp:
fn: "*.json"
contents: '"before_filtering": {'
num_lines: 50Potential Issue: Both modules will try to parse the same JSON files. This could lead to:
Suggestion: Can fasterp output be distinguished from fastp output? If fasterp includes a unique field (like 5. Test File OrganizationThe test file
Security Considerations ✅No security issues identified:
Performance Considerations ✅
Documentation 📝Good:
Suggestions:
Action ItemsBefore this PR can be merged:
Once these are addressed (especially items 1-2), this will be a solid contribution! The code is well-written and follows the project's conventions closely. Great work overall! Looking forward to seeing this merged. 🚀 |
|
@drbh - the review picked up the identical file search to fastp. Could you elaborate on this a little? It's not great having two modules that parse the same files. If fastp and fasterp have identical outputs then it's better to have a single module really (see also: Kraken and Bracken, for example). |
|
Thanks for the review! @ewels Thanks for the review! You're right - fasterp produces identical JSON output to fastp, so a separate module isn't needed. I've reworked this PR to just extend the fastp module. The only change is detecting the fasterp_version field in the JSON summary so the version is tracked separately. The standalone fasterp module has been removed. |
4ddc0e7 to
b7114e8
Compare
|
@multiqc-bot fix linting |
This PR adds the fasterp tool as a new module. Since fasterp exposes the same interface as fastp the module is very similar.
Please let me know if any changes are needed! Thanks!
tests
output
========================= test session starts ========================= platform darwin -- Python 3.11.1, pytest-7.4.4, pluggy-1.3.0 -- /Users/drbh/.pyenv/versions/3.11.1/bin/python3.11 cachedir: .pytest_cache rootdir: /Users/drbh/Projects/MultiQC configfile: pyproject.toml testpaths: tests, multiqc/modules plugins: syrupy-4.6.0, asyncio-0.23.7, anyio-3.7.1, typeguard-4.4.4 asyncio: mode=Mode.STRICT collected 526 items / 520 deselected / 6 selected tests/test_fasterp.py::test_fasterp_module PASSED [ 16%] tests/test_fasterp.py::test_fasterp_json_parsing PASSED [ 33%] tests/test_fasterp.py::test_fasterp_data_structure PASSED [ 50%] tests/test_fasterp.py::test_fasterp_vs_fastp_compatibility PASSED [ 66%] tests/test_modules_run.py::test_all_modules[fasterp-entry_point40] PASSED [ 83%] tests/test_modules_run.py::test_ignore_samples[fasterp-entry_point40] PASSED [100%] ========================== warnings summary =========================== ../../.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316 /Users/drbh/.pyenv/versions/3.11.1/lib/python3.11/site-packages/_pytest/config/__init__.py:1316: PytestRemovedIn8Warning: The --strict option is deprecated, use --strict-markers instead. self.issue_config_time_warning( -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ============ 6 passed, 520 deselected, 1 warning in 1.23s =============