Description
Background:
The MiCall pipeline currently processes reads on a per-sample basis and outputs an assembled consensus sequence for each sample. Each run relies on its SampleSheet.csv file for input and output details. A feature to merge samples, ideally across different runs, would simplify downstream analysis.
Feature Description:
Introduce a merger tool that takes a .csv mapping file and generates a merged SampleSheet.csv, RunInfo.xml, and a duplicate of the input .csv for traceability. The mapping file correlates sample_name and run_folder with output_name, specifying the merging plan.
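As a rough illustration of the mapping file described above, the sketch below reads such a .csv and groups its rows by output_name. The column names come from this issue; the function name `read_merge_plan`, the example paths, and the sample names are hypothetical, and the real tool may structure its plan differently.

```python
import csv
import io
from collections import defaultdict


def read_merge_plan(handle):
    """Group (run_folder, sample_name) pairs by output_name.

    Assumes the mapping file has a header row with the columns
    sample_name, run_folder and output_name.
    """
    plan = defaultdict(list)
    for row in csv.DictReader(handle):
        plan[row["output_name"]].append((row["run_folder"], row["sample_name"]))
    return dict(plan)


# Hypothetical mapping file: two samples from different runs are merged
# into merged_1, and one sample is carried over alone as merged_2.
example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "S1,/runs/run_A,merged_1\n"
    "S7,/runs/run_B,merged_1\n"
    "S2,/runs/run_A,merged_2\n"
)
plan = read_merge_plan(example)
```

Keeping the rows in file order matters here, because later rules (such as taking RunInfo.xml from the first associated run_folder) depend on which entry was seen first.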
Feature Objectives:
- Facilitate efficient sample mergers across different run folders.
- Ensure consistency and traceability for merged samples.
- Handle default values and conflicts in input .csv files.
Functional Requirements:
- Input to the tool:
  - Path to the mapping .csv file.
  - Path to the output folder.
- Outputs of the tool:
  - SampleSheet.csv with merged output_name records.
  - RunInfo.xml copied from the first associated run_folder.
  - Input .csv file to trace origins of merged data.
- Conflict resolution strategy, with a strict mode option (--strict flag).
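The inputs listed above could be exposed through a command line along these lines. This is only a sketch: the positional argument names `mapping_csv` and `output_folder` and the file names in the example call are assumptions, and only the --strict flag is taken from this issue.

```python
import argparse


def build_parser():
    """Build a command-line parser for the proposed merger tool."""
    parser = argparse.ArgumentParser(
        description="Merge samples from several run folders into one run."
    )
    # Argument names are hypothetical; the issue only says "path to ...".
    parser.add_argument("mapping_csv", help="path to the mapping .csv file")
    parser.add_argument("output_folder", help="path to the output folder")
    parser.add_argument(
        "--strict",
        action="store_true",
        help="fail on conflicting field values instead of using the first one",
    )
    return parser


# Example invocation with hypothetical file names.
args = build_parser().parse_args(["plan.csv", "merged_run", "--strict"])
```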
Conflict Resolution Rules:
- project_name header field to follow the $current_date.merged pattern.
- date header field to reflect the actual merge date.
- All other fields should use the first observed value unless --strict is enabled.
- Fields index and index2 should default to XXXXX.
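One possible reading of these rules is sketched below. The names `MergeConflict`, `resolve_field`, `merge_headers` and `apply_row_defaults` are hypothetical, headers are modelled as plain dictionaries, and the ISO date format is an assumption, since the issue does not specify how $current_date is rendered. Empty values are treated as missing when applying the index defaults, which is also an assumption.

```python
import datetime


class MergeConflict(Exception):
    """Raised in strict mode when a field has more than one distinct value."""


def resolve_field(name, values, strict=False):
    """Return the first observed value, reporting or raising on conflicts."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
    if len(distinct) > 1:
        if strict:
            raise MergeConflict(f"{name}: conflicting values {distinct}")
        print(f"conflict in {name}: {distinct}; using {distinct[0]!r}")
    return distinct[0]


def merge_headers(headers, today, strict=False):
    """Merge header dictionaries from several sample sheets."""
    overridden = {"project_name", "date"}  # always regenerated, never compared
    names = dict.fromkeys(k for h in headers for k in h if k not in overridden)
    merged = {
        name: resolve_field(name, [h[name] for h in headers if name in h], strict)
        for name in names
    }
    stamp = today.isoformat()  # assumed date format
    merged["project_name"] = f"{stamp}.merged"
    merged["date"] = stamp
    return merged


ROW_DEFAULTS = {"index": "XXXXX", "index2": "XXXXX"}


def apply_row_defaults(row):
    """Fill in index and index2 when they are missing or empty."""
    return {**ROW_DEFAULTS, **{k: v for k, v in row.items() if v}}


headers = [
    {"project_name": "runA", "date": "2023-01-01", "assay": "Nextera"},
    {"project_name": "runB", "date": "2023-02-01", "assay": "TruSeq"},
]
merged = merge_headers(headers, datetime.date(2024, 1, 2))
```

In this sketch project_name and date are excluded from conflict detection because they are always overwritten, so differing values there never trigger a strict-mode failure.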
Implementation Tasks:
- Develop a merging script for the underlying sample files.
- Develop logic to parse the input .csv and handle row defaults.
- Implement conflict detection logic with stdout reporting.
- Create file generation procedures for SampleSheet.csv and RunInfo.xml.
- Build merging algorithm to create a consolidated .csv from the mapping file.
- Add a --non-strict mode for conflict resolution, with it becoming the default.
- Write unit tests to validate merging logic and conflict handling.
- Add documentation for the merger tool usage and features.
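For the file generation task, the two copy steps (RunInfo.xml from the first associated run_folder, plus a duplicate of the mapping file for traceability) might look like the sketch below. The function name `copy_traceability_files` is hypothetical, and keeping the mapping file's original name in the output folder is an assumption. Writing the merged SampleSheet.csv is left out because its exact layout is not described here.

```python
import shutil
from pathlib import Path


def copy_traceability_files(mapping_csv, first_run_folder, output_folder):
    """Copy RunInfo.xml and the mapping file into the output folder.

    Returns the paths of the two copies.
    """
    output = Path(output_folder)
    output.mkdir(parents=True, exist_ok=True)
    # RunInfo.xml is taken unchanged from the first associated run folder.
    run_info = shutil.copy(Path(first_run_folder) / "RunInfo.xml", output / "RunInfo.xml")
    # The mapping file keeps its original name (an assumption of this sketch).
    mapping_copy = shutil.copy(mapping_csv, output / Path(mapping_csv).name)
    return Path(run_info), Path(mapping_copy)
```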