
A Merger Tool #1036

@Donaim

Background:
The MiCall pipeline currently processes reads on a per-sample basis and outputs an assembled consensus sequence for each sample. Each run relies on its SampleSheet.csv file for input and output details. A feature to merge samples, ideally across different runs, would simplify downstream analysis.

Feature Description:
Introduce a merger tool that takes a mapping .csv file and generates a merged SampleSheet.csv, a RunInfo.xml, and a copy of the input .csv for traceability. The mapping file assigns each sample_name and run_folder pair to an output_name, which defines the merging plan.
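A minimal sketch of reading the mapping file, assuming it has a header row naming the sample_name, run_folder and output_name columns. The function name read_merge_plan is hypothetical, and the run folder paths in the example are made up. It groups the (run_folder, sample_name) pairs under each output_name in file order, which keeps "first observed value" well defined later on.

```python
import csv
import io


def read_merge_plan(mapping_file):
    """Group (run_folder, sample_name) pairs under their output_name.

    mapping_file is any open text file (or file-like object) holding the
    mapping .csv. Columns other than the three required ones are ignored.
    """
    reader = csv.DictReader(mapping_file)
    required = {"sample_name", "run_folder", "output_name"}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"mapping file is missing columns: {sorted(missing)}")
    plan = {}  # dicts keep insertion order, so output_names stay in file order
    for row in reader:
        plan.setdefault(row["output_name"], []).append(
            (row["run_folder"], row["sample_name"]))
    return plan


example = io.StringIO(
    "sample_name,run_folder,output_name\n"
    "S1,/runs/run1,merged1\n"
    "S1,/runs/run2,merged1\n"
    "S2,/runs/run1,merged2\n")
print(read_merge_plan(example))
# {'merged1': [('/runs/run1', 'S1'), ('/runs/run2', 'S1')], 'merged2': [('/runs/run1', 'S2')]}
```

Failing early on missing columns keeps a malformed mapping file from producing a half-merged output folder.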

Feature Objectives:

  1. Facilitate efficient merging of samples across different run folders.
  2. Ensure consistency and traceability for merged samples.
  3. Handle default values and conflicts in input .csv files.

Functional Requirements:

  • Input to the tool:
    • Path to the mapping .csv file.
    • Path to the output folder.
  • Outputs of the tool:
    • SampleSheet.csv with merged output_name records.
    • RunInfo.xml copied from the first associated run_folder.
    • A copy of the input .csv file, to trace the origins of the merged data.
  • Conflict resolution strategy, with a strict mode option (--strict flag).
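The inputs and traceability outputs listed above could be wired up roughly as follows. This is only a sketch: the positional argument names and the helpers parse_args and copy_traceability_files are hypothetical, and non-strict behaviour is assumed to apply whenever --strict is absent.

```python
import argparse
import shutil
from pathlib import Path


def parse_args(argv=None):
    """Hypothetical CLI: mapping .csv, output folder, optional --strict."""
    parser = argparse.ArgumentParser(
        description="Merge samples from one or more run folders.")
    parser.add_argument("mapping_csv", type=Path,
                        help="mapping .csv with sample_name, run_folder, output_name")
    parser.add_argument("output_folder", type=Path,
                        help="folder that receives the merged files")
    parser.add_argument("--strict", action="store_true",
                        help="fail on conflicting field values instead of "
                             "keeping the first observed value")
    return parser.parse_args(argv)


def copy_traceability_files(mapping_csv, first_run_folder, output_folder):
    """Copy RunInfo.xml from the first run folder plus the mapping .csv itself."""
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)
    shutil.copy(Path(first_run_folder) / "RunInfo.xml",
                output_folder / "RunInfo.xml")
    # Keeping the exact input next to the outputs records how they were built.
    shutil.copy(mapping_csv, output_folder / Path(mapping_csv).name)
```

For example, `parse_args(["plan.csv", "merged_run", "--strict"])` yields `strict=True`, while omitting the flag leaves it `False`.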

Conflict Resolution Rules:

  • The project_name header field should follow the $current_date.merged pattern.
  • The date header field should reflect the actual merge date.
  • All other fields should use the first observed value unless --strict is enabled.
  • Fields index and index2 should default to XXXXX.
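One way to express these rules is a single merge over field dictionaries, shown here as a hypothetical merge_fields helper. Assumptions: each run contributes a flat dict of fields, records arrive in first-observed order, $current_date is formatted as an ISO date (YYYY-MM-DD), "default to XXXXX" means index and index2 are always replaced by the placeholder, and strict mode raises an error where non-strict mode only reports the conflict on stdout. The field values in the example are illustrative.

```python
import datetime

PLACEHOLDER_INDEX = "XXXXX"


class MergeConflict(ValueError):
    """Raised in strict mode when two runs disagree on a field value."""


def merge_fields(records, strict=False, today=None):
    """Merge per-run field dicts (given in first-observed order) into one dict."""
    stamp = (today or datetime.date.today()).isoformat()
    # Fields governed by fixed rules: their input values are never compared.
    fixed = {
        "project_name": f"{stamp}.merged",
        "date": stamp,
        "index": PLACEHOLDER_INDEX,
        "index2": PLACEHOLDER_INDEX,
    }
    merged = {}
    for record in records:
        for key, value in record.items():
            if key in fixed:
                merged[key] = fixed[key]
            elif key not in merged:
                merged[key] = value  # first observed value wins
            elif merged[key] != value:
                message = (f"conflict in {key!r}: keeping {merged[key]!r}, "
                           f"ignoring {value!r}")
                if strict:
                    raise MergeConflict(message)
                print(message)  # non-strict: report on stdout and continue
    return merged


header = merge_fields(
    [{"project_name": "RunA", "date": "2023-01-10", "instrument": "M1"},
     {"project_name": "RunB", "date": "2023-02-14", "instrument": "M2"}],
    today=datetime.date(2024, 5, 1))
# prints: conflict in 'instrument': keeping 'M1', ignoring 'M2'
print(header)
# {'project_name': '2024-05-01.merged', 'date': '2024-05-01', 'instrument': 'M1'}
```

Keeping the fixed-rule fields out of the comparison avoids spurious conflict reports, since their input values are expected to differ between runs.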

Implementation Tasks:

  • Develop a merging script for the underlying sample files.
  • Develop logic to parse the input .csv and handle row defaults.
  • Implement conflict detection logic with stdout reporting.
  • Create file generation procedures for SampleSheet.csv and RunInfo.xml.
  • Build a merging algorithm to create a consolidated .csv from the mapping file.
  • Add a --non-strict mode for conflict resolution and make it the default.
  • Write unit tests to validate merging logic and conflict handling.
  • Add documentation for the merger tool usage and features.
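For the SampleSheet.csv generation task, a writer might look like the sketch below. The function name write_sample_sheet, the simplified [Header]/[Data] layout, and the Sample_ID, Sample_Name, index and index2 columns are assumptions for illustration; real Illumina-style sample sheets usually contain more sections and columns.

```python
import csv
import io


def write_sample_sheet(header, output_names, out):
    """Write a simplified sample sheet with one [Data] row per output_name.

    header: ordered dict of [Header] fields (e.g. project_name, date).
    output_names: merged sample names, in the order they should appear.
    out: any writable text stream.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["[Header]"])
    for key, value in header.items():
        writer.writerow([key, value])
    writer.writerow([])  # blank line between sections
    writer.writerow(["[Data]"])
    writer.writerow(["Sample_ID", "Sample_Name", "index", "index2"])
    for sample_id, name in enumerate(output_names, start=1):
        # A merged sample has no single barcode, so both indexes use the placeholder.
        writer.writerow([sample_id, name, "XXXXX", "XXXXX"])


buffer = io.StringIO()
write_sample_sheet({"project_name": "2024-05-01.merged", "date": "2024-05-01"},
                   ["merged1", "merged2"], buffer)
print(buffer.getvalue())
```

Writing to a stream rather than a path keeps the function easy to unit-test; in the tool itself, `out` would be a file in the output folder opened with `newline=""`, as the csv module recommends.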
