# Working Directory Structure

NPLinker requires a working directory with a fixed structure and fixed names for the input and
output data.


```bash
root_dir # (1)!
    │
    ├── nplinker.toml                           # (2)!
    ├── strain_mappings.json                [F] # (3)!
    ├── strains_selected.json               [F][O] # (4)!
    │
    ├── gnps                                [F] # (5)!
    │       ├── spectra.mgf                 [F]
    │       ├── molecular_families.tsv      [F]
    │       ├── annotations.tsv             [F]
    │       └── file_mappings.tsv (.csv)    [F] # (6)!
    │
    ├── antismash                           [F] # (7)!
    │   ├── GCF_000514975.1
    │   │   ├── xxx.region001.gbk
    │   │   └── ...
    │   ├── GCF_000016425.1
    │   │   ├── xxxx.region001.gbk
    │   │   └── ...
    │   └── ...
    │
    ├── bigscape                            [F][O] # (8)!
    │   ├── mix_clustering_c0.30.tsv        [F]    # (9)!
    │   └── bigscape_running_output
    │       └── ...
    │
    ├── downloads                           [F][A] # (10)!
    │       ├── paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json # (11)!
    │       ├── GCF_000016425.1.zip
    │       ├── GCF_000514975.1.zip
    │       ├── c22f44b14a3d450eb836d607cb9521bb.zip
    │       ├── genome_status.json
    │       └── mibig_json_3.1.tar.gz
    │
    ├── mibig                               [F][A] # (12)!
    │   ├── BGC0000001.json
    │   ├── BGC0000002.json
    │   └── ...
    │
    ├── output                              [F][A] # (13)!
    │   └── ...
    │
    └── ...                                        # (14)!
```

1. `root_dir` is the working directory you created, used as the root directory for NPLinker.
2. `nplinker.toml` is the configuration file (in TOML format) that the user provides for running NPLinker.
3. `strain_mappings.json` contains the mappings from strain to genomics and metabolomics data. In
    `podp` mode, NPLinker generates it; in `local` mode, users need to create it manually.<br>
    `[F]` means the file name `strain_mappings.json` is fixed (including the extension) and must be
    named as shown.
4. `strains_selected.json` is an optional file containing the list of strains to be used in the analysis.
    If it is not provided, NPLinker will use all strains detected from the input data. <br>
    `[O]` means the file `strains_selected.json` is optional for users to provide.
5. `gnps` directory contains the GNPS data. The files in this directory **must** be named as shown.
    See XXX for more information about the GNPS data.
6. This file can be in either `.tsv` or `.csv` format.
7. `antismash` directory contains a collection of antiSMASH BGC data. The BGC data (`*.region*.gbk`
    files) must be stored in subdirectories named after the NCBI accession number of the genome
    (e.g. `GCF_000514975.1`).
8. `bigscape` directory is optional and contains the output of BiG-SCAPE. If the directory is not
    provided, NPLinker runs BiG-SCAPE automatically to generate the data from the antiSMASH BGC
    data.
9. `mix_clustering_c0.30.tsv` is an example BiG-SCAPE output file. The file name must follow the
    pattern `mix_clustering_c{cutoff}.tsv`, where `{cutoff}` is the cutoff value used in the
    BiG-SCAPE run.
10. `downloads` directory is automatically created and managed by NPLinker. It stores data
    downloaded from the internet. Users can also use it to store their own downloaded data.<br>
    `[A]` means the directory is automatically created and/or managed by NPLinker.
11. This is an example file; the actual file name will differ. The same applies to the other files
    in the `downloads` directory.
12. `mibig` directory contains the MIBiG metadata, which NPLinker downloads and stores
    automatically. Users should not modify this directory or its contents.
13. `output` directory is automatically created by NPLinker. It stores the output data of NPLinker.
14. NPLinker can be extended by adding other types of data here.
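
The naming pattern for the BigScape clustering file (annotation 9) can be checked with a small helper. The sketch below is illustrative only: `parse_bigscape_cutoff` is a hypothetical function, not part of the NPLinker API, and it assumes the cutoff is written as a plain decimal number such as `0.30`.

```python
import re
from pathlib import Path

# Pattern from annotation 9: mix_clustering_c{cutoff}.tsv
# Assumption: {cutoff} is a plain decimal number such as 0.30.
_CLUSTERING_RE = re.compile(r"^mix_clustering_c(?P<cutoff>\d+(?:\.\d+)?)\.tsv$")


def parse_bigscape_cutoff(path):
    """Return the cutoff embedded in a clustering file name, or None if it does not match.

    The cutoff is returned as a string so that trailing zeros are preserved.
    """
    match = _CLUSTERING_RE.match(Path(path).name)
    return match.group("cutoff") if match else None
```

For example, `parse_bigscape_cutoff("bigscape/mix_clustering_c0.30.tsv")` returns the string `"0.30"`, while a file name that does not follow the pattern yields `None`.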

!!! tip
    - `[F]` means the file or directory name is fixed and must be named as shown. The names are defined in the [defaults](../api/nplinker.md#nplinker.defaults) module.
    - `[O]` means the file or directory is optional for users to provide. It does not mean the file
    or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate
    it.
    - `[A]` means the directory is automatically created and/or managed by NPLinker.
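
Before running NPLinker in `local` mode, it can help to check that the user-provided entries are in place. The following is a minimal sketch, not NPLinker's own validation: `find_missing_local_inputs` is a hypothetical helper, and the set of entries it treats as required (the configuration file, `strain_mappings.json`, the `gnps` files and the `antismash` directory) is an assumption drawn from the layout described above.

```python
from pathlib import Path

# Assumption: these are the entries a user must supply in `local` mode,
# based on the tree above (not taken from NPLinker's code).
REQUIRED_FILES = ("nplinker.toml", "strain_mappings.json")
GNPS_FILES = ("spectra.mgf", "molecular_families.tsv", "annotations.tsv")


def find_missing_local_inputs(root_dir):
    """Return the relative paths of expected entries that are absent from root_dir."""
    root = Path(root_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]

    gnps = root / "gnps"
    missing += [f"gnps/{name}" for name in GNPS_FILES if not (gnps / name).is_file()]

    # The file mappings may be provided as either .tsv or .csv.
    if not any((gnps / f"file_mappings{ext}").is_file() for ext in (".tsv", ".csv")):
        missing.append("gnps/file_mappings.tsv (.csv)")

    if not (root / "antismash").is_dir():
        missing.append("antismash")

    return missing
```

An empty list means every entry the sketch looks for was found. Optional (`[O]`) and automatically managed (`[A]`) entries are deliberately not checked.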
