See https://ter-trees.ucsd.edu/data/krank/ for a catalog of libraries and the query reads that we simulated for benchmarking. Descriptions of the libraries and a tutorial can be found in the main GitHub repository.
results/cscores-10kSpecies_-_combined.csv: taxonomic classification results for all tools on WoL queries
results/all_tools-profiling_evaluation-CAMI1_hc.tsv: abundance profiling metrics of all tools on the CAMI-I high-complexity dataset
results/cscores-10kSpecies_-_with_sizes.csv: taxonomic classification results for all tools on WoL queries, with the library size of each tool
results/resultsCAMI2-marine.tsv: abundance profiling metrics of all tools on the CAMI-II marine dataset
results/cscores-10kSpecies_-_KRANK-candidates.csv: comparison of different sizes and parameters for KRANK on WoL taxonomic classification
results/cscores-10kSpecies_-_Kraken-II_4Gb.csv: taxonomic classification results of Kraken-II on WoL using 4GB
results/resultsCAMI2-strain_madness.tsv: abundance profiling metrics of all tools on the CAMI-II strain-madness dataset
results/cscores-10kSpecies_-_CLARK.csv: taxonomic classification results of CLARK on WoL
results/cscores-10kSpecies_-_Kraken-II_16Gb.csv: taxonomic classification results of Kraken-II on WoL using 16GB
results/cscores-10kSpecies_-_CONSULT-II.csv: taxonomic classification results of CONSULT-II on WoL using the default configuration
results/running_times-query.tsv: query and library construction running times
results/cscores-10kSpecies_-_KRANK-rankingkmers_comparison.csv: comparison of different heuristics for KRANK's k-mer selection on WoL taxonomic classification
results/cscores-10kSpecies_-_Kraken-II_default.csv: taxonomic classification results of Kraken-II on WoL using the default parameters
results/cscores-10kSpecies_-_KRANK-sizeconst_comparison.csv: comparison of different heuristics for KRANK's size constraint on WoL taxonomic classification

data/ReferenceTaxonomy-nodes.dmp.gz: WoL-v1 taxonomy nodes
data/ref_taxa_counts.txt: genome counts for each taxon in WoL-v1 with rank information
data/10kBacteria-metadata.tsv: WoL-v1 metadata including download links and additional information for reference genomes
data/query_genomes_list.txt: IDs of genomes used in read classification on WoL (download simulated reads here)
data/ref_genome_counts: genome counts for each taxon in WoL-v1
data/ReferenceTaxonomyRWoL-nodes.dmp.gz: WoL-v1 taxonomy nodes reduced to the species set
data/taxonomy_lookup: taxonomy lookup table used by CONSULT-II, parent list of each taxon
data/dist_wrt_lastcommonrank.csv: Jaccard similarity between randomly sampled genomes and their corresponding groups
data/query_ranks.tsv: taxonomy information for query genomes, ground truth for evaluation of taxonomic classification
data/reference_genomes_list: reference genomes and corresponding species
data/uDance-ranks_tid.tsv: WoL-v2 taxonomic ranks, some queries were retrieved from here
data/10kBacteria-ranks_tid.tsv: WoL-v1 taxonomic ranks, all genomes in the reference library
data/dist_to_closest.txt: closest reference genome of each query genome and the genomic distance between them as estimated by Mash
data/download-links/all_download.txt: all download links for WoL-v2 used in uDance
data/download-links/download_final_extra_queries.txt: download links for genomes that are not used in the CONSULT-II paper
data/download-links/genomes_uniq_uDance.txt
data/download-links/downloads-uDance_exc10k.txt
data/auxiliary/sampleg_dists.txt
data/auxiliary/dist-extra-to-closest.txt
data/auxiliary/uDance_exc10k-ranks_tid.tsv
data/auxiliary/uDance-genera_list.txt
data/auxiliary/uDance-species_list.txt
data/auxiliary/uDance_oneperfamily-ranks_tid.tsv
data/auxiliary/uDance_exc10k-order_info
data/auxiliary/closest_taxon_wrank.txt
data/auxiliary/dist-bacteria-to-closest.txt
data/auxiliary/uDance_exc10k-ranks_tid-downloadable.tsv
data/auxiliary/dist-to-closest.txt
data/auxiliary/dist-archaea-to-closest.txt

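To illustrate what a "parent list of each taxon" can look like, here is a minimal Python sketch that derives it from a taxonomy nodes file. It assumes the nodes file follows the NCBI taxdump layout (fields separated by `|`, taxon ID first, parent ID second); the function names `parse_nodes` and `lineage` are hypothetical, and the actual format of data/taxonomy_lookup and the logic of the repository's own scripts may differ.

```python
def parse_nodes(lines):
    """Map each taxon ID to its parent ID.

    Assumes NCBI-style nodes.dmp lines: 'tax_id | parent_tax_id | rank | ...'.
    """
    parent = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        parent[fields[0]] = fields[1]
    return parent


def lineage(taxon, parent):
    """Return the ancestors of `taxon`, nearest first, up to the root."""
    path = []
    seen = {taxon}
    # Stop at the root (its own parent in NCBI-style files) or on any cycle.
    while taxon in parent and parent[taxon] not in seen:
        taxon = parent[taxon]
        seen.add(taxon)
        path.append(taxon)
    return path


# Toy input in the assumed format; a real file could be read with
# gzip.open("data/ReferenceTaxonomy-nodes.dmp.gz", "rt") instead.
example = [
    "1\t|\t1\t|\tno rank\t|",
    "2\t|\t1\t|\tsuperkingdom\t|",
    "10\t|\t2\t|\tphylum\t|",
]
parents = parse_nodes(example)
print(lineage("10", parents))
```

Storing only the child-to-parent map keeps the structure small, and the full ancestor list of any taxon can be recovered on demand by walking upward.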
scripts/construct_taxonomy_lookup.py: constructs the taxonomy lookup table for CONSULT-II from a taxonomy nodes file
scripts/shrink_taxdump.py: given taxonomy nodes and names files and a set of species, reduces the taxonomy to the set of species of interest
scripts/evaluate_CLARK.py: custom script to evaluate the read classification output of CLARK, computes TP/FP/TN/FN for each rank and each read
scripts/evaluate_KRANK.py: custom script to evaluate the read classification output of KRANK, computes TP/FP/TN/FN for each rank and each read
scripts/evaluate_CONSULTII.py: custom script to evaluate the read classification output of CONSULT-II, computes TP/FP/TN/FN for each rank and each read
scripts/evaluate_KrakenII.py: custom script to evaluate the read classification output of Kraken-II, computes TP/FP/TN/FN for each rank and each read
scripts/summarize_evaluations.py: summarizes TP/FP/TN/FN counts across ranks and genomes, should be used with the output of the evaluate_*.py scripts above
scripts/prepprocess_methods_psummary.py: uses distances in dist/dist_to_closest.txt to compute F1/precision/recall for different distance levels
scripts/prepprocess_methods_summary.py: uses distances in dist/dist_to_closest.txt to compute F1/precision/recall across different novelty bins
scripts/prepprocess_methods_csummary.py: uses distances in dist/dist_to_closest.txt to compute F1/precision/recall across taxon sizes
scripts/match_closest_taxon.py
scripts/dist_wrt_lastcommonrank.py
scripts/get_taxa_count.py
scripts/count_taxa.sh
scripts/find_closest_taxon.sh
scripts/resource_benchmarking.R
scripts/shared_kmers_analysis.R
scripts/profiling_cami2_analysis.R
scripts/profiling_tool_comparision.R
scripts/comparison_-_withCONSULT-II.R
scripts/size_const_comparison-10kSpecies.R
scripts/kmer_ranking_comparison-10kSpecies.R
scripts/classification_comparison-10kSpecies.R
scripts/numgenomes_per_taxon-violinplot-10kSpecies.R
scripts/summary_analysis_cami2.R
scripts/weight_dist_simulations.R
scripts/query_info.R

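For readers who want to relate the TP/FP/TN/FN counts to the reported F1/precision/recall values, the following Python sketch uses the standard definitions. It is not taken from the repository's scripts: the function name `precision_recall_f1` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the actual scripts may handle differently.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up counts for a single rank, for illustration only.
p, r, f = precision_recall_f1(tp=80, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

True negatives do not enter any of the three metrics, which is why the sketch takes only TP, FP, and FN as inputs.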
figures/query_details.pdf
figures/profiling-cami2_combined-tool_comparison-l1_unifrac-avg_ranks.pdf
figures/classification_comparison-main-10kSpecies.pdf
figures/classification_comparison-wrt_memory.pdf
figures/classification_comparison-defaultsPrecisionRecall-10kSpecies.pdf
figures/classification_comparison-main_new-10kSpecies.pdf
figures/profiling-cami2_combined-tool_comparison-completeness_purity.pdf
figures/profiling-cami2_combined-tool_comparison-strain_madness.pdf
figures/size_const_comparison-wrt_group_size.pdf
figures/shared_kmers_portion.pdf
figures/profiling-cami2_combined-tool_comparison-l1_unifrac-avg_all.pdf
figures/classification_comparison-null_model-10kSpecies.pdf
figures/running_time-query.pdf
figures/classification_comparison-defaultsF1-10kSpecies.pdf
figures/classification_comparison_-_withCONSULT-II.pdf
figures/size_const_comparison-10kSpecies.pdf
figures/improvement_new_profiling.pdf
figures/expected_num_matches.pdf
figures/improvement_profiling-genome_size_correction.pdf
figures/shared_kmers_analysis.pdf
figures/num_genomes_per_taxon.pdf
figures/improvement_profiling-new_method.pdf
figures/classification_comparison-varying_memory-10kSpecies.pdf
figures/kmer_ranking_comparison-10kSpecies.pdf
figures/profiling-cami2_combined-tool_comparison-l1_unifrac.pdf
figures/krank-illustration.pdf
figures/krank-illustration.key
figures/profiling-cami2_combined-tool_comparison.pdf
figures/distance_to_closest.pdf
figures/profiling_tool_comparison.pdf