
Tool for quick offline batch conversion of Genbank IDs or accessions to taxonomy strings


richardmleggett/acc2tax


acc2tax
[email protected]

Given a file of accessions or Genbank IDs (one per line), this program will return a taxonomy string for each.

Lookups by Genbank ID are quicker than lookups by accession, because the Genbank ID lookup table is held in RAM (though this does mean it takes a couple of minutes to load). Accession lookups are read from disc.

Database files can be downloaded from:
ftp://ftp.ncbi.nih.gov/pub/taxonomy

The files required are:
nodes.dmp
names.dmp
gi_taxid_nucl.dmp
gi_taxid_prot.dmp

For accessions, you will need a merged sorted copy of some of the files from the accession2taxid directory. You need to:
cat nucl_est.accession2taxid nucl_gb.accession2taxid nucl_gss.accession2taxid nucl_wgs.accession2taxid dead_nucl.accession2taxid dead_wgs.accession2taxid | sort > nucl_all.txt
cat prot.accession2taxid dead_prot.accession2taxid | sort > prot_all.txt
and place these files in the same directory as the database files listed above.
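
As a rough illustration of the merge-and-sort step, the sketch below builds two tiny stand-in files in a temporary directory, merges them the same way, and checks the result with sort -c. The sample rows are made up, and the use of LC_ALL=C (byte-wise ordering) is an assumption for reproducibility across machines; this README does not say which collation acc2tax expects.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no real database files are touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical miniature stand-ins for two accession2taxid files
# (tab-separated: accession, accession.version, taxid, gi; values are made up).
printf 'B00002\tB00002.1\t9606\t222\nA00001\tA00001.1\t562\t111\n' \
    > "$workdir/nucl_gb.accession2taxid"
printf 'A00003\tA00003.1\t10090\t333\n' \
    > "$workdir/nucl_est.accession2taxid"

# Merge and sort, as in the commands above.
# LC_ALL=C is an assumption: it forces byte-wise ordering regardless of locale.
cat "$workdir/nucl_est.accession2taxid" "$workdir/nucl_gb.accession2taxid" \
    | LC_ALL=C sort > "$workdir/nucl_all.txt"

# sort -c exits non-zero if the file is not in sorted order.
LC_ALL=C sort -c "$workdir/nucl_all.txt" && echo "nucl_all.txt is sorted"
```

Running the same sort -c check on the real nucl_all.txt and prot_all.txt is a cheap way to confirm the merge finished cleanly before pointing acc2tax at them.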

To compile, type:
cc -o acc2tax acc2tax.c

For details of running, type:
acc2tax -h

