Reseek is a protein structure search and alignment algorithm that improves sensitivity in protein homolog detection over state-of-the-art methods, including DALI, TM-align and Foldseek, while running at a speed similar to Foldseek.
Search a protein structure against AFDB, PDB or BFVD, typically with results in 2 to 5 minutes.
On the SCOP40 benchmark (see results below), Reseek is substantially better at discriminating homologs than previous algorithms, including DALI, TM-align and Foldseek. In other words, Reseek is better at ranking true homologs ahead of false positives.
Reseek also provides much more accurate estimates of statistical significance (E-values), enabling users to set a cutoff based on the number of false positives they can accept in a given search. By contrast, DALI and Foldseek often overestimate significance by 5 to 6 orders of magnitude (see references below).
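To make the cutoff idea concrete, here is a minimal Python sketch. It assumes the conventional relationship E = P × D, where D is the number of database chains (the quantity the `-dbsize` option sets); this is an assumption for illustration, not Reseek's code, and `Hit`, `evalue` and `accept` are hypothetical names. If E-values are accurate, keeping hits with E ≤ c admits about c false positives per query.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: str    # target label
    pvalue: float  # P-value reported for this hit


def evalue(pvalue: float, dbsize: int) -> float:
    # Assumed convention: E = P x D, the expected number of chance hits
    # at least this good when D chains are searched.
    return pvalue * dbsize


def accept(hits: list[Hit], dbsize: int, max_false_positives: float) -> list[Hit]:
    # With accurate E-values, keeping hits with E <= c admits about
    # c false positives per query on average.
    return [h for h in hits if evalue(h.pvalue, dbsize) <= max_false_positives]


# Hypothetical P-values; the same hits judged against two database sizes.
hits = [Hit("t1", 1e-9), Hit("t2", 2e-6), Hit("t3", 1e-3)]
for dbsize in (10_000, 1_000_000):
    kept = accept(hits, dbsize, max_false_positives=1.0)
    print(dbsize, [h.target for h in kept])
```

The same P-values pass or fail depending on database size: with 10,000 chains t1 and t2 are kept, with 1,000,000 chains only t1. In a real search, the `-evalue` and `-dbsize` options listed below play these roles.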
Reseek is based on sequence alignment where each residue in the protein backbone is represented by a letter in a novel “mega-alphabet” of 85,899,345,920 (∼10¹¹) distinct structure states. This talk explains how it works.
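With ∼10¹¹ letters, a full substitution matrix would need roughly 7 × 10²¹ entries, so it cannot simply be stored. The Python sketch below illustrates one way around this, assumed here purely for illustration: treat each letter as a tuple of small discrete features and score a pair of letters as a weighted sum of per-feature scores, then run an ordinary local alignment over those letters. The features (`aa`, `ss`), score tables, weights and function names are all hypothetical and are not Reseek's actual features or parameters; the toy alphabet has only 20 × 3 = 60 letters.

```python
from typing import NamedTuple


class Letter(NamedTuple):
    # Hypothetical features; Reseek's actual features are different.
    aa: str  # amino acid, e.g. 'M'
    ss: str  # coarse backbone state 'H', 'E' or 'L' (hypothetical)


# Hypothetical per-feature score tables: each is tiny, unlike a full
# letter-by-letter matrix over the whole product alphabet.
SS_SCORE = {("H", "H"): 3.0, ("E", "E"): 3.0, ("L", "L"): 1.0}
WEIGHTS = {"aa": 0.5, "ss": 1.0}


def aa_score(a: str, b: str) -> float:
    return 2.0 if a == b else -1.0


def ss_score(a: str, b: str) -> float:
    return SS_SCORE.get((a, b), -2.0)


def letter_score(x: Letter, y: Letter) -> float:
    # Score of two letters = weighted sum of per-feature scores.
    return WEIGHTS["aa"] * aa_score(x.aa, y.aa) + WEIGHTS["ss"] * ss_score(x.ss, y.ss)


def local_align_score(q: list[Letter], t: list[Letter], gap: float = 1.5) -> float:
    # Smith-Waterman local alignment score with a linear gap penalty
    # (a simplification of separate gap-open/gap-extend penalties).
    prev = [0.0] * (len(t) + 1)
    best = 0.0
    for i in range(1, len(q) + 1):
        cur = [0.0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            cur[j] = max(
                0.0,
                prev[j - 1] + letter_score(q[i - 1], t[j - 1]),  # aligned pair
                prev[j] - gap,                                  # gap in target
                cur[j - 1] - gap,                               # gap in query
            )
            best = max(best, cur[j])
        prev = cur
    return best


def letters(aa: str, ss: str) -> list[Letter]:
    return [Letter(a, s) for a, s in zip(aa, ss)]


query = letters("MKVL", "HHHL")
target = letters("MRVL", "HHHL")
print(local_align_score(query, target))  # 12.5
```

The K/R mismatch still scores positively (2.5) because the structural feature agrees, which is the kind of behavior a structure-aware alphabet is meant to capture.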
Common commands

    -search      # Alignment (e.g. DB search, pairwise, all-vs-all)
    -convert     # Convert file formats (e.g. create DB)
    -alignpair   # Pair-wise alignment and superposition

Search against database

    reseek -search STRUCTS -db STRUCTS -output hits.txt
    # STRUCTS specifies structure(s), see below

Recommended format for a large database is .bca, e.g.

    reseek -convert /data/PDB_mirror/ -bca PDB.bca

Align and superpose two structures

    reseek -alignpair 1XYZ.pdb -input2 2ABC.pdb
        -aln FILE      # Sequence alignment (text)
        -output FILE   # Rotated 1XYZ (PDB format)

All-vs-all alignment

    reseek -search STRUCTS -output hits.txt

Output options for -search

    -aln FILE                       # Alignments in human-readable format
    -output FILE                    # Hits in tabbed text format
    -columns name1+name2+name3...   # Output columns, names are
        # query     Query label
        # target    Target label
        # qlo       Start of alignment in query
        # qhi       End of alignment in query
        # tlo       Start of alignment in target
        # thi       End of alignment in target
        # ql        Query length
        # tl        Target length
        # pctid     Percent identity of alignment
        # cigar     CIGAR string
        # pvalue    P-value according to log-linear null model
        # evalue    E-value according to log-linear null model
        # aq        AQ (aln. qual., 0 to 1, >0.5 suggests homology)
        # qrow      Aligned query sequence with gaps (local)
        # trow      Aligned target sequence with gaps (local)
        # qrowg     Aligned query sequence with gaps (global)
        # trowg     Aligned target sequence with gaps (global)
        # std       query+target+qlo+qhi+ql+tlo+thi+tl+pctid+evalue
        # default   aq+query+target+evalue+pvalue

Search and alignment options

    -fast, -sensitive or -verysensitive   # Required
    -evalue E     # Max E-value (default 10 unless -verysensitive)
    -omega X      # Omega accelerator (floating-point)
    -minu U       # K-mer accelerator (integer)
    -gapopen X    # Gap-open penalty (floating-point >= 0)
    -gapext X     # Gap-extend penalty (floating-point >= 0)
    -dbsize D     # DB size (nr. chains) for E-value (default actual size)

Convert between file formats

    reseek -convert STRUCTS [one or more output options]
        -cal FILENAME     # .cal format, text with a.a. and C-alpha x,y,z
        -bca FILENAME     # .bca format, binary .cal, recommended for DBs
        -fasta FILENAME   # FASTA format

Create input for Muscle-3D multiple structure alignment:

    reseek -pdb2mega STRUCTS -output structs.mega

STRUCTS argument is one of:

    NAME.cif or NAME.mmcif   # PDBx/mmCIF file
    NAME.pdb                 # Legacy format PDB file
    NAME.cal                 # C-alpha tabbed text format with chain(s)
    NAME.bca                 # Binary C-alpha, recommended for larger DBs
    NAME.files               # Text file with one STRUCT per line,
                             # may be filename, directory or .files
    DIRECTORYNAME            # Directory (and its sub-directories) is searched
                             # for known file types including .pdb, .files etc.

Other options:

    -log FILENAME    # Log file with errors, warnings, time and memory.
    -threads N       # Number of threads, default number of CPU cores.
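The `-output` hits file is tabbed text whose fields are chosen with `-columns`. The Python sketch below loads such a file, assuming one hit per line, tab-separated fields in exactly the order given to `-columns`, and no header line; these layout details are assumptions, so check them against your own output. `read_hits` and the column-type sets are hypothetical helpers, and the two-line sample is made up.

```python
import io
from typing import TextIO

INT_COLUMNS = {"qlo", "qhi", "tlo", "thi", "ql", "tl"}
FLOAT_COLUMNS = {"pvalue", "evalue", "aq"}


def read_hits(f: TextIO, columns: list[str]) -> list[dict]:
    """Parse tab-separated hits, one per line, fields in `columns` order (assumed layout)."""
    hits = []
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}: {line!r}")
        hit = {}
        for name, value in zip(columns, fields):
            if name in INT_COLUMNS:
                hit[name] = int(value)
            elif name in FLOAT_COLUMNS:
                hit[name] = float(value)
            else:
                hit[name] = value  # labels, pctid, cigar, aligned rows, etc.
        hits.append(hit)
    return hits


# Same order as passed on the command line, e.g.
#   reseek -search query.pdb -db PDB.bca -fast -output hits.txt -columns query+target+evalue+aq
columns = "query+target+evalue+aq".split("+")

# Made-up two-line example standing in for a real hits.txt.
sample = io.StringIO("q1\tt1\t1e-10\t0.82\nq1\tt2\t3.5\t0.21\n")
hits = read_hits(sample, columns)

# Keep hits that pass an E-value cutoff and have AQ > 0.5.
likely = [h for h in hits if h["evalue"] <= 1.0 and h["aq"] > 0.5]
print([h["target"] for h in likely])  # ['t1']
```

For a real file, use `with open("hits.txt") as f: hits = read_hits(f, columns)`. If you rely on the `std` or `default` shortcuts, pass their expanded lists from the help text above (e.g. `aq+query+target+evalue+pvalue` for `default`).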
Linux:
cd src/ ; chmod +x build_linux_x86.bash ; ./build_linux_x86.bash

macOS:
cd src/ ; chmod +x build_osx_x86.bash ; ./build_osx_x86.bash

Windows: load reseek.vcxproj into Microsoft Visual Studio and use the Build command.
Don't worry about a warning like the following; it's expected:
warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Method sensitivity was measured on the SCOP40 benchmark using SCOP superfamily as the truth standard, focusing on the regime with fewer than 10 false-positive errors per query, which corresponds to E < 10 for an ideal E-value.
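A single-threshold version of this measurement can be sketched in Python under simplifying assumptions: hits from an all-vs-all search are labeled true positives if query and target share a superfamily and false positives otherwise, pooled across queries and sorted by E-value, and sensitivity is the fraction of all same-superfamily pairs found before the false-positive count exceeds the allowed rate times the number of queries. This is not the reseek_bench code, which may differ in details such as how ties or ambiguous pairs are handled; `Hit`, `sensitivity_at_fp_rate` and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    target: str
    evalue: float


def sensitivity_at_fp_rate(
    hits: list[Hit], superfamily: dict[str, str], max_fp_per_query: float
) -> float:
    """Fraction of same-superfamily pairs found before the false-positive
    count exceeds max_fp_per_query * (number of queries)."""
    domains = list(superfamily)  # all-vs-all: every domain is also a query
    total_tp = sum(
        1
        for q in domains
        for t in domains
        if q != t and superfamily[q] == superfamily[t]
    )
    if total_tp == 0:
        return 0.0
    max_fp = max_fp_per_query * len(domains)
    tp = fp = 0
    for h in sorted(hits, key=lambda h: h.evalue):  # most significant first
        if h.query == h.target:
            continue  # self-hits are ignored
        if superfamily[h.query] == superfamily[h.target]:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
    return tp / total_tp


# Toy data: a,b share one superfamily; c,d share another.
sf = {"a": "SF1", "b": "SF1", "c": "SF2", "d": "SF2"}
hits = [
    Hit("a", "b", 1e-8), Hit("c", "d", 1e-6), Hit("a", "c", 0.01),
    Hit("b", "a", 0.1), Hit("d", "a", 1.0), Hit("d", "c", 5.0),
]
for rate in (0.0, 0.25, 0.5):
    print(rate, sensitivity_at_fp_rate(hits, sf, rate))
```

On the toy data, allowing 0, 0.25 and 0.5 false positives per query gives sensitivities of 0.5, 0.75 and 1.0. Setting `max_fp_per_query=10` corresponds to the upper end of the regime described above.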
Benchmark repository: https://github.com/rcedgar/reseek_bench
Edgar RC. "Protein structure alignment by Reseek improves sensitivity to remote homologs." Bioinformatics 40(11):btae687 (Nov 2024). https://academic.oup.com/bioinformatics/article/40/11/btae687/7901215
Edgar RC and Sahakyan S. "Protein structure alignment significance is often exaggerated." bioRxiv (2025). https://www.biorxiv.org/content/10.1101/2025.07.17.665375v1