
Open
78 commits
d510353
First commit
asishallab Jan 30, 2012
1c84fa2
Publishing source code to github
Feb 27, 2012
98eb38c
Added COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL)
Feb 27, 2012
669972c
Removed .svn directories, switching to hosting on github with git
Feb 27, 2012
24a334c
Removed ahrd-output-files generated by running AHRD's test-suite.
Feb 27, 2012
31c5abe
Removing directory ./bin introduced by default settings of Eclipse, s…
Feb 27, 2012
4c4ef60
Everything in ./classes should be build on each system, as ./dist/ahr…
Feb 27, 2012
c52bae4
Adjusting .gitignore
Feb 27, 2012
7690568
.gitignore does not interpret './' before directory-paths. Removed th…
Feb 27, 2012
b73b548
Ignoring binary distribution dist/ahrd.jar
Feb 27, 2012
4e0d02c
Switching to usage of textile for README
Feb 27, 2012
e769a9d
Formatting README
Feb 27, 2012
3e7dc48
Adding formulae as images and including formulae and algorithm into R…
Feb 27, 2012
4ac3791
Adding image of all formulae used in AHRD.
Feb 27, 2012
cb17af1
Added display of image formulae in README
Feb 27, 2012
921d561
Correcting image-url in README
Feb 27, 2012
bee6e0c
Correcting image-url in README
Feb 27, 2012
2c0bc15
Correcting image-url in README
Feb 27, 2012
4f97c11
Adding explication of AHRD-parameters to README
Feb 27, 2012
621b808
Formatting parameter-pre's
Feb 27, 2012
011e18a
Formatting parameter-pre's
Feb 27, 2012
b026ab5
Formatting parameter-pre's
Feb 27, 2012
2a0dc5c
Corrected AHRD-Test-Run
Feb 27, 2012
84565b3
Enhanced layout of README
Feb 27, 2012
015161a
Adding Eclipse hidden project-files
Feb 27, 2012
7f4de10
Added explanation and example call for Batcher.
Feb 27, 2012
973a42e
Added authors to README
Feb 27, 2012
3f05cc9
Added authors to README
Feb 27, 2012
c3103f9
Added authors to README
Feb 27, 2012
d7b446b
Added documentation on AHRD quality codes.
Feb 27, 2012
b3de770
'Their' meaning, not 'There'
Feb 27, 2012
1401de1
Added explanation of output.
Feb 27, 2012
2241482
Added explanation of output.
Feb 27, 2012
6c329c9
Corrected explanation of AHRD-Quality-Code.
Feb 27, 2012
63841d1
Corrected input file used to start a AHRD test run.
Feb 27, 2012
a39e0fc
Better explanation of Blast-Database-Usage.
Feb 27, 2012
6310379
Better explanation of Parameters.
Feb 27, 2012
f79bd87
FIne tuned pre-tags
Feb 27, 2012
323e921
Removed typo
Feb 27, 2012
0ca48f5
Removed typo
Feb 27, 2012
5e21da4
Removed typo
Feb 27, 2012
b5fc676
Added new method to infer highest possible achievable evaluation scor…
Feb 28, 2012
bbb3143
Added git clone information to README
Mar 7, 2012
4620a7c
Highlighting ahrd_input_test_run.yml in README
Mar 7, 2012
777d70a
Moving repository to groups new one: https://github.com/groupschoof/AHRD
Mar 8, 2012
a899b4b
Moving repository to groups new one: https://github.com/groupschoof/AHRD
Mar 8, 2012
11c3ae7
Need to setup new repository before announcing to head there
Mar 8, 2012
b510e3d
Put moved and deprecated message into documentation.
asishallab Mar 13, 2012
c02cc5c
Finished introduction of Berkeley-Db and ReferenceProteins. Needs fir…
asishallab Sep 8, 2016
13d3815
Test passes: ReferenceProteins can be created and read out, even pers…
asishallab Sep 8, 2016
5006aa5
Access to persistent ReferenceProteins also works in Read-Only-Mode.
asishallab Sep 8, 2016
9287058
AHRD now successfully parses Fasta Blast-Databases and generates pers…
asishallab Sep 8, 2016
a9ceeaf
Successfully parsing reference GO annotations and persisting them in …
asishallab Sep 8, 2016
2f36c56
Wrote a controller to create or update AHRD's database of reference s…
asishallab Sep 9, 2016
2e7d1d6
All tests pass after introduction of Berkeley-Db. YEAH, baby.
asishallab Sep 9, 2016
2f51e90
AHRD's test-run now deletes AHRD_Database after finalization.
asishallab Sep 9, 2016
728ca91
Introducing new AHRD version with Berkeley-Db
asishallab Sep 9, 2016
d214c19
Added DatabaseSetup example files
asishallab Sep 9, 2016
b133ed8
Updated documentation about the new database
groupschoof Sep 9, 2016
25ecbed
Merge branch 'berkeley_db' of https://github.com/groupschoof/AHRD int…
asishallab Sep 9, 2016
abcc1e8
AHRD now uses 90 percent of the available memory for caching its data…
asishallab Sep 10, 2016
ebf2dd6
Switching to non-transactional and deferred-write to speed up things.…
asishallab Sep 10, 2016
f9872a6
Inserted missing check for option 'prefer_reference_with_go_annos'
asishallab Sep 14, 2016
c6fba80
Added missing closing of AHRD-DB in Evaluator
groupschoof Sep 17, 2016
b69e8cc
Added missing closing of AHRD-DB in Trainer
groupschoof Sep 17, 2016
f1789f2
Added missing closing of AHRD-DB in Trainer
asishallab Sep 18, 2016
205ffa8
Merge branch 'berkeley_db' of https://github.com/groupschoof/AHRD int…
asishallab Sep 18, 2016
90567fc
Enabling multiple JVMs to read from the same Berkeley-DB
asishallab Sep 18, 2016
7c524c2
Fixed Eclipse-Classpath
asishallab Sep 20, 2016
b63dea6
Evaluator can now return NaN F1-Score in case of no reference tokens
asishallab Sep 23, 2016
5bd10e2
Double.NaN is written as 'NA' into output files.
asishallab Sep 23, 2016
18d0b05
Double.NaN is written as 'NA' into output files.
asishallab Sep 23, 2016
a96c6f7
NaN or infinite numbers are written as 'NA' into output files.
asishallab Sep 23, 2016
f189532
Updated documentation to explain blacklisting and filtering when comp…
groupschoof Sep 24, 2016
585c96d
Docu: Fixed link to F-Score computation
groupschoof Sep 24, 2016
b1e05db
Merged berkeley_db and master
asishallab Jun 14, 2018
eb5cd3e
Recommending 'prefer_reference_with_go_annos: false'
asishallab Jun 14, 2018
6601e06
Merge remote-tracking branch 'asishallab/master'
asishallab Jun 14, 2018
1 change: 1 addition & 0 deletions .classpath
@@ -21,5 +21,6 @@
<classpathentry kind="lib" path="lib/xom-1.2.6.jar"/>
<classpathentry kind="lib" path="lib/yamlbeans-1.06.jar"/>
<classpathentry kind="lib" path="lib/mysql-connector-java-5.1.35-bin.jar"/>
<classpathentry kind="lib" path="lib/je-7.0.6.jar"/>
<classpathentry kind="output" path="classes"/>
</classpath>
1 change: 1 addition & 0 deletions .gitignore
@@ -37,3 +37,4 @@ trainer_batch_ymls
.externalToolBuilders/
.project
/classes/
AHRD_DB
21 changes: 18 additions & 3 deletions README.textile
@@ -11,6 +11,7 @@ h2. Table of contents
### "Build the executable jar":#122-build-the-executable-jar
# "Usage":#2-usage
## "AHRD example usages":#21-ahrd-example-usages
### "AHRD's Database":#211-ahrds-database
## "Input":#22-input
### "Required input data":#221-required-input-data
### "Optional input data":#222-optional-input-data
@@ -53,9 +54,9 @@ h4. 1.2.1 Get AHRD
Copy (clone) AHRD to your computer using git via command-line, then change into AHRD's directory, and finally use the latest stable version:
<pre>git clone https://github.com/groupschoof/AHRD.git
cd AHRD
git checkout tags/v3.3.3</pre>
git checkout tags/v3.4</pre>

Alternatively, without using @git@, you can download AHRD version @v3.3.3@ ("zip":https://github.com/groupschoof/AHRD/archive/v3.3.3.zip or "tar.gz":https://github.com/groupschoof/AHRD/archive/v3.3.3.tar.gz) and extract it.
Alternatively, without using @git@, you can download AHRD version @v3.4@ ("zip":https://github.com/groupschoof/AHRD/archive/v3.4.zip or "tar.gz":https://github.com/groupschoof/AHRD/archive/v3.4.tar.gz) and extract it.

h4. 1.2.2 Build the executable jar

@@ -75,6 +76,8 @@ All parameters can be set manually, or the default ones can be used as given in

In order to parallelize the protein function annotation processes, AHRD can be run on batches of recommended size between 1,000 to 2,000 proteins. If you want to annotate very large protein sets or have low memory capacities use the included Batcher to split your input-data into Batches of appropriate size (see section "2.3":#23-batcher). _Note:_ As of Java 7 or higher AHRD is quite fast and batching might no longer be necessary.

AHRD extracts some information about the reference proteins into a persistent database. Because tabular sequence similarity search outputs (Blast, Blat, Diamond etc.) do not store the description lines and the reference (Hit) lengths, this information has to be extracted from the original reference protein databases (in Fasta format). Furthermore, the user can provide a _single_ Gene Ontology Annotation (GOA) file for _all_ reference proteins, which will be parsed, too. If GOAs are available for the reference proteins, AHRD will also annotate the query proteins with GO Terms (see section "3.3.1":#331-parameters-controlling-the-parsing-of-tabular-sequence-similarity-search-result-tables-legacy-blast-blast-and-blat).

h3. 2.1 AHRD example usages

There are _two_ template AHRD input files provided that you should use according to your use case. All example input files are stored in @./test/resources@ and are named @ahrd_example_input*.yml@. You can run AHRD on any of these use cases with <pre>java -Xmx2g -jar ./dist/ahrd.jar your_use_case_file.yml</pre>
@@ -83,6 +86,18 @@ There are _two_ template AHRD input files provided that you should use according
| Annotate your Query proteins with Human Readable Descriptions (HRD) | @./test/resources/ahrd_example_input.yml@ |
| Annotate your Query proteins with HRD _and_ Gene Ontology (GO) terms | @./test/resources/ahrd_example_input_go_prediction.yml@ |

h4. 2.1.1 AHRD's Database

_Do not worry_: You do not have to take care of or set up AHRD's database; AHRD does that automatically for you. If AHRD does not find a database when run, it will extract all required information from the provided reference sequence databases (Fasta format) and, optionally, the provided Gene Ontology Annotation (GOA) file. By default, extracted information will be stored in the _current working directory_ in the database-directory @AHRD_DB@. You can use an alternative database-directory by providing the optional parameter @ahrd_db: path/2/your_db_dir@. If AHRD is started and finds an existing database, it will parse neither the reference Fasta databases nor the GOA file and will expect to find the required information in the AHRD-Database instead. _Note_ that AHRD will terminate with an error if Blast-Hits are found that are not in its database!

If you want or need to, you can _just_ set up an AHRD-Database without running the analysis. To do so, use the following command:

<pre>
java -Xmx20g -cp ./dist/ahrd.jar ahrd.controller.DatabaseSetup ./ahrd_example_database_setup_input.yml
</pre>

See file @ahrd_example_database_setup_input.yml@ for more details.
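
The decision described in this section (use the optional @ahrd_db@ directory or fall back to @AHRD_DB@, then either build or reuse the database) can be sketched roughly as follows. This is a minimal illustration, not AHRD's actual @DatabaseSetup@ code: the class @AhrdDbDirSketch@ and its methods are hypothetical, and treating "the directory exists" as "a database exists" is an assumption made only for this sketch.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of AHRD: illustrates the directory choice only.
class AhrdDbDirSketch {

    // Default database-directory name given in the README text.
    static final String DEFAULT_DB_DIR = "AHRD_DB";

    // Returns the directory named by the optional 'ahrd_db' parameter, or the
    // default directory (relative to the current working directory) if unset.
    static Path resolveDbDir(String configuredAhrdDb) {
        if (configuredAhrdDb == null || configuredAhrdDb.trim().isEmpty()) {
            return Paths.get(DEFAULT_DB_DIR);
        }
        return Paths.get(configuredAhrdDb.trim());
    }

    // Assumption for this sketch: an existing directory counts as an existing
    // database, so the reference data only needs parsing when it is missing.
    static boolean needsSetup(Path dbDir) {
        return !Files.isDirectory(dbDir);
    }

    public static void main(String[] args) {
        Path dbDir = resolveDbDir(args.length > 0 ? args[0] : null);
        if (needsSetup(dbDir)) {
            System.out.println("Would parse reference Fasta and GOA files into " + dbDir);
        } else {
            System.out.println("Would reuse the existing database in " + dbDir);
        }
    }
}
```

Keeping the path resolution separate from the existence check makes both easy to test without touching a real database.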

h3. 2.2 Input

Example files for all input files can be found under @./test/resources/@. _NOTE:_ Only files containing @example@ in their filename should be used as template input files. Other YAML files are used for testing purposes.
@@ -320,7 +335,7 @@ To have AHRD annotate your proteins with GO terms, you just need to provide the

h5. 3.3.2.0 Prefer reference proteins as candidates that have GO Term annotations

The parameter @prefer_reference_with_go_annos: true@ is highly recommended when aiming to annotate your query proteins with GO Terms. If this parameter is set to true only those candidate references are considered that also have GO Term annotations. However, if you put more emphasis on Human Readable Descriptions and are prepared to accept a couple of your queries to not get any GO Term predictions you can switch this off with @prefer_reference_with_go_annos: false@ or just omit the parameter as by default it is set to @false@.
The parameter @prefer_reference_with_go_annos: true@ can be used when aiming to annotate your query proteins with GO Terms. If this parameter is set to true only those candidate references are considered that also have GO Term annotations. However, if you put more emphasis on Human Readable Descriptions and are prepared to accept a couple of your queries to not get any GO Term predictions you can switch this off with @prefer_reference_with_go_annos: false@ or just omit the parameter as by default it is set to @false@. We recommend the default behaviour.
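
To make the effect of @prefer_reference_with_go_annos@ concrete, here is a minimal sketch of the filtering the paragraph above describes. It is an illustration under assumptions, not AHRD's implementation: the class @GoPreferenceSketch@, the method @candidates@, and the accessions @P1@ and @P2@ are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not AHRD code: shows which reference hits remain
// candidates depending on the 'prefer_reference_with_go_annos' setting.
class GoPreferenceSketch {

    static List<String> candidates(List<String> hitAccessions,
            Map<String, Set<String>> goAnnotations, boolean preferReferenceWithGoAnnos) {
        // Default behaviour (false): every hit stays a candidate.
        if (!preferReferenceWithGoAnnos) {
            return new ArrayList<>(hitAccessions);
        }
        // true: keep only hits that have at least one GO term annotation.
        List<String> kept = new ArrayList<>();
        for (String accession : hitAccessions) {
            Set<String> terms = goAnnotations.get(accession);
            if (terms != null && !terms.isEmpty()) {
                kept.add(accession);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("P1", "P2");
        Map<String, Set<String>> go = new HashMap<>();
        go.put("P2", new HashSet<>(Arrays.asList("GO:0008150")));
        System.out.println("prefer=true:  " + candidates(hits, go, true));
        System.out.println("prefer=false: " + candidates(hits, go, false));
    }
}
```

With the setting enabled, a query whose hits all lack GO annotations ends up with no candidates in this sketch, which is one way to see why the stricter setting can cost Human Readable Descriptions.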

h5. 3.3.2.1 Custom reference Gene Ontology annotations (non UniprotKB GOA)

1 change: 1 addition & 0 deletions ahrd_example_database_setup_input.yml
2 changes: 2 additions & 0 deletions build.xml
@@ -81,6 +81,7 @@
</junit>
<delete file="test/ahrd_output.csv" />
<delete file="test/sim_anneal_path_log.csv" />
<delete dir="AHRD_DB" />
</target>

<target name="test.run" depends="compile.test">
@@ -89,6 +90,7 @@
<formatter type="plain" usefile="false" />
<test name="ahrd.test.AhrdTestRun" />
</junit>
<delete dir="AHRD_DB" />
</target>

<target name="test.regexs" depends="compile.test">
Binary file added lib/je-7.0.6.jar
Binary file not shown.
94 changes: 18 additions & 76 deletions src/ahrd/controller/AHRD.java
@@ -1,8 +1,9 @@
package ahrd.controller;

import static ahrd.controller.DatabaseSetup.setupOrUseExistingDatabase;
import static ahrd.controller.Settings.getSettings;
import static ahrd.controller.Settings.setSettings;
import static ahrd.model.ReferenceGoAnnotations.parseReferenceGoAnnotations;
import static ahrd.model.AhrdDb.closeDb;

import java.io.IOException;
import java.sql.SQLException;
@@ -19,7 +20,6 @@
import ahrd.exception.MissingInterproResultException;
import ahrd.exception.MissingProteinException;
import ahrd.model.BlastResult;
import ahrd.model.GOterm;
import ahrd.model.InterproResult;
import ahrd.model.Protein;
import ahrd.view.FastaOutputWriter;
@@ -29,15 +29,12 @@

public class AHRD {

public static final String VERSION = "3.11";
public static final String VERSION = "3.4";

private Map<String, Protein> proteins;
private Map<String, Double> descriptionScoreBitScoreWeights = new HashMap<String, Double>();
private Map<String, Set<String>> referenceGoAnnotations;
private Set<String> uniqueBlastResultShortAccessions;
private long timestamp;
private long memorystamp;
private Map<String, GOterm> goDB;

protected long takeTime() {
// Measure time:
@@ -55,14 +52,12 @@ protected long takeMemoryUsage() {
}

public static void main(String[] args) {
System.out.println("Usage:\njava -Xmx2g -jar ahrd.jar input.yml\n");
System.out.println("Usage:\njava -Xmx30g -jar ahrd.jar input.yml\n");

try {
AHRD ahrd = new AHRD(args[0]);
// Load and parse all inputs
ahrd.setup(true);
// After the setup the unique short accessions are no longer needed:
ahrd.setUniqueBlastResultShortAccessions(null);

// Iterate over all Proteins and assign the best scoring Human
// Readable Description
@@ -77,11 +72,13 @@ public static void main(String[] args) {
// Log
System.out.println("Wrote output in " + ahrd.takeTime() + "sec, currently occupying "
+ ahrd.takeMemoryUsage() + " MB");

System.out.println("\n\nDONE");
} catch (Exception e) {
System.err.println("We are sorry, an un-expected ERROR occurred:");
e.printStackTrace(System.err);
} finally {
closeDb();
}
}

@@ -107,24 +104,20 @@ public static IOutputWriter initializeOutputWriter(Collection<Protein> proteins)
public AHRD(String pathToYmlInput) throws IOException {
super();
setSettings(new Settings(pathToYmlInput));
// The following fields are only used if AHRD is requested to generate
// Gene Ontology term annotations:
if (getSettings().hasGeneOntologyAnnotations()) {
this.setUniqueBlastResultShortAccessions(new HashSet<String>());
this.setReferenceGoAnnotations(new HashMap<String, Set<String>>());
}
}

public void initializeProteins() throws IOException, MissingAccessionException {
setProteins(Protein.initializeProteins(getSettings().getProteinsFasta()));
}

public void parseBlastResults() throws IOException, MissingProteinException, SAXException {
public void parseBlastResults()
throws IOException, MissingProteinException, SAXException, MissingAccessionException {
for (String blastDatabase : getSettings().getBlastDatabases()) {
BlastResult.readBlastResults(getProteins(), blastDatabase, getUniqueBlastResultShortAccessions());
BlastResult.readBlastResults(getProteins(), blastDatabase);
}
}

@Deprecated
public void parseInterproResult() throws IOException {
if (getSettings().hasInterproAnnotations()) {
Set<String> missingProteinAccessions = new HashSet<String>();
@@ -142,18 +135,6 @@ public void parseInterproResult() throws IOException {
}
}

/**
* Method finds GO term annotations for Proteins in the searched Blast
* databases and stores them in a Map.
*
* @throws IOException
*/
public void setUpReferenceGoAnnotations() throws IOException {
if (getSettings().hasGeneOntologyAnnotations()) {
setReferenceGoAnnotations(parseReferenceGoAnnotations(getUniqueBlastResultShortAccessions()));
}
}

public void filterBestScoringBlastResults(Protein prot) {
for (String blastDatabaseName : prot.getBlastResults().keySet()) {
prot.getBlastResults().put(blastDatabaseName,
@@ -163,7 +144,7 @@ public void filterBestScoringBlastResults(Protein prot) {

/**
* Method initializes the AHRD-run: 1. Loads Proteins 2. Parses BlastResults
* 3. Parses InterproResults 4. Parses Gene-Ontology-Results
* 3. Parses InterproResults. Step 3 is deprecated!
*
* @throws IOException
* @throws MissingAccessionException
@@ -173,6 +154,8 @@ public void filterBestScoringBlastResults(Protein prot) {
*/
public void setup(boolean writeLogMsgs)
throws IOException, MissingAccessionException, MissingProteinException, SAXException, ParsingException {
setupOrUseExistingDatabase(writeLogMsgs);

if (writeLogMsgs)
System.out.println("Started AHRD...\n");

@@ -189,15 +172,7 @@ public void setup(boolean writeLogMsgs)
System.out.println("...parsed blast results in " + takeTime() + "sec, currently occupying "
+ takeMemoryUsage() + " MB");

// Reference GO Annotations (for Proteins in the searched Blast
// Databases)
setUpReferenceGoAnnotations();
if (writeLogMsgs) {
System.out.println("...parsed reference Gene Ontology Annotations (GOA) in " + takeTime()
+ "sec, currently occupying " + takeMemoryUsage() + " MB");
}

// one single InterproResult-File
// one single InterproResult-File (DEPRECATED)
if (getSettings().hasValidInterproDatabaseAndResultFile()) {
InterproResult.initialiseInterproDb();
parseInterproResult();
@@ -229,19 +204,11 @@ public void assignHumanReadableDescriptions() throws MissingInterproResultExcept
// currentScore - (Token-High-Score / 2)
prot.getTokenScoreCalculator().filterTokenScores();
// Find the highest scoring Blast-Result:
prot.getDescriptionScoreCalculator().findHighestScoringBlastResult(this.getReferenceGoAnnotations());
// If AHRD is requested to annotate Gene Ontology Terms, do so:
if (getSettings().hasGeneOntologyAnnotations()
&& prot.getDescriptionScoreCalculator().getHighestScoringBlastResult() != null
&& getReferenceGoAnnotations().containsKey(
prot.getDescriptionScoreCalculator().getHighestScoringBlastResult().getShortAccession())) {
prot.setGoResults(getReferenceGoAnnotations()
.get(prot.getDescriptionScoreCalculator().getHighestScoringBlastResult().getShortAccession()));
}
prot.getDescriptionScoreCalculator().findHighestScoringBlastResult();
// filter for each protein's most-informative
// interpro-results
// interpro-results (DEPRECATED)
InterproResult.filterForMostInforming(prot);
}
}
}

public Map<String, Protein> getProteins() {
@@ -259,29 +226,4 @@ public Map<String, Double> getDescriptionScoreBitScoreWeights() {
public void setDescriptionScoreBitScoreWeights(Map<String, Double> descriptionScoreBitScoreWeights) {
this.descriptionScoreBitScoreWeights = descriptionScoreBitScoreWeights;
}

public Map<String, Set<String>> getReferenceGoAnnotations() {
return referenceGoAnnotations;
}

public void setReferenceGoAnnotations(Map<String, Set<String>> referenceGoAnnotations) {
this.referenceGoAnnotations = referenceGoAnnotations;
}

public Set<String> getUniqueBlastResultShortAccessions() {
return uniqueBlastResultShortAccessions;
}

public void setUniqueBlastResultShortAccessions(Set<String> uniqueBlastResultShortAccessions) {
this.uniqueBlastResultShortAccessions = uniqueBlastResultShortAccessions;
}

public Map<String, GOterm> getGoDB() {
return goDB;
}

public void setGoDB(Map<String, GOterm> goDB) {
this.goDB = goDB;
}

}
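
The change to @main@ in this file wraps the run in @try@/@catch@/@finally@ and calls @closeDb()@ in the @finally@ block, so the database is closed whether the run succeeds or fails. The sketch below isolates that pattern. It is a stand-in under assumptions: @CloseInFinallySketch@ and its boolean "handle" are hypothetical and do not use the Berkeley DB API or AHRD's @AhrdDb@ class.

```java
// Hypothetical sketch, not AHRD code: a boolean flag stands in for the
// static database handle that the real closeDb() would release.
class CloseInFinallySketch {

    private static boolean open = false;

    static void openDb() {
        open = true;
    }

    // Safe to call even if the database was never opened.
    static void closeDb() {
        open = false;
    }

    static boolean isOpen() {
        return open;
    }

    // Returns true on success and false on failure; the handle is released
    // on both paths because closeDb() runs in the finally block.
    static boolean run(boolean simulateFailure) {
        try {
            openDb();
            if (simulateFailure) {
                throw new IllegalStateException("simulated failure during the run");
            }
            return true;
        } catch (Exception e) {
            System.err.println("An unexpected error occurred: " + e.getMessage());
            return false;
        } finally {
            closeDb();
        }
    }

    public static void main(String[] args) {
        System.out.println("success run ok: " + run(false) + ", still open: " + isOpen());
        System.out.println("failing run ok: " + run(true) + ", still open: " + isOpen());
    }
}
```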