ConsPred – a rule-based (re-)annotation framework for prokaryotic genomes

Important notice: ConsPred development ended with version 1.33 (except for minor bug fixes). We are currently developing a completely rewritten Conspred2, which focuses on consensus gene prediction and high-quality gene start prediction. Conspred2 will not perform any functional annotation (consider e.g. Prokka or GAMOLA2 for this purpose).

ConsPred is a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, and predictions of non-coding genes and complex features, and integrates all evidence into a consensus annotation. ConsPred achieves high-quality and comprehensive annotations based on rules and priorities, similar to decision-making in manual curation. Parameters controlling the annotation process are configurable by the user.

ConsPred can be easily extended and adapted to specific needs. ConsPred generates genome annotations in formats ready for submission to public sequence archives.

ConsPred is implemented in Java, Perl, and Shell and is freely available under the Creative Commons license from or as an Amazon Machine Image for cloud computing. ConsPred database files (about 60 GB) are updated once per month (usually on the first Friday).

ConsPred workflow: Coding sequences (CDS) are predicted by combining different ab initio gene predictions with conserved open reading frames (ORFs) detected by homology searches against the NCBI nr database. Database entries from closely related taxa are excluded from these searches to prevent possible misannotations from being propagated. Putative pseudogenes are exported for user inspection. Among all predicted non-protein-coding elements (NCE), those that cannot overlap with CDS are treated as blocking NCE, and any CDS overlapping a blocking NCE is removed. Consensus CDS are then obtained from the predicted CDS and conserved ORFs using predefined weights and rules. Finally, the consensus CDS are functionally annotated and merged with the NCE into the final annotation files.
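
The two filtering steps in this workflow (removing CDS that overlap blocking NCE, then deriving consensus CDS from weighted evidence) can be illustrated with a minimal Python sketch. This is not ConsPred's actual code or rule set: the `Feature` type, the function names, the source labels, the weights, the threshold, and the choice to group candidates by shared stop position are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    source: str  # hypothetical label for the predictor or element type


def overlaps(a, b):
    """True if the two features share at least one position (strand ignored)."""
    return a.start <= b.end and b.start <= a.end


def remove_blocked(cds_list, blocking_nce):
    """Drop every CDS that overlaps at least one blocking NCE."""
    return [c for c in cds_list
            if not any(overlaps(c, n) for n in blocking_nce)]


def consensus_by_stop(cds_list, weights, min_score):
    """Assumed rule: group candidates sharing strand and stop position,
    sum the weights of the distinct sources supporting each group, and
    keep groups reaching min_score, using the coordinates proposed by
    the highest-weighted source."""
    groups = {}
    for c in cds_list:
        stop = c.end if c.strand == "+" else c.start
        groups.setdefault((c.strand, stop), []).append(c)

    consensus = []
    for members in groups.values():
        sources = {m.source for m in members}
        score = sum(weights.get(s, 0.0) for s in sources)
        if score >= min_score:
            best = max(members, key=lambda m: weights.get(m.source, 0.0))
            consensus.append(
                Feature(best.start, best.end, best.strand, "consensus"))
    return sorted(consensus, key=lambda f: f.start)


# Hypothetical evidence: two ab initio predictors plus conserved ORFs.
predictions = [
    Feature(100, 400, "+", "ab_initio_A"),
    Feature(130, 400, "+", "ab_initio_B"),
    Feature(100, 400, "+", "conserved_orf"),
    Feature(900, 1200, "-", "ab_initio_A"),
    Feature(1500, 1800, "+", "ab_initio_B"),
]
blocking = [Feature(1150, 1250, "+", "blocking_nce")]  # hypothetical element
weights = {"ab_initio_A": 1.0, "ab_initio_B": 1.0, "conserved_orf": 2.0}

kept = remove_blocked(predictions, blocking)
result = consensus_by_stop(kept, weights, min_score=2.0)
for f in result:
    print(f.start, f.end, f.strand)
```

In this toy input, the CDS at 900..1200 is discarded because it overlaps the blocking element, the CDS at 1500..1800 is dropped because a single low-weight predictor does not reach the threshold, and the three candidates ending at position 400 collapse into one consensus CDS whose start is taken from the highest-weighted source. A real rule set would also need to handle pseudogenes, partial genes, and start-site refinement, which this sketch leaves out.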