Bachelor and master thesis projects

Bachelor theses

The practical course "300224 PP Genome analysis of prokaryots - applied bioinformatics for the analysis of a genome sequence" is well suited for a bachelor project in the Biology curriculum.
For bachelor theses in other curricula please contact Harald Marx or Thomas Rattei.

Master theses

Our research projects continually provide new topics for master theses. We are happy to adapt the thesis topic to your experience and interests. Please contact Harald Marx or Thomas Rattei for more information.

THESIS PROJECT EXAMPLE 1: Prediction of peptides in viral polyproteins

Different lineages of viruses encode polyproteins in their genomes. These are long proteins that consist of several functional units. To become active, the polyprotein is cleaved by host or viral proteases into biochemically active peptides. Computational prediction of peptides in polyproteins has so far been very limited: only one prediction tool, covering a few lineages of human viruses, has been developed (VIPR; unpublished). However, such a prediction tool would be extremely valuable for comparative genomics of viruses, for example in our Virus Orthologous Groups database (VOGDB).

The aim of this thesis project is to utilize the rapidly growing number of completely sequenced virus genomes. We want to analyze large datasets of polyproteins to identify cleavage patterns and other characteristics of polyproteins. This information should then be used in a machine learning approach that predicts cleavage sites in viral polyproteins. Mass spectrometry datasets will be used as independent test data for evaluation and validation of the new approach.
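
To give a rough idea of what such a machine learning setup could look like, the following minimal sketch turns a polyprotein sequence into labelled training examples, one per candidate cleavage position. It is an illustration only, not the method of the project: the window size, the padding symbol, the one-hot encoding and all function names are assumptions made for this example.

```python
# Minimal sketch (assumption): fixed-size sequence windows around candidate
# cleavage positions, one-hot encoded as input for a generic classifier.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_window(sequence, site, flank=4, pad="X"):
    """Return the residues around a cleavage between site-1 and site.

    The window holds `flank` residues on each side; positions that fall
    outside the sequence are filled with the hypothetical pad symbol.
    """
    left = sequence[max(0, site - flank):site].rjust(flank, pad)
    right = sequence[site:site + flank].ljust(flank, pad)
    return left + right


def one_hot(window):
    """Encode a window as a flat 0/1 vector (20 values per residue).

    Pad symbols and non-standard residues become all-zero blocks.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def training_examples(sequence, known_sites, flank=4):
    """Yield (features, label) for every internal position of a sequence.

    The label is 1 if the position is an annotated cleavage site, else 0.
    """
    known = set(known_sites)
    for site in range(1, len(sequence)):
        yield one_hot(site_window(sequence, site, flank)), int(site in known)


if __name__ == "__main__":
    # Toy sequence and site, invented for illustration only.
    for features, label in training_examples("MKTAYIAKQR", known_sites=[5]):
        print(label, sum(features))
```

The resulting feature vectors could be passed to any standard classifier; which features and which model work best is part of the thesis project.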

Working on this problem will give you practical insight into and experience with the genomics of viruses. You should have a good background in computational science, bioinformatics and life science. You should be interested in programming and machine learning, as well as in microbiology, molecular biology and microbial ecology. The thesis project will provide you with substantial training in these fields and allow you to develop your own ideas and concepts within the frame of the project.

Contact: Thomas Rattei

THESIS PROJECT EXAMPLE 2: Proteogenomic search space construction to explore microbial peptidomes

Antibiotic-resistant bacteria, so-called superbugs, are threatening to kill almost 10,000,000 people worldwide in 2050. In infectious disease, transmissible superbugs make use of various strategies to invade and colonize a niche in one of the host’s inherent microbiomes. To discover novel drugs, it is paramount to fully understand those invasive mechanisms, but also putative microbiome defenses that ward off pathogens. Key but not well-understood players in these processes are bioactive peptides, which display antimicrobial, signaling, and regulatory properties.
Recent efforts in metagenomics provide a first glimpse into the genetic composition and complexity of the microbial peptidome. To advance in-depth characterization, mass spectrometry (MS)-based proteomics offers orthogonal evidence in the hunt for elusive bioactive peptide-encoding genes. Thus, proteogenomic approaches leverage genomic and proteomic data to improve the ongoing structural genome annotation.

However, searching MS data against genome-scale databases poses a computational challenge to common search engines, greatly reducing identification specificity. To alleviate this issue, the project goal is to build a database from large-scale ‘ome sources to identify most entities in a biological sample, striking a balance between search space completeness and complexity. This entails i) implementing efficient search data structures, ii) inferring a probabilistic model for peptide detectability, and iii) developing an MS-centric clustering algorithm.

The ideal candidate is highly motivated and has a background in computational science or bioinformatics. Strong programming skills in Java are required to succeed in this project. We provide a dynamic work environment in an exciting, emerging research area. Please send your curriculum vitae and a brief statement of future career goals to Harald Marx.

THESIS PROJECT EXAMPLE 3: A classifier to assess spectrum quality in mass spectrometry-based proteomics

Mass spectrometry (MS)-based proteomics is a powerful method to analyze the proteome and peptidome. A typical large-scale, high-throughput MS experiment results in millions of spectra that vary greatly in information content due to manual and automatic experimental parameters, such as ion fill time, fragmentation method, collision energy, and transient time, among others. This makes analyte identification a computational challenge. To ease spectrum interpretation, pre-processing steps such as charge state deconvolution, deisotoping, and signal-to-noise filtering simplify the spectrum representation. In the subsequent identification step, a search algorithm matches theoretical spectra from a protein sequence database to the experimental spectra.
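
The matching step can be illustrated with a small, simplified sketch. It computes singly charged b- and y-ion m/z values for a candidate peptide from standard monoisotopic residue masses and counts how many of them are found among the observed peaks. This is a teaching example under stated assumptions, not the scoring scheme of any particular search engine: the function names, the 0.02 m/z tolerance and the restriction to unmodified, singly charged fragments are choices made here for illustration.

```python
# Simplified sketch (assumption): match theoretical b/y fragment ions of an
# unmodified peptide to observed peak m/z values within a fixed tolerance.
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_mz(peptide):
    """Return singly charged b-ion m/z values followed by y-ion m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = []
    y_ions = []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions + y_ions


def matched_fraction(peptide, observed_mz, tolerance=0.02):
    """Fraction of theoretical fragments with an observed peak in tolerance."""
    theoretical = fragment_mz(peptide)
    if not theoretical:
        return 0.0  # a single residue has no internal fragmentation site
    hits = sum(
        any(abs(t - o) <= tolerance for o in observed_mz) for t in theoretical
    )
    return hits / len(theoretical)


if __name__ == "__main__":
    # Peak list invented for illustration only.
    print(matched_fraction("GA", [58.0287, 200.0]))
```

Real search engines additionally handle higher charge states, modifications, peak intensities and statistical scoring; the sketch only shows the basic idea of comparing theoretical and experimental fragments.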

Even though common search engines implement approaches to control false positive identifications, most do not control spectrum quality in the above steps. In this project we will build a binary classifier to assess spectrum quality prior to sequence assignment, ultimately improving search results. Easy access to data sets of large synthetic peptide libraries from various mass spectrometry platforms allows us to explore ubiquitous spectrum features that correlate with quality.

The ideal candidate is highly motivated and has a background in computational science or bioinformatics. The project requires Java programming skills and a basic understanding of mass spectrometry. This is a great opportunity to learn the intricate analytical details of MS-based proteomics. We provide a dynamic work environment in an exciting, emerging research area. Please send your curriculum vitae and a brief statement of future career goals to Harald Marx.