Quantitative Proteomics/Mass Spectrometry/Proteogenomics

Boris Macek

  • Diploma in Molecular Biology 1999, University of Zagreb
  • PhD work 1999-2003 at the Institute for Medical Physics and Biophysics, University of Münster
  • Postdoctoral training at the University of Southern Denmark (Odense) and Max Planck Institute for Biochemistry (Martinsried)
  • Professor at the University of Tübingen since 2008

Research Interest

The rapidly growing number of sequenced genomes and metagenomes presents a challenge for bioinformatics tools that predict protein-coding regions. Experimental evidence of expressed genomic regions, at both the RNA and the protein level, is becoming invaluable for genome annotation and for training gene prediction algorithms. Protein-level evidence of gene expression obtained by mass spectrometry-based proteomics is increasingly used to refine raw genome sequencing data. In a typical "proteogenomics" experiment, the whole proteome of an organism is extracted, digested into peptides and measured by a mass spectrometer. The peptide fragmentation spectra are identified by searching against a six-frame translation of the raw genomic assembly, which enables the identification of previously unpredicted protein-coding genomic regions. Applying mass spectrometry to genome annotation presents a range of challenges to standard proteomics workflows, especially in terms of proteome coverage and database search strategies.
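To make the six-frame translation mentioned above concrete, here is a minimal Python sketch. It assumes the standard genetic code and a plain DNA string as input; the function names (`six_frame_translation`, `translate`, `reverse_complement`) are illustrative and are not taken from any specific pipeline used by the group.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three reading frames.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse strand). Stop codons are kept as '*'.
    """
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for name in sorted(frames):
    print(name, frames[name])
```

In a real search database, each frame would typically be split at stop codons and short fragments discarded before the spectra are searched; that step is omitted here for brevity.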

We develop and apply workflows based on the latest LC-MS/MS instrumentation, which enables comprehensive proteome coverage at high acquisition rates and is therefore especially well suited to proteogenomics applications. We are currently combining this methodology with quantitative proteomics to study the effects of nonsynonymous single nucleotide variants (SNVs) on signal transduction in lower and higher organisms.

Workflow in a typical proteomics experiment. Proteins are extracted from a tissue or cell line and digested by a protease into peptides. The resulting peptide mixtures are separated and analyzed by mass spectrometry; peptide masses are recorded in a "full scan" or "MS" spectrum; peptide ions are fragmented and the fragment ions are recorded in an "MS/MS" spectrum. Both levels of information are used in protein database search and peptide identification.
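The digestion and "full scan" steps in this workflow can be mimicked in silico. The sketch below assumes trypsin as the protease (the caption only says "a protease") with the common rule of cleaving after K or R unless the next residue is P, allows no missed cleavages, and uses standard monoisotopic residue masses. The function names and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in Da (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-terminus)
PROTON = 1.007276  # mass of the charge carrier


def tryptic_digest(protein):
    """Cleave after K or R, except when the next residue is P.

    Assumption: complete digestion, i.e. no missed cleavages.
    """
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """Mass-to-charge ratio of the protonated peptide ion."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


for pep in tryptic_digest("AKRPGKCR"):
    print(pep, round(monoisotopic_mass(pep), 4), round(mz(pep, 2), 4))
```

A search engine compares such theoretical peptide masses (and the corresponding fragment masses) with the measured MS and MS/MS spectra; modifications and missed cleavages, which real searches account for, are left out of this sketch.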

Data processing workflow in proteogenomics. MS and MS/MS spectra are searched against a dedicated database containing a six-frame translation of the whole genome assembly. Identified peptides are mapped onto the genome and provide three levels of information: i) confirmation of existing gene models; ii) refinement of existing gene models (e.g. repositioning of gene termini and exon/intron boundaries); iii) identification of new expressed genomic regions (genes).
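The three levels of information listed above can be illustrated with a deliberately simplified classifier. This sketch assumes that each identified peptide has already been mapped to a genomic interval and that gene models are single intervals on a strand; it ignores reading frame and exon/intron structure, which a real pipeline would have to check. The `Interval` type and `classify_peptide` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two intervals on the same strand share at least one base."""
    return a.strand == b.strand and a.start < b.end and b.start < a.end


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same strand."""
    return (outer.strand == inner.strand
            and outer.start <= inner.start
            and inner.end <= outer.end)


def classify_peptide(peptide, gene_models):
    """Assign a mapped peptide to one of the three evidence levels.

    Simplification: frame and splice structure are not considered.
    """
    if any(contains(gene, peptide) for gene in gene_models):
        return "confirmation"  # i) supports an existing gene model
    if any(overlaps(gene, peptide) for gene in gene_models):
        return "refinement"    # ii) extends past an annotated boundary
    return "novel"             # iii) no annotated gene at this locus


genes = [Interval(100, 400, "+")]
for pep in [Interval(150, 180, "+"), Interval(90, 120, "+"),
            Interval(500, 530, "+")]:
    print(pep, classify_peptide(pep, genes))
```

Containment is tested before overlap so that a peptide lying fully inside one gene model is counted as confirmation even if it also touches the boundary of a neighbouring model.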

Available PhD Projects

Quantitative proteogenomic analysis of aberrant gene expression in lower and higher organisms.

Selected Reading

1) Krug K, Popic S, Carpy A, Taumer C, Macek B. (2014) Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants. Proteomics. 14(23-24):2699-708.

2) Krug K, Carpy A, Behrends G, Matic K, Soares NC, Macek B. (2013) Deep coverage of the Escherichia coli proteome enables the assessment of database search strategies in bacterial proteogenomics experiments. Mol Cell Proteomics. 12(11):3420-30.

3) Krug K, Nahnsen S, Macek B. (2011) Mass spectrometry at the interface of genomics and proteomics. Mol Biosystems. 7(2):284-91.

4) Borchert N, Dieterich C, Krug K, Schütz W, Jung S, Nordheim A, Sommer RJ, Macek B. (2010) Proteogenomics of Pristionchus pacificus reveals distinct proteome structure of nematode models. Genome Res. 20(6):837-46.