Gene Function Prediction



Our research focuses on discovering new biomolecular mechanisms from biological and medical data, especially on the functional characterization of as yet uncharacterized genes and pathways with theoretical and computational methods. Dramatic recent improvements in nucleic acid sequencing technologies raise the prospect that genomes of patients and of patient-specific pathogens, as well as gene expression data, will become generally available. This development has profound implications for life science research and biomedical applications. As biomolecular sequencing becomes both the most informative and the most readily available research technology in the life sciences, sequence analysis and sequence-based structure and function prediction will be more important than ever.

Typically, a project starts with sets of uncharacterized sequences, expression profiles or other types of omics data associated with known phenotypes, and the driving biomolecular mechanism is sought. Most of the work is done with internal and/or external collaborators, including partners in clinics (e.g., 34863649, 34903892; numbers in parentheses here and below are PubMed IDs) and in the biotech/pharma industry (e.g., MeshBio in Singapore). Our work with clinical data motivated us to discuss the problem of access to patient data for biomedical research from three different perspectives: those of patients, clinicians and researchers (33717311).

The unknown function of many genomic regions will plague mankind for at least a century to come (22849370, 30265449). Although full human genome sequencing is generally believed to have been a watershed event in human history that boosted biomedical research, biomolecular mechanism discovery and life science applications, there is no sign that biomolecular mechanisms are being discovered at a faster pace than before. The opposite is true: researchers in the field of genome annotation observe a persisting, substantial body of insufficiently characterized or completely uncharacterized genes (for example, ~10,000 protein-coding genes in the human genome) despite the availability of full genome sequences. A survey of the biomedical literature shows that the number of reported new protein functions grew steadily until 2000 but declined dramatically thereafter, even though the annual number of new life science publications doubled between 2000 and 2017. The group is thus working on fertile ground with great discovery potential.

Applications reach into medical data analysis as well as natural product, aging and rare disease research. Our success stories include the discovery of the SET domain methyltransferases (10949293), ATGL (15550674), kleisins (12667442), and many new protein domain functions and functional sequence patterns (for example, in the GPI lipid anchor biosynthesis pathway, such as the peptide synthetase activity of GPAA1 (24743167)). We discovered BindGPILA, a new membrane-embedded protein domain that has been evolutionarily multiplied in GPI lipid anchor pathway proteins (29764287). It functions as the unit that recognizes, binds and stabilizes the GPI lipid anchor in a modification-competent form.

Together with collaborators, we discovered that the dysfunction of the human gene SUGCT contributes to gut microbiota dysbiosis, leading to age-dependent pathological changes in kidney, liver, and adipose tissue (31722069). We contributed to the development of AllerCatPro, a tool that predicts the allergenic potential of proteins based on the similarity of their 3D structure as well as their amino acid sequence to a data set of known protein allergens (30657872). Both projects were carried out in collaboration with S. Maurer-Stroh’s team.

In several cases, this research effort has involved developing algorithms and software for the analysis of biomolecular sequence, omics, clinical and other life science data. Examples include tools for predicting post-translational modifications (GPI lipid anchoring, myristoylation, prenylation, phosphorylation) and the subcellular localization of proteins (e.g., 20221930, 19029837, 15575971), the ANNOTATOR software suite for protein function discovery from sequence (27115649), and NSC, a highly cited algorithm for molecular surface computation (J. Comput. Chem. 16, pp. 273-284).

Our collaboration with the crystallography lab of G. Grüber (NTU, Singapore) resulted in a string of discoveries, based on studies of mutated versions of AhpF and AhpC, concerning the structure, the catalytic mechanism and the significance of the sequence architecture of the AhpF/AhpC alkyl hydroperoxide reductase complex (31047989 and references therein).

The team is involved in both academic and industry-funded projects in collaboration with the A*STAR Natural Organism Library (teams of Ng Siew Bee, Y. Kanagasundaram and P. Arumugam) (29979661). Recently, an analog of anthracimycin, an antibiotic previously known to be produced only by Streptomyces species, was predicted and then verified to be produced by Nocardiopsis kunsanensis, a non-Streptomyces actinobacterial microorganism (29805716). Together with the BII NOL team, we discovered BII-Rafflesfungin, a new cyclic lipodepsipeptide with antifungal activity that is produced by the fungus Phoma sp. F3723 (31088369). We identified a biosynthetic gene cluster compatible with the production of this compound and proposed a mechanism for its biosynthesis.


Figure 1. The most conserved sequence motifs of TMTC1/2/3/4 proteins come together spatially in model structures of the TMTCs and can be rationalized as a dolichylphospho-mannose (DPM) binding site.
The figure corresponds to Figure 4 in the publication (PubMed ID 33436046). We illustrate the spatial localization of the most conserved sequence motifs M1 (red), M2 (orange), M3 (yellow), M4 (green), M5 (blue), M6 (violet) and M7 (pink), all shown in ball mode, in the human proteins TMTC1/2/3/4 against the background of a cartoon representation of the whole protein model. DPM is shown as blackish sticks and the divalent metal ion as a reddish sphere. We show the case of TMTC1; the figures for the other TMTCs look very similar. The existence of a strongly conserved DPM-binding site together with all elements of the active site indicates that the TMTCs are enzymatically active sugar transferases belonging to the GT-C/PMT superfamily. The DUF1736 segment, the loop between TM7 and TM8, is critical for catalysis and for binding the lipid-linked sugar moiety. Together with the available indirect experimental data, we conclude that the TMTCs are not only part of an O-mannosylation pathway in the endoplasmic reticulum of higher eukaryotes but are, in fact, the sought-after mannosyltransferases.

Mohammad Alfatah, a talented young researcher, and his team are engaged in research on cellular aging in yeast and human cells. Using a self-developed, efficient experimental platform capable of high throughput, they can study the effects of a large number of culture conditions and added compounds on cellular aging. Besides mentoring, he receives bioinformatics support from the group, enabling his team to make fast and substantial progress.


 A*STAR Senior Fellow, Senior Principal Investigator EISENHABER Frank
 Senior Principal Investigator EISENHABER Birgit
 Senior Post-Doctoral Research Fellow SINHA Swati
 Senior Post-Doctoral Research Fellow SIROTA Fernanda L.
 Senior Post-Doctoral Research Fellow ALFATAH Mohammad
 Research Manager TANTOSO Erwin
 Research Officer ZHANG Yizhong
 PhD Student URBANIAK Konstancja


Selected Publications

  • Szenker-Ravi E, Ott T, Khatoo M, de Bellaing AM, Goh WX, Chong YL, Beckers A, Kannesan D, Louvel G, Anujan P, Ravi V, Bonnard C, Moutton S, Schoen P, Fradin M, Colin E, Megarbane A, Daou L, Chehab G, Di Filippo S, Rooryck C, Deleuze JF, Boland A, Arribard N, Eker R, Tohari S, Ng AY, Rio M, Lim CT, Eisenhaber B, Eisenhaber F, Venkatesh B, Amiel J, Crollius HR, Gordon CT, Gossler A, Roy S, Attie-Bitach T, Blum M, Bouvagnet P, Reversade B. Discovery of a genetic module essential for assigning left-right asymmetry in humans and ancestral vertebrates. Nat Genet. 2022 Jan;54(1):62-72. doi: 10.1038/s41588-021-00970-4. Epub 2021 Dec 13. PMID: 34903892

  • Tromp J, Seekings PJ, Hung CL, Iversen MB, Frost MJ, Ouwerkerk W, Jiang Z, Eisenhaber F, Goh RSM, Zhao H, Huang W, Ling LH, Sim D, Cozzone P, Richards AM, Lee HK, Solomon SD, Lam CSP, Ezekowitz JA. Automated interpretation of systolic and diastolic function on the echocardiogram: a multicohort study. Lancet Digit Health. 2022 Jan;4(1):e46-e54. doi: 10.1016/S2589-7500(21)00235-1. Epub 2021 Dec 1. PMID: 34863649

  • Sirota FL, Maurer-Stroh S, Li Z, Eisenhaber F, Eisenhaber B. Functional Classification of Super-Large Families of Enzymes Based on Substrate Binding Pocket Residues for Biocatalysis and Enzyme Engineering Applications. Front Bioeng Biotechnol. 2021 Aug 2;9:701120. doi: 10.3389/fbioe.2021.701120. eCollection 2021. PMID: 34409021

  • Eisenhaber B, Sinha S, Jadalanki CK, Shitov VA, Tan QW, Sirota FL, Eisenhaber F. Conserved sequence motifs in human TMTC1, TMTC2, TMTC3, and TMTC4, new O-mannosyltransferases from the GT-C/PMT clan, are rationalized as ligand binding sites. Biol Direct. 2021 Jan 12;16(1):4. doi: 10.1186/s13062-021-00291-w. PMID: 33436046

  • Su CT, Sinha S, Eisenhaber B, Eisenhaber F. Structural modelling of the lumenal domain of human GPAA1, the metallo-peptide synthetase subunit of the transamidase complex, reveals zinc-binding mode and two flaps surrounding the active site. Biol Direct. 2020 Sep 29;15(1):14. doi: 10.1186/s13062-020-00266-3. PMID: 32993792

  • Niska-Blakie J, Gopinathan L, Low KN, Kien YL, Goh CMF, Caldez MJ, Pfeiffenberger E, Jones OS, Ong CB, Kurochkin IV, Coppola V, Tessarollo L, Choi H, Kanagasundaram Y, Eisenhaber F, Maurer-Stroh S, Kaldis P. Knockout of the non-essential gene SUGCT creates diet-linked, age-related microbiome disbalance with a diabetes-like metabolic syndrome phenotype. Cell Mol Life Sci. 2020 Sep;77(17):3423-3439. doi: 10.1007/s00018-019-03359-z. Epub 2019 Nov 13. PMID: 31722069

  • Maurer-Stroh S, Krutz NL, Kern PS, Gunalan V, Nguyen MN, Limviphuvadh V, Eisenhaber F, Gerberick GF. AllerCatPro-prediction of protein allergenicity potential from the protein sequence. Bioinformatics. 2019 Sep 1;35(17):3020-3027. doi: 10.1093/bioinformatics/btz029. PMID: 30657872

  • Sinha S, Nge CE, Leong CY, Ng V, Crasta S, Alfatah M, Goh F, Low KN, Zhang H, Arumugam P, Lezhava A, Chen SL, Kanagasundaram Y, Ng SB, Eisenhaber F, Eisenhaber B. Genomics-driven discovery of a biosynthetic gene cluster required for the synthesis of BII-Rafflesfungin from the fungus Phoma sp. F3723. BMC Genomics. 2019 May 14;20(1):374. doi: 10.1186/s12864-019-5762-6. PMID: 31088369

  • Tantoso E, Wong WC, Tay WH, Lee J, Sinha S, Eisenhaber B, Eisenhaber F. Hypocrisy Around Medical Patient Data: Issues of Access for Biomedical Research, Data Quality, Usefulness for the Purpose and Omics Data as Game Changer. Asian Bioeth Rev. 2019 Jun 1;11(2):189-207. doi: 10.1007/s41649-019-00085-3. eCollection 2019 Jun. PMID: 33717311

  • Eisenhaber B, Sinha S, Wong WC, Eisenhaber F. Function of a membrane-embedded domain evolutionarily multiplied in the GPI lipid anchor pathway proteins PIG-B, PIG-M, PIG-U, PIG-W, PIG-V, and PIG-Z. Cell Cycle. 2018;17(7):874-880. doi: 10.1080/15384101.2018.1456294. Epub 2018 May 15. PMID: 29764287

  • Ng SB, Kanagasundaram Y, Fan H, Arumugam P, Eisenhaber B, Eisenhaber F. The 160K Natural Organism Library, a unique resource for natural products research. Nat Biotechnol. 2018 Jul 6;36(7):570-573. doi: 10.1038/nbt.4187. PMID: 29979661

  • Sinha S, Eisenhaber B, Jensen LJ, Kalbuaji B, Eisenhaber F. Darkness in the Human Gene and Protein Function Space: Widely Modest or Absent Illumination by the Life Science Literature and the Trend for Fewer Protein Function Discoveries Since 2000. Proteomics. 2018;18:e1800093.

  • Sirota FL, Goh F, Low KN, Yang LK, Crasta SC, Eisenhaber B, Eisenhaber F, Kanagasundaram Y, Ng SB. Isolation and Identification of an Anthracimycin Analogue from Nocardiopsis kunsanensis, a Halophile from a Saltern, by Genomic Mining Strategy. J Genomics. 2018;6:63-73.
