Computational Biology is at the heart of genomics, where we seek to integrate and analyse large and complex datasets in order to derive a more complete systems-level understanding of biological processes and diseases. This is driven by the development and application of sophisticated computational tools and pipelines for the study of a diverse range of datasets, including targeted and whole-genome sequencing, transcriptome sequencing, chromatin state and transcription factor binding site (TFBS) profiling, metagenomics and single-cell omics. Our scientists bring to bear a range of expertise in biology, computer science, mathematics and statistics to solve problems in genomic biology and medicine, and collaborate closely with experimental groups working on genomic technologies, cancer, stem cells and development, human genetics and infectious diseases.

A unique aspect of the algorithm development efforts in the GIS is the focus on bioinformatics tools with performance and optimality guarantees. The development of novel algorithms at the GIS is defined by two key strengths:

  • Optimal algorithms with performance guarantees.
    CBDS scientists have developed provably optimal and robust algorithms for a range of problems in genomics, from ultra-fast and exact read mapping (BatMis) to quality-aware rare-variant calling (LoFreq), efficient, optimal algorithms for genome assembly (Opera, FinIS) and optimal signal processing of functional profiling datasets (DFilter/EFilter).
  • Algorithms for next-generation technologies.
    CBDS researchers have extensive experience designing methods to analyse novel datasets from cutting-edge genomic technologies and assays such as ChIA-PET, RNA structure probing, single-cell omics and optical mapping. These methods have allowed scientists at the GIS and researchers around the world to explore uncharted territories in genomic biology.

The computational tools developed at the GIS are designed for and brought to bear on a diverse array of questions in evolutionary and genomic biology, including the following central themes in the CSB programme:
    • Can we predict the impact of mutations (substitutions and indels) on protein function?
    • How do we improve diagnostics for common and rare disorders from genomic information?
    • What is the role of non-coding sequences in the human genome and how can we identify and characterise disease-causing non-coding polymorphisms?
    • What is the molecular basis of human-specific traits and how did they evolve?
    • Can patient-specific driver mutations be predicted and used to personalise cancer therapy?
    • How do viruses evolve and evade the immune system and can we effectively reconstruct their transmission patterns?
    • What is the role of microbial communities in the human body in health and disease states?
    • How does the biology of heterogeneous tissues relate to the genomes and transcriptomes of individual cells?

Featured Publications

  • Javed A, Agrawal S, Ng PC. "Phen-Gen: combining phenotype and genotype to analyze rare disorders." Nat Methods. 2014 Sep;11(9):935-7.
  • Wilm A, Aw PP, Bertrand D, Yeo GH, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, Nagarajan N. "LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets." Nucleic Acids Res. 2012 Dec;40(22):11189-201.
  • Kumar V, Muratani M, Rayan NA, Kraus P, Lufkin T, Ng HH, Prabhakar S. "Uniform, optimal signal processing of mapped deep-sequencing data." Nat Biotechnol. 2013 Jun 16.
  • Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, Orlov YL, Velkov S, Ho A, Mei PH, Chew EG, Huang PY, Welboren WJ, Han Y, Ooi HS, Ariyaratne PN, Vega VB, Luo Y, Tan PY, Choy PY, Wansa KD, Zhao B, Lim KS, Leow SC, Yow JS, Joseph R, Li H, Desai KV, Thomsen JS, Lee YK, Karuturi RK, Herve T, Bourque G, Stunnenberg HG, Ruan X, Cacheux-Rataboul V, Sung WK, Liu ET, Wei CL, Cheung E, Ruan Y. "An oestrogen-receptor-alpha-bound human chromatin interactome." Nature. 2009;462(7269):58-64.