An illustration of discovering genes hiding in DNA.

SINGAPORE – Scientists from A*STAR’s Genome Institute of Singapore (GIS) have developed a new tool, named Bambu, which uses artificial intelligence to identify and characterise new genes, enabling adaptable analysis across various species and samples. By revealing which genes are expressed in a sample, and how, Bambu provides a better understanding of how cells function. It is a long-read RNA sequencing tool that can be used in both clinical and research settings to discover novel transcripts encoded in DNA and to quantify them. The tool is named after the bamboo plant, whose extremely long reeds are analogous to the long reads that Bambu uses. A study detailing the methodology and evaluation of Bambu was published in Nature Methods on 12 June 2023.

The human genome, which comprises 3.2 billion letters, also known as base pairs, is dwarfed by the lungfish genome with 43 billion, and even more so by that of the Japanese flower Paris japonica with 149 billion base pairs. Although the human genome is relatively small, it encodes genes in over 140,000 unique ways, known as transcripts, and given the complexity of the body’s organs, life stages and responses to perturbations such as diseases, it is estimated that many more are yet to be identified. This is not limited to humans: scientists have also been researching organisms such as the durian and Singapore’s national flower, and there remains a whole frontier of new genes to be discovered.

To explore the unknown parts of genomes, be it in humans, fish or flowers, A*STAR’s researchers developed Bambu, which uses long-read RNA sequencing to identify and quantify transcripts.

Bambu employs a machine-learning model to rank the likelihood of candidate transcripts representing biologically relevant products. It can identify new transcripts and quantify them with a high degree of precision and sensitivity, providing a more comprehensive understanding of an organism's genetic makeup.
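The idea of ranking candidate transcripts by a confidence score can be illustrated with a minimal Python sketch. This is not Bambu's actual model or interface: the two features, the weights and the threshold below are hypothetical values chosen only to show how a score between 0 and 1 can be used to order candidates and keep the most credible ones.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    read_count: int            # hypothetical feature: long reads supporting the candidate
    known_junction_frac: float  # hypothetical feature: share of splice junctions already annotated


# Hypothetical weights for illustration only; a real tool would learn these from data.
BIAS = -3.0
W_READS = 1.2
W_JUNCTIONS = 2.0


def score(candidate: Candidate) -> float:
    """Return a toy logistic score in (0, 1) for one candidate transcript."""
    z = (
        BIAS
        + W_READS * math.log1p(candidate.read_count)
        + W_JUNCTIONS * candidate.known_junction_frac
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates, threshold=0.5):
    """Sort candidates by descending score and keep those at or above the threshold."""
    scored = sorted(((score(c), c.name) for c in candidates), reverse=True)
    return [(name, p) for p, name in scored if p >= threshold]


# Made-up example candidates, not real data.
cands = [
    Candidate("tx_C", read_count=1, known_junction_frac=0.0),
    Candidate("tx_A", read_count=50, known_junction_frac=1.0),
    Candidate("tx_B", read_count=2, known_junction_frac=0.5),
]

for name, p in rank_candidates(cands, threshold=0.0):
    print(f"{name}: {p:.3f}")
```

Raising the threshold trades sensitivity for precision: fewer candidates are reported, but each is better supported.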

This will allow researchers to identify new players, such as genes, proteins, and other elements, in their field of research and expand their ability to study organisms that are currently under-studied. Furthermore, the discovery of new genes, especially from clinical samples, can lead to the identification of biomarkers for the early detection of diseases or of targets for therapeutics.

An early release of Bambu has been benchmarked by two independent pre-print studies [1, 2], in which it was shown to be a top performer amongst its contemporaries.

“It is fascinating to see that scientists are still discovering new genes even in genomes that have been studied for many years, such as the human or mouse genome. However, the key question is if these transcripts are relevant, or they could be artifacts. To address this, Bambu quantifies the probability that a transcript is real, making transcript and gene discovery much more reliable,” said Dr Jonathan Goke, Group Leader of the Laboratory of Computational Transcriptomics at A*STAR’s GIS and the corresponding author of the study. He went on to add: “By providing such a measure of confidence, Bambu can more reliably be applied to find new genes that play a role in human diseases such as cancer.”

Dr Andre Sim, Research Fellow at A*STAR’s GIS and co-first author of the study, remarked, “Identifying new transcript models requires numerous decisions. Bambu simplifies this process using its machine learning model, making this task more accessible to the scientific community.”

Prof Patrick Tan, Executive Director of A*STAR’s GIS, commented, “Annotating genomes is often the first step in modern genetics towards understanding an organism, and as scientists start looking to research new and exciting species, having accurate transcript discovery provided by tools such as Bambu will be essential.”

[1] Benchmarking long-read RNA-sequencing analysis tools using in silico mixtures. Xueyi Dong, Mei R. M. Du, Quentin Gouil, Luyi Tian, Pedro L. Baldoni, Gordon K. Smyth, Shanika L. Amarasinghe, Charity W. Law, Matthew E. Ritchie. bioRxiv 2022.07.22.501076; doi:
[2] Merging short and stranded long reads improves transcript assembly. Amoldeep S. Kainth, Gabriela A. Haddad, Johnathon M. Hall, Alexander J. Ruthenburg. bioRxiv 2022.12.13.520317; doi:

For media queries and clarifications, please contact:

Winnie Lim
Head, Office of Corporate Communications
Genome Institute of Singapore, A*STAR
Tel: +65 6808 8013

About A*STAR’s Genome Institute of Singapore (GIS)

The Genome Institute of Singapore (GIS) is an institute of the Agency for Science, Technology and Research (A*STAR). It has a global vision that seeks to use genomic sciences to achieve extraordinary improvements in human health and public prosperity. Established in 2000 as a centre for genomic discovery, the GIS pursues the integration of technology, genetics and biology towards academic, economic and societal impact, with a mission to "read, reveal and write DNA for a better Singapore and world".

Key research areas at the GIS include Precision Medicine & Population Genomics, Genome Informatics, Spatial & Single Cell Systems, Epigenetic & Epitranscriptomic Regulation, Genome Architecture & Design, and Sequencing Platforms. The genomics infrastructure at the GIS is also utilised to train new scientific talent, to function as a bridge for academic and industrial research, and to explore scientific questions of high impact.

For more information about GIS, please visit

About the Agency for Science, Technology and Research (A*STAR)

A*STAR is Singapore's lead public sector R&D agency. Through open innovation, we collaborate with our partners in both the public and private sectors to benefit the economy and society. As a Science and Technology Organisation, A*STAR bridges the gap between academia and industry. Our research creates economic growth and jobs for Singapore, and enhances lives by improving societal outcomes in healthcare, urban living, and sustainability. A*STAR plays a key role in nurturing scientific talent and leaders for the wider research community and industry. A*STAR’s R&D activities span biomedical sciences to physical sciences and engineering, with research entities primarily located in Biopolis and Fusionopolis. For ongoing news, visit
