To 100,000 and beyond: scaling the Singapore genetic databank with analytics and AI technologies

As Singapore’s National Precision Medicine (NPM) programme enters Phase 2, the rich control dataset of Asian populations aims to revolutionise the way healthcare is delivered.

A pregnant mother wanting to test for Down’s Syndrome in her unborn baby without invasive testing. A doctor trying to make a call on the optimal drug and dosage for a safer and more effective treatment.

These are some of the people that the Singapore National Precision Medicine (SG-NPM) programme aims to help. Established in 2017, this 10-year effort seeks to enable a healthcare strategy tailored to Singapore’s population diversity through precision medicine, a move that can revolutionise how healthcare is delivered.

Precision medicine takes individual variations in genetic, environmental and lifestyle factors into account, allowing doctors to predict more accurately which treatment and prevention strategies will work in different groups of people. With tools now available to analyse data on a large scale and DNA sequencing becoming more affordable, precision medicine can improve healthcare by giving doctors a more detailed understanding of each patient.

Central to the effort is the Centre for Big data and Integrative Genomics (c-BIG), a collaboration between four A*STAR research institutes: the Genome Institute of Singapore (GIS), the Bioinformatics Institute (BII), the Institute of High Performance Computing (IHPC) and the Institute for Infocomm Research (I2R). These efforts are coordinated under A*STAR’s Artificial Intelligence, Analytics & Informatics (AI3), which catalyses the development and application of A*STAR’s broad range of data science and AI capabilities and technologies across a wide range of industry sectors.

“The first step was to build an IT infrastructure to securely store, analyse and share genomics data at scale in order to produce and distribute a reference catalog that captures the genetic variation of 10,000 healthy Singaporeans,” said Dr Shyam Prabhakar, Associate Director, Spatial and Single Cell Systems at A*STAR’s GIS.

This first phase of the NPM programme has been completed, with researchers creating the world’s largest genetic databank of Asian populations, in which three groups are represented: Chinese, Indian and Malay. The time is now ripe for Phase 2, which will scale up the database.

“The next step is to extend the generation of genetic and phenotypic diversity data to 100,000 healthy Singaporeans in NPM Phase 2, drawing on the capabilities of A*STAR and our ecosystem partners,” said Prof Patrick Tan, Executive Director of GIS, and Executive Director of PRECISE (Precision Health Research Singapore).

“The richness of the data provided by the database, combined with our knowledge of Asian genetics accumulated over the years, means that the clinical applications of genomics are vast.”

Two-dimensional projection of the genetic relationship matrix derived from whole-genome sequencing, capturing genetic variation across a compendium of ~10,000 NPM Phase 1 Singaporean (SG10K) and ~3,000 International Genome Sample Resource (1KG) participants. Points (individuals) are coloured according to their geographic origin.

This genetic databank can be analysed to reveal patterns, trends and associations, and in particular to identify millions of novel Asian-specific genetic variants. Understanding the actual genetic makeup of Asian populations allows products and medicines to be tailored for this specific market.

For example, genomics is at the core of diagnostic tests such as non-invasive prenatal testing (NIPT), which is used in pregnancy to identify children who may be born with debilitating or fatal genetic defects. Similarly, knowing the genetic variants that an individual carries can be used to estimate their likelihood of suffering from diseases such as diabetes or schizophrenia. Genomics can also guide targeted treatments, such as administering the right drug in the right dose. This is the domain of pharmacogenomics (PGx), the study of how genes influence responses to drugs.
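To make the idea of estimating disease likelihood from genetic variants more concrete, the sketch below shows one widely used approach, a polygenic score, in which each risk variant a person carries contributes a small weighted amount to an overall score. The article does not say which method the programme uses, so this is only an illustration: the variant names, the weights and the `polygenic_score` function are all hypothetical, and treating unmeasured variants as zero copies is a simplifying assumption.

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Return the weighted sum of risk-allele counts over the scored variants.

    dosages: number of risk-allele copies (0, 1 or 2) per variant for one person.
    weights: per-variant effect weights (hypothetical values in this sketch).
    """
    score = 0.0
    for variant, weight in weights.items():
        # Simplifying assumption: a variant that was not measured counts as 0 copies.
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2, got {dosage}")
        score += weight * dosage
    return score


# Hypothetical variants and weights, for illustration only.
weights = {"var_A": 0.5, "var_B": -0.25, "var_C": 1.0}
person = {"var_A": 2, "var_B": 1, "var_C": 0}

score = polygenic_score(person, weights)
print(score)  # 0.75
```

A raw score like this only becomes meaningful when compared against the distribution of scores in a reference population, which is one reason a large population-matched databank matters.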

Custom-Built Tech

The c-BIG initiative has contributed to delivering that vision through a variety of technologies and ecosystems. Leveraging the data storage and computing power of the National Supercomputing Centre, the team was able to deploy state-of-the-art genome analytics algorithms at an industrial scale to uncover the genetic variants of each individual. A custom-built, secure, cloud-based big-data infrastructure has also been developed to give Singapore’s biomedical research community controlled access to the data and to analysis capabilities, both programmatically and through a web-based graphical interface. As the programme grows in the next phase, c-BIG will continue to scale by building on next-level data management, analytics and artificial intelligence (AI).

“The custom data sharing services built by c-BIG will enable secure mining of the resource, and thus pave the way for the discovery of new research insights and actionable clinical findings,” said Dr Nicolas Bertin, Chief Architect of c-BIG’s NPM infrastructure.

Team members of the Singapore National Precision Medicine (SG-NPM) programme’s Phase 1 project (Photo was taken in 2019, before the current safe management measures were implemented)

As the team looks to tackle the new scalability challenges posed in NPM Phase 2, researchers are already working to source new types of data to enable richer integrative analyses, including methylation and single-cell expression signals.

The addition of new data types and the scaling up of the databank will empower researchers and medical professionals to better understand inherited diseases in Asian populations. This would pave the way for new treatments and new ways to predict and diagnose diseases, and enable more effective and efficient healthcare services for Singapore and for Asian populations more broadly.

For more information on NPM Phase 2, and how one can play a part in a meaningful landmark study to help future generations of Singaporeans