Data Access
The NPM Phase I Data Access Committee (NPM DAC) has been established to oversee access to the SG10K_Pilot and SG10K_Health datasets to ensure:
- Data is used appropriately according to NPM terms and conditions, including adherence to informed consent forms and ethical approvals for the data in question.
- Data users are qualified investigators embedded within a recognised research-intensive organisation.
Interested applicants should read through the data access policies and data access forms listed below:
- SG10K_Health data access policy
- SG10K_Health data access form
- SG10K_Pilot data access policy
- SG10K_Pilot data access form
Click here to view the list of approved projects.
Datasets
The SG10K_Pilot dataset refers to EGAD00001005337, the joint variant calling of 4,180 whole-genome sequencing samples deposited in the EGA database. All data have been pseudonymised and are therefore considered de-identified, as described in the paper. Two files are available for access: 1) the genotype data, arranged by chromosome in VCF format, and 2) a metadata file containing the self-reported ethnicity.
Wu et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell. 2019
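As a minimal sketch of how an approved researcher might take a first look at one of the per-chromosome VCF files after download, the Python below lists the sample IDs and counts variant records per chromosome. It relies only on the standard VCF layout (nine fixed columns followed by one column per sample). The function names, the example records and the sample names are hypothetical and are not taken from the SG10K_Pilot release.

```python
import gzip
from collections import Counter
from typing import Iterable, List, Tuple


def summarise_vcf(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Return (sample IDs, variant-record count per chromosome) for VCF text lines."""
    samples: List[str] = []
    per_chrom: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # The nine fixed columns (CHROM..FORMAT) precede the sample columns.
            samples = fields[9:]
            continue
        per_chrom[fields[0]] += 1  # first column of a record is the chromosome
    return samples, per_chrom


def summarise_vcf_file(path: str) -> Tuple[List[str], Counter]:
    """Summarise a plain-text or gzip-compressed VCF file (path is an assumption)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarise_vcf(handle)


# Tiny made-up VCF, used only to demonstrate the function.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1\t0/1",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/0\t0/1",
])

samples, per_chrom = summarise_vcf(EXAMPLE_VCF.splitlines())
print(samples)          # ['SAMPLE_A', 'SAMPLE_B']
print(dict(per_chrom))  # {'chr1': 2, 'chr2': 1}
```

For files of this scale, a dedicated VCF library or command-line tool would normally be preferable; the plain-text approach here is only meant to make the file structure concrete.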
The SG10K_Health dataset is a collection of integrated genomic and phenotypic data from 10,000 healthy, consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data is contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.
| S/N | Dataset | Description |
| --- | --- | --- |
| 1 | SG10K_Health metadata | A metadata file containing the self-reported ethnicity, sex and other research phenotypic variables. |
| 2 | SG10K_Health VCF (r5.3) | Whole genome GATK joint variant calling of 9,770 individuals of Chinese, Indian and Malay ethnicities containing 179,418,971 variants. |
| 3 | SG10K_Health DNA methylation array | Whole-genome DNA methylation on the Illumina Infinium MethylationEPIC array (850K). |
| 4 | SG10K_Health Structural Variants (r1.4) | 73,035 structural variants derived from 5,487 SG10K_Health participants using Manta, MELT and SurVIndel (Tan et al., Nat Comms, 2024). |
Wong et al. The Singapore National Precision Medicine Strategy. Nature Genetics. 2023
Data Access Platform
- For the SG10K_Pilot dataset, approved researchers will be directed to the EGA portal to access the data.
- For the SG10K_Health dataset, approved researchers will access the data via the RAPTOR platform. Learn more about the RAPTOR platform here.