Data Access

The NPM Phase I Data Access Committee (NPM DAC) has been established to oversee access to the SG10K_Pilot and SG10K_Health datasets to ensure:

  • Data is used appropriately according to NPM terms and conditions, including adherence to informed consent forms and ethical approvals for the data in question.
  • Data users are qualified investigators embedded within a recognised research-intensive organisation.

Interested applicants should read through the data access policies and data access forms listed below:

Click here to view the list of approved projects: 

Datasets

The SG10K_Pilot dataset (EGA accession EGAD00001005337) is the joint variant calling of 4,180 whole-genome sequences deposited in the EGA database. All data have been pseudonymised and are therefore considered de-identified, as described in the paper cited below. Two files are available for access: 1) the genotype data, arranged by chromosome, in VCF format, and 2) a metadata file containing the self-reported ethnicity.
Wu et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell. 2019

The SG10K_Health dataset is a collection of integrated genomic and phenotypic data from 10,000 healthy, consented individuals of Chinese, Malay and Indian ethnicities. The SG10K_Health data are contributed by six cohorts in Singapore: (1) Multi-Ethnic Cohort (MEC) study, (2) Health for Life in Singapore (HELIOS) study, (3) Growing Up in Singapore Towards healthy Outcomes (GUSTO) study, (4) TTSH Personalised Medicine Normal Controls (TTSH) study, (5) Singapore Epidemiology of Eye Diseases (SEED) study and (6) Biobank/SingHEART, SingHealth Duke-NUS Institute of Precision Medicine (PRISM) study.

S/N | Dataset | Description
1 | SG10K_Health metadata | A metadata file containing the self-reported ethnicity, sex and other research phenotypic variables.
2 | SG10K_Health VCF (r5.3) | Whole genome GATK joint variant calling of 9,770 individuals of Chinese, Indian and Malay ethnicities containing 179,418,971 variants.
3 | SG10K_Health DNA methylation array | Whole genome DNA methylation on Illumina Infinium Methylation EPIC array (850K)
4 | SG10K_Health Structural Variants (r1.4) | 73,035 structural variants derived from 5,487 SG10K_Health participants using Manta, MELT and SurVindel (Tan et al, Nat Comms, 2024)

Wong et al. The Singapore National Precision Medicine Strategy. Nature Genetics. 2023

Data Access Platform 

  • For the SG10K_Pilot dataset, approved researchers will be directed to the EGA portal to access the data.
  • For the SG10K_Health dataset, approved researchers will access the data via the RAPTOR platform. Learn more about the RAPTOR platform here.