Computer Vision and Pattern Discovery



The Computer Vision and Pattern Discovery for BioImages group uses advanced computer vision, machine learning and mathematical models to build better machines for improving health care and discovering biological knowledge. The group analyses images of tissues, histological slides, radiology images and 2D/3D live-cell assays, acquired using a wide variety of imaging devices.

Novel Multiple Instance Learning Models Exploiting Coarse-level Labels For Fine-level Information

Given large datasets with only coarse-level labels, extracting fine-level information is a demanding yet rewarding challenge in data science. We developed a novel weakly supervised clustering framework that exploits coarse-level labels to reveal fine-level details within the data. Given only weak labels stating whether an image contains metastases or not, this framework successfully segmented breast cancer metastases in lymph node sections. Our framework is based on the multiple instance learning (MIL) paradigm, which learns the mapping between bags of instances and bag labels. One component common to all MIL methods is the MIL pooling filter, which obtains a bag-level representation from the extracted features of the instances. We introduced distribution-based pooling filters that obtain a bag-level representation by estimating marginal feature distributions. We formally proved that distribution-based pooling filters are more expressive than their point estimate-based counterparts (like ‘max’ and ‘mean’ pooling) in terms of the amount of information captured while obtaining bag-level representations. Moreover, we empirically showed that models with distribution-based pooling filters perform as well as or better than those with point estimate-based pooling filters on distinct real-world MIL tasks. Lastly, we developed an MIL model with a distribution pooling filter that predicts tumor purity (the percentage of cancer cells within a tissue section) from digital histopathology slides. Our model successfully predicted tumor purity in eight different TCGA cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. Furthermore, our model provided tumor purity maps showing the spatial variation of tumor purity within sections, which can help in better understanding the tumor microenvironment.
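As a rough illustration of the difference between distribution-based and point estimate-based pooling described above, the following Python sketch compares ‘mean’ pooling with a simple kernel density pooling on two toy bags. The Gaussian kernel, the bin count `num_bins`, the bandwidth `sigma` and the assumption that features lie in [0, 1] are illustrative choices, not details taken from the group's models.

```python
import math


def mean_pooling(bag):
    """Point-estimate pooling: one mean value per feature."""
    n = len(bag)
    return [sum(inst[j] for inst in bag) / n for j in range(len(bag[0]))]


def distribution_pooling(bag, num_bins=5, sigma=0.1):
    """Distribution pooling sketch: per feature, estimate the marginal
    distribution of instance values on a fixed grid of bins in [0, 1]
    using a Gaussian kernel, then normalize so the bins sum to 1."""
    bin_centers = [b / (num_bins - 1) for b in range(num_bins)]
    pooled = []
    for j in range(len(bag[0])):
        density = []
        for center in bin_centers:
            total = sum(
                math.exp(-((inst[j] - center) ** 2) / (2 * sigma ** 2))
                for inst in bag
            )
            density.append(total / len(bag))
        norm = sum(density)
        pooled.append([d / norm for d in density])
    return pooled


# Two toy bags, each with two instances and one feature (assumed values).
bag_a = [[0.5], [0.5]]
bag_b = [[0.0], [1.0]]

print(mean_pooling(bag_a), mean_pooling(bag_b))
print(distribution_pooling(bag_a))
print(distribution_pooling(bag_b))
```

Both bags have the same mean, so mean pooling cannot tell them apart, whereas the estimated marginal distributions differ: one is peaked at the centre bin and the other at the two outer bins.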

Malignant Gland Detection in Prostate Core Needle Biopsies

Manual reading of core needle biopsy slides by pathologists is the gold standard for prostate cancer diagnosis. However, it requires the analysis of around 12 (6-18) biopsy cores containing hundreds of glands. Identifying the few malignant glands among a large number of benign glands can be tedious and challenging, especially for low-grade, low-volume prostate cancer. These few malignant glands can easily be overlooked, potentially resulting in missed therapeutic opportunities. To assist pathologists, we developed a deep learning-based pipeline to detect malignant glands in core needle biopsy slides of low-grade prostate cancer (Gleason Score 3+3 and 3+4). Our pipeline accepts a whole-slide image of a prostate core needle biopsy as input, detects the glands within the slide, and finally classifies each gland as benign or malignant. Our novel gland classification model processes multi-resolution patches of each gland, drawing on detailed morphology from high-resolution patches (40x and 20x) and on neighboring spatial information from low-resolution patches (10x and 5x). We tested the pipeline end-to-end on the classification of tissue parts in core needle biopsies. The pipeline successfully classified the tissue parts (81 parts: 50 benign and 31 malignant), achieving an AUROC of 0.997 (95% CI: 0.987-1.000).
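To make the multi-resolution idea above concrete, the sketch below computes crop boxes around a gland centre so that patches with the same pixel size cover progressively larger fields of view at lower magnifications. The function name `multi_resolution_boxes`, the 512-pixel patch size, the 40x base level and the nested, centred sampling scheme are all assumptions for illustration; the source only states that 40x, 20x, 10x and 5x patches are used.

```python
def multi_resolution_boxes(center_x, center_y, patch_size=512,
                           base_mag=40, mags=(40, 20, 10, 5)):
    """Return {magnification: (left, top, right, bottom)} in base-level
    (40x) pixel coordinates. Each box, once downsampled by base_mag / mag,
    yields a patch of patch_size x patch_size pixels."""
    boxes = {}
    for mag in mags:
        scale = base_mag // mag        # downsampling factor vs. base level
        half = (patch_size * scale) // 2
        boxes[mag] = (center_x - half, center_y - half,
                      center_x + half, center_y + half)
    return boxes


# Hypothetical gland centre in 40x pixel coordinates.
boxes = multi_resolution_boxes(5000, 5000)
for mag, box in boxes.items():
    print(f"{mag}x: {box}")
```

The 40x box covers only the gland itself in fine detail, while the 5x box spans an area eight times wider in each direction and so captures the surrounding tissue. Boxes that extend past the slide boundary would need padding or clipping, which this sketch omits.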

Intra-tumor Heterogeneity Through the Lens of Image Analysis

According to the World Health Organization, cancer is one of the leading causes of death globally and was responsible for an estimated 9.6 million deaths in 2018. The disease starts in one cell or a small group of cells that acquire mutations in their genetic material and become abnormal. These abnormal cells then grow in an uncontrolled manner, which is what is called cancer, and come together to form tumors. Because cancer is a reiterative evolutionary process and abnormal cells remain susceptible to further mutations during their lifetime, tumors are composed of groups of abnormal cells with different genetic material and therefore different biological capabilities. This is called intra-tumor heterogeneity. It results in therapeutic failure and drug resistance, and is therefore one of the key difficulties in cancer treatment.

We are developing deep learning models to predict intra-tumor heterogeneity and to reveal the histological features behind it by analyzing histopathology images. We aim to support medical professionals in diagnosis, treatment planning, medication management and precision medicine for cancer, in order to better address increasing healthcare demands in the future.

Machine learning guided discovery of highly selective antimicrobial peptides for treatment of drug resistant bacteria

Antimicrobial resistance (AMR, the resistance of microbes to existing drugs) is an ongoing threat to human health, with estimates showing a potential cumulative loss of US$100 trillion and 700,000 to 10 million deaths per year by 2050 [1]. Our previous studies have shown that Synthetic Cationic Amphiphilic Polymers (SCAPs, polycarbonates/peptides) possess strong antimicrobial activity with minimal resistance build-up [2]. Despite that, there is still a need to discover SCAPs with high selectivity toward bacteria over mammalian cells. Artificial intelligence (AI) techniques have been employed to accelerate the development of new, highly selective antimicrobial polymers. However, current AI methods for identifying antimicrobial, non-hemolytic peptides rely on simplistic representations that do not exploit the peptides' inherent graph structure or membrane interaction information. We aim to design, implement and evaluate an AI model based on a temporal graph neural network (to reflect the motions of SCAPs) for predicting the non-hemolytic and antimicrobial activity of SCAP peptides.

  • [1] O’Neill, J. "AMR Review Paper-Tackling a crisis for the health and wealth of nations." AMR Review Paper (2014).
  • [2] Chin, Willy, et al. "A macromolecular approach to eradicate multidrug resistant bacterial infections while mitigating drug resistance onset." Nature communications 9.1 (2018): 1-14.

AI driven national Platform for CT cOronary angiography for clinicaL and industriaL applicatiOns (APOLLO)

Coronary artery disease (CAD), a narrowing or blockage of the blood vessels supplying the heart, affects 6% of the general population and up to 20% of those over 65 years of age. CAD is a leading cause of cardiac mortality in Singapore and worldwide, with 19% of deaths in Singapore due to CAD (MOH website). The number of CAD cases is growing rapidly due to ageing and a higher prevalence of diabetes. Computed Tomography Coronary Angiography (CTCA) is the first-line investigation for CAD, as indicated by the updated National Institute for Health and Care Excellence (NICE) guidelines. The recent Prospective Multicenter Imaging Study for Evaluation of Chest Pain (PROMISE) and Scottish Computed Tomography of the Heart (SCOT-HEART) trials support CTCA as the dominant means of evaluating coronary anatomy and physiology, because CTCA increases diagnostic certainty, improves the efficiency of triage to invasive catheterization and reduces radiation exposure when compared with functional stress testing. In current practice, generating a CAD report requires 3-6 hours of a CT specialist’s time for annotating the scans and is subject to inter-observer variability of 20%. In addition, there is no effective toolkit for analysing Agatston scores (a measure of calcified CAD), severity of stenosis, and plaque characterisation. These problems have severely constrained the effectiveness of CTCA as a diagnostic and research tool. We plan to build upon Singapore’s competitive advantages in artificial intelligence (AI) to address these gaps. Our overall aim is to build an AI-driven CT Coronary Angiography platform for automated anonymization, reporting, Agatston scoring and plaque quantification in CAD. It is intended as a “one-stop” platform spanning diagnosis, clinical management and prognosis, and as an aid in predicting therapy response for the pharmaceutical industry.
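The Agatston score mentioned above is conventionally computed per calcified lesion as its area multiplied by a density weight derived from the lesion's peak attenuation. The sketch below shows that standard weighting in Python, assuming lesions have already been segmented and summarized as (area in mm², peak Hounsfield units) pairs. The function names, the input format and the 1 mm² minimum-area cutoff are illustrative assumptions and do not describe the APOLLO platform's implementation.

```python
def agatston_weight(peak_hu):
    """Density weight from the peak attenuation of a lesion (in HU)."""
    if peak_hu < 130:
        return 0   # below the usual calcium threshold
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions, min_area_mm2=1.0):
    """Sum of area x weight over lesions; lesions smaller than
    min_area_mm2 are ignored as likely noise (assumed cutoff)."""
    return sum(
        area * agatston_weight(peak_hu)
        for area, peak_hu in lesions
        if area >= min_area_mm2
    )


# Hypothetical lesions: (area in mm^2, peak HU).
lesions = [(4.0, 250), (2.5, 450), (0.5, 500), (3.0, 120)]
print(agatston_score(lesions))
```

In this toy input the third lesion is dropped for being too small and the fourth falls below the 130 HU threshold, so only the first two contribute to the total.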

Assessing Coronary Artery Disease from Angiography Video Sequences

Coronary angiography is the gold standard imaging technique for visualizing the coronary arteries, which aids in diagnosing coronary artery disease and guiding patient management. Iodine-based contrast is injected into the coronary arteries, and multiple moving X-ray images are acquired from different view angles around the patient's torso. Cardiologists are trained to interpret the coronary angiogram, but this takes time and there may be inter-observer disagreement. In a new collaboration with the National Heart Centre Singapore, we are exploring artificial intelligence approaches to analyzing X-ray video sequences, with the goal of developing a quantitative assessment tool for repeatable and objective angiographic measurements.

Single-Cell In Situ Spatial Omics at Subcellular Resolution (SCISSOR)

SCISSOR is a program aimed at convergence and disruption in the molecular diagnostics market. We collaborate with the Shyam Prabhakar Lab at the Genome Institute of Singapore (GIS), which has mastered the spatial omics technique MERFISH (multiplexed error-robust fluorescence in situ hybridization). Leveraging MERFISH, we can profile thousands of DNA and RNA targets while retaining the spatial context of the tissue. By applying a cell segmentation method and assigning RNA molecules to individual cells, we obtain the expression profile and location of each cell. Using this spatial information, a new method, BANKSY, was proposed to discover tissue structures in spatial omics data by augmenting the transcriptomic profile of each cell with an average of the transcriptomes of its spatial neighbors. Apart from BANKSY, there are three other work packages, exploring sarcoma diagnostics, immunotherapy and liquid biopsy using MERFISH.
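The neighbor-augmentation idea behind BANKSY, as described above, can be sketched in a few lines of Python. This is a simplified illustration and not the published BANKSY implementation: the function name, the use of k nearest neighbors with an unweighted mean, and the `neighbor_weight` parameter are assumptions made for this example, and at least two cells are assumed.

```python
import math


def neighbor_augmented_profiles(coords, expr, k=2, neighbor_weight=1.0):
    """For each cell, concatenate its own expression vector with the
    (optionally scaled) mean expression of its k nearest spatial neighbors."""
    augmented = []
    for i, (xi, yi) in enumerate(coords):
        # Distances from cell i to every other cell, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        neighbor_mean = [
            sum(expr[j][g] for j in neighbors) / len(neighbors)
            for g in range(len(expr[i]))
        ]
        augmented.append(
            list(expr[i]) + [neighbor_weight * v for v in neighbor_mean]
        )
    return augmented


# Three hypothetical cells with 2D positions and two-gene expression vectors.
coords = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
expr = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]]
for row in neighbor_augmented_profiles(coords, expr, k=1):
    print(row)
```

Clustering on the augmented vectors rather than on the raw expression alone lets cells with similar transcriptomes but different surroundings be separated, which is what allows tissue structures to emerge.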

Related Publications

  • Singhal, Chou et al. (2022), bioRxiv 2022.04.14.488259


 Deputy Director (Training and Talent), Senior Principal Investigator LEE Hwee Kuan
 Scientist CHENG Zi Yi, Nicholas 
 Senior Scientist II LIU Wei
 Senior Scientist I SINGH Malay
 Scientist MENG Zhenyu
 Scientist TAN Wei Ping Eddy
 Collaborator PARK Sojeong
 Senior Research Officer LIN Li
 Research Officer COPPOLA Davide
 Research Officer ZHANG Tianyi
 PhD Student CHEN Brian
 PhD Student REN Yu Jerome
