The Computer Vision and Pattern Discovery for BioImages group uses advanced computer vision, machine learning and mathematical models to build better machines for improving health care and discovering biological knowledge. The group analyses images of tissues, histological slides, radiology images and 2D/3D live-cell assays, acquired using a wide variety of imaging devices.
Given big data with only coarse-level labels, extracting fine-level information is a demanding yet rewarding challenge in data science. We developed a novel weakly supervised clustering framework that utilizes big data and exploits coarse-level labels to reveal fine-level details within the data. Given only weak labels indicating whether an image contains metastases or not, this framework successfully segmented breast cancer metastases in lymph node sections. Our framework was based on the multiple instance learning (MIL) paradigm, which learns the mapping between bags of instances and bag labels. One component common to all MIL methods is the MIL pooling filter, which obtains bag-level representations from the extracted features of instances. We introduced distribution-based pooling filters that obtain a bag-level representation by estimating marginal feature distributions. We formally proved that distribution-based pooling filters are more expressive than their point estimate-based counterparts (like ‘max’ and ‘mean’ pooling) in terms of the amount of information captured while obtaining bag-level representations. Moreover, we empirically showed that models with distribution-based pooling filters perform as well as or better than those with point estimate-based pooling filters on distinct real-world MIL tasks. Lastly, we developed a MIL model with a distribution pooling filter that predicts tumor purity (the percentage of cancer cells within a tissue section) from digital histopathology slides. Our model successfully predicted tumor purity in eight different TCGA cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. Furthermore, our model provided tumor purity maps showing the spatial variation of tumor purity within sections, which can help in better understanding the tumor microenvironment.
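The difference between point estimate-based and distribution-based pooling can be illustrated with a minimal NumPy sketch. It is not the group's implementation: the Gaussian kernel density estimate, the function names, and the parameters `num_bins` and `sigma` are assumptions chosen for illustration, and instance features are assumed to be scaled to [0, 1].

```python
import numpy as np

def mean_pooling(features):
    """Point estimate: one number per feature (the mean over instances)."""
    return np.asarray(features, dtype=float).mean(axis=0)

def max_pooling(features):
    """Point estimate: one number per feature (the max over instances)."""
    return np.asarray(features, dtype=float).max(axis=0)

def distribution_pooling(features, num_bins=11, sigma=0.1):
    """Estimate the marginal distribution of each feature over a bag.

    features: array of shape (n_instances, n_features), values assumed in [0, 1].
    Returns an array of shape (n_features, num_bins) whose rows sum to 1.
    The Gaussian kernel and its width `sigma` are illustrative assumptions.
    """
    features = np.asarray(features, dtype=float)
    sample_points = np.linspace(0.0, 1.0, num_bins)
    # Difference between every sample point and every instance feature value:
    # shape (n_instances, n_features, num_bins).
    diff = sample_points[None, None, :] - features[:, :, None]
    kernels = np.exp(-0.5 * (diff / sigma) ** 2)
    # Average the kernels over instances, then normalise each feature's curve.
    density = kernels.mean(axis=0)
    density /= density.sum(axis=1, keepdims=True)
    return density

# Two toy bags with a single feature: same mean and max, different spread.
bag_a = np.array([[0.0], [0.5], [1.0]])
bag_b = np.array([[0.0], [0.25], [0.75], [1.0]])

print("mean:", mean_pooling(bag_a), mean_pooling(bag_b))
print("max: ", max_pooling(bag_a), max_pooling(bag_b))
print("distributions differ:",
      not np.allclose(distribution_pooling(bag_a), distribution_pooling(bag_b)))
```

The two toy bags are indistinguishable under mean and max pooling, while their estimated marginal distributions differ, which is the intuition behind the expressiveness claim above.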
Manual reading of core needle biopsy slides by pathologists is the gold standard for prostate cancer diagnosis. However, it requires the analysis of around 12 (6-18) biopsy cores, which include hundreds of glands. Identifying the few malignant glands among a large number of benign glands can be a tedious and challenging task, especially for low-grade, low-volume prostate cancer. These few malignant glands can easily be overlooked, potentially resulting in missed therapeutic opportunities. To assist pathologists, we developed a deep learning-based pipeline to detect malignant glands in core needle biopsy slides of low-grade prostate cancer (Gleason Score 3+3 and 3+4). Our pipeline accepted a whole-slide image of a prostate core needle biopsy as input, detected the glands within the slide, and finally classified each gland as benign or malignant. Our novel gland classification model processed multi-resolution patches of each gland, drawing detailed morphology from high-resolution patches (40x and 20x) and neighboring spatial information from low-resolution patches (10x and 5x). We tested the pipeline end-to-end on the classification of tissue parts in core needle biopsies. The pipeline successfully classified the tissue parts (81 parts: 50 benign and 31 malignant), achieving an AUROC of 0.997 (95% CI: 0.987 - 1.000).
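One way to picture the multi-resolution input is as a set of concentric patches around a gland: each lower magnification covers a wider field of view at the same pixel size. The sketch below is an assumption about how such patches could be produced, not the pipeline's actual code. It assumes a single-channel image scanned at 40x, and the function name, `patch_size` and the block-averaging downsampling are hypothetical choices.

```python
import numpy as np

def multi_resolution_patches(image_40x, center, patch_size=64,
                             magnifications=(40, 20, 10, 5)):
    """Extract same-sized patches centred on one point at several magnifications.

    image_40x: 2D array holding a grayscale image scanned at 40x (assumption).
    center:    (row, col) of the gland centre in 40x pixel coordinates.
    Returns a dict mapping magnification to a (patch_size, patch_size) array.
    """
    image = np.asarray(image_40x, dtype=float)
    row, col = center
    patches = {}
    for mag in magnifications:
        if 40 % mag != 0:
            raise ValueError("magnification must divide the 40x base evenly")
        factor = 40 // mag              # 1, 2, 4, 8 for 40x, 20x, 10x, 5x
        window = patch_size * factor    # field of view in 40x pixels
        half = window // 2
        # Pad with edge values so windows near the border stay in bounds.
        padded = np.pad(image, ((half, half), (half, half)), mode="edge")
        # After padding, original row r sits at padded row r + half, so the
        # window centred on (row, col) starts at padded index (row, col).
        crop = padded[row:row + window, col:col + window]
        # Downsample by averaging factor x factor blocks of pixels.
        patch = crop.reshape(patch_size, factor, patch_size, factor).mean(axis=(1, 3))
        patches[mag] = patch
    return patches

# Toy usage on a synthetic image standing in for a region of a slide.
toy_image = np.arange(512 * 512, dtype=float).reshape(512, 512)
toy_patches = multi_resolution_patches(toy_image, center=(256, 256))
for mag, patch in toy_patches.items():
    print(mag, patch.shape)
```

In this arrangement the 40x and 20x patches keep fine gland morphology, while the 10x and 5x patches trade detail for surrounding tissue context.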
According to the World Health Organization, cancer is one of the major causes of death globally and was responsible for an estimated 9.6 million deaths in 2018. This highly deadly disease starts in one cell or a small group of cells that acquire mutations in their genetic material and become abnormal. These abnormal cells then grow in an uncontrolled manner, which is called cancer, and come together to form tumors. Because cancer is a reiterative evolutionary process and abnormal cells are susceptible to further mutations during their lifetime, tumors are composed of groups of abnormal cells with different genetic material (and hence different biological capabilities). This is called intra-tumor heterogeneity. It results in therapeutic failure and drug resistance, and is therefore one of the key difficulties in cancer treatment.
We are developing deep learning models that analyze histopathology images to predict intra-tumor heterogeneity and reveal the histological features behind it. We aim to support medical professionals in the diagnosis, treatment planning, medication management and precision medicine of cancer, in order to better address increased healthcare demands in the future.
Antimicrobial resistance (AMR, the resistance of microbes to existing drugs) is an ongoing threat to human health, with estimates showing a potential cumulative loss of US$100 trillion and between 700,000 and 10 million deaths per year by 2050. Our previous studies have shown that Synthetic Cationic Amphiphilic Polymers (SCAPs, polycarbonates/peptides) possess strong antimicrobial activity with minimal resistance build-up. Despite that, there is still a need to discover SCAPs with high selectivity toward bacteria over mammalian cells. Artificial intelligence (AI) techniques have been employed to accelerate the development of new, highly selective antimicrobial polymers. However, current AI models for identifying antimicrobial, non-hemolytic peptides rely on simplistic representations that do not exploit the peptides' inherent graph structure or membrane interaction information. We aim to design, implement and evaluate an AI model based on a temporal graph neural network (to reflect the motions of SCAPs) for predicting the non-hemolytic and antimicrobial activity of SCAP peptides.
Coronary artery disease (CAD), a blockage of the blood vessels supplying the heart, affects 6% of the general population and up to 20% of those over 65 years of age. CAD is a leading cause of cardiac mortality in Singapore and worldwide, with 19% of deaths in Singapore due to CAD (MOH website). The number of CAD cases is growing rapidly due to ageing and a higher prevalence of diabetes. Computed Tomography Coronary Angiography (CTCA) is the first-line investigation for CAD, as indicated by the updated National Institute for Clinical Excellence (NICE) guidelines. The recent Prospective Multicenter Imaging Study for Evaluation of Chest Pain (PROMISE) and Scottish Computed Tomography of the Heart (SCOT-HEART) trials support CTCA as the dominant means for evaluating coronary anatomy and physiology, because CTCA increases diagnostic certainty, improves the efficiency of triage to invasive catheterization and reduces radiation exposure when compared with functional stress testing. Current practice of CAD report generation requires 3-6 hours of a CT specialist’s time for annotating the scans, with an inter-observer variability of 20%. In addition, there is no effective toolkit to analyse Agatston scores (a measure of calcified CAD), severity of stenosis, and plaque characterisation. These problems have severely constrained the effectiveness of CTCA as a diagnostic and research tool. We plan to build upon Singapore’s competitive advantages in artificial intelligence (AI) to provide a solution to these gaps. Our overall aim is to build an AI-driven CT Coronary Angiography platform for automated anonymization, reporting, Agatston scoring and plaque quantification in CAD. It is intended as a “one-stop” platform spanning diagnosis, clinical management and prognosis, and it could also aid in predicting therapy response for the pharmaceutical industry.
Coronary angiography is the gold standard imaging technique for visualizing the coronary arteries, which aids in diagnosing coronary artery disease and guiding patient management. Iodine-based contrast is injected into the coronary arteries, and multiple moving X-ray images are acquired from different view angles around the patient's torso. Cardiologists are trained to interpret the coronary angiogram, but this takes time and there may be interobserver disagreement. In a new collaboration with the National Heart Centre Singapore, we are exploring artificial intelligence approaches to analyzing X-ray video sequences, with the goal of developing a quantitative assessment tool for repeatable and objective angiographic measurements.
SCISSOR is a program studying convergence and disruption in the molecular diagnostics market. We collaborate with Shyam Prabhakar's lab at the Genome Institute of Singapore (GIS), which has mastered the spatial omics technique MERFISH (multiplexed error-robust fluorescence in situ hybridization). Leveraging MERFISH, we can profile thousands of DNA and RNA targets while retaining the spatial context of the tissue. By applying a cell segmentation method and assigning RNA molecules to single cells, we can obtain the expression profile and location of each cell. Using this spatial information, a new method, BANKSY, was proposed to discover tissue structures in spatial omics data by augmenting the transcriptomic profile of each cell with an average of the transcriptomes of its spatial neighbors. Apart from BANKSY, there are three other work packages, exploring sarcoma diagnostics, immunotherapy and liquid biopsy using MERFISH.
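The neighbor-augmentation idea described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the BANKSY implementation: it uses an unweighted mean over the `k` nearest cells, and the function name and the mixing parameter `lam` are hypothetical.

```python
import numpy as np

def augment_with_neighbors(expression, coords, k=3, lam=0.2):
    """Append the mean expression of each cell's spatial neighbors to its own profile.

    expression: (n_cells, n_genes) expression matrix.
    coords:     (n_cells, 2) spatial coordinates of the cells.
    k:          number of nearest neighbors to average (assumption).
    lam:        weight in [0, 1] given to the neighborhood part (assumption).
    Returns an (n_cells, 2 * n_genes) matrix: own profile, then neighbor mean.
    """
    expression = np.asarray(expression, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all cells.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # A cell is not its own neighbor.
    np.fill_diagonal(dists, np.inf)
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]
    # Average the expression of the k nearest cells: (n_cells, n_genes).
    neighbor_mean = expression[neighbor_idx].mean(axis=1)
    return np.hstack([np.sqrt(1.0 - lam) * expression,
                      np.sqrt(lam) * neighbor_mean])

# Toy usage: four cells along a line, two genes each.
toy_expression = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
toy_coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
augmented = augment_with_neighbors(toy_expression, toy_coords, k=1, lam=0.5)
print(augmented.shape)
```

Clustering the augmented matrix rather than the raw expression matrix lets cells with similar neighborhoods group together, which is how tissue structures can emerge. The pairwise distance matrix used here scales quadratically with cell count, so a spatial index would be needed for realistic data sizes.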
Lee Hwee Kuan is a Senior Principal Investigator in the Imaging Informatics division of the Bioinformatics Institute. His current research involves developing computer vision algorithms for clinical and biological studies. Hwee Kuan obtained his Ph.D. in Theoretical Physics from Carnegie Mellon University in 2001, with a thesis on liquid-liquid phase transitions and quasicrystals. He then held a joint postdoctoral position with Oak Ridge National Laboratory (USA) and the University of Georgia, where he worked on developing advanced Monte Carlo methods and on nano-magnetism. In 2003, with an award from the Japan Society for the Promotion of Science, Hwee Kuan moved to Tokyo Metropolitan University, where he developed solutions to problems with extremely long time scales and a reweighting method for nonequilibrium systems. In 2005 he returned home to join the Data Storage Institute, investigating novel recording methods such as hard disk recording via magnetic resonance. In 2006, he joined the Bioinformatics Institute as a Principal Investigator in the Imaging Informatics Division.
Lee Hwee Kuan's current research focuses on the analysis of tissue, histological and cellular images. These images are obtained from light microscopy and include image data sets from high-throughput screens.