COMPUTATIONAL TOOLS AND RESOURCES
BatAlign is an algorithm that aligns next-generation sequencing reads rapidly and accurately, taking into account both mismatches and indels.
DNA methylation plays a crucial role in higher organisms. Coupling bisulfite treatment with next-generation sequencing enables the interrogation of 5-methylcytosine sites in the genome. However, bisulfite conversion introduces mismatches between the reads and the reference genome, which makes mapping of Illumina and SOLiD reads slow and inaccurate. BatMeth is an algorithm that integrates novel mismatch counting, list filtering, mismatch stage filtering and fast mapping onto two indexes to improve unique mapping rate, speed and precision. Experimental results show that BatMeth is faster and more accurate than existing tools.
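To illustrate why bisulfite conversion complicates mapping, here is a minimal Python sketch. It is not BatMeth's algorithm: it shows a generic strategy used by bisulfite aligners, in which both read and reference are converted C-to-T in silico before comparison. The sequences and the function names `bisulfite_convert` and `count_mismatches` are hypothetical, chosen only for this example.

```python
def bisulfite_convert(seq: str) -> str:
    """Simulate full bisulfite conversion on the forward strand: every C reads as T."""
    return seq.upper().replace("C", "T")


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical reference fragment and a bisulfite-treated read from the same locus.
# In the read, the methylated C at index 1 is preserved; the unmethylated Cs at
# indices 4, 5 and 9 have been converted to T.
reference = "ACGTCCGATCGA"
read = "ACGTTTGATTGA"

# Direct comparison: every converted C shows up as a mismatch.
raw_mismatches = count_mismatches(reference, read)

# Comparing fully converted versions removes the conversion-induced mismatches.
converted_mismatches = count_mismatches(
    bisulfite_convert(reference), bisulfite_convert(read)
)

print(raw_mismatches, converted_mismatches)
```

Converting both sequences makes mapping independent of methylation state, at the cost of a lower-complexity three-letter alphabet. The methylation call at each cytosine is then recovered by comparing the original read to the original reference.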
CENTDIST is a novel web application for identifying co-localized transcription factors around ChIP-seq peaks. Unlike traditional motif-scanning programs, CENTDIST does not require user-specified parameters or a background. It automatically learns the best set of parameters for different motifs and ranks them based on the skewness of their distribution around ChIP-seq peaks.
Inference of Spatial Organizations of Chromosomes Using Semi-definite Embedding Approach and Hi-C Data.
CWig is a format and toolkit for storing and analysing genome-wide density signal data. CWig files take up little space and support fast access operations. The format was developed as an alternative to the bigWig format from UCSC. The project aims to provide flexible and convenient tools to support visualization and analysis.
Dfilter is a generalized signal detection tool for analyzing next-generation massively parallel sequencing data using a linear filter that maximizes ROC-AUC.
It is therefore well suited to detecting peaks in the tag profiles of ChIP-seq, DNase-seq, FAIRE-seq, ATAC-seq, MNase-seq, RIP-seq, CLIP-seq, ChIP-exo, Sono-seq and similar assays.
EDDA
Experimental Design in Differential Abundance analysis (EDDA) is a tool for systematically assessing how experimental design and the choice of statistical test affect the ability to detect differential abundance. EDDA can be used on data from a range of experiments, including RNA-seq, ChIP-seq, Nanostring assays and metagenomic sequencing. It is currently available as a web service and as an R Bioconductor package.
The G-SCI Test is a statistical test to detect chromatin QTLs (SNPs that are correlated with chromatin state variation in a population). The test can be applied to any collection of sequencing-based chromatin profiles. For example, it can be applied to ChIP-seq, ATAC-seq, DNase-seq and FAIRE-seq data. To enhance statistical power, the G-SCI test includes both peak height variation and allelic imbalance in the likelihood function.
LoFreq is a fast, sensitive and robust variant caller for inferring SNVs and indels, including their somatic counterparts, from high-throughput sequencing data. It incorporates several sources of error inherent to next-generation sequencing (e.g. base-call, mapping and alignment errors) into its model. LoFreq is largely parameter-free and avoids heuristic filters, which allows for robust and sensitive calls in data from a variety of sequencing methodologies (targeted resequencing, exome, WGS, metagenomics etc.) or platforms (e.g. Illumina, Ion Torrent).
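As a rough illustration of how per-base error estimates can enter a variant-calling model, the Python sketch below treats each read base at a site as an independent error trial with probability derived from its Phred quality. It then asks how likely it is to see at least k non-reference bases from errors alone. This is a simplified, assumed model for exposition, not LoFreq's implementation. It considers base-call quality only, and the function names and example numbers are hypothetical.

```python
from typing import List, Sequence


def phred_to_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def poisson_binomial_tail(probs: Sequence[float], k: int) -> float:
    """Return P(X >= k), where X is a sum of independent Bernoulli(p_i) trials."""
    # dist[j] holds the probability of exactly j errors among the trials seen so far.
    dist: List[float] = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)   # this base was called correctly
            nxt[j + 1] += mass * p     # this base is a sequencing error
        dist = nxt
    return sum(dist[k:])


# Hypothetical pileup column: 100 reads, all with base quality 30,
# of which 5 show a non-reference base.
error_probs = [phred_to_prob(30)] * 100
p_value = poisson_binomial_tail(error_probs, 5)
print(p_value)
```

A small tail probability means sequencing error alone is an unlikely explanation for the observed non-reference bases. The dynamic-programming step runs in time quadratic in coverage, so a practical tool would need pruning or approximation at high depth.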
Phen-Gen combines patients' disease symptoms and sequencing data with prior domain knowledge to identify the causative genes for rare disorders. It is the first algorithm to integrate disease symptoms for genome-wide predictions, generating results in 15-30 minutes. Our results show that Phen-Gen outperforms existing methods by 13-58%. Phen-Gen aims to help patients with undiagnosed conditions. The software is available online and as a standalone program.
RCA (Reference Component Analysis) is a computational approach for robust cell type annotation of single-cell RNA sequencing (scRNA-seq) data. It provides a user-friendly framework incorporating multiple commonly used downstream analysis modules and can be easily applied to both human and mouse data.
Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm
SIFT predicts whether an amino acid substitution affects protein function. It is widely used in bioinformatics, genetics, disease, and mutation studies. SIFT 4G is a faster version of SIFT that enables us to scale up and provide SIFT predictions for more organisms. It annotates and provides damaging/tolerated predictions for single nucleotide variants. For indels, only annotation is provided.
TACO, or Transcription factor Association from Complex Overrepresentation, is a program for motif complex analysis in regulatory genomic sequences. It takes as input any genome-wide set of regulatory elements and predicts cell-type–specific transcription factor dimers based on enrichment of their motif complexes. It is the first tool of its kind that can accommodate motif complexes composed of overlapping motifs, which are a characteristic feature of many known transcription factor dimers.
TherMos is a de novo motif discovery algorithm which estimates a position-specific energy matrix (PSEM) by fully exploiting the information of the ChIP-seq/ChIP-exo tag profile.