Research Projects

Advanced Techniques for Handling Imbalanced/Unlabelled Data for Classification in the Aerospace MRO Industry

An Intelligent Data Expert Active Learning System (IDEALS) has been developed using novel machine learning methods, including a hybrid rebalancing algorithm, to handle imbalanced and unlabelled data and to reduce the burden of tedious manual data labelling. With the proposed hybrid rebalancing method, the system achieves an overall accuracy of 81% while significantly reducing the amount of training data required. The proposed method has been found useful for the aerospace MRO industry and for any other industry that needs classification on imbalanced datasets.
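The page does not describe how the hybrid rebalancing algorithm works. As a minimal sketch, assuming "hybrid" means combining random undersampling of the large (majority) class with random oversampling of the small (minority) class, a common pairing but not necessarily the IDEALS method, the idea looks like this. The function `hybrid_rebalance`, its `target_per_class` parameter and the sample data are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def hybrid_rebalance(records, labels, target_per_class, seed=0):
    """Hypothetical hybrid rebalancing (not the IDEALS algorithm):
    classes larger than target_per_class are randomly undersampled,
    smaller classes are randomly oversampled with replacement."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append(rec)

    out_records, out_labels = [], []
    for lab, recs in by_class.items():
        if len(recs) >= target_per_class:
            # Undersample: draw target_per_class records without replacement.
            chosen = rng.sample(recs, target_per_class)
        else:
            # Oversample: keep every original record, then add random duplicates.
            extra = [rng.choice(recs) for _ in range(target_per_class - len(recs))]
            chosen = recs + extra
        out_records.extend(chosen)
        out_labels.extend([lab] * len(chosen))
    return out_records, out_labels

# Usage with made-up data: 95 "normal" readings and 5 "fault" readings.
records = list(range(100))
labels = ["normal"] * 95 + ["fault"] * 5
bal_records, bal_labels = hybrid_rebalance(records, labels, target_per_class=50)
print(Counter(bal_labels))  # 50 of each class
```

Undersampling alone would discard most of the normal data, and oversampling alone would multiply the few fault records many times over; meeting at an intermediate target is one way to limit both effects. Real systems often replace plain duplication with synthetic sample generation.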

Contact Person: Li Xiang

The SIMTech team worked together with the I2R team to develop the Intelligent Data Expert Active Learning System (IDEALS) with novel machine learning algorithms. IDEALS is a web-based system designed specifically to select a small number of key unlabelled data records near the classification boundary for domain experts to label, learn how the domain experts label the data, and help build classification models for reliable and accurate fault detection. The system has been found useful for the aerospace MRO industry and offers the following features:

  1. Platform for domain experts to label data
  2. Active learning algorithm to choose the most informative data for domain experts to label
  3. Knowledge discovery and retention
  4. Model building for automatic data classification
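The page does not specify which active learning query strategy IDEALS uses. One widely used strategy that matches the idea of picking "boundary" records is uncertainty sampling: score every unlabelled record with the current model and send the records whose predicted probability is closest to 0.5 to the domain expert. The sketch below assumes a toy logistic model; `predict_proba`, `select_for_labelling`, the weights and the data are hypothetical.

```python
import math

def predict_proba(model, x):
    """Probability of the "fault" class under a toy logistic model.
    `model` is a hypothetical (weights, bias) pair, not an IDEALS structure."""
    weights, bias = model
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def select_for_labelling(model, unlabelled, k):
    """Uncertainty sampling: indices of the k unlabelled records whose
    predicted probability is closest to 0.5, i.e. nearest the decision boundary."""
    margins = sorted(
        (abs(predict_proba(model, x) - 0.5), i) for i, x in enumerate(unlabelled)
    )
    return [i for _, i in margins[:k]]

model = ([1.0, 1.0], -2.0)  # made-up weights for two sensor features
unlabelled = [[0, 0], [1, 1], [3, 3], [1.2, 1.0], [5, 0]]
print(select_for_labelling(model, unlabelled, k=2))  # [1, 3]
```

In a full loop, the expert's labels for the selected records would be added to the training set, the model retrained, and the selection repeated, so expert time is spent only on the records the model is least sure about.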

Benefits

  • Minimize the time needed from domain experts without compromising accuracy
  • Retention of domain experts' knowledge
  • Build reliable models for accurate fault detection in condition-based maintenance with limited data
  • Continuous improvement of the model through the domain experts' ongoing involvement
  • Web-based system allows access from any portable device

Accurate fault detection for condition-based maintenance with limited data in the aerospace maintenance, repair and overhaul (MRO) industry

Problems Addressed 

Data in the aerospace domain is often: 

Imbalanced: Negative examples far outnumber positive examples; e.g., most of the sensor data from an aircraft represents normal conditions, so few examples are available to train a model to recognise abnormal conditions.

Unlabelled: A substantial amount of data is unlabelled because only domain experts can label it correctly, and such experts rarely have the time to do so; e.g., the usage profile of an aircraft is difficult to derive automatically from flight data.

Many general techniques exist for handling such data in classification tasks, but there is no easy way to determine which technique is best suited to the nature of the data and its context.
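Imbalance also affects how competing techniques are judged. On heavily imbalanced data, a trivial model can reach very high accuracy while missing every fault, so a class-aware metric such as balanced accuracy (the mean of per-class recall) is often used alongside plain accuracy. The worked illustration below uses made-up counts and plain Python; the function names are hypothetical and not part of IDEALS.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of records of class `cls` that were predicted as `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)

# Made-up counts: 990 normal records and 10 fault records.
y_true = ["normal"] * 990 + ["fault"] * 10
y_pred = ["normal"] * 1000  # trivial model that always predicts "normal"

print(accuracy(y_true, y_pred))           # 0.99
print(recall(y_true, y_pred, "fault"))    # 0.0
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The 99% accuracy hides the fact that no fault is detected, which the per-class recall and balanced accuracy expose.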