I²R Techs & Solutions

MERaLiON is Available for Download from Hugging Face

Overview of MERaLiON

Multimodal Empathetic Reasoning and Learning in One Network (MERaLiON)

MERaLiON is part of Singapore’s National Multimodal Large Language Model (LLM) Programme, which aims to expand Singapore’s capabilities in Artificial Intelligence (AI) research and innovation. The Programme was launched in collaboration with Singapore’s Infocomm Media Development Authority (IMDA) and AI Singapore (AISG), and leverages the high-performance computing resources of the National Supercomputing Centre (NSCC) Singapore.

A cornerstone of this Programme is the development of multimodal LLMs that are localized for Singapore and the region to understand context and values related to the diverse cultures and languages of Southeast Asia.

MERaLiON draws on the Institute for Infocomm Research's (I²R) transformative work in speech and language research, which has been widely applied to language transcription and translation in support of various public agencies and private sector companies. Successful projects include those implemented by the State Courts and SG Translate Together.

Developed to enhance the understanding of human communication dynamics through multimodal integration, MERaLiON marks a significant leap forward in extending the frontier of AI capabilities for Singapore and the Southeast Asia region.

Why MERaLiON

For better contextual understanding and versatility across different tasks, MERaLiON harnesses cutting-edge AI techniques to process and learn complementary patterns from diverse data sources in a single unified framework. These data sources include various forms of verbal, visual, auditory and audiovisual communication.

The MERaLiON series excels in speech summarization, stance detection, inference and contextual understanding, making it a versatile tool for powering applications that demand a deep understanding of context, intent, speech cues and paralinguistic nuances.

Key Features

We have designed our data pipelines, model training, and evaluation frameworks with a strong emphasis on scalability, robustness, and adaptability, ensuring the model's effectiveness across different tasks and environments.

The first phase leverages multimodal and multilingual representation learning and alignment for more effective training and better model generalization, enabling the model to comprehend colloquial language and solve downstream tasks. Unique to MERaLiON, the model caters for code-switching and offers key features that include:

  • Multilingual Speech Transcription and Translation
    Accurately transcribes and translates speech across multiple languages, ensuring seamless communication in diverse linguistic settings

  • Speech Summarization
    Generates concise and coherent summaries of lengthy speech recordings, enhancing accessibility and productivity

  • Speech Question and Answer
    Provides accurate and contextually relevant answers to user queries by analysing and understanding spoken input

  • Audio Scene Understanding
    Identifies and interprets the auditory environment to provide context-aware insights, such as recognizing background sounds and events

  • Paralinguistic Understanding
    Analyses non-verbal elements of speech, such as tone, pitch, volume, intonation and non-lexical vocables to gain deeper insights into speaker intent and sentiment

  • Local Speech Understanding
    Specializes in accurately processing the diverse linguistic landscape of Singapore and Southeast Asia, including Singlish, regional dialects, and accents, to promote inclusive and effective communication across multicultural communities

Potential Use Cases

  • Customer Interfacing Automation
    Aid interactions between callers and call-takers by automatically transcribing and analysing calls in multiple languages and dialects, extracting critical information so that urgent cases are promptly followed up. This improves customer satisfaction and overall efficiency in sectors such as retail, e-commerce, banking and public services

  • Knowledge Management and Discovery
    Enable businesses to analyse multimodal data (text, speech, emotion and non-verbal formats) and discover new, valuable insights, gaining a deeper understanding of customers in order to provide personalized experiences and achieve better outcomes in fields such as education or telemedicine

  • Agentic Decision Making
    Enable informed, autonomous and evidence-based decision making in real time, with the LLM managing and synthesizing insights from datasets too large for humans to process. Applications include anomaly detection for surveillance as well as real-time analysis and recommendations for workflow automation

Key Releases

Download from Hugging Face

MERaLiON – AudioLLM
Hugging Face Download
Technical Report

MERaLiON – Speech Encoder
Hugging Face Download
Technical Report

MERaLiON – TextLLM
Hugging Face Download
Technical Report

We aim to build an LLM ecosystem and foster strong expertise in developing and deploying scalable, impactful AI solutions of high value and relevance to citizens and businesses.

We encourage tech companies and businesses to harness the collaborative power and contributions of the open-source community to develop more diverse representations that further enhance MERaLiON!

Evaluation Benchmarks and Leaderboard

AudioBench
SeaEval

Contact Institute for Infocomm Research

For collaboration or technical inquiries, contact us here.