MERaLiON is Available for Download from Hugging Face
Overview of MERaLiON
Multimodal Empathetic Reasoning and Learning in One Network (MERaLiON)
MERaLiON is part of Singapore’s National Multimodal Large Language Model (LLM) Programme to expand Singapore’s capabilities in Artificial Intelligence (AI) research and innovation. The Programme was launched in collaboration with Singapore’s Infocomm Media Development Authority (IMDA) and AI Singapore (AISG), leveraging the high-performance computing resources of the National Supercomputing Centre (NSCC) Singapore.
A cornerstone of this Programme is the development of multimodal LLMs that are localized for Singapore and the region to understand context and values related to the diverse cultures and languages of Southeast Asia.
MERaLiON draws on the Institute for Infocomm Research's (I²R) transformative work in speech and language research, which has been widely applied in language transcription and translation to support various public agencies and private sector companies. Successful projects include those implemented by the State Courts and SG Translate Together.
Developed to enhance the understanding of human communication dynamics through its multimodal integration, MERaLiON marks a significant leap forward in extending AI capabilities for Singapore and the Southeast Asia region.
WHY MERaLiON
For better contextual understanding and versatility across different tasks, MERaLiON harnesses cutting-edge AI techniques to process and learn complementary patterns from diverse data sources in a single unified framework. The data sources include various forms of verbal, visual, auditory and audiovisual communication.
The MERaLiON series excels in speech summarization, stance detection, inference, and contextual understanding, making it a versatile tool to power applications that demand deep understanding of context, intent, and interpretation of speech cues and paralinguistic nuances.
Key Features
We have designed our data pipelines, model training, and evaluation frameworks with a strong emphasis on scalability, robustness, and adaptability, ensuring the model's effectiveness across different tasks and environments.
The first phase leverages multimodal and multilingual representation learning and alignment for more effective training and better model generalization, so that the model can comprehend colloquial language and solve downstream tasks. Unique to MERaLiON, the model caters for code-switching and offers key features that include:
Multilingual Speech Transcription and Translation
Accurately transcribes and translates speech across multiple languages, ensuring seamless communication in diverse linguistic settings.
Speech Summarization
Generates concise and coherent summaries of lengthy speech recordings, enhancing accessibility and productivity.
Speech Question and Answer
Provides accurate and contextually relevant answers to user queries by analysing and understanding spoken input.
Audio Scene Understanding
Identifies and interprets the auditory environment to provide context-aware insights, such as recognizing background sounds and events.
Para-lingual Understanding
Analyses non-verbal elements of speech, such as tone, pitch, volume, intonation and non-lexical vocables, to gain deeper insights into speaker intent and sentiment.
Support Local Speech Understanding
Specializes in accurately processing the diverse linguistic landscape of Singapore and Southeast Asia, including Singlish, regional dialects, and accents, to promote inclusive and effective communication across multicultural communities.
Potential Use Cases
Customer Interfacing Automation
Aid the interaction between callers and call-takers by automatically transcribing and analysing calls in multiple languages and dialects, and extracting critical information so that urgent cases are promptly followed up. This improves customer satisfaction and overall efficiency in sectors such as retail, e-commerce, banking and public services.
Knowledge Management and Discovery
Enable businesses to analyse and discover new, valuable insights from multimodal data (in text, speech, emotion and non-verbal formats) to gain a deeper understanding of customers, provide personalized experiences and achieve better outcomes, such as in education or telemedicine.
Agentic Decision Making
Enable informed and autonomous choices for real-time, evidence-based decision making, with the LLM managing and synthesizing insights from enormous datasets that exceed human capacity. Applications include anomaly detection for surveillance as well as real-time analysis and recommendations for workflow automation.
Key Releases
MERaLiON – AudioLLM
Hugging Face Download
Technical Report
MERaLiON – Speech Encoder
Hugging Face Download
Technical Report
MERaLiON – TextLLM
Hugging Face Download
Technical Report
We aim to build an LLM ecosystem and foster strong expertise in developing and deploying scalable, impactful AI solutions of high value and relevance to citizens and businesses.
We encourage tech companies and businesses to harness the collaborative power and contributions of the open-source community to develop more diverse representations that further enhance the MERaLiON model!
Evaluation Benchmarks and Leaderboard
Contact Institute for Infocomm Research
For collaboration or technical inquiries, contact us here.