Visual Intelligence

A*STAR I²R creates innovative capabilities and advanced technologies to automate and augment visual intelligence. We aim to achieve human-AI symbiosis through visual analysis, Q&A, augmentation, reasoning, and interactive synthesis. To address practical problems, we develop fast, responsive visual understanding models for real-time interactions. This fosters synergy between human workers and AI applications, augmenting workflows without replacing human involvement.

Our research focuses on visual learning and reasoning through continual learning, neuroscience-inspired AI, and controllable image/video generation. We specialise in technologies such as 3D computer vision, image/video and point cloud analysis, and 3D geometry modelling and quantification. Notably, we conduct indoor mapping for building inspection and construction progress tracking, and have successfully demonstrated warehouse content mapping and tracking.

We focus on the following specific areas of research and development:

Collaborative Visual Intelligence (CVI)

Envisioning seamless human-AI symbiosis through active observation and understanding of human-environment interactions, we explore teaching robots new skills through human demonstration. By observing human coworkers, robots acquire new skills, facilitated by multimodal sensory understanding and compositional learning. This allows robots to replicate human operations and apply learned skills in diverse contexts for close collaboration.

KUKA Innovation Award: AAAI-21 Best Demo Award
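
To make the demonstration-learning loop concrete, the sketch below shows behaviour cloning in its simplest form: a policy is fitted by least squares to recorded state-action pairs from a human demonstration, then applied to an unseen state. All data, dimensions, and the linear policy are illustrative placeholders, not the actual CVI method.

```python
import numpy as np

# Toy demonstration data: each row pairs an observed state (e.g. object
# pose features seen by the robot's cameras) with the action a human
# demonstrator took. All numbers here are synthetic placeholders.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 6))          # 200 observations, 6 features
true_mapping = rng.normal(size=(6, 3))      # hidden demonstrator "skill"
actions = states @ true_mapping + 0.01 * rng.normal(size=(200, 3))

# Behaviour cloning in its simplest form: fit a policy that regresses
# actions from states (here, ordinary least squares).
policy, *_ = np.linalg.lstsq(states, actions, rcond=None)

# Apply the learned "skill" to a state the robot has never seen.
new_state = rng.normal(size=(1, 6))
predicted_action = new_state @ policy
print(predicted_action)
```

In practice the linear policy would be replaced by a multimodal neural network conditioned on vision and other senses, but the fit-then-reuse structure of learning from demonstration is the same.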

Visual Learning and Reasoning (VLR)

To enable machines to interpret visual data and reason about spatiotemporal relationships involving hierarchical task structures, we work on neuro-symbolic visual reasoning, unsupervised domain adaptation, and image/video generation, enabling controllable instructional video creation across various domains.

Keyword-Aware Graph Networks for Video Question Answering
Continuous Satellite Image Learning and Reasoning
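
As a toy illustration of the neuro-symbolic idea, the sketch below lifts stubbed "neural" detections into symbolic facts and applies a hand-written rule over them. Real systems learn or compile such rules from data; all labels, positions, and thresholds here are hypothetical.

```python
# Stub "neural" detections, as a perception model might emit for one video
# frame: (object label, horizontal position). Hypothetical values only.
detections = [
    {"id": 0, "label": "person", "x": 120},
    {"id": 1, "label": "ladder", "x": 150},
    {"id": 2, "label": "helmet", "x": 118},
]

# Symbolic layer: lift detections into "near" predicates.
def near(a, b, tol=40):
    return abs(a["x"] - b["x"]) <= tol

facts = {(a["label"], "near", b["label"])
         for a in detections for b in detections
         if a["id"] != b["id"] and near(a, b)}

# Hand-written rule: a person near both a ladder and a helmet is
# plausibly equipped for the climbing task.
if ("person", "near", "ladder") in facts and ("person", "near", "helmet") in facts:
    print("person at the ladder appears to have a helmet nearby")
```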

Visual Modelling and Quantification (VMQ)

Utilising AI for predictive analytics and leveraging texture and geometry from images, videos, and point clouds, we pioneer the shift towards multimodal spatial-temporal analysis, vital for visual intelligence across diverse industry domains.

Multimodal visual modelling, quantification and synthesis
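
A minimal sketch of geometry-based quantification, assuming a synthetic point cloud in place of a real scan: points are bucketed into a voxel grid, and occupied cells are counted as a crude proxy for scanned volume or content coverage.

```python
import numpy as np

# Synthetic point cloud standing in for a scan of a room or warehouse bay.
rng = np.random.default_rng(1)
points = rng.uniform(low=0.0, high=5.0, size=(10_000, 3))  # metres

# Voxel-grid quantification: bucket points into 0.25 m cells and count
# the occupied cells to approximate the volume covered by the scan.
voxel = 0.25
cells = np.unique(np.floor(points / voxel).astype(int), axis=0)
occupied_volume = len(cells) * voxel ** 3
print(f"occupied cells: {len(cells)}, approx volume: {occupied_volume:.2f} m^3")
```

Comparing occupied cells across scans taken at different times gives a simple change measure, the kind of signal construction progress tracking builds on.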

Human Factors and Visual Intelligence (HFVI)

We develop real-time visual understanding and task reasoning technologies to enhance decision-making in wearable AR environments and to provide explainability for complex data. We aim to augment humans in performing various tasks, utilising video streaming and neural network-based visual inference.

Real-Time Visual Understanding for Augmenting Human Workers
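
The sketch below illustrates the shape of a real-time inference loop on a video stream: each frame passes through a (here stubbed) neural network, and per-frame latency is checked against a roughly 30 fps budget. The frame size, budget, and stub model are assumptions for illustration, not the deployed system.

```python
import time
import numpy as np

def stub_inference(frame):
    # Placeholder for a neural network forward pass; a cheap per-frame
    # statistic keeps the loop runnable without any model weights.
    return float(frame.mean())

# Simulated 640x480 RGB stream; a wearable deployment would instead read
# frames from a head-mounted camera (e.g. via OpenCV's VideoCapture).
rng = np.random.default_rng(2)
budget_ms = 33.0  # ~30 fps real-time budget
for i in range(5):
    frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
    start = time.perf_counter()
    score = stub_inference(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    status = "ok" if elapsed_ms <= budget_ms else "over budget"
    print(f"frame {i}: score={score:.1f}, {elapsed_ms:.2f} ms ({status})")
```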

Spatial Computing (SC)

We focus on enhancing efficiency and safety in building and construction by integrating computer vision, AI, robotics, and 3D spatial mapping technologies, reducing the manpower required for applications in these domains. We are also developing a Large Multi-Modal Model specialised in construction safety, facilitating quick identification of safety issues in videos and assisting safety inspectors in assessing work practices and environmental safety measures.

Warehouse content mapping and tracking
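
As a hedged sketch of how footage might be prepared for such a safety-focused multimodal model: keyframes are sampled from a clip at a fixed stride and handed to a stubbed model call. The stride, the synthetic clip, and the stub_safety_model function are hypothetical; a real system would encode the frames and query an actual vision-language model.

```python
import numpy as np

def sample_keyframes(num_frames, every_n=30):
    # Pick every Nth frame index, a common first step before sending
    # long footage to a large multimodal model for safety review.
    return list(range(0, num_frames, every_n))

def stub_safety_model(frames, prompt):
    # Placeholder for the multimodal model call; a real system would pass
    # encoded frames plus the prompt and parse the model's findings.
    return [{"frame": i, "issue": None} for i, _ in enumerate(frames)]

# Simulated 10-second clip at 30 fps (downscaled frames keep this light).
rng = np.random.default_rng(3)
clip = rng.integers(0, 256, size=(300, 120, 160, 3), dtype=np.uint8)

indices = sample_keyframes(len(clip), every_n=30)
keyframes = [clip[i] for i in indices]
report = stub_safety_model(keyframes, prompt="List any unsafe work practices.")
print(f"reviewed {len(keyframes)} keyframes; findings: {report[:2]} ...")
```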