Visual Intelligence

Visual perception and cognition are core to human intelligence, enabling us to understand and function in the real world by telling us where we are, what objects are around us, and how those objects affect us and are affected by us. We strive to develop computational vision capabilities that either augment humans on these tasks (working alongside us) or automate them (replacing us).

Cognitive Vision

The Cognitive Vision Group explores and develops new hybrid intelligence capabilities in computational vision systems, harnessing the power of deep neural networks and machine learning, the robustness of probabilistic reasoning, and human-like knowledge representation through symbolic processing. These capabilities are applied to human-robot collaboration through multi-modal visual grounding, collaborative learning and question answering, task graph representation and learning, and task-graph-guided operation tracking and guidance. Other applications include visual intelligence systems for industrial inspection, such as surface defect detection on aircraft and infrastructure inspection in the construction industry.

Visual Learning and Reasoning

Recent advances in machine learning, especially deep learning, have enabled many computer vision applications, such as object detection and image retrieval. However, we are still far from general human-level intelligence capable of reasoning and learning across different tasks, which requires going beyond images and objects. For instance, humans can easily answer an open-ended question about the visual content of a video clip, whereas reasoning out the correct answer remains an extremely challenging task for computers.

Visual Learning and Reasoning focuses on learning the spatiotemporal relationships among objects and their interactions with agents (e.g., humans, robots, or both), as well as the hierarchical structure of tasks. Our main capabilities include learning cross-modal representations from instructional and demonstration videos, and reasoning by imagination, which leverages image and video synthesis with conditional generative adversarial networks (GANs).

Human Factors and Visual Interaction

As AI becomes more pervasive, it is key to understand the factors that influence how effectively humans use and interact with technology, and to develop techniques that lead to better human-AI collaboration. The HFVI group focuses on two areas: (1) Augmented Human and (2) AI Interactions. Augmenting human cognition with intuitive visual cues and intelligent guidance can improve performance, decision making and learning. We have developed an AI-based task reasoning technology to assist with task guidance in a wearable AR environment.

Although AI has had spectacular success over the last decade, its black-box nature makes it hard for humans to trust and use. Our team is therefore researching ways to enhance interactions between humans and AI, so as to improve explainability and user experience when dealing with large multivariate data and machine intelligence. We have developed a large-scale imaging and AI-based diagnostic technology, and are working on improving the usability and interpretability of the AI for our key decision-makers.

Visual Modelling and Quantification

The Visual Modelling and Quantification Group's capabilities span visual data acquisition, 3D point cloud generation and reconstruction, visual modelling and synthesis, and visual and geometric analysis and quantification. We currently apply these techniques to visual inspection (of both appearance and geometry), defect localization, change detection, and 3D reconstruction, modelling and quantification from 2D images/videos and 3D scans, with the objective of improving the productivity of the inspection and decision-making process.

Spatial Computing

Spatial Computing focuses on understanding the 3D space around us and enabling intuitive user interaction with it. It is the underlying technology behind trends such as augmented and virtual reality, as well as emerging applications of 3D spatial understanding in areas such as building and construction, logistics, and retail.

The Spatial Computing group focuses on using portable and wearable multi-camera systems for mapping and tracking, where video tends to be jerky and less predictable. At present, the goal is to robustly and accurately map building interiors, ranging from large office spaces to narrow corridors, in the time it takes to walk around the spaces being mapped. Although not as accurate as laser scanning, this approach is faster and cheaper, and it works even when people are moving around during mapping.

The second area of focus is building a true-3D computational holographic display that can render each pixel at its actual depth in real-world 3D space. We also work on accelerating hologram computation for interactive use and on developing calibration-free eye tracking for foveated rendering.

News & Accolades