I²R Research Highlights

Image Analysis tool helps pick out human actions

Using deep-learning techniques to locate potential human activities in videos

When a police officer begins to raise a hand in traffic, human drivers realize that the officer is about to signal them to stop. But computers find it harder to work out people’s next likely actions based on their current behaviour.

Now, a team of A*STAR’s Institute for Infocomm Research (I²R) researchers and colleagues has developed the ‘YoTube’ detector, which can successfully pick out where human actions will occur in videos, in almost real-time¹.

It combines two types of neural network running in parallel: a static neural network, which has already proven accurate at processing still images, and a recurrent neural network, typically used for processing changing data, such as in speech recognition.
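To make the two-branch idea concrete, here is a minimal, purely illustrative sketch in Python. It is not the authors' implementation: the "static branch" is simulated with random per-frame scores standing in for a still-image network's actionness predictions, and the "recurrent branch" is replaced by a simple exponential moving average, a stand-in for a recurrent network that carries temporal context. The fusion (averaging the two branches, then linking the top-scoring region per frame into a tube) is likewise an assumption made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a short video with a few candidate regions per frame.
n_frames, n_regions = 8, 5

# Static branch (stand-in): per-frame scores, as a still-image
# network might produce for each candidate region independently.
static_scores = rng.random((n_frames, n_regions))

# Recurrent branch (stand-in): an exponential moving average over
# time, so each frame's score depends on what came before -- a crude
# proxy for a recurrent network's temporal memory.
recurrent_scores = np.empty_like(static_scores)
state = np.zeros(n_regions)
for t in range(n_frames):
    state = 0.6 * state + 0.4 * static_scores[t]
    recurrent_scores[t] = state

# Fuse the two branches and link the top-scoring region in each
# frame into an "action tube" spanning the video.
fused = 0.5 * static_scores + 0.5 * recurrent_scores
tube = fused.argmax(axis=1)
print(tube)  # one region index per frame
```

The point of the sketch is only the structure: two scorers run in parallel, one per-frame and one temporal, and their fused output is linked across frames into a spatio-temporal proposal.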

The team tested YoTube on more than 3,000 videos routinely used in computer vision experiments. They report that it outperformed state-of-the-art detectors at correctly picking out potential human actions by approximately 20 per cent for videos showing general everyday activities and around 6 per cent for sports videos. The detector occasionally makes mistakes if the people in the video are small, or if there are many people in the background. Nonetheless, Zhu says, "we've demonstrated that we can detect most potential human action regions in an almost real-time manner."


The A*STAR-affiliated researchers contributing to this research are from the Visual Intelligence department of the Institute for Infocomm Research.

Extracted from:
A*STAR Research - Image Analysis tool helps pick out human actions

Paper can be found in:
IEEE Journal - Transactions on Image Processing: YoTube: Searching Action Proposal Via Recurrent and Static Regression Networks

1 Zhu, H., Vial, R., Lu, S., Peng, X., Fu, H., Tian, Y. & Cao, X. YoTube: Searching action proposal via recurrent and static regression networks. IEEE Transactions on Image Processing 27, 2609–2622 (2018).