Research Pillars

Artificial General Intelligence (AGI)

Artificial General Intelligence (AGI) is increasingly framed as the pursuit of general-purpose agents that learn, plan, and adapt in open-ended environments, moving beyond the task-bound paradigm that has long defined conventional AI. These emerging systems build on prior experience to tackle unfamiliar challenges, reason over long-term objectives, and adapt fluidly as their worlds evolve. By accumulating knowledge that endures and compounds over time, they begin to drive discovery rather than merely execute predefined tasks, marking a decisive step toward general intelligence.

The pathway to AGI lies not simply in scaling models, but in expanding the scope of what intelligence can achieve. Each generation of systems has widened this horizon: from rule-based programs that encoded explicit knowledge, to learning algorithms that extracted patterns from data, to large-scale models that generalise across tasks. Building on these foundations, today's agentic systems are beginning to operate autonomously, plan over long horizons, and adapt continually. This trajectory is converging toward high-capacity AGI systems in which agents grow within open-ended worlds, self-improve through experience, and drive discovery across domains.

Fig. 1 Pathway to AGI

Core Capabilities

At A*STAR CFAR, we frame general intelligence in terms of four core capabilities:


  • Agency endows agents with the autonomy to set their own goals, plan over long horizons, and coordinate actions to achieve them.
  • Grounding anchors abstract representations in real-world perception and interaction, enabling knowledge to reflect the true structure of the world.
  • Adaptivity allows agents to reshape their policies through experience, improving with every encounter and generalising across tasks, environments, and timescales.
  • Prediction equips agents to model their environment, anticipate consequences, and choose actions that shape desired futures.

Together, these capabilities form the foundation for intelligence that is open-ended: continually learning, discovering, and pushing the boundaries of what is possible.

Research Pillars

Progress toward AGI requires integrating decision-making, reasoning, generation, and embodiment into a coherent system of continually expanding capabilities. Our research is anchored in four closely interconnected pillars:

Reinforcement Learning

We push the frontier of adaptive decision-making, creating sample-efficient algorithms, lifelong learning frameworks, and preference-based reward systems that allow agents to acquire and refine skills over long horizons and evolving tasks.

Agentic Workflow

We develop agentic workflow optimisation systems that enable agents to construct, adapt, and evolve complex task structures with scalability, efficiency, and trustworthiness. Our research spans verifiable workflow representations that enforce formal correctness by design, self-improving evaluation mechanisms that drive continual refinement, and safety-constrained search algorithms that explore vast workflow spaces under limited resources.

Generative Models

We build controllable generators and world models that let agents simulate, imagine, and design within complex domains. Our research tackles the challenge of optimising generative models under sparse, black-box feedback, advancing methods for goal-directed diffusion, inverse design, and programmatic composition that steer models without labelled data or differentiable rewards. These capabilities allow agents to explore vast creative spaces while maintaining reliability and control.

Embodied AI

We pursue embodied systems that learn through perception and interaction. Our research integrates visual commonsense reasoning, skill composition, predictive world models, and lifelong adaptive control to create agents that can continually evolve within physical and social environments.

Fig. 2 A*STAR CFAR's core pillars of AGI research

Research Highlights

Verifiable Agentic Workflow Optimisation

MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
ICML’25 Workshop on MAS

MermaidFlow introduces a new paradigm for agentic workflow generation, reframing multi-step reasoning as verifiable symbolic graphs rather than opaque code. This enables workflows to be safely evolved, reused, and optimised, overcoming fragility and hallucination issues common in LLM-generated plans.

  • Verifiable Workflow Representation with Mermaid: Represents workflows as Mermaid graphs, which not only offer a clearer and more modular structure than Python scripts, but also provide a built-in compiler that enforces workflow correctness. This makes workflows intrinsically verifiable, composable, and safe to evolve as the system grows in complexity.
  • Safety-Constrained Workflow Evolution: Applies evolutionary search operators under formal constraints to discover and optimise workflow structures without breaking correctness.
  • Robust Optimisation with Weak LLMs: Enables running robust workflow optimisation pipelines using lightweight models such as GPT-4o-mini, achieving strong performance on GSM8K, MATH, HumanEval, and MBPP at a cost of only a few dollars.

By making workflows verifiable and evolvable, MermaidFlow enables reliable multi-step planning and scalable experimentation, pushing agentic reasoning beyond the limits of traditional LLM prompting.
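
To make the idea concrete, the sketch below shows safety-constrained evolution over a toy workflow expressed as a Mermaid flowchart string: candidate mutations are accepted only if the resulting graph passes a validity check. All names (`WORKFLOW`, `is_valid`, `score`, `evolve`) and the mutation and scoring logic are illustrative assumptions, not the released implementation; in practice, validity would come from the Mermaid compiler and fitness from executing the workflow on a benchmark.

```python
"""Minimal sketch of safety-constrained workflow evolution (illustrative only).
All names and the mutation/scoring logic are assumptions, not MermaidFlow's code."""
import random

# A toy workflow as a Mermaid flowchart: nodes are reasoning steps, edges are flow.
WORKFLOW = """\
flowchart TD
    plan[Plan solution] --> solve[Generate candidate answer]
    solve --> verify[Check the candidate]
    verify --> answer[Return final answer]
"""

# Candidate edits drawn from a small library of structural mutations.
MUTATIONS = [
    "    solve --> critique[Self-critique the candidate]\n    critique --> verify\n",
    "    plan --> decompose[Split the task into sub-problems]\n    decompose --> solve\n",
]

def is_valid(workflow: str) -> bool:
    """Stand-in for the Mermaid compiler / structural checks: every edge line must
    be well formed and the graph must keep a terminal 'answer' node."""
    edges = [l for l in workflow.splitlines() if "-->" in l]
    return bool(edges) and all(len(l.split("-->")) == 2 for l in edges) and "answer" in workflow

def score(workflow: str) -> float:
    """Placeholder fitness; in practice the workflow would be executed by an LLM
    on a benchmark split and scored by task accuracy."""
    return random.random()

def evolve(workflow: str, generations: int = 5) -> str:
    best, best_score = workflow, score(workflow)
    for _ in range(generations):
        candidate = best + random.choice(MUTATIONS)
        if not is_valid(candidate):          # safety constraint: reject malformed graphs
            continue
        s = score(candidate)
        if s > best_score:                   # keep only verified improvements
            best, best_score = candidate, s
    return best

if __name__ == "__main__":
    print(evolve(WORKFLOW))
```

The key design point the sketch tries to capture is that the search operators never bypass the structural check: only graphs that remain verifiable enter the next generation.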

Black-Box Target Generation

Fast Direct: Query-Efficient Online Black-Box Guidance for Diffusion-Model Target Generation
ICLR 2025

Building intelligent generative systems requires the ability to steer models using only sparse, black-box feedback, operating without labelled data, differentiable rewards, or retraining. Fast Direct is a novel framework for black-box-guided generative modelling that enables stable and precise control over generation while maintaining high sample efficiency. The method casts generation as a sequential decision process and applies adaptive guidance updates that converge within just tens of sampling steps. This enables reliable optimisation even when feedback is sparse or expensive, making the approach practical for real-world black-box settings.

Fast Direct is built on two key components:

  • Guided Noise Sequence Optimisation (GNSO): identifies universal directions on the data manifold to align generation with external objectives.
  • Pseudo-Target Construction: transforms sparse or binary feedback into evolving surrogate targets, stabilising optimisation during inference.

This approach makes black-box-guided diffusion orders of magnitude more efficient (up to 44× fewer queries), opening a path toward goal-directed generation in domains where gradients and data are scarce.
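
As a rough illustration of such a loop (a minimal sketch under strong simplifying assumptions, not the paper's algorithm), the toy example below treats a frozen sampler and a non-differentiable scorer as black boxes, applies one shared update direction to the whole noise sequence, and promotes any sample that improves the feedback to be the new surrogate target. The names `denoise`, `black_box_score`, and `pseudo_target` are assumptions for exposition.

```python
"""Toy sketch of query-efficient black-box guidance (illustrative only).
The sampler, oracle, and update rule are simplified stand-ins for the real method."""
import numpy as np

rng = np.random.default_rng(0)
DIM, STEPS = 8, 20                                   # sample dimension, noise-sequence length

def denoise(noise_seq: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen diffusion sampler: maps a noise sequence to a sample."""
    return noise_seq.mean(axis=0)

def black_box_score(x: np.ndarray) -> float:
    """Sparse, non-differentiable feedback (e.g. a property predictor or human rating)."""
    hidden_target = np.ones(DIM)                     # unknown to the optimiser
    return -float(np.linalg.norm(x - hidden_target))

noise_seq = rng.standard_normal((STEPS, DIM))
pseudo_target = denoise(noise_seq)                   # initial surrogate target
best_score = black_box_score(pseudo_target)

for query in range(30):                              # tens of oracle queries, not thousands
    probe = pseudo_target + 0.5 * rng.standard_normal(DIM)   # explore around the surrogate
    direction = probe - denoise(noise_seq)           # one shared ("universal") update direction
    candidate_seq = noise_seq + 0.3 * direction      # nudge every noise step the same way
    sample = denoise(candidate_seq)
    s = black_box_score(sample)
    if s > best_score:                               # feedback improved: adopt the candidate
        noise_seq, pseudo_target, best_score = candidate_seq, sample, s

print("best black-box score after 30 queries:", round(best_score, 3))
```

The sketch only aims to show the two ingredients named above in miniature: a single direction applied across the noise sequence, and a surrogate target that evolves whenever the black-box feedback improves.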

Quality-Diversity Policy Search

Diversifying Policy Behaviours via Extrinsic Behaviour Curiosity
ICML 2025

QD-IRL is a framework that unifies quality–diversity (QD) search with inverse reinforcement learning (IRL) to enable policy discovery in sparse- and underspecified-reward environments. Rather than converging toward a single optimal behaviour, QD-IRL actively cultivates behavioural diversity while learning reward functions from expert demonstrations, allowing agents to uncover a broad repertoire of high-performing strategies that are robust, creative, and reusable.

At its core, QD-IRL alternates between collecting trajectories, inferring rewards from expert or task data, assigning exploration bonuses based on behavioural coverage in an evolving archive, and updating policies through gradient-based optimisation guided by both task reward and diversity signals. As the search progresses, the policy distribution shifts toward regions of the behaviour space that are both novel and high quality, expanding beyond the narrow scope of expert data.
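
A minimal sketch of this alternating loop is given below, with toy stand-ins for the environment, the learned reward, and the policy update; the names (`rollout`, `irl_reward`, `curiosity_bonus`) and the simple evolutionary update are assumptions for exposition, not the paper's implementation.

```python
"""Toy sketch of a quality-diversity + IRL loop (illustrative only).
Environment, reward model, and policy update are simplified stand-ins."""
import numpy as np

rng = np.random.default_rng(0)
N_CELLS, DIM = 10, 4                                 # discretised behaviour space, policy size

def rollout(policy: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy environment: the 'trajectory' is a noisy policy output, and the
    behaviour descriptor is a 1-D summary of it."""
    traj = policy + 0.1 * rng.standard_normal(DIM)
    return traj, np.array([np.tanh(traj.sum())])

def irl_reward(traj: np.ndarray, expert_mean: np.ndarray) -> float:
    """Stand-in for a learned reward: similarity of the trajectory to expert data."""
    return -float(np.linalg.norm(traj - expert_mean))

def curiosity_bonus(descriptor: np.ndarray, archive_counts: np.ndarray) -> float:
    """Behaviour curiosity: rarely visited cells of the archive earn a larger bonus."""
    cell = int((descriptor[0] + 1) / 2 * (N_CELLS - 1))
    return 1.0 / (1.0 + archive_counts[cell])

expert_mean = np.ones(DIM)
archive_counts = np.zeros(N_CELLS)
population = [rng.standard_normal(DIM) for _ in range(8)]

for generation in range(50):
    scored = []
    for policy in population:
        traj, desc = rollout(policy)                 # collect trajectories
        r = irl_reward(traj, expert_mean)            # infer task reward from expert data
        b = curiosity_bonus(desc, archive_counts)    # reward behavioural novelty
        cell = int((desc[0] + 1) / 2 * (N_CELLS - 1))
        archive_counts[cell] += 1                    # update coverage archive
        scored.append((r + b, policy))
    scored.sort(key=lambda t: t[0], reverse=True)
    parents = [p for _, p in scored[:4]]
    # simple evolutionary step standing in for gradient-based policy improvement
    population = [p + 0.05 * rng.standard_normal(DIM) for p in parents for _ in range(2)]

print("archive coverage:", int((archive_counts > 0).sum()), "of", N_CELLS, "cells")
```

Even in this toy form, the selection pressure combines the inferred task reward with the coverage-based bonus, which is what pushes the population toward behaviours that are both high quality and previously unvisited.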

This framework offers a principled route toward exploration-driven policy learning, equipping agents to generalise more effectively and thrive in domains where explicit reward specification is infeasible.