A*STAR CFAR Scientists Receive Prestigious J2C Certification
Congratulations to Dr Atsushi Nitanda, Principal Scientist, and Dr David Bossens, Senior Scientist, from A*STAR Centre for Frontier AI Research (A*STAR CFAR) on their paper, “Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes,” receiving the Journal-to-Conference (J2C) certification.
Awarded by Transactions on Machine Learning Research (TMLR), this selective and prestigious certification is granted to only 10% of accepted TMLR papers, reflecting strong endorsement from both the Action Editor and reviewers. This recognition provides an opportunity for their work to be presented and gain visibility at leading international conferences, including the International Conference on Learning Representations (ICLR), International Conference on Machine Learning (ICML), and Conference on Neural Information Processing Systems (NeurIPS).
This achievement highlights A*STAR CFAR’s strengths in theory development, optimisation, reinforcement learning, and safe and robust AI. We extend our heartfelt congratulations to Dr Nitanda and Dr Bossens on this milestone.
Abstract:
Safety is a fundamental requirement for reinforcement learning systems. The emerging framework of robust constrained Markov decision processes (RCMDPs) enables the learning of high-performing policies that satisfy long-term constraints while providing guarantees under epistemic uncertainty.
In this work, the authors propose Mirror Descent Policy Optimisation with Robust Lagrangian (MDPO-Robust-Lag), which leverages mirror descent–based algorithms to jointly optimise the policy (as a maximiser) and the transition kernel (as an adversarial minimiser) over the Lagrangian of a constrained Markov decision process. The proposed method achieves convergence guarantees comparable to those known for standard constrained Markov decision processes. The work also introduces an algorithm for designing adversarial environments for general Markov decision processes.
Empirical evaluations on inventory management and continuous control tasks demonstrate the effectiveness of mirror descent policy optimisation for both constrained and unconstrained settings. In particular, MDPO-Robust-Lag delivers significant improvements in constrained performance under robustness tests when compared with baseline policy optimisation algorithms.
With real-world and virtual high-stakes applications in mind, the enhanced safety and robustness offered by this approach show strong promise for deploying autonomous agents whose decisions better align with user-defined safety requirements and societal norms, even when simulation models are imperfect or poorly specified, such as during sim-to-real transfer.
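For readers unfamiliar with the max-min structure described in the abstract, the following is a minimal illustrative sketch, not the authors' MDPO-Robust-Lag implementation. It assumes a toy single-state problem, a finite set of candidate environment models standing in for the transition kernel, and a fixed Lagrange multiplier. All names and numbers (`mirror_step`, `lagrangian_payoffs`, `joint_update`, the reward and cost tables) are hypothetical. It shows one mirror descent step with a KL divergence on the probability simplex, applied as ascent for the policy and as descent for the adversary.

```python
import math


def mirror_step(probs, grads, step_size, maximise):
    """One KL mirror descent step on the simplex (exponentiated gradient)."""
    sign = 1.0 if maximise else -1.0
    exponents = [sign * step_size * g for g in grads]
    # Subtract the largest exponent before exponentiating, for numerical stability.
    shift = max(exponents)
    weights = [p * math.exp(e - shift) for p, e in zip(probs, exponents)]
    total = sum(weights)
    return [w / total for w in weights]


def lagrangian_payoffs(rewards, costs, multiplier, budget):
    """Per-model, per-action Lagrangian: reward - multiplier * (cost - budget)."""
    return [
        [r - multiplier * (c - budget) for r, c in zip(reward_row, cost_row)]
        for reward_row, cost_row in zip(rewards, costs)
    ]


def joint_update(policy, adversary, payoffs, step_size):
    """Policy ascends and adversary descends on the same expected Lagrangian."""
    n_models, n_actions = len(adversary), len(policy)
    # Gradient of the expected payoff with respect to each action probability.
    policy_grad = [
        sum(adversary[k] * payoffs[k][a] for k in range(n_models))
        for a in range(n_actions)
    ]
    # Gradient of the expected payoff with respect to each model probability.
    adversary_grad = [
        sum(policy[a] * payoffs[k][a] for a in range(n_actions))
        for k in range(n_models)
    ]
    new_policy = mirror_step(policy, policy_grad, step_size, maximise=True)
    new_adversary = mirror_step(adversary, adversary_grad, step_size, maximise=False)
    return new_policy, new_adversary


# Hypothetical toy data: rows are candidate models, columns are actions.
rewards = [[1.0, 2.0], [1.0, 0.5]]
costs = [[0.0, 1.0], [0.0, 1.0]]
payoffs = lagrangian_payoffs(rewards, costs, multiplier=0.5, budget=0.5)

new_policy, new_adversary = joint_update(
    policy=[0.5, 0.5], adversary=[0.5, 0.5], payoffs=payoffs, step_size=1.0
)
```

In this toy setting, the policy shifts probability towards the action with the higher expected Lagrangian value, while the adversary shifts probability towards the model under which the current policy performs worse. A full method would also update the multiplier and operate over states and transition kernels, which this sketch deliberately omits.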
> Read the full paper here.
> Learn more about the J2C Certification.