16 Papers Accepted at ICLR 2026
Congratulations to the following scientists from A*STAR Centre for Frontier AI Research (A*STAR CFAR) on having their papers accepted at the International Conference on Learning Representations (ICLR):
- Prof Ivor Tsang, Director, A*STAR CFAR
- Prof Ong Yew Soon, Chief Artificial Intelligence (AI) Scientist and Advisor
- Dr Cheston Tan, Senior Principal Scientist
- Dr Atsushi Nitanda, Principal Scientist
- Dr Feng Zeyu, Senior Scientist
- Dr He Tiantian, Senior Scientist
- Dr Pan Yuangang, Senior Scientist
- Dr Yin Haiyan, Senior Scientist
- Dr He Xin, Scientist
- Dr He Yang, Scientist
- Dr Lyu Yueming, Scientist
- Dr Qian Hangwei, Scientist
- Dr Yao Yinghua, Scientist
- Dr Yu Xingrui, Scientist
- Mr Chen Caishun, Lead Research Engineer
Held from 23 to 27 April 2026 in Brazil, ICLR 2026 brings together leading researchers, industry practitioners, and students to advance the field of representation learning. As one of the premier international conferences in artificial intelligence (AI) and machine learning (ML), ICLR is renowned for its open culture, high interactivity, and strong emphasis on innovation in deep learning and AI research.
List of accepted papers:
| No. | Paper Title | Authors | Summary |
| --- | --- | --- | --- |
| 1 | SPIRAL: Self-Play on Zero-Sum Games Incentivises Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Bo Liu, Simon Yu, Zichen Liu, Leon Guertler**, Penghui Qi, Daniel Balcells, Mickel Liu, Cheston Tan, Weiyan Shi, Min Lin, Wee Sun Lee, Natasha Jaques | Self-play on multiple zero-sum language games enables large language models to acquire transferable reasoning skills, improving mathematical and general reasoning benchmarks by up to 10% without domain-specific training data. |
| 2 | SeRI: Gradient-Free Sensitive Region Identification in Decision-Based Black-Box Attacks | Feiyang Wang***, Xingquan Zuo, Hai Huang, Gang Chen, Hangwei Qian | This paper presents Sensitive Region Identification (SeRI), a gradient-free approach that identifies sensitive regions to efficiently enhance adversarial perturbations in decision-based black-box attacks. |
| 3 | Alternating Diffusion for Proximal Sampling with Zeroth Order Queries | Hirohane Takagi*, Atsushi Nitanda | This paper proposes an approximate proximal sampler using only zeroth-order potential evaluations, simulating the backward heat-flow step via a Gaussian-mixture model to obtain a Monte Carlo score estimator from sampleable components. With controlled score error, the method retains exponential convergence under isoperimetric conditions. |
| 4 | Smooth Calibration Error: Uniform Convergence and Functional Gradient Analysis | Futoshi Futami, Atsushi Nitanda | Calibration is crucial for reliable probabilistic prediction, especially in high-risk settings. This work studies smooth calibration error (CE) and establishes a uniform convergence bound: the population smooth CE is bounded by the training smooth CE plus a generalisation gap. |
| 5 | BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs | Junxiao Yang*, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang | This paper presents BARREL, a framework that enhances the factual reliability of Large Reasoning Models by promoting concise, boundary-aware reasoning, allowing models to stay accurate while using “I don’t know” responses under uncertainty. |
| 6 | Ghost in the Cloud: Your Geo-Distributed Large Language Models Training is Easily Manipulated | Zichen Tang, Zhenheng Tang, Gaoning Pan, Buhua Liu, Xin He, Kunfeng Lai, Xiaowen Chu, Bo Li | This paper investigates jailbreak risks in geo-distributed and federated training of large language models (LLMs). It demonstrates that even a single malicious client can inject harmful behaviour (jailbreak knowledge) into the global model during collaborative training. |
| 7 | Masked Skill Token Training for Hierarchical Off-Dynamics Transfer | Zeyu Feng, Haiyan Yin, Yew-Soon Ong, Harold Soh | An offline hierarchical reinforcement learning (RL) framework that enables policy transfer using observation demonstrations. It proposes a masked Bellman update approach that abstracts dynamics shifts as constraints over a tokenisation space of temporally extended skills. |
| 8 | Verification and Co-Alignment via Heterogeneous Consistency for Preference-Aligned LLM Annotations | Cheng Chen***, Haiyan Yin, Ivor Tsang | This paper proposes a heterogeneous consistency estimation framework to improve the verification and alignment of preference-based annotations generated by large language models. |
| 9 | DSA: Efficient Inference for Video Generation Models via Distributed Sparse Attention | Shenggui Li***, Runyu Lu, Qiaoling Chen, Haiyan Yin, Yueming Lyu, Yonggang Wen, Ivor Tsang, Tianwei Zhang | This paper introduces distributed sparse attention to significantly accelerate inference in diffusion-based video generation models. |
| 10 | FlowSearcher: Synthesising Memory-Guided Agentic Workflows for Web Information Seeking | Keyi Xiang***, Zeyu Feng, Zhuoyi Lin, Yueming Lyu, Boyuan Shi***, Yew-Soon Ong, Ivor Tsang, Haiyan Yin | This paper presents an agentic framework that optimises multi-step web information-seeking workflows through memory-guided planning and execution. |
| 11 | Dataset Colour Quantisation: A Training-Oriented Framework for Dataset-Level Compression | Chenyue Yu*, Lingao Xiao***, Jinhong Deng*, Ivor Tsang, Yang He | Dataset Colour Quantisation (DCQ) is a unified framework that compresses visual datasets by reducing colour-space redundancy while preserving information crucial for model training. |
| 12 | Mitigating Mismatch within Reference-based Preference Optimisation | Suqin Yuan*, Xingrui Yu, Jiyang Zheng, Lei Feng, Dadong Wang, Ivor Tsang, Tongliang Liu | This paper shows that standard Direct Preference Optimisation (DPO) can stop learning too early when the reference model is wrong, and proposes HyPO, a one-line modification that conditionally ignores pessimistic reference signals to better align training with inference while keeping DPO’s stability and efficiency. |
| 13 | Sample Reward Soups: Query-efficient Multi-Reward Guidance for Text-to-Image Diffusion Models | Yinghua Yao, Yuangang Pan, Guoji Fu, Ivor Tsang | This paper introduces SRSoup, the first inference-time soup strategy for Pareto-optimal sampling across the entire space of preferences, significantly reducing the number of queries required in the early stages without sacrificing performance. |
| 14 | FZOO: Fast Zeroth-Order Optimiser for Fine-Tuning Large Language Models towards Adam-Scale Speed | Sizhe Dang, Yangyang Guo, Yanjun Zhao, Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang | FZOO achieves fine-tuning speed within the same order of magnitude as Adam for LLMs while using only inference-level GPU memory, enabling efficient large-scale LLM optimisation. |
| 15 | TS2: Training with Sparsemax+, Testing with Softmax for Accurate and Diverse LLM Fine-Tuning | Ziyang Xu***, Ananthu Rajendran Pillai***, Yinghua Yao, Yuangang Pan | This paper introduces a practical, drop-in approach that makes fine-tuned LLMs both more accurate and more creative. |
| 16 | WaterDrum: Watermark-based Data-centric Unlearning Metric | Xinyang Lu, Xinyuan Niu***, Gregory Kang Ruey Lau, Nhung Bui, Rachael Hwee Ling Sim, John Russell Himawan, Fanyu Wen, Chuan-Sheng Foo**, See-Kiong Ng, Bryan Kian Hsiang Low | This paper proposes the first data-centric unlearning metric for LLMs based on watermarking, enabling effective and measurable data removal. |
* denotes former student at A*STAR CFAR
** denotes former researcher at A*STAR CFAR
*** denotes current student at A*STAR CFAR
(accurate at the time of posting)
Learn more about ICLR 2026.