27 Papers Accepted at ICML 2026

Congratulations to the following scientists from A*STAR Centre for Frontier AI Research (A*STAR CFAR) on having their papers accepted at the International Conference on Machine Learning (ICML):

Prof Ivor Tsang, Director, A*STAR CFAR
Prof Ong Yew Soon, Chief Artificial Intelligence (AI) Scientist and Advisor
Dr Cheston Tan, Senior Principal Scientist
Dr Joey Zhou, Deputy Director, A*STAR CFAR and Principal Scientist
Dr Atsushi Nitanda, Principal Scientist
Dr Du Jiawei, Senior Scientist
Dr Feng Zeyu, Senior Scientist
Dr Yin Haiyan, Senior Scientist
Dr He Yang, Senior Scientist
Dr Lyu Yueming, Senior Scientist
Dr Qian Hangwei, Scientist
Dr Yu Xingrui, Scientist
Dr Zhang Jie, Scientist and Innovation Lead
Mr Poon Tze-Yang, Research Engineer

Held from 6 to 11 July 2026 in Seoul, ICML 2026 is globally recognised as one of the premier conferences showcasing cutting-edge research across machine learning, artificial intelligence, statistics, and data science, as well as key application domains including computer vision, computational biology, speech recognition, and robotics.

List of accepted papers:

1.	MEMO: Memory-Augmented Model Context Optimisation for Robust Multi-Turn Multi-Agent LLM Games
	Yunfei Xie, Kevin Wang, Bobby Cheng, Jianzhu Yao, Zhizhou Sha, Alexander Duffy, Yihan Xi, Hongyuan Mei, Cheston Tan, Chen Wei, Pramod Viswanath, Zhangyang Wang
	Multi-agent LLM game evaluations are unstable, as early deviations and prompt differences amplify across interactions, biasing outcomes and rankings. MEMO improves performance and stability by optimising inference context through memory retention and exploration, significantly boosting win rates with limited self-play.
2.	Bias in Zeroth-Order Normal Estimation for Decision-Based Attacks
	Feiyang Wang*, Hangwei Qian, Xingquan Zuo, Gang Chen, Ivor Tsang**
	We propose SAR, a plug-in query-efficient refinement that leverages sensitivity-aware rescaling to produce less perceptible hard-label adversarial examples across datasets and models.
3.	Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training
	Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda*, Hau-San Wong, Qingfu Zhang, Taiji Suzuki
	We show that, under outcome-only reward signals, reinforcement learning finetuning with both curriculum strategies achieves high accuracy with polynomial sample complexity, whereas the non-curriculum counterpart encounters an exponential complexity bottleneck.
4.	Rethinking LLM Ensembling from the Perspective of Mixture Models
	Jiale Fu, Yuchu Jiang, PeiJun WU, Chonghan Liu, Joey Tianyi Zhou, Xu Yang
	In this paper, we propose the Mixture-model-like Ensemble (ME). By reinterpreting the ensemble as a mixture model, ME stochastically selects a single model at each step to generate the next token, thereby avoiding the need to explicitly compute the full ensemble distribution.
5.	Evolving Quantitative Reasoning through Self-Play in Digital Twin Markets
	Tianmi Ma, Wenxin Huang, Jiawei Du, Lin Li, Xian Zhong, Joey Tianyi Zhou
	Large Language Models (LLMs) exhibit strong capabilities in high-level semantic understanding and strategic planning, yet they suffer from persistent quantitative failure modes, such as imprecise computation and the illusion of quantitative coherence, which limit their reliability in high-stakes decision-making. To address these limitations, we decouple reasoning from computation by assigning LLMs to planning, analysis, and result interpretation, while delegating numerical computation and statistical inference to specialised external tools.
6.	OBJVanish: Prompt-Driven Generation of Physically Realisable 3D LiDAR-Invisible Objects
	Bing Li, Wuqi Wang, Yanan Zhang, Jingzheng Li, Haigen Min, Wei Feng, Xingyu Zhao, Jie Zhang, Qing Guo**
	We introduce OBJVanish, a prompt-driven text-to-3D adversarial generation framework that enables physically realisable attacks by generating 3D object models that are effectively invisible to LiDAR-based 3D object detectors.
7.	When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
	Haoran Ou, Kangjie Chen, Xingshuo Han, Gelei Deng, Jie Zhang, Han Qiu, Tianwei Zhang, Kwok-Yan Lam
	Existing red-teaming methods are largely designed for standalone LLMs as they primarily focus on unsafe generation, ignoring risks emerging from the complex search workflow. To address this gap, we propose CREST-Search, a pioneering red-teaming framework for LLMs with web search.
8.	State-Dependent Safety Failures in Multi-Turn Language Model Interaction
	Pengcheng Li, Jie Zhang, Tianwei Zhang, Han Qiu, Zhang Kejun, Weiming Zhang, Nenghai Yu, Wenbo Zhou
	We introduce STAR, a state-oriented diagnostic framework that treats dialogue history as a state transition operator and enables controlled analysis of safety behaviour along interaction trajectories.
9.	Approximate Proportionality in Online Fair Division
	Davin Choo, Winston Fu, Derek Khu, Tzeh Yuan Neoh, Tze-Yang Poon**, Nicholas Teh
	We show the first positive results for online fair division of indivisible goods, achieving approximately proportional allocation of goods to each agent.

10.	Generative Online Reinforcement Learning
	Chubin Zhang, Zhenglin Wan, Feng Chen, Fuchao Yang, Lang Feng, Yaxin Zhou, Xingrui Yu, Yang You, Ivor Tsang*, Bo An
	Building on the structural principle of decoupling optimisation from generation, we introduce GoRL (Generative Online Reinforcement Learning), an algorithm-agnostic framework that trains expressive policies from scratch by confining policy optimisation to a tractable latent space while delegating action synthesis to a conditional generative decoder.
11.	Letting Trajectories Spread: Quality-Preserving Control for Diverse Flow Matching
	Jingxuan Wu, Zhenglin Wan, Xingrui Yu, Yuzhe Yang, Bo An, Ivor Tsang*, Yang You
	We present a training-free, inference-time control mechanism that makes the flow itself diversity-aware by encouraging diversity through guidance that is geometrically decoupled from the model’s quality-seeking direction.
12.	Flow Inverse Reinforcement Learning
	Zhenglin Wan, Jingxuan Wu, Xingrui Yu, Chubin Zhang, Mingcong Lei, Bo An, Ivor Tsang*, Yang You
	Flow Inverse Reinforcement Learning introduces a teacher–student framework that uses a Flow Matching model trained on expert demonstrations to derive a reward model and behaviour regulariser, while a lightweight MLP student policy performs efficient online RL exploration. This design preserves distributional expressiveness while overcoming poor generalisation, gradient instability, and high inference cost, leading to better efficiency, robustness, and performance, especially with suboptimal demonstrations.
13.	Advancing Analytic Class-Incremental Learning through Vision-Language Calibration
	Binyu Zhao, Wei Zhang, Xingrui Yu, Zhaonian Zou, Ivor Tsang
	We propose VILA, a dual-branch framework for class-incremental learning that addresses the failure modes of analytic learning with pre-trained models, specifically representation rigidity, through a two-level vision-language calibration strategy. It fuses task-adapted and frozen semantic features geometrically while using cross-modal priors to correct prediction bias, preserving analytic learning's efficiency while improving robustness across diverse benchmarks.
14.	Self-Calibrated Consistency can Fight Back for Adversarial Robustness in Vision-Language Models
	Jiaxiang Liu, Jiawei Du*, Xiao Liu, Shangyang Li, Songchen Ma, Changshuo Wang, Prayag Tiwari, Mingkun Xu
	Pre-trained VLMs like CLIP achieve strong zero-shot performance but are highly vulnerable to adversarial perturbations due to semantic and viewpoint fragility. We propose Self-Calibrated Consistency (SCC), a plug-and-play test-time defence that enforces semantic and spatial consistency to stabilise cross-modal alignment. SCC significantly improves zero-shot robustness across 22 benchmarks while preserving accuracy, and generalises well to other VLMs such as BioMedCLIP.
15.	Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation
	Zhixuan Shen*, Jiawei Du, Ziyu Guo, Han Luo, Lilan Peng, Joey Tianyi Zhou**, Haonan Luo, Tianrui Li
	VLMs exhibit strong reasoning abilities, but embodied navigation remains constrained by limited aligned vision–control data and poor transfer from photorealistic simulators. To address this, we propose SAGE, a framework that trains agents in physics-grounded semantic abstractions, mimicking human mental simulation. SAGE operates through Genesis, Evolution (with RL and adaptive clipping), and Navigation, achieving significantly improved performance and generalisation, including a +9.7% gain on A-EQA.
16.	DIVER: Diving Deeper into Distilled Data via Expressive Semantic Recovery
	Qianxin Xia, Zhiyong Shu, Wenbo Jiang, Jiawei Du, Jielei Wang, Guoming Lu
	Dataset distillation aims to synthesise compact, privacy-preserving proxy data, but existing single-stage methods often overfit to specific architectures and lose semantic generality. We propose a dual-stage framework that leverages pretrained diffusion models for expressive semantic recovery via semantic inheritance, guidance, and fusion, filtering architecture-specific noise while preserving intrinsic semantics. This design significantly improves cross-architecture generalisation with high efficiency, matching raw DiT processing time on ImageNet while using only 4 GB of GPU memory.
17.	CHESS: Chebyshev Spectral Synthesis for Trajectory Condensation
	Ruituo Wu, Hongyu Zhang, Qiang Wang, Jiawei Du, Wei Cui, Ce Zhu, Bing Li
	Learning from compressed, irregularly sampled sensor trajectories is fundamentally misaligned when optimised as discrete values, often producing non-physical artifacts and poor generalisation. We propose CHESS, a function-first synthesis framework that models continuous-time trajectories via low-rank spatial coherence and piecewise Chebyshev temporal parameterisation, enforcing physically meaningful structure. With theoretical smoothness guarantees, CHESS achieves superior performance under high compression while enabling strong cross-architecture generalisation and zero-shot adaptation across sampling rates.
18.	Beyond Soft Labels: Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression
	Lingao Xiao*, Songhua Liu, Yang He**, Xinchao Wang
	Dataset pruning (DP) and dataset distillation (DD) are becoming increasingly connected as recent DD methods rely more on original images and soft labels. We introduce a unified dataset compression benchmark, showing that soft labels can weaken the need for distillation and that pruning often performs better at small scales.

19.	VIA-SD: Verification via Intra-Model Routing for Speculative Decoding
	Yuchen Xian*, Yang He**, Yunqiu Xu, Yi Yang
	VIA-SD improves speculative decoding by introducing an intra-model routed slim verifier between direct acceptance and full-model verification. Instead of fully recomputing all rejected draft tokens, VIA-SD uses the slim verifier for medium-confidence cases, reducing expensive full-model calls. This multi-tier verification framework achieves consistent inference speedups.
20.	From 2D Grids to 1D Tokens: Reforming Shared Representations for Multimodal Image Fusion
	Yuchen Xian*, Yunqiu Xu, Yang He**, Yi Yang
	This work improves multimodal image fusion by replacing conventional 2D grid-based shared representations with compact 1D image tokens. Using TiTok tokenisation and Selective Token Editing, it updates only a small set of critical tokens to enhance global appearance consistency while preserving local details.
21.	Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks
	Jiejing Shao*, Haiyan Yin, Yueming Lyu, Xingrui Yu, Lanzhe Guo, Ivor Tsang**, James Kwok, Yufeng Li
	This work introduces Neuro-Symbolic Skill Induction (NSI), a framework that transforms transient interaction traces into modular, logic-grounded programs, enabling agentic systems to discover structured control flows and generalise complex cognitive skills from minimal examples.
22.	EvoCF: Multi-Agent Collaboration via Agentic Memory-Driven Evolutionary Counterfactual Planning
	Haotian Chi, Zeyu Feng, Xingrui Yu, Linibo Luo, Yew-Soon Ong, Ivor Tsang, Hechang Chen, Yi Chang, Haiyan Yin*
	We introduce EvoCF, an agentic, memory-driven embodied planning system that discovers robust multi-agent collaboration strategies by synthesising symbolic constraints from failures into an evolving rule library. Through evolutionary counterfactual reasoning and rule-conditioned mutations, the framework systematically explores plan variants to overcome the physical and coordination constraints that typically cause one-shot LLM planners to fail.
23.	HypoSpace: A Diagnostic Benchmark for Set-Valued Hypothesis Generation under Underdetermination
	Tingting Chen, Beibei Lin, Zifeng Yuan, Qiran Zou, Hongyu He, Anirudh Goyal, Yew-Soon Ong, Dianbo Liu
	HypoSpace is a diagnostic benchmark developed for ICML 2026 to evaluate Large Language Models (LLMs) on their ability to generate multiple valid hypotheses under underdetermined settings. It tests LLMs as samplers on structured tasks (causal graph inference, 3D voxel reconstruction, Boolean genetics) using three key metrics: Validity, Uniqueness, and Recovery.
24.	Breaking Multi-Task Curse: Reward-Weighted Evolution for Black-Box Many-Task Optimisation
	Yanchi Li, Jiao Liu, Wenyin Gong, Qiong Gu, Yue Zhao, Yew-Soon Ong
	This paper identifies the “Multi-Task Curse” in evolutionary multi-tasking, where optimisation performance degrades across many low-similarity tasks due to evaluation budget dispersion and negative transfer. To address this, the authors propose MES-RET, a framework that uses reward-weighted evaluation, robust transfer mechanisms, and semantic parameter alignment to improve scalability and enable effective knowledge transfer across diverse optimisation and reinforcement learning tasks.
25.	Possibilistic Predictive Uncertainty for Deep Learning
	Yao Ni, Jeremie Houssineau, Yew-Soon Ong, Piotr Koniusz
	This paper introduces Dirichlet-approximated possibilistic posterior predictions (DAPPr), a new framework for modelling epistemic uncertainty in deep neural networks that aims to balance principled uncertainty estimation with computational efficiency. By leveraging possibility theory and approximating projected posteriors with learnable Dirichlet possibility functions, DAPPr achieves strong uncertainty quantification performance comparable to or better than existing evidential deep learning methods while retaining a mathematically grounded and efficient training approach.
26.	Olaf-World: Orienting Latent Actions for Video World Modelling
	Yuxin Jiang*, Yuchao Gu, Ivor Tsang**, Mike Zheng Shou
	Seq-REPA addresses the lack of action labels in world models by aligning latent actions using observable control effects across video sequences. It anchors latent actions to temporal feature differences, improving cross-context consistency. Built on this, Olaf-World pretrains action-conditioned models from passive video, enabling stronger zero-shot action transfer and more data-efficient adaptation than existing approaches.
27.	SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
	Shenggui Li*, Chao Wang, Yikai Zhu, Yubo Wang, Fan Yin, Shuai Shi, Yonggang Wen, Ivor Tsang**, Tianwei Zhang
	Speculative decoding accelerates LLM inference by using a lightweight draft model to propose tokens verified in parallel by a larger model, reducing compute overhead. SpecForge enables efficient training with optimisations like target–draft decoupling and hybrid parallelism, achieving major speedups. SpecBundle provides high-quality draft models, improving adoption and delivering up to 4.48× inference and 9.9× training speed gains.

* denotes former student at A*STAR CFAR
** denotes former researcher at A*STAR CFAR
*** denotes current student at A*STAR CFAR
(accurate at the time of posting)

Learn more about ICML 2026.
