CMU researchers are presenting 156 papers at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), held from December 2nd to December 7th at the San Diego Convention Center. Here's a quick overview of the areas our researchers are working on:
Below are our most frequent collaborator institutions:
This paper introduces an Encoder–Attender–Decoder (EAD) framework to study task-optimized neural networks for tactile processing using realistic whisker-based simulations. Convolutional recurrent neural networks (ConvRNNs) emerge as the best encoders, both for tactile categorization and for producing representations that closely match activity in rodent somatosensory cortex, revealing a linear link between task performance and neural alignment. Notably, self-supervised contrastive ConvRNN models achieve neural fits comparable to supervised training, indicating that label-free learning can capture biologically relevant tactile representations. These findings highlight the importance of recurrent processing for understanding cortical tactile computation and for building robust embodied AI systems.
Authors: Yuxuan Zhou (CISPA Helmholtz Center for Information Security), Heng Li (Carnegie Mellon University), Zhi-Qi Cheng (University of Washington), Xudong Yan (City University of Macao), Yifei Dong (Carnegie Mellon University), Mario Fritz (CISPA Helmholtz Center for Information Security), Margret Keuper (University of Mannheim)
Label Smoothing (LS) is commonly used to reduce overconfidence and improve generalization, but it can paradoxically increase confidence in misclassified samples and collapse feature representations. This work analytically decomposes the LS loss, revealing an error-amplification term that strengthens incorrect predictions and drives representation collapse. To overcome this, the authors propose Max Suppression (MaxSup), which regularizes predictions uniformly by penalizing the top-1 logit instead of the ground-truth logit. Experiments show that MaxSup preserves intra-class diversity, improves class separation, and consistently outperforms LS across large-scale classification and downstream tasks.
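A minimal sketch of the idea of penalizing the top-1 logit rather than the ground-truth logit; the coefficient and exact formulation here are assumptions for illustration, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def maxsup_loss(logits, targets, alpha=0.1):
    # Cross-entropy plus a uniform penalty on the top-1 logit (relative to the
    # mean logit), applied whether or not the prediction is correct, in place of
    # label smoothing's penalty on the ground-truth logit.
    ce = F.cross_entropy(logits, targets)
    penalty = (logits.max(dim=-1).values - logits.mean(dim=-1)).mean()
    return ce + alpha * penalty
```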
Authors: Liwei Jiang (University of Washington), Yuanjun Chai (University of Washington), Margaret Li (University of Washington), Mickel Liu (University of Washington), Raymond Fok (University of Washington), Nouha Dziri (Allen Institute for AI), Yulia Tsvetkov (Department of Computer Science, University of Washington), Maarten Sap (Carnegie Mellon University), Yejin Choi (UW => Stanford / NVIDIA)
This paper introduces INFINITY-CHAT, a large-scale dataset of 26,000 diverse open-ended user queries and a comprehensive taxonomy of prompt types to evaluate creativity and diversity in language model outputs. Using this resource, the authors identify a pronounced “Artificial Hivemind” effect marked by both repetitive responses within a single model and striking similarities across different models. The dataset also includes over 31,000 human annotations enabling analysis of collective and individual preferences. Results show that existing models and evaluation methods are poorly calibrated to idiosyncratic human judgments, highlighting risks of homogenized AI outputs.
Authors: Zhengyang Geng (CMU), Mingyang Deng (Massachusetts Institute of Technology), Xingjian Bai (Massachusetts Institute of Technology), Zico Kolter (Carnegie Mellon University), Kaiming He (MIT)
The authors introduce MeanFlow, a principled one-step generative modeling framework based on the notion of average velocity rather than the instantaneous velocity used in prior flow-matching methods. The authors derive a formal identity linking average and instantaneous velocities to guide neural network training in a self-contained approach requiring no pretraining, distillation, or curriculum learning. MeanFlow achieves strong results, including a 3.43 FID on ImageNet 256×256 with a single function evaluation, outperforming previous one-step models. These results significantly narrow the performance gap between one-step and multi-step diffusion and flow-based methods.
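As a hedged sketch of the average-velocity idea described above (notation assumed, not copied from the paper), the average velocity over an interval and its relation to the instantaneous velocity can be written as:

```latex
% Average velocity over [r, t] induced by the instantaneous velocity v:
u(z_t, r, t) = \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\, d\tau
% Differentiating (t - r)\,u with respect to t gives the identity relating the two,
% which can serve as a training target without simulating the full integral:
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt}\, u(z_t, r, t)
```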
Authors: Xinyuan Wang (University of Hong Kong), Bowen Wang (University of Hong Kong), Dunjie Lu (SUN YAT-SEN UNIVERSITY), Junlin Yang (Tsinghua University), Tianbao Xie (the University of Hong Kong, University of Hong Kong), Junli Wang (Alibaba Group), Jiaqi Deng (The University of Hong Kong), Xiaole Guo (University of Hong Kong), Yiheng Xu (University of Hong Kong), Chen Wu (Carnegie Mellon University), Zhennan Shen (Shanghai Jiaotong University), Zhuokai Li (University of Hong Kong), Ryan Li (Computer Science Department, Stanford University), Xiaochuan Li (Tsinghua University), Junda Chen (Harbin Institute of Technology), Boyuan Zheng (The University of Hong Kong), Li Peihang (University of Hong Kong), Fangyu Lei (Institute of automation, Chinese academy of science, Chinese Academy of Sciences), Ruisheng Cao (Shanghai Jiaotong University), Yeqiao Fu (University of Hong Kong), Dongchan Shin (University of Hong Kong), Martin Shin (University of Hong Kong), Hu Jiarui (University of Hong Kong), Yuyan Wang (Johns Hopkins University), Jixuan Chen (University of California, San Diego), Yuxiao Ye (The Hong Kong University of Science and Technology), Danyang Zhang (Shanghai Jiao Tong University), Yipu Wang (Institute of automation, Chinese academy of science, Chinese Academy of Sciences), Heng Wang (University of Illinois Urbana-Champaign), Diyi Yang (Stanford University), Victor Zhong (University of Waterloo), Y.Charles (Moonshot AI), Zhilin Yang (Tsinghua University, Tsinghua University), Tao Yu (University of Hong Kong)
This paper introduces OpenCUA, an open-source framework designed to enable transparent research into computer-use agents built with vision–language models. The framework includes an annotation system for collecting human demonstrations, AgentNet, a large-scale dataset spanning three operating systems and 200+ applications, and a scalable pipeline that converts demonstrations into state–action data with reflective chain-of-thought reasoning. End-to-end agent models trained with OpenCUA show strong benchmark performance, with OpenCUA-72B achieving a 45.0% success rate on OSWorld-Verified, setting a new open-source state-of-the-art.
Authors: Jiatong Shi (Carnegie Mellon University), Yifan Cheng (Huazhong University of Science and Technology), Bo-Hao Su (Carnegie Mellon University), Hye-jin Shim (Carnegie Mellon University), Jinchuan Tian (Carnegie Mellon University), Samuele Cornell (Università Politecnica delle Marche), Yiwen Zhao (School of Computer Science, Carnegie Mellon University), Siddhant Arora (Carnegie Mellon University), Shinji Watanabe (Carnegie Mellon University)
This work presents ARECHO, an autoregressive chain-based framework for jointly evaluating multiple speech quality metrics such as PESQ (Perceptual Evaluation of Speech Quality), STOI (Short-Time Objective Intelligibility), and MOS (Mean Opinion Score), which traditionally differ in scale and assumptions. ARECHO introduces a comprehensive tokenization pipeline, a dynamic classifier chain to model inter-metric dependencies, and a confidence-oriented two-step decoding scheme to improve inference reliability. Experiments show that ARECHO consistently outperforms baseline methods across speech enhancement, generation evaluation, and noisy-speech scenarios. The method also improves interpretability and flexibility by enabling reference-free evaluation and subset metric queries.
Authors: Brandon Wood (FAIR at Meta), Misko Dzamba (Facebook), Xiang Fu (Periodic Labs), Meng Gao (Facebook), Muhammed Shuaibi (FAIR, Meta), Luis Barroso-Luque (Facebook), Kareem Abdelmaqsoud (Carnegie Mellon University), Vahe Gharakhanyan (Meta), John Kitchin (Carnegie Mellon University), Daniel Levine (Meta FAIR), Kyle Michel (Meta), Anuroop Sriram (Meta FAIR), Taco Cohen (Meta / FAIR), Abhishek Das (FAIR, Meta AI), Sushree Sahoo (Facebook), Ammar Rizvi (Meta), Zachary Ulissi (FAIR, Meta AI), Larry Zitnick (Fundamental AI Research at Meta AI)
This paper introduces Universal Models for Atoms (UMA), a family of large-scale models designed to quickly and accurately predict properties from atomic simulations across chemistry and materials science. Trained on over 500 million unique 3D atomic structures spanning molecules, materials, and catalysts, UMA leverages empirical scaling laws and a novel mixture-of-linear-experts architecture to increase capacity without sacrificing speed. Evaluations show that a single UMA model, without fine-tuning, matches or outperforms specialized models across diverse applications.
Authors: Arnav Kumar Jain (Université de Montréal), Vibhakar Mohta (Nuro Inc.), Subin Kim (Korea Advanced Institute of Science & Technology), Atiksh Bhardwaj (Cornell University), Juntao Ren (Stanford University), Yunhai Feng (Cornell University), Sanjiban Choudhury (Cornell University), Gokul Swamy (Carnegie Mellon University)
This work addresses a key limitation of behavioral cloning (BC) in imitation learning: BC only teaches an agent to mimic expert actions at states the expert visited, leaving it unable to recover from mistakes. To overcome this, the authors propose SAILOR, which leverages learning to search (L2S) by training a world model and a reward model to plan and recover toward expert outcomes even after mistakes. SAILOR achieves stable and sample-efficient learning without additional human corrections and consistently outperforms state-of-the-art diffusion-policy BC methods across visual manipulation benchmarks. It also demonstrates robustness to nuanced failures and reward hacking, and the performance gap persists even when BC is trained with 5–10x more demonstrations.
Authors: Jiajun Shi (Beijing University of Aeronautics and Astronautics), Jian Yang (Alibaba Group), Jiaheng Liu (Nanjing University), Xingyuan Bu (Alibaba Group), Jiangjie Chen (ByteDance Seed), Junting Zhou (Peking University), Kaijing Ma (Tongji University), Zhoufutu Wen (ByteDance Inc.), Bingli Wang (Sichuan Agricultural University), Yancheng He (Alibaba Group), Liang Song (M-A-P), Hualei Zhu (Beijing University of Aeronautics and Astronautics), Shilong Li (Beijing University of Posts and Telecommunications), Xingjian Wang (Shanghai University of Electric Power), Wei Zhang (Beijing University of Aeronautics and Astronautics), Ruibin Yuan (Carnegie Mellon University), Yifan Yao (Beijing University of Posts and Telecommunications), Wenjun Yang (University College London, University of London), Yunli Wang (Kuaishou Technology), Siyuan Fang (Beijing University of Posts and Telecommunications), Siyu Yuan (Fudan University), Qianyu He (Fudan University), Robert Tang (Yale University), Yingshui Tan (Alibaba Group), Wangchunshu Zhou (Guangdong OPPO Mobile Telecommunications Corp.,Ltd.), ZHAO-XIANG ZHANG (Chinese Academy of Sciences, China), Zhoujun Li (Beijing University of Aeronautics and Astronautics), Wenhao Huang (Key Laboratory of Machine Perception), Ge Zhang (University of Michigan – Ann Arbor)
The authors introduce KORGym, a dynamic evaluation platform designed to comprehensively assess the reasoning abilities of large language models (LLMs) and vision-language models (VLMs). Unlike existing domain-specific benchmarks, KORGym provides over 50 interactive games in textual and visual formats, including multi-turn and reinforcement learning scenarios. Experiments on 19 LLMs and 8 VLMs reveal consistent reasoning patterns within model families and highlight the superior performance of closed-source models. The platform also enables analysis of factors such as modality, reasoning strategies, reinforcement learning approaches, and response length, providing a robust tool for advancing reasoning research in complex environments.
Authors: Zhiqiu Lin (Carnegie Mellon University), Siyuan Cen (University of Massachusetts at Amherst), Daniel Jiang (Carnegie Mellon University), Jay Karhade (CMU, Carnegie Mellon University), Hewei Wang (Carnegie Mellon University), Chancharik Mitra (CMU, Carnegie Mellon University), Yu Tong Tiffany Ling (CMU, Carnegie Mellon University), Yuhan Huang (Carnegie Mellon University), Rushikesh Zawar (Carnegie Mellon University), Xue Bai (Adobe Systems), Yilun Du (Google Deepmind / Harvard), Chuang Gan (IBM), Deva Ramanan (Carnegie Mellon University)
This work presents CameraBench, a large-scale dataset and benchmark for evaluating camera motion understanding, comprising approximately 3,000 diverse videos annotated through a rigorous expert-driven process. A key contribution is a taxonomy of camera motion primitives, developed with cinematographers, which captures motions that require both geometric and semantic understanding. Human studies show that domain expertise and targeted training significantly improve motion recognition, such as distinguishing zoom from forward translation. Evaluations reveal that Structure-from-Motion models struggle with semantic motions, while generative video-language models struggle with geometric ones, and fine-tuning a generative VLM on CameraBench enables strong performance across motion-augmented captioning, video QA, and video-text retrieval tasks.
Authors: Weiwei Sun (Carnegie Mellon University), Haokun Liu (Department of Computer Science, University of Toronto), Nikhil Kandpal (Department of Computer Science), Colin Raffel (University of Toronto, Vector Institute and Hugging Face), Yiming Yang (CMU)
This paper presents AirRep, a scalable representation-based method for training data attribution (TDA) that learns task-specific, model-aligned representations optimized for measuring how training data affects model predictions. AirRep incorporates a trainable encoder for attribution quality and an attention-based pooling mechanism to estimate group-wise influence accurately. Trained using a ranking objective over subsets labeled by their empirical effect, AirRep matches the performance of gradient-based methods like influence functions while being nearly 100× more efficient at inference.
Authors: Vijay Viswanathan (Carnegie Mellon University), Yanchao Sun (University of Maryland, College Park), Xiang Kong (Apple), Meng Cao (Apple), Graham Neubig (Carnegie Mellon University), Sherry Wu (Carnegie Mellon University)
This work introduces Reinforcement Learning from Checklist Feedback (RLCF), a method for improving instruction-following in language models using flexible, instruction-specific criteria rather than fixed metrics like helpfulness or harmfulness. RLCF extracts checklists from instructions and evaluates responses against each item using AI judges and verifier programs to compute rewards for reinforcement learning. Applied to models like Qwen2.5-7B-Instruct, RLCF improves performance across five benchmarks, achieving notable gains in hard satisfaction rates and win rates, and can also improve other models off-policy, such as Llama 3.1 8B Instruct and OLMo 2 7B Instruct. The authors release their WildChecklists dataset, models, and code to support further research in flexible instruction alignment.
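A minimal sketch of how a checklist can be turned into a scalar reward, assuming a hypothetical `judge` callback that scores one response against one checklist item (the paper's actual judging and aggregation details may differ):

```python
from typing import Callable, List

def checklist_reward(response: str, checklist: List[str],
                     judge: Callable[[str, str], float]) -> float:
    # 'judge' is an assumed callback (an AI judge or verifier program) that
    # returns a score in [0, 1] for how well the response satisfies one item.
    if not checklist:
        return 0.0
    scores = [judge(response, item) for item in checklist]
    return sum(scores) / len(scores)  # average item score used as the RL reward
```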
Authors: Ziyang Cai (Princeton University), Nayoung Lee (University of Wisconsin-Madison), Avi Schwarzschild (Carnegie Mellon University), Samet Oymak (University of Michigan – Ann Arbor), Dimitris Papailiopoulos (University of Wisconsin-Madison)
This paper studies length generalization in transformer language models (the ability to handle longer inputs than seen during training) through the concept of task association. The authors show that training on a longer, related auxiliary task can improve generalization to longer inputs on a target task across algorithmic domains like arithmetic, string manipulation, and maze navigation. They find similar transfer effects in pretrained language models, suggesting pretraining provides reusable computational scaffolding. Mechanistic analysis indicates that this length generalization transfer is linked to the reuse of attention heads between tasks, highlighting how transformers leverage compositional inductive structures.
Authors: Xinyu Yang (CMU), Yuwei An (Carnegie Mellon University), Hongyi Liu (Carnegie Mellon University), Tianqi Chen (Carnegie Mellon University), Beidi Chen (CMU / Amazon)
This work introduces Multiverse, a generative model that enables natively parallel generation by internalizing a MapReduce paradigm with Map, Process, and Reduce stages. The approach includes Multiverse Curator for automated data creation, Multiverse Attention for separating parallel reasoning steps, and Multiverse Engine for dynamic sequential-parallel inference. After minimal fine-tuning, Multiverse-32B matches leading autoregressive LLMs in performance while achieving up to 2× speedup and better scaling efficiency. The authors have open-sourced the full Multiverse ecosystem, including models, data, serving systems, and training pipelines.
Authors: Yujia Zheng (Carnegie Mellon University), Zhuokai Zhao (Meta), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Yaqi Xie (CMU), Mingze Gao (Meta Inc.), Lizhu Zhang (Meta), Kun Zhang (CMU & MBZUAI)
This work introduces thought communication, a paradigm for multi-agent interaction that goes beyond natural language by enabling agents to share latent, mind-like representations directly. The authors formalize this process as a latent variable model, proving that both shared and private thoughts, as well as the global structure of thought sharing among agents, can be identified and recovered with theoretical guarantees. They develop a framework that extracts and distributes relevant latent thoughts to agents, enhancing collaboration across modalities. Experiments on synthetic and real-world benchmarks validate the approach, showing that thought communication can unlock collaborative advantages beyond what is possible with surface-level language-based exchanges.
Authors: Eray Can Elumar (CMU, Carnegie Mellon University), Cem Tekin (Bilkent University), Osman Yagan (Carnegie Mellon University)
This paper introduces CaMVo, a method for labeling datasets with large language models (LLMs) while keeping costs low. Instead of querying many LLMs for every example, CaMVo adaptively chooses only a few models based on how confident they are likely to be. It uses ideas from contextual bandits (LinUCB) and a Bayesian confidence estimator to decide which models to query and how to weight their votes, without needing any ground-truth labels. Experiments on MMLU and IMDB show that CaMVo matches or beats full majority voting with far fewer LLM calls, making it a practical approach for efficient large-scale annotation.
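A minimal sketch of LinUCB-style scoring of the kind the summary mentions, where each "arm" is one LLM; the statistics, feature vector, and exploration weight are assumptions for illustration rather than CaMVo's actual formulation:

```python
import numpy as np

def linucb_scores(x, arms, alpha=1.0):
    # x: feature vector for the current example; arms: list of (A, b) ridge-
    # regression statistics per LLM. Score = predicted value + exploration bonus.
    scores = []
    for A, b in arms:
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b                        # per-arm weight estimate
        bonus = alpha * np.sqrt(x @ A_inv @ x)   # uncertainty-based bonus
        scores.append(float(theta @ x + bonus))
    return scores  # query the highest-scoring LLMs first
```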
The authors introduce C-MICL, a framework for learning constraints in optimization problems while guaranteeing that the resulting solutions remain feasible with high probability. Traditional learned constraints can fail due to model error or limited data, but C-MICL uses conformal prediction to add uncertainty-aware adjustments that ensure feasibility at a user-specified confidence level. The method works for both regression- and classification-based constraint learning and avoids the heavy computational overhead of ensemble approaches. Experiments show that C-MICL reliably meets feasibility targets, preserves strong optimization performance, and is significantly more efficient, offering a principled way to combine machine learning with safe decision-making.
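A hedged sketch of how a conformal adjustment can tighten a learned constraint of the form g(x) ≤ 0; the split-conformal recipe and names below are assumptions, not C-MICL's exact construction:

```python
import numpy as np

def conformal_tighten(g_hat, calib_true, calib_pred, alpha=0.1):
    # calib_true / calib_pred: true and predicted constraint values on a held-out
    # calibration set; q is a conformal quantile of the under-prediction errors,
    # so that g_hat(x) + q >= g_true(x) with probability >= 1 - alpha (exchangeability).
    residuals = np.asarray(calib_true) - np.asarray(calib_pred)
    n = len(residuals)
    k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
    q = np.sort(residuals)[k - 1]
    # Enforce the tightened constraint g_hat(x) + q <= 0 in the optimization model.
    return lambda x: g_hat(x) + q
```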
The authors present SuffixDecoding, a new speculative decoding method tailored for emerging AI workloads like LLM-based agents, which generate long, repetitive, and predictable sequences. Unlike existing speculative decoding approaches designed for diverse, independent requests, SuffixDecoding uses suffix trees to efficiently cache and reuse long stretches of previous tokens from prompts and model outputs. It adaptively adjusts how many tokens to speculate, expanding aggressively when predictions are likely to be accepted and backing off when uncertainty is higher. Experiments on agent-style tasks such as SWE-Bench and Text-to-SQL show that SuffixDecoding can deliver up to 3.9× speedups, making it well suited for fast, iterative agentic inference.
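A simplified sketch of suffix-based drafting, assuming the cache is just a list of previously seen token sequences; the real system uses suffix trees and adaptive draft lengths, so this linear scan is illustrative only:

```python
def propose_draft(context, cache, max_match=32, max_draft=16):
    # Find the longest suffix of the current token context that appears in a
    # cached prior sequence, then propose the tokens that followed it as a
    # draft for the target model to verify in one pass.
    for length in range(min(max_match, len(context)), 0, -1):
        pattern = tuple(context[-length:])
        for seq in cache:
            for i in range(len(seq) - length + 1):
                if tuple(seq[i:i + length]) == pattern:
                    return list(seq[i + length:i + length + max_draft])
    return []  # nothing cached matches; fall back to normal decoding
```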
Authors: Seohong Park (UC Berkeley), Kevin Frans (UC Berkeley), Deepinder Mann (UC Berkeley), Benjamin Eysenbach (Princeton), Aviral Kumar (Carnegie Mellon University), Sergey Levine (UC Berkeley)
This paper examines why offline reinforcement learning (RL) often fails to scale, even when given vast datasets, large models, and ample compute. The authors find that long decision horizons (the number of steps required to propagate rewards) are a key bottleneck that prevents standard offline RL algorithms from improving with more data. Through extensive experiments, they show that reducing the effective horizon dramatically improves scalability and performance on challenging tasks. Building on this insight, they introduce SHARSA, a simple horizon-reduction method that achieves the strongest scaling behavior and best asymptotic performance across their benchmarks.
Authors: Yuda Song (Carnegie Mellon University), Dhruv Rohatgi (Massachusetts Institute of Technology), Aarti Singh (CMU), J. Bagnell (Carnegie Mellon University)
This paper studies when it is better to distill privileged expert policies, which have access to latent state information during training, versus directly learning from partial observations in reinforcement learning. Using a simple theoretical model (the perturbed Block MDP) and controlled locomotion experiments, the authors show that the trade-off depends strongly on how stochastic the underlying latent dynamics are. When the latent state is easy to infer, distillation works well, but when it is highly stochastic, imitating the latent optimal policy can actually hurt performance. The results provide practical guidance: the best latent policy is not always the best one to distill, and deciding when to distill versus directly learn depends on the underlying uncertainty structure of the task.
Authors: Alexander Goldberg (Computer Science Department, School of Computer Science), Giulia Fanti (CMU), Nihar Shah (CMU)
MERIT is a principled framework for using randomized selection in settings like peer review or grant funding, where evaluations are noisy and uncertainty can make deterministic rankings unreliable. Instead of relying on ad-hoc randomization, MERIT uses interval estimates (e.g., confidence intervals) to model uncertainty and then optimizes for the worst-case expected number of true top-k items selected. The authors develop a polynomial-time algorithm that scales to large datasets and show that MERIT satisfies desirable fairness and robustness properties that existing methods lack. Experiments on synthetic peer-review data show that MERIT matches prior probabilistic methods in expected performance while providing stronger guarantees in worst-case scenarios.
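One way to read the stated objective as a maximin problem, in assumed notation that may not match the paper's:

```latex
% \pi is a randomized selection rule over size-k subsets S, and each item's true
% quality v_i is only known to lie in an interval [\ell_i, u_i]:
\max_{\pi}\; \min_{v:\, v_i \in [\ell_i, u_i]\ \forall i}\;
  \mathbb{E}_{S \sim \pi}\big[\, |S \cap \mathrm{top}\text{-}k(v)| \,\big]
```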
Authors: Thomas Kuntz (EPFL – EPF Lausanne), Agatha Duzan (EPFL – EPF Lausanne), Hao Zhao (EPFL – EPF Lausanne), Francesco Croce (University of Tübingen), Zico Kolter (Carnegie Mellon University), Nicolas Flammarion (EPFL), Maksym Andriushchenko (ELLIS Institute Tübingen and MPI-IS)
OS-Harm is a benchmark for evaluating the safety of LLM-based computer-use agents that interact directly with operating system interfaces. OS-Harm tests agents across three harm categories (deliberate misuse, prompt injection attacks, and model misbehavior) using 150 tasks spanning applications like email, browsers, and code editors. An automated judge evaluates both task performance and safety, achieving strong agreement with human annotations. Evaluations of leading agents reveal that models often comply with unsafe instructions, are vulnerable to prompt injections, and sometimes take unsafe actions, highlighting the need for robust safety measures in these systems.
Authors: Pengrun Huang (University of California, San Diego), Chhavi Yadav (CMU), Kamalika Chaudhuri (FAIR, Meta and UCSD), Ruihan Wu (University of California, San Diego)
PropInfer is a benchmark designed to evaluate whether large language models (LLMs) can leak sensitive properties of the datasets used for fine-tuning, particularly in domains like healthcare. It tests property inference under both question-answering and chat-completion setups. Two tailored attacks, a prompt-based generation attack and a shadow-model attack leveraging word frequency, are proposed to extract dataset-level information. Empirical results show that these attacks can succeed across multiple pretrained LLMs, revealing an important and previously underexplored privacy risk.
Authors: Hyeong Kyu Choi (University of Wisconsin-Madison, Computer Sciences), Jerry Zhu (Carnegie Mellon University), Sharon Li (University of Wisconsin-Madison)
Multi-Agent Debate (MAD) improves large language model performance by having multiple agents reason collaboratively, but its key drivers were unclear. By separating Majority Voting from inter-agent debate, experiments across seven NLP benchmarks show that most gains come from majority voting rather than the debate itself. A theoretical analysis models debate as a stochastic process, revealing that debate alone does not improve expected correctness, although targeted interventions that bias belief updates can increase its impact. These results suggest that while MAD has potential, simple ensembling methods often remain a more reliable and effective approach.
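For concreteness, the ensembling baseline the paper isolates from debate is just a vote over the agents' final answers; a minimal sketch:

```python
from collections import Counter

def majority_vote(answers):
    # answers: one final answer per agent (or per sampled response)
    return Counter(answers).most_common(1)[0][0]

# e.g. majority_vote(["B", "A", "B"]) -> "B"
```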
Authors: Ioannis Anagnostides (Carnegie Mellon University), Ioannis Panageas (UC Irvine), Tuomas Sandholm (CMU, Strategy Robot, Optimized Markets, Strategic Machine), Jingming Yan (University of California, Irvine)
The study analyzes the complexity of computing equilibria in team-based zero-sum games and symmetric min-max optimization. It shows that finding epsilon-Nash equilibria in 3-player adversarial team games (2 vs. 1) is CLS-complete, resolving an open question about such games. Moreover, computing symmetric equilibria in symmetric min-max problems is PPAD-complete, even for quadratic objectives, and this extends to 6-player team games (3 vs. 3), implying that common symmetric dynamics cannot reliably converge. Finally, computing non-symmetric equilibria with polynomial precision is FNP-hard, highlighting the fundamental difficulty of equilibrium computation in these settings.
Authors: Emile Anand (Georgia Institute of Technology and Cognition Labs), Ishani Karmarkar (Stanford University), Guannan Qu (Carnegie Mellon University)
Scaling multi-agent reinforcement learning (MARL) is difficult because joint state and action spaces grow exponentially as the number of agents increases. SUBSAMPLE-MFQ introduces a method that combines agent subsampling with mean-field Q-learning and a decentralized randomized policy, allowing efficient learning over any subset of k agents. The algorithm's runtime scales polynomially in k, not the total number of agents n, making it practical for large systems. Theoretical guarantees show that the learned policy converges to the optimal policy at a rate of roughly 1/√k, independent of the total agent count.
Authors: Zheng He (University of British Columbia), Roman Pogodin (Google), Yazhe Li (Microsoft), Namrata Deka (Carnegie Mellon University), Arthur Gretton (Google Deepmind / UCL), Danica J. Sutherland (University of British Columbia + Amii)
Conditional independence (CI) tests are central to tasks like causal discovery and fairness evaluation, but they often fail in practice despite theoretical guarantees. Focusing on the Kernel-based Conditional Independence (KCI) test, the work shows that many existing CI tests are special cases of a Generalized Covariance Measure. Practical performance is largely driven by errors in estimating the conditional mean, which affect Type I error, and by the choice of conditioning kernel, which influences test power but can also inflate false positives. These insights clarify why modern CI tests often underperform and highlight how careful kernel and estimation choices are essential for reliable results.
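For reference, a minimal sketch of a Generalized Covariance Measure test of the kind the paper relates KCI to (standard recipe, not code from the paper); the conditional-mean estimates are assumed to be supplied by some regression:

```python
import numpy as np
from scipy.stats import norm

def gcm_pvalue(x, y, x_hat, y_hat):
    # x_hat, y_hat: fitted conditional means E[X|Z], E[Y|Z]; under the null of
    # conditional independence, the normalized mean of residual products is
    # approximately standard normal.
    r = (x - x_hat) * (y - y_hat)
    n = len(r)
    t = np.sqrt(n) * r.mean() / r.std(ddof=1)
    return 2 * (1 - norm.cdf(abs(t)))
```

Errors in the plugged-in conditional means feed directly into the residual products, which is one way to see why estimation quality drives Type I error.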
Authors: Xiangcheng Zhang (Tsinghua), Yige Hong (Carnegie Mellon University), Weina Wang (Computer Science Department, Carnegie Mellon University)
Heterogeneity creates major challenges in large-scale decision-making, especially in weakly-coupled Markov decision processes (WCMDPs) where each subproblem has distinct dynamics. In the fully heterogeneous setting, the authors show that an efficiently computable policy can achieve an O(1/√N) optimality gap in long-run average reward per subproblem as the number of subproblems N grows. This work provides the first asymptotic optimality guarantee for fully heterogeneous average-reward WCMDPs. Key to this result is a novel use of projection-based Lyapunov functions that ensure convergence of rewards and costs even under full heterogeneity.
Authors: Hyungjoo Chae (Georgia Institute of Technology), Seonghwan Kim (Yonsei University), Junhee Cho (Yonsei University), Seungone Kim (Carnegie Mellon University), Seungjun Moon (Yonsei University), Gyeom Hwangbo (University of Seoul), Dongha Lim (Korea Advanced Institute of Science & Technology), Minjin Kim (Yonsei University), Yeonjun Hwang (Yonsei University), Minju Gwak (Yonsei University), Dongwook Choi (Chung-Ang University), Minseok Kang (Yonsei University), Gwanhoon Im (Yonsei University), ByeongUng Cho (Yonsei University), Hyojun Kim (Yonsei University), Jun Han (Yonsei University), Taeyoon Kwon (Yonsei University), Minju Kim (Yonsei University), Beong-woo Kwak (Yonsei University), Dongjin Kang (Yonsei University), Jinyoung Yeo (Yonsei University)
Web navigation poses a long-horizon sequential decision-making challenge that goes beyond typical multimodal LLM tasks, but step-level reward models have been lacking. Web-Shepherd, the first process reward model (PRM) for web navigation, evaluates trajectories at each step, enabling both training and test-time assessment. The approach is supported by the WebPRM Collection, a 40K step-level dataset with annotated preference pairs, and WebRewardBench, a benchmark for evaluating PRMs. Experiments show Web-Shepherd outperforms GPT-4o by ~30 points on WebRewardBench and improves policy performance on WebArena-lite by 10.9 points while reducing verification cost by 10×, demonstrating a practical and efficient solution for web navigation tasks.
Mixed-motive multi-agent reinforcement learning requires balancing individual incentives with collective objectives, which are often in conflict. The proposed adaptive conflict-aware gradient adjustment method dynamically balances policy gradients from individual and collective objectives, promoting cooperation while preserving fairness in task-specific rewards. Theoretical analysis ensures monotonic improvement in both collective and individual outcomes, guaranteeing fairness across agents. Experiments in sequential social dilemma environments show that this approach outperforms baselines in social welfare while maintaining equitable outcomes for all agents.
Authors: Haoyang Fang (AWS), Boran Han (AWS), Nick Erickson (Amazon Web Services), Xiyuan Zhang (AWS AI), Su Zhou (Carnegie Mellon University), Anirudh Dagar (AWS), Jiani Zhang (Google), Caner Turkmen (Amazon Web Services), Tony Hu (AWS AI), Huzefa Rangwala (George Mason University), Ying Nian Wu (University of California, Los Angeles), Yuyang (Bernie) Wang (AWS AI), George Karypis (University of Minnesota, Minneapolis)
Authors: Muquan Yu (Chinese University of Hong Kong), Mu Nan (University of Hong Kong), Hossein Adeli (Columbia University), Jacob Prince (Harvard University), John A. Pyles (University of Washington), Leila Wehbe (Carnegie Mellon University), Maggie Henderson (Carnegie Mellon University), Michael Tarr (Carnegie Mellon University), Andrew Luo (University of Hong Kong)
Authors: Jifan Zhang (Northwestern University), Fangxin Wang (University of Illinois at Chicago), Zihe Song (University of Illinois at Chicago), Philip S Yu (UIC), Kaize Ding (Northwestern University), Shixiang Zhu (Carnegie Mellon University)
Authors: Yue Huang (University of Notre Dame), Zhengzhe Jiang (Sichuan University), Xiaonan Luo (University of Notre Dame), Kehan Guo (university of notre dame), Haomin Zhuang (University of Notre Dame), Yujun Zhou (University of Notre Dame), Zhengqing Yuan (University of Notre Dame), Xiaoqi Sun (Massachusetts Institute of Technology), Jules Schleinitz (California Institute of Technology), Yanbo Wang (Mohamed bin Zayed University of Artificial Intelligence), Shuhao Zhang (Carnegie Mellon University), Mihir Surve (University of Notre Dame), Nitesh Chawla (University of Notre Dame), Olaf Wiest (University of Notre Dame), Xiangliang Zhang (University of Notre Dame)
Authors: Yang Xiao (Hong Kong Polytechnic University), Jiashuo WANG (HKPU), Ruifeng Yuan (Hong Kong Polytechnic University), Chunpu Xu (Hong Kong Polytechnic University), Kaishuai Xu (Hong Kong Polytechnic University), Wenjie Li (The Hong Kong Polytechnic University), Pengfei Liu (Carnegie Mellon University)
Authors: Jiaqi Wei (Zhejiang University), Hao Zhou (South China University of Technology), Xiang Zhang (University of British Columbia), Di Zhang (Shanghai Artificial Intelligence Laboratory), Zijie Qiu (Fudan University), Noah Wei (Carnegie Mellon University), Jinzhe Li (Fudan University), Wanli Ouyang (Shanghai AI Lab), Siqi Sun (Fudan University)
Authors: Ziyang Ma (Shanghai Jiao Tong University), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Yanqiao Zhu (Shanghai Jiaotong University), Chen Yang (Shanghai Jiaotong University), Yi-Wen Chao (Nanyang Technological University), Ruiyang Xu (Shanghai Jiaotong University), Wenxi Chen (Shanghai Jiaotong University), Yuanzhe Chen (ByteDance Inc.), Zhuo Chen (ByteDance Inc.), Jian Cong (ByteDance Inc.), Kai Li (Tsinghua University, Tsinghua University), Keliang Li (, Chinese Academy of Sciences), Siyou Li (Queen Mary University of London), Xinfeng Li (Nanyang Technological University), Xiquan Li (Shanghai Jiaotong University), Zheng Lian (Institute of automation, Chinese academy of science, Chinese Academy of Sciences), Yuzhe Liang (Shanghai Jiaotong University), Minghao Liu (2077AI), Zhikang Niu (Shanghai Jiaotong University), Tianrui Wang (Tianjin University), Wang Yuping (University of Science and Technology of China), Yuxuan Wang (ByteDance), Yihao Wu (Nanyang Technological University), Guanrou Yang (Shanghai Jiaotong University), Jianwei Yu (Microsoft), Ruibin Yuan (Carnegie Mellon University), Zhisheng Zheng (University of Texas at Austin), Ziya Zhou (Hong Kong University of Science and Technology), Haina Zhu (Shanghai Jiaotong University), Wei Xue (Hong Kong University of Science and Technology), Emmanouil Benetos (Queen Mary University of London), Kai Yu (Shanghai Jiao Tong University), Eng-Siong Chng (Nanyang Technological University), Xie Chen (Shanghai Jiaotong University)
Authors: Joel Ye (Carnegie Mellon University), Fabio Rizzoglio (Northwestern University), Xuan Ma (Northwestern University), Adam Smoulder (CMU, Carnegie Mellon University), Hongwei Mao (University of Pittsburgh), Gary Blumenthal (University of Pittsburgh), William Hockeimer (University of Pittsburgh), Nicolas Kunigk (University of Pittsburgh), Dalton Moore (University of Chicago), Patrick Marino (Phantom Neuro), Raeed Chowdhury (None), J. Patrick Mayo (University of Pittsburgh), Aaron Batista (University of Pittsburgh), Steven Chase (None), Michael Boninger (University of Pittsburgh), Charles Greenspon (University of Chicago), Andrew B Schwartz (University of Pittsburgh), Nicholas Hatsopoulos (University of Chicago), Lee Miller (Northwestern University at Chicago), Kristofer Bouchard (Lawrence Berkeley National Laboratory), Jennifer Collinger (University of Pittsburgh), Leila Wehbe (Carnegie Mellon University), Robert Gaunt (University of Pittsburgh)
Authors: Chandler Smith (Oxford University), Marwa Abdulhai (University of California, Berkeley), Manfred Díaz (Mila, Quebec), Marko Tesic (University of Cambridge), Rakshit Trivedi (Massachusetts Institute of Technology), Sasha Vezhnevets (DeepMind), Lewis Hammond (University of Oxford / Cooperative AI Foundation), Jesse Clifton (Center on Long-Term Risk), Minsuk Chang (Google Deepmind), Edgar Duenez-Guzman (Google DeepMind), John Agapiou (Google DeepMind), Jayd Matyas (DeepMind), Danny Karmon (Google DeepMind), Beining Zhang (University of Southampton), Jim Dilkes (University of Southampton), Akash Kundu (Heritage Institute of Technology), Hieu Minh Nguyen (Apart Research), Emanuel Tewolde (Carnegie Mellon University), Jebish Purbey (Tribhuvan University), Ram Mohan Rao Kadiyala (), Siddhant Gupta (Indian Institute of Technology, Roorkee), Aliaksei Korshuk (Coframe), Buyantuev Alexander (Higher School of Economics), Ilya Makarov (AIRI & ISP RAS), Gang Zhao (Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University), Rolando Fernandez (University of Texas at Austin), Zhihan Wang (University of Texas at Austin), Caroline Wang (The University of Texas at Austin | Google DeepMind), Jiaxun Cui (Meta), Lingyun Xiao (University of Texas at Austin), Di Shi (University of Texas at Austin), Yoonchang Sung (Nanyang Technological University), Muhammad Arrasy Rahman (The University of Texas at Austin), Peter Stone (The University of Texas at Austin, Sony AI), Yipeng Kang (National Key Laboratory of General Artificial Intelligence), Hyeonggeun Yun (Companoid Labs), Ananya Ananya (Stanford University), Taehun Cha (Korea University), Zhiqiang Wu (Tongji University), Elizaveta Tennant (University College London), Olivia Macmillan-Scott (UCL), Marta Segura (University College London, University of London), Diana Riazi (Department of Computer Science, University College London, University of London), Fuyang Cui (University of Toronto), Sriram Ganapathi (University of Waterloo), Toryn Klassen (University of Toronto), Nico Schiavone (University of Toronto), Mogtaba Alim (University of Toronto), Sheila McIlraith (University of Toronto and Vector Institute), Manuel Rios (Universidad de los Andes), Oswaldo Peña (Universidad Nacional de Colombia), Carlos Rojas (Grupo Bancolombia), Manuela Chacon-Chamorro (Universidad de los Andes), Rubén Manrique (Universidad de Los Andes), Luis Felipe Giraldo (Universidad de Los Andes), Nicanor Quijano (Universidad de Los Andes), Yiding Wang (Peking University), Yuxuan Chen (the University of Hong Kong, University of Hong Kong), Fangwei Zhong (Beijing Normal University), Mengmeng Wang (State Key Laboratory of General Artificial Intelligence), Wenming Tu (Shanghai Jiaotong University), Zhaowei Zhang (Peking University), Ziang Chen (Tsinghua University, Tsinghua University), Zixia Jia (BigAI), Xue Feng (BIGAI), Zilong Zheng (Beijing Institute for General Artificial Intelligence), Chichen Lin (), Weijian Fan (Communication University of China), Chenao Liu (Communication University of China), Sneheel Sarangi (New York University Abu Dhabi), Ziyan Wang (King’s College London; Microsoft Research), shuqing shi (Kings College London), Yali Du (King‘s College London), Avinaash Anand Kulandaivel (None), Yang Liu (BIGAI), Wu Ruiyang (Communication University of China), Chetan Talele (None), 陆孙嘉 (Communication University of China), Gema Parreno (–), Shamika Dhuri (Carnegie Mellon University), Bain McHale (CMU, Carnegie Mellon University), Tim Baarslag (Centrum Wiskunde & Informatica / Eindhoven University of Technology), Dylan Hadfield-Menell (MIT), Natasha Jaques (University of Washington, Google DeepMind), José Hernández-Orallo (Universitat Politècnica de València), Joel Leibo (DeepMind)
Authors: Runsong Zhu (The Chinese University of Hong Kong), Ka-Hei Hui (Autodesk), Zhengzhe Liu (Carnegie Mellon University), Qianyi Wu (Monash University), Weiliang Tang (The Chinese University of Hong Kong), Shi Qiu (The Chinese University of Hong Kong), Pheng-Ann Heng (The Chinese University of Hong Kong), Chi-Wing Fu (The Chinese University of Hong Kong)
Authors: Yizhi Li (The University of Manchester), Ge Zhang (University of Michigan – Ann Arbor), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Ruibin Yuan (Carnegie Mellon University), Zhu (Guangdong OPPO Mobile Telecommunications Corp.,Ltd.), Hangyu Guo (Alibaba Group), Yiming Liang (University of the Chinese Academy of Sciences), Jiaheng Liu (Nanjing University), Noah Wang (), Jian Yang (Alibaba Group), Siwei Wu (Nanjing University of Science and Technology), Xingwei Qu (University of Manchester), Jinjie Shi (Queen Mary, University of London), Xinyue Zhang (National University of Singapore), Zhenzhu Yang (China University of Geoscience Beijing), Yidan WEN (Northwest Polytechnical University Xi’an), Yanghai Wang (nanjing university), Shihao Li (nanjing university), ZHAO-XIANG ZHANG (Chinese Academy of Sciences, China), Ruibo Liu (Google DeepMind), Emmanouil Benetos (Queen Mary University of London), Wenhao Huang (Key Laboratory of Machine Perception), Chenghua Lin (University of Manchester)
Authors: Yunlong Tang (University of Rochester), Pinxin Liu (University of Rochester), Mingqian Feng (University of Rochester), Zhangyun Tan (University of Rochester), Rui Mao (University of Rochester), Chao Huang (Department of Computer Science, University of Rochester), Jing Bi (University of Rochester), Yunzhong Xiao (Carnegie Mellon University), Susan Liang (University of Rochester), Hang Hua (University of Rochester), Ali Vosoughi (University of Rochester), Luchuan Song (University of Rochester), Zeliang Zhang (University of Rochester), Chenliang Xu (University of Rochester)
Authors: Tianchen Zhao (Amazon), Xuanbai Chen (Carnegie Mellon University), Zhihua Li (Amazon), Jun Fang (Amazon AGI), DONGSHENG An (State University of New York, Stony Brook), Xiang Xu (Amazon), Zhuowen Tu (University of California, San Diego), Yifan Xing (Amazon)
Authors: Nikhil Kandpal (Department of Computer Science), Brian Lester (Google DeepMind/University of Toronto), Colin Raffel (University of Toronto, Vector Institute and Hugging Face), Sebastian Majstorovic (EleutherAI), Stella Biderman (The Eleutherai Institute), Baber Abbasi (EleutherAI), Luca Soldaini (Allen Institute for AI), Enrico Shippole (Teraflop AI), A. Feder Cooper (Stanford University), Aviya Skowron (EleutherAI), Shayne Longpre (Massachusetts Institute of Technology), Lintang Sutawika (Carnegie Mellon University), Alon Albalak (Lila Sciences), Zhenlin Xu (Boson AI), Guilherme Penedo (HuggingFace), Loubna Ben allal (Hugging Face), Elie Bakouch (Hugging Face), John Pressman (EleutherAI Institute), Honglu Fan (Google DeepMind), Dashiell Stander (EleutherAI), Guangyu Song (EleutherAI), Aaron Gokaslan (MBZUAI Institute of Foundation Models), John Kirchenbauer (University of Maryland, College Park), Tom Goldstein (University of Maryland), Brian Bartoldson (Lawrence Livermore National Laboratory), Bhavya Kailkhura (Lawrence Livermore National Laboratory), Tyler Murray (Allen Institute for Artificial Intelligence)
Authors: Kiljae Lee (The Ohio State University), Ziqi Liu (Carnegie Mellon University), Weijing Tang (Carnegie Mellon University), Yuan Zhang (Ohio State University, Columbus)
Authors: Harsha Vardhan simhadri (Microsoft), Martin Aumüller (IT University of Copenhagen), Matthijs Douze (Facebook AI Research), Dmitry Baranchuk (Yandex), Amir Ingber (Pinecone), Edo Liberty (Yale University), George Williams (Ansible AI), Ben Landrum (Cornell University), Magdalen Manohar (Carnegie Mellon University), Mazin Karjikar (University of Maryland, College Park), Laxman Dhulipala (UMD), Meng Chen (Fudan University), Yue Chen (Fudan University), Rui Ma (Fudan University), Kai Zhang (Fudan University), Yuzheng Cai (Fudan University), Jiayang Shi (Fudan University), Weiguo Zheng (Fudan University), Yizhuo Chen (Fudan University), Jie Yin (Tencent), Ben Huang (Baidu)
Authors: Dongkeun Yoon (KAIST), Seungone Kim (Carnegie Mellon University), Sohee Yang (University College London, University of London), Sunkyoung Kim (LG AI Research), Soyeon Kim (LG Corporation), Yongil Kim (LG Corporation), Eunbi Choi (LG AI Research), Yireun Kim (LG AI Research), Minjoon Seo (KAIST)
Authors: Qingyun Chen (University of California, Santa Cruz), Sungjin Im (University of California, Santa Cruz), Ben Moseley (Carnegie Mellon University), Ryan Milstrey (University of California, Merced), Chenyang Xu (Zhejiang University), Ruilong Zhang (Technische Universität München)
Authors: Huiyi Wang (McGill University), Chun Kwang Tan (Northeastern University), Balint Hodossy (Imperial College London), Shirui Lyu (King’s College London, University of London), Pierre Schumacher (Max Planck Institute for Intelligent Systems, Max-Planck Institute), James Heald (University College London, University of London), Kai Biegun (University College London, University of London), Samo Hromadka (Gatsby Computational Neuroscience Unit), Maneesh Sahani (Gatsby Unit, UCL), Gunwoo Park (KAIST), Beomsoo Shin (KAIST), JongHyeon Park (None), Seungbum Koo (KAIST), Chenhui Zuo (Tsinghua University, Tsinghua University), Chengtian Ma (Tsinghua University, Tsinghua University), Yanan Sui (Tsinghua University), Nick Hansen (UC San Diego), Stone Tao (University of California – San Diego), Yuan Gao (Carnegie Mellon University), Hao Su (UCSD), Seungmoon Song (Stanford University), Letizia Gionfrida (King’s College London, University of London), Massimo Sartori (University of Twente), Guillaume Durandau (McGill University), Vikash Kumar (CMU / MyoLab), Vittorio Caggiano (MyoSuite)
Authors: Benjamin Li (Carnegie Mellon University), Shuyang Shi (School of Computer Science, Carnegie Mellon University), Lucia Romero (University of Pittsburgh), Huao Li (Massachusetts Institute of Technology), Yaqi Xie (CMU), Woojun Kim (Carnegie Mellon University), Stefanos Nikolaidis (University of Southern California), Charles Lewis (University of Pittsburgh), Katia Sycara (Carnegie Mellon University), Simon Stepputtis (Virginia Polytechnic Institute and State University)
Authors: Junhong Shen (Carnegie Mellon University), Hao Bai (University of Illinois at Urbana-Champaign), Lunjun Zhang (University of Toronto), Yifei Zhou (University of California, Berkeley), Amrith Setlur (Carnegie Mellon University), Peter Tong (New York University), Diego Caples (AGI, Inc.), Nan Jiang (University of Illinois at Urbana-Champaign), Tong Zhang (UIUC), Ameet Talwalkar (CMU, Datadog), Aviral Kumar (Carnegie Mellon University)
Authors: Xeron Du (01.AI), Yifan Yao (Beijing University of Posts and Telecommunications), Kaijing Ma (Tongji University), Bingli Wang (Sichuan Agricultural University), Tianyu Zheng (Beijing University of Posts and Telecommunications), Zhu (Guangdong OPPO Mobile Telecommunications Corp.,Ltd.), Minghao Liu (2077AI), Yiming Liang (University of the Chinese Academy of Sciences), Xiaolong Jin (Purdue University), Zhenlin Wei (Harbin Engineering University), Chujie Zheng (Tsinghua University), Kaixin Deng (Hokkaido University), Shuyue Guo (Beijing University of Posts and Telecommunications), Shian Jia (Zhejiang University), Sichao Jiang (zhejiang university), Yiyan Liao (Peking University), Rui Li (Peking University), Qinrui Li (Cornell University), Sirun Li (Peking University), Yizhi Li (The University of Manchester), Yunwen Li (Chinese University of Hong Kong(shenzhen)), Dehua Ma (Beijing University of Posts and Telecommunications), Yuansheng Ni (University of Waterloo), Haoran Que (Beijing University of Aeronautics and Astronautics), Qiyao Wang (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), Zhoufutu Wen (ByteDance Inc.), Siwei Wu (Nanjing University of Science and Technology), Tianshun Xing (Beijing University of Posts and Telecommunications), 明 许 (01.AI), Zhenzhu Yang (China University of Geoscience Beijing), Noah Wang (), Junting Zhou (Peking University), yuelin bai (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Chinese Academy of Sciences), Xingyuan Bu (Alibaba Group), chenglin cai (Huawei Technologies Ltd.), Liang Chen (Peking University), Yifan Chen (ByteDance Inc.), Cheng Chengtuo (Zhejiang University), Tianhao Cheng (Fudan University), Keyi Ding (2077AI), Siming Huang (University of Melbourne), HUANG YUN (national university of singapore, National University of Singapore), Yaoru Li (Zhejiang University), Yizhe Li (Zhejiang University), Zhaoqun Li (Zhejiang University), Tianhao Liang (Zhejiang University), Chengdong Lin (Hangzhou Dianzi University), Hongquan Lin (University of Science and Technology of China), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Zhongyuan Peng (Fudan University), Zifan Peng (The Hong Kong University of Science and Technology (Guangzhou)), Qige Qi (ByteDance Inc.), Shi Qiu (Peking University), Xingwei Qu (University of Manchester), Shanghaoran Quan (Alibaba Group), Yizhou Tan (Harvard University), Zili Wang (stepfun), 王晨清 (abaka), Hao Wang (Beijing University of Aeronautics and Astronautics), Yiya Wang (Peking University), Yubo Wang (University of Waterloo), Jiajun Xu (Facebook), Kexin Yang (Alibaba Group), Ruibin Yuan (Carnegie Mellon University), Yuanhao Yue (Fudan University), Tianyang Zhan (ByteDance Inc.), Chun Zhang (ByteDance Inc.), Jinyang Zhang (Peking University), Xiyue Zhang (Peking University), Owen Zhang (Department of Computer Science, Princeton University), Yue Zhang (Suzhou University), Yongchi Zhao (Alibaba Group), Xiangyu Zheng (Fudan University), ChenghuaZhong (University of Science and Technology Beijing), Yang Gao (Nanjing University), Zhoujun Li (Beijing University of Aeronautics and Astronautics), Dayiheng Liu (Alibaba Group), Qian Liu (TikTok (Singapore)), Tianyu Liu (Alibaba), Shiwen Ni (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Junran Peng (Institute of automation, Chinese academy of science), Yujia Qin (Bytedance), Wenbo Su (Alibaba Group), Guoyin Wang (Alibaba Qwen Pilot), Shi Wang (Institute of Computing Science, Chinese Academy of Sciences), Jian Yang (Alibaba Group), Min Yang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Chinese Academy of Sciences), Meng Cao (Mohamed bin Zayed University of Artificial Intelligence), Xiang Yue (Carnegie Mellon University), ZHAO-XIANG ZHANG (Chinese Academy of Sciences, China), Wangchunshu Zhou (Guangdong OPPO Mobile Telecommunications Corp.,Ltd.), Jiaheng Liu (Nanjing University), Qunshu Lin (Abaka AI), Wenhao Huang (Key Laboratory of Machine Perception), Ge Zhang (University of Michigan – Ann Arbor)
Authors: Mucong Ding (Department of Computer Science, University of Maryland, College Park), Bang An (University of Maryland, College Park), Tahseen Rabbani (University of Chicago), Chenghao Deng (University of Maryland), Anirudh Satheesh (University of Maryland, College Park), Souradip Chakraborty (University of Maryland, College Park), Mehrdad Saberi (Department of Computer Science, University of Maryland, College Park), Yuxin Wen (University of Maryland), Kyle Sang (University of Maryland), Aakriti Agrawal (University of Maryland, College Park), Xuandong Zhao (UC Berkeley), Mo Zhou (Johns Hopkins University), Mary-Anne Hartley (EPFL), Lei Li (Carnegie Mellon University), Yu-Xiang Wang (UCSD), Vishal Patel (Johns Hopkins University), Soheil Feizi (University of Maryland), Tom Goldstein (University of Maryland), Furong Huang (University of Maryland)
Authors: Andy Zou (CMU, Gray Swan AI), Maxwell Lin (University of California, Berkeley), Eliot Jones (Gray Swan), Micha Nowak (Bayerische Julius-Maximilians-Universität Würzburg), Mateusz Dziemian (Independent), Nick Winter (Gray Swan AI), Valent Nathanael (Gray Swan AI), Ayla Croft (Gray Swan AI), Xander Davies (University of Oxford), Jai Patel (UK AI Security Institute), Robert Kirk (University College London), Yarin Gal (University of Oxford), Dan Hendrycks (Center for AI Safety), Zico Kolter (Carnegie Mellon University), Matt Fredrikson (CMU)
Authors: Daniel Pfrommer (Massachusetts Institute of Technology), Zehao Dou (OpenAI), Christopher Scarvelis (MIT), Max Simchowitz (Carnegie Mellon University), Ali Jadbabaie (MIT)
Authors: Peng Xing (Nanjing University of Science and Technology), Haofan Wang (Carnegie Mellon University), Yanpeng Sun (Nanjing University of Science and Technology), wangqixun (Tencent Hunyuan), Baixu (ByteDance Inc.), Hao Ai (Beijing University of Aeronautics and Astronautics), Jen-Yuan Huang (Peking University), Zechao Li (Nanjing University of Science and Technology)
Authors: Dravyansh Sharma (Toyota Technological Institute at Chicago), Colin White (Meta), Maria-Florina Balcan (Carnegie Mellon University)
Machine learning performance depends strongly on the data and on the choice of algorithms and hyperparameters, making hyperparameter tuning and algorithm selection essential. We survey widely used practical methods, including Bayesian optimization, bandit-based approaches, and recent methods for large language models such as scaling laws and parameterization-aware methods, noting their limited theoretical guarantees. We then review recent theory-driven advances that characterize how performance varies with hyperparameters for core algorithms, including decision trees, linear models, and deep learning, enabling structure-aware tuning methods with PAC generalization guarantees. We conclude with open challenges in combining principled and practical approaches, optimizing over high-dimensional or discrete spaces, and scaling to distributed settings.
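As a concrete point of reference for the bandit-based approaches mentioned above, here is a minimal successive-halving sketch; the `evaluate(config, budget)` callback and default constants are assumptions for illustration:

```python
import numpy as np

def successive_halving(configs, evaluate, budget=1, eta=3, rounds=3):
    # Evaluate all configs on a small budget, keep the top 1/eta fraction,
    # then repeat with eta times more budget per survivor.
    survivors = list(configs)
    for _ in range(rounds):
        scores = [evaluate(c, budget) for c in survivors]
        keep = max(1, len(survivors) // eta)
        best = np.argsort(scores)[::-1][:keep]   # higher score is better
        survivors = [survivors[i] for i in best]
        budget *= eta
    return survivors[0]
```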
Authors: Pratyush Maini (Carnegie Mellon University/ DatologyAI), Joseph C. Gratz (Partner, Morrison Foerster LLP), A. Feder Cooper (Yale/Stanford)
Generative models are trained on vast datasets that often contain personal data and copyrighted content. As lawsuits, regulations, and standards emerge, practitioners increasingly need concrete, technically grounded guidance on how privacy and copyright law interact with the realities of modern model development. This tutorial connects data privacy, memorization, and copyright. We will alternate between technical material (attacks, defenses, measurement, and system design) and legal analysis (doctrines, active cases, and regulatory futures), with a focus on practical workflows that ML researchers, engineers, and policy teams can adopt today.
Authors: Adam Block (Columbia University), Dylan Foster (Microsoft Research), Max Simchowitz (Carnegie Mellon University)
This tutorial frames imitation learning (IL) as a unifying way to understand supervised training of foundation models (learning by imitating large corpora of domain-specific demonstrations) across areas like large language model pre-training, robotics, and chemistry/life sciences. It surveys recent theory on when and why IL works with powerful generative models, explains the interventions and best practices the field has converged on, and points to opportunities to better connect theory and practice. A central theme is how domain-specific settings shape solutions, contrasting discrete problems like language modeling with continuous-control challenges in robotics. It also links methods across domains, casting next-token prediction as behavior cloning with log-loss and relating exposure bias in generation to compounding error in control, while motivating tools like action chunking, score matching, and interactive data collection.
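The "next-token prediction as behavior cloning with log-loss" framing can be written compactly (notation assumed, not copied from the tutorial):

```latex
% Behavior cloning with log-loss over expert demonstrations; next-token prediction
% is the special case where the "state" s is the token prefix and the "action" a
% is the next token.
\min_{\theta}\; \mathbb{E}_{(s, a) \sim \mathcal{D}_{\text{demo}}}
  \big[ -\log \pi_{\theta}(a \mid s) \big]
```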
Large language models have made major gains on reasoning tasks by scaling test-time compute using methods like chain-of-thought and sampling, which can improve performance beyond what pretraining alone delivers. However, deploying more test-time compute is challenging because inference workloads tend to have low parallelism, irregular execution, heavy memory I/O, and dynamic control flow, creating bottlenecks like attention memory overhead and poor compute utilization. The tutorial surveys both systems advances (e.g., more efficient KV-cache management, optimized attention kernels, smarter scheduling) and algorithmic directions (e.g., architectures and parallel generation better suited to hardware). Its goal is to connect scaling theory with real deployment constraints and motivate practical, scalable LLM agent systems.
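A back-of-envelope for the attention-memory bottleneck mentioned above; the model dimensions are assumed (Llama-style defaults), not taken from the tutorial:

```python
def kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Each generated token stores one key and one value vector per layer per KV head.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# ~128 KiB per token at these sizes, so a 32k-token context holds roughly 4 GiB of KV cache
print(kv_cache_bytes_per_token() * 32_000 / 2**30)
```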
Authors: Ziqiao Ma (University of Michigan), Michael Saxon (University of Washington), Xiang Yue (Carnegie Mellon University/Meta)
This tutorial argues that modern AI evaluation needs a more principled view of what benchmarks actually measure, and what they systematically miss, as models and use cases evolve. It maps out key pitfalls in today's benchmarking practice (especially static metrics that fail to track changing model behavior) and frames evaluation as an epistemic design problem rather than just a leaderboard exercise. The tutorial then surveys emerging paradigms, including adversarial and dynamic benchmarks, model arenas, scaled human evaluation, simulators/sandboxes, and applied interpretability, plus a panel to contrast views across the community.