Monday, December 8, 2025

Carnegie Mellon at NeurIPS 2025 – Machine Learning Blog | ML@CMU


CMU researchers are presenting 156 papers at the Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025), held from December 2nd to December 7th at the San Diego Convention Center. Here's a quick overview of the areas our researchers are working on:

Here are our most frequent collaborating institutions:


Oral Papers

Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain

Authors: Trinity Chung (Carnegie Mellon University), Yuchen Shen (Carnegie Mellon University), Nathan Kong (MIT), Aran Nayebi (School of Computer Science, Carnegie Mellon University)

This paper introduces an Encoder–Attender–Decoder (EAD) framework to study task-optimized neural networks for tactile processing using realistic whisker-based simulations. Convolutional recurrent neural networks (ConvRNNs) emerge as the most effective encoders, both for tactile categorization and for producing representations that closely match activity in rodent somatosensory cortex, revealing a linear link between task performance and neural alignment. Notably, self-supervised contrastive ConvRNN models achieve neural fits comparable to supervised training, indicating that label-free learning can capture biologically relevant tactile representations. These findings highlight the importance of recurrent processing for understanding cortical tactile computation and for building robust embodied AI systems.

MaxSup: Overcoming Representation Collapse in Label Smoothing

Authors: Yuxuan Zhou (CISPA Helmholtz Center for Information Security), Heng Li (Carnegie Mellon University), Zhi-Qi Cheng (University of Washington), Xudong Yan (City University of Macau), Yifei Dong (Carnegie Mellon University), Mario Fritz (CISPA Helmholtz Center for Information Security), Margret Keuper (University of Mannheim)

Label Smoothing (LS) is commonly used to reduce overconfidence and improve generalization, but it can paradoxically increase confidence in misclassified samples and collapse feature representations. This work analytically decomposes the LS loss, revealing an error-amplification term that strengthens incorrect predictions and drives representation collapse. To overcome this, the authors propose Max Suppression (MaxSup), which regularizes predictions uniformly by penalizing the top-1 logit instead of the ground-truth logit. Experiments show that MaxSup preserves intra-class diversity, improves class separation, and consistently outperforms LS across large-scale classification and downstream tasks.
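To make the decomposition concrete, here is a minimal pure-Python sketch (ours, not the authors' code), assuming the additive form in which LS amounts to cross-entropy plus a penalty on the ground-truth logit relative to the mean logit, while MaxSup penalizes the top-1 logit instead:

```python
import math

def log_softmax(z):
    # numerically stable log-softmax over a list of logits
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def ls_loss(z, y, eps=0.1):
    # label smoothing = cross-entropy + eps * (ground-truth logit - mean logit)
    return -log_softmax(z)[y] + eps * (z[y] - sum(z) / len(z))

def maxsup_loss(z, y, eps=0.1):
    # MaxSup swaps the ground-truth logit for the top-1 logit, so the
    # penalty also bites on confidently *wrong* predictions
    return -log_softmax(z)[y] + eps * (max(z) - sum(z) / len(z))

# misclassified example: the model is confident in class 0, true label is 2
z, y = [4.0, 1.0, 0.5], 2
print(ls_loss(z, y) < maxsup_loss(z, y))  # → True
```

On a correctly classified example (top-1 logit equals the ground-truth logit) the two losses coincide; they differ exactly on mistakes, which is the error-amplification the paper isolates.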

Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)

Authors: Liwei Jiang (University of Washington), Yuanjun Chai (University of Washington), Margaret Li (University of Washington), Mickel Liu (University of Washington), Raymond Fok (University of Washington), Nouha Dziri (Allen Institute for AI), Yulia Tsvetkov (Department of Computer Science, University of Washington), Maarten Sap (Carnegie Mellon University), Yejin Choi (UW => Stanford / NVIDIA)

This paper introduces INFINITY-CHAT, a large-scale dataset of 26,000 diverse open-ended user queries and a comprehensive taxonomy of prompt types to evaluate creativity and diversity in language model outputs. Using this resource, the authors identify a pronounced “Artificial Hivemind” effect marked by both repetitive responses within a single model and striking similarities across different models. The dataset also includes over 31,000 human annotations enabling analysis of collective and individual preferences. Results show that existing models and evaluation methods are poorly calibrated to idiosyncratic human judgments, highlighting risks of homogenized AI outputs.

Mean Flows for One-step Generative Modeling

Authors: Zhengyang Geng (CMU), Mingyang Deng (Massachusetts Institute of Technology), Xingjian Bai (Massachusetts Institute of Technology), Zico Kolter (Carnegie Mellon University), Kaiming He (MIT)

The authors introduce MeanFlow, a principled one-step generative modeling framework based on the concept of average velocity rather than the instantaneous velocity used in prior flow-matching methods. The authors derive a formal identity linking average and instantaneous velocities to guide neural network training in a self-contained manner requiring no pretraining, distillation, or curriculum learning. MeanFlow achieves strong results, including a 3.43 FID on ImageNet 256×256 with a single function evaluation, outperforming previous one-step models. These results significantly narrow the performance gap between one-step and multi-step diffusion and flow-based methods.
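In symbols (our sketch from the summary above; the paper's exact statement and notation may differ), the average velocity over an interval $[r,t]$ and the identity relating it to the instantaneous velocity $v$ read:

```latex
u(z_t, r, t) \;\triangleq\; \frac{1}{t-r}\int_{r}^{t} v(z_\tau, \tau)\,\mathrm{d}\tau,
\qquad
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t-r)\,\frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t),
```

where $\mathrm{d}/\mathrm{d}t$ is the total derivative along the flow. The right-hand identity follows from differentiating the integral definition in $t$, and it involves only quantities available at a single time, which is what makes training self-contained.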

Spotlight Papers

OpenCUA: Open Foundations for Computer-Use Agents

Authors: Xinyuan Wang (University of Hong Kong), Bowen Wang (University of Hong Kong), Dunjie Lu (Sun Yat-sen University), Junlin Yang (Tsinghua University), Tianbao Xie (University of Hong Kong), Junli Wang (Alibaba Group), Jiaqi Deng (The University of Hong Kong), Xiaole Guo (University of Hong Kong), Yiheng Xu (University of Hong Kong), Chen Wu (Carnegie Mellon University), Zhennan Shen (Shanghai Jiao Tong University), Zhuokai Li (University of Hong Kong), Ryan Li (Computer Science Department, Stanford University), Xiaochuan Li (Tsinghua University), Junda Chen (Harbin Institute of Technology), Boyuan Zheng (The University of Hong Kong), Li Peihang (University of Hong Kong), Fangyu Lei (Institute of Automation, Chinese Academy of Sciences), Ruisheng Cao (Shanghai Jiao Tong University), Yeqiao Fu (University of Hong Kong), Dongchan Shin (University of Hong Kong), Martin Shin (University of Hong Kong), Hu Jiarui (University of Hong Kong), Yuyan Wang (Johns Hopkins University), Jixuan Chen (University of California, San Diego), Yuxiao Ye (The Hong Kong University of Science and Technology), Danyang Zhang (Shanghai Jiao Tong University), Yipu Wang (Institute of Automation, Chinese Academy of Sciences), Heng Wang (University of Illinois Urbana-Champaign), Diyi Yang (Stanford University), Victor Zhong (University of Waterloo), Y.Charles (Moonshot AI), Zhilin Yang (Tsinghua University), Tao Yu (University of Hong Kong)

This paper introduces OpenCUA, an open-source framework designed to enable transparent research into computer-use agents built with vision–language models. The framework includes an annotation system for collecting human demonstrations; AgentNet, a large-scale dataset spanning three operating systems and 200+ applications; and a scalable pipeline that converts demonstrations into state–action data with reflective chain-of-thought reasoning. End-to-end agent models trained with OpenCUA show strong benchmark performance, with OpenCUA-72B reaching a 45.0% success rate on OSWorld-Verified, setting a new open-source state of the art.

ARECHO: Autoregressive Evaluation via Chain-Based Hypothesis Optimization for Speech Multi-Metric Estimation

Authors: Jiatong Shi (Carnegie Mellon University), Yifan Cheng (Huazhong University of Science and Technology), Bo-Hao Su (Carnegie Mellon University), Hye-jin Shim (Carnegie Mellon University), Jinchuan Tian (Carnegie Mellon University), Samuele Cornell (Università Politecnica delle Marche), Yiwen Zhao (School of Computer Science, Carnegie Mellon University), Siddhant Arora (Carnegie Mellon University), Shinji Watanabe (Carnegie Mellon University)

This work presents ARECHO, an autoregressive chain-based framework for jointly evaluating multiple speech quality metrics such as PESQ (Perceptual Evaluation of Speech Quality), STOI (Short-Time Objective Intelligibility), and MOS (Mean Opinion Score), which traditionally differ in scale and assumptions. ARECHO introduces a comprehensive tokenization pipeline, a dynamic classifier chain to model inter-metric dependencies, and a confidence-oriented two-step decoding scheme to improve inference reliability. Experiments show that ARECHO consistently outperforms baseline methods across speech enhancement, generation evaluation, and noisy-speech scenarios. The approach also improves interpretability and flexibility by enabling reference-free evaluation and subset metric queries.

UMA: A Family of Universal Models for Atoms

Authors: Brandon Wood (FAIR at Meta), Misko Dzamba (Facebook), Xiang Fu (Periodic Labs), Meng Gao (Facebook), Muhammed Shuaibi (FAIR, Meta), Luis Barroso-Luque (Facebook), Kareem Abdelmaqsoud (Carnegie Mellon University), Vahe Gharakhanyan (Meta), John Kitchin (Carnegie Mellon University), Daniel Levine (Meta FAIR), Kyle Michel (Meta), Anuroop Sriram (Meta FAIR), Taco Cohen (Meta / FAIR), Abhishek Das (FAIR, Meta AI), Sushree Sahoo (Facebook), Ammar Rizvi (Meta), Zachary Ulissi (FAIR, Meta AI), Larry Zitnick (Fundamental AI Research at Meta AI)

This paper introduces Universal Models for Atoms (UMA), a family of large-scale models designed to rapidly and accurately predict properties from atomic simulations across chemistry and materials science. Trained on over 500 million unique 3D atomic structures spanning molecules, materials, and catalysts, UMA leverages empirical scaling laws and a novel mixture-of-linear-experts architecture to increase capacity without sacrificing speed. Evaluations show that a single UMA model, without fine-tuning, matches or outperforms specialized models across diverse applications.

A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search

Authors: Arnav Kumar Jain (Université de Montréal), Vibhakar Mohta (Nuro Inc.), Subin Kim (Korea Advanced Institute of Science & Technology), Atiksh Bhardwaj (Cornell University), Juntao Ren (Stanford University), Yunhai Feng (Cornell University), Sanjiban Choudhury (Cornell University), Gokul Swamy (Carnegie Mellon University)

This work addresses a key limitation of behavioral cloning (BC) in imitation learning: BC only teaches an agent to imitate expert actions at states the expert visited, leaving it unable to recover from mistakes. To overcome this, the authors propose SAILOR, which leverages learning to search (L2S) by training a world model and a reward model to plan and recover toward expert outcomes even after mistakes. SAILOR achieves stable and sample-efficient learning without additional human corrections and consistently outperforms state-of-the-art diffusion-policy BC methods across visual manipulation benchmarks. It also demonstrates robustness to nuanced failures and reward hacking, and the performance gap persists even when BC is trained with 5–10x more demonstrations.

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Authors: Jiajun Shi (Beijing University of Aeronautics and Astronautics), Jian Yang (Alibaba Group), Jiaheng Liu (Nanjing University), Xingyuan Bu (Alibaba Group), Jiangjie Chen (ByteDance Seed), Junting Zhou (Peking University), Kaijing Ma (Tongji University), Zhoufutu Wen (ByteDance Inc.), Bingli Wang (Sichuan Agricultural University), Yancheng He (Alibaba Group), Liang Song (M-A-P), Hualei Zhu (Beijing University of Aeronautics and Astronautics), Shilong Li (Beijing University of Posts and Telecommunications), Xingjian Wang (Shanghai University of Electric Power), Wei Zhang (Beijing University of Aeronautics and Astronautics), Ruibin Yuan (Carnegie Mellon University), Yifan Yao (Beijing University of Posts and Telecommunications), Wenjun Yang (University College London, University of London), Yunli Wang (Kuaishou Technology), Siyuan Fang (Beijing University of Posts and Telecommunications), Siyu Yuan (Fudan University), Qianyu He (Fudan University), Robert Tang (Yale University), Yingshui Tan (Alibaba Group), Wangchunshu Zhou (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Zhao-Xiang Zhang (Chinese Academy of Sciences, China), Zhoujun Li (Beijing University of Aeronautics and Astronautics), Wenhao Huang (Key Laboratory of Machine Perception), Ge Zhang (University of Michigan – Ann Arbor)

The authors introduce KORGym, a dynamic evaluation platform designed to comprehensively assess the reasoning abilities of large language models (LLMs) and vision-language models (VLMs). Unlike existing domain-specific benchmarks, KORGym offers over 50 interactive games in textual and visual formats, including multi-turn and reinforcement learning scenarios. Experiments on 19 LLMs and 8 VLMs reveal consistent reasoning patterns within model families and highlight the superior performance of closed-source models. The platform also enables analysis of factors such as modality, reasoning strategies, reinforcement learning approaches, and response length, providing a powerful tool for advancing reasoning evaluation in complex environments.

Towards Understanding Camera Motions in Any Video

Authors: Zhiqiu Lin (Carnegie Mellon University), Siyuan Cen (University of Massachusetts at Amherst), Daniel Jiang (Carnegie Mellon University), Jay Karhade (Carnegie Mellon University), Hewei Wang (Carnegie Mellon University), Chancharik Mitra (Carnegie Mellon University), Yu Tong Tiffany Ling (Carnegie Mellon University), Yuhan Huang (Carnegie Mellon University), Rushikesh Zawar (Carnegie Mellon University), Xue Bai (Adobe Systems), Yilun Du (Google DeepMind / Harvard), Chuang Gan (IBM), Deva Ramanan (Carnegie Mellon University)

This work presents CameraBench, a large-scale dataset and benchmark for evaluating camera motion understanding, comprising roughly 3,000 diverse videos annotated through a rigorous expert-driven process. A key contribution is a taxonomy of camera motion primitives, developed with cinematographers, which captures motions that require both geometric and semantic understanding. Human studies show that domain expertise and targeted training significantly improve motion recognition, such as distinguishing zoom from forward translation. Evaluations reveal that Structure-from-Motion models struggle with semantic motions, while generative video-language models struggle with geometric ones, and fine-tuning a generative VLM on CameraBench enables strong performance across motion-augmented captioning, video QA, and video-text retrieval tasks.

Enhancing Training Data Attribution with Representational Optimization

Authors: Weiwei Sun (Carnegie Mellon University), Haokun Liu (Department of Computer Science, University of Toronto), Nikhil Kandpal (Department of Computer Science), Colin Raffel (University of Toronto, Vector Institute and Hugging Face), Yiming Yang (CMU)

This paper presents AirRep, a scalable representation-based method for training data attribution (TDA) that learns task-specific, model-aligned representations optimized for measuring how training data affects model predictions. AirRep includes a trainable encoder for attribution quality and an attention-based pooling mechanism to estimate group-wise influence accurately. Trained using a ranking objective over subsets labeled by their empirical effect, AirRep matches the performance of gradient-based methods like influence functions while being nearly 100× more efficient at inference.

Checklists Are Better Than Reward Models For Aligning Language Models

Authors: Vijay Viswanathan (Carnegie Mellon University), Yanchao Sun (University of Maryland, College Park), Xiang Kong (Apple), Meng Cao (Apple), Graham Neubig (Carnegie Mellon University), Sherry Wu (Carnegie Mellon University)

This work introduces Reinforcement Learning from Checklist Feedback (RLCF), a method for improving instruction-following in language models using flexible, instruction-specific criteria rather than fixed metrics like helpfulness or harmfulness. RLCF extracts checklists from instructions and evaluates responses against each item using AI judges and verifier programs to compute rewards for reinforcement learning. Applied to models like Qwen2.5-7B-Instruct, RLCF improves performance across five benchmarks, achieving notable gains in hard satisfaction rates and win rates, and can also improve other models off-policy, such as Llama 3.1 8B Instruct and OLMo 2 7B Instruct. The authors release their WildChecklists dataset, models, and code to support further research in flexible instruction alignment.
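The reward computation described above might be sketched as follows; the `judge` callable here is a hypothetical stand-in for the paper's AI judges and verifier programs, not an interface from the released code:

```python
from typing import Callable, List

def checklist_reward(
    response: str,
    checklist: List[str],
    judge: Callable[[str, str], float],
) -> float:
    """Aggregate per-item judgments into a scalar RL reward.

    `judge(response, item)` should return a score in [0, 1] for one
    checklist item (e.g., an AI judge's yes/no verdict or a verifier
    program's pass/fail result).
    """
    if not checklist:
        raise ValueError("empty checklist")
    return sum(judge(response, item) for item in checklist) / len(checklist)

# toy judge: a substring check, purely for illustration
toy_judge = lambda resp, item: float(item.lower() in resp.lower())
reward = checklist_reward(
    "Here are three bullet points in French ...",
    ["bullet points", "French"],
    toy_judge,
)
print(reward)  # → 1.0
```

Because each checklist is extracted from the instruction itself, the same reward function adapts to arbitrary instructions without retraining a fixed reward model.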

Extrapolation by Association: Length Generalization Transfer in Transformers

Authors: Ziyang Cai (Princeton University), Nayoung Lee (University of Wisconsin-Madison), Avi Schwarzschild (Carnegie Mellon University), Samet Oymak (University of Michigan – Ann Arbor), Dimitris Papailiopoulos (University of Wisconsin-Madison)

This paper studies length generalization in transformer language models—the ability to handle longer inputs than seen during training—through the concept of task association. The authors show that training on a longer, related auxiliary task can improve generalization to longer inputs on a target task across algorithmic domains like arithmetic, string manipulation, and maze navigation. They find similar transfer effects in pretrained language models, suggesting pretraining provides reusable computational scaffolding. Mechanistic analysis indicates that this length generalization transfer is linked to the reuse of attention heads between tasks, highlighting how transformers leverage compositional inductive structures.

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Authors: Xinyu Yang (CMU), Yuwei An (Carnegie Mellon University), Hongyi Liu (Carnegie Mellon University), Tianqi Chen (Carnegie Mellon University), Beidi Chen (CMU / Amazon)

This work introduces Multiverse, a generative model that enables natively parallel generation by internalizing a MapReduce paradigm with Map, Process, and Reduce stages. The approach includes Multiverse Curator for automated data creation, Multiverse Attention for separating parallel reasoning steps, and Multiverse Engine for dynamic sequential-parallel inference. After minimal fine-tuning, Multiverse-32B matches leading autoregressive LLMs in performance while achieving up to 2× speedup and better scaling efficiency. The authors have open-sourced the full Multiverse ecosystem, including models, data, serving systems, and training pipelines.

Thought Communication in Multiagent Collaboration

Authors: Yujia Zheng (Carnegie Mellon University), Zhuokai Zhao (Meta), Zijian Li (Mohamed bin Zayed University of Artificial Intelligence), Yaqi Xie (CMU), Mingze Gao (Meta Inc.), Lizhu Zhang (Meta), Kun Zhang (CMU & MBZUAI)

This work introduces thought communication, a paradigm for multi-agent interaction that goes beyond natural language by enabling agents to share latent, mind-like representations directly. The authors formalize this process as a latent variable model, proving that both shared and private thoughts, as well as the global structure of thought sharing among agents, can be identified and recovered with theoretical guarantees. They develop a framework that extracts and distributes relevant latent thoughts to agents, improving collaboration across modalities. Experiments on synthetic and real-world benchmarks validate the approach, showing that thought communication can unlock collaborative advantages beyond what is possible with surface-level language-based exchanges.

Cost-aware LLM-based Online Dataset Annotation

Authors: Eray Can Elumar (Carnegie Mellon University), Cem Tekin (Bilkent University), Osman Yagan (Carnegie Mellon University)

This paper introduces CaMVo, a method for labeling datasets with large language models (LLMs) while keeping costs low. Instead of querying many LLMs for every example, CaMVo adaptively chooses only a few models based on how confident they are likely to be. It uses ideas from contextual bandits (LinUCB) and a Bayesian confidence estimator to decide which models to query and how to weight their votes—without needing any ground-truth labels. Experiments on MMLU and IMDB show that CaMVo matches or beats full majority voting with far fewer LLM calls, making it a practical approach for efficient large-scale annotation.
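The aggregation step can be illustrated with a simple confidence-weighted majority vote — a simplified stand-in for CaMVo's Bayesian confidence estimator and LinUCB model selection, which are not reproduced here:

```python
from collections import defaultdict

def weighted_vote(labels, confidences):
    """Combine model outputs by summing each model's estimated
    confidence behind its proposed label; ties break toward the
    first-seen label (dicts preserve insertion order)."""
    scores = defaultdict(float)
    for label, conf in zip(labels, confidences):
        scores[label] += conf
    return max(scores, key=scores.get)

# three cheap models vote; the two less confident ones agree and win
print(weighted_vote(["pos", "neg", "neg"], [0.9, 0.6, 0.6]))  # → neg
```

CaMVo's contribution is deciding *which* models to query at all, so that the sum above runs over a small, adaptively chosen subset rather than the full ensemble.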

Conformal Mixed-Integer Constraint Learning with Feasibility Guarantees

Authors: Daniel Ovalle (Carnegie Mellon University), Lorenz Biegler (Carnegie Mellon University), Ignacio Grossmann (Carnegie Mellon University), Carl Laird (Carnegie Mellon University), Mateo Dulce Rubio (CMU)

The authors introduce C-MICL, a framework for learning constraints in optimization problems while guaranteeing that the resulting solutions remain feasible with high probability. Traditional learned constraints can fail due to model error or limited data, but C-MICL uses conformal prediction to add uncertainty-aware adjustments that ensure feasibility at a user-specified confidence level. The method works for both regression- and classification-based constraint learning and avoids the heavy computational overhead of ensemble approaches. Experiments show that C-MICL reliably meets feasibility targets, preserves strong optimization performance, and is significantly more efficient, offering a principled way to combine machine learning with safe decision-making.
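A split-conformal version of the idea can be sketched as follows (illustrative only: it assumes a learned surrogate `ghat` for an unknown constraint function `g` with requirement g(x) ≤ 0, plus a held-out calibration set, and is not the paper's exact construction):

```python
import math
import random

def conformal_margin(g_true_cal, g_pred_cal, alpha=0.1):
    """Split-conformal margin q for residuals g(x) - ghat(x).

    Tightening the learned constraint to  ghat(x) + q <= 0  then yields
    g(x) <= 0 with probability >= 1 - alpha on exchangeable data.
    """
    residuals = sorted(g - p for g, p in zip(g_true_cal, g_pred_cal))
    n = len(residuals)
    # finite-sample-corrected rank: ceil((n + 1) * (1 - alpha))
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return residuals[rank - 1]

random.seed(0)
g_true = [random.gauss(0, 1) for _ in range(200)]
g_pred = [g + random.gauss(0, 0.1) for g in g_true]  # imperfect surrogate
q = conformal_margin(g_true, g_pred, alpha=0.1)
covered = sum(g <= p + q for g, p in zip(g_true, g_pred)) / len(g_true)
print(covered >= 0.9)  # → True (on the calibration data itself)
```

The appeal is that the margin `q` comes from a single model plus a calibration split, avoiding the ensembles that other uncertainty-aware constraint-learning methods rely on.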

SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications

Authors: Gabriele Oliaro (Carnegie Mellon University), Zhihao Jia (School of Computer Science, Carnegie Mellon University), Daniel Campos (Zipf AI), Aurick Qiao (Snowflake)

The authors present SuffixDecoding, a new speculative decoding method tailored for emerging AI workloads like LLM-based agents, which generate long, repetitive, and predictable sequences. Unlike existing speculative decoding approaches designed for diverse, independent requests, SuffixDecoding uses suffix trees to efficiently cache and reuse long stretches of past tokens from prompts and model outputs. It adaptively adjusts how many tokens to speculate—expanding aggressively when predictions are likely to be accepted and backing off when uncertainty is higher. Experiments on agent-style tasks such as SWE-Bench and Text-to-SQL show that SuffixDecoding can deliver up to 3.9× speedups, making it well suited for fast, iterative agentic inference.
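The core lookup can be illustrated with a toy version of the matching step — a naive scan standing in for the suffix-tree index that makes this efficient in the actual system:

```python
def speculate(context, cache, max_spec=8):
    """Propose draft tokens by finding the longest suffix of `context`
    that reappears in `cache` (a flat list of previously generated
    tokens) and copying what followed it there."""
    for k in range(min(len(context), 32), 0, -1):  # longest suffix first
        suffix = context[-k:]
        # stop before the final k positions: a match ending at the very
        # end of the cache has no continuation to copy
        for i in range(len(cache) - k):
            if cache[i:i + k] == suffix:
                return cache[i + k : i + k + max_spec]
    return []  # no match: fall back to ordinary decoding

print(speculate([9, 2, 3], [1, 2, 3, 4, 5, 6]))  # → [4, 5, 6]
```

The drafted tokens are then verified by the target model in a single forward pass, as in standard speculative decoding; `max_spec` is the knob the paper adjusts adaptively.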

Horizon Reduction Makes RL Scalable

Authors: Seohong Park (UC Berkeley), Kevin Frans (UC Berkeley), Deepinder Mann (UC Berkeley), Benjamin Eysenbach (Princeton), Aviral Kumar (Carnegie Mellon University), Sergey Levine (UC Berkeley)

This paper examines why offline reinforcement learning (RL) often fails to scale, even when given massive datasets, large models, and ample compute. The authors find that long decision horizons—the number of steps required to propagate rewards—are a key bottleneck that prevents standard offline RL algorithms from improving with more data. Through extensive experiments, they show that reducing the effective horizon dramatically improves scalability and performance on challenging tasks. Building on this insight, they introduce SHARSA, a simple horizon-reduction method that achieves the strongest scaling behavior and best asymptotic performance across their benchmarks.

To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable RL

Authors: Yuda Song (Carnegie Mellon University), Dhruv Rohatgi (Massachusetts Institute of Technology), Aarti Singh (CMU), J. Bagnell (Carnegie Mellon University)

This paper studies when it is better to distill privileged expert policies—which have access to latent state information during training—versus directly learning from partial observations in reinforcement learning. Using a simple theoretical model (the perturbed Block MDP) and controlled locomotion experiments, the authors show that the trade-off depends strongly on how stochastic the underlying latent dynamics are. When the latent state is easy to infer, distillation works well, but when it is highly stochastic, imitating the latent optimal policy can actually hurt performance. The results offer practical guidance: the best latent policy isn't always the best one to distill, and deciding when to distill versus learn directly depends on the underlying uncertainty structure of the task.

A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding

Authors: Alexander Goldberg (Computer Science Department, School of Computer Science), Giulia Fanti (CMU), Nihar Shah (CMU)

MERIT is a principled framework for using randomized selection in settings like peer review or grant funding, where evaluations are noisy and uncertainty can make deterministic rankings unreliable. Instead of relying on ad-hoc randomization, MERIT uses interval estimates (e.g., confidence intervals) to model uncertainty and then optimizes for the worst-case expected number of true top-k items selected. The authors develop a polynomial-time algorithm that scales to large datasets and show that MERIT satisfies desirable fairness and robustness properties that existing methods lack. Experiments on synthetic peer-review data show that MERIT matches prior probabilistic methods in expected performance while providing stronger guarantees in worst-case scenarios.
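The worst-case objective described above can be written schematically (our formalization of the summary, not necessarily the paper's exact notation): given interval estimates $[\ell_i, u_i]$ for each item's true quality $s_i$, MERIT chooses a distribution $\pi$ over size-$k$ subsets $S$ to solve

```latex
\max_{\pi}\;\; \min_{s \,\in\, \prod_i [\ell_i,\, u_i]}\;\;
\mathbb{E}_{S \sim \pi}\bigl[\, \lvert S \cap \mathrm{Top}\text{-}k(s) \rvert \,\bigr],
```

i.e., it maximizes the expected number of truly top-$k$ items selected under the least favorable scores consistent with the intervals.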

OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

Authors: Thomas Kuntz (EPFL – EPF Lausanne), Agatha Duzan (EPFL – EPF Lausanne), Hao Zhao (EPFL – EPF Lausanne), Francesco Croce (University of Tübingen), Zico Kolter (Carnegie Mellon University), Nicolas Flammarion (EPFL), Maksym Andriushchenko (ELLIS Institute Tübingen and MPI-IS)

OS-Harm is a benchmark for evaluating the safety of LLM-based computer use agents that interact directly with operating system interfaces. OS-Harm tests agents across three harm categories—deliberate misuse, prompt injection attacks, and model misbehavior—using 150 tasks spanning applications like email, browsers, and code editors. An automated judge evaluates both task performance and safety, achieving strong agreement with human annotations. Evaluations of leading agents reveal that models often comply with unsafe instructions, are vulnerable to prompt injections, and sometimes take unsafe actions, highlighting the need for robust safety measures in these systems.

Can We Infer Confidential Properties of Training Data from LLMs?

Authors: Pengrun Huang (University of California, San Diego), Chhavi Yadav (CMU), Kamalika Chaudhuri (FAIR, Meta and UCSD), Ruihan Wu (University of California, San Diego)

PropInfer is a benchmark designed to evaluate whether large language models (LLMs) can leak sensitive properties of the datasets used for fine-tuning, particularly in domains like healthcare. It tests property inference under both question-answering and chat-completion setups. Two tailored attacks—a prompt-based generation attack and a shadow-model attack leveraging word frequency—are proposed to extract dataset-level information. Empirical results show that these attacks can succeed across multiple pretrained LLMs, revealing an important and previously underexplored privacy risk.

Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?

Authors: Hyeong Kyu Choi (University of Wisconsin-Madison, Computer Sciences), Jerry Zhu (Carnegie Mellon University), Sharon Li (University of Wisconsin-Madison)

Multi-Agent Debate (MAD) improves large language model performance by having multiple agents reason collaboratively, but its key drivers have been unclear. By separating majority voting from inter-agent debate, experiments across seven NLP benchmarks show that most gains come from majority voting rather than the debate itself. A theoretical analysis models debate as a stochastic process, revealing that debate alone does not improve expected correctness, though targeted interventions that bias belief updates can increase its impact. These results suggest that while MAD has potential, simple ensembling methods often remain a more reliable and effective approach.

The Complexity of Symmetric Equilibria in Min-Max Optimization and Team Zero-Sum Games

Authors: Ioannis Anagnostides (Carnegie Mellon University), Ioannis Panageas (UC Irvine), Tuomas Sandholm (CMU, Strategy Robot, Optimized Markets, Strategic Machine), Jingming Yan (University of California, Irvine)

The study analyzes the complexity of computing equilibria in team-based zero-sum games and symmetric min-max optimization. It shows that finding epsilon-Nash equilibria in 3-player adversarial team games (2 vs. 1) is CLS-complete, resolving an open question about such games. Moreover, computing symmetric equilibria in symmetric min-max problems is PPAD-complete, even for quadratic objectives, and this extends to 6-player team games (3 vs. 3), implying that common symmetric dynamics cannot reliably converge. Finally, computing non-symmetric equilibria with polynomial precision is FNP-hard, highlighting the fundamental difficulty of equilibrium computation in these settings.

Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning

Authors: Emile Anand (Georgia Institute of Technology and Cognition Labs), Ishani Karmarkar (Stanford University), Guannan Qu (Carnegie Mellon University)

Scaling multi-agent reinforcement learning (MARL) is difficult because of the exponential growth of joint state and action spaces as the number of agents increases. SUBSAMPLE-MFQ introduces a method that combines subsampling agents with mean-field Q-learning and a decentralized randomized policy, allowing efficient learning for any subset of k agents. The algorithm's runtime scales polynomially in k, not the total number of agents n, making it practical for large systems. Theoretical guarantees show that the learned policy converges to the optimal policy at a rate of roughly 1/√k, independent of the total agent count.

On the Hardness of Conditional Independence Testing in Practice

Authors: Zheng He (University of British Columbia), Roman Pogodin (Google), Yazhe Li (Microsoft), Namrata Deka (Carnegie Mellon University), Arthur Gretton (Google DeepMind / UCL), Danica J. Sutherland (University of British Columbia + Amii)

Conditional independence (CI) tests are central to tasks like causal discovery and fairness analysis, but they often fail in practice despite theoretical guarantees. Focusing on the Kernel-based Conditional Independence (KCI) test, the work shows that many recent CI tests are special cases of a Generalized Covariance Measure. Practical performance is largely driven by errors in estimating the conditional mean, which affect Type I error, and by the choice of conditioning kernel, which influences test power but can also inflate false positives. These insights clarify why popular CI tests often underperform and highlight how careful kernel and estimation choices are essential for reliable results.

Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Authors: Xiangcheng Zhang (Tsinghua), Yige Hong (Carnegie Mellon University), Weina Wang (Computer Science Department, Carnegie Mellon University)

Heterogeneity creates major challenges in large-scale decision-making, especially in weakly-coupled Markov decision processes (WCMDPs) where each subproblem has distinct dynamics. In the fully heterogeneous setting, the authors show that an efficiently computable policy can achieve an O(1/√N) optimality gap in long-run average reward per subproblem as the number of subproblems N grows. This work provides the first asymptotic optimality guarantee for fully heterogeneous average-reward WCMDPs. Key to this result is a novel use of projection-based Lyapunov functions that ensure convergence of rewards and costs even under full heterogeneity.

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Authors: Hyungjoo Chae (Georgia Institute of Technology), Seonghwan Kim (Yonsei University), Junhee Cho (Yonsei University), Seungone Kim (Carnegie Mellon University), Seungjun Moon (Yonsei University), Gyeom Hwangbo (University of Seoul), Dongha Lim (Korea Advanced Institute of Science & Technology), Minjin Kim (Yonsei University), Yeonjun Hwang (Yonsei University), Minju Gwak (Yonsei University), Dongwook Choi (Chung-Ang University), Minseok Kang (Yonsei University), Gwanhoon Im (Yonsei University), ByeongUng Cho (Yonsei University), Hyojun Kim (Yonsei University), Jun Han (Yonsei University), Taeyoon Kwon (Yonsei University), Minju Kim (Yonsei University), Beong-woo Kwak (Yonsei University), Dongjin Kang (Yonsei University), Jinyoung Yeo (Yonsei University)

Web navigation poses a long-horizon sequential decision-making challenge that goes beyond typical multimodal LLM tasks, but step-level reward models have been lacking. Web-Shepherd, the first process reward model (PRM) for web navigation, evaluates trajectories at each step, enabling both training and test-time assessment. The approach is supported by the WebPRM Collection, a 40K step-level dataset with annotated preference pairs, and WebRewardBench, a benchmark for evaluating PRMs. Experiments show Web-Shepherd outperforms GPT-4o by ~30 points on WebRewardBench and improves policy performance on WebArena-lite by 10.9 points while reducing verification cost by 10×, demonstrating a practical and efficient solution for web navigation tasks.
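Schematically, a step-level PRM changes trajectory selection from outcome-only scoring to per-step scoring. The sketch below uses a stand-in callable for the learned reward model; all names and the toy reward are hypothetical, not the paper's implementation.

```python
def best_trajectory(trajectories, step_reward):
    """Pick the candidate trajectory with the highest mean step-level
    reward. `step_reward` stands in for a learned PRM: any callable
    mapping one action/step to a scalar score."""
    def score(traj):
        return sum(step_reward(step) for step in traj) / len(traj)
    return max(trajectories, key=score)

# Toy usage: prefer navigation paths that end in a completed submission.
candidates = [
    ["search", "click"],
    ["search", "click", "submit"],
]
reward = lambda step: 1.0 if step == "submit" else 0.5
best = best_trajectory(candidates, reward)
```

At test time the same scoring can rerank sampled trajectories; during training, the per-step scores serve as the reward signal for reinforcement learning.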

Fair Cooperation in Mixed-Motive Games via Conflict-Aware Gradient Adjustment

Authors: Woojun Kim (Carnegie Mellon University), Katia Sycara (Carnegie Mellon University)

Mixed-motive multi-agent reinforcement learning requires balancing individual incentives with collective goals, which are often in conflict. The proposed adaptive conflict-aware gradient adjustment method dynamically balances policy gradients from individual and collective objectives, promoting cooperation while preserving fairness in task-specific rewards. Theoretical analysis ensures monotonic improvement in both collective and individual outcomes, guaranteeing fairness across agents. Experiments in sequential social dilemma environments show that this approach outperforms baselines in social welfare while maintaining equitable outcomes for all agents.
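One way to picture conflict-aware adjustment is a PCGrad-style projection (an illustrative sketch under that assumption, not the paper's exact update rule): when the individual and collective gradients point in opposing directions, drop the opposing component before combining them.

```python
import numpy as np

def adjust_gradients(g_ind, g_col):
    """Combine individual and collective policy gradients. If they
    conflict (negative inner product), first project the individual
    gradient onto the plane orthogonal to the collective one."""
    dot = g_ind @ g_col
    if dot < 0:  # conflict: remove the component opposing g_col
        g_ind = g_ind - (dot / (g_col @ g_col)) * g_col
    return g_ind + g_col

g_ind = np.array([1.0, -1.0])  # individual objective pulls down on axis 2
g_col = np.array([0.0, 1.0])   # collective objective pushes up on axis 2
g = adjust_gradients(g_ind, g_col)
# The combined update no longer opposes the collective direction.
```

The adaptive weighting in the paper goes further than this fixed projection, but the invariant is the same: the combined update never points against the collective objective.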

Poster Papers

Applications

MLZero: A Multi-Agent System for End-to-end Machine Learning Automation

Authors: Haoyang Fang (AWS), Boran Han (AWS), Nick Erickson (Amazon Web Services), Xiyuan Zhang (AWS AI), Su Zhou (Carnegie Mellon University), Anirudh Dagar (AWS), Jiani Zhang (Google), Caner Turkmen (Amazon Web Services), Tony Hu (AWS AI), Huzefa Rangwala (George Mason University), Ying Nian Wu (University of California, Los Angeles), Yuyang (Bernie) Wang (AWS AI), George Karypis (University of Minnesota, Minneapolis)

Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Authors: Muquan Yu (Chinese University of Hong Kong), Mu Nan (University of Hong Kong), Hossein Adeli (Columbia University), Jacob Prince (Harvard University), John A. Pyles (University of Washington), Leila Wehbe (Carnegie Mellon University), Maggie Henderson (Carnegie Mellon University), Michael Tarr (Carnegie Mellon University), Andrew Luo (University of Hong Kong)

Topology-Aware Conformal Prediction for Stream Networks

Authors: Jifan Zhang (Northwestern University), Fangxin Wang (University of Illinois at Chicago), Zihe Song (University of Illinois at Chicago), Philip S Yu (UIC), Kaize Ding (Northwestern University), Shixiang Zhu (Carnegie Mellon University)

ChemOrch: Empowering LLMs with Chemical Intelligence via Synthetic Instructions

Authors: Yue Huang (University of Notre Dame), Zhengzhe Jiang (Sichuan University), Xiaonan Luo (University of Notre Dame), Kehan Guo (University of Notre Dame), Haomin Zhuang (University of Notre Dame), Yujun Zhou (University of Notre Dame), Zhengqing Yuan (University of Notre Dame), Xiaoqi Sun (Massachusetts Institute of Technology), Jules Schleinitz (California Institute of Technology), Yanbo Wang (Mohamed bin Zayed University of Artificial Intelligence), Shuhao Zhang (Carnegie Mellon University), Mihir Surve (University of Notre Dame), Nitesh Chawla (University of Notre Dame), Olaf Wiest (University of Notre Dame), Xiangliang Zhang (University of Notre Dame)

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

Authors: Yang Xiao (Hong Kong Polytechnic University), Jiashuo Wang (HKPU), Ruifeng Yuan (Hong Kong Polytechnic University), Chunpu Xu (Hong Kong Polytechnic University), Kaishuai Xu (Hong Kong Polytechnic University), Wenjie Li (The Hong Kong Polytechnic University), Pengfei Liu (Carnegie Mellon University)

Retrieval is Not Enough: Enhancing RAG via Test-Time Critique and Optimization

Authors: Jiaqi Wei (Zhejiang University), Hao Zhou (South China University of Technology), Xiang Zhang (University of British Columbia), Di Zhang (Shanghai Artificial Intelligence Laboratory), Zijie Qiu (Fudan University), Noah Wei (Carnegie Mellon University), Jinzhe Li (Fudan University), Wanli Ouyang (Shanghai AI Lab), Siqi Sun (Fudan University)

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Authors: Ziyang Ma (Shanghai Jiao Tong University), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Yanqiao Zhu (Shanghai Jiao Tong University), Chen Yang (Shanghai Jiao Tong University), Yi-Wen Chao (Nanyang Technological University), Ruiyang Xu (Shanghai Jiao Tong University), Wenxi Chen (Shanghai Jiao Tong University), Yuanzhe Chen (ByteDance Inc.), Zhuo Chen (ByteDance Inc.), Jian Cong (ByteDance Inc.), Kai Li (Tsinghua University), Keliang Li (Chinese Academy of Sciences), Siyou Li (Queen Mary University of London), Xinfeng Li (Nanyang Technological University), Xiquan Li (Shanghai Jiao Tong University), Zheng Lian (Institute of Automation, Chinese Academy of Sciences), Yuzhe Liang (Shanghai Jiao Tong University), Minghao Liu (2077AI), Zhikang Niu (Shanghai Jiao Tong University), Tianrui Wang (Tianjin University), Wang Yuping (University of Science and Technology of China), Yuxuan Wang (ByteDance), Yihao Wu (Nanyang Technological University), Guanrou Yang (Shanghai Jiao Tong University), Jianwei Yu (Microsoft), Ruibin Yuan (Carnegie Mellon University), Zhisheng Zheng (University of Texas at Austin), Ziya Zhou (Hong Kong University of Science and Technology), Haina Zhu (Shanghai Jiao Tong University), Wei Xue (Hong Kong University of Science and Technology), Emmanouil Benetos (Queen Mary University of London), Kai Yu (Shanghai Jiao Tong University), Eng-Siong Chng (Nanyang Technological University), Xie Chen (Shanghai Jiao Tong University)

A Generalist Intracortical Motor Decoder

Authors: Joel Ye (Carnegie Mellon University), Fabio Rizzoglio (Northwestern University), Xuan Ma (Northwestern University), Adam Smoulder (Carnegie Mellon University), Hongwei Mao (University of Pittsburgh), Gary Blumenthal (University of Pittsburgh), William Hockeimer (University of Pittsburgh), Nicolas Kunigk (University of Pittsburgh), Dalton Moore (University of Chicago), Patrick Marino (Phantom Neuro), Raeed Chowdhury (None), J. Patrick Mayo (University of Pittsburgh), Aaron Batista (University of Pittsburgh), Steven Chase (None), Michael Boninger (University of Pittsburgh), Charles Greenspon (University of Chicago), Andrew B Schwartz (University of Pittsburgh), Nicholas Hatsopoulos (University of Chicago), Lee Miller (Northwestern University at Chicago), Kristofer Bouchard (Lawrence Berkeley National Laboratory), Jennifer Collinger (University of Pittsburgh), Leila Wehbe (Carnegie Mellon University), Robert Gaunt (University of Pittsburgh)

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

Authors: Chandler Smith (Oxford University), Marwa Abdulhai (University of California, Berkeley), Manfred Díaz (Mila, Quebec), Marko Tesic (University of Cambridge), Rakshit Trivedi (Massachusetts Institute of Technology), Sasha Vezhnevets (DeepMind), Lewis Hammond (University of Oxford / Cooperative AI Foundation), Jesse Clifton (Center on Long-Term Risk), Minsuk Chang (Google DeepMind), Edgar Duenez-Guzman (Google DeepMind), John Agapiou (Google DeepMind), Jayd Matyas (DeepMind), Danny Karmon (Google DeepMind), Beining Zhang (University of Southampton), Jim Dilkes (University of Southampton), Akash Kundu (Heritage Institute of Technology), Hieu Minh Nguyen (Apart Research), Emanuel Tewolde (Carnegie Mellon University), Jebish Purbey (Tribhuvan University), Ram Mohan Rao Kadiyala (), Siddhant Gupta (Indian Institute of Technology, Roorkee), Aliaksei Korshuk (Coframe), Buyantuev Alexander (Higher School of Economics), Ilya Makarov (AIRI & ISP RAS), Gang Zhao (Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University), Rolando Fernandez (University of Texas at Austin), Zhihan Wang (University of Texas at Austin), Caroline Wang (The University of Texas at Austin | Google DeepMind), Jiaxun Cui (Meta), Lingyun Xiao (University of Texas at Austin), Di Shi (University of Texas at Austin), Yoonchang Sung (Nanyang Technological University), Muhammad Arrasy Rahman (The University of Texas at Austin), Peter Stone (The University of Texas at Austin, Sony AI), Yipeng Kang (National Key Laboratory of General Artificial Intelligence), Hyeonggeun Yun (Companoid Labs), Ananya Ananya (Stanford University), Taehun Cha (Korea University), Zhiqiang Wu (Tongji University), Elizaveta Tennant (University College London), Olivia Macmillan-Scott (UCL), Marta Segura (University College London), Diana Riazi (Department of Computer Science, University College London), Fuyang Cui (University of Toronto), Sriram Ganapathi (University of Waterloo), Toryn Klassen (University of Toronto), Nico Schiavone (University of Toronto), Mogtaba Alim (University of Toronto), Sheila McIlraith (University of Toronto and Vector Institute), Manuel Rios (Universidad de los Andes), Oswaldo Peña (Universidad Nacional de Colombia), Carlos Rojas (Grupo Bancolombia), Manuela Chacon-Chamorro (Universidad de los Andes), Rubén Manrique (Universidad de Los Andes), Luis Felipe Giraldo (Universidad de Los Andes), Nicanor Quijano (Universidad de Los Andes), Yiding Wang (Peking University), Yuxuan Chen (The University of Hong Kong), Fangwei Zhong (Beijing Normal University), Mengmeng Wang (State Key Laboratory of General Artificial Intelligence), Wenming Tu (Shanghai Jiao Tong University), Zhaowei Zhang (Peking University), Ziang Chen (Tsinghua University), Zixia Jia (BigAI), Xue Feng (BIGAI), Zilong Zheng (Beijing Institute for General Artificial Intelligence), Chichen Lin (), Weijian Fan (Communication University of China), Chenao Liu (Communication University of China), Sneheel Sarangi (New York University Abu Dhabi), Ziyan Wang (King’s College London; Microsoft Research), Shuqing Shi (King’s College London), Yali Du (King’s College London), Avinaash Anand Kulandaivel (None), Yang Liu (BIGAI), Wu Ruiyang (Communication University of China), Chetan Talele (None), 陆孙嘉 (Communication University of China), Gema Parreno (–), Shamika Dhuri (Carnegie Mellon University), Bain McHale (Carnegie Mellon University), Tim Baarslag (Centrum Wiskunde & Informatica / Eindhoven University of Technology), Dylan Hadfield-Menell (MIT), Natasha Jaques (University of Washington, Google DeepMind), José Hernández-Orallo (Universitat Politècnica de València), Joel Leibo (DeepMind)

Computer Vision

Grounded Reinforcement Learning for Visual Reasoning

Authors: Gabriel Sarch (Princeton University), Snigdha Saha (Google), Naitik Khandelwal (Carnegie Mellon University), Ayush Jain (Carnegie Mellon University), Michael Tarr (Carnegie Mellon University), Aviral Kumar (Carnegie Mellon University), Katerina Fragkiadaki (Carnegie Mellon University)

COS3D: Collaborative Open-Vocabulary 3D Segmentation

Authors: Runsong Zhu (The Chinese University of Hong Kong), Ka-Hei Hui (Autodesk), Zhengzhe Liu (Carnegie Mellon University), Qianyi Wu (Monash University), Weiliang Tang (The Chinese University of Hong Kong), Shi Qiu (The Chinese University of Hong Kong), Pheng-Ann Heng (The Chinese University of Hong Kong), Chi-Wing Fu (The Chinese University of Hong Kong)

OmniBench: Towards The Future of Universal Omni-Language Models

Authors: Yizhi Li (The University of Manchester), Ge Zhang (University of Michigan – Ann Arbor), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Ruibin Yuan (Carnegie Mellon University), Zhu (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Hangyu Guo (Alibaba Group), Yiming Liang (University of the Chinese Academy of Sciences), Jiaheng Liu (Nanjing University), Noah Wang (), Jian Yang (Alibaba Group), Siwei Wu (Nanjing University of Science and Technology), Xingwei Qu (University of Manchester), Jinjie Shi (Queen Mary, University of London), Xinyue Zhang (National University of Singapore), Zhenzhu Yang (China University of Geosciences Beijing), Yidan Wen (Northwestern Polytechnical University, Xi’an), Yanghai Wang (Nanjing University), Shihao Li (Nanjing University), Zhao-Xiang Zhang (Chinese Academy of Sciences, China), Ruibo Liu (Google DeepMind), Emmanouil Benetos (Queen Mary University of London), Wenhao Huang (Key Laboratory of Machine Perception), Chenghua Lin (University of Manchester)

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Authors: Yuchen Zhang (Carnegie Mellon University), Nikhil Keetha (Carnegie Mellon University), Chenwei Lyu (TikTok Inc.), Bhuvan Jhamb (Carnegie Mellon University), Yutian Chen (Carnegie Mellon University), Yuheng Qiu (Carnegie Mellon University), Jay Karhade (Carnegie Mellon University), Shreyas Jha (Nissan Advanced Technology Center), Yaoyu Hu (Carnegie Mellon University), Deva Ramanan (Carnegie Mellon University), Sebastian Scherer (Carnegie Mellon University), Wenshan Wang (School of Computer Science, Carnegie Mellon University)

HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis

Authors: Xiaoyuan Wang (Carnegie Mellon University), Yizhou Zhao (Carnegie Mellon University), Botao Ye (ETH Zurich), Shan Xiaojun (), Weijie Lyu (University of California, Merced), Lu Qi (University of California, Merced), Kelvin Chan (Nanyang Technological University), Yinxiao Li (Google DeepMind), Ming-Hsuan Yang (Google / UC Merced)

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Authors: Yunlong Tang (University of Rochester), Pinxin Liu (University of Rochester), Mingqian Feng (University of Rochester), Zhangyun Tan (University of Rochester), Rui Mao (University of Rochester), Chao Huang (Department of Computer Science, University of Rochester), Jing Bi (University of Rochester), Yunzhong Xiao (Carnegie Mellon University), Susan Liang (University of Rochester), Hang Hua (University of Rochester), Ali Vosoughi (University of Rochester), Luchuan Song (University of Rochester), Zeliang Zhang (University of Rochester), Chenliang Xu (University of Rochester)

CAT: Content-Adaptive Image Tokenization

Authors: Junhong Shen (Carnegie Mellon University), Kushal Tirumala (Meta AI Research, FAIR), Michihiro Yasunaga (Stanford University), Ishan Misra (Facebook AI Research), Luke Zettlemoyer (University of Washington; Meta), Lili Yu (Meta), Chunting Zhou (FAIR)

OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates

Authors: Jinpei Guo (Shanghai Jiao Tong University), Yifei Ji (Shanghai Jiao Tong University), Zheng Chen (Shanghai Jiao Tong University), Kai Liu (Shanghai Jiao Tong University), Min Liu (Skild AI), Wang Rao (Carnegie Mellon University), Wenbo Li (JD Joy Future Academy), Yong Guo (Max Planck Institute for Informatics), Yulun Zhang (Shanghai Jiao Tong University)

Salient Concept-Aware Generative Data Augmentation

Authors: Tianchen Zhao (Amazon), Xuanbai Chen (Carnegie Mellon University), Zhihua Li (Amazon), Jun Fang (Amazon AGI), Dongsheng An (State University of New York, Stony Brook), Xiang Xu (Amazon), Zhuowen Tu (University of California, San Diego), Yifan Xing (Amazon)

Data-centric AI

ORBIT – Open Recommendation Benchmark for Reproducible Research with Hidden Tests

Authors: Jingyuan He (School of Computer Science, Carnegie Mellon University), Jiongnan Liu (None), Vishan Oberoi (Carnegie Mellon University), Bolin Wu (Carnegie Mellon University), Mahima Jagadeesh Patel (Carnegie Mellon University), Kangrui Mao (Carnegie Mellon University), Chuning Shi (Carnegie Mellon University), I-Ta Lee (Meta Platform Inc.), Arnold Overwijk (Meta), Chenyan Xiong (School of Computer Science, Carnegie Mellon University)

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Authors: Nikhil Kandpal (Department of Computer Science), Brian Lester (Google DeepMind/University of Toronto), Colin Raffel (University of Toronto, Vector Institute and Hugging Face), Sebastian Majstorovic (EleutherAI), Stella Biderman (The EleutherAI Institute), Baber Abbasi (EleutherAI), Luca Soldaini (Allen Institute for AI), Enrico Shippole (Teraflop AI), A. Feder Cooper (Stanford University), Aviya Skowron (EleutherAI), Shayne Longpre (Massachusetts Institute of Technology), Lintang Sutawika (Carnegie Mellon University), Alon Albalak (Lila Sciences), Zhenlin Xu (Boson AI), Guilherme Penedo (HuggingFace), Loubna Ben Allal (Hugging Face), Elie Bakouch (Hugging Face), John Pressman (EleutherAI Institute), Honglu Fan (Google DeepMind), Dashiell Stander (EleutherAI), Guangyu Song (EleutherAI), Aaron Gokaslan (MBZUAI Institute of Foundation Models), John Kirchenbauer (University of Maryland, College Park), Tom Goldstein (University of Maryland), Brian Bartoldson (Lawrence Livermore National Laboratory), Bhavya Kailkhura (Lawrence Livermore National Laboratory), Tyler Murray (Allen Institute for Artificial Intelligence)

DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models

Authors: Cathy Jiao (Carnegie Mellon University), Yijun Pan (Yale University), Emily Xiao (Carnegie Mellon University), Daisy Sheng (Carnegie Mellon University), Niket Jain (Carnegie Mellon University), Hanzhang Zhao (Carnegie Mellon University), Ishita Dasgupta (School of Computer Science, Carnegie Mellon University), Jiaqi Ma (University of Illinois Urbana-Champaign), Chenyan Xiong (School of Computer Science, Carnegie Mellon University)

Faithful Group Shapley Value

Authors: Kiljae Lee (The Ohio State University), Ziqi Liu (Carnegie Mellon University), Weijing Tang (Carnegie Mellon University), Yuan Zhang (Ohio State University, Columbus)

What’s Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

Authors: Sang Choe (Anthropic), Hwijeen Ahn (Carnegie Mellon University), Juhan Bae (Anthropic), Kewen Zhao (School of Computer Science, Carnegie Mellon University), Youngseog Chung (Carnegie Mellon University), Adithya Pratapa (Carnegie Mellon University, Amazon), Willie Neiswanger (USC), Emma Strubell (Carnegie Mellon University), Teruko Mitamura (Carnegie Mellon University), Jeff Schneider (CMU), Eduard Hovy (Carnegie Mellon University), Roger Grosse (University of Toronto), Eric Xing (CMU/MBZUAI/GenBio)

Deep Learning

Results of the Big ANN: NeurIPS’23 competition

Authors: Harsha Vardhan Simhadri (Microsoft), Martin Aumüller (IT University of Copenhagen), Matthijs Douze (Facebook AI Research), Dmitry Baranchuk (Yandex), Amir Ingber (Pinecone), Edo Liberty (Yale University), George Williams (Ansible AI), Ben Landrum (Cornell University), Magdalen Manohar (Carnegie Mellon University), Mazin Karjikar (University of Maryland, College Park), Laxman Dhulipala (UMD), Meng Chen (Fudan University), Yue Chen (Fudan University), Rui Ma (Fudan University), Kai Zhang (Fudan University), Yuzheng Cai (Fudan University), Jiayang Shi (Fudan University), Weiguo Zheng (Fudan University), Yizhuo Chen (Fudan University), Jie Yin (Tencent), Ben Huang (Baidu)

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection

Authors: Xin Gao (Fudan University), Jiyao Liu (Fudan University), Guanghao Li (Fudan University), Yueming Lyu (Nanjing University), Jianxiong Gao (None), Weichen Yu (Carnegie Mellon University), Ningsheng Xu (Fudan University), Liang Wang (NLPR, China), Caifeng Shan (Nanjing University), Ziwei Liu (Nanyang Technological University), Chenyang Si (Sea AI Lab)

Reasoning Models Better Express Their Confidence

Authors: Dongkeun Yoon (KAIST), Seungone Kim (Carnegie Mellon University), Sohee Yang (University College London), Sunkyoung Kim (LG AI Research), Soyeon Kim (LG Corporation), Yongil Kim (LG Corporation), Eunbi Choi (LG AI Research), Yireun Kim (LG AI Research), Minjoon Seo (KAIST)

General Machine Learning

Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models

Authors: Xiyuan Zhang (AWS AI), Danielle Maddix Robinson (AWS AI Labs), Junming Yin (Amazon), Nick Erickson (Amazon Web Services), Abdul Fatir Ansari (Amazon), Boran Han (AWS), Shuai Zhang (AWS AI), Leman Akoglu (CMU), Christos Faloutsos (CMU), Michael Mahoney (UC Berkeley), Tony Hu (AWS AI), Huzefa Rangwala (George Mason University), George Karypis (University of Minnesota, Minneapolis), Yuyang (Bernie) Wang (AWS AI)

Optimization

A Beyond-Worst-Case Analysis of Greedy k-means++

Authors: Qingyun Chen (University of California, Santa Cruz), Sungjin Im (University of California, Santa Cruz), Ben Moseley (Carnegie Mellon University), Ryan Milstrey (University of California, Merced), Chenyang Xu (Zhejiang University), Ruilong Zhang (Technische Universität München)

Probabilistic Methods

Reinforcement Studying

MyoChallenge 2024: A New Benchmark for Physiological Dexterity and Agility in Bionic Humans

Authors: Huiyi Wang (McGill University), Chun Kwang Tan (Northeastern University), Balint Hodossy (Imperial College London), Shirui Lyu (King’s College London), Pierre Schumacher (Max Planck Institute for Intelligent Systems), James Heald (University College London), Kai Biegun (University College London), Samo Hromadka (Gatsby Computational Neuroscience Unit), Maneesh Sahani (Gatsby Unit, UCL), Gunwoo Park (KAIST), Beomsoo Shin (KAIST), JongHyeon Park (None), Seungbum Koo (KAIST), Chenhui Zuo (Tsinghua University), Chengtian Ma (Tsinghua University), Yanan Sui (Tsinghua University), Nick Hansen (UC San Diego), Stone Tao (University of California – San Diego), Yuan Gao (Carnegie Mellon University), Hao Su (UCSD), Seungmoon Song (Stanford University), Letizia Gionfrida (King’s College London), Massimo Sartori (University of Twente), Guillaume Durandau (McGill University), Vikash Kumar (CMU / MyoLab), Vittorio Caggiano (MyoSuite)

Reasoning as an Adaptive Defense for Safety

Authors: Taeyoun Kim (Carnegie Mellon University), Fahim Tajwar (Carnegie Mellon University), Aditi Raghunathan (Carnegie Mellon University), Aviral Kumar (Carnegie Mellon University)

Compute-Optimal Scaling for Value-Based Deep RL

Authors: Preston Fu (University of California, Berkeley), Oleh Rybkin (University of California, Berkeley), Zhiyuan (Paul) Zhou (UC Berkeley, PI), Michal Nauman (University of Warsaw), Pieter Abbeel (UC Berkeley & Amazon), Sergey Levine (UC Berkeley), Aviral Kumar (Carnegie Mellon University)

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Authors: Frank (Fangzheng) Xu (Microsoft AI), Yufan Song (Carnegie Mellon University), Boxuan Li (Microsoft), Yuxuan Tang (Oracle), Kritanjali Jain (School of Computer Science, Carnegie Mellon University), Mengxue Bao (TikTok), Zora Wang (Carnegie Mellon University), Xuhui Zhou (Carnegie Mellon University), Zhitong Guo (Meta), Murong Cao (University of Hong Kong), Mingyang Yang (Carnegie Mellon University), Hao Yang Lu (Carnegie Mellon University), Amaad Martin (School of Computer Science, Carnegie Mellon University), Zhe Su (Carnegie Mellon University), Leander Maben (Carnegie Mellon University), Raj Mehta (Carnegie Mellon University), Wayne Chi (Carnegie Mellon University), Lawrence Jang (Carnegie Mellon University), Yiqing Xie (Carnegie Mellon University), Shuyan Zhou (Facebook), Graham Neubig (Carnegie Mellon University)

Adaptively Coordinating with Novel Partners via Learned Latent Strategies

Authors: Benjamin Li (Carnegie Mellon University), Shuyang Shi (School of Computer Science, Carnegie Mellon University), Lucia Romero (University of Pittsburgh), Huao Li (Massachusetts Institute of Technology), Yaqi Xie (CMU), Woojun Kim (Carnegie Mellon University), Stefanos Nikolaidis (University of Southern California), Charles Lewis (University of Pittsburgh), Katia Sycara (Carnegie Mellon University), Simon Stepputtis (Virginia Polytechnic Institute and State University)

Scaling Offline RL via Efficient and Expressive Shortcut Models

Authors: Nicolas Espinosa-Dice (Cornell University), Yiyi Zhang (Cornell University), Yiding Chen (Cornell University), Bradley Guo (Cornell University), Owen Oertell (Cornell University), Gokul Swamy (Carnegie Mellon University), Kianté Brantley (Kempner and SEAS at Harvard University), Wen Sun (Cornell University and Databricks)

Thinking vs. Doing: Enhancing Agent Reasoning by Scaling Test-Time Interaction

Authors: Junhong Shen (Carnegie Mellon University), Hao Bai (University of Illinois at Urbana-Champaign), Lunjun Zhang (University of Toronto), Yifei Zhou (University of California, Berkeley), Amrith Setlur (Carnegie Mellon University), Peter Tong (New York University), Diego Caples (AGI, Inc.), Nan Jiang (University of Illinois at Urbana-Champaign), Tong Zhang (UIUC), Ameet Talwalkar (CMU, Datadog), Aviral Kumar (Carnegie Mellon University)

Social Aspects

Struct-Bench: A Benchmark for Differentially Private Structured Text Generation

Authors: Shuaiqi Wang (Carnegie Mellon University), Vikas Raunak (Google DeepMind), Arturs Backurs (TTIC), Victor Reis (Microsoft), Pei Zhou (University of Southern California), Sihao Chen (Microsoft), Longqi Yang (Microsoft), Zinan Lin (Microsoft Research), Sergey Yekhanin (Microsoft), Giulia Fanti (CMU)

Validating LLM-as-a-Judge Systems under Rating Indeterminacy

Authors: Luke Guerdan (Carnegie Mellon University), Solon Barocas (Microsoft Research; Cornell University), Kenneth Holstein (Carnegie Mellon University), Hanna Wallach (Microsoft), Steven Wu (Carnegie Mellon University), Alex Chouldechova (Microsoft)

Valid Inference with Imperfect Synthetic Data

Authors: Yewon Byun (Carnegie Mellon University), Shantanu Gupta (Carnegie Mellon University), Zachary Lipton (Carnegie Mellon University / Abridge), Rachel Childers (University of Zurich), Bryan Wilder (Carnegie Mellon University)

Private Evolution Converges

Authors: Tomás González Lara (Carnegie Mellon University), Giulia Fanti (CMU), Aaditya Ramdas (Carnegie Mellon University)

Theory

Uncategorized

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Authors: Xeron Du (01.AI), Yifan Yao (Beijing University of Posts and Telecommunications), Kaijing Ma (Tongji University), Bingli Wang (Sichuan Agricultural University), Tianyu Zheng (Beijing University of Posts and Telecommunications), Zhu (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Minghao Liu (2077AI), Yiming Liang (University of the Chinese Academy of Sciences), Xiaolong Jin (Purdue University), Zhenlin Wei (Harbin Engineering University), Chujie Zheng (Tsinghua University), Kaixin Deng (Hokkaido University), Shuyue Guo (Beijing University of Posts and Telecommunications), Shian Jia (Zhejiang University), Sichao Jiang (Zhejiang University), Yiyan Liao (Peking University), Rui Li (Peking University), Qinrui Li (Cornell University), Sirun Li (Peking University), Yizhi Li (The University of Manchester), Yunwen Li (Chinese University of Hong Kong (Shenzhen)), Dehua Ma (Beijing University of Posts and Telecommunications), Yuansheng Ni (University of Waterloo), Haoran Que (Beijing University of Aeronautics and Astronautics), Qiyao Wang (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), Zhoufutu Wen (ByteDance Inc.), Siwei Wu (Nanjing University of Science and Technology), Tianshun Xing (Beijing University of Posts and Telecommunications), 明 许 (01.AI), Zhenzhu Yang (China University of Geosciences Beijing), Noah Wang (), Junting Zhou (Peking University), Yuelin Bai (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Xingyuan Bu (Alibaba Group), Chenglin Cai (Huawei Technologies Ltd.), Liang Chen (Peking University), Yifan Chen (ByteDance Inc.), Cheng Chengtuo (Zhejiang University), Tianhao Cheng (Fudan University), Keyi Ding (2077AI), Siming Huang (University of Melbourne), Huang Yun (National University of Singapore), Yaoru Li (Zhejiang University), Yizhe Li (Zhejiang University), Zhaoqun Li (Zhejiang University), Tianhao Liang (Zhejiang University), Chengdong Lin (Hangzhou Dianzi University), Hongquan Lin (University of Science and Technology of China), Yinghao Ma (Centre for Digital Music, Queen Mary University of London), Zhongyuan Peng (Fudan University), Zifan Peng (The Hong Kong University of Science and Technology (Guangzhou)), Qige Qi (ByteDance Inc.), Shi Qiu (Peking University), Xingwei Qu (University of Manchester), Shanghaoran Quan (Alibaba Group), Yizhou Tan (Harvard University), Zili Wang (StepFun), 王晨清 (Abaka), Hao Wang (Beijing University of Aeronautics and Astronautics), Yiya Wang (Peking University), Yubo Wang (University of Waterloo), Jiajun Xu (Facebook), Kexin Yang (Alibaba Group), Ruibin Yuan (Carnegie Mellon University), Yuanhao Yue (Fudan University), Tianyang Zhan (ByteDance Inc.), Chun Zhang (ByteDance Inc.), Jinyang Zhang (Peking University), Xiyue Zhang (Peking University), Owen Zhang (Department of Computer Science, Princeton University), Yue Zhang (Suzhou University), Yongchi Zhao (Alibaba Group), Xiangyu Zheng (Fudan University), Chenghua Zhong (University of Science and Technology Beijing), Yang Gao (Nanjing University), Zhoujun Li (Beijing University of Aeronautics and Astronautics), Dayiheng Liu (Alibaba Group), Qian Liu (TikTok (Singapore)), Tianyu Liu (Alibaba), Shiwen Ni (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Junran Peng (Institute of Automation, Chinese Academy of Sciences), Yujia Qin (ByteDance), Wenbo Su (Alibaba Group), Guoyin Wang (Alibaba Qwen Pilot), Shi Wang (Institute of Computing Science, Chinese Academy of Sciences), Jian Yang (Alibaba Group), Min Yang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Meng Cao (Mohamed bin Zayed University of Artificial Intelligence), Xiang Yue (Carnegie Mellon University), Zhao-Xiang Zhang (Chinese Academy of Sciences, China), Wangchunshu Zhou (Guangdong OPPO Mobile Telecommunications Corp., Ltd.), Jiaheng Liu (Nanjing University), Qunshu Lin (Abaka AI), Wenhao Huang (Key Laboratory of Machine Perception), Ge Zhang (University of Michigan – Ann Arbor)

Safety Pretraining: Toward the Next Generation of Safe AI

Authors: Pratyush Maini (Carnegie Mellon University / DatologyAI), Sachin Goyal (Carnegie Mellon University), Dylan Sam (OpenAI, Carnegie Mellon University), Alexander Robey (Carnegie Mellon University), Yash Savani (Carnegie Mellon University), Yiding Jiang (Google DeepMind), Andy Zou (CMU, Gray Swan AI), Matt Fredrikson (CMU), Zachary Lipton (Carnegie Mellon University / Abridge), Zico Kolter (Carnegie Mellon University)

A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks

Authors: Mucong Ding (Department of Computer Science, University of Maryland, College Park), Bang An (University of Maryland, College Park), Tahseen Rabbani (University of Chicago), Chenghao Deng (University of Maryland), Anirudh Satheesh (University of Maryland, College Park), Souradip Chakraborty (University of Maryland, College Park), Mehrdad Saberi (Department of Computer Science, University of Maryland, College Park), Yuxin Wen (University of Maryland), Kyle Sang (University of Maryland), Aakriti Agrawal (University of Maryland, College Park), Xuandong Zhao (UC Berkeley), Mo Zhou (Johns Hopkins University), Mary-Anne Hartley (EPFL), Lei Li (Carnegie Mellon University), Yu-Xiang Wang (UCSD), Vishal Patel (Johns Hopkins University), Soheil Feizi (University of Maryland), Tom Goldstein (University of Maryland), Furong Huang (University of Maryland)

Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition

Authors: Andy Zou (CMU, Gray Swan AI), Maxwell Lin (University of California, Berkeley), Eliot Jones (Gray Swan), Micha Nowak (Bayerische Julius-Maximilians-Universität Würzburg), Mateusz Dziemian (Independent), Nick Winter (Gray Swan AI), Valent Nathanael (Gray Swan AI), Ayla Croft (Gray Swan AI), Xander Davies (University of Oxford), Jai Patel (UK AI Security Institute), Robert Kirk (University College London), Yarin Gal (University of Oxford), Dan Hendrycks (Center for AI Safety), Zico Kolter (Carnegie Mellon University), Matt Fredrikson (CMU)

Antidistillation Sampling

Authors: Yash Savani (Carnegie Mellon University), Asher Trockman (CMU), Zhili Feng (OpenAI), Yixuan Xu (Carnegie Mellon University), Avi Schwarzschild (Carnegie Mellon University), Alexander Robey (Carnegie Mellon University), Marc Finzi (Carnegie Mellon University), Zico Kolter (Carnegie Mellon University)

Is Your Diffusion Model Actually Denoising?

Authors: Daniel Pfrommer (Massachusetts Institute of Technology), Zehao Dou (OpenAI), Christopher Scarvelis (MIT), Max Simchowitz (Carnegie Mellon University), Ali Jadbabaie (MIT)

CSGO: Content-Style Composition in Text-to-Image Generation

Authors: Peng Xing (Nanjing University of Science and Technology), Haofan Wang (Carnegie Mellon University), Yanpeng Sun (Nanjing University of Science and Technology), wangqixun (Tencent Hunyuan), Baixu (ByteDance Inc.), Hao Ai (Beijing University of Aeronautics and Astronautics), Jen-Yuan Huang (Peking University), Zechao Li (Nanjing University of Science and Technology)

RBench-V: A Primary Assessment for Visual Reasoning Models with Multimodal Outputs

Authors: Meng-Hao Guo (Tsinghua University), Xuanyu Chu (Tsinghua University), Qianrui Yang (Tsinghua University), Zhe-Han Mo (Tsinghua University), Yiqing Shen (Tsinghua University), Pei-lin Li (Tsinghua University), Xinjie Lin (Tsinghua University), Jinnian Zhang (University of Wisconsin, Madison), Xin-Sheng Chen (Tsinghua University), Yi Zhang (Beihang University), Kiyohiro Nakayama (Stanford University), Zhengyang Geng (CMU), Houwen Peng (Microsoft Research), Han Hu (Microsoft Research Asia), Shi-min Hu (Tsinghua University)

Kinetics: Rethinking Test-Time Scaling Laws

Authors: Ranajoy Sadhukhan (Carnegie Mellon University), Zhuoming Chen (Carnegie Mellon University), Haizhong Zheng (Carnegie Mellon University), Beidi Chen (CMU / Amazon)

AHa-Bench: Benchmarking Audio Hallucinations in Large Audio-Language Models

Authors: Xize Cheng (Zhejiang University), Dongjie Fu (Zhejiang University), Chenyuhao Wen (University of Electronic Science and Technology of China), Shannon Yu (Tianjin University), Zehan Wang (Zhejiang University), Shengpeng Ji (Zhejiang University), Siddhant Arora (Carnegie Mellon University), Tao Jin (Zhejiang University), Shinji Watanabe (Carnegie Mellon University), Zhou Zhao (Zhejiang University)
