Thursday, April 23, 2026

Apple Machine Learning Research at ICLR 2026


Apple is advancing AI and ML through fundamental research, much of which is shared via publications and engagement at conferences in an effort to accelerate progress in this important field and support the broader community. This week, the Fourteenth International Conference on Learning Representations (ICLR) will be held in Rio de Janeiro, Brazil, and Apple is proud to once again participate in this important event for the research community and to support it with sponsorship.

At the main conference and associated workshops, Apple researchers will present new research across a variety of topics, including work unlocking large-scale training for recurrent neural networks, a technique for improving state space models, a new approach to unifying image understanding and generation, a method for generating 3D scenes from a single image, and a new approach to protein folding.

During exhibition hours, attendees will be able to experience demonstrations of Apple's ML research at our booth, #204, including local LLM inference on Apple silicon with MLX and Sharp Monocular View Synthesis in Less Than a Second. Apple is also sponsoring and participating in a number of affinity group-hosted events that support underrepresented groups in the ML community.

A comprehensive overview of Apple's participation in and contributions to ICLR 2026 can be found here, and a selection of highlights follows below.

Recurrent neural networks (RNNs) are naturally suited to efficient inference, requiring far less memory and compute than attention-based architectures, but the sequential nature of their computation has historically made it impractical to scale RNNs to billions of parameters. A new advance from Apple researchers makes RNN training dramatically more efficient, enabling large-scale training for the first time and widening the set of architecture choices available to practitioners designing LLMs, particularly for resource-constrained deployment.

In ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models, a new paper accepted to ICLR 2026 as an oral presentation, Apple researchers share a new framework for parallelized RNN training that achieves a 665× speedup over the standard sequential approach (see Figure 1). This efficiency gain enables the training of the first 7-billion-parameter classical RNNs that achieve language modeling performance competitive with transformers (see Figure 2).
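The key enabler for this kind of speedup can be sketched with a toy example. A nonlinear recurrence must normally be evaluated step by step, but parallelization approaches in this vein reduce each training step to linear recurrences of the form h[t] = a[t]·h[t-1] + b[t], and a linear recurrence can be evaluated with an associative scan in logarithmic parallel depth. The sketch below is plain illustrative Python, not the ParaRNN codebase; it only shows that the scan reproduces the sequential result:

```python
# Sequential vs. scan-based evaluation of the linear recurrence
#   h[t] = a[t] * h[t-1] + b[t],   h[-1] = 0
# Each (a, b) pair is the affine map h -> a*h + b; composing maps is
# associative, which is what makes a parallel scan applicable.

def sequential_scan(a, b):
    """Reference: evaluate the recurrence one timestep at a time."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out

def combine(p, q):
    """Compose h -> a1*h + b1 followed by h -> a2*h + b2."""
    (a1, b1), (a2, b2) = p, q
    return (a2 * a1, a2 * b1 + b2)

def parallel_scan(a, b):
    """Hillis-Steele inclusive scan: log2(T) rounds; within a round,
    every position could be updated in parallel on real hardware."""
    elems = list(zip(a, b))
    n, step = len(elems), 1
    while step < n:
        elems = [elems[i] if i < step else combine(elems[i - step], elems[i])
                 for i in range(n)]
        step *= 2
    # With h[-1] = 0, h[t] is just the b-term of the composed map.
    return [bt for (_, bt) in elems]
```

Running both on the same coefficients confirms they agree; the payoff of the scan form only appears on parallel hardware, where the round count, not the sequence length, bounds the critical path.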

To accelerate research in efficient sequence modeling and enable researchers and practitioners to explore new nonlinear RNN models at scale, the ParaRNN codebase has been released as an open-source framework for automatic training-parallelization of nonlinear RNNs.

At ICLR, the paper's first author will also deliver an Expo Talk about this research.

Speedup from Parallel RNN Training

Figure 1: Runtime comparison of parallel and sequential application of the adapted ParaGRU and ParaLSTM cells as a function of input sequence length. ParaRNN unlocks training-time parallelizability, allowing dramatic speedups over vanilla sequential application.

Performance of Large-Scale Classical RNNs

Figure 2: Perplexity (lower is better) at various model sizes for Mamba2, ParaLSTM, ParaGRU, and a transformer. With large-scale training enabled by parallelization, the adapted GRU and LSTM models show perplexity competitive with a transformer and Mamba2.

State space models (SSMs) like Mamba have become the leading alternative to transformers for sequence modeling tasks. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models, a new Apple paper accepted as an oral presentation at ICLR, explores the capabilities and limitations of SSMs for long-form generation tasks. The paper shows that the efficiency of SSMs comes at the cost of inherent performance degradation. In fact, SSMs fail to solve long-form generation tasks when the complexity of the task increases beyond the capacity of the model, even when the model is allowed to generate a chain of thought (CoT) of any length. This limitation arises from the bounded memory of the model, which limits its expressive power when generating long sequences.

The paper shows that this limitation can be mitigated by giving SSMs interactive access to external tools. Given the right choice of tool access and problem-dependent training data, SSMs can learn to solve any tractable problem and generalize to arbitrary problem length and complexity (see Figure 3). The work demonstrates that tool-augmented SSMs achieve strong length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potentially efficient alternative to transformers in interactive tool-based and agentic settings.
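The role of the external tool can be sketched with a deliberately tiny example: a generator with a fixed-size internal state offloads unbounded intermediate results (here, a running sum) to a tool, so the length of the task is no longer capped by the model's memory. Everything below is a toy stand-in, not the paper's trained SSMs:

```python
# Toy illustration of tool-augmented generation: the "model" keeps only
# a bounded state (the last tool result) and delegates accumulation to
# an external tool, so it can process arbitrarily long inputs.

class AccumulatorTool:
    """External memory: holds a running sum the model never stores."""
    def __init__(self):
        self.total = 0

    def call(self, x):
        self.total += x
        return self.total

def generate_with_tool(numbers):
    """A stand-in generation loop: read one input, emit one tool call,
    feed the tool's answer back as the model's (bounded) state."""
    tool = AccumulatorTool()
    state = None
    for x in numbers:
        state = tool.call(x)
    return state

print(generate_with_tool(range(1, 101)))
```

The point of the sketch is the interaction pattern, not the arithmetic: without the tool, a fixed-size state would eventually be unable to represent the intermediate results a long problem requires.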

Unified multimodal LLMs that can both understand and generate images are appealing not just for architectural simplicity and efficiency, but also because shared representations can result in deeper understanding and better vision-language alignment, and can enable unique capabilities like image editing via instructions.

However, current open-source models often suffer from a performance trade-off between image understanding and generation capabilities. At ICLR, Apple researchers will share MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer. As described in the paper, Manzano is a unified framework designed to reduce this performance trade-off with a simple architectural idea (see Figure 4) and a training recipe that scales well across model sizes.

Manzano uses a single shared vision encoder feeding two lightweight adapters that produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a shared semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, and an auxiliary diffusion decoder then translates the image tokens into pixels. This architecture, together with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models, and is competitive with specialist models, particularly on text-rich evaluations.
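The hybrid-tokenizer dataflow can be made concrete with toy stand-ins: one shared encoder feeds a continuous adapter (for the understanding path) and a discrete adapter that quantizes against a codebook (for the generation path). All shapes, weights, and names below are illustrative, not the published model:

```python
# Schematic of a hybrid vision tokenizer: shared encoder, two adapters.
import numpy as np

rng = np.random.default_rng(0)
D, N_PATCH, CODEBOOK = 8, 16, 32  # toy dimensions

def shared_encoder(image):
    """Stand-in encoder: one D-dim feature per image patch."""
    patches = image.reshape(N_PATCH, -1)
    proj = rng.standard_normal((patches.shape[1], D))
    return patches @ proj

def continuous_adapter(feats):
    """Understanding path: continuous embeddings for the LLM."""
    return feats / np.linalg.norm(feats, axis=-1, keepdims=True)

codebook = rng.standard_normal((CODEBOOK, D))

def discrete_adapter(feats):
    """Generation path: nearest-codebook-entry token ids."""
    d2 = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

image = rng.standard_normal((N_PATCH, 4))  # toy "image": 16 patches x 4 dims
feats = shared_encoder(image)
emb = continuous_adapter(feats)    # (16, 8) continuous embeddings
tokens = discrete_adapter(feats)   # (16,) discrete token ids
```

Both adapters read the same encoder features, which is the sense in which understanding and generation share a semantic space; in the real model the LLM consumes the embeddings and predicts the token ids, and a diffusion decoder renders them to pixels.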

At ICLR, Apple researchers will also share Sharp Monocular View Synthesis in Less Than a Second, which presents a method for producing a 3D Gaussian representation from a photograph, using a single forward pass through a neural network in under a second on a standard GPU. The resulting representation can then be rendered in real time from nearby views as a high-resolution, photorealistic 3D scene (see Figure 5).

Called SHARP (Single-image High-Accuracy Real-time Parallax), this method delivers a representation that is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP achieves robust zero-shot generalization across datasets. It also sets a new state of the art on several datasets, reducing LPIPS by 25-34% and DISTS by 21-43% relative to the best prior model, while reducing synthesis time by three orders of magnitude.
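Concretely, a "3D Gaussian representation" is a set of per-Gaussian parameters that a real-time rasterizer consumes. The sketch below shows one plausible layout of such an output, with a random stand-in for the network; the field names, shapes, and Gaussian count are illustrative, not SHARP's actual format:

```python
# Schematic of the output of a single-forward-pass 3D Gaussian predictor.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianScene:
    means: np.ndarray      # (N, 3) positions (metric, i.e. absolute scale)
    scales: np.ndarray     # (N, 3) per-axis extents
    rotations: np.ndarray  # (N, 4) unit quaternions
    opacities: np.ndarray  # (N,)   values in [0, 1]
    colors: np.ndarray     # (N, 3) RGB

def fake_forward_pass(image, n_gaussians=1024):
    """Stand-in for the network: ignores the pixels and emits random
    but well-formed Gaussian parameters."""
    rng = np.random.default_rng(0)
    q = rng.standard_normal((n_gaussians, 4))
    return GaussianScene(
        means=rng.standard_normal((n_gaussians, 3)),
        scales=np.abs(rng.standard_normal((n_gaussians, 3))),
        rotations=q / np.linalg.norm(q, axis=1, keepdims=True),
        opacities=rng.uniform(0, 1, n_gaussians),
        colors=rng.uniform(0, 1, (n_gaussians, 3)),
    )

scene = fake_forward_pass(np.zeros((64, 64, 3)))
```

Because the representation is explicit (just arrays of Gaussians), rendering nearby views is a rasterization problem rather than another network evaluation, which is what makes real-time playback possible after the sub-second forward pass.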

To enable the community to further explore and build on this approach, code is available here.

ICLR attendees will be able to experience this work firsthand in a demo at the Apple booth, #204, during exhibition hours.

Protein folding is a foundational yet notoriously challenging problem in computational biology. At its core, the problem involves predicting the precise three-dimensional coordinates of every atom in a protein structure based solely on its amino acid sequence (i.e., a string of characters with 20 possible values per character). Predicting the 3D structure of proteins is critically important because a protein's function is inherently linked to its spatial configuration. Breakthroughs in this area enable researchers to rapidly design and understand proteins, potentially revolutionizing drug discovery, biotechnology, and beyond.
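The input side of the problem is concrete enough to show directly: a protein sequence is a string over the 20 canonical amino acids, typically mapped to integer ids before a model consumes it. These are the standard one-letter codes; the short example sequence is arbitrary:

```python
# Encoding an amino acid sequence as integer token ids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
AA_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(sequence):
    """Map a one-letter amino acid string to integer token ids."""
    return [AA_TO_ID[aa] for aa in sequence]

print(encode("MKTAYIA"))  # -> [10, 8, 16, 0, 19, 7, 0]
```

The model's job is then to map such a 1D token sequence to 3D coordinates for every atom of the folded structure.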

At ICLR, Apple researchers will share SimpleFold: Folding Proteins is Simpler than You Think, which details a new approach that uses a general-purpose architecture built solely from standard transformer blocks (similar to text-to-image or text-to-3D models). This allows SimpleFold to dispense with the complex architectural designs of prior approaches while maintaining performance (see Figure 6). To enable the research community to build on this method, the paper is accompanied by code and model checkpoints that can be run efficiently on a Mac with Apple silicon using MLX.
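For readers unfamiliar with the building block the paper leans on, a minimal pre-LN transformer block (self-attention plus an MLP, each with a residual connection) looks like the following in NumPy. Dimensions and initialization are toy choices here; SimpleFold's actual block sizes and training objective are specified in the paper and released code:

```python
# A minimal single-head pre-LN transformer block.
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy model width

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

# Attention and MLP weights (random toy initialization).
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4))
W1 = rng.standard_normal((D, 4 * D)) / np.sqrt(D)
W2 = rng.standard_normal((4 * D, D)) / np.sqrt(4 * D)

def transformer_block(x):
    """x -> x + attention(LN(x)) -> + MLP(LN(x))."""
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(q @ k.T / np.sqrt(D)) @ v
    x = x + attn @ Wo
    h = layer_norm(x)
    return x + np.maximum(h @ W1, 0.0) @ W2  # ReLU MLP

tokens = rng.standard_normal((32, D))  # e.g. 32 residue tokens
out = transformer_block(tokens)
```

The architectural claim is that stacks of exactly this kind of generic block, with no folding-specific modules, suffice for the task.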

During exhibition hours, ICLR attendees will be able to interact with live demos of Apple ML research at booth #204, including:

  • SHARP – This demo shows SHARP running on a set of pre-recorded images or images captured directly by the user during the demo. Visitors will experience the fast process of selecting an image, processing it with SHARP, and viewing the generated 3D Gaussian point cloud on an iPad Pro with the M5 chip.
  • Local LLM inference on Apple silicon with MLX – This demo will showcase on-device LLM inference on a MacBook Pro with M5 Max using MLX, Apple's open-source array framework purpose-built for Apple silicon, running a quantized frontier coding model entirely locally within Xcode's native development environment. The full stack (MLX, mlx-lm, and model weights) is open source, inviting the research community to build on and extend these methods independently.

We are proud to once again sponsor affinity groups hosting events onsite at ICLR, including Women in Machine Learning (WiML) (social on April 24) and Queer in AI (social on April 25). In addition to supporting these groups with sponsorship, Apple employees will also be participating in these and other affinity events.

ICLR brings together professionals dedicated to the advancement of deep learning, and Apple is proud to once again share innovative new research at the event and connect with the attending community. This post highlights just a selection of the work Apple ML researchers will present at ICLR 2026, and a comprehensive overview and schedule of our participation can be found here.
