Liquid AI has shipped LFM2.5-230M, the company's smallest model to date. The release targets a specific job: running agentic tasks on phones, robots, and automation devices. Both the base and instruction-tuned checkpoints are open-weight on Hugging Face.
The pitch is narrow on purpose. This isn't a general reasoning model. It's built for data extraction and tool use on edge hardware.
TL;DR
- Liquid AI's LFM2.5-230M is its smallest model yet: 230M params, open-weight, built on LFM2.
- Runs on-device at 213 tok/s on a Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5.
- Beats larger models (Qwen3.5-0.8B, Gemma 3 1B) on instruction following and data extraction.
- Tuned for tool use and extraction; not for math, code generation, or creative writing.
- Day-one support across llama.cpp, MLX, vLLM, SGLang, and ONNX, with a 293–375 MB footprint.
What Is LFM2.5-230M?
LFM2.5-230M is a 230-million-parameter, text-only model built on the LFM2 architecture. It has 14 layers in total: eight double-gated LIV convolution blocks and six grouped-query attention (GQA) blocks. The hybrid architecture targets fast CPU inference.
The context length is 32,768 tokens. The vocabulary size is 65,536. The knowledge cutoff is mid-2024. It supports ten languages, including English, Chinese, Arabic, and Japanese.
Liquid AI ships two checkpoints. LFM2.5-230M-Base is the pre-trained model for fine-tuning. LFM2.5-230M is the general-purpose instruction-tuned version. The license is lfm1.0.
Training and Post-Training
The model was pre-trained on 19 trillion tokens. That total includes a 32K context extension phase. The post-training recipe then runs in three stages.
First comes supervised fine-tuning with distillation from the larger LFM2.5-350M. Second is direct preference optimization (DPO). Third is multi-domain reinforcement learning. This preserves flexibility for downstream specialization.
The distillation step is what keeps a 230M model competitive with larger checkpoints. It inherits behavior from the larger LFM2.5-350M on targeted tasks.
Benchmarks
Liquid AI evaluated LFM2.5-230M across ten benchmarks. They span knowledge, instruction following, data extraction, and tool use.
The instruction-following results bear out the claim that it competes with larger models. On IFEval, LFM2.5-230M scores 71.71. That beats Qwen3.5-0.8B (59.94) and Gemma 3 1B IT (63.49). On IFBench it scores 38.40, ahead of both. On CaseReportBench, a medical data-extraction test, it scores 22.51.
| Model | Params | IFEval | IFBench | CaseReportBench | BFCLv4 | MMLU-Pro |
|---|---|---|---|---|---|---|
| LFM2.5-230M | 230M | 71.71 | 38.40 | 22.51 | 21.03 | 20.25 |
| LFM2.5-350M | 350M | 76.96 | 40.69 | 32.45 | 21.86 | 20.01 |
| Granite 4.0-H-350M | 350M | 61.27 | 17.22 | 12.44 | 13.28 | 13.14 |
| Qwen3.5-0.8B (Instruct) | 800M | 59.94 | 22.87 | 13.83 | 18.70 | 37.42 |
| Gemma 3 1B IT | 1B | 63.49 | 20.33 | 2.28 | 7.17 | 14.04 |
LFM2.5-230M leads the larger third-party models on instruction following and data extraction, though its sibling LFM2.5-350M scores higher on both. It trails on broad knowledge: MMLU-Pro is 20.25, behind Qwen3.5-0.8B's 37.42. It is also weak on some agentic tool use. On τ²-Bench Telecom it scores just 5.26.
Liquid AI is direct about the limits. It doesn't recommend the model for reasoning-heavy workloads. That means advanced math, code generation, and creative writing.
Use Cases With Examples
The model fits two jobs well.
- The first is large-scale data extraction pipelines. Picture a pipeline parsing 100,000 medical reports into structured fields. A 4-bit build with a 293–375 MB memory footprint runs that on commodity CPUs. You extract locally, with no per-token API bill.
- The second is lightweight on-device agentic workloads. Think of a home automation hub that turns speech into tool calls, or a phone assistant that routes a request to the right function.
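For the extraction pipeline, one practical detail is checking each model response before it is stored, since model output is not guaranteed to be valid or complete JSON. The following is a minimal sketch, assuming the model is prompted to answer with a single JSON object. The field names and the `validate_extraction` helper are hypothetical illustrations, not part of any Liquid AI API.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"patient_age": int, "diagnosis": str, "medications": list}


def validate_extraction(raw_output, schema=REQUIRED_FIELDS):
    """Parse a model response and return (record, errors).

    record is None when the response is not a JSON object.
    """
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["top-level value is not a JSON object"]

    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type for {field}")
    return record, errors


# Made-up responses standing in for model output.
good = '{"patient_age": 54, "diagnosis": "pneumonia", "medications": ["amoxicillin"]}'
bad = '{"patient_age": "54", "diagnosis": "pneumonia"}'

record, errors = validate_extraction(good)
bad_record, bad_errors = validate_extraction(bad)
```

Records that come back with a non-empty error list can be retried or routed to manual review instead of being written to the output table.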
As an early signal, Liquid AI deployed the model on a Unitree G1 humanoid robot. It ran entirely on the robot's onboard NVIDIA Jetson Orin. There the model acted as a skill-selection layer. It turned one natural-language instruction into a sequence of tool calls. Those calls invoked low-level skills from NVIDIA's SONIC framework.
LFM2.5 supports function calling in four steps. You define tools as JSON in the system prompt. The model writes a Pythonic function call between special tokens. You execute the call and return the result. The model then writes a plain-text answer.
By default the call is a Python list. It sits between the <|tool_call_start|> and <|tool_call_end|> tokens. Here is the documented pattern, with the tool JSON abbreviated:
<|im_start|>system
List of tools: [{"name": "get_candidate_status",
"parameters": {"candidate_id": {"type": "string"}}}]<|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
You can also force JSON-formatted calls via the system prompt.
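Steps two and three of the default Pythonic format can be sketched as follows. This is a minimal sketch, not Liquid AI's reference parser: it assumes the model emits a Python list of calls whose arguments are keyword arguments with literal values, as in the example above. The names `parse_tool_calls` and `TOOLS`, and the stub body of `get_candidate_status`, are hypothetical.

```python
import ast
import re

# Captures the payload between the two special tokens.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>", re.DOTALL
)


def parse_tool_calls(text):
    """Return a list of (function_name, kwargs) pairs found in model output.

    Assumes each call is a plain name with keyword arguments whose values
    are Python literals; anything else raises ValueError.
    """
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        tree = ast.parse(payload.strip(), mode="eval")
        if not isinstance(tree.body, ast.List):
            raise ValueError("expected a Python list of calls")
        for node in tree.body.elts:
            if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
                raise ValueError("expected simple function calls")
            if node.args:
                raise ValueError("positional arguments are not handled in this sketch")
            # literal_eval only accepts literals, so no model-written code runs.
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            calls.append((node.func.id, kwargs))
    return calls


# Hypothetical tool implementation and registry for illustration.
def get_candidate_status(candidate_id):
    return {"candidate_id": candidate_id, "status": "unknown"}  # stub result


TOOLS = {"get_candidate_status": get_candidate_status}

output = (
    '<|tool_call_start|>[get_candidate_status(candidate_id="12345")]'
    "<|tool_call_end|>Checking the current status of candidate ID 12345."
)
for name, kwargs in parse_tool_calls(output):
    result = TOOLS[name](**kwargs)  # step three: execute the call
```

Parsing with `ast` rather than `eval` means the model's text is never executed directly; only names present in the registry can be called.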
Running It: A Minimal Example
The model works with Transformers 5.0.0 and up. The recommended generation settings are temperature 0.1, top_k 50, and repetition_penalty 1.05. Note the do_sample=True flag, which is required for these sampling settings to apply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-230M"

# Load the instruction-tuned checkpoint in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the chat prompt and move the tensors to the model's device.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C. elegans?"}],
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.1,
    top_k=50,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
Liquid AI also publishes fine-tuning recipes. They cover SFT, DPO, and GRPO with LoRA, via Unsloth and TRL. Each ships as a Colab notebook.