Liquid AI has released LFM2.5-1.2B-Thinking, a 1.2 billion parameter reasoning model that runs entirely on device and fits in about 900 MB on a modern phone. What needed a data center 2 years ago can now run offline on consumer hardware, with a focus on structured reasoning traces, tool use, and math, rather than general chat.

Place in the LFM2.5 family and core specifications

LFM2.5-1.2B-Thinking is part of the LFM2.5 family of Liquid Foundation Models, which extends the earlier LFM2 architecture with additional pre-training and multi-stage reinforcement learning for edge deployment.

The model is text-only and general-purpose, with the following configuration:

  • 1.17B parameters, reported as a 1.2B-class model
  • 16 layers: 10 double-gated LIV convolution blocks and 6 GQA blocks
  • Training budget of 28T tokens
  • Context length of 32,768 tokens
  • Vocabulary size of 65,536
  • 8 languages: English, Arabic, Chinese, French, German, Japanese, Korean, Spanish
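As a rough sanity check on the sub-1 GB footprint claimed above, the arithmetic below estimates weight storage at a few common quantization widths. This is a back-of-envelope sketch only; the real on-device size depends on the quantization scheme, KV cache, and runtime overhead.

```python
# Back-of-envelope weight footprint for a 1.17B parameter model
# at several quantization widths (weights only; ignores KV cache
# and runtime overhead).

PARAMS = 1.17e9  # reported parameter count

def weight_footprint_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given bit width."""
    return PARAMS * bits_per_param / 8 / 2**30

for label, bits in [("FP16", 16), ("INT8", 8), ("~Q4 (4.5 bpw)", 4.5)]:
    print(f"{label:>14}: {weight_footprint_gib(bits):.2f} GiB")
```

At 4 to 5 bits per weight the model lands around 0.6 GiB, which is consistent with the roughly 900 MB on-phone figure once runtime overhead is added.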

Reasoning-first behavior and thinking traces

The ‘Thinking’ variant is trained specifically for reasoning. At inference time it produces internal thinking traces before the final answer. These traces are chains of intermediate steps that the model uses to plan tool calls, verify partial results, and work through multi-step instructions.

The Liquid AI team recommends this model for agentic tasks, data extraction pipelines, and retrieval-augmented generation flows where you want explicit reasoning and verifiable intermediate steps. A practical way to think about it: you use LFM2.5-1.2B-Thinking as the planning brain inside agents and tools, and use other models when you need broad world knowledge or code-heavy workflows.
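If you consume the model's output programmatically, you typically want to separate the thinking trace from the final answer before handing the answer to downstream tools. The sketch below assumes the trace is delimited by `<think>...</think>` tags, a common convention for open reasoning models; check the model card's chat template for the actual delimiters.

```python
import re

# Assumed delimiter convention; verify against the model's chat template.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_thinking(output: str) -> tuple[str, str]:
    """Return (thinking_trace, final_answer) from a raw completion."""
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    trace = match.group(1).strip()
    answer = THINK_RE.sub("", output, count=1).strip()
    return trace, answer

raw = "<think>12 * 8 = 96, then add 4.</think>The result is 100."
trace, answer = split_thinking(raw)
```

This keeps the verifiable intermediate steps available for logging or judging while only the final answer flows into the rest of the pipeline.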

Benchmarks versus other 1B-class models

The Liquid AI team evaluates LFM2.5-1.2B-Thinking against models around 1B parameters on a suite of reasoning and instruction benchmarks.

(Benchmark chart from the model card: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking)

Compared to LFM2.5-1.2B-Instruct, three metrics improve strongly: math reasoning rises from about 63 to 88 on MATH-500, instruction following rises from about 61 to 69 on Multi-IF, and tool use rises from about 49 to 57 on BFCLv3.

LFM2.5-1.2B-Thinking competes with Qwen3-1.7B in thinking mode on most reasoning benchmarks while using around 40% fewer parameters and fewer output tokens on average. It also outperforms other 1B-class baselines such as Granite-4.0-H-1B, Granite-4.0-1B, Gemma-3-1B-IT, and Llama-3.2-1B-Instruct on many of these tasks.

Training recipe and doom-loop mitigation

Reasoning models often suffer from doom looping, where the model repeats fragments of its chain of thought instead of finishing the answer. LFM2.5-1.2B-Thinking uses a multi-stage training pipeline to reduce this.

The process begins with mid-training that includes reasoning traces so the model learns a ‘reason first, then answer’ pattern. Then supervised fine-tuning on synthetic chains improves chain-of-thought generation. After that, preference alignment and RLVR are applied. In preference alignment, the research team generates 5 temperature-sampled candidates and 1 greedy candidate per prompt and uses an LLM judge to pick preferred and rejected outputs, while also labeling looping outputs explicitly. During RLVR they add an n-gram repetition penalty early in training. This reduces the doom-loop rate from 15.74% at mid-training to 0.36% after RLVR on a set of representative prompts.
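Liquid AI does not publish the exact penalty formulation, but the idea can be sketched as follows: score a candidate by the fraction of its n-grams that repeat an earlier n-gram, then subtract that from the reward during RL. The window size and weighting below are illustrative assumptions, not the model's actual hyperparameters.

```python
def ngram_repetition_score(token_ids: list[int], n: int = 4) -> float:
    """Fraction of n-grams in the sequence that repeat an earlier
    n-gram; 0.0 means no repetition, values near 1.0 mean looping."""
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

def penalized_reward(reward: float, token_ids: list[int],
                     weight: float = 1.0, n: int = 4) -> float:
    """Verifiable reward minus an n-gram repetition penalty (illustrative)."""
    return reward - weight * ngram_repetition_score(token_ids, n)

looping = [1, 2, 3, 4] * 10   # a degenerate, cycling trace
healthy = list(range(40))     # a non-repeating trace
```

Subtracting such a penalty from the verifiable reward during RLVR makes looping trajectories unprofitable early in training, before they can dominate the policy.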

The result is a small reasoning model that can produce thinking traces without getting stuck in long repetitive outputs, which is important for interactive agents and on-device UX.

Inference performance and hardware footprint

A key design objective is fast inference with a small memory footprint on CPUs and NPUs. LFM2.5-1.2B-Thinking can decode at about 239 tokens per second on an AMD CPU and about 82 tokens per second on a mobile NPU, while running under 1 GB of memory, with broad day-one support for llama.cpp, MLX, and vLLM.

The detailed hardware table uses 1K prefill and 100 decode tokens and gives the following examples for LFM2.5-1.2B-Thinking:

(Hardware benchmark table from the model card: https://huggingface.co/LiquidAI/LFM2.5-1.2B-Thinking)

These numbers show that the model fits comfortably under 1 GB on phones and embedded devices while maintaining useful throughput even at long contexts.
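To turn throughput numbers into user-facing latency, a simple model adds prefill time and decode time. The sketch below plugs in the 1K prefill / 100 decode workload from the table and the ~82 tok/s mobile NPU decode rate quoted above; the 1,000 tok/s prefill rate is an illustrative assumption, since the article does not quote one.

```python
def end_to_end_seconds(prefill_tokens: int, decode_tokens: int,
                       prefill_tps: float, decode_tps: float) -> float:
    """Rough end-to-end latency: prefill time plus decode time."""
    return prefill_tokens / prefill_tps + decode_tokens / decode_tps

# 1K prefill + 100 decode at ~82 tok/s decode (mobile NPU),
# with an assumed 1,000 tok/s prefill rate for illustration.
latency = end_to_end_seconds(1000, 100, prefill_tps=1000.0, decode_tps=82.0)
```

Under these assumptions, a 1K-token prompt with a 100-token answer completes in roughly 2.2 seconds on the NPU, which is what "useful throughput" looks like from the user's side.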

Key Takeaways

  1. LFM2.5-1.2B-Thinking is a 1.17B parameter reasoning model with a 32,768-token context length that runs under 1 GB on phones and laptops.
  2. The model is optimized for explicit thinking traces, agentic workflows, data extraction, and RAG.
  3. It reaches strong scores for a 1B-class model, for example 87.96 on MATH-500 and 85.60 on GSM8K, and competitive performance with Qwen3-1.7B in thinking mode with fewer parameters.
  4. The training pipeline uses mid-training with reasoning traces, supervised fine-tuning, preference alignment with 5 sampled plus 1 greedy candidate, and RLVR with n-gram penalties, which reduces doom loops from 15.74% to 0.36%.
  5. The model runs well on AMD and Qualcomm CPUs and NPUs with runtimes like llama.cpp, FastFlowLM, and NexaML, is available in GGUF, ONNX, and MLX formats, and can be loaded easily from Hugging Face for on-device deployment.

Hosting Providers/Deployment

You can access or host the model through the following providers and platforms:

Cloud & API Providers

Model Repositories (Self-Hosting)

If you want to run the model locally or on your own infrastructure, the weights are available in various formats:

