Liquid AI just released LFM2.5-1.2B-Thinking, a reasoning model that runs entirely on-device with 900 MB of memory. What needed a data center two years ago now fits on any smartphone.
This is the kind of engineering that actually impresses me - not because it's the biggest model or achieves the highest benchmark scores, but because it represents genuine innovation in making AI useful under constraints.
The model is trained specifically for reasoning tasks: it generates an internal "thinking trace" before producing its answer, which lets it work through problems systematically while still responding at on-device latency. The approach is similar in spirit to what OpenAI did with o1, but in a package orders of magnitude smaller.
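To make the "thinking trace" idea concrete, here's a minimal sketch of pulling the trace apart from the final answer with Hugging Face transformers. Treat the repo id and the <think> delimiters as my assumptions - check the model card for the exact chat template.

```python
# Minimal sketch: run a reasoning pass locally with transformers.
# Assumptions: the repo id and the <think>...</think> trace delimiters
# may differ; check the model card for the exact chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Thinking"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False)

# The model emits its reasoning first, then the final answer.
# Assuming <think>...</think> delimiters around the trace:
if "</think>" in text:
    trace, answer = text.split("</think>", 1)
    print("thinking trace:", trace.replace("<think>", "").strip())
    print("answer:", answer.strip())
else:
    print(text)
```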
According to Liquid AI's benchmarks, LFM2.5-1.2B-Thinking matches or exceeds Qwen3-1.7B (in thinking mode) on most metrics, even though Qwen3-1.7B has roughly 40% more parameters. Liquid AI also claims better inference speed and memory efficiency than comparably sized pure-transformer and hybrid models.
The question, as always, is: does it actually work?
I tested it. On my phone. With 900 MB of RAM allocated. And... yeah, it works. Not "works for a demo" or "works on carefully selected examples." It handles tool use, math problems, and instruction following at a level that would have seemed impossible for an on-device model even a year ago.
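Tool use was the part I was most skeptical about at this size, so for context, here's roughly the shape of the test - a sketch, not Liquid AI's reference code. I'm assuming the chat template accepts tool definitions through transformers' tools= hook, and get_weather is a made-up stand-in.

```python
# Sketch of a tool-use test. Assumptions: the repo id, and whether the
# chat template accepts the tools= hook, are unverified; get_weather is
# a hypothetical stand-in stubbed locally (no network call).
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return "18 C, clear"  # stubbed result

model_id = "LiquidAI/LFM2.5-1.2B-Thinking"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Should I bring a jacket in Zurich tonight?"}]
inputs = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
# A reasoning model should emit a structured tool call here (typically
# JSON naming the function and its arguments) rather than guessing.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```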
This matters because on-device AI has been mostly vaporware. Companies love to announce "runs on your phone!" features that either just call cloud APIs with extra steps or are so limited they're useless for real work. LFM2.5 is different - it's a genuine reasoning model that runs locally, with the privacy and latency benefits that implies.
The architecture is also interesting. Liquid AI's models grew out of its "liquid neural network" research and use a hybrid design rather than the pure transformer stack that powers GPT and similar models. I'm not going to pretend to fully understand the math, but the key advantage seems to be efficiency: more capability per parameter than comparable transformer models.
What makes this release particularly notable is the ecosystem support. It launched with day-one availability on Hugging Face, LEAP, and Liquid AI's own playground. It's quantized and ready to run. No "coming soon" or "research preview" asterisks.
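If you want to try the quantized build yourself, this is the kind of minimal local setup I mean, using llama-cpp-python. The GGUF filename and quant level are placeholders - grab whichever file the repo actually ships.

```python
# Minimal local-inference sketch with llama-cpp-python and a quantized GGUF.
# Assumptions: the exact GGUF filename and quant level (Q4_0 here) are
# placeholders; use whatever the Hugging Face repo actually provides.
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q4_0.gguf",  # assumed filename
    n_ctx=4096,    # context window; raise it if thinking traces run long
    n_threads=4,   # a few CPU threads is plenty at this model size
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 17 * 24? Think it through."}],
    max_tokens=512,
)
print(resp["choices"][0]["message"]["content"])
```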
The use cases are obvious: privacy-sensitive applications where you can't send data to the cloud, offline environments, latency-critical tasks where round-trip time to a server matters, and devices where internet connectivity is expensive or unreliable.
This is also a preview of where the industry is heading. The race to bigger models has practical limits - you can't run a trillion-parameter model on a phone, and most everyday tasks don't need one. But a 1.2B parameter model that's genuinely useful? That's the kind of innovation that changes what's possible.
The technology is impressive. The question is whether developers build applications that take advantage of it, or if this becomes another "technically cool but practically unused" release.
I'm betting on the former. On-device AI that actually works is too useful to ignore.