Gemini 3.1 Flash-Lite: Built for intelligence at scale

meetpateltech 51 points 28 comments March 03, 2026
blog.google · View on Hacker News

Discussion Highlights (9 comments)

sh4jid

The Gemini Pro models just don't do it for me. But I still use 2.5 Flash Lite for a lot of my non-coding jobs, super cheap but great performance. I am looking forward to this upgrade!

sync

Unfortunate, significant price increase for a 'lite' model: $0.25 IN / $1.50 OUT vs. Gemini 2.5 Flash-Lite $0.10 IN / $0.40 OUT.

zacksiri

This is going to be a fun one to play with. I've been conducting tests on various models for my agentic workflow, and I was just wishing they would make a new flash-lite model; these things are so fast. Unfortunately 2.5-flash, and therefore 2.5-flash-lite, failed some of my agentic workflows. If 3.1-flash-lite can do the job, this solves basically all latency issues for agentic workflows. I publish my benchmarks here in case anyone is interested: https://upmaru.com/llm-tests/simple-tama-agentic-workflow-q1... P.S.: The pricing bump is quite significant, but still stomachable if it performs well.

rohansood15

For the last 2 years, startup wisdom has been that models will continue to get cheaper and better. Claude first, and now Gemini, has shown that's not the case. We priced an enterprise contract using Flash 1.5 pricing last summer, and today that contract would be unit-economics-negative if we used Flash 3. Flash 2.5 and now Flash 3.1 Lite barely break even. I predict open-source models and fine-tuning are going to make a real comeback this year for economic reasons.

k9294

You can test Gemini 3.1 Lite transcription capabilities in https://ottex.ai, the only dictation app supporting Gemini models with native audio input. We benchmarked it for real-life voice-to-text use cases (latency in ms, median over 5 runs per sample, non-streaming):

            <10s   10-30s  30s-1m  1-2m   2-3m
Flash       2548   2732    3177    4583   5961
Flash Lite  1390   1468    1772    2362   3499
Faster by   1.83x  1.86x   1.79x   1.94x  1.70x

Key takeaways:
- 1.8x faster than Gemini 3 Flash on average
- ~1.4 sec transcription time for short to medium recordings
- ~$0.50/mo for heavy users (10h+ transcription)
- Close to SOTA audio understanding and formatting instruction following
- Multilingual: one model, 100+ languages

Gemini is slowly making $15/month voice apps obsolete.
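The "Faster by" row in the benchmark above can be recomputed directly from the two latency rows; a quick sketch (variable names are illustrative, data is from the comment's table):

```python
# Median latencies in ms, per recording-length bucket, from the table above.
flash      = [2548, 2732, 3177, 4583, 5961]  # Gemini 3 Flash
flash_lite = [1390, 1468, 1772, 2362, 3499]  # Gemini 3.1 Flash-Lite

# Per-bucket speedup of Flash-Lite over Flash, rounded to 2 decimals.
speedups = [round(f / fl, 2) for f, fl in zip(flash, flash_lite)]
print(speedups)  # -> [1.83, 1.86, 1.79, 1.94, 1.7]
```

The mean of those ratios is about 1.82x, consistent with the "1.8x faster on average" takeaway.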

GodelNumbering

That's a 150% increase in input costs and a 275% increase in output costs over the same-sized previous-generation model (2.5 Flash-Lite).
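Those percentages follow from the per-1M-token list prices quoted earlier in the thread ($0.10 in / $0.40 out for 2.5 Flash-Lite vs. $0.25 in / $1.50 out for 3.1 Flash-Lite); a minimal check, with a hypothetical helper:

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from an old price to a new one."""
    return (new - old) / old * 100

# Prices in USD per 1M tokens, as quoted in this thread.
input_jump = pct_increase(0.10, 0.25)   # -> 150.0
output_jump = pct_increase(0.40, 1.50)  # -> 275.0
print(f"input +{input_jump:.0f}%, output +{output_jump:.0f}%")
```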

xnx

I'm still clinging to gemini-2.0-flash, which I think is still free for API use(?!).

vlmutolo

Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half the tokens of 2.5 Flash-Lite (reasoning). This will likely bring the cost below 2.5 Flash-Lite for many tasks (depending on the ratio of input to output tokens). That said, AA also reports that 3.1 FL was 20% more expensive to run for their complete Intelligence Index benchmark. The overall point is that cost is extremely task-dependent: it doesn't work to just compare per-token prices, because reasoning can burn so many tokens, reasoning-token usage varies by both task and model, and the input/output ratios likewise vary by task.
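The per-task arithmetic behind that point can be sketched as follows; the prices are the per-1M-token figures quoted in this thread, but the helper and the token counts are purely illustrative assumptions, not measured usage:

```python
# Hypothetical per-task cost model: total spend depends on prices AND on
# how many tokens (especially reasoning/output tokens) each model emits.
def task_cost(in_tokens: int, out_tokens: int,
              in_price: float, out_price: float) -> float:
    """USD cost of one request; prices are per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1e6

# Illustrative token counts for one hypothetical reasoning-heavy task.
cost_25 = task_cost(2_000, 8_000, in_price=0.10, out_price=0.40)  # 2.5 Flash-Lite
cost_31 = task_cost(2_000, 3_000, in_price=0.25, out_price=1.50)  # 3.1 Flash-Lite

# At these list prices, the output side of an output-heavy task only gets
# cheaper if the new model emits under 0.40/1.50 ~= 27% of the old output
# tokens, so the break-even point shifts with every workload.
print(f"2.5 Flash-Lite: ${cost_25:.4f}  3.1 Flash-Lite: ${cost_31:.4f}")
```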

msp26

What the fuck is this price hike? It was such a nice low-end, fast model. Who needs 10 years of reasoning at this model size?? I'm gonna switch some workflows to qwen3.5. There are a lot of tasks that benefit from just having a mildly capable LLM, and 2.5 Flash Lite worked out of the box for cheap. Can we get flash lite lite please?

Edit: Logan said: "I think open source models like Gemma might be the answer here". Implying that they're not interested in serving lower-end Gemini models?
