Show HN: Lance – image/video generation and understanding in one model

cleardusk 58 points 15 comments May 20, 2026
github.com

The model has 3B active parameters. We put the code, homepage, paper and model links here:

- Code: https://github.com/bytedance/Lance
- Homepage: https://lance-project.github.io/
- Paper: https://arxiv.org/abs/2605.18678
- Model: https://huggingface.co/bytedance-research/Lance

P.S. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.

Discussion Highlights (8 comments)

Tsarp

Nice work. Wish they had picked another name given how popular lance/lancedb is.

asadm

last dance for lance vance!

CrzyLngPwd

Imagine having virtually unlimited compute and programming resources, and silly little slop videos is the result. Fabulous.

popalchemist

Seems like the video output is crippled. Resolution is low (720 or so), as is the frame rate. The samples are shown up-scaled and frame-interpolated. Why do that? Seems strange to be building sub-hd resolution video models in 2026.

nkvdev

Great quality, forked and going to try

bguberfain

Any plans to port to sglang or vLLM?

embedding-shape

Video understanding is kind of new, especially when done well, and it'd be great if it works well for UI and UX. Current agents already struggle a bit with 2D space in normal screenshots of unconventional UIs. I wonder if this model would do better with actual recordings of navigating and using applications; it feels like it could help a bunch with understanding UX, at least. Will be fun to play around with :)

wxw

What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.
