Show HN: Lance – image/video generation and understanding in one model
The model has 3B active parameters. We put the code, homepage, paper and model links here:
- Code: https://github.com/bytedance/Lance
- Homepage: https://lance-project.github.io/
- Paper: https://arxiv.org/abs/2605.18678
- Model: https://huggingface.co/bytedance-research/Lance
P.S. Lance is a research project, not a polished product. The model was trained using fewer than 128 GPUs.
Discussion Highlights (8 comments)
Tsarp
Nice work. Wish they had picked another name given how popular lance/lancedb is.
asadm
last dance for lance vance!
CrzyLngPwd
Imagine having virtually unlimited compute and programming resources, and silly little slop videos are the result. Fabulous.
popalchemist
Seems like the video output is crippled. Resolution is low (720 or so), as is the frame rate. The samples are shown upscaled and frame-interpolated. Why do that? Seems strange to be building sub-HD resolution video models in 2026.
nkvdev
Great quality. Forked it and going to try it.
bguberfain
Any plans to port to sglang or vLLM?
embedding-shape
Video understanding is kind of new, especially when done well, and if it also works well with UI and UX, that'd be great. Current agents already struggle a bit with 2D space in normal screenshots of unconventional UIs. I wonder if this model would do better with actual recordings of navigating and using applications; it feels like it could help a lot with understanding UX, at least. Will be fun to play around with :)
wxw
What’s SOTA for video understanding? AFAIK most video search is powered by transcription and not the actual video. This seems impressive.