TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
gmays
21 points
1 comment
April 24, 2026
Discussion Highlights (1 comment)
jiggawatts
I just tested their online demo with a challenging photo of a snowboarder in dark clothing in front of a dark forest. The low contrast makes it difficult to distinguish their black helmet from the shadowed trees immediately behind and around it. DINOv3 segmented this perfectly, as well as a human might, whereas TIPSv2 cut the head off and marked it with the same PCA values as the forest. Similarly, TIPSv2 "split" the snow in the foreground into two different PCA values despite it being visually (and physically) contiguous and not significantly distinct.