Show HN: CPU-only transcription for YouTube, TikTok, X, Instagram videos
mrkn1
52 points
19 comments
May 20, 2026
Related Discussions
- Show HN: Yt-x v0.8.0 – Browse, play, and download YouTube from the terminal Benex254 · 18 pts · May 19, 2026 · 65% similar
- Show HN: Local-first fast CPU image to text for screenshots, PDFs, webpages mrkn1 · 14 pts · June 05, 2026 · 64% similar
- Show HN: VidStudio, a browser based video editor that doesn't upload your files kolx · 263 pts · April 21, 2026 · 57% similar
- Show HN: On-device transcriber that's 97% accurate at identifying speakers marshalla · 14 pts · June 05, 2026 · 57% similar
- Show HN: Claude-replay – A video-like player for Claude Code sessions es617 · 79 pts · March 06, 2026 · 54% similar
Discussion Highlights (8 comments)
spudlyo
So, this project consists of a ~175 line README and a ~500 line Python program that glues yt-dlp and Kroko together. Neat. I guess if it encourages you to install and figure out how to use ffmpeg, yt-dlp, kroko, numpy, and onnx that's a good thing. Sometimes just knowing a thing is possible is a huge benefit.
charcircuit
Most of these platforms already have transcriptions built in.
niraj-agarwal
Had Claude test it out on 3 videos. Worked at 5-8x realtime. The beauty of it is that it works on all videos, not just the one with transcripts. Combine it with YouTube search and LLM takeaways from transcripts, and you have super-efficient content consumption. There are SaaS products that charge 1 cent per video for those with transcripts. There is a viable product in here somewhere, methinks.
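For readers unsure what "5-8x realtime" means in practice: it is the audio duration divided by the wall-clock time spent transcribing. A minimal sketch of that arithmetic follows. `speed_factor` is a hypothetical helper, not part of the project, and the 600 s / 100 s figures are made-up inputs rather than measurements from this thread.

```shell
# Hypothetical helper: speed factor = audio seconds / seconds spent transcribing.
# Prints the factor with one decimal place; fails if the elapsed time is not positive.
speed_factor() {
  awk -v audio="$1" -v elapsed="$2" \
    'BEGIN { if (elapsed <= 0) exit 1; printf "%.1f\n", audio / elapsed }'
}

# Illustrative numbers only: a 10-minute (600 s) video transcribed in 100 s.
speed_factor 600 100
```

At 5-8x, an hour-long video would take roughly 7.5 to 12 minutes of CPU time.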
ranger_danger
How can we transcribe other languages besides English?
canadiantim
Nice. Can it do speaker diarization?
HDBaseT
Wouldn't it still be more efficient to do GPU transcription anyway? Is this something where we could actually put the effectively useless NPUs in modern laptops to use?
dmos62
Now make it distinguish speakers and we really have something. As far as I know, that's significantly harder though.
piotrrojek
If anyone is interested, these are the super-short zsh/bash functions I keep in .zshrc for doing the same thing using plain whisper.cpp, ffmpeg and yt-dlp (`brew install whisper-cpp yt-dlp` on Mac). I output VTT (subtitles), but it's easy enough to change it to txt.

    yt_to_srt() {
      local url="$1"
      local output_base="$2"
      local language="${3:-en}"

      yt-dlp -x --audio-format wav --postprocessor-args "-ar 16000" -o "$output_base.wav" "$url"
      whisper-cli --language "$language" --model "$WHISPER_MODEL" --split-on-word --max-len 65 --output-vtt --output-file "$output_base" --file "$output_base.wav"
      rm "$output_base.wav"
    }

    file_to_srt() {
      local filepath="$1"
      local language="${2:-en}"
      local filename=$(basename "$filepath")
      local filename_no_ext="${filename%.*}"
      local output_base="$filename_no_ext"
      local temp_wav="$output_base.wav"

      ffmpeg -i "$filepath" -vn -acodec pcm_s16le -ar 16000 -ac 1 "$temp_wav"
      whisper-cli --language "$language" --model "$WHISPER_MODEL" --split-on-word --max-len 65 --output-vtt --output-file "$output_base" --file "$temp_wav"
      rm "$temp_wav"
    }

Plus an additional bootstrap script for the large-v3-turbo model from my chez-moi dotfiles:

    #!/bin/bash
    # Download whisper.cpp models from Hugging Face (runs once per machine).
    set -euo pipefail

    MODELS_DIR="$HOME/whisper-models"
    BASE_URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
    MODELS=("ggml-large-v3-turbo.bin" "ggml-tiny.bin")

    mkdir -p "$MODELS_DIR"

    for model in "${MODELS[@]}"; do
      if [ ! -f "$MODELS_DIR/$model" ]; then
        echo "Downloading $model..."
        curl -L --progress-bar -o "$MODELS_DIR/$model" "$BASE_URL/$model"
      else
        echo "$model already exists, skipping."
      fi
    done

    echo "Whisper models ready at $MODELS_DIR"
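On the "easy enough to change it to txt" point: one option that leaves the functions above untouched is to post-process the VTT file. The sketch below is an assumption about how that might look, not something from the comment. `vtt_to_txt` is a hypothetical helper, and it assumes simple VTT with a `WEBVTT` header, `-->` timing lines, and no numeric cue identifiers or NOTE blocks.

```shell
# Hypothetical helper: reduce a simple WebVTT file to plain text.
# Assumes no cue identifiers or NOTE blocks, which would pass through unchanged.
vtt_to_txt() {
  awk '
    /^WEBVTT/        { next }   # file header
    /-->/            { next }   # cue timing lines
    /^[[:space:]]*$/ { next }   # blank separators between cues
    { sub(/^[[:space:]]+/, ""); print }   # trim leading whitespace, keep the text
  ' "$1"
}

# Usage (file names are placeholders):
#   vtt_to_txt talk.vtt > talk.txt
```

Each cue ends up on its own line; piping the result through `paste -sd' ' -` would join the cues into a single paragraph if that reads better.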