Podcast Idiot Transcriber — Free Local Podcast Transcription

What is it?

Transcription that stays on your machine

The Podcast Idiot Transcriber is a desktop application I built to solve a simple problem: I needed accurate transcripts of my podcast episodes without paying for a cloud service every month or handing my audio files off to someone else’s server.

It uses OpenAI Whisper — the same AI transcription engine that powers some of the best commercial services — running entirely on your local computer. After the one-time setup, it works completely offline. Your audio never leaves your machine.
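
For the technically curious, here is a minimal sketch of what local Whisper transcription looks like in Python. It assumes the open-source `openai-whisper` package; this page does not describe how the app calls Whisper internally, and the helper names `transcribe_file` and `plain_text` are hypothetical.

```python
from typing import Any, Dict


def transcribe_file(audio_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Transcribe one audio file locally with the openai-whisper package."""
    # Imported lazily so the helper below can be used without the package.
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)  # fetches the model on first use
    return model.transcribe(audio_path)     # dict with "text" and "segments"


def plain_text(result: Dict[str, Any]) -> str:
    """Turn the timestamped segments into one line of text per segment."""
    return "\n".join(seg["text"].strip() for seg in result["segments"])
```

Once the model file is on disk, nothing in this sketch needs a network connection, which is the property the offline claim above depends on.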

It also supports speaker diarization — the ability to tell speakers apart — so you get a labeled transcript showing which person said what. Essential for interview-format podcasts.

🔒

Completely Private

Your audio files never leave your computer. No uploads, no cloud processing, no data collection. Ever.

💸

Free Forever

No subscriptions, no API keys to pay for, no usage limits. Download once, use as much as you want.

🎙️

Speaker Labels

Automatically identifies different voices and labels them in the transcript. Perfect for interviews.

📄

All Formats

Outputs TXT, SRT, VTT, JSON, and Podcasting 2.0 JSON — ready to drop into your RSS feed.

⚡

GPU Accelerated

Automatically uses your NVIDIA, Apple Silicon, or AMD GPU for dramatically faster transcription.

🖥️

Cross-Platform

One download works on Windows, macOS, and Linux. Installers and uninstallers for all three included.

How it works

Simple from start to transcript

Download & Install

Download the zip, extract it, and run the installer for your OS. It handles Python, ffmpeg, and all dependencies automatically — and downloads the Whisper AI model (about 150 MB) one time.

Open the App

Launch Podcast Idiot Transcriber from your desktop icon, Start Menu, or application launcher. A clean branded interface shows all your options at a glance.

Choose Your Audio

Browse for your MP3, WAV, M4A, FLAC, OGG, or AAC file. Select an output folder, or let the app save transcripts beside the original audio.
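
As an illustration of the file handling described in this step, the sketch below checks an extension against the formats listed here and works out where a transcript would land. The names `SUPPORTED_EXTENSIONS`, `is_supported`, and `output_path` are hypothetical, not taken from the app.

```python
from pathlib import Path
from typing import Optional

# Extensions listed on this page; the constant name is made up for the sketch.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"}


def is_supported(audio_path: str) -> bool:
    """Case-insensitive check of the file extension."""
    return Path(audio_path).suffix.lower() in SUPPORTED_EXTENSIONS


def output_path(audio_path: str, suffix: str, output_dir: Optional[str] = None) -> Path:
    """Build the transcript path, defaulting to the folder of the audio file."""
    audio = Path(audio_path)
    folder = Path(output_dir) if output_dir else audio.parent
    return folder / f"{audio.stem}{suffix}"
```

For example, `output_path("shows/ep12.mp3", ".srt")` gives `shows/ep12.srt`, while passing an output folder redirects the file there.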

Configure Options

Pick your Whisper model, choose language or auto-detect, toggle speaker labels, choose output formats, and set CPU priority so transcription runs quietly in the background.
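
One way a Python program can lower its own CPU priority is sketched below. This is an assumption about the general technique, not a description of what the app does; it uses `os.nice`, which exists only on Unix-like systems, and the function name `lower_cpu_priority` is hypothetical.

```python
import os


def lower_cpu_priority(increment: int = 10) -> bool:
    """Ask the OS to schedule this process behind interactive apps.

    Returns True if the priority was lowered, False if that is not
    possible on this platform.
    """
    if not hasattr(os, "nice"):  # os.nice is not available on Windows
        return False
    try:
        os.nice(increment)       # a higher niceness means a lower priority
    except OSError:
        return False
    return True
```

On Windows a different mechanism is needed, so this sketch simply reports `False` there.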

Click Transcribe

Hit the big red Transcribe button and watch the progress log. When done, all your chosen output files are ready in the output folder.

Output Formats

Every format podcasters need

The app creates all your transcript files in a single pass. Choose which ones you want — or grab all of them.

Format	File	Best For
TXT	.txt	Plain readable transcript for your website, show notes, or personal reference
SRT	.srt	Subtitle file for video versions of your podcast, YouTube, or video editors
VTT	.vtt	WebVTT captions for HTML5 players and web-based podcast players
JSON	.json	Full timestamped segment data for developers or custom integrations
Podcasting 2.0	_podcast20.json	Ready for the `<podcast:transcript>` tag in your RSS feed
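
The SRT and VTT rows differ mainly in their timestamp syntax: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. A minimal sketch, assuming segments shaped like Whisper's output (`start`, `end`, `text`); the function names are hypothetical.

```python
def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Render seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(segments) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments) -> str:
    """WebVTT: header line, period before the milliseconds."""
    blocks = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        blocks.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Because both writers read the same segment list, producing several formats from one transcription pass costs almost nothing extra.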

Podcasting 2.0 ready: The Podcasting 2.0 JSON format follows the official podcast namespace transcript spec. Upload it to your server and point to it from your RSS feed, and any Podcasting 2.0 app that supports transcripts can display it.
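
To give a sense of what that file contains, here is a minimal sketch that converts Whisper-style segments into the JSON layout. The field names (`version`, `segments`, `speaker`, `startTime`, `endTime`, `body`) follow my reading of the podcast namespace JSON transcript format, so check them against the spec before relying on them; the function name `to_podcasting20` is hypothetical.

```python
import json


def to_podcasting20(segments) -> str:
    """Serialize segments as a Podcasting 2.0 style JSON transcript."""
    out = []
    for seg in segments:
        entry = {
            "startTime": seg["start"],
            "endTime": seg["end"],
            "body": seg["text"].strip(),
        }
        if seg.get("speaker"):  # the speaker field is left out when unknown
            entry["speaker"] = seg["speaker"]
        out.append(entry)
    return json.dumps({"version": "1.0.0", "segments": out},
                      ensure_ascii=False, indent=2)
```

In the feed, the file is then referenced with a tag along the lines of `<podcast:transcript url="https://example.com/ep12_podcast20.json" type="application/json" />`, where the URL is a placeholder.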

Whisper Models

Pick your speed vs. accuracy tradeoff

The app offers five Whisper model sizes and defaults to base, a great balance for most podcasts. Switch to a larger model any time for more accurate results on difficult audio.

Model	Download size	CPU time (1-hour episode)	Accuracy
tiny	75 MB	~5 min	Good
base ★	145 MB	~10–15 min	Very Good — recommended
small	465 MB	~20–30 min	Great
medium	1.5 GB	~45–60 min	Excellent
large	2.9 GB	~90–120 min	Best
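
The CPU figures in the table can be turned into a rough estimate for episodes of other lengths. The sketch below assumes transcription time scales linearly with audio length, which is a simplification; the names are hypothetical and the numbers are copied from the table.

```python
from typing import Tuple

# (low, high) minutes of CPU time per hour of audio, from the table above.
CPU_MINUTES_PER_HOUR = {
    "tiny": (5, 5),
    "base": (10, 15),
    "small": (20, 30),
    "medium": (45, 60),
    "large": (90, 120),
}


def estimate_cpu_minutes(model: str, audio_minutes: float) -> Tuple[float, float]:
    """Scale the one-hour figures to the length of the episode."""
    low, high = CPU_MINUTES_PER_HOUR[model]
    scale = audio_minutes / 60
    return low * scale, high * scale
```

A 30-minute episode on the base model, for instance, comes out at roughly 5 to 7.5 minutes on CPU.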

Have a GPU? The app automatically detects and uses your NVIDIA (CUDA), Apple Silicon (Metal), or AMD GPU. A mid-range NVIDIA GPU can be 8–15× faster than CPU: with the base model, a one-hour episode finishes in under two minutes.
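
GPU detection of this kind is commonly done through PyTorch, which Whisper runs on. The sketch below is an assumption about the general approach rather than the app's actual code, and `pick_device` is a hypothetical name. PyTorch reports NVIDIA GPUs through `torch.cuda`, and its ROCm builds expose AMD GPUs through the same interface; Apple Silicon is reported through the `mps` backend.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can see."""
    try:
        import torch  # assumption: PyTorch is installed alongside Whisper
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():            # NVIDIA, or AMD on ROCm builds
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():  # Apple Silicon (Metal)
        return "mps"
    return "cpu"
```

The returned string can be passed as the `device` argument when loading a model, with the CPU as the fallback when no GPU is found.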

Speaker Labeling

Know who said what

Speaker diarization automatically detects different voices and labels them in the transcript. Instead of a wall of text, you get something like this:

[SPEAKER_00] Welcome back to the show. Today we’re talking about transcription tools.
[SPEAKER_01] Thanks for having me. I’ve been looking for something like this for a while.
[SPEAKER_00] Let’s start with why you think transcripts matter for podcasters.

Speaker labeling uses pyannote.audio and requires a free Hugging Face account and access token, a two-minute one-time setup. You can also turn it off entirely for faster, label-free transcription.

Works best with two clearly distinct voices and minimal crosstalk — typical interview podcasts transcribe with excellent accuracy.
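
Labeled transcripts like the one above are typically built by matching each transcribed segment to the speaker turn it overlaps most in time. The sketch below illustrates that idea under stated assumptions: segments and turns are plain dicts with `start` and `end` in seconds, and the function names are hypothetical. It is not taken from the app or from pyannote.audio.

```python
def overlap(a_start, a_end, b_start, b_end) -> float:
    """Length in seconds of the time shared by two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach to each segment the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        speaker, best = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best:
                speaker, best = turn["speaker"], shared
        labeled.append({**seg, "speaker": speaker})
    return labeled


def render(labeled) -> str:
    """Format labeled segments as "[SPEAKER_00] text" lines."""
    return "\n".join(f"[{seg['speaker']}] {seg['text'].strip()}" for seg in labeled)
```

Crosstalk is hard for this approach because a segment that spans two overlapping turns can only receive one label, which is why clean interview audio gives the best results.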

System Requirements

What you need to run it

🪟 Windows 10 / 11

🍎 macOS 12+

🐧 Linux (Ubuntu, Mint, Fedora, Arch…)

🐍

Python 3.10+

The installer checks for Python and guides you if it’s missing. Mac and Linux often have it pre-installed.

🎞️

ffmpeg

Required for audio processing. The installer downloads and installs it automatically on all three platforms — no manual setup needed.

💾

~500 MB Disk

For the base Whisper model and Python environment. Larger models need up to 3 GB extra.

🧠

4 GB RAM

8 GB or more recommended. Larger Whisper models need more RAM; the model table above lists their download sizes as a rough guide.
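
If you want to check a machine against these requirements yourself, a short script along these lines would do it. This is a hypothetical sketch, not the installer's code: it checks the Python version, looks for `ffmpeg` on the PATH, and compares free disk space with the roughly 500 MB quoted above. RAM is left out because the standard library has no portable way to read it.

```python
import shutil
import sys


def preflight(install_dir: str = ".", needed_bytes: int = 500 * 1024**2) -> dict:
    """Report whether the basic requirements appear to be met."""
    return {
        "python_ok": sys.version_info >= (3, 10),
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        "disk_ok": shutil.disk_usage(install_dir).free >= needed_bytes,
    }
```

Calling `preflight()` returns a dict of three booleans, so a failed check is easy to spot before installing.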

Ready to transcribe?

Free. Local. Private. No account required for transcription.

Download Free — v1.15

Windows · macOS · Linux | Includes installer & uninstaller for all platforms
