Back in 2017 I created a site that took the audio of some of my favorite podcasts and tried to make it searchable by passing it through an automated speech-to-text engine.
In the original version, the transcription was done with a very rough, slow, low-accuracy system, which resulted in transcripts that were fit for keyword searching but not much else.
Thankfully, since then OpenAI has released Whisper, a powerful speech-to-text engine that I can run right on my Mac and that produces transcripts that are shockingly good. They aren’t quite at the level of a human transcriber, but they get darn close in many instances. They are getting close to the point where you could use them to grab a pull quote with only a little bit of tidying up.
Moreover, because I can run this locally, it is essentially ‘free’ to run, costing only my CPU time. So I can much more easily keep the index up to date and incorporate the entire back catalogs of the shows I’m indexing.
I’ve been running the truly excellent C++ port of Whisper by Georgi Gerganov. It has been optimized for Apple Silicon processors and absolutely screams through the audio. On my M2 Max MacBook Pro I can transcribe an episode at roughly 7x realtime, so it takes around 15 minutes to transcribe a typical episode.
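To make that arithmetic concrete, here is a tiny sketch of the estimate. The function name and the 105-minute episode length are my own assumptions for illustration (105 minutes is simply what 15 minutes at 7x works out to); actual throughput will vary with the model size and the audio.

```python
def transcription_minutes(episode_minutes: float, realtime_factor: float) -> float:
    """Estimate the wall-clock minutes needed to transcribe one episode.

    realtime_factor is how many minutes of audio are processed per minute
    of compute, e.g. 7 for "roughly 7x realtime".
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return episode_minutes / realtime_factor


# Hypothetical 105-minute episode at roughly 7x realtime.
print(transcription_minutes(105, 7))  # 15.0
```

The same division scales up to a whole back catalog: sum the episode lengths and divide by the realtime factor to get a rough compute budget.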
The updated site can be found at podsearch.david-smith.org. I hope you find it helpful and fun to explore. You can easily search for occurrences of a keyword within a particular show and find the timestamp of when each one was said.
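For the curious, keyword search over a timestamped transcript can be sketched in a few lines. This is only an illustration, not how the site is actually built: the `Segment` type, the helper names, and the sample text are all hypothetical, and I’m assuming the transcript is available as a list of segments that each carry a start time in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript segment (hypothetical shape, for illustration only)."""
    start_seconds: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a start time in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_keyword(segments: list[Segment], keyword: str) -> list[tuple[str, str]]:
    """Return (timestamp, text) for every segment containing the keyword.

    Matching is a case-insensitive substring test, so "swift" also
    matches "SwiftUI".
    """
    needle = keyword.casefold()
    return [
        (format_timestamp(seg.start_seconds), seg.text)
        for seg in segments
        if needle in seg.text.casefold()
    ]


# Made-up sample segments, not taken from any real episode.
episode = [
    Segment(0.0, "Welcome back to the show."),
    Segment(65.5, "Today we are talking about widgets."),
    Segment(3725.0, "So that is everything I know about Widgets."),
]

for timestamp, text in find_keyword(episode, "widgets"):
    print(timestamp, text)
```

A plain substring scan like this is fine for a single episode; across entire back catalogs a proper full-text index would likely be a better fit.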
I have transcribed the whole back catalogs of: