How to transcribe the audio from YouTube videos with Python

A few days ago, I wanted to transcribe the audio from some of my old YouTube videos for a content project.

How to download a video from YouTube using Python?

To extract the audio, I first needed to download those videos from YouTube. Several free websites offer this service, but the quality is inconsistent and some of them are buggy.

I searched for a Python-based solution.

The first library I found, which is still the most popular at the time of writing, is Pytube. It worked for a while, then suddenly started failing on every download with the same error.

return self.vid_info['streamingData']
KeyError: 'streamingData'

So I searched for an alternative to Pytube and came across yt_dlp. It wasn’t as user-friendly as Pytube in terms of implementation but it works nicely (so far so good).

pip install yt_dlp

Here’s the code of a quick function to use yt_dlp.

The function downloads the video to a folder called video (created next to the script), using the video title as the filename and adding the file extension (usually .mp4, depending on the format YouTube serves).

You can also simply hard-code the filename to video.mp4 to avoid problems with unusual characters in titles.

The function returns the title.

import yt_dlp

def download_video_and_get_title(url):
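    ydl_opts = {
        'format': 'best',  # best single file with both video and audio (usually MP4, not guaranteed)
        'outtmpl': 'video/%(title)s.%(ext)s',  # save as video/<title>.<extension>
        'noplaylist': True,  # download only the video, even if the URL points to a playlist
        'quiet': True,
        'no_warnings': True,
    }
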
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        video_info = ydl.extract_info(url, download=True)  # Download the video and extract its info
        title = video_info.get('title', None)  # Get video title

    return title
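
Titles often contain characters that are awkward or invalid in filenames (slashes, colons, question marks, emoji). As a middle ground between dynamic titles and a hard-coded video.mp4, here is a minimal sketch of a sanitizing helper. The name safe_filename and its fallback parameter are hypothetical and not part of yt_dlp; yt_dlp also has its own 'restrictfilenames' option, which may be enough on its own.

```python
import re
import unicodedata

def safe_filename(title, fallback="video"):
    """Turn a video title into a conservative, ASCII-only filename stem.

    Hypothetical helper for illustration; not part of yt_dlp.
    """
    # Split accented characters into base letter + accent, then drop non-ASCII
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters other than letters, digits, "_" and "-"
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", ascii_only).strip("_")
    # Fall back to a fixed name if nothing usable is left
    return cleaned or fallback
```

One way to use it, under the same assumptions, is to fetch the title first with extract_info(url, download=False), sanitize it, and build 'outtmpl' from the sanitized name so that the file on disk and the name in your code always match.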


How to extract the audio from a video clip with Python?

Then you can extract the audio from the MP4 with moviepy.

pip install moviepy

If you’re using dynamic titles (instead of video.mp4), adapt your code accordingly, for instance: clip = VideoFileClip(f"video/{title}.mp4"), with the title retrieved in the first step: title = download_video_and_get_title(video_url).

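import os
from moviepy.editor import VideoFileClip  # moviepy 1.x; with moviepy 2.x use: from moviepy import VideoFileClip

os.makedirs("audio", exist_ok=True)  # make sure the output folder exists
clip = VideoFileClip("video/video.mp4")
audio = clip.audio
audio.write_audiofile("audio/audio.wav")  # the WAV file loaded in the next step
clip.close()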
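
If you keep dynamic titles, the audio filenames can be derived from the video path instead of being hard-coded. The sketch below assumes the same video/ and audio/ folder layout as the rest of this post; the function name audio_paths_for is hypothetical.

```python
from pathlib import Path

def audio_paths_for(video_path, audio_dir="audio"):
    """Return (wav_path, mp3_path) matching a downloaded video file.

    Hypothetical helper: it only builds paths, it does not touch the disk.
    """
    stem = Path(video_path).stem  # filename without folder or extension
    out_dir = Path(audio_dir)
    return out_dir / f"{stem}.wav", out_dir / f"{stem}.mp3"

wav_path, mp3_path = audio_paths_for("video/my_clip.mp4")
```

The resulting paths can be passed as strings, for example str(wav_path), to the moviepy and pydub calls.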

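Then you can use pydub and mutagen to convert this WAV file to an MP3. Note that pydub relies on ffmpeg for the MP3 export, so ffmpeg needs to be installed on your machine.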

pip install pydub

pip install mutagen

I lower the bitrate to 32 kbps to reduce the size of the MP3 before moving on to the transcription phase.

from pydub import AudioSegment

# load the audio file
sound = AudioSegment.from_wav("audio/audio.wav")
# set the desired bit rate (low to reduce the size of the mp3)
bitrate = "32k"
# export the audio file in mp3 format with desired bitrate
sound.export("audio/output.mp3", format="mp3", bitrate=bitrate)
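
To see what the lower bitrate buys you, the size of a constant-bitrate MP3 can be estimated as bitrate × duration / 8. The sketch below is a rough approximation that ignores headers and metadata. The 25 MB upload limit is my assumption about the transcription API at the time of writing, so check the current documentation before relying on it; both function names are hypothetical.

```python
# Assumed upload limit of the transcription API (verify against the current docs)
ASSUMED_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024

def estimated_mp3_size_bytes(duration_seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate MP3, ignoring headers and tags."""
    # kbps means 1000 bits per second; divide by 8 to convert bits to bytes
    return int(duration_seconds * bitrate_kbps * 1000 / 8)

def max_duration_seconds(bitrate_kbps, limit_bytes=ASSUMED_UPLOAD_LIMIT_BYTES):
    """Longest audio duration that fits under the size limit at this bitrate."""
    return limit_bytes * 8 / (bitrate_kbps * 1000)

ten_minutes_at_32k = estimated_mp3_size_bytes(600, 32)    # 2,400,000 bytes
ten_minutes_at_128k = estimated_mp3_size_bytes(600, 128)  # 9,600,000 bytes
```

Under these assumptions, a 10-minute clip is about 2.4 MB at 32 kbps versus about 9.6 MB at 128 kbps, and roughly 109 minutes of audio fit under the assumed limit at 32 kbps.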

How to transcribe audio with Python?

Then the last phase is to transcribe the audio using Whisper by OpenAI.

pip install openai

import openai

# Note: openai.Audio.transcribe belongs to openai versions before 1.0
openai.api_key = "your api key"
# (alternative: read the key from an environment variable with
#  import os; openai.api_key = os.getenv('OPENAI_API_KEY'))

# open the MP3 in binary mode; the file is closed automatically afterwards
with open("audio/output.mp3", "rb") as file:
    transcription = openai.Audio.transcribe("whisper-1", file)

text = transcription["text"]  # the response behaves like a dictionary

print("Transcription complete!")
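
If you have a recent version of the openai package installed (1.0 or later), the module-level openai.Audio interface is no longer available. Here is a minimal sketch of the same step with the client-based interface. The function name transcribe_file is hypothetical, and the sketch assumes your key is stored in the OPENAI_API_KEY environment variable.

```python
from pathlib import Path

def transcribe_file(path, model="whisper-1"):
    """Transcribe an audio file with the openai>=1.0 client and return the text.

    Hypothetical wrapper for illustration.
    """
    audio_path = Path(path)
    if not audio_path.is_file():
        raise FileNotFoundError(f"Audio file not found: {audio_path}")

    # Imported here so the file check above runs even without the package installed
    from openai import OpenAI

    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
    with audio_path.open("rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
        )
    return transcription.text
```

Calling text = transcribe_file("audio/output.mp3") would then replace the transcription lines of the previous snippet.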

You can now manipulate the text saved in the text variable.
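
For example, you may want to keep the transcript on disk so you don't have to pay for the same transcription twice. The sketch below uses only the standard library; the transcripts folder and both function names are hypothetical choices, not something the steps above require.

```python
from pathlib import Path

def save_transcript(text, name, folder="transcripts"):
    """Write the transcript to <folder>/<name>.txt and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out_path = out_dir / f"{name}.txt"
    out_path.write_text(text, encoding="utf-8")
    return out_path

def word_count(text):
    """Count whitespace-separated words, a quick sanity check on the result."""
    return len(text.split())
```

With the text variable from the transcription step, save_transcript(text, "output") would write transcripts/output.txt.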

Have fun with Python!
