Turn any YT Video into a Content-Making Machine with Claude-3 and Lightning AI
Last Updated on May 9, 2024 by Editorial Team
Author(s): Tim Cvetko
Originally published on Towards AI.
On Feb 20, Karpathy released an extensive YouTube video building the GPT Tokenizer from scratch.
On Mar 4, Emmanuel Ameisen and Erik Schluntz took up Karpathy's challenge to convert a video into a blog post with LLMs, using Claude-3 to create a blog out of that very same tokenizer video.
-> Find the blog here.
Today, I'm Taking This One Step Further.
I am partnering with Lightning AI to help people create content-making machines from any YT video via Lightning Studios!
Go! Create Your Own Below ↓
Create any Video-to-Text LLM with Claude-3 – a Lightning Studio by cvetkotim
This studio was created to transcribe YouTube videos into generated blog content with Claude-3. Duplicate to…
lightning.ai
Here's What You'll Learn in This Article:
- How to Use PyTube and Claude-3 to Extract Text from Any YouTube Video (Full Code Implementation)
- How to Build ANY Voice-to-Text LLM Product Seamlessly and Pump Out Limitless Content
➡ If you'd like to convert YT videos into full-scope content automatically, fill out the form here.
Part 1: Alright, show me how this thing works!
This notebook provides a baseline for reproducing Claude-3's solution to Karpathy's challenge of converting a video tutorial into a blog post.
Okay, here's how this works step by step:
- Get an API Key for Claude-3 and Init Anthropic Client
- Download YouTube video and Transcript
- Init Whisper Model for Speech-to-Text
- Chop the Video into Text + Screenshot Pairs
- Apply Claude-3 to Fill Out the Blog
Step 1: Get an API Key for Claude-3 and Init Anthropic Client
Get an API key from Anthropic's official site to run the demo with any of the three available models.
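A minimal sketch of the setup: the model IDs below are the publicly documented Claude-3 identifiers at the time of writing, and the `ANTHROPIC_API_KEY` environment variable is a common convention, not something the studio mandates.

```python
import os

# Claude-3 model IDs available at the time of writing
CLAUDE_MODELS = {
    "opus": "claude-3-opus-20240229",
    "sonnet": "claude-3-sonnet-20240229",
    "haiku": "claude-3-haiku-20240307",
}

def make_client():
    # Deferred import so the module loads even without the SDK installed
    from anthropic import Anthropic  # pip install anthropic
    return Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```

Pick `haiku` for cheap experimentation and `opus` for the best-quality blog output.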
Step 2: Download YouTube Video & Transcript
Using the pytube library in Python, we first download the video along with the audio stream, which we will need for Whisper later on.
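A sketch of that download step using pytube's stream API; the output file names are my own choice, not fixed by the studio code.

```python
def download_video(url: str, out_dir: str = ".") -> tuple:
    """Download the full video plus an audio-only stream for Whisper."""
    from pytube import YouTube  # pip install pytube

    yt = YouTube(url)
    # Highest-resolution progressive stream: video frames for screenshots
    video_path = (yt.streams
                    .get_highest_resolution()
                    .download(output_path=out_dir, filename="video.mp4"))
    # Audio-only stream: smaller file, all Whisper needs
    audio_path = (yt.streams
                    .get_audio_only()
                    .download(output_path=out_dir, filename="audio.mp4"))
    return video_path, audio_path
```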
Step 3: Init Whisper Model for Speech-to-Text
We import WhisperModel from the faster_whisper package to transcribe the contents of our YT video into text segments.
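A sketch of the transcription step; the model size and device settings here are my own choices for a quick demo.

```python
def transcribe(audio_path: str) -> list:
    """Return (start_s, end_s, text) triples for the audio track."""
    from faster_whisper import WhisperModel  # pip install faster-whisper

    # "base" keeps the demo fast on CPU; larger checkpoints are more accurate
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # Each segment carries .start, .end, and .text
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```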
Step 4: Chop the Video into Text + Screenshot Pairs
Here's the not-so-fun part. We get the video and the text separately, from pytube and Whisper respectively. To pair them up, we need to divide the video into chunks, which is best done along its chapter boundaries.
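One way to sketch this pairing, assuming you already have (start, end, title) chapter markers (parsed from the video description, for instance) and the Whisper segments from the previous step. The OpenCV-based screenshot helper is my own illustration, not necessarily how the studio does it.

```python
def grab_frame(video_path: str, t_seconds: float, out_path: str) -> str:
    """Save the frame at t_seconds as a JPEG screenshot."""
    import cv2  # pip install opencv-python

    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_MSEC, t_seconds * 1000)
    ok, frame = cap.read()
    cap.release()
    if ok:
        cv2.imwrite(out_path, frame)
    return out_path

def chunk_by_chapters(segments, chapters):
    """Pair each chapter with the transcript text that falls inside it.

    segments: (start_s, end_s, text) triples from Whisper
    chapters: (start_s, end_s, title) triples
    """
    chunks = []
    for start, end, title in chapters:
        text = " ".join(t for s, e, t in segments if start <= s < end)
        chunks.append({"title": title, "start": start, "text": text})
    return chunks
```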
Step 5: Apply Claude-3 to Fill Out the Blog
Ha! Cool. Now we can divide any YT video into chapters that include both video and text, and we can apply LLM semantics to them. Let's first take a look at the prompt for the Claude-3 model.
prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is noisy. Please rewrite it using the following guidelines:
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, present them only once
- keep the conversational content in the style of the transcript. Include headings to make the narrative structure easier to follow along
- the transcript includes too many images, so you should only include the most important 1-2 images in your output
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- if an image would help illustrate a part of a transcript, include it
- to include an image, insert a tag with <img src="xxxxx.jpg"/> where xxxxx is replaced by the exact image timestamp inserted above the image data
- do not add any extraneous information: only include what is either mentioned in the transcript or the images
Your final output should be suitable for inclusion in a textbook.
</instructions>
"""
Aaaand, here's the final iteration, applied to each of the chunks and rendered as markdown.
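The per-chunk call might look like this sketch against the Anthropic Messages API; `blog_section` and the chunk dict shape are my own naming, not taken from the studio code.

```python
import base64

def blog_section(client, model, prompt_instructions, chunk, image_paths):
    """Ask Claude-3 to turn one text + screenshots chunk into markdown."""
    content = []
    # Screenshots go first, as base64 image blocks
    for path in image_paths:
        with open(path, "rb") as f:
            data = base64.b64encode(f.read()).decode()
        content.append({"type": "image",
                        "source": {"type": "base64",
                                   "media_type": "image/jpeg",
                                   "data": data}})
    # Then the instructions plus the chunk's transcript
    content.append({"type": "text",
                    "text": prompt_instructions
                            + "\n<transcript>\n" + chunk["text"]
                            + "\n</transcript>"})
    message = client.messages.create(
        model=model,
        max_tokens=4096,
        messages=[{"role": "user", "content": content}],
    )
    return message.content[0].text
```

Concatenating the returned markdown sections, chapter by chapter, yields the full post.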
… which leaves us with a blog post similar to the one below. That's mighty cool!
Build any Video-to-Text LLM Product
Okay, okay. Having code that can:
- Take any YT video as Input
- Create a List of Chapters
- Apply Claude-3 to create Markdown
… is pretty awesome. And it's also automatic.
So, how do you go from this to building any video-to-text LLM product seamlessly? Easy! You change the prompt.
If a Blog, Why NOT Any Content? Here's How:
We're going to use prompt engineering to auto-create any piece of content with AI. This is our initial prompt for creating a blog post.
prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is quite noisy. Please rewrite it using the following guidelines:
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, present them only once
- keep the conversational content in the style of the transcript. Include headings to make the narrative structure easier to follow along
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- do not add any extraneous information: only include what is either mentioned in the transcript or the images
Your final output should be suitable for inclusion in a textbook.
</instructions>
"""
Let's tune this baby to spill out content directly from a YouTube video.
prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
This transcript is quite noisy. Your job is to create valid Twitter/LinkedIn posts, no longer than 200 characters. Short rapid sentences and learnings.
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, present them only once
- keep the conversational content in the style of the transcript. Include headings to make the narrative structure easier to follow along
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- do not add any extraneous information: only include what is either mentioned in the transcript or the images
Your final output should be suitable for posting directly to Twitter/LinkedIn.
</instructions>
"""
For example, I inserted this video from 20VC:
aaand got the following (205 lines of pure wisdom)!
-> Check here
To Conclude …
- I've shown you how you can take any YouTube video as input, do a lil' processing, and apply Claude-3 (or any other LLM, for that matter) to create content.
- Thanks to Lightning AI, you can do so by:
- Getting an API Key for Claude-3
- Clicking Open in Studio
- Running the Streamlit app from main.py and playing with it
- I'm interested to hear about what you're going to build!
Happy hacking! If you have any questions, feel free to get in touch via [email protected].
Enjoyed This Story?
Thanks for getting to the end of this article. My name is Tim; I work at the intersection of AI, business, and biology. I love to explain ML concepts and write about business (VC or micro)! Get in touch!
Subscribe for free to get notified when I publish a new story.
References
- https://pub.towardsai.net/using-claude-3-to-transform-a-video-tutorial-in-a-blog-post-d2c1e04e7a7b
- https://github.com/Timothy102/youtube-to-blog/blob/main/blogpost.md
- https://www.youtube.com/watch?v=RKRJ3-PT3jA
Published via Towards AI