Turn any YT Video into a Content-Making Machine with Claude-3 and Lightning AI

Last Updated on May 9, 2024 by Editorial Team

Author(s): Tim Cvetko

Originally published on Towards AI.

On Feb 20, Karpathy released an extensive YouTube video building the GPT-3 Tokenizer from scratch.

Example of a 2h13m video converted into a blog post (featuring screenshots and code) with Claude 3 Opus

On Mar 4, Emmanuel Ameisen and Erik Schluntz took up Karpathy’s challenge to convert a video into a blog post with LLMs, and used Claude-3 to create a blog post out of the very same GPT-3 Tokenizer video.

-> Find the blog here.

Today, I’m Taking This One Step Further.

I am partnering with Lightning AI to help people create content-making machines from any YT video via Lightning Studios!

Go! Create Your Own Below ↓

Create any Video-to-Text LLM with Claude-3 – a Lightning Studio by cvetkotim

This studio had been created to transcribe Youtube videos into generated blog content with Claude-3. Duplicate to…

lightning.ai

Here’s What You’ll Learn in This Article:

How to Use PyTube, Whisper, and Claude-3 to Turn Any YouTube Video into Text (Full Code Implementation)
How You Can Build ANY Video-to-Text LLM Product and Pump Out Limitless Content

➡ If you’d like to convert YT videos into full-scope content automatically, fill out the form here ⬅

P1: Aight, show me how this thing works!

This notebook provides a baseline for reproducing Claude-3’s solution to Karpathy’s challenge of converting a video tutorial into a blog post.

Okay, here’s how this works step by step:

Get an API Key for Claude-3 and Init Anthropic Client
Download YouTube video and Transcript
Init Whisper Model for Speech-to-Text
Chop the Video into Text + Screenshot Pairs
Apply Claude-3 to Fill Out the Blog

Step 1: Get an API Key for Claude-3 and Init Anthropic Client

Get an API key from Anthropic’s official site to run the demo with any of the three available Claude 3 models (Haiku, Sonnet, or Opus), then initialize the Anthropic client with it.
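A minimal sketch of this step, assuming the key lives in the `ANTHROPIC_API_KEY` environment variable (the official `anthropic` SDK also reads that variable by default). The helper names `get_api_key`, `make_client`, and `CLAUDE_3_MODELS` are hypothetical, and the model IDs are the Claude 3 launch identifiers, so check Anthropic’s docs for current names.

```python
import os


def get_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the Anthropic API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Anthropic API key.")
    return key


def make_client():
    """Create an Anthropic client (requires `pip install anthropic`)."""
    import anthropic  # imported lazily so this file loads even without the SDK

    return anthropic.Anthropic(api_key=get_api_key())


# Claude 3 model IDs as published at launch; newer IDs may exist.
CLAUDE_3_MODELS = {
    "opus": "claude-3-opus-20240229",
    "sonnet": "claude-3-sonnet-20240229",
    "haiku": "claude-3-haiku-20240307",
}
```

Opus gives the best blog quality in the original experiment; Haiku is the cheapest way to iterate on prompts.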

Step 2: Download YouTube Video & Transcript

Using the pytube library in Python, we first download the video along with the audio stream, which we will need for Whisper later on.

Image by Author: Download Youtube Video & Transcript
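The screenshot above shows the author’s version; what follows is only a sketch of the same idea, assuming pytube still works against YouTube’s current player (it breaks periodically). `extract_video_id` and `download_video_and_audio` are hypothetical helpers; the pytube calls (`YouTube`, `streams.get_highest_resolution`, `streams.get_audio_only`, `Stream.download`) are from pytube’s public API.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> str:
    """Return the video ID from common YouTube URL shapes (watch, youtu.be, embed, shorts)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    if host.endswith("youtu.be") and parts:
        return parts[0]
    if host.endswith("youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [""])[0]
            if video_id:
                return video_id
        elif len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            return parts[1]
    raise ValueError(f"Not a recognised YouTube URL: {url}")


def download_video_and_audio(url: str, out_dir: str = "data") -> tuple[Path, Path]:
    """Download an MP4 with video (for screenshots) and an audio-only MP4 (for Whisper)."""
    from pytube import YouTube  # pip install pytube; imported lazily

    video_id = extract_video_id(url)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    streams = YouTube(url).streams

    # Highest-resolution progressive stream (video and audio muxed into one MP4).
    video_stream = streams.get_highest_resolution()
    # Highest-bitrate audio-only stream in an MP4 container.
    audio_stream = streams.get_audio_only()
    if video_stream is None or audio_stream is None:
        raise RuntimeError("pytube found no suitable streams for this video")

    video_path = video_stream.download(output_path=str(out), filename=f"{video_id}.mp4")
    audio_path = audio_stream.download(output_path=str(out), filename=f"{video_id}_audio.mp4")
    return Path(video_path), Path(audio_path)
```

Naming both files after the video ID keeps reruns on the same video from clobbering other downloads.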

Step 3: Init Whisper Model for Speech-to-Text

We import WhisperModel (from faster_whisper import WhisperModel) to transcribe the audio of our YT video into timestamped text segments.
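Here is a sketch of the transcription step with faster-whisper. The model size (`"base"`) and CPU/int8 settings are assumptions chosen for a small machine, and `TranscriptSegment`, `to_segments`, and `transcribe` are hypothetical names. `WhisperModel(...).transcribe` returns a lazy generator of segments (each with `.start`, `.end`, `.text`) plus a `TranscriptionInfo` object.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TranscriptSegment:
    start: float  # seconds from the start of the video
    end: float
    text: str


def to_segments(raw_segments: Iterable) -> List[TranscriptSegment]:
    """Convert faster-whisper segments (objects with .start/.end/.text) into plain records."""
    return [TranscriptSegment(float(s.start), float(s.end), s.text.strip()) for s in raw_segments]


def transcribe(audio_path: str, model_size: str = "base") -> List[TranscriptSegment]:
    from faster_whisper import WhisperModel  # pip install faster-whisper; imported lazily

    # int8 on CPU keeps memory low; on a GPU use device="cuda", compute_type="float16".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    # `segments` is lazy: the actual transcription runs while we iterate it here.
    return to_segments(segments)
```

Keeping the start/end times is what makes the next step possible: every piece of text can be matched to the part of the video it came from.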

Step 4: Chop the Video into Text + Screenshot Pairs

Here’s the not-so-fun part. The video comes from pytube and the text comes from Whisper, so we load them separately and then have to line them up. To do that, we divide the video into sections, which is best accomplished through the video’s chapters.

Image by Author: Divide video by chapters
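One way to build the text + screenshot pairs, sketched under a few assumptions: chapters come from the `0:00 Title` timestamp lines in the video description, each transcript segment goes to the chapter it starts in, and a fixed number of evenly spaced frames per chapter is grabbed with OpenCV. All function and class names here are hypothetical; `Segment` stands in for the Whisper segments from Step 3.

```python
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    """Stand-in for one transcript segment from Step 3 (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Chapter:
    title: str
    start: float
    end: float  # float("inf") for the last chapter
    text: str = ""
    frame_times: List[float] = field(default_factory=list)


# Timestamp lines such as "0:00 Intro", "1:30 - Tokens" or "1:02:03 Outro".
_CHAPTER_RE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s*[-:]?\s*(.+?)\s*$")


def parse_chapters(description: str) -> List[Tuple[float, str]]:
    """Read (start_seconds, title) pairs from the timestamp lines of a video description."""
    chapters = []
    for line in description.splitlines():
        match = _CHAPTER_RE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((float(start), title))
    return sorted(chapters)


def group_by_chapter(chapters: Sequence[Tuple[float, str]], segments: Sequence[Segment],
                     frames_per_chapter: int = 3) -> List[Chapter]:
    """Attach transcript text and evenly spaced screenshot times to each chapter."""
    video_end = max((s.end for s in segments), default=0.0)
    grouped = []
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else float("inf")
        # A segment belongs to the chapter in which it starts.
        text = " ".join(s.text for s in segments if start <= s.start < end)
        last = min(end, video_end) if video_end > start else start
        step = (last - start) / (frames_per_chapter + 1)
        times = [round(start + step * (k + 1), 2) for k in range(frames_per_chapter)]
        grouped.append(Chapter(title, start, end, text, times))
    return grouped


def grab_frames(video_path: str, times: Sequence[float], out_dir: str = "frames") -> List[Path]:
    """Save one JPEG per timestamp, named after the timestamp in seconds (e.g. 45.00.jpg)."""
    import cv2  # pip install opencv-python-headless; imported lazily

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = []
    try:
        for t in times:
            cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)
            ok, frame = cap.read()
            if not ok:
                continue  # timestamp past the end of the video, or a decode error
            path = Path(out_dir) / f"{t:.2f}.jpg"
            cv2.imwrite(str(path), frame)
            saved.append(path)
    finally:
        cap.release()
    return saved
```

Naming each screenshot after its timestamp matches the prompt in Step 5, which asks Claude to reference images as `<img src="xxxxx.jpg"/>` using the timestamp shown above each image.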

Step 5: Apply Claude-3 to Fill Out the Blog

Ha! Cool. Now we can divide any YT video into chapters that contain both screenshots and transcript text, and hand each chapter to the LLM. Let’s first take a look at the prompt for the Claude-3 model.

prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is noisy. Please rewrite it using the following guidelines:
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, only present it once
- keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
- the transcript includes too many images, so you should only include the most important 1-2 images in your output
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- if an image would help illustrate a part of a transcript, include it
- to include an image, insert a tag with <img src="xxxxx.jpg"/> where xxxxx is replaced by the exact image timestamp inserted above the image data
- do not add any extraneous information: only include what is either mentioned in the transcript or the images

Your final output should be suitable for inclusion in a textbook.
</instructions>
"""

Aaaand, here’s the final loop, which applies Claude-3 to each of the chunks and stitches the results together into markdown.

Image by Author: Final For Loop on Chunks
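A sketch of that final loop, assuming each chunk arrives as a `(title, chunk)` pair and `generate` wraps a single Claude call (for example, the `ask_claude` idea with a fixed client and prompt). `write_blog` is a hypothetical name; passing the model call in as a function keeps the loop testable without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def write_blog(chunks: Iterable[Tuple[str, T]], generate: Callable[[T], str],
               out_path: Optional[str] = None) -> str:
    """Run `generate` on every chunk and join the results into one markdown document."""
    parts = []
    for title, chunk in chunks:
        body = generate(chunk).strip()
        # One top-level section per chapter; Claude adds its own sub-headings inside it.
        parts.append(f"## {title}\n\n{body}\n")
    markdown = "\n".join(parts)
    if out_path is not None:
        Path(out_path).write_text(markdown, encoding="utf-8")
    return markdown
```

Because chapters are independent, this loop is also the natural place to add retries or parallel requests if a long video hits rate limits.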

… which leaves us with a blog post similar to the one below. That’s mighty cool!

Build any Video-to-Text LLM Product

Okay, okay. Having code that can:

Take any YT video as Input
Create a List of Chapters
Apply Claude-3 to create Markdown

… is pretty awesome. And it’s also automatic.

So, how do you go from this to building any video-to-text LLM product seamlessly? Easy! You change the prompt.
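Since only the task line really changes between products, one option is to keep the shared preamble fixed and swap the task per content type. This is a hypothetical sketch: `PREAMBLE`, `TASKS`, and `build_instructions` are illustrative names, and both task lines are copied from the prompts in this article.

```python
from typing import Dict, List

# Preamble shared by every prompt in this article.
PREAMBLE = (
    "You have been given images of a video at different timestamps, "
    "followed by the audio transcript in <transcript>\n"
    "The transcript was generated by an AI speech recognition tool "
    "and may contain some errors/infelicities.\n"
)

# The task line is what defines the product.
TASKS: Dict[str, str] = {
    "blog": "Your task is to transform the transcript into a markdown blog post.",
    "social": ("Your job is to create valid Twitter/LinkedIn posts, no longer than 200 characters. "
               "Short rapid sentences and learnings."),
}


def build_instructions(kind: str, guidelines: List[str]) -> str:
    """Assemble an <instructions> block for the requested content type."""
    if kind not in TASKS:
        raise KeyError(f"Unknown content type {kind!r}; add a task line to TASKS first.")
    bullets = "\n".join(f"- {g}" for g in guidelines)
    return f"<instructions>\n{PREAMBLE}{TASKS[kind]}\n{bullets}\n</instructions>"
```

Adding a new product then means adding one entry to `TASKS`; the download, transcription, and chaptering code stays untouched.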

If Blog, why NOT Content? Here’s HOW:

We’re going to use prompt engineering to auto-create any piece of content with AI. Here’s our initial prompt for creating a blog post.

prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is noisy. Please rewrite it using the following guidelines:
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, only present it once
- keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- do not add any extraneous information: only include what is either mentioned in the transcript or the images

Your final output should be suitable for inclusion in a textbook.
</instructions>
"""

Let’s tune this baby to spill out content directly from a YouTube video.

 prompt_instructions = f"""
 <instructions>
 You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
 The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
 This transcript is quite noisy. Your job is to create valid Twitter/LinkedIn posts, no longer than 200 characters. Short rapid sentences and learnings.
 - output valid markdown
 - insert section headings and other formatting where appropriate
 - you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
 - use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
 - remove any verbal tics
 - if there are redundant pieces of information, only present it once
 - keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
 - choose images that provide illustrations that are relevant to the transcript
 - prefer to include images which display complete code, rather than in progress
 - when relevant transcribe important pieces of code and other valuable text
 - do not add any extraneous information: only include what is either mentioned in the transcript or the images

 Your final output should be suitable for inclusion in a textbook.
 </instructions>
 """

For example, I inserted this video from 20VC:

aaand got the following: (205 lines of pure wisdom)!

-> Check here

Image by Author: blogpost.md created by Claude-3

To Conclude …

I’ve shown you how you can take any YouTube video as input, do a lil’ processing, and apply Claude-3 (or any other LLM, for that matter) to create content.
Thanks to Lightning AI, you can do so by:

Getting an API Key for Claude-3
Clicking Open in Studio
Running the Streamlit app in main.py and playing with it

I’m interested to hear what you’re going to build!

➡ If you’d like to convert YT videos into full-scope content automatically, fill out the form here ⬅

Happy hacking! If you have any questions, feel free to get in touch via [email protected].

Enjoyed This Story?

Thanks for getting to the end of this article. My name is Tim, and I work at the intersection of AI, business, and biology. I love to explain ML concepts or write about business (VC or micro)! Get in touch!

Subscribe for free to get notified when I publish a new story.

Get an email whenever Tim Cvetko publishes.

timc102.medium.com

Published via Towards AI
