Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

Turn any YT Video into a Content-Making Machine with Claude-3 and Lightning AI
Latest   Machine Learning

Turn any YT Video into a Content-Making Machine with Claude-3 and Lightning AI

Last Updated on May 9, 2024 by Editorial Team

Author(s): Tim Cvetko

Originally published on Towards AI.

On Feb 20, Karpathy released an extensive YouTube video building the GPT-3 Tokenizer from scratch.

Example of a 2h13 video converted in a blog post (featuring screenshots and code) with Claude 3 Opus

On Mar 4, Emmanuel Ameisen and Erik Schluntz underwent Karpathy’s challenge to convert video into a blog with LLMs and used Claude-3 to create a blog out of the very same GPT-3 Tokenizer video.

-> Find the blog here.

Link to post on X

Today, I’m Taking this 1 Step Further.

I am partnering with Lightning AI to help people create content-making machines from any YT video via Lightning Studios!

Go! Create Your Own Below ↓

Create any Video-to-Text LLM with Claude-3 – a Lightning Studio by cvetkotim

This studio had been created to transcribe Youtube videos into generated blog content with Claude-3. Duplicate to…

lightning.ai

Here’s What You’ll Learn in This Article:

  1. How to Implement PyTube and Claude-3 to Extract any YouTube Video into Text — Full Code Implementation
  2. How you can Build ANY Voice-to-Text LLM Product Seamlessly and Pump Up Limitless Content

➡ If you’d like to convert YT videos into full-scope content automatically, fill out the form here

P1 : Aight, show me how this thing works!

This notebook provides a baseline to reproduce Claude-3's solution to Karpathy’s challenge of converting a video tutorial in a blog post.

Video to Blog Workflow

Okay, here’s how this works step by step:

  1. Get an API Key for Claude-3 and Init Anthropic Client
  2. Download YouTube video and Transcript
  3. Init Whisper Model for Speech-to-Text
  4. Chop the Video in Text + Screenshot Pairs
  5. Apply Claude-3 to Fill Out the Blog

Step 1: Get an API Key for Claude-3 and Init Anthropic Client

Get an API key from Anthropic AI's official site to run the demo using any of the three available models.

Image by Author: Init w/ API key

Step 2: Download Youtube Video & Transcript

Using YT’s pytube library in Python, we will 1st download the video along with the audio stream which we will require for Whisper later on.

Image by Author: Download Youtube Video & Transcript

Step 3: Init Whisper Model for Speech-to-Text

We import the WhisperModel from from faster_whisper import WhisperModel to transcribe the contents of our YT video into text segments.

Init Whisper Model for Speech-to-Text

Step 4: Chop the Video in Text + Screenshot Pairs

Here’s the not-so-fun part. We need to load video and text separately from pytube + Whisper. To do that, we should divide the video — can be accomplished best through video chapters.

Image by Author: Divide video by chapters

Step 5: Apply Claude-3 to Fill Out the Blog

Ha! Cool. Now, we can divide any YT video into chapters that include both video and text and we can apply LLM semantics to them. Let’s first take a look at the prompt for the Claude-3 model.

prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is noisy. Please rewrite it using the following guidelines:
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, only present it once
- keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
- the transcript includes too many images, so you should only include the most important 1-2 images in your output
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- if an image would help illustrate a part of a transcript, include it
- to include an image, insert a tag with <img src="xxxxx.jpg"/> where xxxxx is replaced by the exact image timestamp inserted above the image data
- do not add any extraneous information: only include what is either mentioned in the transcript or the images

Your final output should be suitable for inclusion in a textbook.
</instructions>
"""

Aaaand, here’s the final iteration applied to each of the chunks + markdown.

Image by Author: Final For Loop on Chunks

… which leaves us with a blog post similar to the one below. That’s mighty cool!

Image by Author: just a screenshot!

Build any Video-to-Text LLM Product

Okay, okay. Having code that can:

  1. Take any YT video as Input
  2. Create a List of Chapters
  3. Apply Claude-3 to create Markdown

… is pretty awesome. And it’s also automatic.

So, how do you go from this to building any video-to-text LLM product seamlessly? Easy! You change the prompt.

If Blog, why NOT Content? Here’s HOW:

We’re going to use prompt engineering to auto-create any piece of content with AI. This is our initial prompt to creating a blog.

 prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
Your task is to transform the transcript into a markdown blog post.
This transcript is quite noisy. Your job is to create valid Twitter/LinkedIn posts, no longer than 200 characters. Short rapid sentences and learnings.
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, only present it once
- keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- do not add any extraneous information: only include what is either mentioned in the transcript or the images

Your final output should be suitable for inclusion in a textbook.
</instructions>
"""

Let’s tune this baby to spill out content directly from a YouTube video.

 prompt_instructions = f"""
<instructions>
You have been given images of a video at different timestamps, followed by the audio transcript in <transcript>
The transcript was generated by an AI speech recognition tool and may contain some errors/infelicities.
This transcript is quite noisy. Your job is to create valid Twitter/LinkedIn posts, no longer than 200 characters. Short rapid sentences and learnings.
- output valid markdown
- insert section headings and other formatting where appropriate
- you are given only part of a transcript, so do not include introductory or concluding paragraphs. Only include the main topics discussed in the transcript
- use styling to make images, text, code, callouts and the page layout and margins look like a typical blog post or textbook
- remove any verbal tics
- if there are redundant pieces of information, only present it once
- keep the conversational content in the style of the transcript. Including headings to make the narrative structure easier to follow along
- choose images that provide illustrations that are relevant to the transcript
- prefer to include images which display complete code, rather than in progress
- when relevant transcribe important pieces of code and other valuable text
- do not add any extraneous information: only include what is either mentioned in the transcript or the images

Your final output should be suitable for inclusion in a textbook.
</instructions>
"""

For example, I inserted this video from 20VC:

aaand got the following: (205 lines of pure wisdom)!

-> Check here

Image by Author: blogpost.md created by Claude-3

To Conclude …

  1. I’ve shown you how you can take any YouTube video as input, do a lil’ processing, and apply Claude-3(or any other LLM for that manner) to create content.
  2. Thanks to Lightning AI, you can do so by:
  • Getting an API Key for Claude-3
  • Clicking Open in Studio
  • Running a Streamlit app from main.py and play with this

3. I’m interested to hear about what you’re going to build!

➡ If you’d like to convert YT videos into full-scope content automatically, fill out the form here

Happy hacking! If you have any questions, feel free to get in touch via [email protected].

Enjoyed This Story?

Thanks for getting to the end of this article. My name is Tim, I work at the intersection of AI, business, and biology. I love to elaborate ML concepts or write about business(VC or micro)! Get in touch!

Subscribe for free to get notified when I publish a new story.

Get an email whenever Tim Cvetko publishes.

Get an email whenever Tim Cvetko publishes. By signing up, you will create a Medium account if you don't already have…

timc102.medium.com

References

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓