How to use AI to magically add B-roll to your videos: a step-by-step guide

With the magic of AI, anyone can now add B-roll to their videos, with no money or special skills required. Here's a step-by-step guide to doing it.

Nearly every great thing that’s happened to Capsule—our largest volume of inbound leads, press coverage, raising a seed round—has come as a result of the videos we’ve made.

One key component that’s made these videos perform so well? Engaging, relevant B-roll.

Here you'll learn what B-roll is, why professionals use it to make videos more engaging, and how, with the magic of AI, you can now add it to your own videos without special skills, extra time, or money.

1. Videos are great. They’re even better with B-roll.

Video is working wonders for content teams—and not just our team.

Video is the most-used format for marketers, and it also has the highest marketing ROI of any format by far.

But many content teams are still publishing either:
📈👎 Lots of videos, but they’re unengaging or low-quality
🌟🙅 High-quality videos, but they don’t have the resources to make as many as they need

We’ll first cover how adding B-roll can address the first issue (making videos more engaging), and then how using AI can help with the second issue (scaling higher-quality production).

What B-roll is and why video editors use it

When you're telling a story with video, you're not relying on the precision of a single image, as you would with a photograph; you're stitching together hundreds, or even hundreds of thousands, of images over a period of time. And as you're telling that linear story, bringing in other assets helps support the narrative.

B-roll refers to any supplemental photos or videos that are cut in or layered on top of the primary footage.

Using B-roll harnesses that magical force that makes video the ultimate format: combining our auditory brain (the words spoken) with our visual brain (the B-roll) to leave a stronger impression than each component could on its own.

Benefits of using B-roll in your videos:

  • Increases engagement
  • Boosts memory or information retention
  • Improves pacing
  • Provides context
  • Smooths out transitions

Research shows that human attention spans have been decreasing. At the same time, our capacity for long attention spans is also increasing through engaging storytelling (👋 hi, binge-watching 6 hours of Succession).

B-roll is a simple way to hold viewers' attention over both short and long spans: it adds variety to short-form videos, and it adds layers of meaning to longer-form videos to create more resonant stories.

Video examples of engaging B-roll

What does B-roll look like in action? Here are a few different types of videos that benefit from adding B-roll:

News & Media

Bloomberg posts news updates to its YouTube Shorts channel and relies on B-roll to help add more meaning to a story or establish its setting.

This video about gold pricing uses a combination of stock footage, text, and custom charts to visually support the script and tell a more engaging story.

A video about Singapore’s soaring rents uses B-roll footage of Singapore to give the story more context and depth.

Marketing

B-roll is the perfect (and necessary) addition for a video that's promoting or recapping an event. Here, Capsule customer Twilio uses only B-roll from previous events, plus some text and motion graphics, to promote an upcoming event series.

Adding B-roll to podcast video clips is a simple way to add some visual variation to an otherwise static talking head (where the subject is talking directly into the camera).

2. AI helps you add B-roll quickly and for free

While B-roll clearly adds a layer of depth to videos, most content teams can’t keep up with the demand for producing more engaging and professional-looking videos.

Marketers often run into the same three constraints of not having enough:

  1. 💰 Money
  2. 🌟 Skill
  3. 🕔 Time

Traditional challenges of sourcing B-roll

No matter which method you use for sourcing B-roll, the process has always been time-consuming.

The 3 traditional methods for sourcing B-roll:

  • Shoot it yourself
  • Hire a videographer or editor to shoot it
  • Search and pay for stock footage

Here's how those methods compare when you're trying to make videos that are cost-efficient, high-quality, and quick to produce:

[Image: table outlining the pros and cons of different B-roll sourcing methods. Using AI to source B-roll is economical, professional, and fast.]
Traditional methods of sourcing B-roll usually solve for only one or two of the three constraints: making videos fast, good, and cheap. AI-generated B-roll solves for all three.

Step-by-step process for using AI to generate B-roll

Thanks to the explosion in popularity of generative AI tools like Midjourney and OpenAI's ChatGPT, content teams can generate an entirely new class of assets for their videos quickly and cheaply.

Below is a step-by-step process of how you can stitch these AI tools together to generate completely customized B-roll imagery.

1. Identify your B-roll needs

Choose the moments, phrases, or stories in your video that you want highlighted with B-roll. Make sure these moments are spaced out in a way that will keep your viewers engaged throughout the video.
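
If you already have a list of candidate moments, one way to keep them spaced out is to drop any that fall too close to the previous one. The Python sketch below is only an illustration: the function name `pick_spaced_moments`, the timestamps, and the 8-second minimum gap are all assumptions, not a rule from any editing tool.

```python
from typing import List


def pick_spaced_moments(candidates: List[float], min_gap: float) -> List[float]:
    """Greedily keep candidate timestamps that are at least min_gap seconds apart."""
    chosen: List[float] = []
    for t in sorted(candidates):
        # Keep a moment only if it is far enough from the last one we kept.
        if not chosen or t - chosen[-1] >= min_gap:
            chosen.append(t)
    return chosen


# Hypothetical candidate moments (in seconds) from a 40-second video.
moments = pick_spaced_moments([2.0, 4.5, 11.0, 13.0, 24.0, 38.0], min_gap=8.0)
print(moments)  # [2.0, 11.0, 24.0, 38.0]
```

A greedy pass like this favors earlier moments; if a later moment matters more to the story, pick by hand instead.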

2. Generate an image prompt with ChatGPT

Open ChatGPT and ask it to create a prompt for your B-roll image. Here’s an example you can use as a template:

You're a video editor in the process of editing a 40-second video. The video is for content creators and it's about “the importance of storytelling and using AI to tell more stories” and has an “energetic” tone to it.  Can you suggest a prompt to generate a B-roll image that supports this dialogue:
“The only thing that matters is whether or not you have a great story to tell.”
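
If you're writing prompts for many lines of dialogue, it can help to turn the template above into something reusable. Here's a minimal Python sketch that only builds the text you would paste into ChatGPT; it doesn't call any API. The function name, parameter names, and the lightly reworded template are assumptions made for illustration.

```python
# A lightly reworded version of the template above, with placeholders for the parts that change.
PROMPT_TEMPLATE = (
    "You're a video editor in the process of editing a {duration}-second video. "
    'The video is for {audience}, it is about "{topic}", and its tone is "{tone}". '
    "Can you suggest a prompt to generate a B-roll image that supports this dialogue:\n"
    '"{dialogue}"'
)


def build_broll_prompt(duration: int, audience: str, topic: str, tone: str, dialogue: str) -> str:
    """Fill in the template with details about one video and one line of dialogue."""
    return PROMPT_TEMPLATE.format(
        duration=duration,
        audience=audience,
        topic=topic,
        tone=tone,
        dialogue=dialogue,
    )


prompt = build_broll_prompt(
    duration=40,
    audience="content creators",
    topic="the importance of storytelling and using AI to tell more stories",
    tone="energetic",
    dialogue="The only thing that matters is whether or not you have a great story to tell.",
)
print(prompt)
```

Only the dialogue needs to change from one B-roll moment to the next, so the rest of the video's details can stay fixed.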

3. Generate an image with Midjourney

Open Midjourney (or another AI image generator like Stable Diffusion or DreamStudio) and copy and paste the prompt from ChatGPT. Keep tweaking the prompt until you get an image you’re happy with.

4. Add your image into your video

Using your video editing software of choice, add your image in at the appropriate time. For static images, it’s best to add in some movement effects as well.

You can recreate the “Ken Burns” effect, which adds panning and zooming to a still image, by adding keyframes to the Zoom and Position effects.
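
Under the hood, keyframing means setting a starting and ending value for Zoom and Position and letting the editor fill in every frame in between. The Python sketch below shows that idea with simple linear interpolation. The `Keyframe` fields, the 0.0 to 1.0 position scale, and the function names are assumptions for illustration; real editors use their own units and often apply easing curves rather than a straight line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Keyframe:
    time: float  # seconds since the still image appeared
    zoom: float  # 1.0 means the image exactly fills the frame
    x: float     # horizontal center of the view, from 0.0 (left) to 1.0 (right)
    y: float     # vertical center of the view, from 0.0 (top) to 1.0 (bottom)


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def ken_burns_frames(start: Keyframe, end: Keyframe, fps: int) -> List[Keyframe]:
    """Compute the zoom and position for every frame between two keyframes."""
    frame_count = int(round((end.time - start.time) * fps))
    frames: List[Keyframe] = []
    for i in range(frame_count + 1):
        # t runs from 0.0 at the first keyframe to 1.0 at the last.
        t = i / frame_count if frame_count else 0.0
        frames.append(
            Keyframe(
                time=lerp(start.time, end.time, t),
                zoom=lerp(start.zoom, end.zoom, t),
                x=lerp(start.x, end.x, t),
                y=lerp(start.y, end.y, t),
            )
        )
    return frames


# Slow push-in over 4 seconds while drifting toward the upper right.
start = Keyframe(time=0.0, zoom=1.0, x=0.5, y=0.5)
end = Keyframe(time=4.0, zoom=1.2, x=0.6, y=0.4)
frames = ken_burns_frames(start, end, fps=2)
for frame in frames:
    print(f"{frame.time:.1f}s zoom={frame.zoom:.3f} x={frame.x:.3f} y={frame.y:.3f}")
```

Small changes tend to work best: a zoom of 10 to 20 percent over a few seconds adds motion without distracting from the dialogue.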

How Capsule’s AI Studio generates B-roll in seconds

The technology outlined above is exciting and fun to play with, but the process can still be fairly complex. Capsule has automated all of those steps so that you can achieve the same outcome in seconds.

Here’s how Capsule uses three bits of technology to instantly add custom B-roll to your video:

  1. An Automatic Speech Recognition (ASR) model, also known as speech-to-text (STT), transcribes the audio from your video into text.
  2. A diffusion model generates images from selected parts of that text.
  3. Capsule's video markup language (CapsuleScript) automates editing the video, stitching your B-roll into the video.

Now, all you have to do is upload your video into Capsule, highlight the text from the auto-generated transcript that you want to create an image for, tweak the prompt, click Use image, and your completely custom image appears in your video.

The process looks essentially like this:

[Image: icons that represent the process of how Capsule automates B-roll with AI. Dialogue gets turned into a transcript, which gets turned into an AI-generated image, which gets stitched into the video.]
Capsule's AI Studio transcribes the dialogue of your video into text, then uses selected parts of that text to generate prompts for B-roll, all to produce a more engaging and layered video.
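
To make that flow concrete, here is a rough Python sketch of the same shape: a timestamped transcript goes in, an image is generated for the highlighted text, and the result is a list of overlays for an editor to place on the timeline. This is not Capsule's code or CapsuleScript. Every name here (`TranscriptSegment`, `Overlay`, `plan_broll`, `fake_image_generator`) is hypothetical, the transcript is assumed to come from an ASR model, and the image generator is a stand-in that returns a made-up file name instead of calling a diffusion model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranscriptSegment:
    start: float  # seconds
    end: float    # seconds
    text: str     # what was said, as produced by an ASR model


@dataclass
class Overlay:
    start: float     # when the B-roll image appears
    end: float       # when it disappears
    image_path: str  # where the generated image was saved


def plan_broll(
    segments: List[TranscriptSegment],
    highlighted_text: str,
    generate_image: Callable[[str], str],
) -> List[Overlay]:
    """Create one image overlay for each transcript segment containing the highlighted text."""
    overlays: List[Overlay] = []
    for segment in segments:
        if highlighted_text.lower() in segment.text.lower():
            # The segment text is used as the prompt; a real workflow would let you tweak it first.
            image_path = generate_image(segment.text)
            overlays.append(Overlay(segment.start, segment.end, image_path))
    return overlays


def fake_image_generator(prompt: str) -> str:
    # Stand-in for a diffusion model: returns a made-up file name rather than rendering anything.
    return f"broll_{len(prompt)}.png"


transcript = [
    TranscriptSegment(0.0, 3.0, "Video is hard to make."),
    TranscriptSegment(3.0, 7.5, "The only thing that matters is a great story."),
]
overlays = plan_broll(transcript, "great story", fake_image_generator)
print(overlays)
```

Keeping the image generator as a plain function argument means the planning step can be tested without generating any images.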

To see how Capsule’s AI Studio automates this whole process, watch the demo below:

Limitations of AI-generated B-roll

While generative AI is excellent for visualizing abstract concepts and adding depth to videos, it won't meet the B-roll needs of every type of video.

Here are a few examples where generative AI may not be necessary or helpful:

Highlighting specific people, places, or things

Certain news coverage and stories about historical events (including documentaries) should use archival images and video when they’re available.

Likewise, if you’re covering or recapping an event, you’ll want actual footage from that event: the setting, the people there, the presentations, the food.

Here’s how you could add that type of B-roll instead:

  1. Shoot the footage yourself or hire a professional
  2. Use Capsule Collect to crowdsource your B-roll footage. Simply send attendees a link where they can upload or record their videos.

In general, hyper-specific references won’t gain much from AI-generated B-roll. But an abstract or generalized concept can benefit from layering in an AI-generated image.

3. AI removes technical barriers so content teams can tell better stories

Video is the richest, most contextual, and most engaging format for storytelling, and platforms will continue to prioritize it. But video is also the most challenging format to create.

With AI, content teams can now bypass hours of tedious, manual work and focus on what really matters: telling a great story.

[Image: screenshot of an email to Capsule CEO Champ Bennett that says: "Hi Champ - I've been waiting for-fucking-ever for this. Our head of media will play around on it tomorrow. Stoked you guts [sic] exist! We have huuuuuundreds of hours of video content that needs generated images/broll, and our 4 person media team can't keep up."]
Most content teams have more video demand than they can keep up with. Using AI to eliminate tedious video-editing tasks, like B-roll sourcing, helps teams create engaging videos faster.

Our mission at Capsule is to allow anyone to create professional-looking video without any expertise, and we’re excited about how AI will continue to eliminate those barriers for non-professionals.

To get access to our AI-assisted video editor that automates B-roll sourcing and editing, join the waitlist here.
