Generative AI is the most exciting creative tool invented in my lifetime.

Generative AI enables anyone to create using only their imagination. Try out Capsule's first generative AI experiment that brings videos to life with words.

November 8, 2022

The digitization era

My obsession with creative tools began at age 16.

I grew up playing drums and guitar and started writing songs as a teenager. Naturally I wanted to share the music I was making, but in the early ’90s that was impossible unless you had thousands of dollars to spend on booking studio time.

Then in 1996, Roland announced the VS-880 as the first affordable, self-contained digital recording system that had all the capabilities of a professional recording studio. To this day, I don’t think I’ve lusted after a piece of technology as much as I did the VS-880. I worked and saved up for an entire summer to afford the $1300 price tag. It was worth every penny. In fact, it changed my life.

24 years later: My Roland VS-880

Earlier this year I spotted the Roland VS-1880 (the 18-track version of the VS-880) in the Kanye West documentary, Jeen-yuhs. It turns out Kanye and I were recording our first albums around the same time, on the same cheap equipment, from similar makeshift studios in our bedrooms.

His album was called The College Dropout. (It did a little better than mine. Like a few million copies and multiple Grammys better.)

Kanye West recording on the Roland VS-1880

Throughout the documentary I learned how much Kanye struggled to get attention from labels early in his career. Record companies that funded studio time for artists ignored him entirely, leaving him to figure out how to record his music with few resources.

It occurred to me that without access to more affordable hardware like the VS-880, it’s possible that The College Dropout — which went on to be named one of Rolling Stone’s Greatest Albums of All Time — would have never been made.

The digitization of media led to a wave of affordable software and hardware that lowered the friction to creativity. Along with it, a generation of stories were told that might not have been otherwise.

Today you can buy a VS-880 on eBay for $50.

(Note: hopefully it goes without saying that a reference to an earlier part of Kanye's career isn’t an endorsement of his recent statements in the media.)

The iPhone era

The obsession with creative tools in my youth led to a 20-year career in tech as an engineer, product designer and founder, much of which has been focused on using technology to eliminate friction from the creative process.

When the iPhone came out in 2007, I naturally became obsessed with the idea that cameras and microphones — two of the most powerful tools for creative expression — would become ubiquitous. Even more exciting to me was that they’d be networked together with virtually everyone in the world, which meant distributing those creative ideas could become instant and free.

In 2009, I went all in. I left my cushy job, taught myself Objective-C, and started making apps for the iPhone. Little did I know at the time that I would spend the next decade working alongside an incredible cast of engineers and designers building creative tools and formats that would go on to be used by over a billion people.

This used to cost tens of thousands of dollars and years of training!

Today my niece and nephew (12 and 8 years old, respectively), along with many of their friends, are expressing their creative ideas by publishing videos to YouTube every week, all shot and edited on their iPhones. These same videos would have required years of professional training and at least $10K worth of software and hardware when I was their age.

The iPhone is to them what the VS-880 was to me. Because it exists, a generation of stories are being told that might not have been otherwise.

The generative AI era

Through the lens of my own experiences, I’m constantly on the lookout for new technology that could further democratize creativity and usher in the next wave of storytelling.

Few things have grabbed my attention like generative AI.

What is it? Generative AI refers to machine learning algorithms that enable computers to use existing content like text, audio, video, images, and even code to automate the creation of entirely new pieces of content.

In simpler words, the promise of generative AI is that it lets anyone create using their imagination, not technical skill.

I’m finding myself once again obsessed by what this technology could enable, and spending as much time as I can learning and building for it (lucky for me my co-founder and CTO, Joseph, spent a lot of time doing AI research in school and throughout his career).

Today, the technology is nascent. And in many ways it’s over-hyped, which makes it very easy to dismiss, or even despise (much like the iPhone in 2007). But I encourage you to play around with it yourself, and form your own opinion. The more time I spend with it, the more I am convinced that the next wave of creative expression will be powered by the work being done today with diffusion models and machine learning.

If that's true, a generation of stories will be told that might not have been otherwise.

Capsule’s first generative AI experiment

As you might imagine, generative AI is a hot topic at Capsule, where our mission is quite literally to democratize creativity with easy-to-use video making tools.

So we asked ourselves, what could we contribute to the conversation around generative AI? Here’s what we came up with:

For context, “b-roll” is a term used by video professionals that refers to any supplemental photos or videos that are cut in or layered on top of primary footage. It’s a visual technique used to help tell a more engaging story.

Sourcing and editing b-roll into videos is a very tedious and time-consuming job — even for professionals. So we asked ourselves: what if we could use generative AI to automate the process?

Meet G-ROLL.

G-ROLL is generative b-roll. It's video-to-text-to-video and it’s powered by a combination of three very cool bits of technology:

  1. An ASR model that transcribes your video into text
  2. A diffusion model that generates images from selected parts of that text
  3. Capsule's video markup language (CapsuleScript) that automates editing the video
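
For the curious, here is a rough way to picture how those three pieces could fit together. This is only an illustrative Python sketch, not Capsule's actual code: the `Segment` and `Overlay` types, the `select_segments` rule, and the two `fake_*` functions are all hypothetical stand-ins for a real ASR model and a real diffusion model, and the output is a plain list of timed overlays rather than real CapsuleScript, whose syntax isn't shown here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One transcribed phrase with its timing in the source video (seconds)."""
    text: str
    start: float
    end: float


@dataclass
class Overlay:
    """A generated image layered over the primary footage for a time range."""
    image: str
    start: float
    end: float


def select_segments(segments: List[Segment], min_words: int = 4) -> List[Segment]:
    """Hypothetical selection rule: keep phrases long enough to act as a prompt."""
    return [s for s in segments if len(s.text.split()) >= min_words]


def plan_groll(
    segments: List[Segment],
    generate_image: Callable[[str], str],
) -> List[Overlay]:
    """Turn transcript segments into a list of timed image overlays."""
    return [
        Overlay(image=generate_image(s.text), start=s.start, end=s.end)
        for s in select_segments(segments)
    ]


# Stand-in for step 1 (ASR): returns a hard-coded transcript with timings.
def fake_transcribe(video_path: str) -> List[Segment]:
    return [
        Segment("I grew up playing drums and guitar", 0.0, 2.5),
        Segment("and then", 2.5, 3.0),
        Segment("I saved up for a digital recorder", 3.0, 5.5),
    ]


# Stand-in for step 2 (diffusion): returns a file name instead of an image.
def fake_generate_image(prompt: str) -> str:
    return "img_" + "_".join(prompt.lower().split()[:3]) + ".png"


# Step 3 would translate these overlays into edit instructions for the video.
overlays = plan_groll(fake_transcribe("talk.mp4"), fake_generate_image)
for o in overlays:
    print(f"{o.start:.1f}-{o.end:.1f}s  {o.image}")
```

The point of the sketch is the shape of the flow: the transcript supplies both the prompts and the timings, so each generated image already knows where it belongs in the video.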

Try G-ROLL for free today

Today we’re launching G-ROLL as a free tool for anyone to use. Like all experiments, we’re not exactly sure what the results will be. But it sure will be fun to see what you make, and get your feedback. Feel free to reach out on Twitter (@madewithcapsule).