Text to Sound Effects API

We are excited to introduce the Text to Sound Effects API.

Our Sound Effects API lets anyone build with fully custom AI sound effects. Each generation costs 100 characters when using auto-generation (no duration set), or 25 characters per second of generated audio when you set a duration.
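As a rough illustration of the pricing above, here is a minimal sketch of the arithmetic. The function and constant names are hypothetical, and how fractional seconds are rounded is not stated in this post, so the sketch returns the unrounded value.

```python
# Rates taken from the pricing described above.
AUTO_COST_CHARS = 100        # flat charge when no duration is set
COST_PER_SECOND_CHARS = 25   # charge per generated second when a duration is set


def generation_cost(duration_seconds=None):
    """Return the character cost of one sound effect generation.

    Hypothetical helper: pass None for auto-generation, or a positive
    number of seconds when setting a duration. Rounding of fractional
    seconds is an open assumption and is not applied here.
    """
    if duration_seconds is None:
        return AUTO_COST_CHARS
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return COST_PER_SECOND_CHARS * duration_seconds


print(generation_cost())    # auto-generation: 100
print(generation_cost(4))   # 4 seconds: 100
print(generation_cost(10))  # 10 seconds: 250
```

At these rates, setting a duration costs less than auto-generation below 4 seconds and more above it.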

Tutorial: elevenlabs.io/docs/api-referen...

To showcase it, we've built the first Video to Sound Effects app. The app is free to use online and fully open source.

We built our Video to Sound Effects app in under a day, so it still has some rough edges. Try it for yourself at videotosoundeffects.com. The code is open source on GitHub: github.com/elevenlabs/elevenla...

Here's how it works:

It extracts 4 frames from the video at 1-second intervals, all client-side.
It then sends the frames and a prompt to GPT-4o, which writes a custom text-to-sound-effects prompt.
The prompt is used to create a sound effect with the ElevenLabs text-to-sound-effects API.
The video and audio are combined client-side with ffmpeg into a single file that you can download.
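The steps above can be sketched as three small helpers. This is an illustrative outline, not the app's actual code (the app runs in the browser). All function names are hypothetical. The first frame at 0 seconds is an assumption. The request fields `text` and `duration_seconds` reflect our reading of the sound effects API and should be checked against the API reference. The ffmpeg arguments are one common way to mux a video with a new audio track, since the app's real invocation is not shown here.

```python
def frame_timestamps(count=4, interval_seconds=1.0, start_seconds=0.0):
    """Timestamps (in seconds) at which frames are sampled from the video.

    Defaults mirror the description: 4 frames at 1-second intervals.
    Starting at 0 seconds is an assumption.
    """
    return [start_seconds + i * interval_seconds for i in range(count)]


def sound_effect_request(prompt, duration_seconds=None):
    """JSON body for a text-to-sound-effects request (assumed field names).

    Leaving duration_seconds unset corresponds to auto-generation.
    """
    body = {"text": prompt}
    if duration_seconds is not None:
        body["duration_seconds"] = duration_seconds
    return body


def ffmpeg_mux_args(video_path, audio_path, output_path):
    """ffmpeg arguments that combine a video with a generated audio track."""
    return [
        "ffmpeg",
        "-i", video_path,   # input 0: the original video
        "-i", audio_path,   # input 1: the generated sound effect
        "-map", "0:v:0",    # take the video stream from input 0
        "-map", "1:a:0",    # take the audio stream from input 1
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-shortest",        # stop at the end of the shorter stream
        output_path,
    ]


print(frame_timestamps())
print(sound_effect_request("glass shattering on a tile floor", 3))
print(" ".join(ffmpeg_mux_args("input.mp4", "effect.mp3", "output.mp4")))
```

The GPT-4o step is left out of the sketch because its prompt and request format are not described in this post.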