How to Edit Podcasts with Multiple Speakers—Fast

For podcast editors, messy audio can be the norm. Guests talk over each other. Room noise creeps in. And while these noisy moments sometimes make for the best content, they can also be the hardest to edit. Cutting soundbites, promo content, or short clips for different platforms out of moments with overlapping sound quickly becomes tedious.
Sound separation is a new audio technology that helps podcast editors alleviate these editing pains. It takes any piece of audio in which everything was captured on a single mixed track, from field recordings to improperly mic’d interviews, and separates it into different components, including dialogue, music, effects, and multiple speakers.
Here are some ways sound separation is used to improve podcast editing:
Remove or clean up background noise
Unpredictable recording environments lead to unpredictable audio. When background noise distracts or muddles what is being said, AudioShake’s dialogue isolation technology allows editors to lower or remove background noise.
For example, imagine you're interviewing someone on the street, and an ambulance siren blares as they speak. Creating a dialogue stem with AudioShake will isolate what is being said and remove all background noise. But what if the speaker later brings up the ambulance from earlier? Now you need to remove some, but not all, of that noise in the edit. With AudioShake, you can isolate the dialogue and background stems, then lower the background stem so the siren is still heard and acknowledged but does not drown out what is being said.
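To make the siren example concrete, here is a minimal sketch of the remix step, assuming the dialogue and background stems have already been exported and loaded as equal-length lists of float samples in the range -1.0 to 1.0. The function names, the -12 dB default, and the synthetic test tones are illustrative assumptions, not part of AudioShake's product or API.

```python
import math


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def remix(dialogue, background, background_db=-12.0):
    """Sum the dialogue stem with an attenuated copy of the background stem.

    Both stems are assumed to be equal-length lists of float samples.
    """
    if len(dialogue) != len(background):
        raise ValueError("stems must have the same number of samples")
    gain = db_to_gain(background_db)
    mix = [d + gain * b for d, b in zip(dialogue, background)]
    # Scale the whole mix down only if the sum would clip past full scale.
    peak = max((abs(s) for s in mix), default=0.0)
    if peak > 1.0:
        mix = [s / peak for s in mix]
    return mix


# Synthetic stand-ins for real stems: a quiet "voice" tone and a loud "siren" tone.
sample_rate = 8000
voice = [0.3 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
siren = [0.6 * math.sin(2 * math.pi * 900 * n / sample_rate) for n in range(sample_rate)]

# Keep the siren audible, but 12 dB quieter than it was recorded.
mix = remix(voice, siren, background_db=-12.0)
```

Lowering `background_db` further pushes the siren deeper into the background, while a value of 0 restores the original balance. In practice, most editors would make this adjustment with a fader in their DAW; the arithmetic is the same.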
Separate overlapping speakers
AudioShake’s multi-speaker separation splits a single audio file into individual speaker stems. Whether you have two people talking on a mixed track or a single speaker drowned out by other voices in the surrounding environment, multi-speaker separation gives you clean, isolated dialogue tracks. This allows you to remove crosstalk and interruptions, balance the volume of one person, or fix the timing of another.
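Once each speaker is on a separate stem, balancing levels becomes simple arithmetic. The sketch below scales a quiet guest stem so its RMS level matches the host's, assuming both stems are lists of float samples. The function names and sample values are hypothetical, and RMS is only a rough stand-in for perceived loudness; production workflows usually rely on a loudness standard such as LUFS.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_level(stem, target_rms):
    """Return a copy of the stem scaled so its RMS equals target_rms."""
    current = rms(stem)
    if current == 0.0:
        # A silent stem cannot be scaled to a target level; return it unchanged.
        return list(stem)
    scale = target_rms / current
    return [s * scale for s in stem]


# Hypothetical speaker stems: the guest was recorded much quieter than the host.
host = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5]
guest = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]

# Bring the guest up to the host's level without touching the host stem.
balanced_guest = match_level(guest, rms(host))
```

Because the gain is applied to one stem only, the host's level is untouched, which is not possible when both voices share a single mixed track.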
Improve dialogue clarity for transcriptions
Cleaner dialogue tracks for individual speakers can also help with caption accuracy. Instead of transcribing a conversation between two people from one mixed track, ASR platforms and other transcription software can ingest distinct tracks, one speaker at a time. AudioShake customers have reported that our stem separation has improved their captioning by over 25%.
Build voice clones of podcast speakers
Clean speaker tracks can also help podcast editors and creators take advantage of next-generation audio tools like voice cloning and voice synthesis. These tools are helpful when editors need to add content to a podcast but don’t have the time or budget to re-record. Stems generated by AudioShake’s dialogue and multi-speaker separation technologies have helped train voice clones for fast-growing podcasts like “Fight On with Jake Shields” and for communication tools for patients with ALS.
Edit generative AI podcast content
Generative AI platforms like NotebookLM now make it possible for creators to generate podcast-like content from text-based queries. Though these platforms open up new avenues for podcast creation, their outputs are most often single tracks where all speakers are mixed together. AudioShake allows users to separate each speaker into its own track so editors have more control. Wondercraft, an AI studio for podcast and content creators, integrated AudioShake to do just that, giving its users more control and flexibility in the studio when editing AI-generated podcasts.
As podcast creation becomes faster, more distributed, and increasingly AI-assisted, sound separation has become an essential part of the modern editor’s toolkit. Whether you're refining a field recording, building accessibility features, or adapting content across platforms, AudioShake helps you isolate what matters. Cleaner stems mean more flexibility, less friction, and higher-quality content—no matter how messy the original recording.