Add real-time audio separation to any pipeline

The AudioShake SDK integrates streaming-capable, real-time sound separation into your app or self-deployed Enterprise service. AudioShake's stem separation SDK lets you separate vocals, isolate instruments, remove music, and clean up speech in real time, with industry-leading quality.

Available on iOS/macOS, Android, Windows, and Linux, with local inference times optimized for each platform, so you get the best speed wherever you deploy.

What is a stem separation SDK?

Real-time separation processes live audio streams as they happen — isolating dialogue, vocals, or instruments instantly, without sending files to the cloud or waiting for post-production.
AudioShake's stem separation SDK runs models locally on-device, making clean, separated audio available before it reaches the next stage of your pipeline. For dubbing and captioning, dialogue is isolated from crowd noise or background music the moment it's captured. In broadcast, music can be removed from streams to ensure rights compliance. For speech workflows, developers can turn messy, real-world audio into clean, structured inputs for ASR and LLM systems. And for music apps, stem-level control lets users interact with and mix tracks in real time.
On-device, no cloud processing
11ms dialogue isolation latency
Up to 200x real-time inference

Separation models across voice, film, TV, and music

AudioShake's SDK gives developers access to real-time music removal, dialogue separation, and instrument stem isolation – all with low-latency performance, on-device, across iOS, Android, Windows, and Linux.
DIALOGUE
Dialogue Isolation
New – 11ms latency
View product page

Isolates spoken dialogue from background sound in real-time streams. Cleans voice inputs before they reach ASR, transcription, translation, or AI audio engineering workflows – with ~25% improvement in ASR accuracy in noisy audio environments.

Film: “Hidden in Plain Sight” — Gregg Dunham & Mason Frenzel
COPYRIGHT
Commercial Music Removal
View product page

Removes copyrighted background music from live or streamed audio while preserving dialogue and effects. Built for broadcasters, sports commentary, and content platforms managing copyright compliance on live feeds.

Film Credits: Jaywalker Music
MUSIC
Instrument Stem Separation
View product page

Isolates vocals and up to 14 individual instrument stems from any song in real time. Used by music apps, learning platforms, and DJ tools to give users stem-level control inside a consumer experience.


Built for production workflows

The SDK runs locally on-device, integrates in a few lines of code, and fits into mobile apps, desktop DAWs, live streaming platforms, embedded devices, and high-volume on-premise media processing workflows.
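As an illustration of how a frame-based integration typically looks, the sketch below feeds fixed-size audio frames through a separator and collects per-frame stems. Every name here (`StemSeparator`, `process`, `SeparatedFrame`, `FRAME_SIZE`) is a hypothetical stand-in for illustration, not the actual AudioShake API:

```python
# Hypothetical sketch of a frame-based separation loop; the class and
# method names are placeholders, not the real AudioShake SDK interface.
from dataclasses import dataclass
from typing import List

FRAME_SIZE = 512  # samples per frame; the real hop size depends on the model


@dataclass
class SeparatedFrame:
    dialogue: List[float]
    background: List[float]


class StemSeparator:
    """Placeholder separator: a real one would run on-device model inference."""

    def process(self, frame: List[float]) -> SeparatedFrame:
        # Identity passthrough standing in for model inference output.
        return SeparatedFrame(dialogue=list(frame),
                              background=[0.0] * len(frame))


def run_pipeline(samples: List[float]) -> List[SeparatedFrame]:
    """Feed fixed-size frames to the separator and collect stem output."""
    sep = StemSeparator()
    frames = []
    for i in range(0, len(samples), FRAME_SIZE):
        frames.append(sep.process(samples[i:i + FRAME_SIZE]))
    return frames
```

A real integration would hand each returned stem to the next pipeline stage (an ASR engine, a mixer, a broadcast encoder) as it arrives, rather than accumulating frames in a list.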
11ms
dialogue isolation latency – the first to meet the live broadcast threshold
200x
real-time inference speeds
25%
reported ASR accuracy improvement with SDK preprocessing

Authentic and scalable speech recovery

VOICE AI
Improve transcription accuracy with speech isolation
Models as small as 9MB
Up to 200x real-time processing
11ms latency
NPU/GPU/CPU runtimes available for real-time performance
Native support for low-res and high-res audio
MUSIC
Power music production, mixing, songwriting, and education apps
Up to 14 instrument targets, or joint 4-stem and 6-stem models; drum kit and vocal stems available
Up to 250x real-time processing (vocals) with per-platform optimizations
Cross-platform SDKs available: iOS/MacOS, Windows, Linux, Android
BROADCAST
Remove copyright material from your audio

Isolate clean dialogue from crowd noise
Streaming-capable dialogue and music removal models
Models as small as 9MB
Up to 200x real-time processing
11ms latency
Support for hi-res audio
ON-PREM / SELF-DEPLOYED
Run any of AudioShake’s edge or API models in your own cloud or offline
All API models are available for safe and secure local inference
Manage compute and process large amounts of data
Streaming or batch API available

Get started with our SDK

SDK
Bring sound separation to your edge device
On-device inference with no cloud round-trip and latency under 50ms. Includes sample apps, integration guides, and demo code. Contact us for access.
REQUEST ACCESS
API
Evaluate before committing to on-device
Full model access via our API. No hardware requirements. Same separation quality, cloud-based. Ideal for prototyping, batch processing, or teams not yet building for edge.
ACCESS NOW

FAQ

What can developers build with the AudioShake stem separation SDK?

The SDK powers a broad range of real-time and on-device applications built on isolated audio: music remixing and mashup tools, karaoke and vocal isolation features, interactive fan engagement experiences, music education and practice apps, mobile songwriting tools, and broadcast or streaming workflows requiring clean separated audio.

What stems does the separation model support?

The model supports separation into individual stems including vocals, lead vocals, backing vocals, drums, bass, guitar (acoustic and electric), piano, keys, strings, wind instruments, and more. See the Stem Separation page for the complete list of available stems and model options.

What is AI instrument stem separation?

AI instrument stem separation is the process of using machine learning to isolate individual musical components from a fully mixed audio recording. AudioShake's models can extract individual stems — including vocals, drums, bass, guitar, piano, strings, and more — directly from any mix, without requiring the original multi-track session. The result is a set of isolated audio elements that can be used for remixing, creative tools, education, and interactive experiences.

How does music removal help with DMCA compliance and copyright protection?

Music captured incidentally in live events, venue footage, sports clips, and social content routinely triggers copyright claims, muted audio, and distribution blocks. AudioShake's system identifies the music present — including song identity and rights holder metadata — and removes it before or during distribution, allowing teams to publish or redistribute content without infringing on music rights they don't hold. This supports DMCA compliance workflows for broadcasters, sports leagues, and creator platforms at scale.

Can the SDK remove music from live broadcasts without affecting dialogue or crowd noise?

Yes. AudioShake's separation model isolates music as a discrete audio element, leaving dialogue, crowd noise, commentary, and ambient sound intact in the output. This makes it well-suited for live sports broadcasts, events, and streaming workflows where crowd atmosphere and commentary are part of the content's value.

How does AudioShake detect and remove copyrighted music in real time?

AudioShake's music detection model scans audio or video content, identifies where music is present, and returns song-level metadata including track title, artist, and rights holder information. The music removal model then separates and eliminates the detected music from the audio, producing a clean output with all other elements — dialogue, ambient sound, and effects — fully preserved. The two models work together as AudioShake's Copyright Compliance system and are available via SDK for real-time and on-device workflows.
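The detect-then-remove flow can be sketched roughly as follows. All names here are illustrative, not the SDK's interface, and muting the flagged span is a crude placeholder for real stem-level removal, which would preserve dialogue and effects inside that span:

```python
# Illustrative sketch only: the dataclass and function are hypothetical,
# and muting is a stand-in for genuine music-stem separation.
from dataclasses import dataclass
from typing import List


@dataclass
class MusicSegment:
    start_s: float  # detected music start, in seconds
    end_s: float    # detected music end, in seconds
    title: str      # song-level metadata returned by the detector


def remove_detected_music(samples: List[float], sample_rate: int,
                          segments: List[MusicSegment]) -> List[float]:
    """Crude placeholder: mutes flagged spans. A real removal model would
    separate out only the music stem, leaving dialogue and effects intact."""
    out = list(samples)
    for seg in segments:
        lo = max(0, int(seg.start_s * sample_rate))
        hi = min(len(out), int(seg.end_s * sample_rate))
        for i in range(lo, hi):
            out[i] = 0.0
    return out
```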

What types of background noise can the dialogue isolation SDK handle?

The model handles a wide range of noise conditions including crowd noise, PA bleed, wind, music bleed, and ambient environmental sound. Unlike noise suppression tools that model and subtract a noise profile, AudioShake uses AI source separation — isolating the speech signal directly — which makes it more resilient to sudden or unpredictable noise without manual configuration.

What applications is real-time dialogue isolation built for?

The SDK is designed for applications that require clean speech in complex acoustic environments: live broadcast and sports production, real-time captioning and transcription pipelines, voice AI and ASR preprocessing, multilingual localization, streaming infrastructure, and conferencing tools.

Can the AudioShake SDK isolate dialogue from background noise in real time?

Yes. AudioShake's dialogue isolation model separates clean speech from background noise, crowd noise, music, and other competing audio at latencies as low as 11ms, making it suitable for live production as well as file-based workflows. The model produces two output streams simultaneously — a clean dialogue stem and a separate background stem — giving applications independent control over both.
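Because the two stems arrive as independent streams, an application can rebalance them rather than simply discarding the background. A minimal sketch, assuming stems delivered as plain sample arrays (a simplification of whatever buffer format the SDK actually uses):

```python
# Hypothetical remix helper: assumes both stems are same-length
# lists of float samples, which is an illustrative simplification.
from typing import List


def remix(dialogue: List[float], background: List[float],
          dialogue_gain: float = 1.0,
          background_gain: float = 0.25) -> List[float]:
    """Recombine the two stems with independent levels, e.g. keeping
    commentary at full volume while ducking crowd noise."""
    return [d * dialogue_gain + b * background_gain
            for d, b in zip(dialogue, background)]
```

Setting `background_gain` to 0 yields dialogue-only output; intermediate values duck ambience under the commentary instead of removing it.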

How do I get access to the AudioShake SDK?

SDK access requires a Client ID and Client Secret from AudioShake, along with encrypted model files for your target platform. Contact info@audioshake.ai to request access and get started.

What platforms does the AudioShake SDK support?

The SDK supports iOS, macOS, Android, Windows, and Linux. GPU acceleration is available on all platforms — Metal and Neural Engine on Apple devices, CUDA on Linux and Windows, DirectX 12 on Windows, and OpenGL ES on Android — with CPU-only processing available as a fallback.

Does the AudioShake SDK work offline without an internet connection?

Yes. The AudioShake SDK runs all audio processing models locally on-device, with no network calls or cloud processing required. Audio never leaves the device, making it well-suited for privacy-sensitive workflows, offline applications, embedded systems, and any use case where network latency is not acceptable. Both real-time streaming and file-based processing are supported without a connection.

Get in touch.