Add real-time audio separation to any pipeline
Available on iOS, macOS, Android, Windows, and Linux, with local inference times optimized for each platform, so you always get the best speed wherever you deploy.
What is a stem separation SDK?
Separation models across voice, film, TV, and music
Isolates spoken dialogue from background sound in real-time streams. Cleans voice inputs before they reach ASR, transcription, translation, or A1 audio engineer workflows, delivering a ~25% improvement in ASR accuracy in noisy audio environments.
Removes copyrighted background music from live or streamed audio while preserving dialogue and effects. Built for broadcasters, sports commentary, and content platforms managing copyright compliance on live feeds.
Isolates vocals, individual instruments, or up to 14 distinct instrument stems from any song in real time. Used by music apps, learning platforms, and DJ tools to give users stem-level control inside a consumer experience.
Built for production workflows
Authentic and scalable speech recovery (11ms latency)
Isolate clean dialogue from crowd noise (11ms latency)
Get started with our SDK
FAQ
What kinds of applications does the SDK power?
The SDK powers a broad range of real-time and on-device applications built on isolated audio: music remixing and mashup tools, karaoke and vocal isolation features, interactive fan engagement experiences, music education and practice apps, mobile songwriting tools, and broadcast or streaming workflows requiring clean separated audio.
Which stems can the model separate?
The model supports separation into individual stems including vocals, lead vocals, backing vocals, drums, bass, guitar (acoustic and electric), piano, keys, strings, wind instruments, and more. See the Stem Separation page for the complete list of available stems and model options.
What is AI instrument stem separation?
AI instrument stem separation is the process of using machine learning to isolate individual musical components from a fully mixed audio recording. AudioShake's models can extract individual stems — including vocals, drums, bass, guitar, piano, strings, and more — directly from any mix, without requiring the original multi-track session. The result is a set of isolated audio elements that can be used for remixing, creative tools, education, and interactive experiences.
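To make that concrete, here is a minimal C++ sketch of what extracting stems from a mixed file could look like. Everything in it, including the header name, the as:: namespace, and the StemSeparator type and its methods, is a hypothetical placeholder rather than the SDK's documented API:

```cpp
// Illustrative sketch only: audioshake_sdk.h, as::StemSeparator, and
// separateFile are invented names, not AudioShake's actual API.
#include "audioshake_sdk.h"  // placeholder header

#include <iostream>
#include <string>
#include <vector>

int main() {
    // Assumed: a separator loaded from an encrypted model file.
    as::StemSeparator separator("models/multi_stem.enc");

    // Assumed: one call that returns a named audio buffer per stem.
    std::vector<as::Stem> stems = separator.separateFile("song_mix.wav");

    for (const as::Stem& stem : stems) {
        std::cout << "Extracted stem: " << stem.name << "\n";  // e.g. "vocals"
        stem.writeWav("stems/" + stem.name + ".wav");
    }
    return 0;
}
```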
Why remove music from live or recorded content?
Music captured incidentally in live events, venue footage, sports clips, and social content routinely triggers copyright claims, muted audio, and distribution blocks. AudioShake's system identifies the music present — including song identity and rights holder metadata — and removes it before or during distribution, allowing teams to publish or redistribute content without infringing on music rights they don't hold. This supports DMCA compliance workflows for broadcasters, sports leagues, and creator platforms at scale.
Can music be removed while preserving dialogue and crowd noise?
Yes. AudioShake's separation model isolates music as a discrete audio element, leaving dialogue, crowd noise, commentary, and ambient sound intact in the output. This makes it well-suited for live sports broadcasts, events, and streaming workflows where crowd atmosphere and commentary are part of the content's value.
How do the music detection and removal models work together?
AudioShake's music detection model scans audio or video content, identifies where music is present, and returns song-level metadata including track title, artist, and rights holder information. The music removal model then separates and eliminates the detected music from the audio, producing a clean output with all other elements — dialogue, ambient sound, and effects — fully preserved. The two models work together as AudioShake's Copyright Compliance system and are available via SDK for real-time and on-device workflows.
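As a rough illustration of that two-stage flow, the sketch below chains a hypothetical detector and remover. The type names, metadata fields, and method signatures are invented for this example:

```cpp
// Hypothetical two-stage pipeline: detect music, then remove it.
#include "audioshake_sdk.h"  // placeholder header

#include <iostream>

int main() {
    as::MusicDetector detector("models/detection.enc");  // invented name
    as::MusicRemover  remover("models/removal.enc");     // invented name

    // Stage 1 (assumed API): scan the file, collect song-level metadata.
    for (const as::MusicSegment& seg : detector.scanFile("broadcast.wav")) {
        std::cout << seg.startSec << "s-" << seg.endSec << "s: "
                  << seg.title << " / " << seg.artist
                  << " (rights: " << seg.rightsHolder << ")\n";
    }

    // Stage 2 (assumed API): strip the detected music, keep everything else.
    remover.removeMusic("broadcast.wav", "broadcast_clean.wav");
    return 0;
}
```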
What kinds of background noise can the dialogue isolation model handle?
The model handles a wide range of noise conditions including crowd noise, PA bleed, wind, music bleed, and ambient environmental sound. Unlike noise suppression tools that model and subtract a noise profile, AudioShake uses AI source separation — isolating the speech signal directly — which makes it more resilient to sudden or unpredictable noise without manual configuration.
Which use cases is dialogue isolation built for?
The SDK is designed for applications that require clean speech in complex acoustic environments: live broadcast and sports production, real-time captioning and transcription pipelines, voice AI and ASR preprocessing, multilingual localization, streaming infrastructure, and conferencing tools.
Does dialogue isolation work in real time?
Yes. AudioShake's dialogue isolation model separates clean speech from background noise, crowd noise, music, and other competing audio at latencies as low as 11ms, making it suitable for live production as well as file-based workflows. The model produces two output streams simultaneously — a clean dialogue stem and a separate background stem — giving applications independent control over both.
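The dual-output design can be sketched as a per-frame processing loop. The frame size, callback shape, and processFrame signature below are assumptions for illustration, not the SDK's real interface:

```cpp
// Hypothetical real-time loop: one input frame in, two stems out per call.
#include "audioshake_sdk.h"  // placeholder header

#include <array>

constexpr int kFrameSamples = 512;  // assumed frame size, ~11 ms at 48 kHz

void onAudioFrame(const float* input, as::DialogueIsolator& isolator) {
    std::array<float, kFrameSamples> dialogue{};
    std::array<float, kFrameSamples> background{};

    // Assumed API: fills both output buffers in a single pass.
    isolator.processFrame(input, dialogue.data(), background.data(),
                          kFrameSamples);

    // The app now controls each stream independently, e.g. send the clean
    // dialogue to an ASR engine and mix the background back in quietly.
}
```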
What do I need to get started with the SDK?
SDK access requires a Client ID and Client Secret from AudioShake, along with encrypted model files for your target platform. Contact info@audioshake.ai to request access and get started.
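Initialization might look roughly like the following, where as::init, as::Config, and the field names are invented stand-ins, and reading credentials from environment variables is simply a common convention:

```cpp
// Hypothetical initialization with credentials and an encrypted model file.
#include "audioshake_sdk.h"  // placeholder header

#include <cstdlib>
#include <stdexcept>
#include <string>

// Read a required environment variable or fail loudly.
static std::string requireEnv(const char* name) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        throw std::runtime_error(std::string("missing env var: ") + name);
    }
    return value;
}

as::Context createContext() {
    as::Config config;
    config.clientId     = requireEnv("AUDIOSHAKE_CLIENT_ID");      // from AudioShake
    config.clientSecret = requireEnv("AUDIOSHAKE_CLIENT_SECRET");  // from AudioShake
    config.modelPath    = "models/dialogue_isolation.enc";  // platform-specific file

    return as::init(config);  // assumed entry point
}
```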
Which platforms and hardware does the SDK support?
The SDK supports iOS, macOS, Android, Windows, and Linux. GPU acceleration is available on all platforms — Metal and Neural Engine on Apple devices, CUDA on Linux and Windows, DirectX 12 on Windows, and OpenGL ES on Android — with CPU-only processing available as a fallback.
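A plausible way an application could pick an acceleration backend per platform, with CPU fallback, reusing the same invented as::Config from the sketch above:

```cpp
// Hypothetical backend selection mirroring the documented acceleration paths.
#include "audioshake_sdk.h"  // placeholder header

as::Backend pickBackend() {
#if defined(__APPLE__)
    return as::Backend::Metal;      // Metal / Neural Engine on Apple devices
#elif defined(_WIN32)
    return as::Backend::DirectX12;  // or CUDA on NVIDIA hardware
#elif defined(__ANDROID__)
    return as::Backend::OpenGLES;   // check before the generic Linux case
#else
    return as::Backend::CUDA;       // Linux
#endif
}

void configure(as::Config& config) {
    config.backend       = pickBackend();
    config.fallbackToCpu = true;  // assumed flag: CPU-only processing as fallback
}
```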
Does the SDK process audio on-device?
Yes. The AudioShake SDK runs all audio processing models locally on-device, with no network calls or cloud processing required. Audio never leaves the device, making it well-suited for privacy-sensitive workflows, offline applications, embedded systems, and any use case where network latency is not acceptable. Both real-time streaming and file-based processing are supported without a connection.
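Finally, a sketch of fully offline, file-based use; processFile is again a hypothetical method name:

```cpp
// Hypothetical file-based (offline) processing: no network calls involved.
#include "audioshake_sdk.h"  // placeholder header

int main() {
    as::DialogueIsolator isolator("models/dialogue_isolation.enc");

    // Assumed API: reads a local file, writes both stems locally.
    isolator.processFile("interview_raw.wav",
                         "interview_dialogue.wav",
                         "interview_background.wav");
    return 0;
}
```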