Compare Models

Transcription Accuracy & Separation Quality

Transcription Accuracy

To assess the accuracy of transcription models, researchers use industry-wide benchmarks that measure word error rate (WER) along with punctuation- and formatting-related metrics. At ISMIR, AudioShake's Research team presented Jam-ALT, a new benchmark based on the JamendoLyrics dataset that accounts for the finer nuances of written lyrics.
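For readers unfamiliar with WER, here is a minimal sketch of the generic definition: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. This is not the Jam-ALT scoring code, which applies its own normalization and additional lyric-specific metrics.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Generic WER: word-level edit distance divided by reference length.

    Assumption: words are split on whitespace and compared exactly,
    so case and punctuation differences count as errors.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("hold me close tonight", "hold me closer tonight"))
    # Exact comparison means a capitalization difference is an error too.
    print(word_error_rate("Hello world", "hello world"))
```

Because the comparison is exact, a transcript that gets every word right but differs in capitalization still scores an error, which is one reason formatting-aware metrics matter for written lyrics.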

Below are the results of running various transcription systems against this new benchmark.

Hear the difference

Models run 10x to 400x faster than real time

Process hours-long files

Process high-resolution (192kHz) files

Song: "Future" by Torches
[Audio player: bass, drums, other, and vocals stems for the song, with comparison stems from Demucs v4 and Spleeter]

SDR Scores

The music information retrieval community typically uses the signal-to-distortion ratio (SDR) to measure separation quality, and AudioShake has repeatedly demonstrated its ability to achieve the highest SDR scores. However, we caution against treating SDR as the best indicator of a high-quality model, because it is quite possible to achieve a high SDR score while the actual results sound poor.
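As a rough illustration of what the number measures, the basic form of SDR compares the energy of the reference stem with the energy of the error between the reference and the separated estimate, expressed in decibels: SDR = 10 * log10(sum(s^2) / sum((s - e)^2)). The sketch below implements only that basic definition. The function name `sdr_db` and the toy sample values are assumptions for illustration, and published benchmarks often use variants (for example BSS Eval or scale-invariant SDR) that differ in detail.

```python
import math


def sdr_db(reference, estimate):
    """Basic signal-to-distortion ratio in decibels.

    reference: samples of the ground-truth stem
    estimate:  samples of the separated stem, same length
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))

    if error_energy == 0:
        return math.inf   # estimate is identical to the reference
    if signal_energy == 0:
        return -math.inf  # silent reference, any error dominates
    return 10 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    reference = [1.0, -1.0, 1.0, -1.0]
    estimate = [0.9, -0.9, 0.9, -0.9]  # toy estimate, 10% too quiet
    print(round(sdr_db(reference, estimate), 2))
```

The score depends only on sample-level error energy, so two estimates with the same SDR can contain very different kinds of artifacts, which is consistent with the caution above about relying on SDR alone.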

That's why we have developed perceptual metrics for evaluating model performance, and why we sometimes choose models with lower SDR scores that perform better on certain tasks.

That said, we know some people love to ask about SDR scores, so below is a sample of our stem separation scores for music.