Time Stretching Audio in JavaScript

By Jonas Wagner, 2015-12-06

Seven years ago I wrote a piece of software called Play it Slowly. It allows the user to change the speed and pitch of an audio file independently. This is useful, for example, when practicing an instrument or transcribing music.

I have now created a new web-based version of Play it Slowly called TimeStretch Player. It’s written in JavaScript and uses the Web Audio API.

Open TimeStretch Player

It features (in my opinion) much better audio quality for larger time stretches as well as a 3.14159 × cooler looking user interface. But please note that this is beta-stage software; it’s still far from perfect and polished.

How the time stretching works

The time stretching algorithm that I use is based on a Phase Vocoder with some simple improvements.

It works by cutting the audio input into overlapping chunks, decomposing those into their individual frequency components using an FFT, adjusting their phases and then resynthesizing them with a different overlap.
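To make the "decomposing into components" step a bit more concrete, here is a minimal sketch of a naive discrete Fourier transform applied to one chunk. It is not the code used in TimeStretch Player: a real implementation would use an FFT, which computes the same result far faster. The function name `naiveDft` and the example values are made up for illustration.

```javascript
// Naive discrete Fourier transform of one real-valued chunk.
// Illustration only: an FFT gives the same numbers in O(N log N) instead of O(N^2).
function naiveDft(chunk) {
  const n = chunk.length;
  const re = new Float64Array(n);
  const im = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re[k] += chunk[t] * Math.cos(angle);
      im[k] += chunk[t] * Math.sin(angle);
    }
  }
  return { re, im };
}

// Example: a cosine that completes exactly 2 cycles within 16 samples.
const dftChunk = Float64Array.from({ length: 16 }, (_, t) =>
  Math.cos((2 * Math.PI * 2 * t) / 16)
);
const spectrum = naiveDft(dftChunk);

// The magnitude of each component is the length of its complex value.
const dftMagnitudes = Array.from(spectrum.re, (r, k) => Math.hypot(r, spectrum.im[k]));
// Bin 2 and its mirror image, bin 14, carry the energy; all other bins are close to zero.
```

Each bin `k` holds one complex number: its magnitude says how strongly that frequency is present in the chunk, and its angle is the phase that gets adjusted later.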

Oversimplified Explanation

Suppose we have a simple wave like this:

wave1

We can cut it into overlapping pieces like this:

wave2
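As a rough sketch of this cutting step, assuming the audio is already available as a typed array of samples: `splitIntoFrames`, `frameSize` and `hop` are names invented for this example, not taken from the player's code.

```javascript
// Cut a signal into overlapping pieces of `frameSize` samples,
// starting a new piece every `hop` samples (hop < frameSize means the pieces overlap).
// Trailing samples that do not fill a whole piece are dropped in this sketch.
function splitIntoFrames(signal, frameSize, hop) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hop) {
    frames.push(signal.slice(start, start + frameSize));
  }
  return frames;
}

// Example: ten samples with the values 0..9, pieces of 4 samples, a new piece every 2 samples.
const rampSignal = Float32Array.from({ length: 10 }, (_, i) => i);
const rampFrames = splitIntoFrames(rampSignal, 4, 2);
// rampFrames holds 4 pieces: [0,1,2,3], [2,3,4,5], [4,5,6,7], [6,7,8,9]
```

With a hop of half the piece length, every sample (apart from the very edges) ends up in exactly two pieces.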

By changing the overlap between the pieces we can do the time stretching:

wave3
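Changing the overlap can be sketched as a plain overlap-add loop: pieces are read at one spacing and written back at another. This is a simplified illustration under my own assumptions (a Hann window, made-up function names, no gain normalization), and it deliberately leaves out the phase correction, so on real music it would sound rough.

```javascript
// Periodic Hann window, used to fade each piece in and out.
function hannWindow(size) {
  return Float32Array.from({ length: size }, (_, i) =>
    0.5 - 0.5 * Math.cos((2 * Math.PI * i) / size)
  );
}

// Naive overlap-add stretch: read a piece every `analysisHop` samples and
// write it back every `synthesisHop` samples. No phase correction happens here.
function overlapAddStretch(input, frameSize, analysisHop, synthesisHop) {
  if (input.length < frameSize) return new Float32Array(0);
  const win = hannWindow(frameSize);
  const frameCount = Math.floor((input.length - frameSize) / analysisHop) + 1;
  const output = new Float32Array((frameCount - 1) * synthesisHop + frameSize);
  for (let f = 0; f < frameCount; f++) {
    for (let i = 0; i < frameSize; i++) {
      output[f * synthesisHop + i] += input[f * analysisHop + i] * win[i];
    }
  }
  return output;
}

// Example: pieces of 256 samples, read every 64 samples, written every 128 samples.
const steadyInput = new Float32Array(1024).fill(1);
const stretchedOutput = overlapAddStretch(steadyInput, 256, 64, 128);
// 1024 input samples become 1792 output samples.
```

The ratio `synthesisHop / analysisHop` is the stretch factor; the output is a little shorter than twice the input here only because of the partial pieces at the edges.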

This messes up the phases so we need to fix them up:

wave4
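One common way to do this fix-up is the standard phase propagation step of a basic phase vocoder, sketched below for a single frequency bin. This is the textbook form under my own naming, not necessarily what TimeStretch Player does: all function and parameter names are assumptions made for the example.

```javascript
// Wrap an angle into the range [-PI, PI).
function wrapPhase(phase) {
  return phase - 2 * Math.PI * Math.floor((phase + Math.PI) / (2 * Math.PI));
}

// Compute the output phase of one bin so that its sinusoid continues smoothly
// even though pieces are now `synthesisHop` instead of `analysisHop` samples apart.
function propagatePhase({
  prevAnalysisPhase, analysisPhase, prevSynthesisPhase,
  bin, fftSize, analysisHop, synthesisHop,
}) {
  const binFrequency = (2 * Math.PI * bin) / fftSize; // radians per sample
  const expectedAdvance = binFrequency * analysisHop;
  // How far the measured phase advance deviates from the bin's nominal advance.
  const deviation = wrapPhase(analysisPhase - prevAnalysisPhase - expectedAdvance);
  // Estimate of the actual frequency of the sinusoid in this bin.
  const trueFrequency = binFrequency + deviation / analysisHop;
  // Advance the output phase by that frequency over the new hop.
  return wrapPhase(prevSynthesisPhase + trueFrequency * synthesisHop);
}

// Example: a sinusoid slightly above the centre of bin 10, stretched to twice the length.
const omega = (2 * Math.PI * 10) / 1024 + 0.001; // true frequency in radians per sample
const nextSynthesisPhase = propagatePhase({
  prevAnalysisPhase: 0.3,
  analysisPhase: wrapPhase(0.3 + omega * 256),
  prevSynthesisPhase: 0.3,
  bin: 10, fftSize: 1024, analysisHop: 256, synthesisHop: 512,
});
// nextSynthesisPhase matches wrapPhase(0.3 + omega * 512) up to rounding error.
```

The idea is to estimate each bin's true frequency from how its phase moved between two input pieces, then let the output phase advance at that same frequency over the longer (or shorter) output spacing.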

Now we can just combine the pieces to get a longer version of the original:

wave5

In practice things are of course a bit more complicated and there are a lot of compromises to be made. ;)

Much better explanation

If you want a more precise description of the phase vocoder technique than the handwaving above, I recommend reading the paper Improved phase vocoder time-scale modification of audio by Jean Laroche and Mark Dolson. It explains the basic phase vocoder as well as some simple improvements.

If you do read it and are wondering what the eff the angle sign in an expression like $\angle X(t_a^u, \Omega_k)$ means: it denotes the phase, which in practice is the argument of the complex number that the term evaluates to. I think it took me about an hour to figure that out with any certainty. YAY mathematical notation.
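In code, the argument of a complex number is a one-liner. A small sketch, assuming the complex value is stored as separate real and imaginary parts (the name `phaseOf` is made up for this example):

```javascript
// The phase (argument) of the complex number re + i*im, in radians between -PI and PI.
function phaseOf(re, im) {
  return Math.atan2(im, re);
}

const quarterTurn = phaseOf(0, 1);  // Math.PI / 2, the number i points straight "up"
const halfTurn = phaseOf(-1, 0);    // Math.PI, the number -1 points "backwards"
```

Note the argument order of `Math.atan2`: the imaginary part comes first.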

Pure CSS User Interface

I had some fun creating all the user interface elements in pure CSS; no images are used except for the logo. It’s all gradients, borders and shadows. Do I recommend this approach? Not really, but it sure is a lot of fun! Feel free to poke around with your browser’s dev tools to see how it was done.

Future Features

While developing the time stretching functionality I also experimented with some other features, like a karaoke mode that can cancel individual parts of an audio file while keeping the rest of the stereo field intact. This can be useful to remove or isolate parts of songs, for instance the vocals or a guitar solo. However, the quality of the results was not comparable to that of the time stretching, so I decided to remove the feature for now. But you might get to see it in another app in the future. ;)

Library Release

I might release the phase vocoder code as a standalone Node library in the future, but it needs a serious cleanup before that can happen.