Span<T> for Audio

10 years ago I blogged that one of my most wanted C# language features was the ability to perform reinterpret casts between different array types (e.g. cast a byte[] to a float[]). This is something you frequently need to do in audio programming, where performance matters and you want to avoid unnecessary copies or memory allocations.

NAudio has used a trick involving explicit struct offsets for some time, but it does have some gotchas and I've always held out hope that one day we'd get proper language support for doing this.

Span<T>

So I'm very happy that in .NET Core 2.1, the new Span<T> functionality gives me exactly what I wanted. It's very exciting to see the significant performance optimisations this is already bringing to ASP.NET Core and wider parts of the .NET framework.

I was keen to try out Span<T> to see if it could be used in NAudio, and so while I was at the MVP Summit in March, I put together a quick proof of concept, using an early beta release of the System.Memory functionality. While I was there I was privileged to meet Krzysztof Cwalina, who gave me some pointers on how to use the new functionality.

I've now updated my app to use the final released bits, and published the code to GitHub, so here's a quick runthrough of the changes I made and their benefits.

IWaveProvider and ISampleProvider

The two main interfaces in NAudio that define a class that can provide a stream of audio are IWaveProvider and ISampleProvider. IWaveProvider allows you to read audio into a byte array, and so is flexible enough to cover audio in any format. ISampleProvider is for when you are dealing exclusively with IEEE floating point samples, which is typically what you want to use whenever you are performing any mixing or audio manipulation with audio streams.

Both interfaces are very simple. They report the WaveFormat of the audio they provide, and define a Read method, to which you pass an array that you want audio to be written into. This is of course for performance reasons. You don't want to be allocating new memory buffers every time you read some audio as this will be happening many times every second during audio playback.

public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(byte[] buffer, int offset, int count);
}

public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(float[] buffer, int offset, int count);
}

Notice that both Read methods take an offset parameter. This is because in some circumstances, the start of the buffer is already filled with audio, and we don't want the new audio to overwrite it. The count parameter specifies how many elements we want to be written into the buffer, and the Read method returns how many elements were actually written into the buffer.

So what does this look like if we take advantage of Span<T>? Well, it eliminates the need for an offset and a count, as a Span<T> already encapsulates both concepts.

The updated interfaces look like this:

public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<byte> buffer);
}

public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<float> buffer);
}

This not only simplifies the interface, but it greatly simplifies the implementation, as the offset doesn't need to be factored into every read or write from the buffer.
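
To illustrate the difference, here is a minimal sketch of the same Read logic written in both styles. ConstantProvider is a made-up class for this example (it is not part of NAudio), and WaveFormat is left out to keep the snippet self-contained.

```csharp
using System;

// Hypothetical provider used only for illustration; not part of NAudio.
public class ConstantProvider
{
    private readonly float value;

    public ConstantProvider(float value) => this.value = value;

    // Array-based style: every index has to be adjusted by the offset.
    public int Read(float[] buffer, int offset, int count)
    {
        for (int n = 0; n < count; n++)
        {
            buffer[offset + n] = value;
        }
        return count;
    }

    // Span-based style: the span already starts at the right place
    // and its Length is the number of samples requested.
    public int Read(Span<float> buffer)
    {
        buffer.Fill(value);
        return buffer.Length;
    }
}

public static class Program
{
    public static void Main()
    {
        var provider = new ConstantProvider(0.5f);
        var buffer = new float[8];

        // Both calls write four samples starting at index 2.
        provider.Read(buffer, 2, 4);
        provider.Read(buffer.AsSpan(2, 4));

        Console.WriteLine(string.Join(", ", buffer));
    }
}
```

In the Span version the caller decides where the window starts, so the implementation never has to do any index arithmetic.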

Creating Spans

There are several ways to create a Span<T>. You can go from a regular managed array to a Span, specifying the desired offset and number of elements:

var buffer = new float[WaveFormat.SampleRate * WaveFormat.Channels];
// create a Span based on this buffer
var spanBuffer = new Span<float>(buffer, offset, samplesRequired);

You can also create a Span based on unmanaged memory. This is used by the WaveOutBuffer class, because the buffer is passed to some Windows APIs that expect the memory pointer to remain valid after the API call completes. That means we can't risk passing a pointer to a managed array, as the garbage collector could move the memory at any time.

In this example, we allocate some unmanaged memory with Marshal.AllocHGlobal, and then create a new Span based on it. Unfortunately, there is no Span constructor taking an IntPtr, forcing us to use an unsafe code block to turn the IntPtr into a void *.

var bufferPtr = Marshal.AllocHGlobal(bufferSize);
// ...
Span<byte> span;
unsafe
{
    span = new Span<byte>(bufferPtr.ToPointer(), bufferSize);
}

It's also possible to create a new Span from an existing Span. For example, in the original implementation of OffsetSampleProvider, we need to read samplesRequired samples into an array called buffer, into an offset we've calculated from the original offset we were passed plus the number of samples we've already written into the buffer:

var read = sourceProvider.Read(buffer, offset + samplesRead, samplesRequired);

But the Span<T> implementation uses Slice to create a new Span of the desired length (samplesRequired), and from the desired offset (samplesRead) into the existing Span. Because our existing Span already starts in the right place, we no longer need to add on an additional offset, which removes a common cause of bugs.

var read = sourceProvider.Read(buffer.Slice(samplesRead, samplesRequired));
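
The same idea extends to a read loop that keeps slicing past what has already been written. The following is only a sketch: ISpanSampleSource, ChunkedSource and ReadHelper are hypothetical names invented for this example, standing in for a Span-based ISampleProvider whose Read may return fewer samples than requested.

```csharp
using System;

// Hypothetical stand-in for a Span-based ISampleProvider (WaveFormat omitted).
public interface ISpanSampleSource
{
    int Read(Span<float> buffer);
}

// Hypothetical source that delivers at most chunkSize samples per call,
// and runs dry after 'total' samples.
public class ChunkedSource : ISpanSampleSource
{
    private readonly int chunkSize;
    private int remaining;

    public ChunkedSource(int chunkSize, int total)
    {
        this.chunkSize = chunkSize;
        remaining = total;
    }

    public int Read(Span<float> buffer)
    {
        int n = Math.Min(Math.Min(chunkSize, remaining), buffer.Length);
        buffer.Slice(0, n).Fill(1.0f);
        remaining -= n;
        return n;
    }
}

public static class ReadHelper
{
    // Keeps reading until the buffer is full or the source returns 0.
    public static int ReadFully(ISpanSampleSource source, Span<float> buffer)
    {
        int samplesRead = 0;
        while (samplesRead < buffer.Length)
        {
            // Slice(start) yields the part of the buffer not yet written.
            int read = source.Read(buffer.Slice(samplesRead));
            if (read == 0) break;
            samplesRead += read;
        }
        return samplesRead;
    }
}

public static class Program
{
    public static void Main()
    {
        var source = new ChunkedSource(chunkSize: 3, total: 10);
        var buffer = new float[8];
        Console.WriteLine(ReadHelper.ReadFully(source, buffer)); // prints 8
        Console.WriteLine(ReadHelper.ReadFully(source, buffer)); // prints 2
    }
}
```

The only running total the loop has to track is samplesRead; there is no separate offset to keep in sync with it.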

Casting

I've said that one of the major benefits of Span<T> is the ability to perform reinterpret casts. So we can essentially turn a Span<byte> into a Span<float> or vice versa. The way you do this changed from the beta bits - now you use MemoryMarshal.Cast, but it is pretty straightforward.

This greatly simplifies a lot of the helper classes in NAudio that enable you to switch between IWaveProvider and ISampleProvider. Here's a simple snippet from SampleToWaveProvider that makes use of MemoryMarshal.Cast.

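public int Read(Span<byte> buffer)
{
    // reinterpret the byte span as floats; no copy is made
    var f = MemoryMarshal.Cast<byte, float>(buffer);
    var samplesRead = source.Read(f);
    // report the count in bytes, as IWaveProvider expects
    return samplesRead * sizeof(float);
}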

This eliminates the need for the WaveBuffer hack that we previously needed to avoid copying in this method.
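
To see that the cast really is a view over the same memory rather than a copy, here is a small self-contained sketch. CastDemo and WriteSamples are names made up for this example.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CastDemo
{
    // Writes 'value' into every whole float that fits in the byte span
    // and returns the number of bytes written.
    public static int WriteSamples(Span<byte> buffer, float value)
    {
        Span<float> samples = MemoryMarshal.Cast<byte, float>(buffer);
        samples.Fill(value);
        return samples.Length * sizeof(float);
    }

    public static void Main()
    {
        // Room for two floats plus two spare bytes.
        var bytes = new byte[10];
        int written = WriteSamples(bytes, 1.0f);

        Console.WriteLine(written);                         // prints 8
        Console.WriteLine(BitConverter.ToSingle(bytes, 0)); // prints 1
    }
}
```

Note that the cast span only covers whole elements, so the two trailing bytes here are simply left out. In audio code that is usually what you want, as long as the byte count you pass in is a multiple of the sample size.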

Span<T> Limitations

There were a few limitations I ran into that are worth noting. First of all, a Span<T> is a stack-only type, so it can't be stored as a field of a class (read Stephen Toub's article to understand why). So in the WaveOutBuffer class, where I wanted to reuse some unmanaged memory, I couldn't construct a Span<T> up front and reuse it. Instead, I had to hold onto the pointer to the unmanaged memory, and then construct a Span on demand.
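
The pattern of holding the pointer and building a Span on demand might look something like the sketch below. UnmanagedBuffer is a hypothetical class written for this example, not the actual WaveOutBuffer code, and it assumes the project is compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical holder for a block of unmanaged memory.
// Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
public sealed class UnmanagedBuffer : IDisposable
{
    // An IntPtr can be a field; a Span<byte> cannot.
    private IntPtr pointer;

    public int Size { get; }

    public UnmanagedBuffer(int size)
    {
        Size = size;
        pointer = Marshal.AllocHGlobal(size);
    }

    // Builds a fresh Span over the unmanaged block each time it is needed.
    public unsafe Span<byte> GetSpan()
    {
        if (pointer == IntPtr.Zero)
        {
            throw new ObjectDisposedException(nameof(UnmanagedBuffer));
        }
        return new Span<byte>(pointer.ToPointer(), Size);
    }

    public void Dispose()
    {
        if (pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(pointer);
            pointer = IntPtr.Zero;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        using (var buffer = new UnmanagedBuffer(16))
        {
            buffer.GetSpan().Fill(0x7F);
            Console.WriteLine(buffer.GetSpan()[15]); // prints 127
        }
    }
}
```

Constructing the Span is cheap, as it is just a pointer and a length. The one thing to watch is that a Span obtained from GetSpan must not be used after Dispose has freed the memory.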

This limitation also impacts the way we might design an audio recording interface for NAudio. For example, suppose we had an AudioAvailable event that was raised whenever recorded audio was available. We might want it to provide us a Span<T> containing that audio:

interface IAudioCapture
{
    void Start();
    void Stop();
    event EventHandler<AudioCaptureEventArgs> AudioAvailable;
    event EventHandler<StoppedEventArgs> RecordingStopped;
}

// not allowed: the auto-property would need a Span<byte> field inside a class
public class AudioCaptureEventArgs : EventArgs
{
    public AudioCaptureEventArgs(Span<byte> audio)
    {
        Buffer = audio;
    }

    public Span<byte> Buffer { get; }
}

But this isn't possible. We'd have to switch to Memory<T> instead. We can't even create a callback like this as Span<T> can't be used as the generic type for Func<T>:

void OnDataAvailable(Func<Span<byte>> callback);

However, one workaround that does compile is to use Span<T> in a custom delegate type:

void OnDataAvailable(AudioCallback callback);

// ...
delegate void AudioCallback(Span<byte> x);

I'm not sure yet whether this approach is preferable to using Memory<T>. The recording part of my proof of concept application isn't finished yet and so I'll try both approaches when that's ready.
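
For comparison, here is a rough sketch of the two options side by side. FakeCapture is a hypothetical class that simulates a recording device for this example, and the Memory<byte>-based AudioCaptureEventArgs is just one way the event args could be shaped.

```csharp
using System;

// Memory<T> is an ordinary struct, so it can be stored in a class.
public class AudioCaptureEventArgs : EventArgs
{
    public AudioCaptureEventArgs(Memory<byte> audio)
    {
        Buffer = audio;
    }

    public Memory<byte> Buffer { get; }
}

// A custom delegate type can take a Span<T> parameter.
public delegate void AudioCallback(Span<byte> audio);

// Hypothetical capture class that fakes one buffer of recorded audio.
public class FakeCapture
{
    private readonly byte[] recordBuffer = new byte[16];

    public event EventHandler<AudioCaptureEventArgs> AudioAvailable;

    // bytesRecorded must not exceed the 16-byte record buffer.
    public void RaiseAudio(int bytesRecorded, AudioCallback callback)
    {
        for (int n = 0; n < bytesRecorded; n++)
        {
            recordBuffer[n] = (byte)n;
        }

        // Option 1: wrap the recorded bytes in Memory<T> for the event.
        var memory = recordBuffer.AsMemory(0, bytesRecorded);
        AudioAvailable?.Invoke(this, new AudioCaptureEventArgs(memory));

        // Option 2: hand a Span<T> straight to the custom delegate.
        callback?.Invoke(recordBuffer.AsSpan(0, bytesRecorded));
    }
}

public static class Program
{
    public static void Main()
    {
        var capture = new FakeCapture();
        int viaEvent = 0, viaDelegate = 0;

        // Handlers get a Span from the Memory when they need one.
        capture.AudioAvailable += (s, e) => viaEvent = e.Buffer.Span.Length;
        capture.RaiseAudio(8, span => viaDelegate = span.Length);

        Console.WriteLine($"{viaEvent} {viaDelegate}"); // prints 8 8
    }
}
```

Either way, the handler sees a window onto the capture buffer rather than a copy, so it should finish with the data before the buffer is reused for the next block of audio.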

Next steps

There is still a fair amount I'd like to do with this sample to take full advantage of Span<T>. There are more array allocations that could be eliminated, and also there should now be no need for any pinned GCHandle instances.

There are also plenty more NAudio classes that could be converted to take advantage of Span<T>. Currently the sample app just plays a short tone generated with the SignalGenerator, so I'd like to add in audio file reading, as well as recording. Feel free to submit PRs or raise issues if you'd like to help shape what might become the basis for a future NAudio 2.0.

Span<T> and .NET Standard

Of course one big block to the adoption of Span<T> is that it is currently supported on .NET Core 2.1 only. It's not part of .NET Standard 2.0, and it seems there are no immediate plans to create a new version of the .NET Standard that supports Span<T>, presumably due to the challenges of back-porting all this to the regular .NET Framework. This is a shame, because it means that NAudio cannot realistically adopt it if we want one consistent programming model across all target frameworks.

Conclusion

Span<T> is a brilliant innovation that has the potential to bring major performance benefits to lots of scenarios, including audio. For the time being though, it is only available in .NET Core applications.

Comments

June 19. 2018 17:58

THANK YOU! I was wondering where NonPortableCast went :)

Jason Bock

June 19. 2018 18:10

ha ha, yes, thankfully Krzysztof Cwalina gave me advance warning of the upcoming change or I would have been left floundering!

Mark Heath

June 20. 2018 20:25

What about System.Memory nuget package? Looks like it provides Span and related types for environments other than .NET Core 2.1. I presume it will be less performant than native Span but maybe it's still an option.

Jacek Bukarewicz

June 20. 2018 21:14

good question, might be worth experimenting and seeing whether that can be used from regular .NET Framework and what its performance is like

June 21. 2018 10:41

Span is on NuGet and supports .NET Standard 2.0, as well as providing specific packages for .NET 4.x and other platforms. The thing that is specific to .NET Core 2.1 is the integration with pre-existing APIs like Streams and Collections, but if you don't need that for your app - if you just want to use Span or Memory in your own code - then you can use it today.

markrendle

June 21. 2018 10:48

That's great, somehow I'd missed that. NAudio is still targeting .NET 3.5 at the moment, but it would definitely be worth making the leap to 4.x for the next major version to gain access to Span<T>/Memory<T>

January 8. 2019 05:59

markrendle but

disqus_YpOYBmcCuK