Span<T> for Audio
Ten years ago I blogged that one of my most wanted C# language features was the ability to perform reinterpret casts between different array types (e.g. cast a byte[] to a float[]). This is something you frequently need to do in audio programming, where performance matters and you want to avoid unnecessary copies or memory allocations.
NAudio has used a trick involving explicit struct offsets for some time, but it does have some gotchas, and I've always held out hope that one day we'd get proper language support for doing this.
Span<T>
So I'm very happy that in .NET Core 2.1, the new Span<T> functionality gives me exactly what I wanted. It's very exciting to see the significant performance optimisations this is already bringing to ASP.NET Core and wider parts of the .NET framework.
I was keen to try out Span<T> to see if it could be used in NAudio, and so while I was at the MVP Summit in March, I put together a quick proof of concept, using an early beta release of the System.Memory functionality. I was privileged to meet Krzysztof Cwalina while I was there, who was able to give me some pointers on how to use the new functionality.
I've now updated my app to use the final released bits, and published the code to GitHub, so here's a quick runthrough of the changes I made and their benefits.
IWaveProvider and ISampleProvider
The two main interfaces in NAudio that define a class that can provide a stream of audio are IWaveProvider and ISampleProvider. IWaveProvider allows you to read audio into a byte array, and so is flexible enough to cover audio in any format. ISampleProvider is for when you are dealing exclusively with IEEE floating point samples, which is typically what you want to use whenever you are performing any mixing or audio manipulation with audio streams.
Both interfaces are very simple. They report the WaveFormat of the audio they provide, and define a Read method, to which you pass an array that you want audio to be written into. This is of course for performance reasons. You don't want to be allocating new memory buffers every time you read some audio, as this will be happening many times every second during audio playback.
public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(byte[] buffer, int offset, int count);
}
public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(float[] buffer, int offset, int count);
}
Notice that both Read methods take an offset parameter. This is because in some circumstances, the start of the buffer is already filled with audio, and we don't want the new audio to overwrite it. The count parameter specifies how many elements we want to be written into the buffer, and the Read method returns how many elements were actually written into the buffer.
So what does this look like if we take advantage of Span<T>? Well, it eliminates the need for an offset and a count, as a Span<T> already encapsulates both concepts.
The updated interfaces look like this:
public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<byte> buffer);
}
public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<float> buffer);
}
This not only simplifies the interface, but it greatly simplifies the implementation, as the offset doesn't need to be factored into every read or write from the buffer.
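To make that concrete, here is a minimal sketch of what a Span-based provider might look like. It is not taken from NAudio: ISpanSampleProvider, ConstantSampleProvider and GainSampleProvider are hypothetical names, and WaveFormat is left out so the example stands on its own.

```csharp
using System;

// Hypothetical stand-in for the Span-based ISampleProvider
// (WaveFormat omitted to keep the sketch self-contained).
public interface ISpanSampleProvider
{
    int Read(Span<float> buffer);
}

// Hypothetical source that fills whatever span it is given with one value.
public class ConstantSampleProvider : ISpanSampleProvider
{
    private readonly float value;

    public ConstantSampleProvider(float value) => this.value = value;

    public int Read(Span<float> buffer)
    {
        buffer.Fill(value);
        return buffer.Length;
    }
}

// Hypothetical gain stage. There is no offset arithmetic: index 0 of the
// span is already the first sample we are allowed to write.
public class GainSampleProvider : ISpanSampleProvider
{
    private readonly ISpanSampleProvider source;
    private readonly float gain;

    public GainSampleProvider(ISpanSampleProvider source, float gain)
    {
        this.source = source;
        this.gain = gain;
    }

    public int Read(Span<float> buffer)
    {
        int samplesRead = source.Read(buffer);
        for (int n = 0; n < samplesRead; n++)
        {
            buffer[n] *= gain;
        }
        return samplesRead;
    }
}
```

A caller that only wants part of an array filled passes a window such as data.AsSpan(2, 4), and the provider never needs to know where that window sits in the underlying array.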
Creating Spans
There are several ways to create a Span<T>. You can go from a regular managed array to a Span, specifying the desired offset and number of elements:
var buffer = new float[WaveFormat.SampleRate * WaveFormat.Channels];
// create a Span based on this buffer
var spanBuffer = new Span<float>(buffer, offset, samplesRequired);
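It is worth being clear that a Span created this way is a view over the array, not a copy. The following sketch (SpanOverArrayDemo and FillWindow are names I've made up for illustration) writes through a span and then inspects the backing array:

```csharp
using System;

public static class SpanOverArrayDemo
{
    // Writes `value` through a span covering buffer[offset .. offset + count)
    // and returns the backing array so the effect can be inspected.
    public static float[] FillWindow(int length, int offset, int count, float value)
    {
        var buffer = new float[length];
        var window = new Span<float>(buffer, offset, count);

        // Index 0 of the span corresponds to index `offset` of the array.
        for (int n = 0; n < window.Length; n++)
        {
            window[n] = value;
        }
        return buffer;
    }
}
```

Calling FillWindow(6, 2, 3, 1f) leaves elements 2, 3 and 4 of the array set to 1 and the rest untouched, because the span covers only that window.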
You can also create a Span based on unmanaged memory. This is used by the WaveOutBuffer class, because the buffer is passed to some Windows APIs that expect the memory pointer to remain valid after the API call completes. That means we can't risk passing a pointer to a managed array, as the garbage collector could move the memory at any time.
In this example, we allocate some unmanaged memory with Marshal.AllocHGlobal, and then create a new Span based on it. Unfortunately, there is no Span constructor taking an IntPtr, forcing us to use an unsafe code block to turn the IntPtr into a void*.
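var bufferPtr = Marshal.AllocHGlobal(bufferSize);
// ...
Span<byte> span;
unsafe
{
    // wrap the unmanaged block; the Span does not own the memory,
    // so it must still be released with Marshal.FreeHGlobal when finished
    span = new Span<byte>(bufferPtr.ToPointer(), bufferSize);
}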
It's also possible to create a new Span from an existing Span. For example, in the original implementation of OffsetSampleProvider, we need to read samplesRequired samples into an array called buffer, at an offset we've calculated from the original offset we were passed plus the number of samples we've already written into the buffer:
var read = sourceProvider.Read(buffer, offset + samplesRead, samplesRequired);
But the Span<T> implementation uses Slice to create a new Span of the desired length (samplesRequired), and from the desired offset (samplesRead) into the existing Span. The fact that our existing Span already starts in the right place eliminates the need for us to add on an additional offset, eliminating a common cause of bugs.
var read = sourceProvider.Read(buffer.Slice(samplesRead, samplesRequired));
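The same idea extends naturally to a read loop. The sketch below is not NAudio code: ISampleSource, ChunkedSource and ReadHelper.ReadFully are hypothetical names, and the source deliberately returns fewer samples than requested so that the loop has to slice more than once.

```csharp
using System;

// Hypothetical Span-based source interface for this sketch.
public interface ISampleSource
{
    int Read(Span<float> buffer);
}

// Hypothetical source that hands out at most `chunk` samples per call,
// `total` samples overall, each with the value 1.
public class ChunkedSource : ISampleSource
{
    private readonly int chunk;
    private int remaining;

    public ChunkedSource(int total, int chunk)
    {
        remaining = total;
        this.chunk = chunk;
    }

    public int Read(Span<float> buffer)
    {
        int n = Math.Min(Math.Min(chunk, remaining), buffer.Length);
        buffer.Slice(0, n).Fill(1f);
        remaining -= n;
        return n;
    }
}

public static class ReadHelper
{
    // Keeps reading until the buffer is full or the source runs dry.
    public static int ReadFully(ISampleSource source, Span<float> buffer)
    {
        int samplesRead = 0;
        while (samplesRead < buffer.Length)
        {
            // Slice(start) yields the unfilled tail; no separate count is needed.
            int read = source.Read(buffer.Slice(samplesRead));
            if (read == 0) break; // source exhausted
            samplesRead += read;
        }
        return samplesRead;
    }
}
```

The only running total the loop tracks is samplesRead; the caller's original offset never appears because it is already baked into the span that was passed in.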
Casting
I've said that one of the major benefits of Span<T> is the ability to perform reinterpret casts. So we can essentially turn a Span<byte> into a Span<float> or vice versa. The way you do this changed from the beta bits (now you use MemoryMarshal.Cast), but it is pretty straightforward.
This greatly simplifies a lot of the helper classes in NAudio that enable you to switch between IWaveProvider and ISampleProvider. Here's a simple snippet from SampleToWaveProvider that makes use of MemoryMarshal.Cast.
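public int Read(Span<byte> buffer)
{
    // reinterpret the byte buffer as floats without copying
    var f = MemoryMarshal.Cast<byte, float>(buffer);
    var samplesRead = source.Read(f);
    // report bytes written: each float sample occupies 4 bytes
    return samplesRead * sizeof(float);
}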
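If you want to convince yourself that the cast really is a view over the same memory, a small experiment like the following works. CastDemo and its methods are illustrative names of my own, not part of NAudio.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CastDemo
{
    // Writes one float through a Span<float> view of a byte array, then
    // rebuilds the float from the same bytes using BitConverter.
    public static float RoundTrip(float sample)
    {
        var bytes = new byte[2 * sizeof(float)];
        Span<float> floats = MemoryMarshal.Cast<byte, float>(bytes.AsSpan());

        // The second float occupies bytes 4 to 7 of the underlying array.
        floats[1] = sample;
        return BitConverter.ToSingle(bytes, sizeof(float));
    }

    // Number of whole floats that fit in a byte buffer of the given size.
    public static int FloatCount(int byteCount)
    {
        var bytes = new byte[byteCount];
        return MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).Length;
    }
}
```

Note that the length of the resulting span is rounded down, so a 10-byte buffer yields a Span<float> of length 2 and the trailing 2 bytes are simply not visible through it.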
This eliminates the need for the WaveBuffer hack that we previously needed to avoid copying in this method.
Span<T> Limitations
There were a few limitations I ran into that are worth noting. First of all, a Span<T> can't be used as a class member (read Stephen Toub's article to understand why). So in the WaveOutBuffer class, where I wanted to reuse some unmanaged memory, I couldn't construct a Span<T> up front and reuse it. Instead, I had to hold onto the pointer to the unmanaged memory, and then construct a Span on demand.
This limitation also impacts the way we might design an audio recording interface for NAudio. For example, suppose we had an AudioAvailable event that was raised whenever recorded audio was available. We might want it to provide us a Span<T> containing that audio:
interface IAudioCapture
{
    void Start();
    void Stop();
    event EventHandler<AudioCaptureEventArgs> AudioAvailable;
    event EventHandler<StoppedEventArgs> RecordingStopped;
}
// not allowed: a Span<T> cannot be stored in a property of a class
public class AudioCaptureEventArgs : EventArgs
{
    public AudioCaptureEventArgs(Span<byte> audio)
    {
        Buffer = audio;
    }
    public Span<byte> Buffer { get; }
}
But this isn't possible. We'd have to switch to Memory<T> instead. We can't even create a callback like this, as Span<T> can't be used as the generic type for Func<T>:
void OnDataAvailable(Func<Span<byte>> callback);
However, one workaround that does compile is to use Span<T> in a custom delegate type:
void OnDataAvailable(AudioCallback callback);
// ...
delegate void AudioCallback(Span<byte> x);
I'm not sure yet whether this approach is preferable to using Memory<T>. The recording part of my proof of concept application isn't finished yet and so I'll try both approaches when that's ready.
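For comparison, here is a rough sketch of the two options side by side. FakeCapture is a hypothetical stand-in for a real capture device, and the shape of the API is only an assumption about how this might eventually look, not a design decision.

```csharp
using System;

// Option 1: Memory<byte> is an ordinary struct, so it can be stored in EventArgs.
public class AudioCaptureEventArgs : EventArgs
{
    public AudioCaptureEventArgs(Memory<byte> audio)
    {
        Buffer = audio;
    }

    public Memory<byte> Buffer { get; }
}

// Option 2: a custom delegate type is allowed to take a Span<byte> parameter.
public delegate void AudioCallback(Span<byte> audio);

// Hypothetical capture source that exposes both styles over one reused buffer.
public class FakeCapture
{
    private readonly byte[] buffer = new byte[16];

    public event EventHandler<AudioCaptureEventArgs> AudioAvailable;

    public AudioCallback Callback { get; set; }

    // Pretends that bytesRecorded bytes have just arrived in the buffer.
    public void RaiseAudio(int bytesRecorded)
    {
        AudioAvailable?.Invoke(this,
            new AudioCaptureEventArgs(buffer.AsMemory(0, bytesRecorded)));
        Callback?.Invoke(buffer.AsSpan(0, bytesRecorded));
    }
}
```

With the Memory<T> version, the handler gets at the data through e.Buffer.Span, whereas with the delegate version it receives the Span<byte> directly. In both cases the buffer is reused, so a handler that needs the audio later has to copy it before returning.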
Next steps
There is still a fair amount I'd like to do with this sample to take full advantage of Span<T>. There are more array allocations that could be eliminated, and there should now be no need for any pinned GCHandle instances.
There are also plenty more NAudio classes that could be converted to take advantage of Span<T>. Currently the sample app just plays a short tone generated with the SignalGenerator, so I'd like to add in audio file reading, as well as recording. Feel free to submit PRs or raise issues if you'd like to help shape what might become the basis for a future NAudio 2.0.
Span<T> and .NET Standard
Of course one big block to the adoption of Span<T> is that it is currently supported on .NET Core 2.1 only. It's not part of .NET Standard 2.0, and it seems there are no immediate plans to create a new version of the .NET Standard that supports Span<T>, presumably due to the challenges of back-porting all this to the regular .NET Framework. This is a shame, because it means that NAudio cannot realistically adopt it if we want one consistent programming model across all target frameworks.
Conclusion
Span<T> is a brilliant innovation that has the potential to bring major performance benefits to lots of scenarios, including audio. For the time being though, it is only available in .NET Core applications.
Comments
THANK YOU! I was wondering where NonPortableCast went :)
Jason Bock
ha ha, yes, thankfully Krzysztof Cwalina gave me advance warning of the upcoming change or I would have been left floundering!
Mark Heath
What about the System.Memory NuGet package? Looks like it provides Span and related types for environments other than .NET Core 2.1. I presume it will be less performant than native Span but maybe it's still an option.
Jacek Bukarewicz
good question, might be worth experimenting and seeing whether that can be used from regular .NET Framework and what its performance is like
Mark Heath
Span is on NuGet and supports .NET Standard 2.0, as well as providing specific packages for .NET 4.x and other platforms. The thing that is specific to .NET Core 2.1 is the integration with pre-existing APIs like Streams and Collections, but if you don't need that for your app (if you just want to use Span or Memory in your own code) then you can use it today.
markrendle
That's great, somehow I'd missed that. NAudio is still targeting .NET 3.5 at the moment, but it would definitely be worth making the leap to 4.x for the next major version to gain access to Span<T>/Memory<T>
Mark Heath
markrendle but
disqus_YpOYBmcCuK