Episode #594

Build a Multi Track Audio Engine

Series: Learn AudioKit

23 minutes
Published on June 6, 2025


In this episode we will build a simplified multi-track audio engine, capable of mixing & muting individual tracks and starting synchronized playback. We'll focus on the engine here, and implement the UI in the next episode.

This episode uses Swift 6.0, Xcode 16.4.

Okay, for our last demo, we're going to build a multi-track playback engine. I have a multi-track view here, but we're going to split this into two parts: in this episode we'll build the audio engine part of it, and in the next episode we'll tie it together with some UI.

What I want to start with is an audio track. This audio track is going to have a name, which will be a String; a player, which will be an AudioPlayer from AudioKit; and a fader, which will control the volume of this individual track. We're going to initialize it with a name and a URL. The URL is a file URL, and this initializer can throw. So we assign the name, then create an AVAudioFile for reading from that URL. That can throw, so we add the try keyword. Then we create an AudioPlayer for that file. Its initializer returns an optional, so we're just going to force unwrap it.

The audio tracks I've downloaded here come from MobyGratis.com, which offers a bunch of tracks you can use for basically any purpose. I've grabbed a few stems that are part of a song, and they're generally pretty loud; they're meant for you to mix on your own, so they're not pre-mixed. Because we're mixing all of these together, and the tracks themselves are already very loud, the result would get even louder, so I'm going to start the player's volume really low. Then we create the fader, which gives us control over this player's volume, and start its gain at 0.5, so it begins in the middle and we can adjust it up or down as needed.

Okay, so that represents our individual audio tracks. Next, let's create the main engine for this demo. We'll make an observable class called MultiTrackEngine. It has a private let engine, which is our AudioKit AudioEngine, assigned right away. We also need a Mixer to mix all of these audio tracks together, and an array of AudioTrack that starts off empty for now.

Next we need some playback state: isPlaying is false, isLoaded is false, and a playback progress, which is a Double starting at zero. Finally, we have some housekeeping variables: a private var playStartTime, a TimeInterval starting at zero, and a private var pausedAt, also a TimeInterval starting at zero. The only thing we need to do on initialization is set the engine's output to be this mixer node.
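Here's a rough sketch of what we have so far. The 0.2 player volume is a stand-in (the episode just says "pretty low"), and I'm assuming AudioPlayer exposes a volume property that forwards to its underlying AVAudioPlayerNode:

```swift
import AudioKit
import AVFoundation
import Observation

// One track: a named player routed through its own fader,
// so each track gets independent volume control.
class AudioTrack {
    let name: String
    let player: AudioPlayer
    let fader: Fader

    init(name: String, url: URL) throws {
        self.name = name
        let file = try AVAudioFile(forReading: url)
        // AudioPlayer's file-based initializer is failable; force unwrap as in the episode.
        player = AudioPlayer(file: file)!
        // These stems are loud and un-mixed, so start the player quiet.
        // (0.2 is a stand-in value; assumed to forward to the player node's volume.)
        player.volume = 0.2
        // The fader starts in the middle so we can adjust up or down.
        fader = Fader(player, gain: 0.5)
    }
}

@Observable
class MultiTrackEngine {
    private let engine = AudioEngine()
    private let mixer = Mixer()
    var tracks: [AudioTrack] = []

    // Playback state
    var isPlaying = false
    var isLoaded = false
    var playbackProgress: Double = 0

    // Housekeeping for pause/resume bookkeeping
    private var playStartTime: TimeInterval = 0
    private var pausedAt: TimeInterval = 0

    init() {
        // Route everything through the mixer to the engine's output.
        engine.output = mixer
    }
}
```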
Now we need a function called loadTracks, and we're going to require it to be called explicitly. The reason is that in our multi-track view, if we declare something like @State private var engine = MultiTrackEngine(), we don't want that declaration to kick off loading these tracks. The files are actually pretty big on disk, probably 10 to 20 megabytes each, and we don't want them loaded into memory at view creation: it would take a significant amount of time, we'd be on the main thread when it happens, and the user isn't even looking at this view at that moment. If we look in the content view here, this tab gets created right away, so by the time they switch to it, the view (and the tracks) would already have been loaded. So I want to wait until they've actually navigated to the view before loading the tracks; that's why loadTracks is separate.

Now, these track files have interesting file names, but they each represent an instrument, so I'm just going to map each file name to its instrument. Then we add a do/catch block. Inside, we first make sure we stop playback on our audio engine (we'll write that function in a minute), remove everything from the tracks array in case something was already loaded, and remove all inputs from our mixer. Let's go ahead and add a func stop stub here just to keep the compiler from complaining.

Now, for each file name and instrument in our trackFiles, we need to get the file's URL. For now, I'm just going to make this crash if we can't. We call Bundle.main.url(forResource:withExtension:) with the file name, and since the file names are complete with their extension, we can pass nil for the extension. So now we have our URL, and we can create our track.

This is going to be try AudioTrack, using the instrument as the track's name and the URL as the track's file. We append the track to our array, then add the track's fader as an input to our mixer. Once all of that is done, we can try to start the engine, and at the end I'm going to log that we're done loading the tracks.
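Assembled, loadTracks might look something like this sketch. The trackFiles names are hypothetical stand-ins for the actual MobyGratis stems, I'm assuming Mixer's removeAllInputs() as the "remove all inputs" call, and it includes the isLoaded flag and logging we add in a moment:

```swift
// Hypothetical stem file names -- the real MobyGratis files are named
// differently; the point is mapping each file to an instrument label.
private let trackFiles = [
    "stem-01.mp3": "Drums",
    "stem-02.mp3": "Bass",
    "stem-03.mp3": "Guitar",
]

func loadTracks() {
    do {
        stop()
        tracks.removeAll()       // in case tracks were already loaded
        mixer.removeAllInputs()  // detach any previously added faders

        print("loading tracks")
        for (fileName, instrument) in trackFiles {
            print("loading \(fileName)")
            // File names include their extension, so withExtension is nil.
            // Force unwrap: a missing bundled resource is a programmer error.
            let url = Bundle.main.url(forResource: fileName, withExtension: nil)!
            let track = try AudioTrack(name: instrument, url: url)
            tracks.append(track)
            mixer.addInput(track.fader)
        }

        try engine.start()
        isLoaded = true
        print("done loading tracks")
    } catch {
        // The episode doesn't show the catch body; logging is a placeholder.
        print("failed to load tracks: \(error)")
    }
}
```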

I'm also going to add some prints: "loading tracks" at the start, "loading" plus the file name for each track, and "done loading tracks" at the very end. The reason I'm logging in three parts is so we can actually see in the logs when this is happening.

There are multiple ways we could load these audio tracks. If we take a look at AudioPlayer, it has a buffered property. If we pass buffered: true, it loads all of the audio waveform data into memory up front, which takes a significant amount of time. If we leave it off or pass buffered: false, it doesn't do that, but there will be a bit of buffering time when we first play a track so it can read the file. If you're reading a very small file, it's probably a good idea to buffer it so you already have it in memory and can play it over and over again, say for a sound effect. In our case, these tracks are fairly long and fairly large in data size, so we don't want to read them all into memory; we want to stream them from disk as needed. It's important to keep in mind that it can take some time for these tracks to load.

Okay, so now for our stop function. We have a bunch of audio players here, so for each track in tracks, we can say track.player.stop().
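Filled in, the stop function is just:

```swift
func stop() {
    // Stop every player; we don't stop the engine itself.
    for track in tracks {
        track.player.stop()
    }
}
```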
Next we can add startSynchronizedPlayback to start the tracks. One thing I forgot up in loadTracks: we can set isLoaded to true there. And here we want to guard that we're loaded, otherwise we return, because there's nothing to do.

Okay, so at this point we can schedule all the tracks to play at the same time. We're going to get a reference to AVAudioTime.now(). It's important to note that the audio engine has a clock running, and AVAudioTime is a way to represent a moment in time that lets us compare the current time with the time the audio engine thinks it is. There are two ways to express this. One is mach absolute time, which is what AVAudioTime.now() is using; this is called the host time. The other is audio samples at a particular sample rate. Say we have a file and we want to go three seconds into it: if we know the sample rate, we can use it to convert a number of samples into time. Or we can use mach absolute time, the host time. It's important to note that the audio engine keeps track of time in terms of samples, and the two clocks are not perfectly one-to-one: the audio engine may be lagging behind the host time by some number of samples. So it's probably a good idea to wrap your head around this idea of audio time. And this is not the position within the track we're playing; this is the time right now.

Now, we could naively say for track in tracks, then track.player.play(). If we did this, we'd start playing the first track, which schedules some samples in the audio engine to be played. Then we'd loop to the second one, which schedules its audio as well. Those all get added to the mixer, and the mixer sends those samples to the audio buffer. And so as they play, the tracks would actually not be perfectly in sync with each other.

What we need instead is a stable reference point in the future, so we can iterate through all of the tracks, give each player plenty of time to load its buffer from disk (which also takes time), and have them all start playing at that future moment. So what we're going to do now is compute the sample time. If we take a look at one of these tracks in Finder and say Get Info, we should see the sample rate somewhere in here. Yeah, right here: 44.1 kilohertz for these files. For the sample time, we get now.sampleTime; if we look at the documentation, this is the time as a number of audio samples as tracked by the current audio device. We also set a sampleRate, which is going to be 44100.0, a Double. Then we take that sample time and add 0.1 seconds times the sample rate, which equals 4,410, the number of samples in a tenth of a second. And I believe this needs to be an Int64. Yeah.

Okay, so that gives us our sample time. Then our start time is AVAudioTime(sampleTime:atRate:) with that sample time and sample rate. So now we have a moment in the future that is 0.1 seconds, or 4,410 samples, past the host time's current sample time.
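As a sketch, that start-time computation looks like this (AVAudioTime.now() being the convenience the episode uses to get the current time):

```swift
// Inside startSynchronizedPlayback(), after the isLoaded guard:

// The current time, expressed as an AVAudioTime.
let now = AVAudioTime.now()

// These stems are 44.1 kHz (per Finder's Get Info).
let sampleRate = 44100.0

// 0.1 s * 44,100 samples/s = 4,410 samples into the future.
let sampleTime = now.sampleTime + Int64(0.1 * sampleRate)

// A fixed future moment, expressed in samples, for every player to target.
let startTime = AVAudioTime(sampleTime: sampleTime, atRate: sampleRate)
```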

Now we can say we're going to play, and play has options for from, to, and at. At this point, we're going to play from and at a given time. The time interval we're playing from would be zero to start, but we're really going to use the pausedAt time from our engine, and the audio time we're playing at is our startTime. So now we loop over all the tracks, each prepared to play at that specific moment in time, so they'll all start at the exact same moment. Then we set playStartTime to Date's timeIntervalSinceReferenceDate minus our pausedAt interval, and finally we set isPlaying to true.

Let's add our pause function. We set pausedAt to Date's timeIntervalSinceReferenceDate minus our playStartTime. Then we go through all of our tracks and call track.player.pause(), and set isPlaying to false. Okay.
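Putting the pieces together, startSynchronizedPlayback and pause might look like this sketch:

```swift
func startSynchronizedPlayback() {
    guard isLoaded else { return }

    let now = AVAudioTime.now()
    let sampleRate = 44100.0
    let sampleTime = now.sampleTime + Int64(0.1 * sampleRate)
    let startTime = AVAudioTime(sampleTime: sampleTime, atRate: sampleRate)

    // Every player targets the same future start time, so playback
    // begins on the same sample for all tracks. Resuming picks up
    // from wherever we previously paused.
    for track in tracks {
        track.player.play(from: pausedAt, at: startTime)
    }

    playStartTime = Date.timeIntervalSinceReferenceDate - pausedAt
    isPlaying = true
}

func pause() {
    // Remember how far in we were, so the next play resumes from here.
    pausedAt = Date.timeIntervalSinceReferenceDate - playStartTime
    for track in tracks {
        track.player.pause()
    }
    isPlaying = false
}
```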

So now, as I mentioned, we're going to want to set the volume and mute for each track, because this is going to be like a digital audio workstation: if you've ever used GarageBand or Logic or something like that, you can set the volume for a given track. At first I'll take the index of the track and a floating point value. We make sure the index is less than tracks.count, else return, and we could also make sure the index is greater than or equal to zero, so we have a valid index for the track. Then we set that track's fader gain to the value, and this probably needs to be an AUValue wrapping the value.

Now, it's worth noting that because our tracks are classes, we can actually just pass the track in here; it really depends on how you model it. I think I do want to just pass the track, because that simplifies things: we no longer need to check the index, we can just say track.fader.gain like this.

We can do the same thing for toggling mute. We could add some state for this, or we can just use the fact that if the fader's gain is zero, then it's muted. So we'll grab the track's fader, and if its gain is greater than zero, we set it to zero; otherwise, we set it back to its default value. Let's do it like that. Okay.

We're also going to need a way to get the volume for an AudioTrack, and this is going to be track.fader.gain. Do we want this to be a Float? Yeah, which means we'll wrap it. This will be useful as we build out the UI; we're just adding all the functionality that our UI is going to need. Okay, let's also add some MARK sections here: one for volume and mute, and one for progress tracking.
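A sketch of the volume and mute section; the 0.5 "default" gain is an assumption matching the fader's starting value:

```swift
// MARK: - Volume & Mute

func setVolume(for track: AudioTrack, to value: Float) {
    // Fader gain is an AUValue (a Float typealias under the hood).
    track.fader.gain = AUValue(value)
}

func toggleMute(for track: AudioTrack) {
    // No separate mute flag: a gain of zero means muted. 0.5 is
    // assumed as the default to restore, matching the starting gain.
    let fader = track.fader
    fader.gain = fader.gain > 0 ? 0 : 0.5
}

func volume(for track: AudioTrack) -> Float {
    Float(track.fader.gain)
}
```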

Okay. For the last part of this, we're going to need the duration of our audio file. This is a TimeInterval: we grab the first track and return its player.duration, which in our case is the file's duration, because the player was created from an audio file. Then updateProgress checks whether we're playing, gets the elapsed time (timeIntervalSinceReferenceDate minus playStartTime), and sets the playback progress to the minimum of elapsed time divided by duration and 1.0. This means we can't go over 1.0, and that's pretty good.

Now, we do have a problem here: if we don't have any tracks, duration returns zero, and then we'd divide by zero. And if the file itself were somehow zero-length, we'd have the same problem. So let's read the duration once into a local and then guard that it's greater than zero. I was going to fatalError with "duration was 0", but updateProgress can probably just return; we don't really need to crash if that happens. You can see both pieces sketched below.

Okay, so I think that's it for the audio engine. We've got multi-track audio with AudioKit, we've got sample-accurate synchronized playback, we've got individual track volume and mute controls, and we've got progress tracking so we can update the UI. In the next video, we're going to take a look at connecting this to some SwiftUI so we can actually see it in practice.
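For reference, a sketch of those final two pieces:

```swift
// MARK: - Progress Tracking

var duration: TimeInterval {
    // All stems are the same length, so the first track's file
    // duration stands in for the whole song; zero if nothing is loaded.
    tracks.first?.player.duration ?? 0
}

func updateProgress() {
    guard isPlaying else { return }

    // Read the duration once, and bail out rather than divide by zero
    // when no tracks are loaded (or a file is somehow empty).
    let duration = self.duration
    guard duration > 0 else { return }

    let elapsed = Date.timeIntervalSinceReferenceDate - playStartTime
    playbackProgress = min(elapsed / duration, 1.0)
}
```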