
  • Meet the Packets: How audio travels into your browser

  • KATIE: Hello. Welcome back. So, I keep forgetting

  • to do this and I apologize. But the big announcement right now is that the swag is ready. But do

  • not go get swag now because we're about to have a really awesome talk by Sara Fecadu.

  • I asked Sara for a fun fact, and her fun fact was that she bakes a mean cookie, which

  • unfortunately we can't all indulge in. So, as a follow up question, I said what prompted

  • you to write this talk about an audio API. And she said, well, I had spent a year building

  • a checkout form and I just couldn't stand to look at it or think about it anymore and

  • I had to do something different. Which I think is something that literally all of us

  • can probably identify really strongly with. So, anyways, Sara is gonna come up and talk

  • to us about the audio API. So, give it up for Sara.

  • [ Applause ] SARA: Hello. See if I can get my computer

  • started here. Okay. Welcome to my talk, Meet the Packets. In case not everyone has realized,

  • it's a play on Meet the Parents. I spent a lot of time working on that.

  • [ Laughter ] Let's see here. One second. Gonna progress?

  • No. Okay. We're gonna do it without the clicker. So, this will be interesting. As Katie said,

  • my name... oh. My whole slide deck isn't progressing. Okay. One second. There we go. Okay. Thank

  • you for coming to my talk. As Katie said, my name is Sara Fecadu. I am from Seattle, Washington.

  • And I don't have a ton of hobbies besides making cookies and listening to a lot of podcasts.

  • And by day I'm a software developer at Nordstrom. And Nordstrom is a clothing retailer founded

  • in 1901. While people don't usually associate 100-year-old companies with tech, we have

  • a thriving tech org working on innovative ways to get you what you need and feel your

  • best. And a year ago I was hired on to do a rewrite of Nordstrom.com's checkout. And as

  • of last May, we have been taking 100% of customer orders. Now, why am I talking about audio

  • streaming? Katie may have taken my joke here, but the answer is: Form fields. Our checkout

  • UI has 22 form fields. And they come in different groupings for different reasons. But many

  • of my waking moments over the past year have been spent thinking about these form fields.

  • And I just wanted to do anything else. So, I was sitting on my couch one night reading

  • a book on packet analysis, like one does, and watching a YouTube video. And I thought

  • to myself, how does that work? Like, on the packet level, how does audio video streaming

  • work? So, to answer the larger question, I started small with: What is audio streaming?

  • And audio streaming is the act of sending audio files over the network. And this talk

  • will be about on demand audio streaming. Now, the major difference between on demand streaming

  • and live streaming, is with on demand streaming we need all of the packets to get across the

  • wire. Whereas with live streaming, you may be more interested in keeping up with

  • the event and a certain amount of packet loss is acceptable. Over the past few months, I

  • learned that audio streaming, even when limited to on demand, is as wide a subject as it is

  • deep. I have picked three topics that exemplify what audio streaming is, why it's hard, and

  • how to get started yourself. And we will talk about audio streaming protocols, TCP congestion

  • control and client players. Audio streaming protocols give us a standard for how to encode, segment,

  • and ship your audio to the client. TCP congestion control handles congestion on the TCP layer

  • of the stack. And it is relevant with on demand audio streaming because we're shipping larger

  • audio files and we need every single packet to make its way to the client to play audio.

  • A client player is any network connected device with a play and pause button. So, this could

  • be your phone, your TV, your laptop, et cetera. And client players not only allow us to play

  • our audio, but when paired with modern audio streaming protocols, they hold a lot of decision

  • making power. Well, audio streaming protocols are the heart of audio streaming. And today

  • we'll talk about adaptive bitrate streaming, its benefits, and how to convert your

  • own audio files to work with two popular audio streaming protocols. Before we get started,

  • I wanted to go over some terms that will come up. A codec encodes data and uses compression

  • techniques to get the highest quality for the smallest footprint. Encoding and transcoding

  • both convert a file from one type to another: transcoding converts from one digital format to another,

  • while encoding converts from analog to digital. Bitrate is how many bits it takes to

  • encode a second of audio. And this number usually refers to the quality of the audio

  • file. When I think of playing music on the Internet, I think of an HTML5 audio tag with

  • a source attribute set to the path of my audio file. And this is a perfectly reasonable way

  • to do it. You can request and receive a single file containing an entire song. And it would

  • be referred to as progressive streaming and the major benefit here is you only have one

  • file to deal with. But let's say, for instance, you have a user and they have a slow network

  • connection and they can't download your one file. They're stuck. So, adaptive bitrate

  • streaming aims to solve this problem by encoding your audio in multiple bitrates and allowing

  • the client player to decide which quality is best for the user to listen to your audio

  • uninterrupted. This allows more users to access your audio. But it does add a layer of operational

  • complexity because now you've got a lot more moving parts. The audio streaming

  • protocols we'll talk about not only leverage adaptive bitrate streaming, but also use HTTP

  • web servers. They do this by encoding the file, segmenting it, placing the segments on

  • a web server, and then, once requested, partial audio files are sent to the client one at

  • a time. Here is the secret to our modern audio streaming protocols: it's more of a series

  • of downloads than it really is a stream. But we'll refer to it as streaming anyway. The

  • two most popular audio streaming protocols today are HTTP Live Streaming, or HLS, and

  • Dynamic Adaptive Streaming over HTTP, or MPEG DASH. HLS was created by Apple to support streaming

  • to mobile devices and it is the default on all macOS and Apple devices. And MPEG DASH was

  • a direct alternative to HLS. It was created by the MPEG standards group, which wanted to make MPEG DASH the

  • international streaming standard. Let's look at them side by side. HLS takes MP3, AAC, AC-3,

  • or EC-3 files and segments them into transport stream files. Those segmented files are listed in a play

  • list. If you have multiple bitrate streams, each stream will be in a media play list and

  • all of your media play lists will be in a master play list. MPEG DASH, by contrast, is codec agnostic;

  • in theory you can convert any file type into MPEG DASH. The audio will be segmented into fragmented MP4

  • files. Those will be described in an XML manifest file called a media presentation description.

  • Okay. We've talked about what files will be used and what they'll be segmented into, but

  • how do you get it there? You've got this audio file. What tools allow you to convert the

  • audio file? Well, you've got options. But most of these options are paid options. Except

  • for FFmpeg, which is an open source command line tool that, among other things, allows you

  • to convert audio files to HLS or MPEG DASH. However, I found the learning curve for FFmpeg

  • to be pretty steep. And a lot of the documentation for HLS and MPEG DASH was for video streams.
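
For the curious, here is a hedged sketch of what the FFmpeg route can look like for a single bitrate. The file names, bitrate, and segment length are made up for illustration, and a real multi-bitrate setup needs additional flags:

    # One 64k AAC rendition packaged for HLS (illustrative values only)
    ffmpeg -i input.wav -c:a aac -b:a 64k \
      -hls_time 10 -hls_playlist_type vod \
      -hls_segment_filename "audio_64k_%05d.ts" audio_64k.m3u8

    # Roughly the same audio packaged for MPEG DASH instead
    ffmpeg -i input.wav -c:a aac -b:a 64k -f dash stream.mpd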

  • Instead I used Amazon Elastic Transcoder. It's an AWS offering that converts files of

  • one type to another. In our case, we're taking an audio file and converting it to be used

  • with HLS and MPEG DASH. It's pretty much plug and play. You tell Amazon Elastic Transcoder

  • what type of files you have and what type of files you want and it outputs the stream

  • for you. And even though it's easy to use, it's not a free service. So, if you were going

  • to be converting a lot of files, it may be worth your time to learn more about an open

  • source alternative like FFmpeg. My workflow when working with Amazon Elastic Transcoder

  • was to upload my audio file to an S3 bucket, AWS's object store. I told Amazon Elastic Transcoder where my audio file

  • was and what settings I needed it to convert my audio files to. And Amazon Elastic Transcoder

  • output my streams into that same S3 bucket. And I downloaded them for us to explore. This

  • is the basic set of files you would get with an HLS stream. And it kind of looks like a

  • lot. But we're going to break it down into four groups. In the top left, the master play

  • list. In our case, we have two bitrate streams represented and they will be linked out from

  • the master play list. And then in the top right you'll see those media play lists which

  • have each bitrate stream. And those will contain all of our links to our transport stream files

  • which are the fragmented audio files represented in both the bottom left and the bottom right.
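
Roughly, and with made-up file names and bitrates (the second stream's bitrate is not given in the talk), that set of files breaks down like this:

    master.m3u8               master play list
    audio_64k.m3u8            media play list for the 64K bitrate stream
    audio_160k.m3u8           media play list for the second bitrate stream
    audio_64k_00000.ts ...    segmented transport stream files, 64K stream
    audio_160k_00000.ts ...   segmented transport stream files, second stream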

  • On the bottom right we have our 64K bitrate stream segmented audio files. And in the bottom,

  • oh. Did I get that backwards? I'm not really good at right and left. But in the bottom

  • section you'll have your fragmented audio files. We'll take a closer look at those so

  • you can see really what's in it. This is the entirety of the HLS master play list. It contains

  • information about the specific bitrate streams and links out to those media play lists that

  • represent the streams themselves. Let's look at the 64K bitrate stream media playlist.
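
As a rough, hedged illustration (the bitrates, durations, and file names here are made up rather than copied from the talk's slides), a two-stream master play list and the start of the 64K media play list it links to can look something like this:

    # master play list (illustrative)
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.2"
    audio_64k.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=160000,CODECS="mp4a.40.2"
    audio_160k.m3u8

    # 64K media play list (illustrative)
    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.0,
    audio_64k_00000.ts
    #EXTINF:10.0,
    audio_64k_00001.ts
    #EXT-X-ENDLIST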

  • It has even more information about the stream including caching information, the target

  • duration of each segmented audio file, and most importantly, links out to our transport

  • streams. This is what one of those fragmented audio files looks like. And there's something

  • a little interesting going on here. If you'll notice, it's color coded and I kept trying

  • to figure out why. But then I realized a transport stream has the file extension .ts. And something

  • else has the file extension .ts, TypeScript. Ignore the colors. It's just a binary coded

  • file. Now our MPEG DASH audio stream has fewer files and looks more manageable. But it's

  • similar. We have our media presentation description, which is an XML manifest file which contains

  • all of our information about the stream. Then below we have our two segmented audio files.

  • All of the segments for a stream are encapsulated in a single file, but within that file there are still segments.

  • That's why there are fewer files in the MPEG DASH audio stream than in the other audio

  • stream. Let's look at the media presentation description. There's a lot of stuff here, but there are three important

  • elements. Each bitrate stream is represented in a representation tag. And then all bitrate

  • streams are enclosed in an adaptation set. Within the representation tag, we do have

  • our URL to our audio files. And taking a look at one of those audio files we'll see it looks

  • fairly similar to the segmented audio file we saw with HLS. Minus the color coding because

  • it's an .mp4 versus a .ts. Visual Studio is not confused in this case.
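
As a hedged sketch only (element values, file names, and bitrates are made up, and a real media presentation description generated by a transcoder carries much more detail), the three elements just described fit together roughly like this:

    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
      <Period>
        <AdaptationSet contentType="audio">
          <Representation id="64k" bandwidth="64000" codecs="mp4a.40.2">
            <BaseURL>audio_64k.mp4</BaseURL>
          </Representation>
          <Representation id="160k" bandwidth="160000" codecs="mp4a.40.2">
            <BaseURL>audio_160k.mp4</BaseURL>
          </Representation>
        </AdaptationSet>
      </Period>
    </MPD>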

  • Earlier we talked about progressive streaming which is streaming an entire audio file in

  • one go. We used an audio element and a source attribute with the path of our audio file.
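
As a minimal sketch (the file paths are made up), that looks roughly like the following, with the HLS and MPEG DASH variants, described next, shown as comments. Safari can play HLS natively, while other browsers typically need a small player library such as hls.js, or dash.js for MPEG DASH:

    // Progressive streaming: the source is the path to the whole audio file.
    // (Paths here are illustrative, not from the talk.)
    const audio = new Audio("/audio/my-song.mp3");
    audio.play(); // browsers may require a user gesture before playback starts

    // With HLS, the source becomes the master play list instead:
    //   audio.src = "/audio/my-song/master.m3u8";
    // With MPEG DASH, a player library is pointed at the manifest, e.g. dash.js:
    //   dashjs.MediaPlayer().create().initialize(audio, "/audio/my-song/stream.mpd", true);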

  • With MPEG DASH and HLS, it's very similar. But instead of having the path to our audio

  • file, we have the path to the master play list for HLS or media presentation description

  • for MPEG DASH. We're going to take a hard left here and we're gonna talk about the second

  • topic in my talk. Which is TCP congestion control. And TCP is a transport layer protocol

  • and it has mechanisms in both its sender and receiver which are defined by the operating

  • systems of each to react to and hopefully avoid congestion when sending packets over

  • the wire. And they are called TCP congestion control. And today we talk about packet loss

  • congestion control and why it isn't so great. And more specifically, the congestion window and

  • duplicate acknowledgments in packet loss based congestion control. Before we get started,

  • some more terms: bandwidth is the rate at which data can be sent. And throughput is

  • the rate at which data can be received. The congestion window is a TCP variable that defines

  • the amount of data that can be sent before the acknowledgment is received by the sender.
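
As a toy model only (real congestion control lives inside the operating system's TCP implementation and is far more involved), the congestion window can be pictured as a cap on how many unacknowledged bytes may be in flight, growing as acknowledgments come back and shrinking when loss is detected, which is the behavior described below:

    // Toy sketch of a packet-loss-based congestion window (illustrative only).
    class ToySender {
      private cwnd = 10 * 1460;  // congestion window in bytes (made-up starting value)
      private inFlight = 0;      // bytes sent but not yet acknowledged

      canSend(bytes: number): boolean {
        // The congestion window caps how much unacknowledged data may be in flight.
        return this.inFlight + bytes <= this.cwnd;
      }

      send(bytes: number): void {
        if (this.canSend(bytes)) this.inFlight += bytes;
      }

      onAck(bytes: number): void {
        // Acknowledgments free up the window and let it grow.
        this.inFlight -= bytes;
        this.cwnd += 1460;
      }

      onLoss(): void {
        // Packet loss is treated as a sign of congestion, so the window shrinks.
        this.cwnd = Math.max(1460, Math.floor(this.cwnd / 2));
      }
    }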

  • Let's say you have a user who has requested your audio file from the server. Your audio

  • packets travel down the network stack, across the physical layer, up the data link layer

  • and the network layer, and arrive at the transport layer, and unfortunately there's congestion

  • right before we reach our destination. Now, traffic congestion and network congestion

  • have very similar beginnings. Either too many cars or too many packets have entered the

  • roadway and there's nowhere for them to go. With traffic, you have to wait it out. Luckily

  • for us, TCP congestion control allows them to flow over the wire, even during congestion.

  • And before we get to the specifics of these TCP congestion control algorithms, let's talk

  • about the TCP happy path. We're going to start with a single packet sent from the sender

  • to the receiver, flowing through the receiver's buffer, being acknowledged by the receiver,

  • and having an acknowledgment packet sent back to the requester. We talked about the congestion

  • window, the amount of data that can be sent before a sender receives an acknowledgment. Another way of

  • thinking about the congestion window is as a sending rate. As the sender receives acknowledgments,

  • the congestion window grows. And as the receiver's buffers fill and they drop all excess packets,

  • the sender responds by shrinking the congestion window. A second way of thinking about the

  • congestion window is as a bucket. And as packet loss occurs, the bucket shrinks. And as acknowledgments

  • are received by the sender, the bucket grows. There's a slight oversight in the bucket explanation