Youtube Audio Reverse Engineering

Brandon Drury —  January 5, 2011 — 2 Comments

The mission: Figure out the best possible settings for maintaining ultra audio quality on Youtube through a variety of experiments.

Possible Factors: RMS Level, Peak Level, Video Resolution, 20Khz sine wave addition, and probably more.

What Am I Up To?

Source Files
Here are the original mixes in 44.1Khz/16-bit as well as the original Test #1 video in .mp4 format before Youtube got a hold of them. Download Source Files

Warnings
- I seem to have made the intro a bit quiet. It’s 3 seconds long or something. DO NOT crank your studio to hear that because when the drums kick in it blows you all to hell.
- You may need to rethink your views on “dynamics” a bit in regard to the Loudness War because of my initial warning.
- Sometimes the Youtube player doesn’t load all the videos. I presume this is due to me having more videos than I’m supposed to on this page. If there is a big white box, hit refresh.

Question #1: Does the peak volume of the audio file being uploaded have an effect on what Youtube does to it?

Hunches: My guess is there is a sweet spot where Youtube won’t do as much damage to an audio file in terms of peak level, but I suspect they will increase the volume of Test #3, the -12dB peak file, so that it’s closer to Test #1, which is at full volume.

Youtube Reverse Audio Engineering Test #1
Audio mixed at -8dB RMS, -0.1dB peak, 320k AAC, uploaded at 1080p

Youtube Reverse Audio Engineering Test #2
Audio mixed at -8dB RMS, -6dB peak, 320k AAC, uploaded at 1080p

Youtube Reverse Audio Engineering Test #3
Audio mixed at -8dB RMS, -12dB peak, 320k AAC, uploaded at 1080p

Conclusion For Question #1
When I listen back in 1080p, it seems that Youtube really hasn’t done much. I’m not listening on my monitors and probably could be listening more critically, but I don’t really hear any artifacts in Test #1 that would show any compression, normalization, or processing of any kind.

Also, when listening at 1080p, it seems the volume levels have more-or-less been left intact. I don’t have any way to meter the way I would in Cubase 5, but Test #3 is definitely lower in level than Test #2, which is definitely lower in level than Test #1.
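For anyone without a DAW meter handy, peak and RMS levels can be estimated with a few lines of code. The following is a minimal Python sketch, not the metering used in these tests. It assumes the audio has already been decoded to floats between -1.0 and 1.0 (decoding is left out), and the function names are made up for illustration. Note that some meters define RMS so that a full-scale sine reads 0dB, which is about 3dB higher than the plain mathematical RMS computed here.

```python
import math

def to_dbfs(value):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return float("-inf") if value <= 0 else 20 * math.log10(value)

def peak_and_rms_dbfs(samples):
    """Return (peak_dBFS, rms_dBFS) for samples in the range -1.0..1.0."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)

# Stand-in signal (an assumption, not one of the test mixes):
# one second of a 1 kHz sine at half of full scale.
rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
peak_db, rms_db = peak_and_rms_dbfs(tone)
print(f"peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")
```

Running the same function on the source file and on audio extracted from the Youtube version would give numbers to compare instead of relying on ears alone.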

When listening at standard 360p, the audio has DEFINITELY been mp3 munched, but I’m not hearing any deviations from the source files.

Summary For Test #1
Youtube respects your decisions in level when it comes to music mixed at -8dB RMS no matter if it’s -0.1dB, -6dB, or -12dB with files uploaded at 1080p.

Question #2: Do the conclusions from Test #1 hold up with files mixed @ -12dB RMS?

Youtube Reverse Audio Engineering Test #4
Audio mixed at -12dB RMS, -0.1dB peak, 320k AAC, uploaded at 1080p

Youtube Reverse Audio Engineering Test #5
Audio mixed at -12dB RMS, -6dB peak, 320k AAC, uploaded at 1080p

Youtube Reverse Audio Engineering Test #6
Audio mixed at -12dB RMS, -12dB peak, 320k AAC, uploaded at 1080p

Conclusion For Question #2
Again, it appears that Youtube leaves the levels to us. It’s not going to help us if the levels are low. That’s GREAT for us audio guys….not so great for video guys who may not have as much control over their levels.

At the moment, I think it’s safe to say that there is no reason to come up with any wild tricks to upload music. While we’ll test it anyway, I can’t imagine what benefit blending in a 20Khz sine wave would have when uploading a 1080p file.

Question #3: Does Youtube do more MP3 Munching to lower quality video uploads?

We are going back to Test #1 (-8dB RMS, -0.1dB peak) @ 320k AAC, but I’m rendering the file at 720p and 360p, keeping the audio at 320k AAC. This test is designed to see if people with SD cameras can still get good quality in the audio department.

Youtube Reverse Audio Engineering Test #7
Audio mixed at -8dB RMS, -0.1dB peak, 320k AAC, uploaded at 720p

Youtube Reverse Audio Engineering Test #8
Audio mixed at -8dB RMS, -0.1dB peak, 320k AAC, uploaded at 360p

Conclusion For Question #3
First off, I want to point out that the file size did not decrease with the smaller resolutions. I’m not sure how that is possible. Either I goofed something up or the fact that I’m using vector graphics instead of real photorealistic images was a factor. So if you are doing this type of graphic work, go ahead and keep it at 1080p.
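One possible explanation for the identical file sizes, not verified against the actual render settings, is that every render used the same target bitrate. With a fixed bitrate, file size depends only on bitrate and duration, not on resolution. The sketch below shows the arithmetic in Python. The 2000 kbps video bitrate and the 60 second duration are made-up numbers for illustration, and the function name is hypothetical.

```python
def estimated_size_mb(video_kbps, audio_kbps, seconds):
    """Approximate file size in megabytes for a constant-bitrate encode."""
    total_bits = (video_kbps + audio_kbps) * 1000 * seconds
    return total_bits / 8 / 1_000_000

# Hypothetical settings: 2000 kbps video, 320 kbps audio, 60 second clip.
# Resolution never enters the calculation, so all three sizes match.
for label in ("1080p", "720p", "360p"):
    print(label, estimated_size_mb(2000, 320, 60), "MB")
```

If the encoder had instead been set to a constant quality level, the smaller resolutions would normally come out smaller.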

When listening/watching @ 360p, I don’t really hear a difference between Test #1 and Test #3.

Summary: If you want to be in the high-quality audio end, you’ve got to upload @ 1080p. In order for your users to hear the high quality audio, they must be watching the 1080p version. 720p does offer significant audio improvement over 360p. I do believe I hear a difference between 1080p audio and 720p audio, but this is nowhere near as large a gap as the one between 720p quality and 360p quality.

Question #4: If I’m stuck with a 360p video, is there any point in keeping the audio at 320k?

For Test #9 I’m uploading a 360p video file the same as Test #8, but I’ve dropped the quality down to 128k. We’ll see if it sounds any different than the 320k audio file version.

Youtube Reverse Audio Engineering Test #9
Audio mixed at -8dB RMS, -0.1dB peak, 128k AAC, uploaded at 360p

Conclusion For Question #4
I’m not hearing a significant difference between Test #8 and Test #9. So if you are uploading a 360p video, don’t bother going any higher than 128k on the encode.

Question #5: Is there anything to gain from blending in a 20Khz sine wave with the audio mix?

Let me explain my methodology. I took the finished mix that you’ve heard a million times now and inserted it into a new stereo track in Cubase. I then created a new track with the Cubase signal generator thingy found in the Tools menu of the VST plugins. I set it to output a 20Khz sine wave at -0.1dB, matching the peak level of the mix. I then linked their faders and pulled them down until they peaked at -0.1dB on the stereo bus.

I’m not sure if this is the “correct” way of doing it, but the rumors have been quite vague.
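For anyone who wants to repeat the blend outside of Cubase, here is a rough Python sketch of the same idea: generate a sine at the mix’s peak level, sum the two, then apply one shared gain (the “linked faders”) so the sum peaks at -0.1dB. This is an approximation of the steps described above, not an export of the actual session. The function names are hypothetical and the 440Hz tone is only a stand-in for the real mix.

```python
import math

def db_to_linear(db):
    """Convert a dB value to a linear gain factor."""
    return 10 ** (db / 20)

def blend_with_sine(mix, rate, freq_hz=20000.0, target_peak_db=-0.1):
    """Sum a mix with an equal-peak sine, then scale the sum to the target peak."""
    mix_peak = max(abs(s) for s in mix)
    # Sine generated at the same peak level as the mix (the "matching" step).
    sine = [mix_peak * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(len(mix))]
    summed = [m + s for m, s in zip(mix, sine)]
    # Linked faders: one common gain applied to both signals at once.
    gain = db_to_linear(target_peak_db) / max(abs(s) for s in summed)
    return [s * gain for s in summed]

# Stand-in "mix": a tenth of a second of a 440 Hz tone at 0.9 of full scale.
rate = 44100
mix = [0.9 * math.sin(2 * math.pi * 440 * n / rate) for n in range(4410)]
blended = blend_with_sine(mix, rate)
print(max(abs(s) for s in blended))
```

Because both signals share the gain, the mix itself ends up noticeably quieter than in Test #1, which is worth keeping in mind when comparing the two by ear.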

Youtube Reverse Audio Engineering Test #10
Audio mixed at -8dB RMS, -0.1dB peak, 320k AAC, uploaded at 1080p, 50/50 blend of Mix and 20Khz Sine Wave

Is it better to upload a 44.1Khz file or a 48Khz file?
Youtube Reverse Audio Engineering Test #12 (Don’t ask what happened to test #11) 
Audio mixed at -8dB RMS, -0.1dB peak, 320k AAC, uploaded at 1080p, rendered to 44.1Khz in video editing software.

Many video editing programs are really in love with this 48K sample rate thing. I can’t imagine a single good reason why, but what else is new. I do know that resampling audio is NEVER a good idea and the people that are tracking to high sample rates are almost always summing into analog before going into 44.1Khz or 88.2Khz.

Conclusion For Test #5
Oh, I don’t know. On the computer speakers I can’t tell a difference on this one, either. At 1080p it sounds about the same. I’m not sure if there are any repercussions on the video end for using a 44.1Khz sample rate (probably), but the sonics seem about the same here.

Youtube Audio Conclusions

  1. Youtube doesn’t care about your peak volume
  2. Youtube doesn’t care about your RMS volume
  3. Youtube doesn’t do anything special when you blend in a 20Khz sine wave
  4. Youtube ONLY saves their best quality for the 1080p uploads with 720p videos coming in a close second.
  5. 360p videos are automatically reduced to something resembling 128k mp3s no matter what they were originally uploaded at.

Be Careful What Your Video Editing Software Is Doing
After further investigation, I now realize that the clipping and oddball compression I was hearing in some of my previous videos was done by my cheapo video editing software. This stuff resembled something you could purchase at Best Buy. (That’s about like imagining your mother-in-law naked.) It came with the Kodak Zi8 I was using there for a while.

It’s highly recommended that you deal with your audio in an AUDIO program and your video in a VIDEO program. While I’m still a total idiot when it comes to video (and life in general) I’ve not seen the tools I feel are necessary to get my audio in tip top shape in any video editing application.

Saved Comments

tb-av – 02-11-2011, 05:46 PM
Well let me put it this way. My headphones were sitting on my desk and plugged into my laptop at levels where I can listen to most anything on YT. When that first vid came on I had to grab the vol control. After I got it under control and actually tried to listen to the clips, I had to have my vol control on the “next to off” position. Every one of yours was that way but most other stuff on YT is not.

This screen cap is a few notches up and absolutely as loud as I would want to listen to that. Most things on YT I would be able to run maybe 3/4 or more.

Hmmm. This shouldn’t have showed up yet.
The magic of the InterWeb

onlinemusic – 02-17-2011, 04:47 PM
Thanks for the research. Valuable stuff.

One way to quantitatively compare the results would be to download the resultant Youtube vids, extract the audio using MPEG Streamclip (free and awesome video/audio converter from Squared 5 – MPEG Streamclip video converter for Mac and Windows), then bring the audio into Cubase for analysis.

Another analytical tool is MediaInfo from MediaInfo
MediaInfo supplies technical and tag information about a video or audio file. You can just drag a file onto the GUI, and it will tell you more than you want to know about the codecs, compression parameters, bitrates, etc.

Mackanov – 02-17-2011, 06:11 PM
The probable reason your videos were the same size at 720p and 360p is because they were encoded at the same bit rate (for instance 2Mbps) instead of the same quality level. Sounds plausible?

Oh, what about 480p?

Michial – 02-18-2011, 11:00 PM
I PMd you and as I don’t know how that works for sure on here I will mention that I am also TheMichGuff who commented on your latest video. I’m using Reaper and when I render I use a wave for Youtube but I don’t actually remember whether it was 8bit or 16. Whether this is because of Moviemaker 5.6 or because of what Youtube accepts from me I don’t know. I have tried both MP3s and wave and had the best sound with wave. I just mixed to what sounded good at the time to me, rendered to wave, added the wave to Moviemaker, did a quick scratch video and uploaded. When the weather gets nice again I will be concentrating on shooting better video. The weather here right now is -31C and w/ windchill -48.

Michial – 02-19-2011, 12:01 AM
Thank you onlinemusic. I downloaded MediaInfo and it works as you said. According to it I was uploading 16 bit audio. I also discovered that I have dbpoweramp installed (Release13.5) and if you hover your cursor over the title of your video or audio file on your computer it displays some good info, although the two are in disagreement over what version of MovieMaker was used. Anyways I’m interested to learn more about all this from everyone. Thanks in advance.

untitled001 – 02-22-2011, 02:39 PM
I think the beard looks good..

Now I’m going to read/watch the rest of the article.

ChristopherW – 02-22-2011, 03:06 PM
Surely a more definitive way to tell if YouTube is messing (significantly) with the audio would be to sample playback, time align and phase invert both the channels on the recording… If you were concerned about internal noise introduced in the analogue sampling on playback, you could always go and download the raw FLV from YouTube using one of the many ripper tools available, then extract the AAC soundtrack from the file with a tool like FLVExtract. Then you’d have a bitperfect AAC file, as (re)encoded by YouTube. Comparatively simple to drag into your DAW and compare that way.
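The null test described in this paragraph can be sketched in Python as follows. This is only an illustration under simplifying assumptions: both signals are mono lists of floats that have already been decoded, the delay is a whole number of samples, and the alignment is a brute-force cross-correlation over a small lag range. The function names and the random test signal are hypothetical. A lossy codec changes the waveform itself, so a real Youtube download would not null completely. The size of the residual is the interesting number.

```python
import math
import random

def best_offset(reference, candidate, max_lag):
    """Find how many samples candidate is delayed relative to reference."""
    n = min(len(reference), len(candidate)) - max_lag
    def score(lag):
        return sum(reference[i] * candidate[i + lag] for i in range(n))
    return max(range(max_lag + 1), key=score)

def null_residual_db(reference, candidate, max_lag=64):
    """Time align, subtract (same as invert and sum), return (lag, residual dBFS)."""
    lag = best_offset(reference, candidate, max_lag)
    n = min(len(reference), len(candidate) - lag)
    diff = [reference[i] - candidate[i + lag] for i in range(n)]
    rms = math.sqrt(sum(d * d for d in diff) / n)
    return lag, (float("-inf") if rms == 0 else 20 * math.log10(rms))

# Stand-in data: noise as the "original", a 5-sample-late copy as the "download".
rng = random.Random(0)
original = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
downloaded = [0.0] * 5 + original
lag, residual = null_residual_db(original, downloaded)
print(lag, residual)
```

A residual of minus infinity means a perfect null. The closer the residual gets to the level of the original, the more the audio was changed.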

YouTube *will* leave your audio stream alone, as it will your video stream, if you follow a very specific set of parameters and encode it in a particular way. However that only applies to the highest quality audio & video and it will still reencode – and transcode – for all other formats.

You’ve also omitted a couple of other quite important “things-to-check” with this article:

1: music with lots of bass frequency energy (and from what I’ve heard, lots of synthesised basslines) is absolutely MURDERED by YouTube’s AAC codec. It really does it no favours whatsoever – bass freqs become an indistinct, rumbly mush even with 720p and 1080p uploads. However, music with acoustic / natural / amped bass guitar lines is equally affected. I have wondered for a while now whether they optimised their AAC codec to sacrifice bass and 10kHz+ quality to optimise encoding bitrates for the 1-10kHz bands (given that’s where most acoustic energy seems to be in the majority of crappy webcam and digital camera uploads using consumer gear).

Your audio clip (as heard on all of the test videos’ soundtracks) was very bass-light, and unless you’ve watched quite a few videos (e.g. dubstep or Drum & Bass videos) I wouldn’t have expected you to have already picked up on this. However it definitely merits further investigation.

2: another equally valid test would be to upload 20Hz-20kHz sine sweeps at a variety of bitrates. That way you could quickly hear if YouTube’s audio reencoding was doing anything destructive to shifts in frequencies or sound in particular frequency bands.
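A 20Hz-20kHz sweep like the one suggested here is easy to generate. The sketch below is one way to do it in Python with only the standard library: a logarithmic (exponential) sine sweep plus a helper that writes 16-bit mono WAV. The function names, the 10 second length, and the 0.5 peak level are arbitrary choices for illustration, not settings anyone in this thread used.

```python
import math
import struct
import wave

def log_sweep(f_start=20.0, f_end=20000.0, seconds=10.0, rate=44100, peak=0.5):
    """Return an exponential sine sweep from f_start to f_end as floats."""
    k = math.log(f_end / f_start)
    samples = []
    for n in range(int(seconds * rate)):
        t = n / rate
        # Phase is the integral of the instantaneous frequency
        # f(t) = f_start * exp(k * t / seconds).
        phase = 2 * math.pi * f_start * seconds / k * (math.exp(k * t / seconds) - 1)
        samples.append(peak * math.sin(phase))
    return samples

def write_wav_16bit(path, samples, rate=44100):
    """Write mono float samples (-1.0..1.0) as a 16-bit PCM WAV file."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(ints), *ints))

sweep = log_sweep(seconds=1.0)
print(len(sweep), max(abs(s) for s in sweep))
```

Calling write_wav_16bit("sweep.wav", log_sweep()) would produce a file that could be dropped under a video, uploaded, and then compared against whatever comes back.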

dudermn – 02-22-2011, 03:24 PM
Good article, and thanks for taking the time to do that. Though a more impressive way would be to bleed through ‘monitors’, record, do a spectrum analysis. Keep all the outputs identical and also don’t move anything in the studio and record the comparisons. It could just be that youtube has a few standards for audio streaming, and should just be seen as a host that streams.

On to the good stuff.

1. Youtube now supports 4k video(, who cares.)
2. They use H.264/MPEG-4 AVC.
3. Audio is AAC at 44.1 kHz with a range of 100kbps – 256kbps.
There are two articles that explain audio on youtube that may be useful, and may help discover a few more secrets :
1. Bigger and Better: Encoding for YouTube 720p HD
2. Approximate Youtube Bitrates » Ad Terras Per Aspera
8 . Youtube is believed not to support 1080p or 720p (here’s an article explaining in detail the issue Youtube’s 1080p – Failure Depends on How You Look At It | Trevor*Greenfield)
And youtube does have its own wikipedia page, which kinda helps to explain what’s going on….(though it is just another way to advertise.)
Have a good day mates. And great beard.

jazzroc – 02-23-2011, 04:45 AM
Originally Posted by ChristopherW
Surely a more definitive way to tell if YouTube is messing (significantly) with the audio would be to sample playback, time align and phase invert both the channels on the recording… If you were concerned about internal noise introduced in the analogue sampling on playback, you could always go and download the raw FLV from YouTube using one of the many ripper tools available, then extract the AAC soundtrack from the file with a tool like FLVExtract. Then you’d have a bitperfect AAC file, as (re)encoded by YouTube. Comparatively simple to drag into your DAW and compare that way.
YouTube *will* leave your audio stream alone, as it will your video stream, if you follow a very specific set of parameters and encode it in a particular way. However that only applies to the highest quality audio & video and it will still reencode – and transcode – for all other formats.
You’ve also omitted a couple of other quite important “things-to-check” with this article:
1: music with lots of bass frequency energy (and from what I’ve heard, lots of synthesised basslines) is absolutely MURDERED by YouTube’s AAC codec. It really does it no favours whatsoever – bass freqs become an indistinct, rumbly mush even with 720p and 1080p uploads. However, music with acoustic / natural / amped bass guitar lines is equally affected. I have wondered for a while now whether they optimised their AAC codec to sacrifice bass and 10kHz+ quality to optimise encoding bitrates for the 1-10kHz bands (given that’s where most acoustic energy seems to be in the majority of crappy webcam and digital camera uploads using consumer gear).
Your audio clip (as heard on all of the test videos’ soundtracks) was very bass-light, and unless you’ve watched quite a few videos (e.g. dubstep or Drum & Bass videos) I wouldn’t have expected you to have already picked up on this. However it definitely merits further investigation.
2: another equally valid test would be to upload 20Hz-20kHz sine sweeps at a variety of bitrates. That way you could quickly hear if YouTube’s audio reencoding was doing anything destructive to shifts in frequencies or sound in particular frequency bands.
I wholeheartedly endorse this.
Brandon, thanks for starting this enquiry.

headrheum – 02-23-2011, 05:48 AM
“Many video editing programs are really in love with this 48K sample rate thing. I can’t imagine a single good reason why, but what else is new”.

Have you heard about DVDs and BluRays and stuff?
They work with 48kHz Audio.
That’s why.

ChristopherW – 02-23-2011, 06:18 AM
Originally Posted by headrheum
“Many video editing programs are really in love with this 48K sample rate thing. I can’t imagine a single good reason why, but what else is new”.

Have you heard about DVDs and BluRays and stuff?
They work with 48kHz Audio.
That’s why.
As headrheum says, 48kHz was also chosen, I believe, partially because it’s a whole number samplerate – using 44.1kHz would lead to timesync drift and quantisation problems because the SMPTE clock would have to be constantly dividing by a fractional samplerate, which is never desirable.

Why the music industry on its own only deals with 44.1kHz is beyond me, it’s some kind of strange hangover from years gone by. 48kHz universally would make much more sense. Sony and Philips argued amongst themselves and came to a de facto standard which allows audio to 20kHz with a 2kHz transition band, but it still doesn’t explain the .1kHz. It smacks to me somewhat of obstinate developer syndrome.

UncleWaldo – 02-24-2011, 04:09 PM
Brandon, two things:

1. You’ve got a Google Slap going on. Whenever I try to enter the site I get the, “This site is going to harm your computer” window. You might want to check that out.

2. After listening to all of your examples I’ve come to my own conclusion: crap in – crap out. You know what you’re doing. We might hear subtle differences in distortion, but the vast majority of listeners will never catch it. Record it correctly in the first place and YouTube will provide a somewhat accurate playback of it.

Chadfish – 03-02-2011, 02:38 PM
I’ve noticed that before, anything 480p and above sounded “good” on YouTube. More recently I noticed that the “good” slot moved up to 720p, leaving the 480p stuff sounding dull. I hear no difference between 1080p and 720p. Most of my videos now go up at 1080p, and youtube makes multiple versions and that’s that. I gave up trying to annotate “Listen at 720p for better sound!” because it’s a waste. People who care know to select a higher resolution.

I have a Vimeo account and the audio sounds great on whatever. If I want someone to hear a video I embed the vimeo version into Facebook or a forum. I still post everything to youtube because that’s the popular way to spread your seed, but when I embed something on an audio forum, it’s Vimeo. I do mic shootouts, and even at its best youtube still takes something away. For something as subtle as this shootout, you need the best available:

tb-av – 03-02-2011, 06:33 PM
While I appreciate the work that had to go into making the test, it seems like an effort to create a solution for a problem that doesn’t exist.

If you RTM, youtube tells you to use 256K encodes so why start with 320K? Why put it in their hands.

Is YouTube really your choice to deliver the nuances of your audio to your intended listeners? Hopefully not.

Can you get really good sounding audio on YouTube and get your point across. Absolutely. Just use their settings and playback at 720p or higher. Here is one example… http://www.youtube.com/watch?v=RT9PQgMtkDo

If you need better than that shouldn’t you be considering something other than a free streaming service?

Brandon Drury

Brandon Drury quit counting at 1,200 recorded songs in his busy home recording studio. He is the creator of RecordingReview.com and is the author of the Killer Home Recording series.

2 responses to Youtube Audio Reverse Engineering

  1. Actually, 1080p has the exact same audio bitrate as 720p youtube videos. Only the video bitrate gets upped.

  2. funny… what happened with your conclusion 1 and 2 dude?

    all your videos have fallen in peak and rms drastically
