Saturday, July 15, 2017

The MSU-1 Volume Fiasco, Explained

If you've ever tried one of the many amazing SNES hacks that use the MSU-1 audio coprocessor, you may have run across volume-level information referencing "hardware" and "emulator" versions of the same hack, as well as "boosted" and "non-boosted" audio files, and been confused by the complicated and often conflicting advice about all of these variants.  I've explained the issue on several forums, but I wanted to do a single, unified write-up covering both the problem and the "correct" way to do things.  Hopefully this will help clear up the confusion for people new to the MSU-1.

First of all, before I dig into the full explanation, I'll cut to the chase.  THERE IS NO LONGER ANY NEED FOR SEPARATE VERSIONS.  You only need one version, and that same version works everywhere.  Many MSU-1 hack authors have fixed their hacks and released a single fixed version.  Since there are still a handful of un-fixed patches floating around, though, if you happen to have a patch with both "Hardware/SD2SNES" and "Emulator" versions, the correct combination is the "Hardware/SD2SNES" version of the patch with non-boosted audio files.  If you can't find non-boosted audio files, you can use boosted audio files with the "Emulator" version of the patch instead, but that will not sound as good (see Problem #3 below).  If you're not sure whether your audio files are boosted, try the Hardware/SD2SNES patch in an emulator.  If the audio sounds really loud or distorted, you have boosted audio files.

Also, if you have an older-revision SD2SNES, you'll need to go into the menu and set the MSU-1 audio boost to max (see Less Hacky Workaround #1 below).

Now, to explain the issue...

First, we had higan. Audio was mixed properly, and .pcm audio files were more-or-less properly normalized. Let's call this the correct patch and correct audio files.

Problem #1:

The SD2SNES played MSU-1 audio too quietly. There was a lot of speculation as to why, but eventually ikari realized that the DAC output was high impedance, and the SNES analog mixing inputs were low impedance, which caused reduced volume. This means the problem is in the hardware, and can't be fully fixed without a board revision.

Hacky Workaround #1:

SNES audio tends to be somewhere in the range of -10dBFS to -20dBFS RMS. This leaves a fair amount of headroom. Therefore, it's possible to simply amplify the .pcm files to remove that headroom, allowing them to be more or less the right volume when played on the SD2SNES. Let's call this boosted audio files.
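To make the headroom idea concrete, here is a minimal sketch of measuring an RMS level and applying a boost. It assumes 16-bit signed samples and uses a synthetic sine wave as a stand-in for real track data; the function names and the 10 dB figure are illustrative only, not taken from any actual conversion tool.

```python
import math

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit signed sample


def rms_dbfs(samples):
    """RMS level of 16-bit signed samples in dB relative to full scale.

    Convention assumed here: a full-scale square wave reads 0 dBFS,
    so a full-scale sine wave reads about -3 dBFS.
    """
    mean_square = sum((s / FULL_SCALE) ** 2 for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def apply_gain_db(samples, gain_db):
    """Scale samples by gain_db decibels.

    The caller must keep the gain within the available headroom;
    nothing here guards against overflow.
    """
    gain = 10 ** (gain_db / 20)
    return [round(s * gain) for s in samples]


# One second of a 440 Hz sine at a quarter of full scale (test signal only).
quiet = [round(8192 * math.sin(2 * math.pi * 440 * n / 44100))
         for n in range(44100)]

# Hypothetical boost amount, chosen so the peak still fits in 16 bits.
boosted = apply_gain_db(quiet, 10.0)

print(f"before: {rms_dbfs(quiet):.1f} dBFS RMS")
print(f"after:  {rms_dbfs(boosted):.1f} dBFS RMS")
```

The boost only works because the test signal peaks well below full scale; the same gain applied to a track with less headroom would overflow the 16-bit range.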

Problem #2:

These newly boosted audio files are much too loud when played on higan. Maintaining two separate audio packs is not only a logistical pain in the butt, it also greatly increases storage and bandwidth requirements.

Hacky Workaround #2:

Instead of releasing separate audio packs, just modify the .asm code so that any time you write to the MSU-1 volume register, you write a smaller value instead of the original. Patch files are small, so uploading two versions of the patch is much easier than uploading two versions of the audio files. Through rough trial and error, people mostly settled on $60 as the value for "full volume" and $30 for "half volume", with anything else, such as fade effects, adjusted relative to those values. Let's call this the "emulator version", and the original is now the "hardware/SD2SNES version". These are sometimes also referred to as the "FF version" (aka the hardware version) and the "60 version" (aka the emulator version). To reiterate, the hardware/FF version is the original, "correct" patch.
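The relationship between the two sets of volume values can be sketched as a simple linear rescale. This is only an illustration in Python: the real patches were hand-edited assembly, the values were found by trial and error rather than by formula, and the function name, the $80 "half volume" input, and the assumption that the volume register behaves linearly are all mine.

```python
import math

ORIGINAL_FULL = 0xFF  # full volume in the original ("FF") patch
EMULATOR_FULL = 0x60  # full volume in the "60" patch, per the text above


def emulator_volume(original):
    """Rescale a volume byte from the FF patch to the 60 patch's range."""
    if not 0 <= original <= ORIGINAL_FULL:
        raise ValueError("volume must fit in one byte")
    return round(original * EMULATOR_FULL / ORIGINAL_FULL)


# Attenuation implied by the rescale, assuming a linear volume register.
attenuation_db = 20 * math.log10(EMULATOR_FULL / ORIGINAL_FULL)

for value in (0xFF, 0x80, 0x40, 0x00):
    print(f"${value:02X} -> ${emulator_volume(value):02X}")
print(f"attenuation: {attenuation_db:.1f} dB")
```

Under that linear assumption, the "60" patch turns everything down by roughly 8.5 dB, which is the amount the boosted audio files would need to have been raised by for the two to cancel out.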

Less Hacky Workaround #1:

ikari realized that he could actually do this same audio boosting in firmware, in realtime, between reading the file and sending it to the DAC. This would allow using "correct audio" files and the "hardware/FF" patch, and still get more or less the correct audio levels. This essentially eliminated any need for Hacky Workaround #1 (HW#1), which in turn eliminated the need for Hacky Workaround #2 (HW#2).

Non-Hacky, Correct Fix #1:

SD2SNES Rev. H includes a unity-gain op-amp on the output of the MSU-1 DAC, which solves the impedance problem, fixing the volume level. Between this and Less Hacky Workaround #1 (the firmware boost), all revisions of the SD2SNES can now output the correct volume levels. HW#1 and HW#2 are completely unnecessary.

Now, technically, with HW#2, boosted audio and the "emulator/60" patch cancel each other out, resulting in the correct levels, so why not just use that version for everything? After all, a lot of patch creators were really annoyed at having to go to all the trouble of boosting their files for HW#1, along with re-uploading everything and writing new documentation, and they really didn't want to go through all of that again. Unfortunately, that leaves us with...

Problem #3:

Actually, this is several problems. First of all, most of the boosted audio files were actually peak normalized to 0dBFS. On the one hand, thankfully, this doesn't cause any clipping, but it does mean that the audio files are no longer normalized relative to each other. If you don't understand the difference between RMS and peak normalization, the ELI5 version is that with peak normalization, the ONLY THING that matters is the single loudest sample in the entire track.

Imagine you have two tracks: one is really loud all the way through, and the other is very quiet except for a loud cymbal crash at the end. Peak normalizing these two tracks to the same level means that the crash at the end will be as loud as the loudest point of the loud track, so if you listen to them side by side, the rest of the quiet track will be much quieter than the loud one. This is an extreme example, but if you've ever looked at a waveform visually, this is basically true of any track with a lot of really large "spikes" in volume versus tracks which are very "dense" and consistently the same volume. The "spiky" tracks will end up sounding much quieter. RMS normalization accounts for this by "averaging" the volume over time, which gives a better comparison between tracks. Long story short, peak normalized tracks are no longer properly normalized relative to each other.
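The two-track example can be checked numerically. The following is a toy sketch with made-up sample data (floating-point samples in the range -1.0 to 1.0, where 1.0 is full scale); the helper names and the -20 dBFS target are hypothetical and not taken from any particular audio tool.

```python
import math


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


def rms(samples):
    """Root-mean-square level of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_db(level):
    """Convert a linear level (1.0 = full scale) to dBFS."""
    return 20 * math.log10(level)


def peak_normalize(samples, target_peak=1.0):
    """Scale so the single loudest sample lands on target_peak."""
    gain = target_peak / peak(samples)
    return [s * gain for s in samples]


def rms_normalize(samples, target_db):
    """Scale so the RMS level lands on target_db (in dBFS)."""
    gain = 10 ** (target_db / 20) / rms(samples)
    return [s * gain for s in samples]


# Made-up tracks: one consistently loud, one quiet with a crash at the end.
dense = [0.5, -0.5] * 500
spiky = [0.2, -0.2] * 499 + [0.9, -0.9]

dense_peak, spiky_peak = peak_normalize(dense), peak_normalize(spiky)
dense_rms, spiky_rms = rms_normalize(dense, -20.0), rms_normalize(spiky, -20.0)

peak_gap = to_db(rms(dense_peak)) - to_db(rms(spiky_peak))
rms_gap = to_db(rms(dense_rms)) - to_db(rms(spiky_rms))

print(f"loudness gap after peak normalization: {peak_gap:.1f} dB")
print(f"loudness gap after RMS normalization:  {rms_gap:.1f} dB")
```

After peak normalization both tracks share the same loudest sample, yet their RMS levels are still many dB apart. After RMS normalization the gap disappears, at the cost of the two tracks having different peak levels.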

Now, that's assuming that the tracks were peak normalized to 0dBFS. Some people didn't do that. They just "cranked up the volume", which actually resulted in clipping, permanently damaging the files. The only way to fix them is to reconvert from the original source files.
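A small sketch shows why clipped files can't be repaired by turning them back down. It assumes 16-bit signed samples and a hard clamp at the limits of that range; the sample values and the gain of 2.0 are invented for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def amplify_16bit(samples, gain):
    """Apply a linear gain, clamping each result to the 16-bit range."""
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]


# One made-up cycle of a waveform that peaks at +/-24000.
original = [0, 8000, 16000, 24000, 16000, 8000, 0, -8000, -16000, -24000]

cranked = amplify_16bit(original, 2.0)   # 24000 * 2 does not fit in 16 bits
restored = amplify_16bit(cranked, 0.5)   # turning it back down afterwards

clipped = sum(1 for s in cranked if s in (INT16_MIN, INT16_MAX))
print(f"clipped samples: {clipped}")
print(f"original peak: {max(original)}, restored peak: {max(restored)}")
```

The samples that hit the limit have lost their original values, so halving the result gives back a flattened waveform rather than the original one.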

Also, some games don't actually have a single normalized level for their entire OST, and instead use a wide dynamic range. Super Metroid is probably the most extreme example I've found, but a lot of the JRPGs do as well. You completely lose this dynamic range in the boosted tracks, which really kills a lot of its impact (imagine the Arrival on Brinstar track being as loud as the Ridley fight... it's just wrong).

So, you could stick with HW#2, but it's ugly, and it's still wrong for several reasons. Thankfully, I've gotten a lot of people on board with understanding this and have been working to remove and replace boosted audio packs with properly normalized ones. It's been a bit of an uphill battle, it's not 100% complete, and in some instances I've just had to do the work myself, but the hardest part was convincing people that this was really the right way to do it, and on that front, at least, I've pretty much succeeded. This means no more need for separate patches or hacky workarounds. The ONLY workaround that needs to be mentioned is the MSU-1 boost option in the SD2SNES menu for older hardware revisions, OR the op-amp installation mod, which essentially upgrades the hardware to Rev. H. Then, all you need is the correct patch (aka the "hardware/FF" version) and the correctly-leveled audio tracks, and we can go back to (mostly) pretending that this whole fiasco never happened.
