The sound is different depending on whether I render a wav or an mp4

Is it normal that the sound is different depending on whether I render an audio wav or video mp4?

I have switched on a highpass filter to remove the unnecessary low tones.

Here is the frequency analysis of the rendered wav in Audacity - everything is correct and as desired:

Here is the frequency analysis of the mp4 video - there is suddenly a lot of audio garbage in the low range:

I wonder about this phenomenon. What is going wrong?

1 Like

Yay - problem solved!
After several attempts with all possible render formats:

To create clean audio without spurious, unwanted low frequencies at the bottom, you need to do the following in the render settings:

  1. More options
  2. Audio > Rate Control > CBR with a high bitrate of 384k: then everything is fine, as it should be.
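
For anyone who wants to reproduce this outside the Kdenlive dialog, here is a rough sketch of what that setting corresponds to as ffmpeg options - not what Kdenlive actually runs internally; the file names are placeholders and it assumes the video stream is already mp4-compatible (e.g. H.264) and the audio has not been through a lossy codec yet:

```python
# Rough equivalent of "AAC, Rate Control: CBR, 384k" as direct ffmpeg options.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "timeline_export.mkv",  # placeholder: a render with untouched (PCM) audio
        "-c:v", "copy",               # leave the video stream as-is
        "-c:a", "aac",                # ffmpeg's native AAC encoder
        "-b:a", "384k",               # the fixed, high target bitrate from the dialog
        "output.mp4",
    ],
    check=True,
)
```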

There are certainly other settings that are just as good.

It’s great that Kdenlive exists!

3 Likes

Thanks for sharing your solution!

Can you actually hear any of that “junk”? At -80 dB below 50 Hz you’d need pretty good headphones in a remarkably silent room …

CBR is almost never what you want with a modern codec unless you’re streaming down a strictly constrained pipe, and even then … - and 320kb/s for 2 channels was usually cargo cult overkill even for poorly performing codecs like mp3.

If you’re optimising for what a spectrum graph looks like and bit exactness, you’d get better results from something like flac. If you’re using a lossy codec then you’re optimising for what it sounds like. Looking at graphs is almost always going to be misleading in that case.

1 Like

Hello @Ron,
of course I don’t hear this acoustic garbage, fortunately.

What bothered me here is simply the fact that I use a highpass (= low-cut) filter to remove useless content from the audio signal. This works well with the Glame Highpass Filter.

And when I then render the video as mp4, frequencies appear that should have been cut off completely. I was very astonished. To check, I then rendered a wav, and there everything was good and clean, just as it should be.

At first I thought: what nonsense is Kdenlive doing here? I had to search for a long time before I realized that it was the AAC compression that was inadequate.

I’m not being critical here at all. I’m not looking for the ultimate setting; I’m happy with any setting that doesn’t add extra interference in the lows, regardless of whether it’s audible or not.

Hello Ron,
do you know which is better: CBR or VBR for audio? VBR surprisingly makes slightly larger files.

Who says that it’s “useless stuff”? If it’s not actually audible noise like power supply hum or something else undesirable you can hear, then it’s more likely to be high end (beat) harmonics of sounds you do want, and removing them is just going to distort the sound’s colour and timbre and depth and make it seem hollower to anyone listening with good ears and good quality equipment.

The most pleasant sounds are very rarely “clean” :slight_smile:

You don’t say what codec you’re using in WAV, so I’m assuming it’s most probably uncompressed 16-bit PCM - which means the reconstruction will be “perfect” - garbage in, garbage out. Whatever damage you did with filtering out parts of the available bandwidth will be perfectly retained in the reconstruction (but your listening equipment (and the environment around it!) is almost certainly still going to re-introduce sounds across the whole spectrum in the reproduction, Because Physics).

AAC, on the other hand, which is the usual default to pair with H.26x video in mp4 containers, is a lossy codec. That means much of its compression comes from “intelligently” discarding information that most people won’t be able to hear anyway, and reconstructing what it does keep to sound as imperceptibly different from the original as possible.

So there’s no sample plot you can look at to decide whether or not it did that well; the only way to know is to listen to it. And with most modern lossy codecs worth their salt, once you get above about 64 kb/s per channel, even people with really good ears and really good equipment become uncertain enough about whether they can actually hear a difference that the only way to be sure is blind ABX testing of what they hear.

There are even cases where it’s highly beneficial to add noise. Look into dithering to see how carefully selected “random” noise can greatly improve the perceptual quality of reconstructed audio and video - and can in fact even increase the real dynamic range well beyond what might naively be calculated from the number of bits per sample in the reconstruction.
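
To make that concrete, here is an illustrative sketch of the idea in plain NumPy (no audio I/O; the signal level and sample rate are just assumptions): adding a tiny amount of triangular (TPDF) noise before reducing the bit depth lets a tone smaller than one 16-bit step survive as a statistical average, instead of being rounded away to silence.

```python
import numpy as np

def quantise_16bit(x: np.ndarray, dither: bool = True) -> np.ndarray:
    """Quantise float samples in [-1, 1] to 16-bit integers, optionally with TPDF dither."""
    lsb = 1.0 / 32768.0  # size of one 16-bit quantisation step
    if dither:
        # TPDF dither spanning +/- 1 LSB: the sum of two independent uniform sources
        rng = np.random.default_rng(0)
        x = x + (rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)) * lsb
    return np.clip(np.round(x / lsb), -32768, 32767).astype(np.int16)

# A sine roughly 100 dB below full scale - smaller than one 16-bit step.
t = np.arange(48000) / 48000.0
quiet_sine = 1e-5 * np.sin(2 * np.pi * 440 * t)

print(quantise_16bit(quiet_sine, dither=False)[:10])  # plain rounding: all zeros, the tone is lost
print(quantise_16bit(quiet_sine, dither=True)[:10])   # dithered: values toggle, the tone survives as noise
```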

If you want to optimise for quality in compressed A/V: start by using the best available codec (which for audio would be Opus); don’t filter out anything unless you can actually hear it and wish you couldn’t; don’t use a higher bitrate than the point where increasing it still gives an audible improvement; and don’t scare yourself unnecessarily by trying to look at what is in the sausage to judge its quality. Just worry about how it actually tastes; it was the codec designer’s problem to ensure there’s nothing toxic in there, not the end user’s.

As for CBR vs VBR - about the only use case for CBR in a modern codec is streaming with very low latency when the amount of bandwidth available is guaranteed but strictly limited.

In pretty much every other case, in any codec worth considering, VBR will be superior, because it will save bits that CBR would be forced to “waste” on very simple-to-encode segments, and it can spend those saved bits on the very difficult-to-encode segments, improving their perceptual quality while still keeping the “average” bitrate over the whole corpus lower than what would be needed to get that quality with CBR.

With single pass encoding that means some samples will use more bits than CBR at the “same rate” and some will use less, since the bitrate was calibrated over a large corpus. If you care about size and it’s “too large”, drop the rate a bit until you get the balance of size and quality you want.

There are intermediate methods of rate limiting, like “constrained VBR” which puts a hard cap on the size of the largest possible packet - but they are all trading quality for hard bandwidth limiting.

For “master” audio that you want to process and edit, use uncompressed PCM in WAV or similar, or a lossless codec like FLAC. Else you’ll get new degradation every time you transcode lossy to lossy. But for the highest quality in the smallest size for “final” renders, you’re very rarely going to need more than Opus at 128kb/s (for 2 channel audio), and even at lower rates than that, most people won’t be able to tell the difference in most cases from your original lossless audio.
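
As a concrete, hedged sketch of that workflow using ffmpeg from Python - the file names are placeholders and this is just one way to do it, not a Kdenlive feature:

```python
import subprocess

# 1) Keep the edit master lossless: FLAC stays bit-exact at roughly half the size of WAV.
subprocess.run(["ffmpeg", "-i", "master.wav", "-c:a", "flac", "master.flac"], check=True)

# 2) Make the small delivery file: Opus in VBR mode at 128 kb/s for the stereo pair.
subprocess.run(
    ["ffmpeg", "-i", "master.flac", "-c:a", "libopus", "-b:a", "128k", "final_audio.opus"],
    check=True,
)
```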

Don’t try to voodoo this like many people do. Listen to it and trust what your ears actually hear.

2 Likes

Hello Ron,
I am overwhelmed by your detailed explanations.

You seem to be a real audio specialist? That would be great for Kdenlive and the forum, where audio is rather neglected. Apparently nobody needs it except me.

I would love to get a better understanding of this whole complex topic.

But I don’t want to give the impression that I have particularly high expectations.

In terms of sound, aac with medium bit rates is perfectly adequate for me.

As I said, I only became suspicious because low frequencies that were not present in the wav appear in the rendered video, despite the high-pass filter. By increasing the bit rate, the sound in the video can also be rendered with aac without this audio garbage, as I’ll call it.

My real demands are only on my artistic work itself. However, I would like to reproduce it as cleanly and audibly as is currently possible with Kdenlive.

That leaves two questions:
Wouldn’t you filter away the low frequencies created by the microphone and the reverb during recording (with Glame Highpass Filter)?

You prefer VBR - but with the settings that Kdenlive offers, the result is almost always the same. What would you set in terms of quality?

[Screenshot: Kdenlive render settings with the quality value set to 320]

My audio is only spoken language, it should be as understandable as possible and, of course, sound as pleasant as possible.

I am very happy to receive further suggestions from you.
Michael

It’s great that you’re doing a deep dive into what all the knobs do, and talking about what you find - just don’t confuse the fact that there are knobs with a need to turn them all, all the time, let alone to spin some up past 11. :smiley:

Good defaults should mean you rarely need them.

By increasing the bit rate, the sound in the video can also be rendered with aac without this …

With enough extra effort you can also write letters with a paintbrush, but most people skilled in the art of (modern Latin script) handwriting would say you’re just using the wrong tool for the job. If you need or want lossless audio, use a lossless codec. Trying to make a lossy codec behave like a lossless one is hammering a square peg into a round hole. You can keep hitting it harder, but it won’t be a square peg or a good round peg if you do.

Wouldn’t you filter away the low frequencies

Why would you want to if you can’t hear a problem they are responsible for? What are you expecting the benefit of that to be? If you remove all the power from that part of the spectrum, something else you don’t control is just going to fill that void …

I can’t advise you offhand on what the settings in the kdenlive preset options translate to, since I’ve never found a pressing need to change its defaults yet. You’re going to need to see what they translate to in terms of options passed to ffmpeg, and then again what each of those options means for each specific codec you want to use, and then do a lot of listening to see whether the changes you make do in fact improve or worsen things perceptibly compared to the defaults. I suspect there’s little or no sanity checking at the kdenlive level, since I’d expect a “quality” level of 320 (as in your image) to be way off the scale in most or all cases, and just about every advanced codec has its own nuance for how quality and bitrate targets are requested.

“What I would set” has no one-size-fits-all answer. In a lossy codec it’s all about how much degradation is acceptable or audible in exchange for reduced bitrate in each given situation.

But if you’re coding predominantly speech, that’s another reason you ought to look at Opus instead of AAC. It will detect speech and use techniques specifically designed to optimise speech coding, and give you far better quality than AAC at bitrates even lower than what I’d recommended earlier with fullband audio in mind.

1 Like

Hello Ron,
again, that’s a lot of information that I have to work through, understand and internalize step by step. I’ve made a note of everything so that I can refer back to it at any time.

I would now like to answer your questions.

Almost all sound engineers recommend cutting off unnecessary frequencies. For male speech, that is anything below 75 Hz. Do you see it differently?
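
(Just to illustrate what such a 75 Hz low-cut does - this is not the Glame plugin itself, only a rough SciPy sketch; the sample rate, filter order and test signals are assumptions.)

```python
import numpy as np
from scipy.signal import butter, sosfilt

def highpass_75hz(samples: np.ndarray, sample_rate: int = 48000) -> np.ndarray:
    """Apply a 4th-order Butterworth high-pass (low-cut) at 75 Hz."""
    sos = butter(4, 75, btype="highpass", fs=sample_rate, output="sos")
    return sosfilt(sos, samples)

# Example: one second of 50 Hz hum plus a 440 Hz tone; the hum is strongly
# attenuated while the tone passes through almost unchanged.
t = np.arange(48000) / 48000.0
x = 0.5 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
y = highpass_75hz(x)
```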

I have no problems with a lossy codec. Firstly, I don’t hear the restrictions anyway, and people who see or hear my videos won’t notice any differences. As I said, it’s just audio spoken by me.

I have selected Opus as the audio codec several times, and each time there is no sound. Doesn’t seem to work with my settings in Kdenlive.
[Screenshot: Kdenlive render settings with Opus selected as the audio codec]

My question about the codec only comes from the fact that with unfavorable settings aac adds something in the low frequency range, which does not happen with a higher bit rate or with wav.

So for me the decision remains: If I want to use aac (because it seems to be the usual or normal codec for mp4), would it make sense to choose CBR or VBR? I am completely satisfied with the sound of both.
My question is, what should I do to ensure that my videos can be played reliably everywhere?
For example, on my video channel: Videosophie - tchncs

Almost all sound engineers recommend cutting off unnecessary frequencies.

Sure, though I would use the word unwanted. And a good engineer isn’t going to give you a magic one-size-fits-all cutoff point; they’re going to tell you to listen to each track and attenuate what is unwanted, and/or boost what is wanted.

My question about the codec only comes from the fact that with unfavorable settings aac adds something in the low frequency range

This is the misconception. Seeing something in the spectrum analysis where you didn’t expect it doesn’t make them “unfavourable settings”.

Hearing something you didn’t want to hear is, but when that is inaudible or more importantly not unpleasant, then they are just artifacts of the reconstruction from a lossy representation. Functionally not much different to the artifacts you’d likewise see if you recorded the actual sounds which are present in an actual room where your recording is being played.

If you can’t hear them, you don’t need to spend bits modelling them more identically (or improving the acoustics of your listening room, or buying better speakers etc. etc.). Doing that well is what lossy compression is all about. Evaluating one just by comparing spectrum graphs is always going to give you the wrong idea about how it performed. Even complex algorithms designed to ‘objectively’ assess the ‘quality’ of lossy codecs don’t always give the same answer as real human listeners as to whether some reconstruction is better or worse than another.

I have selected Opus as the audio codec several times, and each time there is no sound.

You need to select libopus, which is both the reference and best encoder library. “opus” uses ffmpeg’s incomplete (and apparently now completely broken?) implementation.

The kdenlive dialog has some broken assumptions if you want a bitrate different to the default, since it ties “Bitrate” and “Quality” to what rate control option you choose - where for opus “bitrate” is used for all of VBR and (constrained) CBR modes, as is “quality” which selects the computational complexity to use for encoding (which trades quality for higher speed encoding without affecting the target bitrate).

Frankly, I’d leave the rate control as “not set”, since the default of VBR, quality 10, and 96kb/s (for a stereo pair) should work for most uses, but if you really need to tweak that you’ll need to use the “manual edit” option to set the ffmpeg parameters for your preset.
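
For reference, a hedged sketch of what such a “manual edit” might boil down to as ffmpeg options (assuming an ffmpeg build with libopus; the file names are placeholders, not anything Kdenlive generates):

```python
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "render.mkv",          # placeholder input
        "-c:v", "copy",              # leave the video stream alone
        "-c:a", "libopus",           # the reference encoder, not ffmpeg's built-in "opus"
        "-b:a", "96k",               # VBR target bitrate for the stereo pair
        "-vbr", "on",                # libopus default; "constrained" and "off" also exist
        "-compression_level", "10",  # encoder effort (the "quality" knob), 0-10
        "output_opus.mkv",
    ],
    check=True,
)
```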

There’s nothing stopping CBR from sounding good if you can pick an arbitrarily large bitrate. It’s just that any encoder worth its salt can make a recording sound at least as good with fewer bits in VBR mode.

Keep in mind that we’re talking about a 50 Hz range on a log plot. You’re probably looking at a lot of spectral leakage when creating that plot. Just toy around with the FFT window type and size in Audacity and see that part change completely.

1 Like

Modern lossy codecs operate in the frequency domain - so you’re quite right, but that means the same effect applies to the reconstruction as well.

Yes, this codec works.

Opus is used and preferred by all the major streaming and voice comms services and supported in all still-maintained browsers and player apps.

Using it in webm (with a similarly unencumbered video codec) is the more normal case, but everything I’ve tried so far seems to have no complaint playing video with opus audio in mp4.

AV1/Opus in webm is going to give you the highest quality for the lowest bitrate, but the encoding complexity of AV1 is much higher than for older more primitive codecs.

I’ve tended to prefer mkv or webm for any final render that includes subtitles, since you can embed them and let the user turn them on or off (or select from multiple options) in their player instead of having to permanently burn them into the video images like you do with mp4 containers.
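
If it helps, here is a rough example of the “embed instead of burn in” idea done with ffmpeg (file names are placeholders): mux an external subtitle file into an MKV next to the existing streams, so the viewer can toggle it in their player.

```python
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "final.mp4",         # placeholder: the rendered video
        "-i", "subtitles.srt",     # placeholder: the subtitle file
        "-map", "0", "-map", "1",  # take all streams from both inputs
        "-c", "copy",              # don't re-encode the audio or video
        "-c:s", "srt",             # store the subtitles as a selectable SRT track
        "final_with_subs.mkv",
    ],
    check=True,
)
```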

1 Like

Hello Ron,
It’s fascinating what you can adjust to achieve optimum results.

In my case, however, I have to be careful not to overshoot the mark.

I want to render my videos so that they can be used as universally as possible.
For one, on my video channel at tube.tchcns.
And I also want to be able to play them from a USB stick directly on a TV set. That’s why I only use H.264 and 25 fps. Even older devices can handle this well.
Watching on my own computer is a common case, but I trust my machine to handle many codecs properly.

Don’t you think I’m on the safest side with aac in an mp4 video?

PS:
Ron, don’t you want to tell us where your vast knowledge of audio and codecs comes from and what you use Kdenlive for?

Is this what you mean?


[Screenshot: Audacity spectrum analysis with the Hann window]


[Screenshot: Audacity spectrum analysis with the rectangular window]

Quite amazing - the differences are huge. What can you use as a guide?

Yes, that’s what I mean. The math that creates the spectrum inherently introduces some side effects.

Rechteck (rectangular) is the mathematical “default” window, and it shows a lot of stuff “on the side”. To reduce that effect you can apply other window functions to the input signal before processing it. But it’s a tradeoff and there is no “right or wrong”.

I just remember this stuff from electrical engineering from a few decades ago, so no clue what an audio person would prefer. But you have to be aware of that effect.

This seems like a good introduction to the basic principle.

The deeper you dig, the infinitely deeper you can go!
I took a look at the link. Unfortunately, I can’t do anything with all these technical terms. I lack far too much specialist knowledge.

But does this mean that the rectangular window also displays low frequencies that are not actually there? Or are they present as overtones or undertones?

Yes. The page I linked shows the spectrum of a 4 Hz sine. You would expect:

[Plot: a single spectral line at 4 Hz]

But instead you get:

[Plot: the 4 Hz line with energy smeared into the neighbouring frequencies]

It’s not much, but totally normal. And in the Audacity plot we are looking at 50 Hz out of a 20 kHz range, at -80 dB. So from my electrical-engineering background this immediately triggers a “wait a minute, that’s maybe just some mathematical smudge from creating the spectrum”.
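
If anyone wants to see the effect without Audacity, here is a quick NumPy sketch (not Audacity’s exact algorithm; the sample rate, FFT size and tone frequency are arbitrary assumptions) comparing the rectangular and Hann windows for the same low-frequency tone:

```python
# The rectangular window smears energy across neighbouring bins ("leakage"),
# which can look like extra low-frequency content that isn't really in the signal.
import numpy as np

fs = 48000                          # sample rate
n = 8192                            # FFT size
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100.3 * t)   # a tone that doesn't fall exactly on an FFT bin

def spectrum_db(x: np.ndarray, window: np.ndarray) -> np.ndarray:
    mag = np.abs(np.fft.rfft(x * window)) / np.sum(window) * 2
    return 20 * np.log10(np.maximum(mag, 1e-12))

rect = spectrum_db(x, np.ones(n))
hann = spectrum_db(x, np.hanning(n))
freqs = np.fft.rfftfreq(n, 1 / fs)

# Compare the level reported around 20-50 Hz, well below the 100 Hz tone:
low = (freqs > 20) & (freqs < 50)
print("rectangular window, max level 20-50 Hz:", rect[low].max(), "dB")
print("Hann window,        max level 20-50 Hz:", hann[low].max(), "dB")
```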