As a result, it
is not necessary to use precious bits to encode these masked frequencies. In
perceptual coders, a filter bank divides the audio into multiple bands. When
audio in a particular band falls below the masking threshold, few or no bits are
devoted to encoding that signal. The bits saved in this way can then be spent
on the bands where they are needed.
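
The band-by-band decision described above can be sketched in a few lines of Python. This is only an illustration, not the allocation loop of any particular codec: the function names and the example levels are hypothetical, and it assumes the common rule of thumb that each bit of a uniform quantizer buys roughly 6 dB of signal-to-noise ratio.

```python
import math

# Assumption: roughly 6.02 dB of SNR per bit for a uniform quantizer.
DB_PER_BIT = 6.02

def bits_for_band(signal_db, mask_db):
    """Bits needed to push quantization noise down to the masking threshold."""
    smr_db = signal_db - mask_db  # signal-to-mask ratio in dB
    if smr_db <= 0:
        return 0  # the band is at or below the threshold: spend no bits
    return math.ceil(smr_db / DB_PER_BIT)

def allocate(signal_levels_db, mask_levels_db):
    """Per-band bit counts for matching lists of signal and mask levels."""
    return [bits_for_band(s, m)
            for s, m in zip(signal_levels_db, mask_levels_db)]

# Hypothetical levels for three bands, in dB.
example = allocate([60.0, 30.0, 45.0], [40.0, 35.0, 45.0])
```

In this example only the first band, which sits 20 dB above its threshold, receives any bits; the other two are at or below the threshold and cost nothing.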
While various
codecs use different techniques in the details, the principle is the same for
all, and the implementation follows a common plan. There are four major
subsections, which work together to generate the coded bitstream:
The analysis filter bank
divides the audio into spectral
components. At minimum, its frequency resolution must be finer than
the ear's critical bands, which are about 100 Hz wide below 500 Hz and roughly 20% of the center
frequency at higher frequencies. Finer resolution can help a coder
make better decisions.
The estimation of masked threshold section models
the human ear/brain system. It determines the masking
curve, below which the quantization noise must fall.
The audio is reduced to a lower bitrate in the quantization and
coding section. On the one hand, the quantization must be
sufficiently coarse to stay within the target bitrate. On the
other hand, the quantization error must be shaped to stay under the limits set by the
masking curve.
The bitstream multiplex combines the quantized values with any side information.
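
Two of the constraints above lend themselves to a quick numerical check: the critical-band rule of thumb for the filter bank, and the requirement that quantization noise stay under the masking curve. The sketch below is illustrative only. All function names are hypothetical, the filter bank is assumed to split the spectrum into equal-width bands, and the quantizer is assumed to be a plain uniform one whose noise power is approximately the step size squared divided by 12.

```python
import math

def critical_bandwidth_hz(center_hz):
    """Rule of thumb from the text: 100 Hz below 500 Hz, about 20% above."""
    return 100.0 if center_hz < 500.0 else 0.2 * center_hz

def uniform_band_width_hz(sample_rate_hz, num_bands):
    """Width of each band when 0..Nyquist is split into equal parts."""
    return (sample_rate_hz / 2.0) / num_bands

def resolves_critical_bands(sample_rate_hz, num_bands):
    """True if every band is no wider than the narrowest critical band."""
    narrowest = critical_bandwidth_hz(0.0)  # 100 Hz at low frequencies
    return uniform_band_width_hz(sample_rate_hz, num_bands) <= narrowest

def max_step_for_mask(mask_power):
    """Largest step whose noise power (about step**2 / 12) fits the mask."""
    return math.sqrt(12.0 * mask_power)

def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of step."""
    return [round(x / step) for x in samples]

def dequantize(indices, step):
    """Reconstruct approximate sample values from quantizer indices."""
    return [q * step for q in indices]

# Hypothetical band: allowed noise power 0.0003 (linear units).
step = max_step_for_mask(0.0003)
samples = [0.5, -0.26, 0.031]
decoded = dequantize(quantize(samples, step), step)
```

Under these assumptions, a 32-band uniform split at 44.1 kHz gives bands about 689 Hz wide, so `resolves_critical_bands(44100, 32)` is false, while 1024 bands (about 21.5 Hz each) pass the check. A larger `mask_power` permits a larger step and therefore fewer bits, which is the trade-off the quantization section has to balance against the target bitrate.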