The STFT and the CLAM Segment
A frame is a short audio fragment together with all the
analysis data associated with it. Look at the user manual
and the Frame class in the repository.
- What are the different attributes of the CLAM Frame?
On the other hand, a segment is a longer audio fragment
that is made up of a set of frames that usually share
some common properties (they belong to the same note,
melody, recording...).
The main application class (SMSBase) has a member attribute
that we have paid no attention to so far. It is an object
of the Segment class. This class is one of the most important
ProcessingData classes available in the CLAM repository.
- Briefly explain the structure of the CLAM
Segment.
Now we are ready to add the CLAM Segment to our application.
We will keep every spectrum output by the FFT in a different
frame of our segment.
- Add the segment structure to your application. Explain
how it all works now.
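As a very rough sketch of the idea, the following uses plain
C++ containers and a naive DFT as stand-ins for the CLAM
Frame, Segment and FFT classes (FrameSketch, SegmentSketch,
naiveDFT and analyze are made-up names for illustration; the
real classes carry much richer data, so check them in the
repository):

    #include <complex>
    #include <cstddef>
    #include <vector>

    // Illustrative stand-ins only: the real CLAM Frame and Segment
    // classes hold much more analysis data and time information.
    struct FrameSketch {
        std::vector<std::complex<float> > spectrum; // FFT output for this frame
    };

    struct SegmentSketch {
        std::vector<FrameSketch> frames;            // one frame per analysis chunk
    };

    // Naive DFT, used here only so the sketch is self-contained;
    // in the application this is the job of the FFT Processing.
    std::vector<std::complex<float> > naiveDFT(const std::vector<float>& x)
    {
        const std::size_t N = x.size();
        const float pi = 3.14159265358979f;
        std::vector<std::complex<float> > X(N);
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t n = 0; n < N; ++n)
                X[k] += x[n] * std::exp(std::complex<float>(
                            0.f, -2.f * pi * float(k) * float(n) / float(N)));
        return X;
    }

    // Chop the input into consecutive chunks, transform each one and
    // keep every resulting spectrum in its own frame of the segment.
    SegmentSketch analyze(const std::vector<float>& audio, std::size_t frameSize)
    {
        SegmentSketch segment;
        for (std::size_t pos = 0; pos + frameSize <= audio.size(); pos += frameSize) {
            std::vector<float> chunk(audio.begin() + pos,
                                     audio.begin() + pos + frameSize);
            FrameSketch frame;
            frame.spectrum = naiveDFT(chunk);
            segment.frames.push_back(frame);
        }
        return segment;
    }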
When we feed an audio chunk directly into the FFT (as we are
doing now), we are in effect multiplying the audio chunk by
a rectangular window. This means we are convolving the
spectrum of our signal with the transform of a rectangular
window, and that transform is not very suitable for analysis.
What we need to do is use a temporal window with better
spectral properties. The CLAM WindowGenerator class
implements the most usual windowing functions.
- What are these functions?
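The exact set of window types is defined by the WindowGenerator
configuration, so check it in the repository. As a reminder of
what such functions look like, here is a generic sketch of a
few classic windows (plain C++ with a made-up makeWindow
helper, not the CLAM implementation):

    #include <cmath>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Textbook formulas for a few common windows, for illustration only;
    // WindowGenerator provides its own (possibly normalised) variants.
    std::vector<float> makeWindow(const std::string& type, std::size_t size)
    {
        const float pi = 3.14159265358979f;
        std::vector<float> w(size, 1.f);
        if (size < 2) return w;
        for (std::size_t n = 0; n < size; ++n) {
            const float x = 2.f * pi * float(n) / float(size - 1);
            if (type == "Hamming")
                w[n] = 0.54f - 0.46f * std::cos(x);
            else if (type == "Hann")
                w[n] = 0.5f - 0.5f * std::cos(x);
            else if (type == "Blackman")
                w[n] = 0.42f - 0.5f * std::cos(x) + 0.08f * std::cos(2.f * x);
            else // "Triangular"
                w[n] = 1.f - std::fabs(2.f * float(n) / float(size - 1) - 1.f);
        }
        return w;
    }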
Note that in order to apply a window we need two Processing
objects: a WindowGenerator and an AudioMultiplier. Add these
two Processings to the MyAnalyzer class (you can take a look
at the SpectralAnalysis class as a reference).
- Add the option of window type selection to the user
menu. Insert here a screenshot of the waveform after
it has been windowed with a couple of different windows.
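Conceptually, the multiplier step is just a sample-by-sample
product of the audio frame with the generated window; a
minimal sketch (an illustrative applyWindow helper, not the
CLAM API):

    #include <cstddef>
    #include <vector>

    // Sample-wise product of an audio frame and a window of the same size.
    std::vector<float> applyWindow(const std::vector<float>& frame,
                                   const std::vector<float>& window)
    {
        std::vector<float> out(frame.size());
        for (std::size_t n = 0; n < frame.size() && n < window.size(); ++n)
            out[n] = frame[n] * window[n];
        return out;
    }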
To preserve even symmetry, the window needs to have an odd
size. However, the size of the audio we feed the FFT must
be a power of two (for computational efficiency). The FFT
input may therefore carry some trailing zeros to reach this
size, and the FFT input size can be derived directly from
the window size.
Look
at how these sizes are handled in the configuration of the
SpectralAnalysis. Add this to your application so the FFT
(input) size is automatically configured from the window
size that the user chooses.
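One common convention (check how SpectralAnalysis actually
configures these sizes) is to take the smallest power of two
that can hold the odd-sized window and zero-fill the rest of
the FFT input buffer; a sketch with made-up helper names:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Smallest power of two greater than or equal to n.
    std::size_t nextPowerOfTwo(std::size_t n)
    {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    // Place the windowed (odd-sized) frame at the start of a power-of-two
    // buffer; the trailing samples stay at zero.
    std::vector<float> padToFFTSize(const std::vector<float>& windowedFrame)
    {
        std::vector<float> fftInput(nextPowerOfTwo(windowedFrame.size()), 0.f);
        std::copy(windowedFrame.begin(), windowedFrame.end(), fftInput.begin());
        return fftInput;
    }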
Another thing we can introduce to our analysis is
zero-padding. By appending extra zeros to each audio frame
we obtain a more densely sampled (interpolated) spectrum.
This feature is interesting for the Sinusoidal component,
as we want as much frequency detail as possible in order
to better detect the spectral peak frequencies. Look at
SpectralAnalysis::Do(const Audio& in, Spectrum& outSp).
- Add zero-padding to your application and add this
option to your user menu. Explain its effect on the
resulting spectrum.
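To see the effect on the resulting spectrum with some
illustrative (hypothetical) numbers: zero-padding does not
change the true resolution fixed by the window length, but
it samples the spectrum more densely, which makes peak
frequencies easier to locate.

    #include <cstdio>

    // Illustrative figures only: a 44100 Hz sample rate and a base FFT
    // size of 1024. Each doubling of the zero-padding factor halves the
    // spacing between spectral bins.
    int main()
    {
        const double sampleRate = 44100.0;
        const int baseSize = 1024;
        for (int factor = 1; factor <= 8; factor *= 2) {
            const int fftSize = baseSize * factor;
            std::printf("zero-padding x%d -> FFT size %d, bin spacing %.2f Hz\n",
                        factor, fftSize, sampleRate / fftSize);
        }
        return 0;
    }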
The Composite we are taking as an example (SpectralAnalysis)
also has another interesting Processing: the CircularShift.
This Processing circularly shifts the signal in order to
keep zero-phase conditions in the spectrum.
- Add this Processing to your Composite and comment on
how it affects the resulting audio frame.
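A common way to obtain zero phase is to rotate the windowed,
zero-padded buffer so that the centre of the window lands at
sample 0 and the first half of the frame wraps around to the
end of the buffer. Here is a sketch using std::rotate (the
actual CircularShift Processing is configured with the shift
amount, so check its code; the function name below is made up):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Rotate the FFT input so that the middle sample of the (odd-sized)
    // window, currently at index windowSize/2, ends up at index 0, while
    // the first half of the frame wraps to the end of the buffer.
    void circularShiftForZeroPhase(std::vector<float>& fftInput,
                                   std::size_t windowSize)
    {
        const std::size_t centre = windowSize / 2;
        if (centre < fftInput.size())
            std::rotate(fftInput.begin(), fftInput.begin() + centre, fftInput.end());
    }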
And last, we may be interested in having overlapping
frames, that is, an analysis hop size smaller than the
window size. The most "elegant" way to implement this is
by using a circular buffer.
- Implement the possibility of defining the analysis
hop size in your application. Comment on the most important
changes you have had to introduce into your code.
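As a sketch of the idea (a generic ring buffer with made-up
names, not a particular CLAM class): each analysis step
writes hopSize new samples and then reads the last
windowSize samples, so consecutive frames overlap by
windowSize - hopSize samples.

    #include <cstddef>
    #include <vector>

    // Minimal ring buffer: write() adds hopSize new samples per analysis
    // step, readLatest() returns the most recent windowSize samples
    // (oldest first). It assumes the capacity is at least windowSize and
    // that at least windowSize samples have already been written.
    class RingBuffer {
    public:
        explicit RingBuffer(std::size_t capacity)
            : data_(capacity, 0.f), writePos_(0) {}

        void write(const std::vector<float>& block)
        {
            for (std::size_t i = 0; i < block.size(); ++i) {
                data_[writePos_] = block[i];
                writePos_ = (writePos_ + 1) % data_.size();
            }
        }

        std::vector<float> readLatest(std::size_t windowSize) const
        {
            std::vector<float> frame(windowSize);
            std::size_t pos = (writePos_ + data_.size() - windowSize) % data_.size();
            for (std::size_t n = 0; n < windowSize; ++n) {
                frame[n] = data_[pos];
                pos = (pos + 1) % data_.size();
            }
            return frame;
        }

    private:
        std::vector<float> data_;
        std::size_t writePos_;
    };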