A frame is a short audio fragment together with all the analysis data associated with it. Look at the user manual and at the Frame class in the repository.
On the other hand, a segment is a longer audio fragment that is made up of a set of frames that usually share some common properties (they belong to the same note, melody, recording...).
The main application class (SMSBase) has a member attribute that we have paid no attention to so far. It is an object of the Segment class. This class is one of the most important ProcessingData classes available in the CLAM repository.
Now we are ready to add the CLAM Segment to our application. We will store each spectrum output by the FFT in a different frame of our segment.
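As a rough sketch of that idea (the accessor names used below, such as SetSpectrum and AddFrame, are assumptions; check them against the actual Frame and Segment declarations in the repository):

```cpp
// Sketch only: SetSpectrum/AddFrame are assumed names, verify them against
// the Frame and Segment headers in the CLAM repository.
CLAM::Spectrum spec;
spec.SetSize(fftSize / 2 + 1);   // one FFT output per frame

mFFT.Do(audioChunk, spec);       // run the FFT on the current chunk

CLAM::Frame frame;
frame.SetSpectrum(spec);         // keep this chunk's spectrum in its own frame
mSegment.AddFrame(frame);        // append the frame to the segment
```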
When we feed an audio chunk directly into the FFT (as we are doing now), what we are actually doing is multiplying the audio chunk by a rectangular window. This means we are convolving the spectrum of our signal with the transform of a rectangular window, and that transform is not very suitable for analysis: its high side lobes leak energy across the spectrum.
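For reference, the standard relation behind this statement (not CLAM-specific) is that multiplication in time becomes convolution in frequency, and the transform of a length-N rectangular window is the Dirichlet kernel:

```latex
x_w[n] = x[n]\, w[n]
\;\Longleftrightarrow\;
X_w(e^{j\omega}) = \frac{1}{2\pi}\,\bigl(X * W\bigr)(e^{j\omega}),
\qquad
W_{\text{rect}}(e^{j\omega}) = e^{-j\omega (N-1)/2}\,
\frac{\sin(\omega N / 2)}{\sin(\omega / 2)}
```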
What we have to do is use a temporal window with better spectral behaviour. In the CLAM WindowGenerator class we have implemented the most common windowing functions.
Note that in order to apply a window we need two Processing objects: the WindowGenerator and an AudioMultiplier.
Add these two Processings to the MyAnalyzer class (you can take a look at the SpectralAnalysis class as a reference).
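A possible sketch of how the two Processings could sit in MyAnalyzer and be chained before the FFT is given below; the member names and the exact Do() signatures are assumptions, so check SpectralAnalysis for the real wiring:

```cpp
// Sketch, not the actual SpectralAnalysis code: member names and Do() calls
// are assumptions to be checked against the classes in the repository.
class MyAnalyzer
{
    CLAM::WindowGenerator mWindowGen;  // generates the chosen window shape
    CLAM::AudioMultiplier mProduct;    // multiplies the audio chunk by the window
    CLAM::FFT             mFFT;
    // ...
};

// Inside the analysis loop: window the chunk, then transform it.
mWindowGen.Do(window);                          // fill 'window' with the window function
mProduct.Do(audioChunk, window, windowedAudio); // windowedAudio = audioChunk * window
mFFT.Do(windowedAudio, spectrum);
```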
To preserve even symmetry, we need the window to have an odd size. However, the size of the audio we feed to the FFT must be a power of two (for computational efficiency), so the FFT input may carry some trailing zeros to reach that size. The FFT input size can therefore be derived directly from the window size.
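For instance, the minimal FFT size can be computed as the first power of two that is not smaller than the window size (a generic helper, not CLAM-specific):

```cpp
// First power of two >= n; e.g. an odd window of 1025 samples maps to an
// FFT input of 2048 samples, the last 1023 of them trailing zeros.
int NextPowerOfTwo(int n)
{
    int size = 1;
    while (size < n)
        size <<= 1;
    return size;
}
```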
Look at how these sizes are handled in the configuration of SpectralAnalysis. Add this to your application so that the FFT (input) size is automatically configured from the window size the user chooses.

Another thing we can introduce into our analysis is zero-padding. This way we add zeros after each audio frame and obtain a greater spectral resolution. This feature is interesting for the sinusoidal component, as we need as much resolution as possible in order to better detect the spectral peak frequencies. Look at SpectralAnalysis::Do(const Audio& in, Spectrum& outSp).
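One common way to express zero-padding (used here only as an illustration; the parameter name and convention in SpectralAnalysis may differ) is a power-of-two factor applied on top of the minimal FFT size:

```cpp
// Illustration only: each unit of 'zeroPaddingFactor' doubles the FFT size.
// With windowSize = 1025 and zeroPaddingFactor = 1 the FFT size is 4096,
// i.e. 1025 windowed samples followed by 3071 zeros, giving finer bin spacing.
int ComputeFFTSize(int windowSize, int zeroPaddingFactor)
{
    return NextPowerOfTwo(windowSize) << zeroPaddingFactor;
}
```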
The Composite we are taking as an example (SpectralAnalysis) also has another interesting Processing: the CircularShift. This Processing "shifts the signal around" to keep zero-phase conditions in the spectrum.
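A generic sketch of that "shift around" follows (the actual CircularShift Processing is configured through its own config; this only illustrates the idea of rotating the windowed frame so its centre lands on sample 0):

```cpp
#include <algorithm>
#include <vector>

// Rotate the zero-padded, windowed frame so the window centre moves to
// sample 0; the first half of the window wraps to the end of the buffer
// and the zero-padding ends up in the middle. Illustrative only.
void ZeroPhaseShift(std::vector<float>& buffer, int windowSize)
{
    int half = (windowSize - 1) / 2;  // centre of an odd-sized window
    std::rotate(buffer.begin(), buffer.begin() + half, buffer.end());
}
```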
And last, we may be interested in having overlapping frames, that is, an analysis hop size different from and smaller than the window size. The most "elegant" way to implement this is using a circular buffer.
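A minimal sketch of such a circular buffer (generic C++, not the CLAM implementation): new samples are written in hops of hopSize while the analysis reads windows of windowSize, with hopSize smaller than windowSize.

```cpp
#include <cstddef>
#include <vector>

// Ring buffer for overlapping analysis frames. The buffer size must be at
// least windowSize. Illustrative only.
class CircularAudioBuffer
{
public:
    explicit CircularAudioBuffer(std::size_t size) : mData(size, 0.0f), mWritePos(0) {}

    // Append one hop of new samples, wrapping around at the end of the buffer.
    void Write(const float* samples, std::size_t hopSize)
    {
        for (std::size_t i = 0; i < hopSize; ++i)
        {
            mData[mWritePos] = samples[i];
            mWritePos = (mWritePos + 1) % mData.size();
        }
    }

    // Copy the most recent windowSize samples (the current analysis frame).
    void ReadLatest(float* frame, std::size_t windowSize) const
    {
        std::size_t start = (mWritePos + mData.size() - windowSize) % mData.size();
        for (std::size_t i = 0; i < windowSize; ++i)
            frame[i] = mData[(start + i) % mData.size()];
    }

private:
    std::vector<float> mData;
    std::size_t mWritePos;
};
```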