A frame is a short audio fragment together with all the analysis data associated with it. Look at the user manual and at the Frame class in the repository.
On the other hand, a segment is a longer audio fragment that is made up of a set of frames that usually share some common properties (they belong to the same note, melody, recording...).
The main application class (SMSBase) has a member attribute that we have paid no attention to so far. It is an object of the Segment class. This class is one of the most important ProcessingData classes available in the CLAM repository.
Now we are ready to add the CLAM Segment to our application. We will store each spectrum output by the FFT in a different frame of our segment.
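As a rough sketch of that idea (the accessor names used below, such as SetSpectrum and AddFrame, are assumptions; check them against the actual Frame and Segment declarations in the repository):

```cpp
// Sketch only: SetSpectrum/AddFrame are assumed names, verify them against
// the Frame and Segment headers in the CLAM repository.
CLAM::Spectrum spec;
spec.SetSize(fftSize / 2 + 1);   // one FFT output per frame

mFFT.Do(audioChunk, spec);       // run the FFT on the current chunk

CLAM::Frame frame;
frame.SetSpectrum(spec);         // keep this chunk's spectrum in its own frame
mSegment.AddFrame(frame);        // append the frame to the segment
```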
When we feed an audio chunk directly into the FFT (as we are doing now), what we are actually doing is multiplying the audio chunk by a rectangular window. This means we are convolving the spectrum of our signal with the transform of a rectangular window, and that transform is not very suitable for analysis: its high side lobes leak energy across the spectrum.
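For reference, the standard relation behind this statement (not CLAM-specific) is that multiplication in time becomes convolution in frequency, and the transform of a length-N rectangular window is the Dirichlet kernel:

```latex
x_w[n] = x[n]\, w[n]
\;\Longleftrightarrow\;
X_w(e^{j\omega}) = \frac{1}{2\pi}\,\bigl(X * W\bigr)(e^{j\omega}),
\qquad
W_{\text{rect}}(e^{j\omega}) = e^{-j\omega (N-1)/2}\,
\frac{\sin(\omega N / 2)}{\sin(\omega / 2)}
```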
What we have to do is use a temporal window with better spectral behaviour. In the CLAM WindowGenerator class we have implemented the most common windowing functions.
Note that in order to apply a window we need two Processing objects: the WindowGenerator and an AudioMultiplier.
Add these two Processings to the MyAnalyzer class (you can take a look at the SpectralAnalysis class as a reference).
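A possible sketch of how the two Processings could sit in MyAnalyzer and be chained before the FFT is given below; the member names and the exact Do() signatures are assumptions, so check SpectralAnalysis for the real wiring:

```cpp
// Sketch, not the actual SpectralAnalysis code: member names and Do() calls
// are assumptions to be checked against the classes in the repository.
class MyAnalyzer
{
    CLAM::WindowGenerator mWindowGen;  // generates the chosen window shape
    CLAM::AudioMultiplier mProduct;    // multiplies the audio chunk by the window
    CLAM::FFT             mFFT;
    // ...
};

// Inside the analysis loop: window the chunk, then transform it.
mWindowGen.Do(window);                          // fill 'window' with the window function
mProduct.Do(audioChunk, window, windowedAudio); // windowedAudio = audioChunk * window
mFFT.Do(windowedAudio, spectrum);
```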
To preserve even symmetry, we need the window to have an odd size. However, the size of the audio we feed to the FFT must be a power of two (for computational efficiency), so the FFT input may carry some trailing zeros to reach that size. The FFT input size can therefore be derived directly from the window size.
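For instance, the minimal FFT size can be computed as the first power of two that is not smaller than the window size (a generic helper, not CLAM-specific):

```cpp
// First power of two >= n; e.g. an odd window of 1025 samples maps to an
// FFT input of 2048 samples, the last 1023 of them trailing zeros.
int NextPowerOfTwo(int n)
{
    int size = 1;
    while (size < n)
        size <<= 1;
    return size;
}
```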
Look at how these sizes are handled in the configuration of SpectralAnalysis. Add this to your application so that the FFT (input) size is automatically configured from the window size the user chooses.

Another thing we can introduce into our analysis is zero-padding. This way we add zeros after each audio frame and obtain a greater spectral resolution. This feature is interesting for the sinusoidal component, as we need as much resolution as possible in order to better detect the spectral peak frequencies. Look at SpectralAnalysis::Do(const Audio& in, Spectrum& outSp).
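One common way to express zero-padding (used here only as an illustration; the parameter name and convention in SpectralAnalysis may differ) is a power-of-two factor applied on top of the minimal FFT size:

```cpp
// Illustration only: each unit of 'zeroPaddingFactor' doubles the FFT size.
// With windowSize = 1025 and zeroPaddingFactor = 1 the FFT size is 4096,
// i.e. 1025 windowed samples followed by 3071 zeros, giving finer bin spacing.
int ComputeFFTSize(int windowSize, int zeroPaddingFactor)
{
    return NextPowerOfTwo(windowSize) << zeroPaddingFactor;
}
```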
The Composite we are taking as an example (SpectralAnalysis) also has another interesting Processing: the CircularShift. This Processing "shifts the signal around" to keep zero-phase conditions in the spectrum.
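A generic sketch of that "shift around" follows (the actual CircularShift Processing is configured through its own config; this only illustrates the idea of rotating the windowed frame so its centre lands on sample 0):

```cpp
#include <algorithm>
#include <vector>

// Rotate the zero-padded, windowed frame so the window centre moves to
// sample 0; the first half of the window wraps to the end of the buffer
// and the zero-padding ends up in the middle. Illustrative only.
void ZeroPhaseShift(std::vector<float>& buffer, int windowSize)
{
    int half = (windowSize - 1) / 2;  // centre of an odd-sized window
    std::rotate(buffer.begin(), buffer.begin() + half, buffer.end());
}
```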
And last, we may be interested in having overlapping frames, that is, an analysis hop size different from and smaller than the window size. The most "elegant" way to implement this is using a circular buffer.
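A minimal sketch of such a circular buffer (generic C++, not the CLAM implementation): new samples are written in hops of hopSize while the analysis reads windows of windowSize, with hopSize smaller than windowSize.

```cpp
#include <cstddef>
#include <vector>

// Ring buffer for overlapping analysis frames. The buffer size must be at
// least windowSize. Illustrative only.
class CircularAudioBuffer
{
public:
    explicit CircularAudioBuffer(std::size_t size) : mData(size, 0.0f), mWritePos(0) {}

    // Append one hop of new samples, wrapping around at the end of the buffer.
    void Write(const float* samples, std::size_t hopSize)
    {
        for (std::size_t i = 0; i < hopSize; ++i)
        {
            mData[mWritePos] = samples[i];
            mWritePos = (mWritePos + 1) % mData.size();
        }
    }

    // Copy the most recent windowSize samples (the current analysis frame).
    void ReadLatest(float* frame, std::size_t windowSize) const
    {
        std::size_t start = (mWritePos + mData.size() - windowSize) % mData.size();
        for (std::size_t i = 0; i < windowSize; ++i)
            frame[i] = mData[(start + i) % mData.size()];
    }

private:
    std::vector<float> mData;
    std::size_t mWritePos;
};
```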