CLAM Tutorial - Part 4

Processing on the Spectral Domain

Before we continue with the tutorial, we must restructure our application a bit. Ultimately, we want an application which can analyze audio, transform this analysis and resynthesize it. These three parts of our application will be encapsulated as objects which are ProcessingComposites. We must familiarize ourselves with the concepts of Processings, ProcessingComposites and Configs.

  1. What is a Processing? Look at the Processing base class. What are its most important methods? Some of them must always be implemented in any class that derives from this base class. Which ones?
  2. What is a ProcessingComposite? Look at the base class. What is the difference between this and the basic Processing class?

In the next part of the tutorial we are going to focus on the analysis part of our application. We will create a new class called MyAnalyzer, derived from ProcessingComposite. (Note: to do so, we also need to create a Dynamic Type class, derived from ProcessingConfig, to configure this class.) For now, the class only needs to print "I'm doing it!" in its Do() method, and the configuration class only needs a name field.
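The shape of such a class can be sketched without CLAM at all. In the sketch below, SketchConfig and SketchProcessing are hypothetical stand-ins invented for illustration; they are not the real ProcessingConfig and ProcessingComposite, whose interfaces are richer and should be read from the CLAM headers as the questions above ask. Only the name MyAnalyzer, the Do() method and the printed string come from this tutorial.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for a configuration class with a single name field.
// The real one would be a Dynamic Type derived from ProcessingConfig.
struct SketchConfig {
    std::string name = "MyAnalyzer";
};

// Hypothetical stand-in for the processing base class: one call to Do()
// performs one unit of work and reports success.
class SketchProcessing {
public:
    virtual ~SketchProcessing() = default;
    virtual bool Do() = 0;
};

// The 'empty' analyzer: it stores its configuration and only prints a message.
class MyAnalyzer : public SketchProcessing {
public:
    explicit MyAnalyzer(const SketchConfig& cfg = SketchConfig()) : mConfig(cfg) {
        assert(!mConfig.name.empty());  // a nameless object is hard to debug
    }
    const SketchConfig& GetConfig() const { return mConfig; }
    bool Do() override {
        std::cout << "I'm doing it!" << std::endl;
        return true;
    }
private:
    SketchConfig mConfig;
};
```

A temporary menu item would then only need to call Do() on an instance held by the application class.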

Add an instance of this class to your application class, and add a temporary item to the menu which calls MyAnalyzer::Do() in order to test it.

Implement and compile this. Run it and verify that everything is working as it should.

If this is all working as it should (i.e. it is printing the correct string), it is a good idea to make a copy of this 'empty' class, as we must implement three such classes later on in the tutorial.

Possibly one of the major attractions of the CLAM library is its spectral processing capabilities. From here to the end of the tutorial we will focus on this domain while becoming familiar with more CLAM tools and ways of working.

First we have to talk about a very important ProcessingData: the Spectrum. Open the Spectrum.hxx file. It is quite a complex data class. Most of its complexity is due to the fact that it allows its data to be stored in different formats.

  1. Explain the different formats the Spectrum offers to represent spectral data. Note that the Spectrum is the only ProcessingData in the CLAM repository that has an associated configuration. Explain the different attributes of this configuration and what they mean.

In order to have a spectrum in our application, we will have to deal with the FFT. At the time of this writing, CLAM has three different implementations of the FFT: one based on the Numerical Recipes book, another that uses the new_Ooura implementation, and a third that uses the FFTW library from MIT. The latter is the most efficient and is the one used by default.

  1. Look at the FFT_Config.hxx file in the CLAM repository. What is the only parameter used to configure the FFT? What is its mathematical relation with the size of the resulting spectrum?

Now we will add an FFT to our analysis composite (MyAnalyzer) and an "Analyze" option to the main user menu of our application. Once the user chooses this option we will ask for the FFT size. One of the problems we have to face is how to "cut" the input audio into smaller chunks or frames (for the time being they have to be the same size as the FFT).
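One way to picture the "cutting" step is the plain C++ sketch below, which does not use any CLAM class. SplitIntoFrames is a hypothetical helper name, the audio is assumed to be already available as a vector of float samples, and zero-padding the last frame is an assumption made here so that every frame has exactly the FFT size; dropping the incomplete frame would be an equally valid choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split `samples` into consecutive, non-overlapping
// frames of exactly `frameSize` samples each. If the audio length is not a
// multiple of `frameSize`, the last frame is padded with zeros.
std::vector<std::vector<float>> SplitIntoFrames(const std::vector<float>& samples,
                                                std::size_t frameSize)
{
    assert(frameSize > 0);
    std::vector<std::vector<float>> frames;
    for (std::size_t start = 0; start < samples.size(); start += frameSize) {
        // Start from an all-zero frame so any missing tail stays silent.
        std::vector<float> frame(frameSize, 0.0f);
        const std::size_t count = std::min(frameSize, samples.size() - start);
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(start);
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(start + count);
        std::copy(samples.begin() + first, samples.begin() + last, frame.begin());
        frames.push_back(frame);
    }
    return frames;
}
```

Each returned frame could then be handed to the FFT one at a time, with `frameSize` set to the FFT size the user entered.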

Add the FFT without adding any other Processing.

To debug CLAM applications, we can use some of the tools available in the Visualization modules. Sometimes, however, it is more useful to generate an XML file with the content of one of the objects that are in memory at a given moment. Most CLAM objects can be serialized to XML by calling their Debug() method (a Debug.xml file will be created), either explicitly in code or from a debugger. General purpose XML serialization/deserialization is provided by the XMLStorage class.

Now we can consult the content of our spectrum in XML and describe its main features.

  1. Debug your application, adding a breakpoint after the FFT has been performed. Then Debug() your spectrum and analyze the resulting XML file. Repeat this process with sine.wav, sweep.wav and noise.wav files. What is the content of the resulting spectrum in each case?
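When reading the XML dump it can help to have an independent reference for what the magnitudes of a frame should look like. The sketch below is a naive discrete Fourier transform in plain C++; it is not CLAM's FFT and is far slower, but for a real input frame it yields the same magnitudes for the non-negative frequency bins. MagnitudeSpectrum is a hypothetical helper name, and the frame is assumed to be a vector of double samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper: magnitude of each non-negative frequency bin
// of a real frame, computed with a direct O(N^2) DFT.
std::vector<double> MagnitudeSpectrum(const std::vector<double>& frame)
{
    assert(!frame.empty());
    const double pi = std::acos(-1.0);
    const std::size_t n = frame.size();
    // For real input the bins above n/2 mirror the lower ones, so they are skipped.
    std::vector<double> magnitudes(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < magnitudes.size(); ++k) {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = 2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(t) / static_cast<double>(n);
            re += frame[t] * std::cos(angle);
            im -= frame[t] * std::sin(angle);
        }
        magnitudes[k] = std::sqrt(re * re + im * im);
    }
    return magnitudes;
}
```

For a unit-amplitude sine that completes a whole number of cycles within the frame, nearly all of the magnitude lands in a single bin; a sweep or noise spreads it over many bins.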

As we have just seen, textual debugging is not always the most convenient way to analyze the effect of a given algorithm or process. We will use a Spectrum Snapshot to inspect the result visually.

Add a Spectrum Snapshot to your application so you can visualize the result of the FFT.

Now we can already see the result of the spectral analysis. These are the main features that are usually analyzed when looking at a spectrum: fundamental frequency (if available), harmonicity, centroid (center of gravity of the spectrum), spectral deviation (standard deviation from the center of gravity, indicates whether the spectral energy is concentrated at one point or spread throughout the whole spectral range) and spectral irregularity (how flat the spectrum is).
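Two of these features, the centroid and the spectral deviation, are easy to compute numerically as a cross-check on what the snapshot shows. The plain C++ sketch below assumes the spectrum is given as linear magnitudes for bins evenly spaced from 0 Hz to half the sample rate, and it weights frequencies by magnitude; weighting by power (squared magnitude) is an equally common convention. SpectralShape and DescribeSpectrum are hypothetical names, not CLAM classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: both values are expressed in Hz.
struct SpectralShape {
    double centroidHz;  // magnitude-weighted mean frequency
    double spreadHz;    // magnitude-weighted standard deviation around the centroid
};

// Hypothetical helper: `magnitudes` holds one value per bin, with bin 0 at 0 Hz
// and the last bin at sampleRate / 2.
SpectralShape DescribeSpectrum(const std::vector<double>& magnitudes, double sampleRate)
{
    assert(magnitudes.size() >= 2);
    const double binHz = (sampleRate / 2.0) / static_cast<double>(magnitudes.size() - 1);

    double total = 0.0;
    double weighted = 0.0;
    for (std::size_t k = 0; k < magnitudes.size(); ++k) {
        total += magnitudes[k];
        weighted += static_cast<double>(k) * binHz * magnitudes[k];
    }
    if (total <= 0.0) {
        return SpectralShape{0.0, 0.0};  // silent frame: nothing to describe
    }

    const double centroid = weighted / total;
    double variance = 0.0;
    for (std::size_t k = 0; k < magnitudes.size(); ++k) {
        const double distance = static_cast<double>(k) * binHz - centroid;
        variance += magnitudes[k] * distance * distance;
    }
    variance /= total;
    return SpectralShape{centroid, std::sqrt(variance)};
}
```

A spectrum with all its energy in one bin gives a spread of zero, while a flat, noise-like spectrum gives a centroid near the middle of the range and a large spread.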

  1. Analyze the following files: sine.wav, sweep.wav, noise.wav, 1.wav, 2.wav and 3.wav. Describe the shape of each spectrum in terms of the above properties.

To finish this part of the tutorial, note that the spectrum snapshot we have just added introduces a big overhead, because it opens for every audio frame.

Add an option to the user menu so the snapshot can be activated/deactivated.