CLAM Tutorial - Part 1: Introduction to CLAM

In this first part of the tutorial we will become familiar with the most important concepts in CLAM. We will also experience what working with a free (GPL) framework means, using the tools that are usually included in this kind of distribution.

To begin with, please point your browser to CLAM's website (www.iua.upf.es/mtg/clam). In the "docs" section you will find all the publicly available documentation. Throughout this tutorial be sure to consult both the CLAM User and Development Documentation and the Doxygen documentation. While the User and Development Documentation covers most of CLAM's concepts and design, the Doxygen documentation contains a more detailed technical description of interfaces and the like. The Doxygen documentation is generated directly from the comments in the interfaces and is therefore, in general, more up to date than the User and Development Documentation.

After the introduction and your look at the documentation you should be able to mentally answer the following questions:

Every open source project should have an associated mailing list. Follow the link on the website and add your name to the CLAM mailing list.

CLAM also has a bug tracking section that uses the Mantis tool. Go to the "bugreporting" link on the website and register as a user. Any bug you find in CLAM from now on should be reported to the CLAM team using this tool.

CLAM also uses third-party libraries. Point your browser to the "links" section of CLAM's website and take a look at what each external library offers.

By now you should have decided whether you would like to do this tutorial on Linux using GCC or on Windows using Visual C++.

You should also have the CLAM repository correctly deployed on your system. Look at the structure of the whole repository, especially the /src folder. It is worth becoming somewhat familiar with this structure.

Now we are ready to start compiling.

CLAM Tutorial - Part 2: Introduction to CLAM (II), the SMS Example

After the introduction we had in Part 1 of this tutorial we are ready to learn a bit more about the processing capabilities of the framework. To do so, we will work on one of the examples: the SMSTools2 application. This application uses the Spectral Modeling Synthesis scheme in order to analyze, transform and synthesize audio files. It is interesting enough to dedicate a whole session to its study. In the meantime, we will become familiar with more CLAM tools. You will find the files of this example in the /examples/SMS/Tools folder in the repository.

Now we can run and study the example. Note that the main class in the application is really a class hierarchy. We have the SMSBase class, where most of the functionality of the application is implemented. The SMSTools and SMSStdio classes derive from it, the former implementing the graphical version of the application and the latter the standard I/O version. We will compile the graphical version, so we will have a graphical interface created using the FLTK library.

You can use the XML configuration file available in the example folder or edit the default configuration from within the application. Note that there is a field in the configuration that holds the path of the incoming audio file that is going to be analyzed. You will need to modify this path so it points to a sound available on your local drive (this can also be done directly through the graphical user interface). You can use any of the following sounds: sine.wav, sweep.wav, noise.wav, 1.wav, 2.wav, 3.wav.

Now we can dive a little deeper into the code that implements this application.

First, take a look at the SMSBase class.

Now we will take a look at the SMSAnalysisSynthesisConfig class. It is the first time we are looking at a DynamicType class.
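
To get a feel for what a DynamicType is, here is a minimal sketch of how such a configuration object is typically handled from client code. The Name attribute and the generated HasName()/SetName() accessors are assumptions based on CLAM's usual conventions; check SMSAnalysisSynthesisConfig.hxx for the attributes this class really declares.

    #include "SMSAnalysisSynthesisConfig.hxx"

    // Sketch of DynamicType behaviour (assumed attribute names): every attribute
    // declared with DYN_ATTRIBUTE gets generated AddX()/RemoveX(), HasX() and
    // SetX()/GetX() members, and UpdateData() (re)allocates storage after
    // attributes have been added or removed.
    void InspectConfig()
    {
        CLAM::SMSAnalysisSynthesisConfig cfg;   // DefaultInit() adds the default attributes
        if (cfg.HasName())                      // an attribute may or may not be instantiated
            cfg.SetName("my analysis");         // generated setter for the (assumed) Name attribute
    }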

Most of the processing of the application is handled in the AnalysisProcessing and SynthesisProcessing methods of the SMSBase class. Note that in both cases we do more or less the same thing: we Start() the ProcessingComposite, then enter a loop that runs through all the audio, calling the Do() method with the appropriate data, and finally we call the Stop() method to "turn off" the ProcessingComposite.
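
The sketch below captures just that life cycle in a generic helper. It is not code from SMSBase, only an illustration of the pattern: TProcessing stands for any Processing or ProcessingComposite and TData for whatever per-iteration data its Do() method expects.

    #include <cstddef>
    #include <vector>

    // Illustration of the Start / loop-of-Do / Stop pattern described above.
    template <class TProcessing, class TData>
    void RunOverFrames(TProcessing& processing, std::vector<TData>& frames)
    {
        processing.Start();                  // "turn on" the composite
        for (std::size_t i = 0; i < frames.size(); ++i)
            processing.Do(frames[i]);        // process one chunk of data per iteration
        processing.Stop();                   // "turn off" the composite
    }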

Let's take a look at the SMSAnalysis class. This class looks like it should be very complex: all the analysis processing is done inside its Do() method. Yet the class does not have that many lines of code. This is because we are using the ProcessingComposite structure: the class is little more than an aggregate of smaller Processing classes.

CLAM Tutorial - Part 3: My first CLAM application

CLAM (like any development framework) has its own coding conventions: a set of rules and recommendations we should observe when writing CLAM-compliant code.

Now we are ready to implement our first CLAM application. It will be a very minimal application that just loads an audio file into RAM and then saves it again under another file name.

First of all you must start a new CLAM project. Consult the section of the CLAM User and Development Documentation on creating a new CLAM project for the platform you have chosen. For our first application we will be using CLAM's Audio, AudioFileIn and AudioFileOut classes.

Now create a new class called MyCLAMApp and give it a (CLAM::)Audio member which we will use to store the audio we are going to work with. Add a method, LoadAudioFile(), to this class which loads an audio file into the class's Audio member using an AudioFileIn object, which must first be configured. Then add a method, SaveAudioFile(), which saves the class's Audio member to a file (with a different file name than the input file) using an AudioFileOut object. Finally, add a Do() method which first calls LoadAudioFile() and then calls SaveAudioFile(). Create a main-like function which calls MyCLAMApp::Do().
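
A possible skeleton is sketched below. The Start()/Do()/Stop() calls follow the usual Processing life cycle; the AudioFileConfig field name (SetFilename) and the Size()/SetSize() calls are assumptions based on the audio file I/O examples and should be checked against the headers, and the file names are just placeholders.

    #include <string>
    #include "Audio.hxx"
    #include "AudioFileIn.hxx"
    #include "AudioFileOut.hxx"

    // Sketch only: configuration field names and the Size()/SetSize() calls are
    // assumptions; verify them against your CLAM headers before using this.
    class MyCLAMApp
    {
    public:
        void LoadAudioFile(const std::string& path)
        {
            CLAM::AudioFileIn fileIn;
            CLAM::AudioFileConfig cfg;
            cfg.SetFilename(path);              // assumed config attribute
            fileIn.Configure(cfg);
            mAudio.SetSize(fileIn.Size());      // make room for the whole file (assumed call)
            fileIn.Start();
            fileIn.Do(mAudio);                  // read the file into our Audio member
            fileIn.Stop();
        }

        void SaveAudioFile(const std::string& path)
        {
            CLAM::AudioFileOut fileOut;
            CLAM::AudioFileConfig cfg;
            cfg.SetFilename(path);              // assumed config attribute
            fileOut.Configure(cfg);
            fileOut.Start();
            fileOut.Do(mAudio);                 // write the Audio member back to disk
            fileOut.Stop();
        }

        void Do()
        {
            LoadAudioFile("input.wav");         // placeholder file names
            SaveAudioFile("copy_of_input.wav");
        }

    private:
        CLAM::Audio mAudio;                     // in-memory copy of the loaded audio
    };

    int main()
    {
        MyCLAMApp app;
        app.Do();
        return 0;
    }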

We will want to make our app a little more flexible (as we want to extend it later on), so we now create a very simple user interface.


CLAM Tutorial - Part 4: Audio

In this part of the tutorial we will focus on the study of the audio signal in the time domain and the tools that CLAM has for handling it.

To start with, we will work a little bit more on the application we designed in Part 3. We will insert an audio signal visualizer.

CLAM has a quite large and complex visualization module, but we will only use one of its functionalities: the Plots.

Now we are ready to add a Plot to our application.

The truth is that, apart from seeing the waveforms, it would be very interesting to listen to the sounds inside our application. One way to do so is from the Audio Plot that we have just added, but we would like to be able to listen to the sounds without having to open the Plot. To do so, we need to take a look at the way the AudioIO example handles audio playback. Look at the code and note how it is completely cross-platform (the same lines are used for Linux and for Windows).
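
As a taste of what that looks like, here is a rough sketch of playing one Audio chunk, modeled on the AudioIO example. The AudioManager constructor arguments and the AudioIOConfig channel setting are assumptions to be checked against that example, and a real player would send the audio in device-buffer-sized chunks rather than one big Do() call.

    #include "Audio.hxx"
    #include "AudioManager.hxx"
    #include "AudioIO.hxx"
    #include "AudioOut.hxx"

    // Rough sketch modeled on the AudioIO example; constructor arguments and
    // config fields are assumptions.
    void Play(const CLAM::Audio& chunk)
    {
        CLAM::AudioManager manager(44100, 512);   // assumed: sample rate and internal buffer size
        CLAM::AudioOut output;
        CLAM::AudioIOConfig outCfg;
        outCfg.SetChannelID(0);                   // assumed: write to the first channel
        output.Configure(outCfg);

        manager.Start();                          // open the platform audio device
        output.Start();
        output.Do(chunk);                         // send the chunk to the sound card
        output.Stop();
    }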

So you can very easily add audio output support to your application. You will have to compile all the .cxx files from the /tools/AudioIO folder plus, if you are compiling on Linux, the ones in /tools/AudioIO/Linux or, if you are using Windows, the files that implement the default RtAudio layer (all the files that start with Rt).

Next we will take a deeper look at the CLAM classes we have used until now. To start with, we will focus on the Audio ProcessingData class. It is a class with a quite simple structure but with some methods that are a bit more complex.

For reading/writing the audio files we are using the AudioFileIn and AudioFileOut classes from the src/Processing folder.


CLAM Tutorial - Part 5: Processing

Before we continue with the tutorial, we must restructure our application a bit. Ultimately, we want an application which can analyze audio. This functionality of our application will be encapsulated as a ProcessingComposite. We must familiarize ourselves with the concepts of Processings, ProcessingComposites and Configs. Before continuing you should be able to mentally answer these questions:

In the next part of the tutorial we are going to focus on the analysis part of our application. Now, we will create a new class called MyAnalyzer, which is derived from ProcessingComposite. (Note: to do so, we also need to create a Dynamic Type class, derived from ProcessingConfig, to configure this class.) For now, the class can simply print "I'm doing it!" in its Do() method, and the configuration class needs only a name field, as in the sketch below.
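
A sketch of both classes follows. The DynamicType macros mirror the pattern seen in SMSAnalysisSynthesisConfig, and the ConcreteConfigure()/CopyAsConcreteConfig() pair is the usual Processing configuration hook; depending on your CLAM version you may also need to override other pure virtual methods (such as GetClassName()), so take Processing.hxx as the reference rather than this sketch.

    #include <iostream>
    #include <string>
    #include "ProcessingComposite.hxx"
    #include "ProcessingConfig.hxx"

    // Sketch of the configuration class: a DynamicType with a single Name field.
    class MyAnalyzerConfig : public CLAM::ProcessingConfig
    {
    public:
        DYNAMIC_TYPE_USING_INTERFACE (MyAnalyzerConfig, 1, ProcessingConfig);
        DYN_ATTRIBUTE (0, public, std::string, Name);
    protected:
        void DefaultInit() { AddAll(); UpdateData(); }   // instantiate all attributes
    };

    // Sketch of the composite itself; for now Do() only prints a message.
    class MyAnalyzer : public CLAM::ProcessingComposite
    {
    public:
        MyAnalyzer() { Configure(MyAnalyzerConfig()); }

        const CLAM::ProcessingConfig& GetConfig() const { return mConfig; }

        bool Do()
        {
            std::cout << "I'm doing it!" << std::endl;   // placeholder behaviour
            return true;
        }

    protected:
        bool ConcreteConfigure(const CLAM::ProcessingConfig& cfg)
        {
            CopyAsConcreteConfig(mConfig, cfg);          // assumed helper, as used by other Processings
            return true;
        }

    private:
        MyAnalyzerConfig mConfig;
    };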

Add an instance of this class to your application class, and add a temporary item to the menu which calls MyAnalyzer::Do() in order to test it.


CLAM Tutorial - Part 6: The Spectrum and the FFT

Possibly one of the major attractions of the CLAM library is its spectral processing capabilities. The following parts of this tutorial will focus on this domain while we become familiar with more CLAM tools and ways of working.

First we have to talk about a very important ProcessingData: the Spectrum. Open the Spectrum.hxx file. It is a quite complex data class; most of its complexity is due to the fact that it allows its data to be stored in different formats.

In order to have a spectrum in our application, we will have to deal with the FFT. At the time of this writing, CLAM has three different implementations of the FFT: one based on the Numerical Recipes book, another that uses the Ooura package, and a last one that uses the FFTW library from MIT. The latter is the most efficient and is the one used by default. (If you need more information about the FFTW GPL library, see fftw.org.)

Now we will add an FFT to our analysis composite (MyAnalyzer) and an "Analyze" option to the main user menu of our application. When the user chooses this option we will ask for the FFT size. One of the problems we have to face is how to "cut" the input audio into smaller chunks or frames (for the time being they have to be the same size as the FFT), as sketched below.
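
Here is a sketch of that framing loop. The FFTConfig/SpectrumConfig field names, the Spectrum constructor taking a config and the Audio::GetAudioChunk() call are assumptions borrowed from the FFT examples; check the headers before copying this.

    #include "Audio.hxx"
    #include "FFT.hxx"
    #include "Spectrum.hxx"

    // Sketch of cutting the input into consecutive, non-overlapping frames of
    // exactly fftSize samples and transforming each one.
    void AnalyzeInFrames(const CLAM::Audio& input, int fftSize)
    {
        CLAM::FFTConfig fftCfg;
        fftCfg.SetAudioSize(fftSize);               // assumed attribute name
        CLAM::FFT fft;
        fft.Configure(fftCfg);

        CLAM::SpectrumConfig specCfg;
        specCfg.SetSize(fftSize / 2 + 1);           // one bin per positive frequency
        CLAM::Spectrum spectrum(specCfg);

        CLAM::Audio chunk;
        chunk.SetSize(fftSize);

        fft.Start();
        for (int i = 0; i + fftSize <= input.GetSize(); i += fftSize)
        {
            input.GetAudioChunk(i, i + fftSize, chunk);   // copy one frame of samples (assumed call)
            fft.Do(chunk, spectrum);                      // transform the frame
            // ...inspect or store 'spectrum' here...
        }
        fft.Stop();
    }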

To debug CLAM applications, we can use some of the tools available in the Visualization module. But sometimes it is very useful to generate an XML file with the content of one of the objects that are in memory at a given moment. Most CLAM objects can be serialized to XML by calling their Debug() method (a Debug.xml file will be created), either explicitly in code or from a debugger. General-purpose XML serialization/deserialization is provided by the XMLStorage class.
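
The sketch below shows both options for a Spectrum: the Debug() call mentioned above, and an explicit XMLStorage dump where you choose the tag and file names. The exact XMLStorage interface (a static Dump() versus an instance method) has varied between CLAM versions, so treat the call shown here as an assumption and check XMLStorage.hxx.

    #include "Spectrum.hxx"
    #include "XMLStorage.hxx"

    // Two ways to inspect an in-memory object, sketched for a Spectrum.
    void InspectSpectrum(CLAM::Spectrum& spectrum)
    {
        spectrum.Debug();                                           // writes Debug.xml
        CLAM::XMLStorage::Dump(spectrum, "Spectrum", "my_spectrum.xml");
    }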

Now we can consult the content of our spectrum in XML and describe its main features.

As we have just seen, textual debugging is sometimes not the most convenient way to analyze the effect of a given algorithm or process. We will use a Spectrum Snapshot to inspect the result visually.

To finish this part of the tutorial, note that the spectrum snapshot we have just added introduces a large overhead, because it is opened for every audio frame.

CLAM Tutorial - Part 7: The STFT and the CLAM Segment

A frame is a short audio fragment together with all the analysis data associated with it.

On the other hand, a segment is a longer audio fragment made up of a set of frames that usually share some common properties (they belong to the same note, melody, recording, etc.).

The Segment class is one of the most important and complex ProcessingData classes available in the CLAM repository. Here you have an approximate UML static diagram of its structure:

[Diagram: UML static structure of the Segment class]


Now we are ready to add the CLAM Segment to our application. We will keep every spectrum that is output from the FFT in a different frame in our segment.

When we take an audio chunk directly and input it to the FFT (as we are doing now), what we are actually doing is multiplying the audio chunk by a rectangular window. This means we are convolving the spectrum of our signal with the transform of a rectangular window, and the transform of this window is not very suitable for analysis. Other important parts of the STFT we are not doing are the circular shift (buffer centering) and the option of adding zero-padding to our analysis.
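
To make the idea concrete, here is a plain C++ sketch (not CLAM code) of what applying a smoother analysis window means: each frame is multiplied sample by sample by the window before being transformed. A Hamming window is used here purely as an illustration.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Plain C++ illustration of windowing: instead of the implicit rectangular
    // window, every sample of the frame is scaled by a Hamming window so that
    // the frame edges taper towards zero before the FFT is taken.
    void ApplyHammingWindow(std::vector<float>& frame)
    {
        const std::size_t n = frame.size();
        if (n < 2) return;                      // nothing meaningful to window
        const float pi = 3.14159265358979f;
        for (std::size_t i = 0; i < n; ++i)
        {
            const float w = 0.54f - 0.46f * std::cos(2.0f * pi * i / (n - 1));
            frame[i] *= w;
        }
    }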

We could do all these things by hand, but in CLAM there is already a ProcessingComposite that includes all the necessary functionality: the SpectralAnalysis class.

Now we are ready to add this composite to our MyAnalyzer class.
Note: there are several ways to make SpectralAnalysis work with a Segment. The easiest one at this time is to use the SpectralAnalysis::Do(const Audio&, Spectrum&) overload, passing the chunk of audio we were using as input and a reference to Segment.GetFrame(i).GetSpectrum() as output, where "i" is the current frame index.
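
Sketched out, that approach looks roughly like this. Segment::GetnFrames() and Audio::GetAudioChunk() are assumed accessor names, and filling the segment with empty frames beforehand is left out.

    #include "Audio.hxx"
    #include "Segment.hxx"
    #include "SpectralAnalysis.hxx"

    // Sketch of analyzing each chunk of audio straight into the spectrum stored
    // in the corresponding frame of the segment.
    void AnalyzeIntoSegment(CLAM::SpectralAnalysis& analysis, CLAM::Segment& segment,
                            const CLAM::Audio& input, int frameSize)
    {
        CLAM::Audio chunk;
        chunk.SetSize(frameSize);

        analysis.Start();
        const int nFrames = segment.GetnFrames();             // assumed accessor
        for (int i = 0; i < nFrames; ++i)
        {
            input.GetAudioChunk(i * frameSize, (i + 1) * frameSize, chunk);  // assumed call
            analysis.Do(chunk, segment.GetFrame(i).GetSpectrum());
        }
        analysis.Stop();
    }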

CLAM Tutorial - Part 8: SMS Analysis

One of the limitations we had in the previous part is that the approach used does not allow overlapping audio frames (that is, a hop size different from the window size). We could solve this by hand using some tools in the CLAM repository, but to keep things simple we will use another ProcessingComposite: SMSAnalysis. With it we get for free the whole implementation of the SMS algorithm, including spectral peak detection, pitch estimation and the separation of the signal into sinusoidal and residual components.

Here is the SMS analysis block diagram:

[Diagram: SMSAnalysis block diagram]

Now we are ready to add the SMSAnalysis class.
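
A very rough sketch of wiring it in follows, reusing the Start/Do/Stop pattern from Part 2. The commented-out config setters and the per-frame Do() overload are assumptions based on the SMSTools example; SMSAnalysisConfig.hxx and SMSAnalysis.hxx are the real references.

    #include "Segment.hxx"
    #include "SMSAnalysis.hxx"
    #include "SMSAnalysisConfig.hxx"

    // Rough sketch only: configuration attribute names and the per-frame Do()
    // overload are assumptions; check the SMSTools example for the real calls.
    void RunSMSAnalysis(CLAM::Segment& segment)
    {
        CLAM::SMSAnalysisConfig cfg;
        // cfg.SetHopSize(256); cfg.SetAnalWindowSize(1025); ...  (assumed attribute names)
        CLAM::SMSAnalysis analysis;
        analysis.Configure(cfg);

        analysis.Start();
        const int nFrames = segment.GetnFrames();        // assumed accessor
        for (int i = 0; i < nFrames; ++i)
            analysis.Do(segment.GetFrame(i));            // assumed per-frame overload
        analysis.Stop();
    }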

CLAM Tutorial - Part 9: Statistics

In this last part of the tutorial we will use the output of the SMS analysis to compute some low-level descriptors on each of the components. The descriptor computation in CLAM is still far from complete, and we are still in the process of incorporating the output of the CUIDADO project.

The basic descriptor infrastructure is made up of: basic operations, such as the mean or the nth-order moment, that can be applied to an array or any other kind of sequence and are implemented as function objects; a statistics class that uses these basic operations to perform very efficient statistics on a vector, reusing computations whenever possible and offering an easy-to-use interface; and, finally, concrete descriptors implemented for the basic CLAM ProcessingData classes such as Spectrum, SpectralPeakArray or Frame.

All concrete descriptors derive from an abstract Descriptor class that forces them to implement a ConcreteCompute() method where all the descriptor computation must be performed. This base class also offers methods for initializing the statistics from a particular array and for setting the prototype of the particular set of descriptors that we want to compute.
In order to make the computation of descriptors "CLAM-compliant", a Processing class must be used to trigger the computation of a particular descriptor. This Processing class is very simple and straightforward.
Finally, in order to implement the descriptor computation and to understand its basic usage, take a look at the DescriptorComputationExample in /src/examples/. Note that the only complexity is in setting the actual prototype that should be used for the computation. As all descriptors are Dynamic Types, you must Add the attributes in each descriptor that you want to be computed; the others will be ignored.
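
As a final illustration of the prototype idea, the sketch below requests only two descriptors on a spectrum. The class and attribute names (SpectralDescriptors, Centroid, Spread) are assumptions; DescriptorComputationExample shows the real attribute list, how the data to be described is attached, and the Processing that triggers the computation.

    #include "SpectralDescriptors.hxx"

    // Sketch of setting a descriptor prototype: since descriptors are Dynamic
    // Types, only the attributes you Add (and then UpdateData()) get computed.
    void SetUpSpectralDescriptors(CLAM::SpectralDescriptors& desc)
    {
        desc.AddCentroid();      // request the spectral centroid (assumed attribute)
        desc.AddSpread();        // request the spectral spread (assumed attribute)
        desc.UpdateData();       // allocate the newly added attributes
        // ...attach the Spectrum to describe and run the descriptor-computation
        // Processing, as shown in DescriptorComputationExample...
    }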