CLAM TUTORIAL - PART 1: INTRODUCTION TO CLAM
In this first part of the tutorial we will become familiar
with the most important concepts in CLAM.
We will also experience what working with a free (GPL)
framework means, using the tools that are usually included
in this kind of distribution.
To begin with, point your browser to CLAM's
website (www.iua.upf.es/mtg/clam). In the "docs"
section you will find all the publicly available documentation.
Throughout this tutorial be sure to consult both the CLAM
User and Development Documentation and the Doxygen documentation.
While the User and Development Documentation covers most
of CLAM's concepts and design, the Doxygen documentation
contains more detailed technical descriptions of interfaces
and the like. The Doxygen documentation is derived directly
from interface comments and is therefore, in general,
more up-to-date than the User and Development Documentation.
After the introduction and your look at the documentation
you should be able to mentally answer the following questions:
- What are the main features of the framework?
- What are the following CLAM-related
concepts: Dynamic Types, Processing classes, Processing
Data classes, Controls?
- What support does CLAM offer for
XML, Audio input/output, SDIF and MIDI?
- Which are the Processing Data classes included in the
CLAM repository?
Every open project should have an associated mailing
list. Follow the link on the website and add your name
to the mailing list.
CLAM also has a bug-tracking section
that uses the Mantis tool. Go to the "bugreporting" link
on the website and add yourself as a user. Any bug you
find in CLAM from now on must be reported
to the CLAM team using this tool.
CLAM also uses third party libraries.
Point your browser to the "links" section of CLAM's
web and take a look at what each external library offers.
By now you should have decided whether you would like
to do this tutorial on the Linux platform using GCC or
on the Windows platform using Visual C++.
- Read the "How to compile" section of the manual that
corresponds to your chosen OS.
You should also have the CLAM repository
correctly deployed on your system. Look at the structure
of the whole repository, especially the /src folder. It
is worth becoming somewhat familiar with this structure.
Now we are ready to start compiling.
- Compile and execute some of the Simple examples.
- Finally compile the SMSTools application, which will
be used in the next part of the tutorial.
CLAM TUTORIAL - PART 2: INTRODUCTION TO CLAM (II), THE SMS EXAMPLE
After the introduction we had in
Part 1 of this tutorial we are ready to learn a bit more
about the processing capabilities of the framework. To do
so, we will work on one of the examples: the SMSTools2 application.
This application uses the Spectral Modeling Synthesis scheme
in order to analyze, transform and synthesize audio files.
It is interesting enough to dedicate a whole session to
its study. In the meantime, we will become familiar with
more CLAM tools. You will find the files
of this example in the /examples/SMS/Tools folder in the
repository.
- Read the SMSREADME file you will
find in that folder plus the information available in
the CLAM documentation.
- You should also understand what
the xml configuration file (available in the /build/Examples/SMS/
folder) contains.
Now we can run and study the example.
Note that the main class in the application is really a
class hierarchy. We have the SMSBase class, where most of
the functionality of the application is implemented. The
SMSTools and SMSStdio classes derive from this, the former
implementing the graphical version of the application and
the latter implementing the standard i/o version. We will
choose to compile the graphical version so we will have
a graphical interface created using the Fltk library.
You can use the xml file available
in the example folder or edit the default configuration
from the application. Note that there is a field in the
configuration that points to the path of the incoming audio
file that is going to be analyzed. You will need to modify
the path so it points to a sound available in your local
drive (though this operation can be done directly through
the graphical user interface). You can use any of the following
sounds: sine.wav, sweep.wav,
noise.wav, 1.wav,
2.wav, 3.wav.
- Load the xml file (or edit the
default configuration) and visualize the input sound.
- Analyze the sounds. Go to the
Display menu and visualize the different components.
- Save the result of the analysis
in xml format. What do you think is relevant to this process:
speed, size and contents of the resulting xml file,...?
- Now we are ready to synthesize
the analysis result. Listen to each of the components
of the synthesis (sinusoidal, residual and final sound)
and explain their characteristics. Look at the waveform
of each component.
Now we can dive a little deeper into
the code that implements this application.
First, take a look at the SMSBase
class.
- Look at the functionality of its
most important methods.
- What classes and methods do we
use for loading and storing audio files?
- The class also has some boolean
attributes. What are they used for?
- And last, the class also has some
members that are instances of classes that derive from
the ProcessingConfig class. What are they used for? How
is the xml configuration file loaded?
Now we will take a look at the SMSAnalysisSynthesisConfig
class. It is the first time we are looking at a DynamicType
class.
- What are its main features, attributes
and methods?
Most of the processing of the application
is handled in the AnalysisProcessing and SynthesisProcessing
methods of the SMSBase class. Note that in both cases we
do more or less the same: we 'Start' the ProcessingComposite
and then enter a loop that runs through all the audio,
calling the Do method with the appropriate data. Finally
we call the Stop method to "turn off" the ProcessingComposite.
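The Start/Do/Stop life cycle described above can be sketched
with a plain C++ mock. MockAnalysis and ProcessAll are
hypothetical stand-ins for illustration only; the real CLAM
classes and signatures differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CLAM ProcessingComposite:
// the Start/Do/Stop life cycle is the point, not the names.
class MockAnalysis {
    bool mRunning = false;
    std::size_t mFramesDone = 0;
public:
    void Start() { mRunning = true; }
    // Do() consumes one chunk of audio per call.
    bool Do(const std::vector<float>& chunk) {
        if (!mRunning || chunk.empty()) return false;
        ++mFramesDone;
        return true;
    }
    void Stop() { mRunning = false; }
    std::size_t FramesDone() const { return mFramesDone; }
};

// Drive the composite over a whole signal, one fixed-size
// chunk at a time, exactly as the tutorial describes.
std::size_t ProcessAll(MockAnalysis& proc,
                       const std::vector<float>& audio,
                       std::size_t chunkSize) {
    proc.Start();
    for (std::size_t i = 0; i + chunkSize <= audio.size(); i += chunkSize) {
        std::vector<float> chunk(audio.begin() + i,
                                 audio.begin() + i + chunkSize);
        proc.Do(chunk);
    }
    proc.Stop();
    return proc.FramesDone();
}
```

Note that Do() refuses to run before Start(): this mirrors why
the framework makes the life cycle explicit.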
Let's take a look at the SMSAnalysis
class. This class looks like it should be very complex.
All the analysis processing is done inside its Do method.
But we see that the class does not have so many lines of
code. This is because we are using the ProcessingComposite
structure: the class is not much more than an aggregate
of smaller Processing classes.
- Try to explain the functionality
of each of the Processings (or ProcessingComposites) inside
the SMSAnalysis class.
CLAM TUTORIAL - PART 3: MY FIRST CLAM APPLICATION
CLAM (as any development
framework) has its own coding conventions: a set of rules
and recommendations we should observe when writing CLAM-compliant
code.
- Read the corresponding section
in the manual.
Now we are ready to implement our
first CLAM application. It will be a very
minimal application that will just be able to load an audio
file into RAM and then save it again under another file
name.
First of all you must start a new
CLAM project. Consult the CLAM
User and Development documentation's section on creating
a new CLAM-project for the platform you've
chosen. For our first application we will be using CLAM's
Audio, AudioFileIn and AudioFileOut classes.
- Create a new project using CLAM
for your application.
Now create a new class called MyCLAMApp
and give it a (CLAM::)Audio member which we will use to
store the audio we are going to use. Add a method, LoadAudioFile(),
to this class which loads an audio file into the class'
Audio object using an AudioFileIn object which must first
be configured. Then, add a method, SaveAudioFile(), which
saves the class' Audio member to a file (with a different
file name than the input file) using an AudioFileOut object.
Finally, add a Do() method which first calls LoadAudioFile()
and then calls SaveAudioFile(). Create a main-like function
which calls MyCLAMApp::Do().
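As a rough sketch of this structure, here is a stand-in
version that compiles without the framework: a plain sample
vector and raw binary file I/O replace CLAM's Audio,
AudioFileIn and AudioFileOut, whose actual interfaces differ
(with CLAM you would configure the file I/O objects instead).

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch of the MyCLAMApp skeleton described above.
class MyCLAMApp {
    std::vector<float> mAudio;  // stand-in for a CLAM::Audio member
public:
    bool LoadAudioFile(const std::string& name) {
        std::ifstream in(name, std::ios::binary);
        if (!in) return false;
        mAudio.clear();
        float s;
        while (in.read(reinterpret_cast<char*>(&s), sizeof s))
            mAudio.push_back(s);
        return true;
    }
    bool SaveAudioFile(const std::string& name) const {
        std::ofstream out(name, std::ios::binary);
        if (!out) return false;
        for (float s : mAudio)
            out.write(reinterpret_cast<const char*>(&s), sizeof s);
        return bool(out);
    }
    // Do(): load then save under a different name.
    bool Do(const std::string& inName, const std::string& outName) {
        return LoadAudioFile(inName) && SaveAudioFile(outName);
    }
    std::size_t Size() const { return mAudio.size(); }
};
```

A main-like function would simply construct a MyCLAMApp and
call Do() with the two file names.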
- Compile and run your app to see
if it works properly.
We will want to make our
app a little more flexible (as we want to extend it later
on). So we now create a very simple user interface.
- To keep things simple, create
a 'console-based' user interface which has options for
loading and saving an audio file and allows the user to
enter file names for both files. The application class'
Do() method may be removed now as it's no longer needed.
CLAM TUTORIAL - PART 4: AUDIO
In this part of the tutorial we will
focus on the study of the audio signal in the time domain
and the tools that CLAM has for handling
it.
To start with, we will work a little
bit more on the application we designed in Part 3. We will
insert an audio signal visualizer.
CLAM has a quite
large and complex visualization module but we will only
use one of its functionalities: the Plots.
- Read the CLAM
documentation (including examples and Doxygen) in
order to understand what the Plots are and how they
are used.
Now we are ready to add a Plot in
our application.
- Add a function which shows the
waveform of your application's Audio member using an Audio
Plot. Add an option to your main menu so the user is able
to view the sound once it is loaded.
The truth is that, apart from seeing
the waveforms, it would be very interesting to listen to
the sounds inside our application. One way to do so is from
the Audio Plot that we have just added. But we would like
to be able to listen to the sounds without having to open
the Plot. To do so, we need to take a look at the way the
AudioIO example handles audio playback. Look at the code
and note how it is completely cross-platform (the same lines
are used for Linux and for Windows).
So you can very easily add audio
output support to your application. You will have to compile
all the .cxx files from the /tools/AudioIO folder plus,
if you are compiling on Linux, the ones in /tools/AudioIO/Linux,
or, if you are using Windows, the files that implement
the default RtAudio layer (all the files that start with
Rt).
Next we will take a deeper look at
the CLAM classes we have used until now.
To start with, we will focus on the ProcessingData Audio
class. It is a class with a quite simple structure but with
some methods that are a bit more complex.
- Look at the attributes and methods
of the Audio class.
For reading/writing the audio files
we are using the AudioFileIn and AudioFileOut classes from
the src/Processing folder.
- Look at the header files and try
to understand their functionality and structure.
CLAM TUTORIAL - PART 5: PROCESSING
Before we continue with the
tutorial, we must restructure our application a bit. Ultimately,
we want an application which can analyze audio. This functionality
of our application will be encapsulated as a ProcessingComposite.
We must familiarize ourselves with the concepts of Processings,
ProcessingComposites and Configs. Before continuing you
should be able to mentally answer these questions:
- What is a Processing? Look at
the Processing base class. What are its most important
methods? Some of them must always be implemented in any
class that derives from this base class. Which are
they?
- What is a ProcessingComposite?
Look at the base class. What is the difference between
this and the basic Processing class?
- Look at the SMSAnalysis class,
look at the AttachChildren() and ConfigureChildren() methods.
- Look at how the SMSAnalysis class
can be configured and how the configuring mechanism works.
In the next part of the tutorial
we are going to focus on the analysis part of our application.
Now we will create a new class called MyAnalyzer, which
derives from ProcessingComposite. (Note: to do so, we
also need to create a Dynamic Type class, derived from
ProcessingConfig, to configure this class.) For now, the
class may simply print "I'm doing it!" in its Do() method,
and the configuration class need only have a name
field.
Add an instance of this class to
your application class, and add a temporary item to the
menu which calls MyAnalyzer::Do() in order to test it.
- Implement and compile this. Run
it and verify that everything is working as it should.
CLAM TUTORIAL - PART 6: THE SPECTRUM AND THE FFT
Possibly one of the major attractions
of the CLAM library is its spectral processing
capabilities. The following parts of this tutorial will
focus on this domain while becoming familiar with more CLAM
tools and ways of working.
First we have to talk about a very
important ProcessingData: the Spectrum. Open the Spectrum.hxx
file. It is a quite complex data class. Most of its complexity
is due to the fact that it allows for its data to be stored
in different formats.
- Look at the different formats
the Spectrum offers to represent spectral data.
- Note that the Spectrum is the
only ProcessingData in the CLAM repository
that has an associated configuration. Look at the different
attributes of this configuration and their meaning.
- You should also understand why
the Spectrum offers two overloads for the GetMag()
and GetPhase() methods.
In order to have a spectrum in our
application, we will have to deal with the FFT. At the time
of this writing, CLAM has three
different implementations of the FFT: one based on the Numerical
Recipes book, another that uses the Ooura code, and a last
one that uses the FFTW library from MIT. The latter is
the most efficient and is the one used by default.
(If you need more information about the FFTW GPL library
you can go to fftw.org.)
- Look at the FFT.hxx file in the
CLAM repository and note that there is
only one parameter used to configure the FFT. What is
it? What is its mathematical relation with the size of
the resulting spectrum?
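You can see the relation for yourself without the framework:
a naive DFT (written here only for illustration, never for
real use) shows that for a real signal bin k and bin N-k are
complex conjugates, which is why an FFT of size N yields only
N/2 + 1 distinct spectral bins.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT of a real signal, for illustration only.
// For real input, X[k] and X[N-k] are complex conjugates,
// so only bins 0..N/2 carry independent information.
std::vector<std::complex<double>> Dft(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double kPi = 3.141592653589793;
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * kPi * k * n / N);
    return X;
}
```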
Now we will add an FFT to our analysis
composite (MyAnalyzer) and add an "Analyze" option
to the main user menu of our application. Once the user
chooses this option we will ask for the FFT size. One of
the problems we have to face is how to "cut" the input audio
into smaller chunks or frames (for the time being they have
to be the same size as the FFT).
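One simple way to do the cutting is sketched below; note that
zero-padding the last, incomplete frame is an assumption of
this sketch, not something the tutorial prescribes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cut the input audio into non-overlapping, FFT-sized frames.
// The last frame is zero-padded if the audio does not divide
// evenly (an assumption of this sketch).
std::vector<std::vector<float>> SliceIntoFrames(
        const std::vector<float>& audio, std::size_t fftSize) {
    std::vector<std::vector<float>> frames;
    for (std::size_t i = 0; i < audio.size(); i += fftSize) {
        std::vector<float> frame(fftSize, 0.0f);  // tail stays zero
        const std::size_t n = std::min(fftSize, audio.size() - i);
        for (std::size_t j = 0; j < n; ++j)
            frame[j] = audio[i + j];
        frames.push_back(frame);
    }
    return frames;
}
```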
- Add the FFT without adding any
other Processing.
To debug CLAM applications,
we can use some of the tools available in the Visualization
modules. But sometimes it is very interesting to generate
an XML file with the content of one of the objects
in memory at a given moment. Most CLAM
objects can be serialized to XML by calling their Debug()
method (a Debug.xml file will be created), either explicitly
in code or using a debugger. General purpose XML serialization/deserialization
is provided by the XMLStorage class.
Now we
can consult the content of our spectrum in XML and describe
its main features.
- Debug your application, adding
a breakpoint after the FFT has been performed. Then Debug()
your spectrum and analyze the resulting XML file.
As we have just seen, textual debugging
is sometimes not the most convenient when trying to analyze
the effect of a given algorithm or process. We will use
a Spectrum Snapshot to be able to inspect the result visually.
To finish this part of the tutorial, note that the spectrum
snapshot we have just added introduces a big overhead,
because it opens for every audio frame.
- Add an option to the user menu
so the snapshot can be activated/deactivated.
CLAM TUTORIAL - PART 7: THE STFT AND THE CLAM SEGMENT
A frame is a short audio fragment
together with all the analysis data associated with it.
- Look at the user manual and the
Frame class in the repository.
On the other hand, a segment is a
longer audio fragment that is made up of a set of frames
that usually share some common properties (they belong to
the same note, melody, recording...).
The Segment class is one of the most
important and complex ProcessingData classes available in
the CLAM repository. Here you have an approximate UML static
diagram of its structure:
- Make sure to understand the structure
of the CLAM Segment.
Now we are ready to add the CLAM
Segment to our application. We will keep every spectrum
that is output from the FFT in a different frame in our
segment.
- Add the segment structure to your
application.
When we take an audio chunk and input it directly to the
FFT (as we are doing now), we are actually multiplying
the audio chunk by a rectangular window. This means we are
convolving the spectrum of our signal with the transform
of a rectangular window, and the transform of this window
is not very suitable for analysis. Other important STFT-related
steps we are not performing are the circular shift (buffer
centering) and the option of adding zero-padding to our
analysis.
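Two of these steps, windowing and zero-padding, can be
illustrated in isolation. A Hann window is used here as an
example choice; the actual window types and their handling
inside CLAM's SpectralAnalysis may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Apply a smooth analysis window (Hann) to a frame instead of
// the implicit rectangular one, then zero-pad up to the FFT
// size. Illustration only; SpectralAnalysis does this (and
// more) internally.
std::vector<double> WindowAndPad(const std::vector<double>& frame,
                                 std::size_t fftSize) {
    const std::size_t N = frame.size();
    const double kPi = 3.141592653589793;
    std::vector<double> out(fftSize, 0.0);  // zero-padding: tail stays 0
    for (std::size_t n = 0; n < N; ++n) {
        const double hann = 0.5 - 0.5 * std::cos(2.0 * kPi * n / (N - 1));
        out[n] = frame[n] * hann;
    }
    return out;
}
```

The Hann window tapers both edges of the frame to zero, which
is what tames the spectral leakage of the rectangular window.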
We could do all these things by hand
but in CLAM there is already a ProcessingComposite
that includes all the necessary functionality: it is the
SpectralAnalysis class.
- Take a look at the SpectralAnalysisConfig
class and try to understand the different parameters included.
- Take a look at the SpectralAnalysis
Processing and note how it includes all the different
functionality previously mentioned.
Now we are ready to add this
composite to our MyAnalyzer class.
- Replace the FFT with a SpectralAnalysis.
Make it work with the Segment you already have.
Note: to make the SpectralAnalysis
work with a Segment there are several possibilities. The
easiest one at this time is to use the SpectralAnalysis::Do(const
Audio&, Spectrum&) overload, passing the chunk of
audio we were using as input and a reference to Segment.GetFrame(i).GetSpectrum()
as output, where "i" is the current frame index.
- At this point you may have noticed
that you must either rely on many default parameters or
ask the user for a lot of independent parameters like
WindowSize, WindowType... If you feel like it, you can
very easily implement XML input in order to load an XML
configuration file.
CLAM TUTORIAL - PART 8: SMS ANALYSIS
One of the limitations of the previous part is that the
approach used does not allow overlapping audio frames (that
is, a hop size different from the window size). We could
solve this by hand using some tools in the CLAM
repository. But to keep it simpler, we will use another
ProcessingComposite: the SMSAnalysis. And we will get for
free the whole implementation of the SMS algorithm, including
spectral peak detection, pitch estimation and separation
of the signal into sinusoidal and residual components.
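The frame-count arithmetic behind overlapping analysis can be
sketched as follows (FrameCount is a hypothetical helper, not
part of CLAM): with a hop size smaller than the window size,
consecutive windows overlap, which the previous non-overlapping
slicing could not express.

```cpp
#include <cstddef>

// Number of full analysis windows that fit in a signal:
// 1 + floor((length - windowSize) / hopSize).
// hopSize < windowSize gives overlapping frames.
std::size_t FrameCount(std::size_t length, std::size_t windowSize,
                       std::size_t hopSize) {
    if (length < windowSize || hopSize == 0) return 0;
    return 1 + (length - windowSize) / hopSize;
}
```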
Here is the SMS analysis block diagram:
Now we are ready to add the SMSAnalysis
class.
- Replace the SpectralAnalysis
class with the new SMSAnalysis class. Now you can call
the SMSAnalysis::Do(Segment&) overload, which will take
care of everything, including the audio slicing we previously
had to do by hand.
- Dump the resulting Segment
into XML and comment on the results.
CLAM TUTORIAL - PART 9: STATISTICS
In this last part of the tutorial
we will use the output of the SMS analysis in order to compute
some low-level descriptors on each of the components. The
descriptor computation in CLAM is still far
from being complete and we are still in the process of incorporating
the output of the CUIDADO project.
The basic descriptor infrastructure is made up of:
- basic operations, such as the mean or the nth-order moment,
that can be used on an array or any other kind of sequence
and are implemented as functional objects;
- a statistical class that, using these basic operations,
can perform very efficient statistics on a vector, reusing
computations whenever possible and offering an easy-to-use
interface;
- and finally, concrete descriptors implemented for the basic
CLAM Processing Data classes such as Spectrum, SpectralPeakArray
or Frame.
- Look at the BasicOps.hxx file
located at /src/Standard to get a very basic grasp of
how low-level functional objects are implemented.
- Take a look at the Stats.hxx file
located at the same folder.
All concrete descriptors derive
from an abstract Descriptor class that forces subclasses
to implement a ConcreteCompute() method, where all the
descriptor computation must be performed. This base class
also offers methods for
initializing the statistics from a particular array and for
setting the prototype of the particular set of descriptors
that we want to compute.
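A miniature of this arrangement is sketched below; the names
mirror the tutorial's description rather than CLAM's exact
interfaces, and MaxDescriptor is an invented example subclass.

```cpp
#include <vector>

// Abstract base: subclasses must implement ConcreteCompute(),
// where the actual descriptor computation happens.
class Descriptor {
public:
    virtual ~Descriptor() = default;
    double Compute(const std::vector<double>& data) {
        return ConcreteCompute(data);  // template-method dispatch
    }
protected:
    virtual double ConcreteCompute(const std::vector<double>& data) = 0;
};

// An invented concrete descriptor: the maximum value in the array.
class MaxDescriptor : public Descriptor {
protected:
    double ConcreteCompute(const std::vector<double>& data) override {
        double best = data.empty() ? 0.0 : data[0];
        for (double x : data)
            if (x > best) best = x;
        return best;
    }
};
```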
- Look at the Descriptor.hxx file
located in /src/Data/Descriptors.
- Now take a look at the particular
descriptors implemented in the same folder (SpectralDescriptors,
AudioDescriptors...)
In order to make descriptor computation "CLAM-compliant",
a Processing class must be used to trigger the computation
of a particular descriptor. This Processing class is very
simple and straightforward.
- Look at the DescriptorComputation
class located in /src/Processing/Analysis
Finally in order to implement
the descriptor computation and to understand its basic usage,
take a look at the DescriptorComputationExample in /src/examples/.
Note that the only complexity is in setting the actual Prototype
that should be used for the computation. As all descriptors
are Dynamic Types, you must Add the attributes in each descriptor
that you want computed; the others will be ignored.
- Implement whatever DescriptorScheme
computation you choose on the output of your SMS analysis.
- Dump the result of the Descriptor
computation into XML.