CLAM TUTORIAL - PART 1: INTRODUCTION TO CLAM
In this first part of the tutorial we will become familiar
with the most important concepts in CLAM.
We will also experience what working with a free (GPL)
framework means, using the tools that are usually included
in this kind of distribution.
To begin with, point your browser to CLAM's
website (www.iua.upf.es/mtg/clam). In the "docs"
section you will find all the publicly available documentation.
Throughout this tutorial be sure to consult both the CLAM
User and Development Documentation and the Doxygen documentation.
While the User and Development Documentation covers most
of CLAM's concepts and design, the Doxygen documentation
contains more detailed technical descriptions of interfaces
and the like. The Doxygen documentation is derived directly
from interface comments and is therefore, in general,
more up-to-date than the User and Development Documentation.
After the introduction and your look at the documentation
you should be able to mentally answer the following questions:
- What are the main features of the framework?
- What are the following CLAM-related
concepts: Dynamic Types, Processing classes, Processing
Data classes, Controls?
- What support does CLAM offer for
XML, Audio input/output, SDIF and MIDI?
- Which are the Processing Data classes included in the
CLAM repository?
Every open project should have an associated mailing
list. Follow the link on the website and add your name
to the mailing list.
CLAM also has a bug-tracking section
that uses the Mantis tool. Go to the "bugreporting" link
on the website and add yourself as a user. Any bug you
find in CLAM from now on must be reported
to the CLAM team using this tool.
CLAM also uses third party libraries.
Point your browser to the "links" section of CLAM's
web and take a look at what each external library offers.
By now you should have decided whether you would like
to do this tutorial on the Linux platform using GCC or
on the Windows platform using Visual C++.
- Read the "How to compile" section of the manual that
corresponds to your chosen OS.
You should also have the CLAM repository
correctly deployed on your system. Look at the structure
of the whole repository, especially the /src folder. It
is worth becoming somewhat familiar with this structure.
Now we are ready to start compiling.
- Compile and execute some of the Simple examples.
- Finally compile the SMSTools application, which will
be used in the next part of the tutorial.
CLAM TUTORIAL - PART 2: INTRODUCTION TO CLAM (II), THE SMS EXAMPLE
After the introduction we had in
Part 1 of this tutorial we are ready to learn a bit more
about the processing capabilities of the framework. To do
so, we will work on one of the examples: the SMSTools2 application.
This application uses the Spectral Modeling Synthesis scheme
in order to analyze, transform and synthesize audio files.
It is interesting enough to dedicate a whole session to
its study. In the meantime, we will become familiar with
more CLAM tools. You will find the files
of this example in the /examples/SMS/Tools folder in the
repository.
- Read the SMSREADME file you will
find in that folder plus the information available in
the CLAM documentation.
- You should also understand what
the xml configuration file (available in the /build/Examples/SMS/
folder) contains.
Now we can run and study the example.
Note that the main class in the application is really a
class hierarchy. We have the SMSBase class, where most of
the functionality of the application is implemented. The
SMSTools and SMSStdio classes derive from this, the former
implementing the graphical version of the application and
the latter implementing the standard i/o version. We will
choose to compile the graphical version so we will have
a graphical interface created using the Fltk library.
You can use the xml file available
in the example folder or edit the default configuration
from the application. Note that there is a field in the
configuration that points to the path of the incoming audio
file that is going to be analyzed. You will need to modify
the path so it points to a sound available in your local
drive (though this operation can be done directly through
the graphical user interface). You can use any of the following
sounds: sine.wav, sweep.wav,
noise.wav, 1.wav,
2.wav, 3.wav.
- Load the xml file (or edit the
default configuration) and visualize the input sound.
- Analyze the sounds. Go to the
Display menu and visualize the different components.
- Save the result of the analysis
in xml format. What do you think is relevant to this process:
speed, size and contents of the resulting xml file,...?
- Now we are ready to synthesize
the analysis result. Listen to each of the components
of the synthesis (sinusoidal, residual and final sound)
and explain their characteristics. Look at the waveform
of each component.
Now we can dive a little deeper into
the code that implements this application.
First, take a look at the SMSBase
class.
- Look at the functionality of its
most important methods.
- What classes and methods do we
use for loading and storing audio files?
- The class also has some boolean
attributes. What are they used for?
- And last, the class also has some
members that are instances of classes that derive from
the ProcessingConfig class. What are they used for? How
is the xml configuration file loaded?
Now we will take a look at the SMSAnalysisSynthesisConfig
class. It is the first time we are looking at a DynamicType
class.
- What are its main features, attributes
and methods?
Most of the processing of the application
is handled in the AnalysisProcessing and SynthesisProcessing
methods of the SMSBase class. Note that in both cases we
do more or less the same: we 'Start' the ProcessingComposite
and then enter a loop that runs through all the audio,
calling the Do method with the appropriate data. Finally
we call the Stop method to "turn off" the ProcessingComposite.
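The Start/Do/Stop life cycle described above can be sketched
with a plain C++ mock. MockAnalysis and ProcessAll are
hypothetical stand-ins for illustration only; the real CLAM
classes and signatures differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CLAM ProcessingComposite:
// the Start/Do/Stop life cycle is the point, not the names.
class MockAnalysis {
    bool mRunning = false;
    std::size_t mFramesDone = 0;
public:
    void Start() { mRunning = true; }
    // Do() consumes one chunk of audio per call.
    bool Do(const std::vector<float>& chunk) {
        if (!mRunning || chunk.empty()) return false;
        ++mFramesDone;
        return true;
    }
    void Stop() { mRunning = false; }
    std::size_t FramesDone() const { return mFramesDone; }
};

// Drive the composite over a whole signal, one fixed-size
// chunk at a time, exactly as the tutorial describes.
std::size_t ProcessAll(MockAnalysis& proc,
                       const std::vector<float>& audio,
                       std::size_t chunkSize) {
    proc.Start();
    for (std::size_t i = 0; i + chunkSize <= audio.size(); i += chunkSize) {
        std::vector<float> chunk(audio.begin() + i,
                                 audio.begin() + i + chunkSize);
        proc.Do(chunk);
    }
    proc.Stop();
    return proc.FramesDone();
}
```

Note that Do() refuses to run before Start(): this mirrors why
the framework makes the life cycle explicit.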
Let's take a look at the SMSAnalysis
class. This class looks like it should be very complex.
All the analysis processing is done inside its Do method.
But we see that the class does not have so many lines of
code. This is because we are using the ProcessingComposite
structure: the class is not much more than an aggregate
of smaller Processing classes.
- Try to explain the functionality
of each of the Processings (or ProcessingComposites) inside
the SMSAnalysis class.
CLAM TUTORIAL - PART 3: MY FIRST CLAM APPLICATION
CLAM (as any development
framework) has its own coding conventions: a set of rules
and recommendations we should observe when writing CLAM-compliant
code.
- Read the corresponding section
in the manual.
Now we are ready to implement our
first CLAM application. It will be a very
minimal application that will just be able to load an audio
file into RAM and then save it again under another file
name.
First of all you must start a new
CLAM project. Consult the CLAM
User and Development documentation's section on creating
a new CLAM-project for the platform you've
chosen. For our first application we will be using CLAM's
Audio, AudioFileIn and AudioFileOut classes.
- Create a new project using CLAM
for your application.
Now create a new class called MyCLAMApp
and give it a (CLAM::)Audio member which we will use to
store the audio we are going to use. Add a method, LoadAudioFile(),
to this class which loads an audio file into the class'
Audio object using an AudioFileIn object which must first
be configured. Then, add a method, SaveAudioFile(), which
saves the class' Audio member to a file (with a different
file name than the input file) using an AudioFileOut object.
Finally, add a Do() method which first calls LoadAudioFile()
and then calls SaveAudioFile(). Create a main-like function
which calls MyCLAMApp::Do().
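As a rough sketch of this structure, here is a stand-in
version that compiles without the framework: a plain sample
vector and raw binary file I/O replace CLAM's Audio,
AudioFileIn and AudioFileOut, whose actual interfaces differ
(with CLAM you would configure the file I/O objects instead).

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Sketch of the MyCLAMApp skeleton described above.
class MyCLAMApp {
    std::vector<float> mAudio;  // stand-in for a CLAM::Audio member
public:
    bool LoadAudioFile(const std::string& name) {
        std::ifstream in(name, std::ios::binary);
        if (!in) return false;
        mAudio.clear();
        float s;
        while (in.read(reinterpret_cast<char*>(&s), sizeof s))
            mAudio.push_back(s);
        return true;
    }
    bool SaveAudioFile(const std::string& name) const {
        std::ofstream out(name, std::ios::binary);
        if (!out) return false;
        for (float s : mAudio)
            out.write(reinterpret_cast<const char*>(&s), sizeof s);
        return bool(out);
    }
    // Do(): load then save under a different name.
    bool Do(const std::string& inName, const std::string& outName) {
        return LoadAudioFile(inName) && SaveAudioFile(outName);
    }
    std::size_t Size() const { return mAudio.size(); }
};
```

A main-like function would simply construct a MyCLAMApp and
call Do() with the two file names.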
- Compile and run your app to see
if it works properly.
We will want to make our
app a little more flexible (as we want to extend it later
on). So we now create a very simple user interface.
- To keep things simple, create
a 'console-based' user interface which has options for
loading and saving an audio file and allows the user to
enter file names for both files. The application class'
Do() method may be removed now as it's no longer needed.
CLAM TUTORIAL - PART 4: AUDIO
In this part of the tutorial we will
focus on the study of the audio signal in the time domain
and the tools that CLAM has for handling
it.
To start with, we will work a little
bit more on the application we designed in Part 3. We will
insert an audio signal visualizer.
CLAM has a quite
large and complex visualization module but we will only
use one of its functionalities: the Plots.
- Read the CLAM
documentation (including examples and Doxygen) in
order to understand what the Plots are and how they
are used.
Now we are ready to add a Plot in
our application.
- Add a function which shows the
waveform of your application's Audio member using an Audio
Plot. Add an option to your main menu so the user is able
to view the sound once it is loaded.
The truth is that, apart from seeing
the waveforms, it would be very interesting to listen to
the sounds inside our application. One way to do so is from
the Audio Plot that we have just added. But we would like
to be able to listen to the sounds without having to open
the Plot. To do so, we need to take a look at the way the
AudioIO example handles audio playback. Look at the code
and note how it is completely cross-platform (the same lines
are used for Linux and for Windows).
So you can very easily add audio
output support to your application. You will have to compile
all the .cxx files from the /tools/AudioIO folder plus,
if you are compiling on Linux, the ones in /tools/AudioIO/Linux,
or, if you are using Windows, the files that implement
the default RtAudio layer (all the files that start with
Rt).
Next we will take a deeper look at
the CLAM classes we have used until now.
To start with, we will focus on the ProcessingData Audio
class. It is a class with a quite simple structure but with
some methods that are a bit more complex.
- Look at the attributes and methods
of the Audio class.
For reading/writing the audio files
we are using the AudioFileIn and AudioFileOut classes from
the src/Processing folder.
- Look at the header files and try
to understand their functionality and structure.
CLAM TUTORIAL - PART 5: PROCESSING
Before we continue with the
tutorial, we must restructure our application a bit. Ultimately,
we want an application which can analyze audio. This functionality
of our application will be encapsulated as a ProcessingComposite.
We must familiarize ourselves with the concepts of Processings,
ProcessingComposites and Configs. Before continuing you
should be able to mentally answer these questions:
- What is a Processing? Look at
the Processing base class. What are its most important
methods? Some of them must always be implemented in any
class that derives from this base class. Which are
they?
- What is a ProcessingComposite?
Look at the base class. What is the difference between
this and the basic Processing class?
- Look at the SMSAnalysis class,
look at the AttachChildren() and ConfigureChildren() methods.
- Look at how the SMSAnalysis class
can be configured and how the configuring mechanism works.
In the next part of the tutorial
we are going to focus on the analysis part of our application.
Now we will create a new class called MyAnalyzer, which
derives from ProcessingComposite. (Note: to do so, we
also need to create a Dynamic Type class, derived from
ProcessingConfig, to configure this class.) For now, the
class may simply print "I'm doing it!" in its Do() method,
and the configuration class need only have a name
field.
Add an instance of this class to
your application class, and add a temporary item to the
menu which calls MyAnalyzer::Do() in order to test it.
- Implement and compile this. Run
it and verify that everything is working as it should.
CLAM TUTORIAL - PART 6: THE SPECTRUM AND THE FFT
Possibly one of the major attractions
of the CLAM library is its spectral processing
capabilities. The following parts of this tutorial will
focus on this domain while becoming familiar with more CLAM
tools and ways of working.
First we have to talk about a very
important ProcessingData: the Spectrum. Open the Spectrum.hxx
file. It is a quite complex data class. Most of its complexity
is due to the fact that it allows for its data to be stored
in different formats.
- Look at the different formats
the Spectrum offers to represent spectral data.
- Note that the Spectrum is the
only ProcessingData in the CLAM repository
that has an associated configuration. Look at the different
attributes of this configuration and their meaning.
- You should also understand why
the Spectrum offers two overloads for the GetMag()
and GetPhase() methods.
In order to have a spectrum in our
application, we will have to deal with the FFT. At the time
of this writing, CLAM has three
different implementations of the FFT: one based on the Numerical
Recipes book, another that uses the Ooura code, and a last
one that uses the FFTW library from MIT. The latter is
the most efficient and is the one used by default.
(If you need more information about the FFTW GPL library
you can go to fftw.org.)
- Look at the FFT.hxx file in the
CLAM repository and note that there is
only one parameter used to configure the FFT. What is
it? What is its mathematical relation with the size of
the resulting spectrum?
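You can see the relation for yourself without the framework:
a naive DFT (written here only for illustration, never for
real use) shows that for a real signal bin k and bin N-k are
complex conjugates, which is why an FFT of size N yields only
N/2 + 1 distinct spectral bins.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT of a real signal, for illustration only.
// For real input, X[k] and X[N-k] are complex conjugates,
// so only bins 0..N/2 carry independent information.
std::vector<std::complex<double>> Dft(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double kPi = 3.141592653589793;
    std::vector<std::complex<double>> X(N);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * kPi * k * n / N);
    return X;
}
```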
Now we will add an FFT to our analysis
composite (MyAnalyzer) and add an "Analyze" option
to the main user menu of our application. Once the user
chooses this option we will ask for the FFT size. One of
the problems we have to face is how to "cut" the input audio
into smaller chunks or frames (for the time being they have
to be the same size as the FFT).
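One simple way to do the cutting is sketched below; note that
zero-padding the last, incomplete frame is an assumption of
this sketch, not something the tutorial prescribes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Cut the input audio into non-overlapping, FFT-sized frames.
// The last frame is zero-padded if the audio does not divide
// evenly (an assumption of this sketch).
std::vector<std::vector<float>> SliceIntoFrames(
        const std::vector<float>& audio, std::size_t fftSize) {
    std::vector<std::vector<float>> frames;
    for (std::size_t i = 0; i < audio.size(); i += fftSize) {
        std::vector<float> frame(fftSize, 0.0f);  // tail stays zero
        const std::size_t n = std::min(fftSize, audio.size() - i);
        for (std::size_t j = 0; j < n; ++j)
            frame[j] = audio[i + j];
        frames.push_back(frame);
    }
    return frames;
}
```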
- Add the FFT without adding any
other Processing.
To debug CLAM applications,
we can use some of the tools available in the Visualization
modules. But sometimes it is very interesting to generate
an XML file with the content of one of the objects
in memory at a given moment. Most CLAM
objects can be serialized to XML by calling their Debug()
method (a Debug.xml file will be created), either explicitly
in code or using a debugger. General purpose XML serialization/deserialization
is provided by the XMLStorage class.
Now we
can consult the content of our spectrum in XML and describe
its main features.
- Debug your application, adding
a breakpoint after the FFT has been performed. Then Debug()
your spectrum and analyze the resulting XML file.
As we have just seen, textual debugging
is sometimes not the most convenient when trying to analyze
the effect of a given algorithm or process. We will use
a Spectrum Snapshot to be able to inspect the result visually.
To finish this part of the tutorial, note that the spectrum
snapshot we have just added introduces a big overhead,
because it opens for every audio frame.
- Add an option to the user menu
so the snapshot can be activated/deactivated.
CLAM TUTORIAL - PART 7: THE STFT AND THE CLAM SEGMENT
A frame is a short audio fragment
together with all the analysis data associated with it.
- Look at the user manual and the
Frame class in the repository.
On the other hand, a segment is a
longer audio fragment that is made up of a set of frames
that usually share some common properties (they belong to
the same note, melody, recording...).
The Segment class is one of the most
important and complex ProcessingData classes available in
the CLAM repository. Here you have an approximate UML static
diagram of its structure:
- Make sure to understand the structure
of the CLAM Segment.
Now we are ready to add the CLAM
Segment to our application. We will keep every spectrum
that is output from the FFT in a different frame in our
segment.
- Add the segment structure to your
application.
When we take an audio chunk and input it directly to the
FFT (as we are doing now), we are actually multiplying
the audio chunk by a rectangular window. This means we are
convolving the spectrum of our signal with the transform
of a rectangular window, and the transform of this window
is not very suitable for analysis. Other important STFT-related
steps we are not performing are the circular shift (buffer
centering) and the option of adding zero-padding to our
analysis.
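Two of these steps, windowing and zero-padding, can be
illustrated in isolation. A Hann window is used here as an
example choice; the actual window types and their handling
inside CLAM's SpectralAnalysis may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Apply a smooth analysis window (Hann) to a frame instead of
// the implicit rectangular one, then zero-pad up to the FFT
// size. Illustration only; SpectralAnalysis does this (and
// more) internally.
std::vector<double> WindowAndPad(const std::vector<double>& frame,
                                 std::size_t fftSize) {
    const std::size_t N = frame.size();
    const double kPi = 3.141592653589793;
    std::vector<double> out(fftSize, 0.0);  // zero-padding: tail stays 0
    for (std::size_t n = 0; n < N; ++n) {
        const double hann = 0.5 - 0.5 * std::cos(2.0 * kPi * n / (N - 1));
        out[n] = frame[n] * hann;
    }
    return out;
}
```

The Hann window tapers both edges of the frame to zero, which
is what tames the spectral leakage of the rectangular window.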
We could do all these things by hand
but in CLAM there is already a ProcessingComposite
that includes all the necessary functionality: it is the
SpectralAnalysis class.
- Take a look at the SpectralAnalysisConfig
class and try to understand the different parameters included.
- Take a look at the SpectralAnalysis
Processing and note how it includes all the different
functionality previously mentioned.
Now we are ready to add this
composite to our MyAnalyzer class.
- Replace the FFT with a SpectralAnalysis.
Make it work with the Segment you already have.
Note: to make the SpectralAnalysis
work with a Segment there are several possibilities. The
easiest one at this time is to use the SpectralAnalysis::Do(const
Audio&, Spectrum&) overload, passing the chunk of
audio we were using as input and a reference to Segment.GetFrame(i).GetSpectrum()
as output, where "i" is the current frame index.
- At this point you may have noticed
that you must either rely on many default parameters or
ask the user for a lot of independent parameters like
WindowSize, WindowType... If you feel like it, you can
very easily implement XML input in order to load an XML
configuration file.
CLAM TUTORIAL - PART 8: SMS ANALYSIS
One of the limitations of the previous part is that the
approach used does not allow overlapping audio frames (that
is, a hop size different from the window size). We could
solve this by hand using some tools in the CLAM
repository. But to keep it simpler, we will use another
ProcessingComposite: the SMSAnalysis. And we will get for
free the whole implementation of the SMS algorithm, including
spectral peak detection, pitch estimation and separation
of the signal into sinusoidal and residual components.
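The frame-count arithmetic behind overlapping analysis can be
sketched as follows (FrameCount is a hypothetical helper, not
part of CLAM): with a hop size smaller than the window size,
consecutive windows overlap, which the previous non-overlapping
slicing could not express.

```cpp
#include <cstddef>

// Number of full analysis windows that fit in a signal:
// 1 + floor((length - windowSize) / hopSize).
// hopSize < windowSize gives overlapping frames.
std::size_t FrameCount(std::size_t length, std::size_t windowSize,
                       std::size_t hopSize) {
    if (length < windowSize || hopSize == 0) return 0;
    return 1 + (length - windowSize) / hopSize;
}
```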
Here is the SMS analysis block diagram:
Now we are ready to add the SMSAnalysis
class.
- Replace the SpectralAnalysis
class with the new SMSAnalysis class. Now you can call
the SMSAnalysis::Do(Segment&) overload, which will take
care of everything, including the audio slicing we previously
had to do by hand.
- Dump the resulting Segment
into XML and comment on the results.
CLAM TUTORIAL - PART 9: STATISTICS
In this last part of the tutorial
we will use the output of the SMS analysis in order to compute
some low-level descriptors on each of the components. The
descriptor computation in CLAM is still far
from being complete and we are still in the process of incorporating
the output of the CUIDADO project.
The basic descriptor infrastructure is made up of:
- basic operations, such as the mean or the nth-order moment,
that can be used on an array or any other kind of sequence
and are implemented as functional objects;
- a statistical class that, using these basic operations,
can perform very efficient statistics on a vector, reusing
computations whenever possible and offering an easy-to-use
interface;
- and finally, concrete descriptors implemented for the basic
CLAM Processing Data classes such as Spectrum, SpectralPeakArray
or Frame.
- Look at the BasicOps.hxx file
located at /src/Standard to get a very basic grasp of
how low-level functional objects are implemented.
- Take a look at the Stats.hxx file
located at the same folder.
All concrete descriptors derive
from an abstract Descriptor class that forces subclasses
to implement a ConcreteCompute() method, where all the
descriptor computation must be performed. This base class
also offers methods for
initializing the statistics from a particular array and for
setting the prototype of the particular set of descriptors
that we want to compute.
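A miniature of this arrangement is sketched below; the names
mirror the tutorial's description rather than CLAM's exact
interfaces, and MaxDescriptor is an invented example subclass.

```cpp
#include <vector>

// Abstract base: subclasses must implement ConcreteCompute(),
// where the actual descriptor computation happens.
class Descriptor {
public:
    virtual ~Descriptor() = default;
    double Compute(const std::vector<double>& data) {
        return ConcreteCompute(data);  // template-method dispatch
    }
protected:
    virtual double ConcreteCompute(const std::vector<double>& data) = 0;
};

// An invented concrete descriptor: the maximum value in the array.
class MaxDescriptor : public Descriptor {
protected:
    double ConcreteCompute(const std::vector<double>& data) override {
        double best = data.empty() ? 0.0 : data[0];
        for (double x : data)
            if (x > best) best = x;
        return best;
    }
};
```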
- Look at the Descriptor.hxx file
located in /src/Data/Descriptors.
- Now take a look at the particular
descriptors implemented in the same folder (SpectralDescriptors,
AudioDescriptors...)
In order to make descriptor computation "CLAM-compliant",
a Processing class must be used to trigger the computation
of a particular descriptor. This Processing class is very
simple and straightforward.
- Look at the DescriptorComputation
class located in /src/Processing/Analysis
Finally in order to implement
the descriptor computation and to understand its basic usage,
take a look at the DescriptorComputationExample in /src/examples/.
Note that the only complexity is in setting the actual Prototype
that should be used for the computation. As all descriptors
are Dynamic Types, you must Add the attributes in each descriptor
that you want computed; the others will be ignored.
- Implement whatever DescriptorScheme
computation you choose on the output of your SMS analysis.
- Dump the result of the Descriptor
computation into XML.