CLAM Tutorial - Part 6

SMS Analysis

Starting with the STFT that we have already implemented, in this part we will add everything that is missing in order to have a complete SMS analysis.

Here is the SMS analysis block diagram:

Note that the input signal is split into two branches, both of which are analyzed spectrally. One of the resulting spectra is used to detect the sinusoidal part of the signal; the other is used to form the spectrum of the residual ('noisy') part of the signal by subtracting the sinusoidal spectrum from it. It's a good idea to make the properties of both analyses adjustable independently: for instance, you might want more frequency resolution (through zero-padding or a bigger window size) before doing the spectral peak detection. However, the hop size of both analyses must be equal, as both must advance through the input at the same rate. For this tutorial you may keep the window size for the sinusoidal part of the signal constant, as making it pitch-synchronous would complicate things a lot.
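
The constraint on the two branches can be written down as a small check. The following is only a sketch: BranchSettings and the helper functions are hypothetical names for illustration, not CLAM classes, and it assumes the FFT size is the next power of two above the window size, shifted left by a zero-padding exponent.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical settings for one spectral-analysis branch (not a CLAM type).
struct BranchSettings {
    std::size_t windowSize;   // analysis window length in samples
    std::size_t zeroPadding;  // zero-padding exponent (assumed convention)
    std::size_t hopSize;      // samples advanced between consecutive frames
};

// Smallest power of two that is >= n.
std::size_t NextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// FFT size implied by the window size and the zero-padding exponent.
std::size_t FftSize(const BranchSettings& s) {
    assert(s.windowSize > 0);
    return NextPowerOfTwo(s.windowSize) << s.zeroPadding;
}

// Spacing between adjacent spectral bins, in Hz.
double BinWidthHz(const BranchSettings& s, double sampleRate) {
    return sampleRate / static_cast<double>(FftSize(s));
}

// The two branches may differ in window size and zero-padding,
// but they must consume the input at the same rate.
bool BranchesAreCompatible(const BranchSettings& sinusoidal,
                           const BranchSettings& residual) {
    return sinusoidal.hopSize == residual.hopSize;
}
```

With these assumptions, a sinusoidal branch of {1025, 2, 256} gives an 8192-point FFT and a residual branch of {513, 0, 256} gives a 1024-point FFT; the two are compatible because both use a hop size of 256.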

For the reasons mentioned above, it's a good idea to make a separate ProcessingComposite which only does the spectral analysis. The SMS Analysis ProcessingComposite can then use two of these objects, one for each branch, and allow their properties to be set independently without too much duplicate code. You can create a new ProcessingComposite called MySMSAnalyzer and have it aggregate a MyAnalyzer object (we will first do the sinusoidal component).

The first thing we need to add is the Peak Detection and Continuation. But before that we will have to understand a little better how the SpectralPeak and SpectralPeakArray classes work. Look at the code and the documentation of these two classes.

Next, we will take a look at the PeakDetect Processing class. Its only complexity is in the processing algorithm and in how the configuration parameters are used. Here you have a diagram that may help you in understanding the algorithm.

Now we are ready to add a PeakDetect Processing object to the MyIn class.

Add peak detection to your application. For the time being it is not important for the user to control the configuration; we can just use the default configuration parameters. (Note: At the time of this writing the PeakDetect algorithm needs the input spectrum to be in dB, so you will have to convert it to dB first and back to linear after the algorithm is applied.)
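
To make the idea concrete, here is a generic sketch of dB conversion and peak picking on a magnitude spectrum. It is not the code of CLAM's PeakDetect: the Peak struct, the function names and the single threshold parameter are assumptions made for illustration. A peak is taken to be a local maximum above a magnitude threshold, refined by parabolic interpolation over the three surrounding bins.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical peak record (not CLAM's SpectralPeak).
struct Peak {
    double bin;    // interpolated, fractional bin position
    double magDb;  // interpolated magnitude in dB
};

// Linear magnitude to dB; the floor avoids log10(0).
double LinearToDb(double mag) {
    return 20.0 * std::log10(std::max(mag, 1e-12));
}

// dB back to linear magnitude.
double DbToLinear(double db) {
    return std::pow(10.0, db / 20.0);
}

// Finds local maxima above thresholdDb in a dB magnitude spectrum.
std::vector<Peak> DetectPeaks(const std::vector<double>& magDb,
                              double thresholdDb) {
    std::vector<Peak> peaks;
    for (std::size_t k = 1; k + 1 < magDb.size(); ++k) {
        const double left = magDb[k - 1];
        const double center = magDb[k];
        const double right = magDb[k + 1];
        if (center > thresholdDb && center > left && center >= right) {
            // Fit a parabola through the three points; the conditions
            // above guarantee that the denominator is negative.
            const double denom = left - 2.0 * center + right;
            const double offset = 0.5 * (left - right) / denom;
            peaks.push_back({static_cast<double>(k) + offset,
                             center - 0.25 * (left - right) * offset});
        }
    }
    return peaks;
}
```

A caller would convert each magnitude with LinearToDb, run DetectPeaks, and convert the resulting peak magnitudes back with DbToLinear if the following stages expect linear values.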

Next we will add a Processing to perform sinusoidal peak continuation (SinTracking). Look at the method bool SinTracking::Do(const SpectralPeakArray& iPeakArray, SpectralPeakArray& oPeakArray). This is the overload we call if we are doing an inharmonic analysis and we don't have a valid fundamental frequency in a given frame. It is based on the classic McAulay & Quatieri algorithm.

Add a SinTracking object to your application, always using the inharmonic version of the algorithm.
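
The basic idea of peak continuation can be sketched as follows. This is a deliberately simplified, greedy nearest-frequency matcher and not CLAM's SinTracking implementation; TrackedPeak, ContinuePeaks and the maxDeviationHz parameter are hypothetical names. The full McAulay & Quatieri scheme also resolves conflicts between competing candidates, which this sketch does not attempt.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record: a peak frequency plus the track it belongs to.
struct TrackedPeak {
    double freqHz;
    int trackId;
};

// Assigns each peak of the current frame to a track of the previous
// frame when a previous peak lies within maxDeviationHz; otherwise the
// peak starts a new track. Previous tracks without a match simply end.
std::vector<TrackedPeak> ContinuePeaks(const std::vector<TrackedPeak>& previous,
                                       const std::vector<double>& currentFreqsHz,
                                       double maxDeviationHz,
                                       int& nextTrackId) {
    std::vector<TrackedPeak> current;
    for (double f : currentFreqsHz) current.push_back({f, -1});
    std::vector<bool> taken(current.size(), false);

    for (const TrackedPeak& prev : previous) {
        int best = -1;
        double bestDist = 0.0;
        for (std::size_t i = 0; i < current.size(); ++i) {
            if (taken[i]) continue;
            const double d = std::fabs(current[i].freqHz - prev.freqHz);
            if (d <= maxDeviationHz && (best < 0 || d < bestDist)) {
                best = static_cast<int>(i);
                bestDist = d;
            }
        }
        if (best >= 0) {
            // Continue the existing track with the closest free peak.
            current[static_cast<std::size_t>(best)].trackId = prev.trackId;
            taken[static_cast<std::size_t>(best)] = true;
        }
    }

    // Peaks left unmatched are the birth of new tracks.
    for (TrackedPeak& p : current) {
        if (p.trackId < 0) p.trackId = nextTrackId++;
    }
    return current;
}
```

Calling this once per frame, feeding the result of one call in as the previous frame of the next, yields peaks labeled with stable track identifiers over time.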

We can now understand much better the purpose of the xml configuration file in the SMS Example. We keep adding more and more configuration parameters and we would like them to be available to the user, but this is very difficult to manage from a user menu. It would be much better to have an xml configuration file that holds the different parameters to be used in the analysis/synthesis.

Add this functionality to your application. The new user menu only needs the following options: Load Configuration File, Display Input Audio, Play Input Audio, Analyze, Synthesize, Display Output Audio, Play Output Audio (the last three are still useless, but we can already add them so they are available for later development).

  1. Comment the most important steps you have had to take up until now.

So far we have analyzed the sinusoidal component; now we have to do the same with the residual. But if we take another look at the SMS analysis block diagram, we see that in order to obtain the residual spectrum we have to synthesize the sinusoidal component and subtract it from the "original". This so-called "original" spectrum does not necessarily have to be the spectrum used as input to the peak detection algorithm. It has to be the result of a spectral analysis that meets two specific requirements: (1) the analysis window has to be a Blackman-Harris 92 dB window (as this window is used when synthesizing the spectral peaks of the sinusoidal component; see below) and (2) the spectrum must be of the same size as the spectrum we obtain by synthesizing the spectral peaks (see below).

Note that the subtraction of the sinusoidal component from the "original" can be done in the spectral (frequency) domain as well as in the time domain. In this tutorial we use spectral-domain subtraction.

To obtain a spectrum from a peak array, we have to use the SynthSineSpectrum class. This Processing uses a fairly simple algorithm: we just convolve the spectral peak array with the main lobe of the Fourier transform of a Blackman-Harris 92 dB windowing function.
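
The convolution with the main lobe amounts to placing a scaled, shifted copy of the lobe at each peak. The sketch below illustrates this under several assumptions and is not the SynthSineSpectrum code: SinePeak, Bh92Lobe and SynthesizeSineSpectrum are hypothetical names, the lobe is approximated by a sum of shifted Dirichlet kernels for a 512-point window, it is normalized so that a peak lying exactly on a bin contributes its full amplitude to that bin, and the mirroring of lobes around DC and the Nyquist frequency is ignored.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical peak record: fractional bin, linear amplitude, phase in radians.
struct SinePeak {
    double bin;
    double amplitude;
    double phase;
};

const double kPi = 3.14159265358979323846;

// Dirichlet kernel sin(n*w/2) / sin(w/2), with its limit n at w = 0.
double Dirichlet(double omega, int n) {
    const double s = std::sin(0.5 * omega);
    if (std::fabs(s) < 1e-12) return static_cast<double>(n);
    return std::sin(0.5 * n * omega) / s;
}

// Approximate magnitude of the Blackman-Harris 92 dB window transform at
// a distance of x bins from its center, normalized to 1 at x = 0.
double Bh92Lobe(double x) {
    static const double a[4] = {0.35875, 0.48829, 0.14128, 0.01168};
    const int n = 512;  // window length used for the approximation (assumed)
    const double df = 2.0 * kPi / n;
    double y = 0.0;
    for (int m = 0; m < 4; ++m) {
        y += 0.5 * a[m] * (Dirichlet((x - m) * df, n) + Dirichlet((x + m) * df, n));
    }
    return y / (n * a[0]);
}

// Adds the main lobe (8 bins wide) of every peak into an empty spectrum.
std::vector<std::complex<double>> SynthesizeSineSpectrum(
        const std::vector<SinePeak>& peaks, std::size_t numBins) {
    std::vector<std::complex<double>> spectrum(numBins);
    for (const SinePeak& p : peaks) {
        const long first = static_cast<long>(std::ceil(p.bin - 4.0));
        const long last = static_cast<long>(std::floor(p.bin + 4.0));
        const std::complex<double> rotation(std::cos(p.phase), std::sin(p.phase));
        for (long k = first; k <= last; ++k) {
            if (k < 0 || k >= static_cast<long>(numBins)) continue;
            spectrum[static_cast<std::size_t>(k)] +=
                p.amplitude * Bh92Lobe(static_cast<double>(k) - p.bin) * rotation;
        }
    }
    return spectrum;
}
```

Because only the nine bins around each peak are touched, the cost grows with the number of peaks rather than with the size of the spectrum.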

Add this synthesis class to your program.
  1. Does the zero-padding factor we have used in the spectral analysis of the sinusoidal component affect the spectrum synthesized from the spectral peaks in any way (e.g. the size of the spectrum)?

Now we are ready to subtract the synthesized spectrum from the "original". But we have a problem: if the "original" spectrum has been analyzed with a window other than the BH92, we will be subtracting two very different things. That is why the SMSAnalysis class has two different SpectralAnalysis objects: one for the sinusoidal component and another one for the residual. The sinusoidal component can be analyzed using the most convenient window, but the residual always has to use a BH92 because the sinusoidal spectrum was synthesized using a BH92 window (or at least its main lobe).

Add this functionality to your application (residual spectral analysis separated from sinusoidal).
Now you are ready to subtract the synthesized spectrum from the original one.
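
In the spectral domain, the subtraction itself is a bin-by-bin complex difference. The following is a minimal sketch with a plain std::vector standing in for the spectrum; the Spectrum alias and SubtractSpectra are hypothetical names, not CLAM's Spectrum class or its API. The size check reflects the requirement that both spectra have the same number of bins.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for a complex spectrum (an assumption, not CLAM's Spectrum class).
using Spectrum = std::vector<std::complex<double>>;

// Residual = original - sinusoidal, computed bin by bin on complex values
// so that both magnitude and phase are taken into account.
Spectrum SubtractSpectra(const Spectrum& original, const Spectrum& sinusoidal) {
    assert(original.size() == sinusoidal.size());
    Spectrum residual(original.size());
    for (std::size_t k = 0; k < original.size(); ++k) {
        residual[k] = original[k] - sinusoidal[k];
    }
    return residual;
}
```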

And finally, we can add a couple of plots so we can see the differences between the sinusoidal and residual spectra.

  1. Add these plots and explain what you see, using the usual examples: sine.wav, sweep.wav, noise.wav, 1.wav, 2.wav and 3.wav.