Summary for implemented descriptors

Descriptors summary

This section lists some examples of the kinds of features/descriptors obtained in current projects. It is not an exhaustive list; its purpose is simply to collect some use cases.

Stats (Both nominal and absolute)

All of these descriptors can be extracted from any raw TData array, whatever its meaning.

Raw Moment (nth order): sum(xi^n) / size(X)
Central Moment (nth order): sum((xi-Mean(X))^n) / size(X) [Can also be derived from the raw moments]
Center of Gravity (nth order): sum((i*xi^n)) / sum(xi^n)
Mean: sum(xi) / size(X)
Standard Deviation: sqrt(sum((xi-Mean(X))^2) / size(X))
Centroid: sum(i*xi) / sum(xi)
Variance: sum((xi-Mean(X))^2) / size(X) [Separation from mean]
Skewness: sum((xi-Mean(X))^3) / pow<3/2>(sum((xi-Mean(X))^2)) [Asymmetry around the mean]
Kurtosis: sum((xi-Mean(X))^4) / sum((xi-Mean(X))^2)^2 [Degree of peakedness]
Slope: (size(X)*sum(xi*i)-sum(xi)*sum(i)) / (sum(xi)*(size(X)*sum(i^2)-sum(i)^2))
Tilt: ???
GeometricalMean: exp(sum(log(xi)) / size(X))
Energy: sum(xi^2)
RMS: sqrt(sum(xi^2) / size(X))
Max: max(X)
Min: min(X)

They can also be applied to objects that define the arithmetic operators (*/+-). Those objects are expected to be structs containing TData fields, so that applying an arithmetic operator yields an object of the same type in which the operator has been applied to each TData field in turn.
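
A minimal Python sketch of a few of the statistics above, written directly from the listed formulas. The function names are illustrative rather than CLAM API, positions i are assumed to start at 0, and StereoSample is a hypothetical struct used only to show a statistic applied to objects that define arithmetic operators.

```python
import math
from dataclasses import dataclass


def mean(xs):
    return sum(xs) / len(xs)


def raw_moment(xs, n):
    return sum(x ** n for x in xs) / len(xs)


def central_moment(xs, n):
    m = mean(xs)
    return sum((x - m) ** n for x in xs) / len(xs)


def variance(xs):
    return central_moment(xs, 2)


def standard_deviation(xs):
    return math.sqrt(variance(xs))


def centroid(xs):
    # Assumption: positions i are counted from 0.
    return sum(i * x for i, x in enumerate(xs)) / sum(xs)


def geometric_mean(xs):
    # Only defined for strictly positive values.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def energy(xs):
    return sum(x * x for x in xs)


@dataclass
class StereoSample:
    """Hypothetical struct of numeric fields with field-wise operators."""
    left: float
    right: float

    def __add__(self, other):
        return StereoSample(self.left + other.left, self.right + other.right)

    def __truediv__(self, divisor):
        return StereoSample(self.left / divisor, self.right / divisor)


def mean_of(items):
    # Same formula as mean(), but relying only on + and / of the items.
    total = items[0]
    for item in items[1:]:
        total = total + item
    return total / len(items)
```

For example, mean_of([StereoSample(1.0, 2.0), StereoSample(3.0, 6.0)]) yields a StereoSample whose fields are the per-field means.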

Frame Representations

The frame can have several representations:

AudioFrame: A chunk of the original audio. Frames are taken sequentially with optional overlap.
LPModel: From AudioFrame through LPC_Autocorrelation
Spectrum: From AudioFrame through Windowing+Shifting+ZeroPadding+FFT
MelSpectrum: From Spectrum through MelFilterBank
MelCepstrum: From MelSpectrum through CepstralTransform
PeakArray: From Spectrum through SpectralPeakDetect
        For every peak it has:
        - Magnitude
        - Phase
        - Frequency
        - Bin position
        - Bin width
Spectral Envelope: From PeakArray through SpectralEnvelopeExtract
        (Spectrum expressed as a BPF)
Pitch Class Profile (not in CLAM): From Spectrum or from SpectralPeakArray through PCP
        The type is not a CLAM PD, just an array

Each of these representations has a one-to-one relation with an AudioFrame.
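
A minimal Python sketch of how AudioFrames might be cut from the original audio, sequentially and with optional overlap. The function name and the frame_size and hop_size parameters are illustrative, not CLAM API, and dropping a trailing partial frame is an assumption.

```python
def split_into_frames(samples, frame_size, hop_size):
    """Cut samples into consecutive frames of frame_size samples.

    A hop_size smaller than frame_size makes consecutive frames overlap.
    Assumption: a trailing chunk shorter than frame_size is dropped.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += hop_size
    return frames
```

With frame_size=4 and hop_size=2, consecutive frames share half of their samples; with hop_size equal to frame_size there is no overlap.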

Audio Descriptors

These descriptors are calculated from the subject audio. Most depend only on the raw data and its size, but some of them also depend on the audio SampleRate.

Some attributes only make sense when the audio spans more than a single frame, so they are not supposed to be used or aggregated when the audio is a frame. They make the most sense for audio that represents a single note or sample.

Stats on Absolute values: Mean, Variance, Energy
Stats on Absolute values (positions normalized to time in ms): Centroid (TemporalCentroid)
Zero crossing rate: Signal sign changes per sample
Attack Time: The time the audio takes to rise from 2% to 80% of its maximum absolute value (Should be moved to segment and be calculated from frame energy?)
LogAttackTime: The log10 of the Attack Time (Should be moved to note segment?)
Decrease: TOCHECK (Should be moved to note segment?)
Decay, Sustain, Release, RiseTime: Not calculated! (Should be moved to note segment?)

The following need SampleRate: (Temporal)Centroid, (Log)AttackTime, Decrease.
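
A minimal Python sketch of the audio descriptors above that are fully specified in the text. Function names are illustrative, not CLAM API. The attack time is read here as the time between the first sample reaching 2% of the peak and the first sample reaching 80% of it, which is one possible interpretation.

```python
import math


def zero_crossing_rate(samples):
    # Sign changes between consecutive samples, per sample.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / len(samples)


def temporal_centroid_ms(samples, sample_rate):
    # Centroid of the absolute values, with positions converted to ms.
    mags = [abs(s) for s in samples]
    centroid_in_samples = sum(i * m for i, m in enumerate(mags)) / sum(mags)
    return 1000.0 * centroid_in_samples / sample_rate


def attack_time(samples, sample_rate, low=0.02, high=0.80):
    # Assumption: thresholds are relative to the maximum absolute value.
    mags = [abs(s) for s in samples]
    peak = max(mags)
    start = next(i for i, m in enumerate(mags) if m >= low * peak)
    end = next(i for i, m in enumerate(mags) if m >= high * peak)
    return (end - start) / sample_rate  # seconds


def log_attack_time(samples, sample_rate):
    # Undefined (math domain error) when the attack time is zero.
    return math.log10(attack_time(samples, sample_rate))
```

Only the sample-rate-dependent descriptors take a sample_rate argument, which matches the list above.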

Spectral Descriptors

Spectral descriptors are computed mainly from stats over the Spectrum magnitude buffer. Some of them use the spectral range and the spectrum size to convert bin positions into frequencies.

Stats on Nominal Magnitude Buffer: Mean, GeometricMean, Energy, Kurtosis, Skewness, Moment2-6, Tilt
Stats on Nominal Magnitude Buffer (positions normalized to frequency in Hz): Centroid (*binfreq), Slope (/binfreq)
Spectral Flatness: 10*log(GeometricalMean/Mean) (Statistic not ported yet)
High Frequency coefficient: sum(xi^2*i)
MaxMagFrequency: MaxPos(X) normalized to frequency in Hz (To be ported to statistics)
LowFrequencyEnergyRelation: Ratio of the energy below 100Hz, in %
RollOff:
Spread: Rate of dispersion of the energy over the spectrum relative to the centroid. sum((i-Centroid)*xi)
Irregularity (Not Implemented)
StrongPeak (Not Implemented)
HFC (Not Implemented)
MFCC DataArray (Where?)
BandEnergy DataArray (Where?)
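
A minimal Python sketch of some of the spectral descriptors above, computed over a linear magnitude buffer. Function names are illustrative, not CLAM API. bin_freq (Hz per bin) is assumed to be derived by the caller from the spectral range and spectrum size, and the flatness is assumed to use a base-10 logarithm since the text does not state the base.

```python
import math


def spectral_centroid_hz(mags, bin_freq):
    # Centroid in bins, scaled to Hz (Centroid * binfreq).
    return bin_freq * sum(i * m for i, m in enumerate(mags)) / sum(mags)


def spectral_flatness_db(mags):
    # 10*log(GeometricMean/Mean); log10 is an assumption.
    # Only defined for strictly positive magnitudes.
    geometric = math.exp(sum(math.log(m) for m in mags) / len(mags))
    arithmetic = sum(mags) / len(mags)
    return 10.0 * math.log10(geometric / arithmetic)


def high_frequency_coefficient(mags):
    # sum(xi^2 * i), with bin positions counted from 0.
    return sum(i * m * m for i, m in enumerate(mags))


def max_mag_frequency_hz(mags, bin_freq):
    # Position of the largest magnitude, scaled to Hz.
    max_pos = max(range(len(mags)), key=lambda i: mags[i])
    return bin_freq * max_pos
```

A perfectly flat spectrum gives a flatness of 0 dB, and the value becomes more negative as the spectrum gets peakier.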

SpectralPeak descriptors

MagnitudeMean: Mean(Magnitudes)
HarmonicCentroid: CrossCenterOfGravity<1>(Frequencies,Magnitudes)
SpectralTilt: Tilt(Frequencies,Magnitudes) converted to linear
HarmonicDeviation: Calculated over the frequencies
FirstTristimulus: Relative energy of the 1st peak
SecondTristimulus: Relative energy of the added 2,3,4 peaks
ThirdTristimulus: Relative energy of the 5th and further peaks
OddHarmonics: Over magnitudes sum(xi|i is odd)
EvenHarmonics: Over magnitudes sum(xi|i is even)
OddToEvenRatio: The ratio OddHarmonics/EvenHarmonics
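
A minimal Python sketch of some of the peak descriptors above. Function names are illustrative, not CLAM API. Several readings are assumptions: magnitudes are linear, the energy of a peak is its squared magnitude, peaks are numbered from 1 (so the first peak is odd), and CrossCenterOfGravity<1> is the magnitude-weighted mean of the frequencies.

```python
def harmonic_centroid(frequencies, magnitudes):
    # Assumed reading: sum(f_i * m_i) / sum(m_i).
    weighted = sum(f * m for f, m in zip(frequencies, magnitudes))
    return weighted / sum(magnitudes)


def tristimulus(magnitudes):
    # Assumption: peak energy is the squared linear magnitude.
    energies = [m * m for m in magnitudes]
    total = sum(energies)
    first = energies[0] / total          # 1st peak
    second = sum(energies[1:4]) / total  # peaks 2, 3 and 4
    third = sum(energies[4:]) / total    # 5th peak and further
    return first, second, third


def odd_even_harmonics(magnitudes):
    # Assumption: peaks are numbered from 1, so list index 0 is odd.
    odd = sum(magnitudes[0::2])
    even = sum(magnitudes[1::2])
    return odd, even, odd / even
```

The three tristimulus values always add up to 1, since together they cover every peak exactly once.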

Frame Descriptors

Each kind of subdescriptor is calculated only if the corresponding object is present in the related frame.

SpectralPeakD: SpectralPeakDescriptors
SpectrumD: SpectralDescriptors
ResidualSpecD: SpectralDescriptors
SinusoidalSpecD: SpectralDescriptors
AudioFrameD: AudioDescriptors
ResidualAudioFrameD: AudioDescriptors
SinusoidalAudioFrameD: AudioDescriptors
SinthAudioFrameD: AudioDescriptors
Cuidado
MFCCDerivate DataArray [Is it computed anywhere?]
PCP DataArray
BandDescriptors (SpectralDescriptors Array)
StrongPeak
MorphologicalFrameD: MorphologicalFrameDescriptors

Segment Descriptors

Contains FrameDescriptors that hold the statistical aggregations of every descriptor over the contained FrameDescriptors. There is no subsegment aggregation.

FrameDescriptors aggregations: MeanD, MaxD, MinD, VarianceD
AudioD AudioDescriptors
FramesD FrameDescriptors
Cuidado
FrameDescriptors Aggregations: only Mean Var & Std
ChildrenD SegmentDescriptors
MorphologicalSegmentD MorphologicalSegmentDescriptors
Note [If it exists, is it a Note Segment?]
Melody [If it exists, is it a Melody Segment?]
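
A minimal Python sketch of the frame aggregation described above, assuming each frame's descriptors can be viewed as a mapping from descriptor name to a number. The function name and the dictionary layout are illustrative, not CLAM API.

```python
def aggregate_frames(frame_descriptors):
    """Aggregate each descriptor over all frames of a segment.

    frame_descriptors: non-empty list of dicts sharing the same keys.
    Returns, per descriptor name, its MeanD, MaxD, MinD and VarianceD.
    """
    count = len(frame_descriptors)
    aggregated = {}
    for name in frame_descriptors[0]:
        values = [frame[name] for frame in frame_descriptors]
        mean = sum(values) / count
        aggregated[name] = {
            "MeanD": mean,
            "MaxD": max(values),
            "MinD": min(values),
            "VarianceD": sum((v - mean) ** 2 for v in values) / count,
        }
    return aggregated
```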

Note

PitchNote (Enum)
FundFreq
Energy
Time (MT)
Attack (MT)
Sustain (MT)
Release (MT)

Melody

Contour (TData List)
Notes (Notes[])
Tessitura
Number of Notes: It is not implicit on Notes
Cuidado
Key (Key)
Intervals []
RelativeDurations []
FFHDescriptors (FFHDescriptors)
DistributionDescriptors (DistributionDescriptors)
PCP []
AscendingIntervals [Should be int?]
DescendingIntervals [Should be int?]
ConstantIntervals [Should be int?]
Dissonance

Distribution Descriptors

They take the MelodySegmentDescriptors and calculate the following statistics:

Max
Min
Range
Mean
Variance

over the following Note attributes across the children of the Melody Segment:

Fundamental frequency (FundFreq)
Pitch (MIDINote)
Duration
Energy
Interval
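
A minimal Python sketch of the distribution descriptors, assuming a hypothetical Note record with the four per-note attributes. Interval is taken here as the difference in MIDI note numbers between consecutive notes, which is an assumption, as are all the names used.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical note record; field names are illustrative."""
    fund_freq: float
    midi_note: int
    duration: float
    energy: float


def distribution(values):
    mean = sum(values) / len(values)
    return {
        "Max": max(values),
        "Min": min(values),
        "Range": max(values) - min(values),
        "Mean": mean,
        "Variance": sum((v - mean) ** 2 for v in values) / len(values),
    }


def melody_distributions(notes):
    attributes = {
        "FundFreq": [n.fund_freq for n in notes],
        "Pitch": [n.midi_note for n in notes],
        "Duration": [n.duration for n in notes],
        "Energy": [n.energy for n in notes],
        # Assumption: interval between consecutive notes, in semitones.
        "Interval": [b.midi_note - a.midi_note for a, b in zip(notes, notes[1:])],
    }
    # A single-note melody has no intervals, so that entry is skipped.
    return {name: distribution(v) for name, v in attributes.items() if v}
```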

Envelope Descriptors

Fundamental Frequency Histogram descriptors

These descriptors are calculated in two steps: first, two histograms are computed from the fundamental frequency of every frame in a given segment; then some values are computed from those histograms.

FF Unfolded Histogram: The unfolded histogram has a bin for every single MIDI note. Normalized by the size.
FF Folded Histogram: The folded one has a bin for every fifth ( ((midinote%12)*7)%12 ). Normalized by the size.
UnfoldedSum: The sum of all the bins [Since they are normalized, should it always be 1?]
FoldedSum: The sum of all the bins [Since they are normalized, should it always be 1?]
UnfoldedMaxBin: The bin index with maximum value [When the maximum is at one extreme it gives 0]
FoldedMaxBin: The bin index with maximum value [When the maximum is at one extreme it gives 0]
FoldedMaxInterval: The distance between the two greatest bins [Should take into account the circular distance]

Histograms can be computed progressively from a pipeline of frame fundamentals. Both maximums can also be computed progressively, since only the last updated bin has to be compared with the current maximum. At any point, the progressive result is correct for the frames already processed.
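
A minimal Python sketch of the progressive computation, assuming each frame's fundamental has already been converted to an integer MIDI note in 0..127. The class and method names are illustrative, not CLAM API. The folding uses modulo 12 so that the circle-of-fifths index stays within 12 bins, and the interval uses the circular distance suggested in the note above; both are assumptions.

```python
class FundamentalHistograms:
    def __init__(self):
        self.unfolded_counts = [0] * 128  # one bin per MIDI note
        self.folded_counts = [0] * 12     # one bin per step on the circle of fifths
        self.frames = 0
        self.unfolded_max_bin = 0
        self.folded_max_bin = 0

    def add_frame(self, midi_note):
        unfolded_bin = midi_note
        folded_bin = ((midi_note % 12) * 7) % 12
        self.unfolded_counts[unfolded_bin] += 1
        self.folded_counts[folded_bin] += 1
        self.frames += 1
        # Only the bin that was just updated can overtake the current maximum.
        if self.unfolded_counts[unfolded_bin] > self.unfolded_counts[self.unfolded_max_bin]:
            self.unfolded_max_bin = unfolded_bin
        if self.folded_counts[folded_bin] > self.folded_counts[self.folded_max_bin]:
            self.folded_max_bin = folded_bin

    def normalized(self, counts):
        # Normalized by the number of frames seen so far.
        return [c / self.frames for c in counts]

    def folded_max_interval(self):
        # Circular distance between the two greatest folded bins.
        ranked = sorted(range(12), key=lambda b: self.folded_counts[b], reverse=True)
        distance = abs(ranked[0] - ranked[1])
        return min(distance, 12 - distance)
```

After each add_frame call, the maximum bins and the normalized histograms describe exactly the frames fed so far, so the object can be queried at any point in the pipeline.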

Histogram extraction and histogram descriptor computation may be generalized. The folded version is not

Appendix: Aggregating numerical descriptors

Primitives

U: (R -> R):  pow, exp, log, abs, inv, neg
B: (R x R -> R): sum, mult, min, max, dif, div
N: (Rn -> R): size
S: (? -> ?): getFieldArray

vectorizeUnary: U -> (Rn -> Rn)
        vectorizeUnary(U)(X) = [U(x1),U(x2),...]

vectorizeBinary: B -> (Rn x Rn -> Rn)
        vectorizeBinary(B)(X,Y) = [B(x1,y1), B(x2,y2)...]

accumulate: B x R -> (Rn -> R)
        accumulate(B,a)(X) = B(B(...B(B(a,x1),x2)...),xn)

accumulate: B -> (R x Rn -> R)
        accumulate(B)(a,X) = B(B(...B(B(a,x1),x2)...),xn)

accumulate: B -> (Rn -> R)
        accumulate(B)(X)   = B(B(...B(B(x1,x2),x3)...),xn)

partial_accumulate: B x Rn -> Rn
        partial_accumulate(B,X) = [ x1, B(x1,x2), B(B(x1,x2),x3), ...]

adjacent: B x Rn -> Rn
        adjacent(B,X) = [ x1, B(x1,x2), B(x2,x3), B(x3,x4),...]

invertArguments: B -> B
        invertArguments(B)(x,y) = B(y,x)

bindFirst: B x R -> U
        bindFirst(B,a)(x) = B(a,x)

bindSecond: B x R -> U
        bindSecond(B,a)(x) = B(x,a)

Summary calculations (Rn -> R) highlight the calculations that cannot be pipelined.
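
A minimal Python sketch of the primitives above and of how two descriptors might be composed from them. The snake_case names mirror the definitions in this appendix and are not an existing API; only the accumulate variant that takes an initial value is shown.

```python
import itertools
import operator
from functools import reduce


def vectorize_unary(u):
    return lambda xs: [u(x) for x in xs]


def vectorize_binary(b):
    return lambda xs, ys: [b(x, y) for x, y in zip(xs, ys)]


def accumulate(b, a):
    # accumulate(B,a)(X) = B(B(...B(B(a,x1),x2)...),xn)
    return lambda xs: reduce(b, xs, a)


def partial_accumulate(b, xs):
    # [x1, B(x1,x2), B(B(x1,x2),x3), ...]
    return list(itertools.accumulate(xs, b))


def adjacent(b, xs):
    # [x1, B(x1,x2), B(x2,x3), ...]
    return xs[:1] + [b(x, y) for x, y in zip(xs, xs[1:])]


def invert_arguments(b):
    return lambda x, y: b(y, x)


def bind_first(b, a):
    return lambda x: b(a, x)


def bind_second(b, a):
    return lambda x: b(x, a)


# Two descriptors expressed only through the primitives.
def mean(xs):
    return accumulate(operator.add, 0.0)(xs) / len(xs)


def energy(xs):
    square = vectorize_unary(bind_second(pow, 2))
    return accumulate(operator.add, 0.0)(square(xs))
```

In this sketch the vectorized steps work element by element and could be pipelined, while accumulate and len need the whole array before they can return a value.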