Summary of implemented descriptors
Descriptors summary
This section lists some examples of the kinds of features/descriptors that are extracted in current projects.
It is not an exhaustive list; its purpose is simply to illustrate some use cases.
Stats (Both nominal and absolute)
All these descriptors can be extracted from any raw TData array, whatever its meaning.
- Raw Moment (nth order): sum(xi^n) / size(X)
- Central Moment (nth order): sum((xi-Mean(X))^n) / size(X) [Can also be derived from the raw moments]
- Center of Gravity (nth order): sum((i*xi^n)) / sum(xi^n)
- Mean: sum(xi) / size(X)
- Standard Deviation: sqrt(sum((xi-Mean(X))^2) / size(X))
- Centroid: sum(i*xi) / sum(xi)
- Variance: sum((xi-Mean(X))^2) / size(X) [Spread around the mean]
- Skewness: (sum((xi-Mean(X))^3) / size(X)) / pow<3/2>(Variance(X)) [Asymmetry around the mean]
- Kurtosis: (sum((xi-Mean(X))^4) / size(X)) / Variance(X)^2 [Degree of peakedness]
- Slope: (size(X)*sum(xi*i)-sum(xi)*sum(i)) / (sum(xi)*(size(X)*sum(i^2)-sum(i)^2))
- Tilt: ???
- GeometricalMean: exp(sum(log(xi)) / size(X))
- Energy: sum(xi^2)
- RMS: sqrt(sum(xi^2) / size(X))
- Max: max(X)
- Min: min(X)
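As an illustration, the following C++ sketch implements several of the formulas above over a plain std::vector<double>. It assumes TData is a double and that positions i start at 0; the function names are hypothetical and are not CLAM's Stats interface. Skewness and kurtosis are computed as normalized central moments (m3/m2^(3/2) and m4/m2^2), which is one common reading of those entries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using TData = double;              // assumption: TData is a plain floating-point type
using Array = std::vector<TData>;  // stands in for a raw TData array

// Raw moment of order n: sum(xi^n) / size(X)
TData RawMoment(const Array& x, int n) {
    assert(!x.empty());
    TData acc = 0;
    for (TData xi : x) acc += std::pow(xi, n);
    return acc / x.size();
}

TData Mean(const Array& x) { return RawMoment(x, 1); }

// Central moment of order n: sum((xi - Mean(X))^n) / size(X)
TData CentralMoment(const Array& x, int n) {
    const TData m = Mean(x);
    TData acc = 0;
    for (TData xi : x) acc += std::pow(xi - m, n);
    return acc / x.size();
}

TData Variance(const Array& x) { return CentralMoment(x, 2); }
TData StandardDeviation(const Array& x) { return std::sqrt(Variance(x)); }

// Normalized central moments: m3 / m2^(3/2) and m4 / m2^2
TData Skewness(const Array& x) { return CentralMoment(x, 3) / std::pow(Variance(x), 1.5); }
TData Kurtosis(const Array& x) {
    const TData v = Variance(x);
    return CentralMoment(x, 4) / (v * v);
}

// Centroid: sum(i*xi) / sum(xi), positions i starting at 0
TData Centroid(const Array& x) {
    TData num = 0, den = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        num += i * x[i];
        den += x[i];
    }
    assert(den != 0);
    return num / den;
}

// Slope as listed above: least-squares slope of xi against i, divided by sum(xi)
TData Slope(const Array& x) {
    const TData n = x.size();
    TData si = 0, sii = 0, sx = 0, sxi = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const TData pos = i;
        si += pos;
        sii += pos * pos;
        sx += x[i];
        sxi += x[i] * pos;
    }
    return (n * sxi - sx * si) / (sx * (n * sii - si * si));
}

// Geometric mean: exp(sum(log(xi)) / size(X)); only defined for xi > 0
TData GeometricMean(const Array& x) {
    assert(!x.empty());
    TData acc = 0;
    for (TData xi : x) {
        assert(xi > 0);
        acc += std::log(xi);
    }
    return std::exp(acc / x.size());
}

// Energy: sum(xi^2)
TData Energy(const Array& x) {
    TData acc = 0;
    for (TData xi : x) acc += xi * xi;
    return acc;
}
```

For X = {1, 2, 3, 4} this gives a mean of 2.5, a variance of 1.25, a skewness of 0 and a centroid of 2.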
They can also be applied to objects that define the arithmetic operators (*, /, +, -).
Such objects are expected to be structs of TData fields,
so that applying one of those operators yields an object of the same type
whose fields result from applying the same operator to the corresponding TData fields one by one.
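A minimal sketch of this idea, assuming TData is double: ExampleDescriptors is a hypothetical struct with two TData fields, and the statistics are templates, so the same code computes a Mean or Variance for plain TData values and for whole descriptor structs. Dividing by the element count needs one extra operator, division by a scalar, which is an assumption beyond the four operators listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TData = double;  // assumption: TData is a plain floating-point type

// Hypothetical descriptor struct made only of TData fields.
struct ExampleDescriptors {
    TData energy;
    TData centroid;
};

// Each operator applies the same TData operator to every field, one by one.
ExampleDescriptors operator+(const ExampleDescriptors& a, const ExampleDescriptors& b) {
    return {a.energy + b.energy, a.centroid + b.centroid};
}
ExampleDescriptors operator-(const ExampleDescriptors& a, const ExampleDescriptors& b) {
    return {a.energy - b.energy, a.centroid - b.centroid};
}
ExampleDescriptors operator*(const ExampleDescriptors& a, const ExampleDescriptors& b) {
    return {a.energy * b.energy, a.centroid * b.centroid};
}
// Division by a scalar: an extra assumption, used to normalize sums by a count.
ExampleDescriptors operator/(const ExampleDescriptors& a, TData k) {
    return {a.energy / k, a.centroid / k};
}

// Generic stats: the same code serves TData values and descriptor structs.
template <class T>
T Mean(const std::vector<T>& xs) {
    assert(!xs.empty());
    T acc = xs[0];
    for (std::size_t i = 1; i < xs.size(); ++i) acc = acc + xs[i];
    return acc / static_cast<TData>(xs.size());
}

template <class T>
T Variance(const std::vector<T>& xs) {
    const T m = Mean(xs);
    std::vector<T> squared;
    squared.reserve(xs.size());
    for (const T& x : xs) {
        const T d = x - m;
        squared.push_back(d * d);
    }
    return Mean(squared);
}
```

With this design each field ends up with exactly the statistic it would get if it were aggregated as a separate TData array.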
Frame Representations
The frame can have several representations:
- AudioFrame: A chunk of the original audio, taken sequentially with optional overlap.
- LPModel: From AudioFrame thru LPC_Autocorrelation
- Spectrum: From AudioFrame thru Windowing+Shifting+ZeroPadding+FFT
- MelSpectrum: From Spectrum thru MelFilterBank
- MelCepstrum: From MelSpectrum thru CepstralTransform
- PeakArray: From Spectrum thru SpectralPeakDetect
For every peak it has:
- Magnitude
- Phase
- Frequency
- Bin position
- Bin width
- Spectral Envelope: From PeakArray thru SpectralEnvelopeExtract
(Spectrum expressed as a BPF)
- Pitch Class Profile (not in CLAM): From Spectrum or From SpectralPeakArray thru PCP
The type is not a CLAM PD, just an array
Each of these representations has a one-to-one relation with an AudioFrame.
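The sequential framing of AudioFrame can be sketched as below. SplitIntoFrames is a hypothetical helper, frameSize and hopSize are illustrative parameters (the overlap is frameSize - hopSize), and dropping an incomplete last frame is an assumption rather than documented CLAM behavior. Every other representation is then derived from one of these frames.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TData = double;              // assumption: TData is a plain floating-point type
using Array = std::vector<TData>;

// Cuts the signal into frames of frameSize samples, starting a new frame
// every hopSize samples. Trailing samples that do not fill a frame are dropped.
std::vector<Array> SplitIntoFrames(const Array& signal, std::size_t frameSize, std::size_t hopSize) {
    assert(frameSize > 0 && hopSize > 0);
    std::vector<Array> frames;
    for (std::size_t start = 0; start + frameSize <= signal.size(); start += hopSize) {
        const auto first = signal.begin() + static_cast<std::ptrdiff_t>(start);
        frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frameSize));
    }
    return frames;
}
```

With 10 samples, frameSize 4 and hopSize 2 (50% overlap), frames start at samples 0, 2, 4 and 6.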
Audio Descriptors
These descriptors are computed from the audio under analysis.
Most depend only on the raw data and its size,
but some of them also depend on the audio SampleRate.
Some attributes only make sense when the audio spans more than a single frame,
so they are not meant to be used or aggregated when the audio is a single frame.
They are most meaningful on audio that represents a single note/sample.
- Stats on Absolute values: Mean, Variance, Energy
- Stats on Absolute values (positions normalized to time in ms): Centroid (TemporalCentroid)
- Zero crossing rate: Signal sign changes per sample
- Attack Time: The time the audio takes to rise from 2% to 80% of its maximum absolute value
(Should be moved to segment and be calculated from frame energy?)
- LogAttackTime: The log10 of the previous value
(Should be moved to note segment?)
- Decrease: TOCHECK
(Should be moved to note segment?)
- Decay, Sustain, Release, RiseTime: Not computed!
(Should be moved to note segment?)
Those that need the SampleRate: (Temporal)Centroid, (Log)AttackTime, Decrease.
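A sketch of the zero crossing rate and the attack time, assuming TData is double. It follows the text literally, thresholding the absolute sample values rather than a smoothed envelope or frame energy, and counts a zero sample as positive; the function names and those choices are assumptions, not CLAM's implementation. Only the attack time needs the SampleRate, to turn a distance in samples into seconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using TData = double;              // assumption: TData is a plain floating-point type
using Array = std::vector<TData>;

// Zero crossing rate: sign changes divided by size(X). Zero counts as positive.
TData ZeroCrossingRate(const Array& x) {
    assert(!x.empty());
    std::size_t changes = 0;
    for (std::size_t i = 1; i < x.size(); ++i)
        if ((x[i] >= 0) != (x[i - 1] >= 0)) ++changes;
    return static_cast<TData>(changes) / x.size();
}

// Attack time in seconds: from the first sample whose absolute value reaches 2% of
// the maximum absolute value to the first sample reaching 80% of it.
TData AttackTime(const Array& x, TData sampleRate) {
    assert(!x.empty() && sampleRate > 0);
    TData peak = 0;
    for (TData xi : x) peak = std::max(peak, std::abs(xi));
    std::size_t start = 0, stop = 0;
    while (std::abs(x[start]) < 0.02 * peak) ++start;  // stops at the peak at the latest
    while (std::abs(x[stop]) < 0.80 * peak) ++stop;
    return static_cast<TData>(stop - start) / sampleRate;
}

// LogAttackTime: log10 of the attack time (minus infinity if both thresholds coincide)
TData LogAttackTime(const Array& x, TData sampleRate) {
    return std::log10(AttackTime(x, sampleRate));
}
```

At 1000 Hz, a note that crosses 2% at sample 2 and 80% at sample 3 has an attack time of 1 ms, so its LogAttackTime is -3.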
Spectral Descriptors
Spectral descriptors are computed mainly from stats over the Spectrum magnitude buffer.
Some of them use the spectral range and the spectrum size to convert bin positions
into frequencies.
- Stats on Nominal Magnitude Buffer:
Mean, GeometricMean, Energy, Kurtosis, Skewness, Moment2-6, Tilt
- Stats on Nominal Magnitude Buffer (positions normalized to frequency in Hz):
Centroid (*binfreq), Slope (/binfreq)
- Spectral Flatness: 10*log(GeometricalMean/Mean) (statistic not yet ported)
- High Frequency coefficient: sum(xi^2*i)
- MaxMagFrequency: MaxPos(X) normalized to frequency in Hz (To be ported to statistics)
- LowFrequencyEnergyRelation: Relative energy below 100 Hz, in %
- RollOff:
- Spread: Dispersion of the energy over the spectrum around the centroid:
sum((i-Centroid)^2*xi) / sum(xi)
- Irregularity (Not Implemented)
- StrongPeak (Not Implemented)
- HFC (Not Implemented)
- MFCC DataArray (Where?)
- BandEnergy DataArray (Where?)
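The conversion from bin positions to frequencies and a few of the spectral descriptors can be sketched as follows. The sketch assumes the magnitude buffer spans 0 Hz to the spectral range with evenly spaced bins, a base-10 logarithm for the flatness in dB, squared magnitudes as energy, and spread as the magnitude-weighted variance of frequency around the centroid; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using TData = double;              // assumption: TData is a plain floating-point type
using Array = std::vector<TData>;  // Spectrum magnitude buffer

// Assumption: bin 0 is 0 Hz and the last bin is spectralRange Hz.
TData BinFrequency(std::size_t bin, std::size_t size, TData spectralRange) {
    assert(size > 1);
    return bin * spectralRange / (size - 1);
}

// Centroid in Hz: magnitude-weighted mean of the bin frequencies.
TData CentroidHz(const Array& mag, TData spectralRange) {
    TData num = 0, den = 0;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        num += BinFrequency(i, mag.size(), spectralRange) * mag[i];
        den += mag[i];
    }
    assert(den > 0);
    return num / den;
}

// Spread in Hz^2: magnitude-weighted variance of the bin frequencies around the centroid.
TData SpreadHz2(const Array& mag, TData spectralRange) {
    const TData centroid = CentroidHz(mag, spectralRange);
    TData num = 0, den = 0;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        const TData d = BinFrequency(i, mag.size(), spectralRange) - centroid;
        num += d * d * mag[i];
        den += mag[i];
    }
    return num / den;
}

// Flatness in dB, assuming a base-10 logarithm: 10*log10(GeometricMean/Mean).
TData FlatnessDb(const Array& mag) {
    assert(!mag.empty());
    TData logSum = 0, sum = 0;
    for (TData m : mag) {
        assert(m > 0);
        logSum += std::log(m);
        sum += m;
    }
    const TData n = mag.size();
    return 10 * std::log10(std::exp(logSum / n) / (sum / n));
}

// Percentage of the energy (squared magnitude) in bins below cutoffHz.
TData LowFrequencyEnergyPercent(const Array& mag, TData spectralRange, TData cutoffHz = 100) {
    TData low = 0, total = 0;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        const TData e = mag[i] * mag[i];
        total += e;
        if (BinFrequency(i, mag.size(), spectralRange) < cutoffHz) low += e;
    }
    assert(total > 0);
    return 100 * low / total;
}
```

A flat spectrum of 5 bins over 400 Hz has its centroid at 200 Hz and a flatness of 0 dB, as expected for a perfectly flat magnitude buffer.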
SpectralPeak descriptors
- MagnitudeMean: Mean(Magnitudes)
- HarmonicCentroid: CrossCenterOfGravity<1>(Frequencies,Magnitudes)
- SpectralTilt: Tilt(Frequencies,Magnitudes) converted to linear
- HarmonicDeviation: Computed over the frequencies
- FirstTristimulus: Relative energy of the 1st peak
- SecondTristimulus: Relative energy of the 2nd, 3rd and 4th peaks together
- ThirdTristimulus: Relative energy of the 5th and further peaks
- OddHarmonics: Over magnitudes sum(xi|i is odd)
- EvenHarmonics: Over magnitudes sum(xi|i is even)
- OddToEvenRatio: The ratio OddHarmonics/EvenHarmonics
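A sketch of the tristimulus values, the odd/even harmonic sums and the harmonic centroid over a peak array, under stated assumptions: peaks are ordered by harmonic number with the fundamental first, the energy of a peak is its squared magnitude, the i in "i is odd" is read as the 1-based harmonic number, and CrossCenterOfGravity<1> is read as sum(fi*mi)/sum(mi). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using TData = double;              // assumption: TData is a plain floating-point type
using Array = std::vector<TData>;

// Hypothetical result type for the three tristimulus values.
struct Tristimulus {
    TData first, second, third;
};

// Relative energy of peak 1, peaks 2-4 and peaks 5+ (energy assumed to be magnitude^2).
Tristimulus ComputeTristimulus(const Array& mag) {
    TData e[3] = {0, 0, 0};
    for (std::size_t i = 0; i < mag.size(); ++i) {
        const std::size_t group = (i == 0) ? 0 : (i < 4 ? 1 : 2);
        e[group] += mag[i] * mag[i];
    }
    const TData total = e[0] + e[1] + e[2];
    assert(total > 0);
    return {e[0] / total, e[1] / total, e[2] / total};
}

// Peak index i (0-based) is harmonic number i+1, so odd harmonics sit at even indices.
TData OddHarmonics(const Array& mag) {
    TData s = 0;
    for (std::size_t i = 0; i < mag.size(); i += 2) s += mag[i];
    return s;
}

TData EvenHarmonics(const Array& mag) {
    TData s = 0;
    for (std::size_t i = 1; i < mag.size(); i += 2) s += mag[i];
    return s;
}

TData OddToEvenRatio(const Array& mag) {
    const TData even = EvenHarmonics(mag);
    assert(even != 0);
    return OddHarmonics(mag) / even;
}

// HarmonicCentroid: sum(fi*mi) / sum(mi), a first-order cross center of gravity.
TData HarmonicCentroid(const Array& freq, const Array& mag) {
    assert(freq.size() == mag.size());
    TData num = 0, den = 0;
    for (std::size_t i = 0; i < mag.size(); ++i) {
        num += freq[i] * mag[i];
        den += mag[i];
    }
    assert(den > 0);
    return num / den;
}
```

If the fundamental should be excluded from OddHarmonics, only the starting index of that loop changes.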
Frame Descriptors
Each kind of sub-descriptor is computed only if the analogous
object is present in the related frame.
- SpectralPeakD: SpectralPeakDescriptors
- SpectrumD: SpectralDescriptors
- ResidualSpecD: SpectralDescriptors
- SinusoidalSpecD: SpectralDescriptors
- AudioFrameD: AudioDescriptors
- ResidualAudioFrameD: AudioDescriptors
- SinusoidalAudioFrameD: AudioDescriptors
- SinthAudioFrameD: AudioDescriptors
- Cuidado
- MFCCDerivate DataArray [Is it computed anywhere?]
- PCP DataArray
- BandDescriptors (SpectralDescriptors Array)
- StrongPeak
- MorphologicalFrameD: MorphologicalFrameDescriptors
Segment Descriptors
They contain FrameDescriptors holding the statistical aggregations
of every descriptor over the contained FrameDescriptors.
There is no subsegment aggregation.
- FrameDescriptors aggregations: MeanD, MaxD, MinD, VarianceD
- AudioD: AudioDescriptors
- FramesD: FrameDescriptors
- Cuidado
- FrameDescriptors aggregations: only Mean, Var & Std
- ChildrenD: SegmentDescriptors
- MorphologicalSegmentD: MorphologicalSegmentDescriptors
- Note [If it exists, is it a Note Segment?]
- Melody [If it exists, is it a Melody Segment?]
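The MeanD, MaxD, MinD and VarianceD aggregations (and the Std used by Cuidado) can be accumulated frame by frame. The sketch below does it for one descriptor value using Welford's online update for mean and variance, a technique chosen here for the illustration and not necessarily the one CLAM uses; RunningAggregation is a hypothetical name, and a full segment would keep one such aggregator per descriptor field.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

using TData = double;  // assumption: TData is a plain floating-point type

// Running aggregation of one frame descriptor over a segment.
// Mean and variance use Welford's online update, so frames can be fed one by one.
class RunningAggregation {
public:
    void Add(TData x) {
        ++mCount;
        const TData delta = x - mMean;
        mMean += delta / mCount;
        mM2 += delta * (x - mMean);
        mMax = std::max(mMax, x);
        mMin = std::min(mMin, x);
    }
    TData MeanD() const { assert(mCount > 0); return mMean; }
    TData MaxD() const { assert(mCount > 0); return mMax; }
    TData MinD() const { assert(mCount > 0); return mMin; }
    // Population variance (divided by the number of frames), as in the Stats section.
    TData VarianceD() const { assert(mCount > 0); return mM2 / mCount; }
    TData StdD() const { return std::sqrt(VarianceD()); }

private:
    std::size_t mCount = 0;
    TData mMean = 0;
    TData mM2 = 0;
    TData mMax = -std::numeric_limits<TData>::infinity();
    TData mMin = std::numeric_limits<TData>::infinity();
};
```

The online form avoids storing every frame value and is numerically more stable than accumulating sum(xi) and sum(xi^2) separately.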
Note
- PitchNote (Enum)
- FundFreq
- Energy
- Time (MT)
- Attack (MT)
- Sustain (MT)
- Release (MT)
Melody
- Contour (TData List)
- Notes (Notes[])
- Tessitura
- Number of Notes: It is not implicit in Notes
- Cuidado
- Key (Key)
- Intervals []
- RelativeDurations []
- FFHDescriptors (FFHDescriptors)
- DistributionDescriptors (DistributionDescriptors)
- PCP []
- AscendingIntervals [Should be int?]
- DescendingIntervals [Should be int?]
- ConstantIntervals [Should be int?]
- Dissonance
Distribution Descriptors
They take the MelodySegmentDescriptors and then compute the following statistics:
- Max
- Min
- Range
- Mean
- Variance
over the following Note attributes of the Melody Segment children:
- Fundamental frequency (FundFreq)
- Pitch (MIDINote)
- Duration
- Energy
- Interval
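A sketch of the distribution statistics, assuming TData is double and a population variance (divided by the number of values). NoteAttributes, Distribution and the helper functions are hypothetical; intervals are taken as the difference in MIDI note numbers between consecutive notes, and the pointer-to-member helper plays the role of extracting one attribute along the children.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using TData = double;  // assumption: TData is a plain floating-point type

// Hypothetical note record holding the attributes listed above.
struct NoteAttributes {
    TData fundFreq;  // Hz
    TData midiNote;  // pitch as a MIDI note number
    TData duration;  // seconds (assumption)
    TData energy;
};

struct Distribution {
    TData max, min, range, mean, variance;
};

Distribution ComputeDistribution(const std::vector<TData>& v) {
    assert(!v.empty());
    Distribution d{};
    d.max = *std::max_element(v.begin(), v.end());
    d.min = *std::min_element(v.begin(), v.end());
    d.range = d.max - d.min;
    TData sum = 0;
    for (TData x : v) sum += x;
    d.mean = sum / v.size();
    TData sq = 0;
    for (TData x : v) sq += (x - d.mean) * (x - d.mean);
    d.variance = sq / v.size();
    return d;
}

// Intervals in semitones between consecutive notes: midiNote[k+1] - midiNote[k].
std::vector<TData> Intervals(const std::vector<NoteAttributes>& notes) {
    std::vector<TData> out;
    for (std::size_t k = 1; k < notes.size(); ++k)
        out.push_back(notes[k].midiNote - notes[k - 1].midiNote);
    return out;
}

// Extracts one attribute along the melody children, e.g. &NoteAttributes::energy.
std::vector<TData> Attribute(const std::vector<NoteAttributes>& notes, TData NoteAttributes::*field) {
    std::vector<TData> out;
    for (const NoteAttributes& n : notes) out.push_back(n.*field);
    return out;
}
```

For the notes C4, D4, B3 the intervals are +2 and -3 semitones, giving a range of 5 and a mean of -0.5.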
Envelope Descriptors
Fundamental Frequency Histogram descriptors
These descriptors are computed by first building two histograms
of the fundamental frequency of every frame in a given segment,
and then computing some values on those histograms.
- FF Unfolded Histogram: The unfolded histogram has a bin for every single MIDI note.
Normalized by the size.
- FF Folded Histogram: The folded one has a bin for every pitch class, ordered along the circle of fifths ( ((midinote%12)*7)%12 )
Normalized by the size.
- UnfoldedSum: The sum of all the bins [Since they are normalized, shouldn't it always be 1?]
- FoldedSum: The sum of all the bins [Since they are normalized, shouldn't it always be 1?]
- UnfoldedMaxBin: The index of the bin with the maximum value [When the maximum is at one extreme, it gives 0]
- FoldedMaxBin: The index of the bin with the maximum value [When the maximum is at one extreme, it gives 0]
- FoldedMaxInterval: The distance between the two highest bins [Should take into account the circular distance]
Histograms can be computed progressively from a frame-level fundamental frequency pipeline.
Both maxima can also be computed progressively, since each new frame only requires
comparing the last updated bin with the current maximum.
The progressive calculation has the property that, at any point, its result is exact
for the frames computed so far.
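The progressive computation can be sketched as follows. Assumptions: unvoiced frames arrive with a non-positive fundamental and are skipped, fundamentals are rounded to the nearest MIDI note (A4 = 440 Hz = note 69) in the 0..127 range, and the folded bin follows the circle of fifths as ((midinote%12)*7)%12, i.e. 12 folded bins (our reading of the folded histogram). Raw counts are stored and normalized on demand; since normalizing divides all bins by the same size, it does not change which bin is the maximum. FFHistograms is a hypothetical class.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using TData = double;  // assumption: TData is a plain floating-point type

class FFHistograms {
public:
    // Adds the fundamental frequency of one frame, updating both histograms and maxima.
    void AddFrame(TData fundFreqHz) {
        if (fundFreqHz <= 0) return;  // unvoiced frame (assumption)
        const long midi = std::lround(69 + 12 * std::log2(fundFreqHz / 440.0));
        if (midi < 0 || midi > 127) return;
        const std::size_t u = static_cast<std::size_t>(midi);
        const std::size_t f = (u % 12) * 7 % 12;  // position on the circle of fifths
        ++mUnfolded[u];
        ++mFolded[f];
        ++mCount;
        // Only the bin just updated can overtake the current maximum (ties keep the old one).
        if (mUnfolded[u] > mUnfolded[mUnfoldedMax]) mUnfoldedMax = u;
        if (mFolded[f] > mFolded[mFoldedMax]) mFoldedMax = f;
    }

    // Normalized bin values: count divided by the number of frames added.
    TData UnfoldedBin(std::size_t midi) const { return Normalize(mUnfolded.at(midi)); }
    TData FoldedBin(std::size_t bin) const { return Normalize(mFolded.at(bin)); }
    std::size_t UnfoldedMaxBin() const { return mUnfoldedMax; }
    std::size_t FoldedMaxBin() const { return mFoldedMax; }

    // Circular distance (in folded bins) between the highest and second highest folded bins.
    std::size_t FoldedMaxInterval() const {
        assert(mCount > 0);
        std::size_t second = (mFoldedMax == 0) ? 1 : 0;
        for (std::size_t b = 0; b < mFolded.size(); ++b)
            if (b != mFoldedMax && mFolded[b] > mFolded[second]) second = b;
        const std::size_t d = (mFoldedMax > second) ? mFoldedMax - second : second - mFoldedMax;
        return std::min(d, mFolded.size() - d);
    }

private:
    TData Normalize(std::size_t count) const {
        return mCount ? static_cast<TData>(count) / mCount : 0;
    }

    std::array<std::size_t, 128> mUnfolded{};
    std::array<std::size_t, 12> mFolded{};
    std::size_t mCount = 0;
    std::size_t mUnfoldedMax = 0;
    std::size_t mFoldedMax = 0;
};
```

Because the counts only ever grow, the maxima reported after any frame are the same as those a full recomputation over the frames seen so far would give (up to tie-breaking).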
Histogram extraction and histogram descriptor computation may be generalized.
The folded version is not
Appendix: Aggregating numerical descriptors
Primitives
U: (R -> R): pow, exp, log, abs, inv, neg
B: (R x R -> R): sum, mult, min, max, dif, div
N: (Rn -> R): size
S: (? -> ?): getFieldArray
vectorizeUnary: U -> (Rn -> Rn)
vectorizeUnary(U)(X) = [U(x1),U(x2),...]
vectorizeBinary: B -> (Rn x Rn -> Rn)
vectorizeBinary(B)(X,Y) = [B(x1,y1), B(x2,y2)...]
accumulate: B x R -> (Rn -> R)
accumulate(B,a)(X) = B(B(...B(B(a,x1),x2)...),xn)
accumulate: B -> (R x Rn -> R)
accumulate(B)(a,X) = B(B(...B(B(a,x1),x2)...),xn)
accumulate: B -> (Rn -> R)
accumulate(B)(X) = B(B(...B(B(x1,x2),x3)...),xn)
partial_accumulate: B x Rn -> Rn
partial_accumulate(B,X) = [ x1, B(x1,x2), B(B(x1,x2),x3), ...]
adjacent: B x Rn -> Rn
adjacent(B,X) = [ x1, B(x1,x2), B(x2,x3), B(x3,x4),...]
invertArguments: B -> B
invertArguments(B)(x,y) = B(y,x)
bindFirst: B x R -> U
bindFirst(B,a)(x) = B(a,x)
bindSecond: B x R -> U
bindSecond(B,a)(x) = B(x,a)
- Summary calculations (Rn -> R) are highlighted: they are the calculations that cannot be pipelined.
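The primitives can be written almost literally with C++ function objects. This sketch assumes R is double and Rn is std::vector<double>; accumulate uses std::accumulate and partial_accumulate uses std::partial_sum, while adjacent is written by hand because std::adjacent_difference would pass the arguments in the opposite order (B(x2,x1)). Mean and Variance at the end show how summaries are composed from the primitives; they are illustrations, not an existing API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using R = double;
using Rn = std::vector<R>;
using U = std::function<R(R)>;     // R -> R
using B = std::function<R(R, R)>;  // R x R -> R

// vectorizeUnary(U)(X) = [U(x1), U(x2), ...]
std::function<Rn(const Rn&)> vectorizeUnary(U u) {
    return [u](const Rn& x) {
        Rn y;
        for (R xi : x) y.push_back(u(xi));
        return y;
    };
}

// vectorizeBinary(B)(X,Y) = [B(x1,y1), B(x2,y2), ...]
std::function<Rn(const Rn&, const Rn&)> vectorizeBinary(B b) {
    return [b](const Rn& x, const Rn& y) {
        assert(x.size() == y.size());
        Rn z;
        for (std::size_t i = 0; i < x.size(); ++i) z.push_back(b(x[i], y[i]));
        return z;
    };
}

// accumulate(B,a)(X) = B(...B(B(a,x1),x2)...,xn)
std::function<R(const Rn&)> accumulate(B b, R a) {
    return [b, a](const Rn& x) { return std::accumulate(x.begin(), x.end(), a, b); };
}

// accumulate(B)(X) = B(...B(B(x1,x2),x3)...,xn); X must not be empty
std::function<R(const Rn&)> accumulate(B b) {
    return [b](const Rn& x) {
        assert(!x.empty());
        return std::accumulate(x.begin() + 1, x.end(), x.front(), b);
    };
}

// partial_accumulate(B,X) = [x1, B(x1,x2), B(B(x1,x2),x3), ...]
Rn partial_accumulate(B b, const Rn& x) {
    Rn y(x.size());
    std::partial_sum(x.begin(), x.end(), y.begin(), b);
    return y;
}

// adjacent(B,X) = [x1, B(x1,x2), B(x2,x3), ...]
Rn adjacent(B b, const Rn& x) {
    Rn y;
    for (std::size_t i = 0; i < x.size(); ++i) y.push_back(i == 0 ? x[0] : b(x[i - 1], x[i]));
    return y;
}

B invertArguments(B b) { return [b](R x, R y) { return b(y, x); }; }
U bindFirst(B b, R a) { return [b, a](R x) { return b(a, x); }; }
U bindSecond(B b, R a) { return [b, a](R x) { return b(x, a); }; }

// Composition: Mean = accumulate(sum)(X) / size(X); Variance = Mean of squared deviations
R Mean(const Rn& x) { return accumulate(std::plus<R>())(x) / x.size(); }
R Variance(const Rn& x) {
    const B power = [](R base, R exponent) { return std::pow(base, exponent); };
    const Rn deviations = vectorizeUnary(bindSecond(std::minus<R>(), Mean(x)))(x);
    return Mean(vectorizeUnary(bindSecond(power, 2))(deviations));
}
```

Note that Variance needs the full Mean before it can start, which is exactly the kind of summary step that prevents pipelining.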