This text is provided as a complement to the analysis done to propose a general solution in CLAM for sharing description extraction algorithms among projects.
The main idea of this solution is to provide a reusable descriptors container with no compile-time semantic bindings. The data type and name of each descriptor are provided at runtime. Algorithm results are inserted into and fetched from the descriptor container by name, using a type-safe interface.
That solution is implemented using several maps from strings to values. Each map is dedicated to a single data type, so all the TData (float/double) descriptors go into a different map than the descriptors that are strings. Descriptors are also accessed using specialized access methods for each data type.
Supported types are
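A minimal sketch of such a container is shown below. The class and method names (DescriptorSet, SetDataDescriptor, GetDataDescriptor and so on) are illustrative and do not reproduce the actual CLAM interface; they only show the idea of one map per type behind a type-safe, name-based access API.

    // Sketch of a per-type descriptor container (hypothetical names).
    #include <map>
    #include <stdexcept>
    #include <string>

    typedef float TData; // CLAM's TData is float or double depending on configuration

    class DescriptorSet
    {
    public:
        // Specialized access methods for the TData map
        void SetDataDescriptor(const std::string & name, TData value)
        {
            mDataDescriptors[name] = value;
        }
        TData GetDataDescriptor(const std::string & name) const
        {
            std::map<std::string, TData>::const_iterator it = mDataDescriptors.find(name);
            if (it == mDataDescriptors.end())
                throw std::runtime_error("Missing descriptor: " + name);
            return it->second;
        }
        // Specialized access methods for the string map
        void SetStringDescriptor(const std::string & name, const std::string & value)
        {
            mStringDescriptors[name] = value;
        }
        const std::string & GetStringDescriptor(const std::string & name) const
        {
            std::map<std::string, std::string>::const_iterator it = mStringDescriptors.find(name);
            if (it == mStringDescriptors.end())
                throw std::runtime_error("Missing descriptor: " + name);
            return it->second;
        }
    private:
        // One map per supported data type; the descriptor name is the only
        // binding between the algorithm that produces a value and its consumer.
        std::map<std::string, TData> mDataDescriptors;
        std::map<std::string, std::string> mStringDescriptors;
    };

An analysis algorithm would then store its result with, say, pool.SetDataDescriptor("SpectralCentroidMean", value), and a consumer would fetch it back under the same name; the name is the only coupling between producer and consumer.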
Serialization can be done directly by storing each descriptor with its name. Deserialization is more complex because we may not know which names or types are involved, so we need a kind of data dictionary for a project that defines such things.
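What such a data dictionary could amount to is sketched below: a per-project table from descriptor names to type tags that a deserializer consults to pick the right typed map. The names and example entries are hypothetical.

    // Hypothetical data dictionary driving deserialization: the stored file
    // only pairs names with values, so the project must declare which typed
    // map each descriptor name belongs to.
    #include <map>
    #include <string>

    enum DescriptorType { FloatDescriptor, StringDescriptor };

    typedef std::map<std::string, DescriptorType> DataDictionary;

    DataDictionary BuildProjectDictionary()
    {
        DataDictionary dictionary;
        dictionary["Danceability"] = FloatDescriptor; // example entries, not a real project definition
        dictionary["Genre"] = StringDescriptor;
        return dictionary;
    }

    // A deserializer would look up each stored name in the dictionary and
    // dispatch to the matching typed accessor of the descriptor container.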
The main strength of that solution is that algorithms are independent of the final data dictionary definition. Currently, the application itself can insert into and retrieve from the DescriptorSet those descriptors that the processing produces or needs. In a port-based application this function can be performed by special processings that are configured with the name of the descriptor to retrieve or insert.
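The sketch below illustrates the idea of such a special processing in a simplified form; it does not use the real CLAM Processing and configuration classes, and PoolType stands for whatever descriptor container is in use, for instance the DescriptorSet sketched above.

    // Sketch of a "special processing" whose only coupling to the project's
    // data dictionary is a configured descriptor name (illustrative only).
    #include <string>

    template <class PoolType, class ValueType>
    class DescriptorSink
    {
    public:
        explicit DescriptorSink(const std::string & descriptorName)
            : mDescriptorName(descriptorName)
        {
        }
        // Called with each analysis result; stores it under the configured
        // name, so the algorithm never hard-codes a project-specific descriptor.
        // A symmetric "source" processing would read by name instead.
        void Do(PoolType & pool, const ValueType & value)
        {
            pool.SetDataDescriptor(mDescriptorName, value);
        }
    private:
        std::string mDescriptorName;
    };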
The AudioClas descriptors system was designed to reuse other projects' descriptor extraction in AudioClas. The final system addresses two problems:
A 'Descriptor' is an object that calculates a given descriptor. Each Descriptor subclass has:
The only thing you have to do to add a new descriptor is to define a new class and use a registrator to add it to the descriptor factory.
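The sketch below shows the kind of self-registering factory that this implies. The names (Descriptor, DescriptorFactory, Registrator, SpectralCentroid) are illustrative, not the actual AudioClas classes.

    // Sketch of the pluggability mechanism: a new descriptor is a new subclass
    // plus a static registrator object that adds it to the factory.
    #include <map>
    #include <string>

    class Descriptor
    {
    public:
        virtual ~Descriptor() {}
        virtual void Compute() = 0;
    };

    class DescriptorFactory
    {
    public:
        typedef Descriptor * (*Creator)();
        static DescriptorFactory & GetInstance()
        {
            static DescriptorFactory instance;
            return instance;
        }
        void Register(const std::string & name, Creator creator)
        {
            mCreators[name] = creator;
        }
        Descriptor * Create(const std::string & name) const
        {
            std::map<std::string, Creator>::const_iterator it = mCreators.find(name);
            return it == mCreators.end() ? 0 : it->second();
        }
    private:
        std::map<std::string, Creator> mCreators;
    };

    // The registrator's constructor runs at static initialization time and
    // plugs the new descriptor into the factory.
    template <class ConcreteDescriptor>
    class Registrator
    {
    public:
        explicit Registrator(const std::string & name)
        {
            DescriptorFactory::GetInstance().Register(name, &Create);
        }
    private:
        static Descriptor * Create() { return new ConcreteDescriptor; }
    };

    // Adding a new descriptor:
    class SpectralCentroid : public Descriptor
    {
    public:
        void Compute() { /* extraction code goes here */ }
    };
    static Registrator<SpectralCentroid> centroidRegistrator("SpectralCentroid");

Because the registrator is a static object, linking the new translation unit is enough to make the descriptor available through the factory; no central code needs to be touched.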
A planner just takes each goal descriptor:
All the descriptors, even the spectral data, are stored as raw data in two TData pools: one for frame descriptors and another for global descriptors. The frame descriptors pool is stored interleaved, that is, sorted by frame rather than by descriptor.
One of the highlights of that approach is the pluggability of new descriptors.
This approach separates the reference scope of each descriptor from its calculation dependencies. It supports only two scope levels: Frame and Global. The final system should support more scope levels than those.
Data storage is contiguous and sequential. In contrast to the DescriptorsSet approach, it gives faster descriptor retrieval because the lookup is done by integer indexing. A name-based lookup is still needed to obtain the offset, size and hop used for the indexed access, but its result is reused along all the frames.
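A sketch of that access pattern, with hypothetical names, could be:

    // Sketch of interleaved frame-pool access (illustrative names only).
    #include <map>
    #include <string>
    #include <vector>

    typedef float TData;

    struct DescriptorLayout
    {
        unsigned offset; // position inside one frame record
        unsigned size;   // number of TData values the descriptor takes
        unsigned hop;    // distance between consecutive frames (frame record size)
    };

    class FramePool
    {
    public:
        FramePool(unsigned frameSize, unsigned nFrames)
            : mFrameSize(frameSize), mData(frameSize * nFrames)
        {
        }
        void Define(const std::string & name, unsigned offset, unsigned size)
        {
            DescriptorLayout layout = { offset, size, mFrameSize };
            mLayouts[name] = layout;
        }
        // Name-based lookup, done once per descriptor; assumes Define was called.
        DescriptorLayout GetLayout(const std::string & name) const
        {
            return mLayouts.find(name)->second;
        }
        // Fast integer-indexed retrieval: frames are interleaved, so the
        // descriptor of frame i starts at i * hop + offset.
        const TData * GetDescriptor(const DescriptorLayout & layout, unsigned frame) const
        {
            return &mData[frame * layout.hop + layout.offset];
        }
    private:
        unsigned mFrameSize;
        std::vector<TData> mData;
        std::map<std::string, DescriptorLayout> mLayouts;
    };

The name lookup is paid once per descriptor to obtain the layout; the per-frame access then reduces to integer arithmetic on the interleaved buffer.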
Most descriptors are obtained from statistics over an array, and many of those descriptors require some calculations to be done again and again. The intent of this approach is to minimize those calculations by caching them.
The data structure Stats contains the cached data. That class also knows how to calculate it from a given Array. Every time a client asks for a given statistic, the object checks whether it has already calculated that statistic, and either returns the cached value or calculates it.
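A minimal sketch of that lazy caching scheme, assuming just two statistics and using names that are not the real CLAM Stats interface:

    // Sketch of lazy, cached statistics over an array: each statistic is
    // computed at most once per array and then served from the cache.
    #include <cstddef>
    #include <vector>

    typedef float TData;

    class Stats
    {
    public:
        // The caller must keep the data array alive while Stats is used.
        explicit Stats(const std::vector<TData> & data)
            : mData(data), mMeanComputed(false), mEnergyComputed(false)
        {
        }
        TData GetMean()
        {
            if (!mMeanComputed)
            {
                TData sum = 0;
                for (std::size_t i = 0; i < mData.size(); ++i) sum += mData[i];
                mMean = mData.empty() ? 0 : sum / mData.size();
                mMeanComputed = true;
            }
            return mMean;
        }
        TData GetEnergy()
        {
            if (!mEnergyComputed)
            {
                TData sum = 0;
                for (std::size_t i = 0; i < mData.size(); ++i) sum += mData[i] * mData[i];
                mEnergy = sum;
                mEnergyComputed = true;
            }
            return mEnergy;
        }
    private:
        const std::vector<TData> & mData;
        TData mMean, mEnergy;
        bool mMeanComputed, mEnergyComputed;
    };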
The biggest flaw of this solution is its scalability, for instance when aggregating statistics on frames along one segment. Given an array of statistical descriptors, each of which is a tuple of descriptors, and because operations are applied over full elements of an Array, they cannot be applied to a single descriptor along an array of tuples of descriptors. The provided workaround is to compute the statistic (e.g. the mean) on every instantiated descriptor of a descriptor set (e.g. SpectralDescriptors) along the array. This is clearly not what we intended.
This system could be greatly enhanced by generalizing the interface used to collect data. The generalization can follow two alternative paths:
Either of these solutions would provide scalability for calculating statistics on statistics, but it would still lack project extensibility, discrete descriptors, and algorithms based on data semantics.