This text is provided as a complement to the analysis done to propose a general solution in CLAM for sharing description extraction algorithms among projects.
The main idea of this solution is to provide a reusable descriptors container with no compile-time semantic bindings. The data type and name of each descriptor are provided at runtime. Algorithm results are inserted into and fetched from the descriptor container by name, using a type-safe interface.
That solution is implemented using several maps from strings to values. Each map is dedicated to a single data type, so all the TData (float/double) descriptors go into a different map than the descriptors that are strings. Descriptors are also accessed using specialized access methods for each data type.
Supported types are
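A minimal sketch of such a container is shown below. The class and method names (DescriptorSet, SetDataDescriptor, GetDataDescriptor and so on) are illustrative and do not reproduce the actual CLAM interface; they only show the idea of one map per type behind a type-safe, name-based access API.

    // Sketch of a per-type descriptor container (hypothetical names).
    #include <map>
    #include <stdexcept>
    #include <string>

    typedef float TData; // CLAM's TData is float or double depending on configuration

    class DescriptorSet
    {
    public:
        // Specialized access methods for the TData map
        void SetDataDescriptor(const std::string & name, TData value)
        {
            mDataDescriptors[name] = value;
        }
        TData GetDataDescriptor(const std::string & name) const
        {
            std::map<std::string, TData>::const_iterator it = mDataDescriptors.find(name);
            if (it == mDataDescriptors.end())
                throw std::runtime_error("Missing descriptor: " + name);
            return it->second;
        }
        // Specialized access methods for the string map
        void SetStringDescriptor(const std::string & name, const std::string & value)
        {
            mStringDescriptors[name] = value;
        }
        const std::string & GetStringDescriptor(const std::string & name) const
        {
            std::map<std::string, std::string>::const_iterator it = mStringDescriptors.find(name);
            if (it == mStringDescriptors.end())
                throw std::runtime_error("Missing descriptor: " + name);
            return it->second;
        }
    private:
        // One map per supported data type; the descriptor name is the only
        // binding between the algorithm that produces a value and its consumer.
        std::map<std::string, TData> mDataDescriptors;
        std::map<std::string, std::string> mStringDescriptors;
    };

An analysis algorithm would then store its result with, say, pool.SetDataDescriptor("SpectralCentroidMean", value), and a consumer would fetch it back under the same name; the name is the only coupling between producer and consumer.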
Serialization can be done directly by storing each descriptor with its name. Deserialization is more complex because we may not know which names or types are involved, so we need a kind of data dictionary for a project that defines such things.
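What such a data dictionary could amount to is sketched below: a per-project table from descriptor names to type tags that a deserializer consults to pick the right typed map. The names and example entries are hypothetical.

    // Hypothetical data dictionary driving deserialization: the stored file
    // only pairs names with values, so the project must declare which typed
    // map each descriptor name belongs to.
    #include <map>
    #include <string>

    enum DescriptorType { FloatDescriptor, StringDescriptor };

    typedef std::map<std::string, DescriptorType> DataDictionary;

    DataDictionary BuildProjectDictionary()
    {
        DataDictionary dictionary;
        dictionary["Danceability"] = FloatDescriptor; // example entries, not a real project definition
        dictionary["Genre"] = StringDescriptor;
        return dictionary;
    }

    // A deserializer would look up each stored name in the dictionary and
    // dispatch to the matching typed accessor of the descriptor container.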
The main strength of that solution is that algorithms are independent of the final data dictionary definition. Currently, the application itself can insert into and retrieve from the DescriptorSet those descriptors that the processing produces or needs. In a port-based application this function can be performed by special processings that are configured with the name of the descriptor to retrieve or insert.
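The sketch below illustrates the idea of such a special processing in a simplified form; it does not use the real CLAM Processing and configuration classes, and PoolType stands for whatever descriptor container is in use, for instance the DescriptorSet sketched above.

    // Sketch of a "special processing" whose only coupling to the project's
    // data dictionary is a configured descriptor name (illustrative only).
    #include <string>

    template <class PoolType, class ValueType>
    class DescriptorSink
    {
    public:
        explicit DescriptorSink(const std::string & descriptorName)
            : mDescriptorName(descriptorName)
        {
        }
        // Called with each analysis result; stores it under the configured
        // name, so the algorithm never hard-codes a project-specific descriptor.
        // A symmetric "source" processing would read by name instead.
        void Do(PoolType & pool, const ValueType & value)
        {
            pool.SetDataDescriptor(mDescriptorName, value);
        }
    private:
        std::string mDescriptorName;
    };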
The AudioClas descriptors system was designed to reuse other projects' descriptor extraction in AudioClas. The final system addresses two problems:
A 'Descriptor' is an object that calculates a given descriptor. Each Descriptor subclass has:
The only thing you have to do to add a new descriptor is to define a new class and use a registrator to add it to the descriptor factory.
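The sketch below shows the kind of self-registering factory that this implies. The names (Descriptor, DescriptorFactory, Registrator, SpectralCentroid) are illustrative, not the actual AudioClas classes.

    // Sketch of the pluggability mechanism: a new descriptor is a new subclass
    // plus a static registrator object that adds it to the factory.
    #include <map>
    #include <string>

    class Descriptor
    {
    public:
        virtual ~Descriptor() {}
        virtual void Compute() = 0;
    };

    class DescriptorFactory
    {
    public:
        typedef Descriptor * (*Creator)();
        static DescriptorFactory & GetInstance()
        {
            static DescriptorFactory instance;
            return instance;
        }
        void Register(const std::string & name, Creator creator)
        {
            mCreators[name] = creator;
        }
        Descriptor * Create(const std::string & name) const
        {
            std::map<std::string, Creator>::const_iterator it = mCreators.find(name);
            return it == mCreators.end() ? 0 : it->second();
        }
    private:
        std::map<std::string, Creator> mCreators;
    };

    // The registrator's constructor runs at static initialization time and
    // plugs the new descriptor into the factory.
    template <class ConcreteDescriptor>
    class Registrator
    {
    public:
        explicit Registrator(const std::string & name)
        {
            DescriptorFactory::GetInstance().Register(name, &Create);
        }
    private:
        static Descriptor * Create() { return new ConcreteDescriptor; }
    };

    // Adding a new descriptor:
    class SpectralCentroid : public Descriptor
    {
    public:
        void Compute() { /* extraction code goes here */ }
    };
    static Registrator<SpectralCentroid> centroidRegistrator("SpectralCentroid");

Because the registrator is a static object, linking the new translation unit is enough to make the descriptor available through the factory; no central code needs to be touched.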
A planner just takes each goal descriptor:
All the descriptors, even the spectral data, are stored as raw data in two TData pools: one for frame descriptors and another for global descriptors. The frame descriptors pool is stored interleaved, that is, sorted by frame rather than by descriptor.
One of the highlights of that approach is the pluggability of new descriptors.
This approach separates the reference scope of each descriptor from its calculation dependencies. It supports only two scope levels: Frame and Global. The final system should support more scope levels than those.
Data storage is contiguous and sequential. In contrast to the DescriptorsSet approach, it gives faster descriptor retrieval because the lookup is done by integer indexing. A name-based lookup is still needed to obtain the offset, size and hop used for the indexed access, but its result is reused along all the frames.
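A sketch of that access pattern, with hypothetical names, could be:

    // Sketch of interleaved frame-pool access (illustrative names only).
    #include <map>
    #include <string>
    #include <vector>

    typedef float TData;

    struct DescriptorLayout
    {
        unsigned offset; // position inside one frame record
        unsigned size;   // number of TData values the descriptor takes
        unsigned hop;    // distance between consecutive frames (frame record size)
    };

    class FramePool
    {
    public:
        FramePool(unsigned frameSize, unsigned nFrames)
            : mFrameSize(frameSize), mData(frameSize * nFrames)
        {
        }
        void Define(const std::string & name, unsigned offset, unsigned size)
        {
            DescriptorLayout layout = { offset, size, mFrameSize };
            mLayouts[name] = layout;
        }
        // Name-based lookup, done once per descriptor; assumes Define was called.
        DescriptorLayout GetLayout(const std::string & name) const
        {
            return mLayouts.find(name)->second;
        }
        // Fast integer-indexed retrieval: frames are interleaved, so the
        // descriptor of frame i starts at i * hop + offset.
        const TData * GetDescriptor(const DescriptorLayout & layout, unsigned frame) const
        {
            return &mData[frame * layout.hop + layout.offset];
        }
    private:
        unsigned mFrameSize;
        std::vector<TData> mData;
        std::map<std::string, DescriptorLayout> mLayouts;
    };

The name lookup is paid once per descriptor to obtain the layout; the per-frame access then reduces to integer arithmetic on the interleaved buffer.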
Most descriptors are obtained from statistics over an array, and many of those descriptors require some calculations to be done again and again. The intent of this approach is to minimize those calculations by caching them.
The data structure Stats contains the cached data. That class also knows how to calculate it from a given Array. Every time a client asks for a given statistic, the object checks whether it has already calculated that statistic, and either returns the cached value or calculates it.
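A minimal sketch of that lazy caching scheme, assuming just two statistics and using names that are not the real CLAM Stats interface:

    // Sketch of lazy, cached statistics over an array: each statistic is
    // computed at most once per array and then served from the cache.
    #include <cstddef>
    #include <vector>

    typedef float TData;

    class Stats
    {
    public:
        // The caller must keep the data array alive while Stats is used.
        explicit Stats(const std::vector<TData> & data)
            : mData(data), mMeanComputed(false), mEnergyComputed(false)
        {
        }
        TData GetMean()
        {
            if (!mMeanComputed)
            {
                TData sum = 0;
                for (std::size_t i = 0; i < mData.size(); ++i) sum += mData[i];
                mMean = mData.empty() ? 0 : sum / mData.size();
                mMeanComputed = true;
            }
            return mMean;
        }
        TData GetEnergy()
        {
            if (!mEnergyComputed)
            {
                TData sum = 0;
                for (std::size_t i = 0; i < mData.size(); ++i) sum += mData[i] * mData[i];
                mEnergy = sum;
                mEnergyComputed = true;
            }
            return mEnergy;
        }
    private:
        const std::vector<TData> & mData;
        TData mMean, mEnergy;
        bool mMeanComputed, mEnergyComputed;
    };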
The biggest flaw of this solution is its scalability, for instance when aggregating statistics on frames along one segment. Given an array of statistical descriptors, each of which is a tuple of descriptors, and because operations are applied over full elements of an Array, they cannot be applied to a single descriptor along an array of tuples of descriptors. The provided workaround is to compute the statistic (e.g. the mean) on every instantiated descriptor of a descriptor set (e.g. SpectralDescriptors) along the array. This is clearly not what we intended.
This system could be greatly enhanced by generalizing the interface used to collect data. The generalization can follow two alternative paths:
Either of these solutions would provide scalability for calculating statistics on statistics, but it would still lack project extensibility, discrete descriptors, and algorithms based on data semantics.