Planet CLAM is a window into the world, work and lives of CLAM hackers and contributors. The planet is open to any blog feed that occasionally relates with CLAM or its brother projects like testfarm. Ask in the devel list to get in.

PLANET CLAM

December 23, 2011

The Recommender Problem & the Presentation Context

In the traditional formulation of the "Recommender Problem", we have pairs of items and users and user feedback values for very few of those dyads. The problem is formulated as the finding of a utility function or model to estimate the missing values.

In many real-world situations, feedback will be implicit** and binary in nature. For instance, in a web page you will have users visiting a url, or clicking on an add as a positive feedback. In a music service, a user will decide to listen to a song. Or in a movie service, like Netflix, you will have users deciding to watch a title as an indication that the user liked the movie. In these cases, the recommendation problem becomes the prediction of the probability a user will interact with a given item. There is a big shortcoming in using the standard recommendation formulation in such a setting: we don't have negative feedback. All the data we have is either positive or missing. And the missing data includes both items that the user explicitly chose to ignore because they were not appealing and items that would have been perfect recommendations but were never presented to the user.

A similar issue has been dealt with in traditional data mining research, where classifiers need to be trained only using positive examples. In the "
Learning Classifiers from Only Positive and Unlabeled Examples" SIGKDD 08 paper, the authors present a method to convert unlabeled examples into both a positive and a negative example, each with a different weight related to the probability that a random exemplar is positive or negative. Another solutions to this issue is presented in the "Collaborative Filtering for Implicit Feedback Datasets" paper by Hu, Koren and Volinsky. In this work, the authors binarize the implicit feedback values: any feedback value greater than zero means positive preference, while any value equal to zero is converted to no preference. A greater value in the implicit feedback value is used to measure the "confidence" in the fact the user liked the item, but not in measuring "how much" the user liked it. Yet another approach to inferring positive and negative feedback from implicit data is presented in the paper I co-authored with Dennis Parra, and I presented in a previous post. There, we argue that implicit data can be transformed to positive and negative feedback if aggregated at the right level. For example, the fact that somebody listened only once to a single track in an album can be interpreted as the user not liking that album.

In many practical situations, though, we have more information than the simple binary implicit feedback from the user. For unlabeled examples that the user did not directly interact with, we can expect to have other information. In particular, we might be able to know whether they were shown to the user or not. This adds very valuable information, but slightly complicates the formulation of our recommendation problem. We now have three different kinds of values for items: positive, presented but not chosen, and not presented. And this is only if we simplify the model. In reality, information related to the presentation can be much richer than this and we might be able to derive data like the probability the user actually saw the item or weigh in different interaction events such as mouse overs, scrolls...



In Netflix, we are working on different ways to add this rich information related to presentations and user interaction to the recommender problem. That is why I was especially interested in finding out that this year's SIGIR best student paper award has been awarded to a paper that addresses this issue. In the paper "Collaborative Competitive Filtering: Learning
Recommender Using Context of User Choice
", the authors present an extension to traditional Collaborative Filtering by encoding into the model not only the collaboration between similar users and items, but also the competition of items for user attention. They derive the model as an extension to standard latent factor models by taking into account the context in which the user makes the decision. That is, the probability I decide to select a given item depends on which are the other items I have as an alternative. Results are preliminary but promising. And, this work is definitely an interesting and appealing starting point for an area with many practical applications.

However, there are many possible improvements to the model. One of them, mentioned by the authors, is the need to take into account the so-called position bias. An item that is presented in the first position of a list has many more possibilities to be chosen than one that is farther down. This effect is well-known in the search community and has been studied from several angles. I would recommend, for instance to read some of the very interesting papers on this topic by Thorsten Joachims and his students. In the paper “How Does Clickthrough Data Reflect Retrieval Quality?”, for instance, they show how arbitrarily swapping items in a search result list has almost no effect. This proves that the positioning of the element can be a most important factor than how relevant the item is.

I would love to hear of other ideas or approaches to deal with this new version of the recommender problem that includes, and would encourage researchers in the area to address an issue of huge potential impact.

**Note: I am using the word implicit here in the traditional sense in the recommendation literature. The truth is that a user selecting an item is in fact explicit information. However, it can be considered implicit in that the user is informing about the preferences indirectly by comparing the item to others in a context.

December 04, 2011

November 03, 2011

Recsys 2011 - Notes and Pointers


I found Recsys this year of very high quality in general. There were many good papers and presentations. The Industry track was also very high-quality, with very interesting talks from companies such as Twitter, Facebook, or eBay. Jon Sanders and I also gave two presentations explaining how recommendations have evolved since the Netflix Prize (more on this soon).

Here are my rough notes with pointers to some papers I considered especially interesting. I have grouped them in 5 categories that I think summarize the main topics in the conference: (1) Transparency and explanations, (2) Implicit Feedback, (3) Context, (4) Metrics and evaluation, and (5) Others. Note that the selection is completely biased towards my personal interests.

(1) TRANSPARENCY & EXPLANATIONS. One of the recurring themes was the fact that user trust and perceived quality of the recommendations was very much influence not by accuracy alone, but by how transparent the system was, and the amount of "explanations" that were added.
  • Daniel Tunkelang(LinkedIn) did a very interesting tutorial on "Recommendations as a Conversation with the User", where he focused on these kinds of issues. See his slides in his blog.
  • Neel Sundaresan (eBay) also stressed in his keynote that adding explanations can sometimes be more important than getting the recommendation right.
  • In the paper "Each to His Own: How Different Users Call for Different Interaction Methods in Recommender Systems", the authors found that depending on how experts are users in the domain, they prefer different kind of recommendations and interaction models. For example, in one of the extremes, novices, prefer top-10 non-personalized to their personalized recommendations. In general a hybrid model of interaction is better than either implicit or explicit-only.
(2) IMPLICIT FEEDBACK. A lot of papers this years on using implicit consumption data instead of (or in combination with) ratings.
(3) CONTEXT. There were 2 workshops (CARS and CAMRA), and several papers in the main conference, talking about how to add contextual information for the recommendations:
  • "The Effect of Context-Aware Recommendations on Customer Purchasing Behavior and Trust" is an interesting paper, focusing on the evaluation side. They include an A/B test for measuring the effect of context-aware recommendations. Using context increased overall sales in $ but not in number. Therefore, users tend to spend more $ per item.
  • In the CAMRA workshop, many papers (such as "Temporal Rating Habits: A Valuable Tool for Rater Differentiation" or "Identifying Users From Their Rating Patterns") were related to how to identify who the author of a rating in a household was, since this was one of the tasks for the contest.
  • Also related to group recommendations, "Group Recommendation using Feature Space Representing Behavioral Tendency and Power Balance among Members", tries to model what is a good recommendation for a group where each of the individuals does not have the same influence.
(4) METRICS and EVALUATIONS: There were several papers that offered different ways to measure accuracy for top-N ranked recommendations.
  • "Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems" presents an interesting framework that includes metric for measuring not only accuracy, but also novelty, diversity....
  • "Item Popularity and Recommendation Accuracy" is an interesting work on how to remove popularity bias from accuracy metrics. A user study validates the fact that recall measure is correlated with user perceived quality of recommendation. Besides proposing a recall metric that removes popularity bias, he also proposes a popularity stratified training method that weights negative examples according to how popular they are.
  • "Evaluating Rank Accuracy based on Incomplete Pairwise Preferences" proposes a measure called expected discounted rank correlation for the specific case of implicit feedback.
(5) OTHERS
  • eBay and UCSC presented "Utilizing Related Products for Post-Purchase Recommendation in E-commerce". The paper won the best poster award
  • There were many papers on Social Recommendations. Just to name one, in "Power to the People: Exploring Neighbourhood Formations in Social Recommender Systems", they did a user study to figure out how much users would like and trust recommendations coming from different user groups (those they decided, friends, everyone...). Interestingly, the method of choice did not make much difference... until you told the users what it was.
  • In "Wisdom of the Better Few: Cold Start Recommendation via Representative based Rating Elicitation" they discussed how to select most imformative users and items for cold start. I was surprised to see that our "Wisdom of the Few" approach got paraphrased in a paper title.
  • There were a couple of very interesting workshops on Music Recommendations and Mobile Recommendations that I had to miss since I was attending others. But, they are definitely worth looking into if you are into music or mobile.

September 06, 2011

CLAM binders, ipyclam and PyQt

I just finished a pair programming session with Xavier Serra. He is going to defend his Final Career Project on Friday and we are preparing some cool demos involving binding ipyclam and pyqt. It has been an exciting session because as we debunked one by one the show stoppers that we were finding, we realized that the potential of the tool mix is wider than we thought.

What we can do now is to build a network with ipyclam, building an interface by combining pyqt and ui files and binding them to that network, all that either in live by using the interactive shell or using a written script. This formula adds the flexibility of scripting to CLAM prototyping system, raising the ceiling of what you can do without raising too much the learning threshold.

For example, let's play a file in a loop with ipyclam:

import ipyclam
n=ipyclam.Network(ipyclam.Clam_NetworkProxy())
n.file = n.types.MonoAudioFileReader
n.out = n.types.AudioSink
n.file.Loop = 1
n.file.SourceFile = "jaume-voice.mp3"
n.file > n.out

n.backend="PortAudio"
n.play()

Then you can instantiate an Oscilloscope, bind it and play:

from PyQt4 import QtCore, QtGui
a=QtGui.QApplication([])

w=n.createWidget("Oscilloscope")
w.show()
w.setProperty("clamOutPort", "file.Samples Read")

n.stop()
n.bindUi(w)
n.play()

You can also load an UI file and bind it simultaneously..

w2 = n.loadUi("myui.ui")
w2.show()
n.stop()
n.bindUi(w2)
n.play()

Adding a tonal analysis and some related views

n.stop()
n.tonal = n.types.TonalAnalysis
n.file > n.tonal
w3 = n.createWidget("CLAM::VM::KeySpace")
w3.setProperty("clamOutPort", "tonal.Chord Correlation")
n.bindUi(w3)
w3.show()
n.play()

The enabler of the mix has been the new binder architecture in CLAM. Until 1.4, the Prototyper used what we call 'binders' to relate processing elements with user interface elements. Each binder concentrates in a given kind of binding: oscilloscopes, control sliders, transport buttons...

For the upcomming CLAM 1.5 the binder interface (CLAM::QtBinder) has been redefined and moved out from the Prototyper to the qtmonitors module. So now you can use them in any CLAM application. The bindings are now based on Qt dynamic properties which provide more flexibility than former QObject name mangling.

Currently there are several binders implemented:
* Action/Button -> launch a configurator dialog on some processing
* Action/Button -> launch a open dialog for a MonoFileReader
* Checkable -> Send a bool, or a bistable float control
* Slider -> maped float/int control
* Any Monitor -> outPort to monitor
* ControlSurface -> Send pair of controls
* Slider -> ProgressControl or a MonoFileReader or similar to control

Now you can extend those binder by means of plugins as we extended processings. You can use Qt dynamic properties in the interface elements to specify how the binding is done.

The abstract interface provides the static method QtBinder::bindAllBinders() which is the one called by python bindUi. This looks for every ui element in the QObject hierarchy, looks for every registered binder and if they match it is applied.

To implement your own handle you just have to rewrite the bool handles(QObject*) method that returns true if the object managed by the binder (type, name, presence of properties...) and the bool bind(QObject * uiElement, Network & network, QStringList & errors) which does the actual binding. To register the binder, you just have to instanciate one as static variable of a cxx file in your library.

All that is still being building up but its potential seems clear, so i hope that this new way of claming will be useful to you all.

ipyclam: interactive python console for CLAM networks manipulation

Xavier Serra (not the MTG head, but an homonym student at the UPF) is about to reach an interesting milestone on his Final Career Project related to CLAM. He is developing an interactive Python console for CLAM named ipyclam. It is a text based interface to explore and manipulate CLAM networks. Hopefully this will end being a dock widget within the Network Editor but it is also useful as interactive text console tool and as programming interface, which are the first useful outputs Xavi is about to get. ipyclam comes to cover the need of manipulating complex networks in cases when point and click is just tedious and also that of generating parametrized networks.

Let's make clear that it is not a full wrapper for the CLAM API like Hernan's pyclam, ipyclam just enables high level operations you usually do with Network Editor: instantiating, connecting, configuring and playing modules. Moreover we have intentionally decoupled ipyclam API to any existing CLAM API in order to define the most pythonic, convenient, and clean API we could think. For example the code to build an stereo cable would be:

net = Network()
net.processing1 = net.types.AudioSource
net.processing1.NOuputs = 2
net.processing2 = net.types.AudioSink
net.processing2.NInputs = 2

net.processing1.Output1 > net.processing2.Input1
net.processing1.Output2 > net.processing2.Input2

net.play()

An interesting feature, is that network.code() just returns an string with similar python code, so a network knows how to regenerate itself. This opens the door to python based format for CLAM networks. Bye bye XML!.

Convenience? Yes, please. Connections can be done in batch, broadcasting connections from all the connectors in one processing to all or to a single connection of another:

net.processing1 > net.processing2 # all to all
net.processing1.Output1 > net.processing2 # one to all
net.processing1 > net.processing2.Input1 # all to one

Convenience is good but all things should be doable. Often most convenient solution is not the best suited for all the cases. Accessing configuration parameters and connectors of any kind directly from proccessing is quite convenient but names can collide, to solve that some special attributes are provided to restrict the name scope.

network.processing1._config.NOutputs
network.processing1._inports.InPort1
network.processing1._outports.OutPort1
network.processing1._incontrols.InControl1
network.processing1._outcontrols.OutControl1

Connection names can have spaces, and attributes cannot, so we also provide an alternative subscript syntax.

net.processing1['Output 1']

One of the features I like the more is tab key exploration.
  • network.[tab] and you get the list of processings
  • network.processing._inports[tab] and you get the list of input ports
  • network.processing._config[tab] and you get the list of configuration parameters
  • network.processing.[tab] and you get connectors and cofiguration parameters
  • network.types.[tab] and you get the list of available processing types

It works for interactive console with tab completion like ipython. We also provide ipyclam_console, an ipython based shell conveniently importing modules and adding a network to start playing with.

For more details on the API just check this document.

The implementation


ipyclam is defined in two layers. The front-end layer, purely written in python, makes the convenient API happen: Exploration, dynamic object attributes, many interfaces to similar operations... This upper layer, relies on the lower, less pythonic and more purpose oriented one, the network proxy layer, which provides access to the state and operations of the actual network with quite fewer entry points.

Xavi defined the front-end layer while evolving, in parallel, a dummy proxy layer simulating network status with basic python structures. That gave us the flexibility to explore and evolve both the frond-end and the proxy interface without bothering about the actual CLAM implementation that would have made such exploration quite rigid.

Two layers approach is also convenient because by defining the ops in the network proxy layer, we are able to reuse the front-end API with any other patching like environment. I am thinking on JACK or environments like ingen.

Status and road map


Xavi just finished most important bits of the API: Inspection, module creation, module type enumeration and module inter-connection. Currently we still rely on a dummy network proxy, but a CLAM based proxy will arrive during next week. Configuration API seems to be controversial and we are holding it until we have something implemented with CLAM.

Once we have a full CLAM based proxy the next big goal is to integrate the console in NetworkEditor. We need a console widget for Qt which is able to execute a python interpret. We found some candidates:
  • qconsole which already has a python wrapper
  • qtermwidget a more robust console (based on KDE Konsole) but without direct support for python, and
  • ipython qt frontend quite experimental and using a weird threading model, but still interesting as ipyclam_console already uses ipython and it is awesome

So either candidate seems to require some work on our side.

Well, I hope you will find this new CLAM feature as amusing as I do. Xavier is really doing quite a good job, congrats and keep it that high.

July 31, 2011

Joining Netflix

Three weeks ago, I started to work for Netflix. Everything has moved so fast with so many things to do and learn that it seems like I have already been here for a much longer time!

I am now working as the manager of a small team working on recommendations & personalization in the company that promoted recommender systems research to major headlines thanks to the Netflix Prize. It also feels great to come to the company in an exciting time when it has just reached its 25th million customer and is starting its international expansion to Latin America.

All the fuzz created around the Netflix Prize might lead some to believe that rating prediction is all there is to Netflix suggesting a given movie. However, I was happy to find out that rating prediction is only one of the many signals that my team uses in creating the final suggestions.

Awesome place, awesome people, and awesome time to be around. And, btw, we are hiring, so let me know if you are interested in joining. (Update: it seems that the jobs link is currently not active outside US/CA... I'm working on getting this fixed)


July 21, 2011

Walk the Talk: On the Combination of Implicit and Explicit Feedback

Last week, Denis Parra presented our paper entitled "Walk the Talk: Analyzing the Relation between Implicit and Explicit Feedback for Preference Elicitation" at the UMAP conference. The paper won Denis the best-student paper award (Congratulations!).

The paper presents our initial work in analyzing the relation between implicit and explicit feedback. In short, the main question we wanted to answer is how does the self-reported preferences users give in a typical 5-star interface relate to what they actually do when looking at their consumption patterns. Our hypothesis was that there should exist simple models that relate both kinds of feedback. Finding a way to robustly convert implicit feedback into explicit ratings would open up the door to applying well-known methods with implicit feedback. But, much more importantly, we could then combine both kinds of input in a single model.

In order to test our hypothesis, we prepared an experiment in the music domain. We asked last.fm users to take a survey in which we queried them about how much they liked albums that were already in their listening history. With this data in hand, we could analyze the relation between implicit and explicit feedback and try to fit a simple model.

I recommend you read the full paper if you want to get the longer story of our findings, but here is a brief summary:
  • There is a strong correlation between implicit feedback and self-reported preference (see figure below)
  • Variables such as recentness of interaction or overall popularity do not have significant effect. Note that in a previous study by Salganik & Duncan Watts, global popularity was found to affect users perceived quality. However, in that case and as opposed to ours, users were made aware of the popularity.
  • Interaction effect: When listening to music, some people prefer to listen to isolated songs or albums. The way they interact with music, affects the way they report their taste.

After our analysis, we then construct a linear model that takes into account these variables by performing a linear regression. Once we have built these models, we can evaluate their performance in a regular recommendation scenario by measuring the error in predicting ratings in a hold-out dataset.

This paper represents an initial but very promising line of work that we have already improved in several ways such as the use of logistic instead of linear regression to account for the non-linearity of the rating scale or the use of the regression model as a way to combine both implicit and explicit feedback. But I will leave those findings for a future post.

June 17, 2011

June 15, 2011

Como se hizo…

http://www.planetatortuga.com/noticias.item.3875/los-violentos-en-#parlamentcamp-son-policias-infiltrados.-ver-fotos-rt-plz.html

De ahí:
(aquí había un video en alta calidad, que tenía cientos de miles de visitas y cientos de comentarios… y fue borrado: http://www.youtube.com/embed/YcmvzRvsf8g…. va otra copia en menor calidad)

Actualización: otro link hablando de lo mismo: http://jmgoig.wordpress.com/2011/06/15/estrategias-del-poder-para-desprestigiar-movimientos-sociales-el-caso-parlamentcamp/

April 04, 2011

Ubuntu PPA for CLAM

For the convenience of Ubuntu users, we deployed a personal package archive (PPA) in launchpad.

https://launchpad.net/~dgarcia-ubuntu/+archive/ppa

Instructions available at the same page. It currently contains libraries, extension plugins, NetworkEditor and Chordata packages for maverick, and platforms i386 and amd64.


February 16, 2011

Numpy arrays to video

I found some examples on how to generate video with python by piping frames to external programs like ffmpeg or mencoder. All those snippets encode each frame as a given image format (png, jpg...) by using PIL, matplotlib.. to get them. Besides the dependency, building figures or encoding images can be quite slow. I finally found a way of piping raw numpy buffers as frames to mencoder. You can use the following code:


import subprocess
import numpy as np

class VideoSink(object) :
def __init__( self, size, filename="output", rate=10, byteorder="bgra" ) :
self.size = size
cmdstring = ('mencoder',
'/dev/stdin',
'-demuxer', 'rawvideo',
'-rawvideo', 'w=%i:h=%i'%size[::-1]+":fps=%i:format=%s"%(rate,byteorder),
'-o', filename+'.avi',
'-ovc', 'lavc',
)
self.p = subprocess.Popen(cmdstring, stdin=subprocess.PIPE, shell=False)

def run(self, image) :
assert image.shape == self.size
# image.tofile(self.p.stdin) # should be faster but it is indeed slower
self.p.stdin.write(image.tostring())
def close(self) :
self.p.stdin.close()


It is a tenth faster than the PIL based one. And there is no fair comparison to the matplotlib one. I got it working with mencoder but I could not figure out how to make it with ffmpeg.

You can find that code with an usage example there. Being based on other public domain snippets consider it also public domain.

I am using that class to save the output of my Freenect based project. More on that on next entries.

September 20, 2010

High abstraction level audio plugins specification (and code generation)

If you ever wrote at least 2 audio plugins in your life, for sure you have noticed you had to write a lot of duplicated code. In other words, most of the times, writing a plugin there is very little … Continue reading

Some experience with CLAM inside an audio club at FIUBA, Argentina

(Note: I wrote this as something to tell to the clam-devel mailing list about some of my source-code commits) About eight months ago, there was a foundation of something like an “audio club” in my university [1]. As soon i … Continue reading

March 08, 2010

CLAM Chordata 1.0

screenshot

The CLAM project is pleased to announce the first stable release of Chordata, which is released in parallel to the 1.4.0 release of the CLAM framework.

Chordata is a simple but powerful application that analyses the chords of any music file in your computer. You can use it to travel back and forward the song while watching insightful visualizations of the tonal features of the song. Key bindings and mouse interactions for song navigation are designed thinking in a musician with an instrument at hands.

Chordata in live: http://www.youtube.com/watch?v=xVmkIznjUPE
The tutorial: http://clam-project.org/wiki/Chordata_tutorial
Downloat it at http://clam-project.org

This application was developed by Pawel Bartkiewicz as his GSoC 2008 project, by using existing CLAM technologies under a more suited interface which is now Chordata. Please, enjoy it.


CLAM 1.4.0, 3D molluscs in the space


The CLAM project is delighted to announce the long awaited 1.4.0 release of CLAM, the C++ framework for audio and music, code name 3D molluscs in the space.

In summary, this long term release includes a lot of new spacialization modules for 3D audio; MIDI, OSC and guitar effects modules; architectural enhancements such as typed controls; nice usability features for the NetworkEditor interface; convenience tools and scripts to make CLAM experience better; enhanced building of LADSPA plugins and new support for LV2 and VST plugin building; a new easy to use application to explore songs chords called Chordata; many optimizations, bug fixing and code clean ups.

Many thanks to the people who contributed to this release, including but not limited to the GSoC 2008 students and all the crew at Barcelona Media’s Audio Group.

Some details follow:

  • Chordata is a new CLAM application which offers a user friendly way to explore the chords of your favourite songs, using already existing technology in the CLAM framework but with a much simpler interface. Video
  • The spacialization module and helper tools, contributed by Barcelona Media audio group, turn CLAM in tandem with Blender and Ardour, into a powerful 3D audio authoring and exhibition platform. Here you can see some related Videos.
  • Typed controls extend CLAM with the ability to use whichever C++ type as the message for a control. So, not just floats, but also bools, enums, integers, or envelopes can be sent as asynchronous controls. Examples on boolean and MIDI controls are provided.
  • NetworkEditor has been ported to the QGraphicsView framework. Dealing with heavy networks such the big ones used in Barcelona Media have pushed many usability enhancements into its interface: multi-wire dragging, wire highlighting, default port and control actions, network and in-canvas documentation… Video
  • It also made necessary to provide a tool such clamrefactor.py to perform batch high level changes to clam network XML files such as renaming processing types, ports, or configuration parameters, changing configuration values, duplicating sets of processings, connecting them…
  • Music Annotator application now is designed to aggregate several sources of descriptors and update them after edit. Descriptors are mapped to a work description schema that can be graphically defined. Also semantic web descriptor sources to access webservices such as MusicBrainz have been implemented.

You can download them from the download page. Source, windows, debian and ubuntu packages are available. Contributed binaries for other platforms are welcome.

See also: development screenshots, the CHANGELOG, the version migration guide and the new CLAM group on youtube.


January 26, 2010

Libraries becoming plugins, plugins becoming libraries

Let me post about some of the latest changes on the CLAM build system.

The CLAM project has been distributing two kinds of shared libraries: the main library modules (core, audioio and processing) and a set of plugins which are optional (spacialization, osc, guitareffects...). Plugins are very convenient since programs like the NetworkEditor can be extended without recompile and enable third party extensions. They are so convenient that we were seriously considering splitting processing and auidioio as plugins.

But the use of content of a clam plugin is limited to abstract Processing interface. No other symbols are visible from other components. That has a serious impact on the flexibility to define plugin boundaries. For example, if a plugin defines a new data token type, any processings managing that token type should be in that same plugin. To use symbols of one plugin from another implies installing headers, provide soname, linker name and so on. That is, building a module library.

So, if plugins want to be libraries and libraries want to be plugins, let them all be both. Each module will be compiled as library (libclam_mymodule.so.X.Y.Z), with soname (libclam_mymodule.so.X.Y), linker link (libclam_mymodule.so), headers (include/CLAM/mymodule/*), pc file... and it will also provide a plugin library (libclam_mymodule_plugin.so) which has no symbol by itself but is linked against the module library. When a program loads such a plugin, all the processings in the module library become available via the abstract interface. On the other side, if program or a module needs to use explicitly the symbols of another module, it just has to include the headers and link against the module library.

I just applied the changes to the plugins and seems to work nicely. Adapting the main modules will be harder because the old sconstruct file, but it is a matter of days and we can split them. I moved a lot of common SCons code to the clam.py scons tool. There is a cute environment method called ClamModule which generates a clam module including the module library, soname, linker name, plugin, pkg-config, and install targets. I also added some build system freebies for the modules and applications:

  • All the intermediate generated code apart in a 'generated' directory
  • Non verbose command line
  • Colour command line
  • Enhanced scanning methods with black lists

Well, do not rely too much on the current clam scons tool API. Changes are still flowing through the subversion. Of course, warn me if I broke something. ;-)

April 03, 2009

December 10, 2008

Chord Segmentation: first results are here!

Hullo Planet!
Three months after starting this blog, finally the first post...
...because finally I have something nice to show off.

My Google Summer Of Code task is enhancing realtime chord extraction in CLAM. So far I've been working on small changes, refactorings, etc. But now I took a small break from that to check whether I can really improve the chord segmentation.

The chord extraction algorithm in CLAM is really good but has a very "raw" output - not exactly something one could use to learn the chords of a favourite song. A big part of my GSoC task was changing this. And the first results are here:

The screenshot shows ChordExtractor output as viewed with the Annotator. The song being analysed is Debaser-WoodenHouse.mp3. The upper half of the screenshot shows the old output (to be exact the ChordExtractor from current svn, as extracted with my computer using fftw3). The lower half shows the new improved segmentation (notice the chord segments are much bigger, not that, well - segmented).

Problem is - this code exists only in my sandbox for now... I unfortunately reverted to my old pre-svn methods of programming - more or less just jabbing at the code as long as the number of segfaults stays manageable (just one with this code, shows how simple the changes are!). The next few days will hopefully see it cleaned and committed to the svn.

What this new improved segmentation actually does ...

Some chords are very similar to others i.e. C# Minor differs from A Major by just one note (G# exchanged for A). When you play just the two common notes for the first 5 seconds and then a full chord for the next 5, you'll know that you're not really changing the chords... but the old algorithm would probably show you a mix of both chords during the first 5 seconds.

The new algorithms calulates a chord similarity matrix and takes this similarity into account when deciding whether a new segment really needs to be inserted. This is enough to produce the results above. I still hope this simplicity will allow some nice improvements... but this is still to be seen (hopefully before the GSoC deadline, *gulp*).

For anyone wishing to see the results, links to the new and old ChordExtractor .pool files for the songs that come as examples with Annotator:
Debaser, Wooden House, old
Debaser, Wooden House, new
Debaser, Coffee Smell, old
Debaser, Coffee Smell, new