Planet CLAM is a window into the world, work and lives of CLAM hackers and contributors. The planet is open to any blog feed that occasionally relates to CLAM or its sibling projects, like TestFarm. Ask on the devel list to get in.

PLANET CLAM

June 30, 2014

Pythonic access to audio files: python-wavefile

Last week, python-wavefile received a pull request from the PyDAW project to make it compatible with Python 3. So I revived the project to merge the contributions and address some of the old pending tasks.

I had not realized that python-wavefile had gained more traction than most of my GitHub projects: other people, not just me, are actually using it, and that’s cool. So I think I owe the project a blog entry… and maybe a logo.

python-wavefile is a Python module to read and write audio files in a pythonic way. Instead of just exposing the C API of Erik de Castro Lopo’s powerful libsndfile, it enables common Python idioms and numpy bridging for signal processing. There are many Python modules around wrapping libsndfile, including a standard one. At the end of the article I do a quick review of them and justify why I wrote yet another libsndfile Python wrapper.

History

This module was born to cover the needs I had while doing research for my PhD thesis on 3D audio. I needed floating point samples and multi-channel formats for Higher Order Ambisonics and multi-speaker mix-down. I also needed efficient block processing, as well as the inefficient, but sometimes convenient, Matlab-like load-it-all functionality.

This is why, when Xavi Serra was starting his Master thesis, I proposed him a warm-up exercise: mocking up Python bindings for the libsndfile library using different methods: Cython, a CPython module, Boost, CTypes, SIP… That exercise resulted in several mock-ups, one for each binding method, and an almost full implementation using CPython, based on the double-layer strategy Xavi finally used for iPyCLAM: a narrow lower layer making the C API available to Python as is, and a user layer adding the Python sugar.

As we evolved the wrapper towards the user layer we wanted, the CPython code became too complex. So I created python-wavefile by reimplementing the user API we had defined with Xavier Serra, but relying on the C API wrapping defined in libsndfile-ctypes.

Python-wave, the official API and the root of all evil

Why did we do that? The root of all evil is the official Python module for dealing with wave files. Its API is crap, real crap:

  • :-) As standard lib it is available on every Python install, but…
  • :-( It has nasty accessors like getcomptype, getsampwidth
    • Names that are a hard to read/remember combination of abbreviations
    • Getters instead of properties
  • :-( It just opens WAV files, and none of the many formats libsndfile supports
  • :-( It just opens Mono and Stereo audio.
  • :-( It just opens some limited encodings.
  • :-( Data is passed as coded byte strings.
    • On writing, users are responsible for encoding the samples, which is a low level and error prone task.
    • Even worse, on reading, users have to implement decoding for every kind of encoding available.
    • Libsndfile actually does all this stuff for you, so why the hell use the raw interface?
  • :-( It ignores Python constructs and idioms:
    • Generators to access files progressively in iterations
    • Context managers to deal safely with file resources
    • Properties instead of getters and setters
  • :-( It allocates a new data block for each block you read, which is a garbage collector nightmare.
  • :-( It has no support for numpy
    • A core lib cannot depend on numpy, but it is quite a convenient feature to have for signal processing

Because of this, many programmers have built their own libsndfile wrappers, but most of them fail, for one reason or another, to fulfill the interface I wanted. Instead of reinventing the wheel, I reused design and even code from the others. At the end of the article I include an extensive list of such alternatives with their strong and weak points.

The API by example

Let’s introduce the API with some examples.

To try the examples you can install the module from the PyPI repositories using the pip command.

$ pip install wavefile

Notes for Debian/Ubuntu users:

  • Use sudo or su to get administrative rights
  • If you want to install it for Python 3, use pip3 instead

Writing example

Let’s create a stereo OGG file with some metadata and a synthesized sound inside:

from wavefile import WaveWriter, Format
import numpy as np

with WaveWriter('synth.ogg', channels=2, format=Format.OGG|Format.VORBIS) as w :
    w.metadata.title = "Some Noise"
    w.metadata.artist = "The Artists"
    data = np.zeros((2,512), np.float32)
    for x in range(100) :  # range, not xrange, so it runs on Python 3 too
        # Synthesize a kind of triangular sweep in one channel
        data[0,:] = (x*np.arange(512, dtype=np.float32)%512/512)
        # And a square wave on the other
        data[1,512-x:] =  1
        data[1,:512-x] = -1

        w.write(data)

Playback example (using pyaudio)

Let’s playback a command line specified audio file and see its metadata and format.

from __future__ import print_function  # so the example runs on Python 2 and 3
import pyaudio, sys
from wavefile import WaveReader

p = pyaudio.PyAudio()
with WaveReader(sys.argv[1]) as r :

    # Print info
    print "Title:", r.metadata.title
    print "Artist:", r.metadata.artist
    print "Channels:", r.channels
    print "Format: 0x%x"%r.format
    print "Sample Rate:", r.samplerate

    # open pyaudio stream
    stream = p.open(
            format = pyaudio.paFloat32,
            channels = r.channels,
            rate = r.samplerate,
            frames_per_buffer = 512,
            output = True)

    # iterator interface (reuses one array)
    # beware of the frame size: the last block may be shorter than 512
    for frame in r.read_iter(size=512) :
        stream.write(frame, frame.shape[1])
        sys.stdout.write("."); sys.stdout.flush()

    stream.close()

Processing example

Let’s process some file by lowering the volume and changing the title.

import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r :
    with WaveWriter(
            'output.wav',
            channels=r.channels,
            samplerate=r.samplerate,
            ) as w :
        w.metadata.title = r.metadata.title + " (dull version)"
        w.metadata.artist = r.metadata.artist

        for data in r.read_iter(size=512) :
            sys.stdout.write("."); sys.stdout.flush()
            w.write(.8*data)

read_iter simplifies the code by transparently:

  • allocating the data block for you,
  • reusing such block for each read and thus reducing the memory overhead, and
  • returning a slice of it when the last incomplete block arrives.

Masochist example

If you like you can still do things by hand using a more C-ish API:

import sys, numpy as np
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r :
    with WaveWriter(
            'output.wav',
            channels=r.channels,
            samplerate=r.samplerate,
            ) as w :
        w.metadata.title = r.metadata.title + " (masochist)"
        w.metadata.artist = r.metadata.artist

        data = r.buffer(512)   # equivalent to: np.empty((r.channels,512), np.float32, order='F')
        nframes = r.read(data)
        while nframes :
            sys.stdout.write("."); sys.stdout.flush()
            w.write(.8*data[:,:nframes])
            nframes = r.read(data)

Notice that with read you have to allocate the data block yourself, and the loop structure is somewhat more complex, with the duplicated read inside and outside the loop. You also have to slice to the actual number of frames read, since the last block usually does not have the size you asked for.

The API uses the channel as the first index for buffers. This is convenient because processing usually splits channels first. But audio files (WAV) interleave the samples of the different channels within each frame:

f1ch1 f1ch2 f2ch1 f2ch2 f3ch1 f3ch2 ...

Reads are optimized by using a read buffer with Fortran order (F). NumPy handles the indexing transparently, but for the read buffer, and just for the read buffer, we recommend using the buffer() method. That’s not needed for the rest of the buffers, for example for writing, and you don’t have to worry at all if you are using the read_iter API.
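
To see what the Fortran order buys here, this small standalone NumPy check (not part of the wavefile API) shows that a (channels, frames) array in F order lays samples out in memory exactly like the interleaved frames above:

import numpy as np

# Fortran order stores columns contiguously, so a (channels, frames)
# buffer keeps the samples of each frame together in memory, matching
# the interleaved layout of the file shown above.
buf = np.empty((2, 3), dtype=np.float32, order='F')
buf[0, :] = [1, 2, 3]        # channel 1 samples
buf[1, :] = [10, 20, 30]     # channel 2 samples

# Raw memory order is frame-interleaved: f1ch1 f1ch2 f2ch1 f2ch2...
print(np.frombuffer(buf.tobytes(order='A'), dtype=np.float32))
# -> [ 1. 10.  2. 20.  3. 30.]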

Load and save it all interface

This interface is not recommended for efficient processing, because it loads all the audio data in memory at once, but it is sometimes convenient to get some code working quickly.

import wavefile

samplerate, data = wavefile.load("synth.ogg")

data = data[::-1,:] # invert channels

wavefile.save("output.flac", data, samplerate)

Newly introduced features

Python 3 support

That was the pull request from Jeff Hugges of the PyDAW project. Thanks a lot for the patches!

We managed to make the Python 3 code compatible with Python 2 as well, so now the same code base works on both versions and passes the same tests.

Unicode in paths and tags

Besides Python 3 compatibility, the API now deals transparently with Unicode strings, both for file names and for text tags such as title, artist…

If you encode a string before passing it to the API, passing it as a byte string, the API will take that encoding without question and use it. Safer is to just pass the unicode string (unicode in Python 2, str in Python 3). In that case the API encodes or decodes the string transparently. For file names, it uses the file system default encoding, available to Python as sys.getfilesystemencoding(). For text tags, it uses UTF-8, which is the standard for Vorbis based files (ogg, flac…).
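
As a minimal sketch of that transparent handling (the file name and tag value here are made up for the example):

# -*- coding: utf-8 -*-
import numpy as np
from wavefile import WaveWriter

# Hypothetical non-ASCII file name and tag: the path gets encoded with
# sys.getfilesystemencoding() and the text tag with UTF-8, as described.
with WaveWriter(u'cançó.wav', channels=1) as w:
    w.metadata.title = u'Cançó de bressol'
    w.write(np.zeros((1, 512), np.float32))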

The WAV and AIFF standards only specify ASCII strings, and I had my concerns about using UTF-8 there. After a discussion with Erik de Castro Lopo, we settled that UTF-8 is a safe option for reading and a nice one to push as a de facto standard, but I am still not confident about the latter. The alternative would have been to raise a text encoding exception whenever a non-ASCII character is written to a WAV/AIFF. I am still open to further arguments.

Seek, seek, seek

I also added an API to seek within the file. This enables features users asked for, such as resetting the read position to loop a file (a usage sketch follows the list). I was uncertain about libsndfile’s behaviour on seeking; now that behaviour is engraved in the API unit tests:

  • Seeks can be a positive or negative number of frames from a reference frame
  • A frame holds as many samples as channels, a sample being a digitally encoded audio level
  • The reference point for the seeking can be the beginning (SET), the end (END) or the next sample to be read (CUR)
    • That is, if your last read was a 10 frame block starting at 40, your current seek reference is 50
  • Seek returns the new frame position to be read if the jump is successful, or -1 if not
  • Jumps to the first frame after the last frame do not fail, even though that frame does not exist
  • The EOF status resets whenever you seek successfully
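
For instance, the looping use case could look like this minimal sketch (assuming seek takes a frame offset and defaults to the SET reference, mirroring libsndfile):

from wavefile import WaveReader

with WaveReader('synth.ogg') as r:
    data = r.buffer(512)
    for _ in range(3):        # three passes over the same file
        while r.read(data):
            pass              # ...process the block here...
        r.seek(0)             # back to frame 0; the EOF status resets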

Why yet another…

A list of alternative implementations follows.

Official python-wave

Nothing to see. It is crap.

scikits.audiolab

  • Author: David Cournapeau
  • Web: http://cournape.github.io/audiolab/
  • PyPi: https://pypi.python.org/pypi/scikits.audiolab/
  • Source: git clone https://github.com/cournape/audiolab
  • Wrap: Cython
  • :-) Property accessors to format metadata and strings
  • :-) Matlab like functions
  • :-) Block processing
  • :-) Numpy integration
  • :-) Enumerable formats
  • :-( Not in-place read (generates a numpy array for each block)
  • :-( No context managers
  • :-| Part of a huge library (no dependencies, though)

ewave

  • Author: C Daniel Meliza
  • Web: https://github.com/melizalab/py-ewave
  • PyPi: https://pypi.python.org/pypi/ewave
  • Source: git clone git@github.com:melizalab/py-ewave.git
  • Wrap: Pure Python (not based on libsndfile)
  • :-( Just WAV’s and limited encodings (no 24bits)
  • :-) Support for floating point encodings, multichannel,
  • :-) Memory mapping for long files
  • :-) Numpy support
  • :-) Context managers

pysndfile (savanah)

  • Author: ???
  • Web: http://savannah.nongnu.org/projects/pysndfile/
  • Wrap: Swig
  • :-( Ugly: uses a metadata API similar to python-wave’s
  • :-( Unusable: unfinished implementation, empty read/write methods in wrapper!
  • :-( Unmaintained since 2006

libsndfile-python

  • Author: Hedi Soula (current maintainer) / Rob Melby (original)
  • Web: http://code.google.com/p/libsndfile-python/
  • Source: svn checkout http://libsndfile-python.googlecode.com/svn/trunk/ libsndfile-python
  • Wrap: CPython
  • :-) NumPy
  • :-( Not in-place read (generates a numpy array for each block)
  • :-( Some edges are not that pythonic
  • :-) Implements ‘command’ sndfile interface

libsndfile-ctypes

  • http://code.google.com/p/pyzic/wiki/LibSndFilectypes
  • Author: Timothe Faudot
  • Source: svn checkout http://pyzic.googlecode.com/svn/trunk/libsndfile-ctypes
  • Wrap: CTypes
  • :-) no CPython module compilation required
  • :-) NumPy
  • :-) Context managers!
  • :-) Property accessors for format metadata
  • :-( Not in-place read (creates an array every block read)
  • :-( No property accessors for text tags
  • :-( No generator idiom
  • :-( Windows only setup
  • :-( Long access to constants (scoping + prefixing)
  • :-( Single object mixing read and write APIs

python-wavefile

That’s the one. I took the implementation layer from libsndfile-ctypes. I really liked the idea of having a direct C mapping without having to compile a CPython module, and how nicely CTypes handled the numpy arrays. Then, over that implementation layer, I added a user level API implementing a pythonic interface, including the idioms supported by the other wrappers plus some new ones.

  • https://github.com/vokimon/python-wavefile
  • Author: David Garcia Garzon (with code from all the above)
  • Source: git clone git@github.com:vokimon/python-wavefile.git
  • PyPi: wavefile
  • Wrap: CTypes
  • :-) Property accessors to format metadata and strings
  • :-) Dual interface: matlab like and OO block processing
  • :-) No CPython module compilation required
  • :-) NumPy
  • :-) Context managers!
  • :-) Pythonic block iteration
  • :-) Reuses data blocks, avoiding garbage collector nightmares
  • :-) Matlab load-all interface
  • :-) Unicode integration
  • :-) Works in Windows, Linux and Mac
  • :-) Python 2 and Python 3 support
  • :-( Command API not implemented
  • :-( No simultaneous Read/Write mode
  • :-( No seek while writing
  • :-( No format enumeration (yet!)
  • :-( Does not accept single dimensional arrays (a nuisance)

Other wrappers found afterwards that I didn’t check

Yet to be reviewed:

June 27, 2014

python-wavefile

Many posts in this blog talk about WiKo, Hyde, pandoc… Solutions we can use to edit wiki like pages as plain text files, so that I can edit them with my preferred editor (vim), do site wide search and replace, track revisions using a version control system such subversion or git, and reuse the same content to generate multiple media: pdf documents, web pages…

After that Grial Quest I have some solutions that works for me. Indeed I am writting this entry using MarkDown which turns into a web page by means of Hyde. But, meanwhile, some of the projects I am involved in already use some kind of traditional wiki system, and most of them use Mediawiki.

Lucky for me, this week, I have come across a useful git extension. It git clones the content of a Mediawiki site as it were a git remote repository so that you can pull revisions into your hard drive, edit them and push them back into the wiki.

A quick tutorial

You can install it on debian/ubuntu with:

    sudo apt-get install git-mediawiki

Once you do that you can execute:

    git clone  mediawiki::http://clam-project.org/w clam-wiki

Since 3k4 revisions we have in CLAM are a quite long download and the wiki api and the server are quite slow, you can avoid the history with:

    git clone -c remote.origin.shallow=true mediawiki::http://clam-project.org/w clam-wiki

Before you push back you must set the wiki user.

    git config remote.origin.mwLogin MyUser

git-mediawiki stores git-commit to mediawiki-revision mappings in a parallel branch.

As migration path way

This extension is not just useful to edit MediaWiki as it where a git remote repository.

It is a nice path to move your wiki to a different system like Hyde, by turning the pages into markdown with pandoc.

for mwfile in *.mw
do
    pandoc -f mediawiki -o $(basename $mwfile .mw).md $mwfile
done

The wish list

The tool is quite useful by itself, but there are some edges that could be improved (bug reports linked):

  • Attachments are not included. So if you have, for instance, images, you won’t have them in local.
  • Cloning, pulling and pushing are slow. Those are the operations that interact with the remote MediaWiki. All the revision handling intelligence happens at users computer, so git-mediawiki has to download a lot of information from mediawiki previously to do any action. MediaWiki API entry points are not designed with those use cases in mind.
  • Supages do not generate directories. For instance, if you have a wiki page named Devel/ToDo, which is a subpage of Devel, instead of generating a folder Devel and a ToDo.mw file inside, it replaces the slash by %2F, Devel%2FToDo.mw, which looks quite unreadable when you list the files.

It is a pity, that git-mediawiki is written in Perl instead of Python. If it were written in Python I would be fixing those bugs right now :-)

June 20, 2014

Blog posts and Summer gigs

I have recently heard complaints that this blog is rather quiet lately. I agree. I have definitely been focused on publishing through other sources and have found little time to write interesting things here. On the one hand, I find twitter ideal for communicating quick and short ideas, thoughts, or pointers. You should definitely follow me there if you want to keep up to date. On the other hand,  I have published a couple of posts on the Netflix Techblog. A few months ago we published a post describing our three-tier system architecture for personalization and recommendations. More recently we described our implementation of distributed Neural Networks using GPUs and the AWS cloud.

The other thing I keep doing very often is giving talks about our work at different events and venues. In the last few months, for instance, I have given talks at LinkedIn, Facebook, and Stanford.

This week I gave a talk at and attended the Workshop on Algorithms for Modern Massive Datasets (MMDS). This is a very interesting workshop organized by Michael Mahoney every two years. It brings together a diverse crowd of people, from theoretical physicists and statisticians to industry practitioners. All of them are united by their work on large scale data-driven algorithms. You can find the slides of my presentation here.

So, what is next? If you want to catch some of my future talks, I will be giving a couple of public ones in the next few months.

First, I will be lecturing at the Machine Learning Summer School (MLSS) at CMU in early July. I am really looking forward to joining such a great list of speakers and visiting Pittsburgh for the first time. I will be lecturing on Recommendation Systems and Machine Learning Algorithms for Collaborative Filtering.

In late August I will be giving a 3 hour tutorial at KDD in New York. The tutorial is entitled "The Recommender Problem Revisited" and I will be sharing the stage with Bamshad Mobasher.

Finally, I was recently notified that a shorter version of the same tutorial has been accepted at Recsys, which this year is held in Silicon Valley.

I look forward to meeting many of you in any of these events. Don't hesitate to ping me if you will be attending.

June 14, 2014

Command of the day: git-mediawiki

Many posts in this blog talk about WiKo, Hyde, pandoc… solutions we can use to edit wiki-like pages as plain text files, so that I can edit them with my preferred editor (vim), do site-wide search and replace, track revisions using a version control system such as subversion or git, and reuse the same content to generate multiple media: pdf documents, web pages…

After that Grail Quest I have some solutions that work for me. Indeed, I am writing this entry using Markdown, which turns into a web page by means of Hyde. But, meanwhile, some of the projects I am involved in already use some kind of traditional wiki system, and most of them use MediaWiki.

Luckily for me, this week I came across a useful git extension. It clones the content of a MediaWiki site as if it were a git remote repository, so that you can pull revisions into your hard drive, edit them and push them back into the wiki.

A quick tutorial

You can install it on debian/ubuntu with:

    sudo apt-get install git-mediawiki

Once you do that you can execute:

    git clone mediawiki::http://clam-project.org/w clam-wiki

Since the 3.4k revisions we have in CLAM make for quite a long download, and the wiki API and the server are quite slow, you can skip the history with:

    git clone -c remote.origin.shallow=true mediawiki::http://clam-project.org/w clam-wiki

Before you push back, you must set the wiki user:

    git config remote.origin.mwLogin MyUser

git-mediawiki stores git-commit to mediawiki-revision mappings in a parallel branch.

As a migration path

This extension is not just useful for editing a MediaWiki as if it were a git remote repository.

It is also a nice path to move your wiki to a different system, like Hyde, by turning the pages into Markdown with pandoc:

for mwfile in *.mw
do
    # quoting keeps file names with spaces intact
    pandoc -f mediawiki -o "$(basename "$mwfile" .mw).md" "$mwfile"
done

The wish list

The tool is quite useful by itself, but there are some edges that could be improved (bug reports linked):

  • Attachments are not included. So if you have, for instance, images, you won’t have them locally.
  • Cloning, pulling and pushing are slow. Those are the operations that interact with the remote MediaWiki. All the revision handling intelligence happens on the user’s computer, so git-mediawiki has to download a lot of information from the wiki before doing any action. The MediaWiki API entry points are not designed with these use cases in mind.
  • Subpages do not generate directories. For instance, if you have a wiki page named Devel/ToDo, which is a subpage of Devel, instead of generating a folder Devel with a ToDo.mw file inside, it replaces the slash by %2F, Devel%2FToDo.mw, which looks quite unreadable when you list the files.

It is a pity that git-mediawiki is written in Perl instead of Python. If it were written in Python I would be fixing those bugs right now :-)

September 07, 2013

AP-Gen new release (LADSPA and VST support)

AP-Gen speeds up and eases plugin development through base source code generation, both for different standards and operating systems, so that the developer can focus on the actual goal: the digital audio processing. To achieve this, it starts from normalized … Continue reading

July 30, 2013

VST cross compiling in Linux

1. Install mingw32 and wine:

    $ sudo apt-get install mingw32
    $ sudo apt-get install wine

2. Download the Steinberg VST SDK 2.4 and unzip it.

3. Create a PLUGIN_NAME.def file:

    LIBRARY     ''
    DESCRIPTION ''
    EXPORTS     main=VSTPluginMain

4. … Continue reading

July 23, 2013

Recommendations as Personalized Learning to Rank

As I have explained in other publications such as the Netflix Techblog, ranking is a very important part of a Recommender System. Although the Netflix Prize focused on rating prediction, ranking is in most cases a much better formulation for the recommendation problem. In this post I give some more motivation and an introduction to the problem of personalized learning to rank, with pointers to some solutions. The post is motivated, among other things, by a proposal I sent for a tutorial at this year's Recsys. Coincidentally, my former colleagues at Telefonica, who have been working on learning to rank for some time, proposed a very similar one. I encourage you to use this post as an introduction to their tutorial, which you should definitely attend.

The goal of a ranking system is to find the best possible ordering of a set of items for a user, within a specific context, in real time. We optimize ranking algorithms to give the highest scores to titles that a member is most likely to play and enjoy.

If you are looking for a ranking function that optimizes consumption, an obvious baseline is item popularity. The reason is clear: on average, a user is most likely to like what most others like. Think of the following situation: you walk into a room full of people you know nothing about, and you are asked to prepare a list of ten books each person likes. You will get $10 for each book you guess right. Of course, your best bet in this case would be to prepare identical lists with the "10 most liked books in recent times". Chances are the people in the room are a fair sample of the overall population, and you end up making some money.

However, popularity is the opposite of personalization. As the previous example illustrates, it will produce the same ordering of items for every member. The goal then becomes to find a personalized ranking function that is better than item popularity, so we can better satisfy users with varying tastes. Our goal is to recommend the items that each user is most likely to enjoy. One way to approach this is to ask users to rate a few titles they have watched in the past in order to build a rating prediction component. Then, we can use the user's predicted rating of each item as an adjunct to item popularity. Using predicted ratings on their own as a ranking function can lead to items that are too niche or unfamiliar, and can exclude items that the user would want to watch even though they may not rate them highly. To compensate for this, rather than using either popularity or predicted rating on their own, we would like to produce rankings that balance both aspects. At this point, we are ready to build a ranking prediction model using these two features.

Let us start with a very simple scoring approach by choosing our ranking function to be a linear combination of popularity and predicted rating. This gives an equation of the form score(u,v) = w1·p(v) + w2·r(u,v) + b, where u = user, v = video item, p = popularity and r = predicted rating. This equation defines a two-dimensional space like the one depicted in the following figure.
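
As a toy illustration of that scoring function (the weights and feature values below are invented for the example):

import numpy as np

w1, w2, b = 0.4, 0.6, 0.1                      # hypothetical learned weights
popularity = np.array([0.9, 0.5, 0.2])         # p(v) for three videos
predicted_rating = np.array([0.3, 0.8, 0.9])   # r(u,v) for this user

score = w1 * popularity + w2 * predicted_rating + b
ranking = np.argsort(-score)    # video indices, best first
print(ranking)                  # -> [1 2 0]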


Once we have such a function, we can pass a set of videos through it and sort them in descending order according to the score. First, though, we need to determine the weights w1 and w2 in our model (the bias b is constant and thus does not affect the final ordering). We can formulate this as a machine learning problem: select positive and negative examples from your historical data and let a machine learning algorithm learn the weights that optimize our goal. This family of machine learning problems is known as "Learning to Rank" and is central to application scenarios such as search engines or ad targeting. A crucial difference in the case of ranked recommendations is the importance of personalization: we do not expect a global notion of relevance, but rather look for ways of optimizing a personalized model.
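
A minimal pointwise sketch of that formulation (data invented; any off-the-shelf binary classifier would do):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: (popularity, predicted rating); label 1 = positive example
# (e.g. the user played and enjoyed the title), 0 = negative example.
X = np.array([[0.9, 0.2], [0.8, 0.9], [0.1, 0.8], [0.3, 0.1]])
y = np.array([1, 1, 1, 0])

model = LogisticRegression().fit(X, y)
w1, w2 = model.coef_[0]        # the weights of the linear score
b = model.intercept_[0]        # the bias; irrelevant for the ordering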

As you might guess, the previous two-dimensional model is a very basic baseline. Apart from popularity and rating prediction, you can think of adding all kinds of features related to the user, the item, or the user-item pair. Below you can see a graph showing the improvement we have seen at Netflix after adding many different features and optimizing the models.


The traditional pointwise approach to learning to rank described above treats ranking as a simple binary classification problem where the only inputs are positive and negative examples. Typical models used in this context include Logistic Regression, Support Vector Machines, Random Forests or Gradient Boosted Decision Trees.

There is a growing research effort in finding better approaches to ranking. The pairwise approach to ranking, for instance, optimizes a loss function defined on pairwise preferences from the user. The goal is to minimize the number of inversions in the resulting ranking. Once we have reformulated the problem this way, we can transform it back into the previous binary classification problem. Examples of such an approach are RankSVM [Chapelle and Keerthi, 2010, Efficient algorithms for ranking with SVMs], RankBoost [Freund et al., 2003, An efficient boosting algorithm for combining preferences], or RankNet [Burges et al., 2005, Learning to rank using gradient descent].
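
To make the pairwise reformulation concrete, here is a sketch of the difference-vector trick used by RankSVM-style methods (feature values invented):

import numpy as np
from sklearn.svm import LinearSVC

# Each preference "user prefers item a over item b" becomes the feature
# difference phi(a) - phi(b) labeled +1, and the reversed pair labeled -1.
preferred = np.array([[0.8, 0.9], [0.4, 0.7]])  # features of preferred items
rejected = np.array([[0.9, 0.2], [0.5, 0.1]])   # features of rejected items

X = np.vstack([preferred - rejected, rejected - preferred])
y = np.array([1, 1, -1, -1])

ranker = LinearSVC().fit(X, y)
# ranker.coef_ now scores single items: a higher score means ranked higher.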

We can also try to directly optimize the ranking of the whole list by using a listwise approach. RankCosine [Xia et al., 2008. Listwise approach to learning to rank: theory and algorithm], for example, uses similarity between the ranking list and the ground truth as a loss function. ListNet [Cao et al., 2007. Learning to rank: From pairwise approach to listwise approach] uses KL-divergence as loss function by defining a probability distribution. RankALS [Takacs and Tikk. 2012. Alternating least squares for personalized ranking] is a recent approach that defines an objective function that directly includes the ranking optimization and then uses Alternating Least Squares (ALS) for optimizing.

Whatever ranking approach we use, we need rank-specific information retrieval metrics to measure the performance of the model. Some of those metrics include Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), Mean Reciprocal Rank (MRR), and Fraction of Concordant Pairs (FCP). What we would ideally like to do is to directly optimize those same metrics. However, it is hard to optimize machine-learned models directly on these measures since they are not differentiable, and standard methods such as gradient descent or ALS cannot be directly applied. In order to optimize those metrics, some methods find a smoothed version of the objective function to run gradient descent on. CLiMF optimizes MRR [Shi et al. 2012. CLiMF: learning to maximize reciprocal rank with collaborative less-is-more filtering], and TFMAP [Shi et al. 2012. TFMAP: optimizing MAP for top-n context-aware recommendation] optimizes MAP in a similar way. The same authors have very recently added a third variation in which they use a similar approach to optimize "graded relevance" domains such as ratings [Shi et al. GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains]. AdaRank [Xu and Li. 2007. AdaRank: a boosting algorithm for information retrieval] uses boosting to optimize NDCG. Another method to optimize NDCG is NDCG-Boost [Valizadegan et al. 2009. Learning to Rank by Optimizing NDCG Measure], which optimizes the expectation of NDCG over all possible permutations. SVM-MAP [Xu et al. 2008. Directly optimizing evaluation measures in learning to rank] relaxes the MAP metric by adding it to the SVM constraints. It is even possible to directly optimize the non-differentiable IR metrics by using techniques such as Genetic Programming, Simulated Annealing [Karimzadehgan et al. 2011. A stochastic learning-to-rank algorithm and its application to contextual advertising], or even Particle Swarming [Diaz-Aviles et al. 2012. Swarming to rank for recommender systems].
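
For reference, a small self-contained implementation of one of those metrics, NDCG, in its common exponential-gain form:

import numpy as np

def ndcg(relevances, k=None):
    """NDCG of a ranking, given graded relevances in predicted order."""
    r = np.asarray(relevances, dtype=float)[:k]
    if r.size == 0:
        return 0.0
    discounts = np.log2(np.arange(2, r.size + 2))   # log2(rank + 1)
    dcg = np.sum((2 ** r - 1) / discounts)
    ideal = np.sort(r)[::-1]                        # best possible ordering
    idcg = np.sum((2 ** ideal - 1) / discounts)
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))   # ~0.96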

As I mentioned at the beginning of the post, the traditional formulation of the recommender problem was that of rating prediction. However, learning to rank offers a much better formal framework in most contexts. There is a lot of interesting research happening in this area, but it is definitely worthwhile for more researchers to focus their efforts on what is a very real and practical problem where one can have a great impact.

July 22, 2013

Reasons to not use locks: Priority inversion and general purpose vs realtime OS

“Let's say your GUI thread is holding a shared lock when the audio callback runs. In order for your audio callback to return the buffer on time it first needs to wait for your GUI thread to release the lock. … Continue reading

July 09, 2013

The Bla Face

My latest experiments involve animated SVGs and webapps for mobile devices (FirefoxOS…). They also exercise the HTML5 audio tag.

The result is this irritating application: The Bla Face. A talking head that stares around, blinks and speaks the ‘bla’ language.

Take a look at it, and read on if you are interested in how it was done.

Animating Inkscape illustrations

I drew the SVG face as an example for an Inkscape course I was teaching as a volunteer at a women’s association in my town. The idea was to show the students that, once you have a vectorial drawing, it is quite easy to animate it like a puppet. I just moved the parts directly in Inkscape, for example, moving the nodes of the mouth, or moving the pupils.

Playing with that is quite fun, but the truth is that, although the SVG standard provides means to automate animations, and the Internet is full of examples and documentation on how to do it, it must be done either by changing the XML (SMIL, CSS) or by programming in JavaScript; there is no native FLOSS SVG animation authoring tool that I know of. In fact, the state of the art is something like this:

  • Synfig: Full animation interface; it imports and exports SVGs, but the animation is not native SVG and you pay the price.
  • Tupi: Promising interface concept, working with SVG but not at the internal level. It still needs work.
  • Sozi and JessyInk: They just animate the viewport, not the figures, and their authoring UI is quite pedestrian, but I do like how they integrate the animation into the SVG output.
  • A blueprint exists on how to make animations inside Inkscape. It has been there for some years now.

So if I want to animate the face I have to code some SMIL/JavaScript. Not something that I could teach my current students but, at least, let’s use it as a means to learn webapp development. Hands on.

Embedding svg into HTML5, different ways unified.

The web is full of references on the different ways to insert an SVG inside HTML5. Just to learn how each of them works I tried most of them. I discarded the img method, which blocks access to the DOM, and the embed method, which is deprecated.

Inline SVG

The first method consists of inserting the SVG inline into the HTML5, with the drawback that every time you edit the SVG in Inkscape you have to update the changes. No problem: there are many techniques to insert it dynamically. I used an idiom I had already used in TestFarm for plots, and which I like a lot: a div class emulating an img with a src attribute.

<!-- Method one: Inline SVG (dynamically inserted) -->
<div
    id='faceit'
    class='loadsvg'
    src='blaface.svg'
    ></div>

Calling the following function (it requires jQuery) takes all such div tags and uses their src attributes to load the SVGs dynamically.

/// For every .loadsvg, loads the SVG file specified by its 'src' attribute
function loadsvgs()
{
    $.each($(".loadsvg"), function() {
        var xhr = new XMLHttpRequest(); // 'var' avoids leaking xhr to the global scope
        xhr.open("GET", $(this).attr('src'), false);
        // Following line is just to be on the safe side;
        // not needed if your server delivers SVG with correct MIME type
        xhr.overrideMimeType("image/svg+xml");
        xhr.send("");
        $(this).prepend(
            xhr.responseXML.documentElement);
    });
}

In this case, the document used to create new elements is the HTML root, that is, document, and you can get the root SVG node by looking up “#faceit > svg”.

Object

The second method is the object tag.

<object
    id='faceit'
    data="blaface.svg"
    type="image/svg+xml"
    ></object>

It is cleaner, since it does not need any additional JavaScript to load. When using object, the root SVG element is not even inside the HTML DOM. You have to look up the #faceit element and access its contentDocument attribute, which is a DOM document itself. Because they are different DOM documents, new SVG elements cannot be created from the HTML document as we did previously.

This couple of functions will abstract this complexity from the rest of the code:

var svgns = "http://www.w3.org/2000/svg"; // SVG namespace, required by createElementNS

function svgRoot()
{
    var container = $(document).find("#faceit")[0];
    // For object and embed
    if (container.contentDocument)
        return container.contentDocument;
    return $(container).children();
}
function svgNew(elementType)
{
    var svg = svgRoot();
    try {
        return svg.createElementNS(svgns, elementType);
    }
    catch(e) {
        // When the svg is inline there is no svg document; use the html document
        return document.createElementNS(svgns, elementType);
    }
}

iframe

I don’t like the iframe solution that much because, instead of adapting automatically to the size of the image, you have to set the size by hand, clipping the image if you get it wrong. But it works in older browsers and it is not deprecated like embed:

<iframe
    id='faceit'
    src="blaface.svg"
    type="image/svg+xml"
    height='350px'
    width='250px'
    style='border: none; text-align:center;'
    ></iframe>

You can also play with the SVG viewport to get the SVG resized without losing its proportions.

In terms of JavaScript, the same code that works for object works for iframe.

css

A bit of CSS in the head makes the image look the same whatever the method.

Animating the eye pupils

Before doing any animation, my advice: change the automatic ids of the SVG objects to be animated into something nicer. You can use the object properties dialog or the XML view in Inkscape.

The eye pupils can be moved to stare around randomly. Both pupils have been grouped, so moving that group, #eyepupils, is enough. The JavaScript code that moves it follows:

var previousGlance = '0,0'
function glance()
{
    var svg = svgRoot();
    var eyes = $(svg).find("#eyepupils");
    var eyesanimation = $(eyes).find("#eyesanimation")[0];

    if (eyesanimation === undefined)
    {
        eyesanimation = svgNew("animateMotion");
        $(eyesanimation).attr({
            'id': 'eyesanimation',
            'begin': 'indefinite', // Required to trigger it at will
            'dur': '0.3s',
            'fill': 'freeze',
            });
        $(eyes).append(eyesanimation);
    }
    var x = Math.random()*15-7;
    var y = Math.random()*10-5;
    var currentGlance = [x,y].join(',');
    $(eyesanimation).attr('path', "M "+previousGlance+" L "+currentGlance);
    previousGlance = currentGlance;
    eyesanimation.beginElement();

    var nextGlance = Math.random()*1000+4000;
    window.setTimeout(glance, nextGlance);
}
glance();

So the strategy is: introduce an animateMotion element into the group, or reuse the previous one, set the motion, trigger the animation, and schedule the next glance.

Animating mouth and eyelids

To animate the eyelids and the mouth, instead of moving an object we have to move the control nodes of a path. Control nodes are not first class citizens in SVG: they are encoded in a compact format as the string value of the d attribute of the path. I added the following function to convert structured JS data into such a string:

function encodePath(path)
{
    return path.map(function(e) {
        if ($.isArray(e)) return e.join(",");
        return e;
        }).join(" ");
}

With this helper, simpler functions producing parametrized variations of a given object become handy. For instance, a mouth path with a parametrized opening factor:

function mouthPath(openness)
{
    return encodePath([
        "M",
        [173.28125, 249.5],
        "L",
        [71.5625, 250.8125],
        "C",
        [81.799543, 251.14273],
        [103.83158, 253.0+openness], // Incoming tangent
        [121.25, 253.0+openness], // Mid lower point
        "C",
        [138.66843, 253.0+openness], // Outgoing tangent
        [160.7326, 251.48139],
        [173.28125, 249.5],
        "z"
    ]);
}

And to apply it:

$(svgRoot()).find("#mouth").attr("d", mouthPath(20));

But if we want a smooth animation we should insert an attribute animation. For example, if we want to softly open and close the mouth, like saying ‘bla’, the function would be quite similar to the one for the eye pupils, but now we use animate instead of animateMotion and specify attributeName instead of mpath, and, instead of providing the movement path, we provide a sequence of paths to morph along, separated by semicolons.

function bla()
{
    var svg = svgRoot();
    var mouth = $(svg).find("#mouth");
    var blaanimation = $(mouth).find("#blaanimation")[0];
    if (blaanimation === undefined)
    {
        blaanimation = svgNew("animate");
        $(blaanimation).attr({
            'attributeName': 'd',
            'id': 'blaanimation',
            'begin': 'indefinite',
            'dur': 0.3,
            });
        $(mouth).append(blaanimation);
    }
    var syllable = [
        mouthPath(0),
        mouthPath(10),
        mouthPath(0),
        ].join(";");
    $(blaanimation)
        .off()
        .attr('values', syllable)
        ;
    blaanimation.beginElement();
    sayBla(); // Triggers the audio
    var nextBla = Math.random()*2000+600;
    window.setTimeout(bla, nextBla);
}

The actual code is quite a bit more complicated, because it makes words of several syllables (bla’s) and tries to synchronize the lip sync with the audio. First of all, the repeatCount attribute is set to a random number between 1 and 4:

    var syllables = Math.floor(Math.random()*4)+1;
    $(blaanimation)
        .off()
        .attr('values', syllable)
        .attr('repeatCount', syllables)
        ;

And then spacing them proportional to the word length:

    var wordseconds = (syllables+1)*0.3;
    var nextBla = Math.random()*2000+wordseconds*1000;
    window.setTimeout(bla, nextBla);

Regarding the lip sync, sayBla is defined like:

function sayBla()
{
    blaaudio = $("#blaaudio")[0];
    blaaudio.pause();
    blaaudio.currentTime=0;
    blaaudio.play();
}

So the smart move would be adding a handler to the repeat event of the animation. But this seems not to work in Chrome, so instead we rely on a timer again:

    for (var i=1; i<syllables; i++)
        window.setTimeout(sayBla, i*0.3*1000);

When animating the eyelids, more browser issues popped up. The eyelid of one eye is an inverted and displaced clone of the other. Firefox won’t apply JavaScript-triggered animations to clones. If you set the values without animation, they work; if the animations are triggered by the begin attribute, they work; but if you trigger an animation with beginElement, it won’t work.

User interface and FirefoxOS integration

Flashy buttons and checkboxes, panel dialogs that get hidden, the debug log side panel… All that is CSSery I tried to keep simple enough so that it can be pulled out. So just take a look at the CSS.

As I said, besides SVG animation I wanted to learn webapp development for FirefoxOS. My first glance at the environment as a developer has been a mix of good and bad impressions. On one side, using Linux + Gecko as the engine for the whole system is quite smart. The simulator is clearly an alpha that eats a lot of computer resources. Anyway, let’s see how it evolves.

In this project I tried to minimize the use of libraries, using just requirejs (a library dependency solver) and Zepto (a reduced jQuery), because the minimal Firefox example already provides them. But there is a wide ecology of libraries everybody uses. The next thing to investigate is VoloJs, for deploying projects, and that wide ecology of available libraries.

There are many foundational jQuery-like frameworks such as Prototype, Underscore, Backbone… Then you have libraries for user interface components, such as Dojo, JQuery Mobile, React, YUI, Hammer, w2ui, m-project… Too many to know which one to use.

June 10, 2013

Tools for Data Processing @ Webscale


A couple of days ago, I attended the Analytics @Webscale workshop at Facebook. I found this workshop to be very interesting from a technical perspective. It was mostly organized by Facebook Engineering, but they invited LinkedIn and Twitter to present, and the result was pretty balanced. I think the presentations, though biased towards what the three "Social Giants" do, were a good summary of many of the problems webscale companies face when dealing with Big Data. It is interesting to see how similar problems can be solved in different ways. I recently described in our Netflix Techblog how we address many of these issues at Netflix. It is also interesting to see how much sharing and interaction there is nowadays in the infrastructure space, with companies releasing most of what they do as open source, and using - and even building upon - what their main business competitors have created.

These are my barely edited notes:

Twitter presented several components of their infrastructure. They use Thrift on HDFS to store their logs. They have now built Parquet, a columnar storage format that improves storage efficiency by allowing reads of individual columns.

@squarecog talking about Parquet

They also presented their DAL (Data Access Layer), built on top of HCatalog.




Of course, they also talked about Twitter Storm, which is their approach to distributed/nearline computation. Every time I hear about Storm it sounds better. Storm now supports different parts of their production algorithms. For example, the ranking and scoring of tweets for real-time search is based on a Storm topology.



Finally, they also presented a new tool called Summingbird. It is not open sourced yet, but they are planning to do so soon. Summingbird is a DSL on top of Scalding that allows defining workflows that integrate offline batch processing on Hadoop with near-line processing on Storm.




LinkedIn also talked about their approach to combining offline/near-line/real-time computation, although I always get the sense that they lean much more towards the former. They talked about three main tools: Kafka, their publish-subscribe system; Azkaban, a batch job scheduler we have talked about using in the past; and Espresso, a timeline-consistent NoSQL database.


Facebook also presented their whole stack. Some known tools, some not so much. Facebook Scuba is a distributed in-memory stats store that allows them to read distributed logs and query them fast. Facebook Presto was a new tool presented as the solution for getting fast queries out of Exabyte-scale data stores. The sentence "A good day for me is when I can run 6 Hive queries", supposedly attributed to a FB data scientist, stuck in my mind ;-). Morse is a different distributed approach to fast in-memory data loading. And Puma/ptail is a different approach to "tailing" logs, in this case into HBase.



Another Facebook tool that was mentioned by all three companies is Giraph. (To be fair, Giraph was started at Yahoo, but Facebook hired its creator, Avery Ching.) Giraph is a graph-based distributed computation framework that works on top of Hadoop. Facebook claims they ran PageRank on a graph with a trillion edges on 200 machines in less than 6 minutes per iteration. Giraph is an alternative to GraphLab. Both LinkedIn and Twitter are using it. In the case of Twitter, it is interesting to hear that they now prefer it to their own in-house (although single-node) Cassovary. It will be interesting to see all these graph processing tools side by side at this year's GraphLab workshop.

Another interesting thread I heard from different speakers, as well as in coffee-break discussions, was the use of Mesos vs. Yarn, or even Spark. It is clear that many of us are looking forward to the NextGen MapReduce tools reaching some level of maturity.


May 07, 2013

TestFarm 2.0 released

We just released TestFarm 2.0. Now on GitHub.

You can install it by running:

sudo pip install testfarm

In Debian/Ubuntu, if you install python-stdeb first, it will be installed as a deb package that you can remove like any other debian package.

This release is a major rewrite of the server side. You can expect it to be more reliable, more scalable and easier to install. It is also easier to maintain.
Most changes are in the server and the client-server interface. The client API is mostly the same, and migration of existing clients should be quite straightforward.

Regarding CLAM, it would be nice to get a bunch of CLAM testfarm clients. Clients are now easier to set up. In order to set one up, please contact us.


January 24, 2013

10 "Little" lessons for life that I learned from running

(Sorry for allowing myself to depart from the usual geeky computer science algorithmic talk in this blog. I owed it to myself and my biggest hobby to write a post like this. I hope you bear with me.)

Around 3 years ago, I smoked, I was overweight, and I only exercised occasionally. Being a fan of radical turns in my life, I decided one day to go on a week-long liquid diet, I stopped smoking, and I took up running, with the only goal in my mind of some day running the half marathon in my home town. Little did I know that the decision to run would change my life in so many ways. This last year, 2012, I ran 3 marathons, 4 half marathons, and a 199-mile relay with a team of 12. But, beyond that, I am convinced that I owe part of my personal and professional success of these past years to the fact that I am a runner.

This post is my little homage to running and to the many lessons I have found in my journey.



When I started running I had lots of problems. The main one was an old knee injury that came back to haunt me. I had ACL surgery when I was 16, and ever since, my right knee has not been the same. When my knee started hurting this time, I visited several doctors, some specialized in sports. All of them recommended I give up running. Some told me straight out that I would never be able to run a marathon. It took me lots of visits to the chiropractor, and months of quad exercises, to get back to running. But I overcame these initial hurdles, and went on to run not one but several marathons.

Lesson 1. Beginnings are hard: Starting anything new in life will be hard. You will need to invest lots of energies, and at times you will want to give up. The more important and significant the change is, the more it will take from you.


After I finished my first marathon in Santa Cruz, and when I thought all my knee problems were long gone, my knee started hurting again. This was nothing like what I had experienced when starting. Still, it could have been enough to stop me from trying again. But, it didn't. I focused on recovering. Soon I was back on the road.

Lesson 2. There will be ups and downs: Once you have overcome the initial difficulties in starting something new, you will be tempted to think that everything else should be easy. But life, like most running courses, will have hills with ups and downs.


It is hard to wake up at 6 am for the morning run. It is easy to stay in bed when your legs are still sore from yesterday's training. It is tough to go out running when it is raining or freezing outside. It is even harder to decide not to stop when you hit the wall on mile 20 of a marathon. All these day to day small decisions end up adding up and making the difference between you improving and accomplishing your running goals.

Lesson 3. The importance of those small decisions: The small day to day decisions play a huge role in building your character. They will end up determining your long term success and the direction your life takes.


When you are not at your best, it is even harder to face all these small decisions I mentioned. If you are down for some time because of an injury, it is tough to start again on your own. Having a group of friends that share your passion for running is extremely important. I am fortunate to have a large group of friends that push me to become better, and help me get up when I fall.

Lesson 4. You are not alone - the power of social influence... and friends: Whatever new adventure you start in life, it is important to have people around you that understand and support it. People that share your passion can make a difference when you need it.


As much as I have appreciated having that extra support from friends and other fellow runners, there are many times I have felt the pressure of having to make a decision on my own. Many of those small decisions such as getting up off bed on a rainy day, for example. Nobody is going to make them for you. I have also felt alone in many of my training runs. And, of course, in mile 20 of a marathon, when everyone is giving their best but you can only see strangers around you. In all those moments it is important to be strong and be ready to carry on, on your own.

Lesson 5. But, you will be alone: No matter how many friends support you, you will have to face important decisions on your own, and carry your own weight.


It is well known that "repetition leads to mastery". This is even more so for activities that require developing physical strength and resistance. There is no other secret to becoming a better runner than to run, and run often. Putting on more miles is the goal. Everything else will come.

Lesson 6. Repeat, repeat, repeat, repeat: Repetition is the key to mastering most things in life. If you want to become good at doing something, ask yourself how you can invest thousands of hours in it (read about the 10k hour rule in Malcolm Gladwell's Outliers).



As much as repetition is needed to improve, it is hard to do so without a goal in mind. During my time running I have learned the power of having concrete goals: setting goals that are achievable in the long run, but not too easy to reach. As I have progressed, I have learned to be more demanding. My current goals are a 3:30 marathon and a 1:30 half. The first one is achievable; the second one will need much more work. But these goals will keep me going and focused for some time.

Lesson 6. Set your goals: Setting ambitious but achievable goals in life will help you push harder and will keep you focused and looking forward.


When I look back at the way I started running, I realize how many things I did wrong. I have learned so much since then. I have read books, watched movies and online videos, and talked to people who know much more than I do. I have also learned from looking at the data that I generate from each of my trainings. And I have learned to listen to and understand my body. I am fortunate enough that I love learning, and I have enjoyed every bit of this learning experience.

Lesson 7. Data and knowledge: Use all the information around you to improve your life. Data about you can give you insights into how to become better. And any knowledge you gain from external sources can make a difference when taking a decision.


One of the reasons beginnings are hard (Lesson 1) is that people who start running tend to overdo it by, for example, increasing distance and pace at the same time. This typically leads to injury and frustration. One of the most important things to learn when starting to run is to understand your own limitations. Even when you do, you will be tempted to push too hard, by continuing to run when your leg hurts or by doing one too many races in a short period. I have done all of the above. But it is important to remember that everyone has their limits, and forcing beyond them can result in long-term problems.

Lesson 8. Everyone has their limits: Pushing yourself hard is good. However, there is such a thing as pushing *too* hard. You need to understand where your limits are to push them further, but only little by little.



No matter how hard it gets at some points, no matter how long it takes you, there is no doubt that you can do whatever you set your mind to. I don't have any special conditions for running, and never have had. I don't think I will ever be a "great" runner. However, now I look back and laugh when I remember that my unreachable goal a little over 3 years ago was "only" to run a half marathon. If someone like me, with little or no pre-existing conditions, family and work obligations, and very little time, can do it, so can you.

Lesson 9. But, yes you can: No matter how low you fall or how far your goal is, you can do it. Just think about the many people like you who have done it before (estimates are that around 0.1 to 0.5% of the US population has completed a marathon). Why should you be any less?



As a conclusion, let me stress that the fact that anyone can run does not mean that running is easy or effortless. It is precisely the fact that it is hard and requires effort over a long period of time that makes it worthwhile. Like most good things in life.

Lesson 10. All good things come hard: Think about it, all worthy things in life require effort and dedication. Being healthy, fit, happy, having a career, or a family, they all require your energy and long time investment. Just go with it, and enjoy every bit of the journey.

June 15, 2011

How it was made…

http://www.planetatortuga.com/noticias.item.3875/los-violentos-en-#parlamentcamp-son-policias-infiltrados.-ver-fotos-rt-plz.html

From there:
(here there was a high quality video, with hundreds of thousands of views and hundreds of comments… and it was deleted: http://www.youtube.com/embed/YcmvzRvsf8g…. here goes another copy in lower quality)

Update: another link about the same topic: http://jmgoig.wordpress.com/2011/06/15/estrategias-del-poder-para-desprestigiar-movimientos-sociales-el-caso-parlamentcamp/

April 04, 2011

Ubuntu PPA for CLAM

For the convenience of Ubuntu users, we have deployed a personal package archive (PPA) on Launchpad.

https://launchpad.net/~dgarcia-ubuntu/+archive/ppa

Instructions are available on the same page. It currently contains libraries, extension plugins, NetworkEditor and Chordata packages for maverick, on the i386 and amd64 platforms.


September 20, 2010

High abstraction level audio plugins specification (and code generation)

If you have ever written at least 2 audio plugins in your life, for sure you have noticed that you had to write a lot of duplicated code. In other words, most of the time, when writing a plugin there is very little … Continue reading

March 08, 2010

CLAM Chordata 1.0


The CLAM project is pleased to announce the first stable release of Chordata, released in parallel with the 1.4.0 release of the CLAM framework.

Chordata is a simple but powerful application that analyses the chords of any music file on your computer. You can use it to travel back and forth through the song while watching insightful visualizations of its tonal features. Key bindings and mouse interactions for song navigation are designed for a musician with an instrument at hand.

Chordata live: http://www.youtube.com/watch?v=xVmkIznjUPE
The tutorial: http://clam-project.org/wiki/Chordata_tutorial
Download it at http://clam-project.org

This application was developed by Pawel Bartkiewicz as his GSoC 2008 project, using existing CLAM technologies under a better suited interface, which is now Chordata. Please enjoy it.