Attending ICMPC15/ESCOM10 in Graz!

Last week I attended the 15th International Conference on Music Perception and Cognition (ICMPC) / 10th triennial conference of the European Society for the Cognitive Science of Music (ESCOM) in Graz (Austria). This year the conference was distributed across hubs on different continents, since “a semi-virtual multiple-location conference makes it easier for people from less financially privileged countries to participate actively and equally by locating hubs in their countries or regions, reducing their registration, travel and accommodation costs”. You can read more about the impact of the hub-based conference here.

The best thing about this conference is that it is very multidisciplinary: people come from fields such as music psychology, music theory, musicology, music information retrieval, or neuroscience. There were short talks (around 10 minutes), long talks (around 20 minutes), workshops (around 1 hour), and poster sessions (with their associated poster speed presentations). In this blog post, I want to mention (presenter and title only!) some of the talks I attended (though there were many, many more!) to give you an idea of the topics covered at the conference.


Andreu Vall. The importance of song context and song order in automated music playlist generation.

Olivier Lartillot. Computational model of pitch detection, perceptive foundations, and application to Norwegian fiddle music.

Emotion and Computing

Elke B. Lange. Challenges and opportunities of predicting musical emotions with perceptual and automatized features.

Will M. Randall. Emotional outcomes of personal music listening: experience sampling with the MuPsych app.

Anna Aljanaki. Extracting majorness as a perceptual property of music.

Feedback and regulation

Margarida Baltazar. Is it me or the music? An experimental study on the contribution of regulatory strategies and music to stress reduction.

Jacob Berglin. The effect of feedback on singing accuracy.

Emotion (hot topic!)

Niels Chr. Hansen. Orchestrated sadness: When instrumentation conveys emotion.

Thomas Magnus Lennie. Universality in the language of emotions revisited: Towards a revised methodology for interpreting acoustic cues in musical affect.

Katie Rose Sanfilippo. Perceptions in pregnancy: An investigation of women’s perceptions of emotional vocalizations and musical excerpts during the perinatal period.

Diana Kayser. Are musical aesthetic emotions embodied?


Anna Czepiel. Importance of felt mood and emotion for expressive movement characteristics in pianists.

Hye-yoon Chung. Musical expressivity: An approach from simulation theory of mindreading.

Martin Herzog. How do musical means of expression affect the perception of musical meaning?

Caitlyn Marie Trevor. The expressive role of string register: An ethnological examination of fingering choices in classical string instrument playing.


Thijs Vroegh. Absorption and self-monitoring as experiential predictors for the aesthetic appreciation of music: A correlational study.

Manuel Anglada-Tort. Consider the source: The effects of source bias on professional assessment of music quality and worth.

Iris Mencke. Aesthetic experience and musical pleasure in contemporary classical music – an interview study.


Peter M.C. Harrison. Dissociating sensory and cognitive theories of harmony perception through computational modeling.

Arvid Ong. The perceptual similarity of tone clusters: An experimental approach to the listening of avant-garde music.

Choral singing

Sara D’Amario. Synchronization in singing ensembles: Do performed asynchronies bear a relationship to the synchrony that listeners with a variety of levels of musical experience can perceive?

Manuel Alejandro Ordás. Expressive timing in choir: An interactive study between choristers and conductor.

and many more!

On Friday 27th of July I presented our poster:

Cuesta H., Gómez, E., Martorell, A., Loáiciga, F. Analysis of Intonation in Unison Choir Singing.

Here’s the header of the poster, and the paper and details are here:


In the poster session, I talked to participants with very diverse backgrounds and expertise. All of them had very interesting and insightful comments about our research, which we will use in our future studies.


Technovation Challenge #girls4achange

On May 19th I participated as a judge in the Catalan regional final of the Technovation Challenge, which was held at Universitat Pompeu Fabra (UPF), in Barcelona.

For those who don’t know about #techchal, it started back in 2010 and it challenges teams of 3-4 girls from 10 to 18 years old from all over the world to identify a problem in their society and design and develop a mobile app to solve it, with the guidance of a mentor. The older participants (13 to 18) also have to develop a business plan to start a company and launch their application.

In Catalunya, Associació Espiral, and especially Esther Subias, coordinated the teams and organized the final event, this time in collaboration with UPF. In this final event, judges were organized in teams of 4 or 5, and each team listened to and evaluated 4 or 5 projects based on their submissions (app prototype, business plan, ideation, technical aspects, etc.) and on their 3-minute on-site pitch.

I could write sooo many things about the projects I had the chance to evaluate and the participants I had the chance to meet and talk to, but since it would take a lot of time, I’ll just say that it was an absolutely amazing experience I hope I can repeat, and huge congrats to all the participants, mentors, organizers and jury members for all the great work. The future is bright (and female).


Here you can find the official pictures of the event.

ACM Europe womENcourage 2017

Last week I attended the ACM Europe Celebration of Women in Computing, womENcourage @ UPC (Barcelona). It is a scientific event for women (and men!) working in any computer science (or related) discipline, and it brings together mainly female students (undergraduate and graduate), researchers and people working in industry to share their experiences, present their work and discuss many different topics.

I received a scholarship to present my work in a poster session, and it was a very interesting experience because every person in the audience had a different technical background, so it was especially challenging to make the presentation of the poster comprehensible at different levels.

I also attended a workshop (within the event) about high performance computing (HPC), taught by professionals from EPCC at the University of Edinburgh. It was an introduction to HPC using ARCHER, the UK national supercomputer.

The official pictures of the event can be found on their Facebook page. A couple of pictures I took:

I’d like to thank the organizing committee for the wonderful event and the participation scholarships committee for the award.

Audio melody extraction

During these last three months I’ve been working on a project about audio melody extraction for the Music Information Retrieval course, together with Joe Munday.

We’ve been writing a blog with all the information about the project, together with other classmates working on other MIR tasks. In this link you will find everything we posted, but I wanted to share our final post, with a summary of the whole task, here. You’ll see we’ve managed to improve the results for a specific kind of music (symphonic), although we’d like to do more tests with bigger datasets. Additionally, you can find the final report of the project here.



This project deals with the task of audio melody extraction, which consists of extracting the main melodic line – the melody people would sing or hum if asked to sing along with a recording – of a piece of music. This task, which is performed frame-wise, is divided into two parts:

  • Voicing detection – saying whether a frame of audio contains melody (voiced) or not (unvoiced)
  • Pitch estimation – extracting the fundamental frequency at each frame

According to the MIREX guidelines, these are the required formats:

  • Input – audio sampled at 44.1 kHz, 16 bits, WAV encoding, mono
  • Output – annotations text file with one pitch estimate every 10 ms, with either 0 or a negative frequency for unvoiced frames
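
To make the output format more concrete, here is a minimal Python sketch that writes and reads such an annotation. It assumes one "time<TAB>frequency" pair per line, which is how I understand the MIREX convention; the function names format_melody and parse_melody are just illustrative.

```python
def format_melody(f0_hz, hop_seconds=0.01):
    """Render one 'time<TAB>frequency' line per frame (10 ms hop by default)."""
    lines = [f"{i * hop_seconds:.2f}\t{f0:.3f}" for i, f0 in enumerate(f0_hz)]
    return "\n".join(lines)


def parse_melody(text):
    """Return (times, frequencies, voiced flags) from an annotation string."""
    times, freqs, voiced = [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        time_str, freq_str = line.split()
        freq = float(freq_str)
        times.append(float(time_str))
        freqs.append(freq)
        # Zero or negative frequencies mark unvoiced frames.
        voiced.append(freq > 0)
    return times, freqs, voiced
```

A negative value lets an algorithm report "unvoiced, but this would be my pitch guess", which matters for the pitch metrics.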

The evaluation metrics are

  • voicing recall / voicing false alarm / voicing d-prime (a sensitivity measure that contrasts voicing recall with voicing false alarm)
  • raw pitch accuracy (RPA) (¼ tone, i.e. 50 cents, of deviation allowed) / raw chroma accuracy (RCA) (octave errors count as correct predictions)
  • overall accuracy (combination of previous metrics)
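
As a rough illustration of how these metrics relate to each other, here is a small pure-Python sketch computed over frame-wise frequency lists. It is not the official MIREX evaluation code: the function names are mine, and the default tolerance of 50 cents is the value I understand MIREX to use, so adjust tol_cents if you follow a different convention.

```python
import math


def hz_to_cents(freq_hz, ref_hz=10.0):
    """Convert a positive frequency in Hz to cents above a reference."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def ratio(count, total):
    """Safe division that returns 0.0 for an empty denominator."""
    return count / total if total else 0.0


def melody_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Frame-wise melody metrics; values <= 0 mean 'unvoiced'."""
    if len(ref_hz) != len(est_hz):
        raise ValueError("reference and estimate need the same number of frames")
    n_voiced = n_unvoiced = 0
    detected = false_alarms = pitch_ok = chroma_ok = frames_ok = 0
    for ref, est in zip(ref_hz, est_hz):
        est_voiced = est > 0
        if ref <= 0:  # unvoiced in the reference
            n_unvoiced += 1
            false_alarms += est_voiced
            frames_ok += not est_voiced
            continue
        n_voiced += 1
        detected += est_voiced
        if est == 0:  # no pitch estimate at all for this frame
            continue
        # A negative estimate means "unvoiced, but here is my pitch guess".
        diff = hz_to_cents(abs(est)) - hz_to_cents(ref)
        folded = diff - 1200.0 * round(diff / 1200.0)  # ignore octave offsets
        pitch_match = abs(diff) <= tol_cents
        pitch_ok += pitch_match
        chroma_ok += abs(folded) <= tol_cents
        frames_ok += est_voiced and pitch_match
    return {
        "voicing_recall": ratio(detected, n_voiced),
        "voicing_false_alarm": ratio(false_alarms, n_unvoiced),
        "raw_pitch_accuracy": ratio(pitch_ok, n_voiced),
        "raw_chroma_accuracy": ratio(chroma_ok, n_voiced),
        "overall_accuracy": ratio(frames_ok, n_voiced + n_unvoiced),
    }
```

Note that RPA and RCA only look at frames that are voiced in the reference, while overall accuracy also rewards correctly detected silence. That is why a change in voicing detection can move overall accuracy without touching RPA.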


We used two datasets:

Orchset – 64 symphonic music excerpts with manual melody annotations
MedleyDB – 108 multi-genre pieces of music with semi-manual (pYIN + manual correction) melody annotations

State of the art

For this project, we worked with two state-of-the-art methods for melody extraction. In general, melody extraction algorithms are divided into two categories: salience-based and source separation-based. We used one example of each class.

MELODIA – salience-based

This algorithm by Salamon & Gómez takes a polyphonic audio signal and computes its pitch contours (curves that follow the pitch of a melodic line over time). The contours are then filtered to obtain the melody using heuristics based on auditory streaming cues (time and pitch continuity, and exclusive allocation) and on the distribution of contour salience, not only its absolute values. The final pitch contour is chosen by trying to get rid of pitch outliers and octave errors (i.e. parallel contours).

Source/filter model – source separation-based

This algorithm by Durrieu assumes that the melody is vocal, and takes advantage of a source/filter model to extract the pitches. The audio signal is assumed to be a mixture of the leading melody and an accompaniment (or musical background). The parameters of a source/filter model (a pitched source plus filters resembling the vocal tract) are estimated using the EM algorithm, and the accompaniment is modeled as a weighted sum of N sources, usually Gaussian-based. This makes it possible to extract the fundamental frequency (i.e. the pitch of the source from the source/filter model) at each frame.


We’ll make a short summary of the results we obtained with both methods and both datasets. Two main observations to start:

  • Accuracy is generally much higher on MedleyDB: in Orchset the melodies are not vocal, which is why most melody extraction algorithms struggle there.
  • Source/filter model outperforms MELODIA in both datasets.

MELODIA
Orchset → average overall accuracy 18% / average RPA 15% / average RCA 40%
MedleyDB → average overall accuracy 48% / average RPA 48% / average RCA 59%

Source/filter model
Orchset → average overall accuracy 52% / average RPA 55% / average RCA 78%
MedleyDB → average overall accuracy 41% / average RPA 62% / average RCA 74%

First Discussion

Orchset. There is a clear difference between the two algorithms: MELODIA, which was designed with vocal melodies in mind, obtains much lower results, while the source/filter model works better because it can also be applied to some instruments (a pitched source plus a filter modeling the timbre).
– Best results: MELODIA obtains 85% RPA & RCA in a unison passage; Source/filter obtains 95% in the same passage.

MedleyDB. Many pieces in this dataset contain vocals, so the algorithms perform significantly better. Still, the source/filter model outperforms MELODIA in both RPA and RCA, although the change is not as significant as in Orchset. The lowest results are obtained in the pieces that we’d label as symphonic/classical music.
– Best results: MELODIA obtains 95% RPA & RCA in a gospel passage (very clear leading voice) / Source/filter model obtains 96% RPA in the same passage and 98% RCA.

Robustness tests

Using the Audio Degradation Toolbox for Matlab, we applied three different degradations. The experiments were performed only on Orchset, due to lack of time. The most relevant degradation was smartphone playback (applying an impulse response from a smartphone loudspeaker and adding some pink noise) because, instead of lowering the accuracy, it improved the results:


With MELODIA, average RPA increased from 18% to 39% and average RCA from 39% to 64%.
The most significant change was in an excerpt that went from 0% RPA before the degradation to 90% after it.

For the source/filter model we didn’t obtain significant changes – around 1-2% of deviation.


Ideas after experiments:

  1. Reducing octave errors in MELODIA, focusing on symphonic music.
  2. Taking an instrument specific approach – parameter tuning based on the type of instrument that carries the main melody.
  3. Applying some pre-processing to the audio – motivated by the robustness tests results.

Implemented improvements:

Pre-processing step – audio degradation-like filtering

Given the accuracy increase we experienced with the smartphone playback degradation, and since we wanted to improve MELODIA’s performance on symphonic music, we applied a high-pass filter with different cutoff frequencies to see if it helped to improve MELODIA’s results on Orchset.
We tried cutoff frequencies of 300, 500 and 700 Hz, and the best average results were obtained with 700 Hz. Overall accuracy went from 18% to 40% after filtering, RPA from 15% to 40%, and RCA from 40% to 55%. The voicing recall and false alarm stayed the same.
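
For readers who want to try the idea, here is a minimal pure-Python sketch of a high-pass filter. This is a stand-in I chose for illustration, a first-order RC-style filter with a gentle 6 dB/octave slope, and not necessarily the filter design we used in the experiments; the helper names sine and rms are also just for the example.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate=44100):
    """First-order high-pass filter (discretized RC circuit)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0  # assume silence before the first sample
    for x in samples:
        # Each output keeps the change in the input and slowly forgets the rest.
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def sine(freq_hz, seconds=0.1, sample_rate=44100):
    """Generate a unit-amplitude test tone."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

With a 700 Hz cutoff, a 100 Hz tone comes out strongly attenuated while a 5 kHz tone passes almost unchanged, which is the kind of low-end removal that seemed to help MELODIA on orchestral excerpts. A steeper filter (for example a higher-order Butterworth) would cut the low frequencies more sharply.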

Instrument specific parameter tuning

We divided the pieces according to the instrument that carries the melody: strings, woodwinds, and brass.

MELODIA’s implementation in Essentia has several parameters that can be tuned. We chose the following ones:

  1. Harmonic weight – the weight decay ratio between consecutive harmonics
  2. Magnitude Compression – compression of magnitudes for the salience function
  3. Peak Distribution Threshold – a threshold on the allowed deviation below the mean peak salience

After some experiments with different values, we got the overall accuracy to increase from 18% to 44%, even more than with filtering. Voicing detection deteriorated as a result, but we consider this a worthwhile cost for improving the overall accuracy of the approach.

Combining filtering and parameter tuning

We experimented with a hybrid approach: applying the HPF and then using different parameters according to the instrument.
After applying both a 700 Hz high-pass filter and the instrument-specific tuned parameters, we found that overall accuracy in fact decreased, suggesting that combining these approaches without any modifications is detrimental to performance. The results showed that for woodwind and string instruments, an HPF does not improve performance once the parameters have been tuned as described above. However, for the brass section, applying an HPF with a cutoff frequency of 500 Hz improved the overall accuracy. The final results, with the tuned parameters (instrument-based) and the filtering (only for the brass), are:

Original evaluation // Final evaluation

  • RPA: 16% // 47%
  • RCA: 40% // 65%
  • OA: 18% // 45%
  • VR: 67% // 94%
  • VFA: 50% // 86%
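
The final configuration can be summarized as a small routing function. This is only a sketch: the function name plan_excerpt is hypothetical, and the tuned parameter values are passed in by the caller because the actual values are not listed in this post. The parameter names in the usage note follow Essentia's PredominantPitchMelodia options.

```python
# Only the brass excerpts benefited from filtering once the parameters were tuned.
HPF_CUTOFF_HZ = {"brass": 500}


def plan_excerpt(family, tuned_params):
    """Return the processing plan for an excerpt of a given instrument family.

    tuned_params maps a family name ('strings', 'woodwinds', 'brass') to a
    dict of MELODIA parameters found by tuning.
    """
    if family not in tuned_params:
        raise KeyError(f"no tuned parameters for family: {family}")
    return {
        "highpass_hz": HPF_CUTOFF_HZ.get(family),  # None means no filtering
        "melodia_params": dict(tuned_params[family]),
    }
```

For example, plan_excerpt("brass", params) asks for a 500 Hz high-pass filter before extraction, while strings and woodwinds go straight to MELODIA with their own settings. Here params would be a dict such as {"brass": {"harmonicWeight": ..., "magnitudeCompression": ..., "peakDistributionThreshold": ...}, ...} filled with your own tuned values.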

sound ePixystem: a tangible interface for soundscape composition

New project!

I started the Master in Sound and Music Computing in September, and these last months have been crazy: new people, new courses and a lot of projects.

In this post I want to present the final project we did for the course on real-time interaction, together with Meghana Sudhindra and Natalia Delgado. We called it sound ePixystem: a tangible interface for soundscape composition, although maybe the name is not really self-explanatory.

We decided to create a tangible interface rather than a software-based system, because we wanted the user-system interaction to be as natural as possible. We used Pixy CMUcam5 (a Kickstarter project developed at Carnegie Mellon University) to capture objects of different colors together with Arduino for communication purposes, and PureData software to generate, trigger, and modify all the sounds that can be found in the system.

We got some inspiration from Reactable and other similar systems: the user has several colored blocks that can be placed in the field of view (FOV) of the Pixy camera and can be moved along two axes (x, y). Each block represents a different sound, and depending on its position, different transformations or effects are applied to the sound. In the current prototype, we have up to 7 colors, i.e. 7 different sounds.

Regarding the type of sounds, we decided to use environmental sounds, in order to create a sound ecosystem or a soundscape: bird sounds, wind, rain, insects or tribal drums are a few examples of the sounds that are available in our prototype.

For more technical details about the project you can check the paper we wrote here. Additionally, we showcased the system in class, and here’s the video of the presentation and demo (demo starts at ~0:51). Not much detail can be seen or heard, but it’s useful to see the setup and sounds. We have another video of the demo, recorded from another perspective.


We need to thank Ángel and Cárthach for their supervision, help and suggestions, which were really useful at some points of the development.


On Sunday, November 29th, I took part in the #GirlsHack event organized by the Girls in Lab collective, which aims to inspire the next generation of girls in technology. Gender stereotypes have meant that, so far, the percentage of girls studying technology degrees is significantly lower than the percentage of boys, and that is why initiatives like this one are needed to show girls and their families that they are just as capable of choosing these paths.

The event, in a hackathon format, consisted of a morning at Universitat Pompeu Fabra (co-organizer) where the girls attending (with their families) could take part in workshops and see demos of existing technologies. Some of the workshops were programming with Scratch and Scratch Junior, musical instruments with Makey-Makey, mobile app programming with App Inventor, and 3D printing. In addition, there were demonstrations of virtual reality, 3D printing and robotics, and a maker space where attendees could do some crafts such as a homemade robotic arm or a balloon-powered car.

Together with two colleagues, we organized the mobile app programming workshop with App Inventor, which was aimed at girls aged about 9 and up. App Inventor is a tool developed at MIT for getting started with mobile apps through visual block-based programming.

We prepared an introductory workshop, providing resources and instructions to program a few simple apps, so that the girls could get a feel for how it works and extend their knowledge in the future.

On the Twitter and Instagram profiles of Girls in Lab you will find some pictures of the day and the workshops. It was a great experience that we hope to repeat soon!