Learning Music Representations Invariant to Audio Deformations
Supervisors
Aurian Quelennec, Antonin Gagnere, Slim Essid
Emails: prenom.nom@telecom-paris.fr
Office: 5C63
Number of students per project instance:
Minimum: 2
Maximum: 4
Number of project instances:
1
Course units (UE) covered and/or keywords:
Deep Learning, Music, Invariants, Python, PyTorch
Project description:
It is increasingly common to extract features from music with deep learning models rather than with traditional representations such as the short-time Fourier transform (STFT) or mel-frequency cepstral coefficients (MFCCs). These deep features give access to multiple pieces of information about the original content, such as tempo, pitch, musical genre, or the instruments present.
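To make the classical baseline concrete, here is a minimal NumPy sketch of an STFT magnitude spectrogram (the function name, window choice, and parameter values are our own illustrative picks, not prescribed by the project):

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=256):
    """Magnitude spectrogram via framed real FFTs with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)

# A 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
spec = stft_magnitude(x)
# The spectral peak sits near bin 440 / (sr / n_fft), i.e. around bin 28.
peak_bin = int(spec.mean(axis=0).argmax())
```

Deep features would replace `spec` with the output of a learned encoder, but the input/output contract (waveform in, time-feature matrix out) stays the same.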
However, it remains difficult to build models that are robust to small variations of the input audio, such as noise, saturation, or equalization (EQ). We therefore seek simple yet effective methods to train models that are invariant to such perturbations.
Project objectives:
The work would consist in re-implementing a recent yet simple method called the "Complex Autoencoder" (CAE) [1], then testing and evaluating it in simple settings under various perturbations.
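We do not fix the CAE architecture here, but the underlying idea — take the magnitude of a complex-valued encoding so that transformations which only rotate the encoding's phase are discarded — can be illustrated with fixed DFT weights, for which a circular time shift is exactly such a phase rotation (in the actual CAE the complex weights are learned, not fixed):

```python
import numpy as np

# A complex linear encoding z = W x followed by the magnitude |z| is
# invariant to any input deformation that only rotates the phase of z.
# With DFT rows as the (fixed, illustrative) complex weights, a circular
# time shift of the input is such a phase rotation.
rng = np.random.default_rng(1)
x = rng.standard_normal(256)
x_shifted = np.roll(x, 37)           # the deformation: a circular time shift

z = np.fft.fft(x)                    # complex "encoding"
z_shifted = np.fft.fft(x_shifted)

invariant = np.abs(z)                # magnitude representation
invariant_shifted = np.abs(z_shifted)
max_diff = float(np.max(np.abs(invariant - invariant_shifted)))  # ~0
```

The project's question is whether a learned complex encoding can achieve the same effect for richer deformations (noise, saturation, EQ) where no closed-form weights exist.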
In parallel, it would be useful to re-implement the lightweight YAMNet model [2] in PyTorch, then train and evaluate it on a simple audio classification task. The challenge lies in building a good optimization pipeline, as the model was released without any training details.
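YAMNet is built from MobileNetV1-style depthwise-separable convolutions, which is what makes it lightweight. A PyTorch sketch of that building block (channel counts, input sizes, and the class name are illustrative, not YAMNet's exact configuration):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a 1x1 pointwise convolution,
    each with batch norm and ReLU, as in MobileNetV1/YAMNet."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),      # depthwise
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# A batch of log-mel feature maps: (batch, channels, frames, mel bins).
x = torch.randn(2, 32, 96, 64)
y = DepthwiseSeparableConv(32, 64, stride=2)(x)
```

Stacking such blocks and training with a standard classification loss (e.g. cross-entropy) is a reasonable starting point for the missing optimization pipeline.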
Finally, the goal would be to combine the CAE and YAMNet to learn an invariant representation, then evaluate it and compare it to the baseline YAMNet model.
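One simple protocol for that comparison is to measure how close the embeddings of clean and perturbed versions of the same clip are, e.g. with cosine similarity; this is our suggested metric, not one mandated by the project, and the `embed` function below is a placeholder standing in for either model:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(x):
    # Placeholder embedding: magnitude spectrum stands in for a model output.
    return np.abs(np.fft.rfft(x))

rng = np.random.default_rng(2)
x = rng.standard_normal(2048)
x_noisy = x + 0.05 * rng.standard_normal(2048)  # a mild perturbation

score = cosine(embed(x), embed(x_noisy))
```

A more invariant model should score closer to 1 under each perturbation while preserving downstream classification accuracy.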