Una aproximación genética a la transcripción automática de música
- Reis, Gustavo Miguel Jorge dos
- Francisco Fernández de Vega Zuzendaria
- Aníbal João de Sousa Ferreira Zuzendaria
Defentsa unibertsitatea: Universidad de Extremadura
Fecha de defensa: 2014(e)ko uztaila-(a)k 23
- José Miguel Alonso Presidentea
- Francisco Luna Valero Idazkaria
- Juan Julián Merelo Guervós Kidea
- Fernando Amilcar Bandeira Cardoso Kidea
- Carlos Cotta Porras Kidea
Mota: Tesia
Laburpena
Music transcription is the process of extracting human readable notation, like a music score, from an acoustical signal. This way, automatic transcription of music is the process in which a computer program extracts notation from an audio signal. Automatic transcription of music is a research area that, besides computer science, encompasses several disciplines including digital signal processing, machine learning, psychoacoustics and pitch perception, music theory and also music cognition. Automatic transcription of music is an extremely difficult task, which has already been addressed in several doctoral theses. Despite these number of attempts to solve the problem, a practical and applicable, general-purpose transcription system still does not exist the present time. Furthermore, current available systems fall behind skilled human musicians in both accuracy and flexibility. We depict the problem of automatic music transcription as a combinatorial optimization problem where the goal is to find the combination of musical notes that best represents the observed signal. We extend the sparse coding with genetic algorithms, which are a very good tool on search problems. By using sparse approximation, along with evolutionary algorithms, our method is able to cope with harmonic sounds with varying harmonic components. This dissertation presents the several steps of our research on addressing Genetic Algorithms to the problem of Automatic Transcription of Music. We have employed several genetic algorithm approaches to address the problem of multi-pitch estimation, first starting with simple synthesized models of instruments, and then, moving to real audio recordings, performing several experiments. These experiments included different domains and tools (log spectra, linear spectra, filter banks, real cepstrum, hybrid cepstral and spectral analysis, auto correlation and summary auto correlation functions, etc.) for audio similarity measurement and several error measurements (from Hamming and Itakura-Saito distances to area intersection, correlation and other variations). We faced the problem of Harmonic Overfitting, which is related to timbre differences, and proposed a spectral envelope modelling technique to address this issue. Furthermore, we have also employed this approach on musical signals with different audio instruments to show the feasibility of the approach on multi- timbral music. We present a new method for multiple fundamental frequency (F0) estimation on piano recordings. We propose a framework based on a genetic algorithm in order to analyze the overlapping overtones and search for the most likely F0 combination. The search process is aided by adaptive spectral envelope modelling and dynamic noise level estimation: while the noise is dynamically estimated, the spectral envelope of previously recorded piano samples (internal database) is adapted to best match the piano played on the input signals and aid the search process for the most likely combination of F0s. For comparison, several state-of-the-art algorithms were tested on various musical pieces played by different pianos and then compared using three different metrics. The proposed algorithm ranked second place on both Onset Only and Onset-Offset metrics and ranked first place on Hybrid Decay/Sustain Score metric, which has better correlation with the human hear- ing perception. One final comparison is made with a previous genetic algorithm approach to show how the proposed system brings significant improvements on both quality of the results and computing time. This dissertation also presents our contributions to the field of Evolutionary Computation, namely the Gene Fragment Competition approach, which can be used on most decomposable problems in signal or image processing. An analysis of how decomposable approaches are suitable to decomposable problems is presented. We took advantage of the modular and hierarchical structure of the Royal Road functions to use them as test functions and show how single-population decomposable approaches, such as the Gene Fragment Competition, can overcome the spurious correlation or hitchhiking. We show empirically that both Parisian approach and Gene Fragment Competition show, in general, better behaviour than not only the standard genetic algorithm and the multiple- population co- evolutionary approach but also the random mutation hill-climber. Hitch- hiking is known to be, in general, one of the major bottlenecks of the genetic algorithms performance. Therefore, avoiding hitchhiking has the potential to boost the performance of the algorithm. Applying problem decomposition in building blocks is an advantageous optimization technique, since this avoids the hitchhiking phenomena. Despite the fact that the random mutation hill-climber algorithm has proved in the past to be the ideal for the Royal Road functions, we have shown that single population decomposable approaches can explore more efficiently the search space on Royal Road functions. We show empirically that both Parisian approach and Gene Fragment Competition show, in general, better behaviour than not only the standard genetic algorithm and the multiple-population co- evolutionary approach but also the random mutation hill- climber.