Acoustic event detection and classification

  1. Temko, Andriy
Dirigida por:
  1. Climent Nadeu Camprubí Director/a

Universidad de defensa: Universitat Politècnica de Catalunya (UPC)

Fecha de defensa: 23 de enero de 2008

Tribunal:
  1. José Bernardo Mariño Acebal Presidente/a
  2. Francisco Javier Hernando Pericas Secretario/a
  3. José Carlos Segura Luna Vocal
  4. Dan Ellis Vocal
  5. Xavier Serra Vocal

Tipo: Tesis

Teseo: 141192 DIALNET

Resumen

The human activity that takes place in meeting-rooms or class-rooms is reflected in a rich variety of acoustic events, either produced by the human body or by objects handled by humans, so the determination of both the identity of sounds and their position in time may help to detect and describe that human activity, Additionally, detection of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition. Automatic detection and classification of acoustic events is the objective of this thesis work. It aims at processing the acoustic signals collected by distant microphones in meeting-room or class-room environments to convert them into symbolic descriptions corresponding to a listener's perception of the different sound events that are present in the signals and their sources. First of all, the task of acoustic event classification is faced using Support Vector Machine (SVM) classifiers, which are motivated by the scarcity of training data. A confusion-matrix-based variable-feature-set clustering scheme is developed for the multiclass recognition problem, and tested on the gathered database. With it, a higher classification rate than the GMM-based technique is obtained, arriving to a large relative average error reduction with respect to the best result from the conventional binary tree scheme. Moreover, several ways to extend SVMs to sequence processing are compared, in an attempt to avoid the drawback of SVMs when dealing with audio data, i.e. their restriction to work with fixed-length vectors, observing that the dynamic time warping kernels work well for sounds that show a temporal structure. Furthermore, concepts and tools from the fuzzy theory are used to investigate, first, the importance of and degree of interaction among features, and second, ways to fuse the outputs of several classification systems. The developed AEC systems are tested also by participating in several international evaluations fr