Abstract:
This thesis is dedicated to the study of Acoustic Scene Classification systems. The primary
goal is to provide researchers and practitioners with guidelines that describe key steps for
developing efficient scene classification systems. To this end, we carried out two
experimental case studies on a large collection of sound scenes, the DCASE 2016 dataset, and
supported our analysis with numerous statistical tests. In the first case study, we conducted a
comparative study of systems trained using three learning paradigms (Feedforward
Neural Network (FNN), Support Vector Machine (SVM), and k-Nearest Neighbors
(KNN)) on three feature sets (Mel-Frequency Cepstral Coefficients (MFCC), MFCC+ΔMFCC,
and Spectrogram). The results indicate that ΔMFCCs have no significant impact on
predictive performance. Moreover, FNN yields robust and high scores compared with
the other learning paradigms. In the second case study, we tested the use of feature selection
to reduce the computational cost of training. Our analysis shows that feature
selection is beneficial in this case. Specifically, we conclude that systems built using 40%
ΔMFCC and 60% MFCC can improve the generalization ability of FNN.
Keywords: Acoustic Scene Classification, Machine Learning, Feature Extraction, Feature
Selection, Statistical Tests.