DEEP CO-TRAINING FRAMEWORK FOR SEMI-SUPERVISED AUDIO TAGGING

Cheifa, Ikram; Yakhlef, Hadjer (Supervisor); Diffallah, Zhor (Supervisor)

URI: https://di.univ-blida.dz/jspui/handle/123456789/20463

Date: 2022

Abstract:

Audio tagging, also known as sound event recognition, is concerned with developing systems that can recognize sound events. A sound event is perceived as a distinct, individual entity that we can name and recognize, such as a helicopter, glass breaking, a baby crying, or speech. Audio tagging has attracted considerable attention for applications such as information retrieval, music tagging, and acoustic monitoring. A typical audio tagging pipeline involves two major steps: feature extraction and classification. However, obtaining well-annotated, strongly labeled data is expensive and time-consuming. Consequently, much recent work has focused on making effective use of weakly labeled data collected from websites such as YouTube, Freesound, and Flickr. Various semi-supervised learning (SSL) approaches have been proposed in the literature, including Mean Teacher, Pseudo-Labeling, MixMatch, and, most recently, Deep Co-training (DCT). The purpose of this project is to devise an audio tagging system within the semi-supervised learning paradigm, specifically the Deep Co-training framework; such systems use both labeled and unlabeled audio data. Our system is built on a deep residual neural network (ResNet) and a wide residual neural network (WideResNet), and it is trained on two datasets: UrbanSound8K and Environmental Sound Classification (ESC). We supported our analysis and discussion with numerous statistical tests to compare our results. We investigated the impact of varying the supervised ratio (the proportion of labeled training data) on the system's performance, and we tested several variants of the DCT system based on different adversarial attacks. The results demonstrate the efficacy of the Deep Co-training SSL strategy, which significantly boosts overall performance.

Keywords: Audio Tagging, Semi-supervised Learning, Deep Co-training, Feature Extraction, Statistical Tests.
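The abstract names Deep Co-training without stating its training objective. The following is a minimal sketch, assuming the formulation of the original DCT method as we understand it: two networks are trained jointly with a supervised cross-entropy term, a Jensen-Shannon co-training term that makes them agree on unlabeled clips, and a view-difference term computed on adversarial examples. The function names, the weights `lambda_cot` and `lambda_diff`, and the choice to compute the view-difference term on the unlabeled batch only are illustrative assumptions, not the thesis's implementation. Adversarial examples are assumed to be generated elsewhere (for example with FGSM); only the two networks' softmax outputs on them are passed in, so a different attack can be swapped in without changing the loss.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def entropy(p):
    """Row-wise Shannon entropy H(p) of a (batch, classes) probability array."""
    return -np.sum(p * np.log(p + EPS), axis=-1)


def cross_entropy(p, q):
    """Row-wise cross-entropy H(p, q) = -sum_k p_k log q_k."""
    return -np.sum(p * np.log(q + EPS), axis=-1)


def supervised_loss(p1_l, p2_l, y_onehot):
    """Standard cross-entropy of both networks on the labeled batch."""
    return np.mean(cross_entropy(y_onehot, p1_l) + cross_entropy(y_onehot, p2_l))


def cotraining_loss(p1_u, p2_u):
    """Jensen-Shannon divergence between the two networks on unlabeled clips:
    H((p1 + p2) / 2) - (H(p1) + H(p2)) / 2. Minimizing it makes them agree."""
    m = 0.5 * (p1_u + p2_u)
    return np.mean(entropy(m) - 0.5 * (entropy(p1_u) + entropy(p2_u)))


def view_difference_loss(p1_u, p2_u, p2_on_adv1, p1_on_adv2):
    """Each network should keep its peer's clean prediction on adversarial
    examples crafted against that peer, so the two views do not share the
    same adversarial weaknesses.
    p2_on_adv1: network 2's output on examples attacked against network 1."""
    return np.mean(cross_entropy(p1_u, p2_on_adv1) + cross_entropy(p2_u, p1_on_adv2))


def dct_loss(p1_l, p2_l, y_onehot, p1_u, p2_u, p2_on_adv1, p1_on_adv2,
             lambda_cot=1.0, lambda_diff=0.5):
    # lambda_cot and lambda_diff are hypothetical weights, not values from the thesis
    return (supervised_loss(p1_l, p2_l, y_onehot)
            + lambda_cot * cotraining_loss(p1_u, p2_u)
            + lambda_diff * view_difference_loss(p1_u, p2_u, p2_on_adv1, p1_on_adv2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def softmax(z):
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    n_classes = 10  # e.g. the 10 UrbanSound8K classes
    # Random stand-ins for the six softmax outputs a real training step would produce
    p = [softmax(rng.normal(size=(4, n_classes))) for _ in range(6)]
    y = np.eye(n_classes)[rng.integers(0, n_classes, size=4)]
    print(dct_loss(p[0], p[1], y, p[2], p[3], p[4], p[5]))
```

Keeping the three terms as separate functions makes it straightforward to log each one during training, which is useful when comparing DCT variants or supervised ratios.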