Please use this identifier to cite or link to this item: https://di.univ-blida.dz/jspui/handle/123456789/40776
Full metadata record
DC Field | Value | Language
dc.contributor.author | Gharouba, Hadil | -
dc.contributor.author | Ben Doumia, Kaouther | -
dc.contributor.author | Ykhlef, Hadjer (Supervisor) | -
dc.date.accessioned | 2025-10-26T13:57:19Z | -
dc.date.available | 2025-10-26T13:57:19Z | -
dc.date.issued | 2025 | -
dc.identifier.uri | https://di.univ-blida.dz/jspui/handle/123456789/40776 | -
dc.description | ill., Bibliogr., cote: MA-004-1052 | fr_FR
dc.description.abstract | The main objective of our project is to develop an effective system for Automated Audio Captioning (AAC), a task that involves describing ambient sounds within an audio clip using a natural language sentence, effectively bridging the gap between auditory perception and linguistic expression. In recent years, AAC has gained significant attention and has seen considerable progress. Despite these advancements, the field still faces many challenges. To achieve this task, our approach follows an encoder-decoder model based on deep learning techniques. Specifically, we employ a novel, fully transformer-based architecture built around BART, which overcomes the limitations of traditional RNN and CNN approaches in AAC. The self-attention mechanism in BART facilitates better modeling of both local and global dependencies in audio signals. Our model integrates VGGish to extract audio embeddings from log-Mel spectrograms, and a BART transformer combining a bidirectional encoder and an autoregressive decoder for generating captions. Word embeddings are produced using a BPE tokenizer, which is adapted to the unique vocabulary of the training dataset, thereby aligning it with the general requirements of the captioning task. In order to improve the quality of the generated audio captions, we performed multiple experiments using the Clotho dataset. The results indicate that our model produces more accurate and diverse descriptions than existing state-of-the-art approaches. Keywords: Automated Audio Captioning, Encoder-Decoder, Deep Learning, Transformer, BART, VGGish, BPE tokenizer, Clotho | fr_FR
dc.language.iso | en | fr_FR
dc.publisher | Université Blida 1 | fr_FR
dc.subject | Automated Audio Captioning | fr_FR
dc.subject | Encoder-Decoder | fr_FR
dc.subject | Deep Learning | fr_FR
dc.subject | Transformer | fr_FR
dc.subject | BART | fr_FR
dc.subject | VGGish | fr_FR
dc.subject | BPE tokenizer | fr_FR
dc.subject | Clotho | fr_FR
dc.title | Encoder-decoder-based neural network architectures for automatic audio captioning | fr_FR
dc.type | Thesis | fr_FR
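
The abstract above describes a pipeline in which VGGish embeddings, computed from log-Mel spectrograms, are fed into a BART encoder-decoder whose byte-level BPE tokenizer produces the caption tokens. The sketch below is a minimal illustration of that idea, assuming the Hugging Face transformers library, the facebook/bart-base checkpoint, and precomputed 128-dimensional VGGish frame embeddings; the class name, projection layer, and decoding settings are illustrative assumptions, not the thesis implementation.

# Minimal sketch of a VGGish -> BART captioning pipeline (illustrative only,
# not the thesis code). Assumes precomputed 128-dimensional VGGish frame
# embeddings and the Hugging Face transformers library.
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizer


class AudioCaptioner(nn.Module):
    def __init__(self, vggish_dim: int = 128, bart_name: str = "facebook/bart-base"):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(bart_name)
        # Project VGGish frame embeddings (128-d) to BART's hidden size
        # (768-d for bart-base) so they can replace token embeddings
        # at the input of the bidirectional encoder.
        self.proj = nn.Linear(vggish_dim, self.bart.config.d_model)

    def forward(self, audio_embeddings, labels=None):
        # audio_embeddings: (batch, n_frames, 128), one VGGish vector per
        # ~0.96 s audio frame computed from a log-Mel spectrogram.
        # With labels given, BART shifts them internally to form the
        # decoder inputs and returns the training loss.
        return self.bart(inputs_embeds=self.proj(audio_embeddings), labels=labels)

    @torch.no_grad()
    def caption(self, audio_embeddings, tokenizer, max_length=30, num_beams=4):
        # Encode the projected audio once, then let the autoregressive
        # decoder generate a caption with beam search.
        enc_out = self.bart.get_encoder()(inputs_embeds=self.proj(audio_embeddings))
        ids = self.bart.generate(encoder_outputs=enc_out,
                                 max_length=max_length, num_beams=num_beams)
        return tokenizer.batch_decode(ids, skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")  # byte-level BPE
    model = AudioCaptioner().eval()
    dummy_audio = torch.randn(1, 10, 128)  # ~10 s clip -> 10 VGGish frames
    print(model.caption(dummy_audio, tokenizer))

As the abstract notes, the BPE tokenizer would in practice be adapted to the vocabulary of the Clotho training captions rather than used off the shelf as in this sketch.
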
Collection(s): Mémoires de Master

File(s) in this item:
File | Description | Size | Format
Gharouba Hadil et Ben Doumia Kaouther.pdf |  | 3.68 MB | Adobe PDF | View/Open


All items in DSpace are protected by copyright, with all rights reserved.