Please use this address to cite this document: https://di.univ-blida.dz/jspui/handle/123456789/40776
Title: Encoder-decoder-based neural network architectures for automatic audio captioning
Author(s): Gharouba, Hadil
Ben Doumia, Kaouther
Ykhlef, Hadjer (Supervisor)
Keywords: Automated Audio Captioning
Encoder-Decoder
Deep Learning
Transformer
BART
VGGish
BPE tokenizer
Clotho
Publication date: 2025
Publisher: Université Blida 1
Abstract: The main objective of our project is to develop an effective system for Automated Audio Captioning (AAC), the task of describing the ambient sounds in an audio clip with a natural-language sentence, thereby bridging auditory perception and linguistic expression. AAC has attracted significant attention in recent years and has progressed considerably, yet the field still faces many challenges. To address the task, our approach follows an encoder-decoder model based on deep learning. Specifically, we employ a novel, fully transformer-based architecture built around BART, which overcomes the limitations of the traditional RNN and CNN approaches to AAC: the self-attention mechanism in BART enables better modeling of both local and global dependencies in audio signals. Our model uses VGGish to extract audio embeddings from log-Mel spectrograms, and a BART transformer, which combines a bidirectional encoder with an autoregressive decoder, to generate the captions. Word embeddings are built on subword tokens from a BPE tokenizer that is adapted to the specific vocabulary of the training dataset, aligning the tokenizer with the requirements of the captioning task. To improve the quality of the generated captions, we performed multiple experiments on the Clotho dataset. The results indicate that our model produces more accurate and diverse descriptions than existing state-of-the-art approaches.
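The abstract does not detail how the BPE tokenizer is fitted to the caption vocabulary. As a rough illustration only, the Python sketch below shows the classic character-level BPE merge-learning procedure applied to a toy list of captions. It is not the authors' implementation: the function names `learn_bpe` and `apply_bpe`, the `</w>` end-of-word marker and the example captions are hypothetical, and BART's own pretrained tokenizer is a byte-level BPE, so a real pipeline would differ in detail.

```python
from collections import Counter


def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules from caption strings (hypothetical helper).

    Each word starts as a sequence of characters plus an assumed "</w>"
    end-of-word marker; the most frequent adjacent pair is merged repeatedly.
    """
    # Word frequencies, each word stored as a tuple of symbols.
    vocab = Counter()
    for caption in corpus:
        for word in caption.lower().split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)

        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(_merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def _merge_pair(symbols, pair):
    """Fuse every left-to-right occurrence of `pair` in a symbol list."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = list(word.lower()) + ["</w>"]
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    merges = learn_bpe(["dog barks", "dog barks", "dog"], num_merges=3)
    print(merges)                      # [('d', 'o'), ('do', 'g'), ('dog', '</w>')]
    print(apply_bpe("dog", merges))    # ['dog</w>']
    print(apply_bpe("barks", merges))  # ['b', 'a', 'r', 'k', 's', '</w>']
```

Because merges are learned from pair frequencies in the training captions, frequent caption words end up as single tokens while rarer words remain split into smaller units. This is one way a subword vocabulary can be adapted to a captioning corpus such as Clotho.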
Description: ill., bibliogr. Shelf mark: MA-004-1052
URI/URL: https://di.univ-blida.dz/jspui/handle/123456789/40776
Collection(s): Master's theses (Mémoires de Master)

Files in this item:
File: Gharouba Hadil et Ben Doumia Kaouther.pdf
Size: 3.68 MB
Format: Adobe PDF


All items in DSpace are protected by copyright, with all rights reserved.