Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/40776
Title: Encoder-decoder-based neural network architectures for automatic audio captioning
Authors: Gharouba, Hadil
Ben Doumia, Kaouther
Ykhlef, Hadjer (Supervisor)
Keywords: Automated Audio Captioning
Encoder-Decoder
Deep Learning
Transformer
BART
VGGish
BPE tokenizer
Clotho
Issue Date: 2025
Publisher: Université Blida 1
Abstract: The main objective of this project is to develop an effective system for Automated Audio Captioning (AAC), the task of describing the ambient sounds in an audio clip with a natural-language sentence, thereby bridging auditory perception and linguistic expression. AAC has attracted significant attention and seen considerable progress in recent years, yet the field still faces many challenges. Our approach follows an encoder-decoder design based on deep learning. Specifically, we employ a novel, fully transformer-based architecture built around BART, which overcomes the limitations of traditional RNN and CNN approaches in AAC: the self-attention mechanism in BART facilitates better modeling of both local and global dependencies in audio signals. The model uses VGGish to extract audio embeddings from log-Mel spectrograms, and a BART transformer, which combines a bidirectional encoder with an autoregressive decoder, to generate captions. Word tokens for the embedding layer are produced by a BPE tokenizer adapted to the vocabulary of the training dataset, which aligns it with the requirements of the captioning task. To improve the quality of the generated captions, we performed multiple experiments on the Clotho dataset. The results indicate that our model produces more accurate and diverse descriptions than existing state-of-the-art approaches.
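The abstract mentions a BPE tokenizer fitted to the vocabulary of the training captions. As a rough illustration of what byte-pair encoding does, the sketch below learns merge rules from a tiny caption list and then segments words with them. It is a toy version in plain Python, not the tokenizer used in the thesis: the function names (`learn_bpe`, `merge_pair`, `encode_word`), the `</w>` end-of-word marker, and the sample captions are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of caption strings."""
    # Start from characters, with a hypothetical end-of-word marker.
    words = Counter()
    for caption in corpus:
        for word in caption.lower().split():
            words[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def encode_word(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word.lower()) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Made-up captions, used only to exercise the sketch.
    captions = ["birds are singing", "birds are chirping"]
    rules = learn_bpe(captions, num_merges=50)
    print(encode_word("birds", rules))
    print(encode_word("bird", rules))
```

Words seen during training collapse into a single token once enough merges are learned, while unseen words fall back to smaller subword pieces. In practice a library tokenizer would normally be trained on the caption corpus instead of hand-written code like this.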
Description: ill., bibliogr. Call number: MA-004-1052
URI: https://di.univ-blida.dz/jspui/handle/123456789/40776
Appears in Collections: Mémoires de Master

Files in This Item:
File: Gharouba Hadil et Ben Doumia Kaouther.pdf
Size: 3.68 MB
Format: Adobe PDF

