Please use this address to cite this document:
https://di.univ-blida.dz/jspui/handle/123456789/40776
| Dublin Core element | Value | Language |
|---|---|---|
| dc.contributor.author | Gharouba, Hadil | - |
| dc.contributor.author | Ben Doumia, Kaouther | - |
| dc.contributor.author | Ykhlef, Hadjer (Supervisor) | - |
| dc.date.accessioned | 2025-10-26T13:57:19Z | - |
| dc.date.available | 2025-10-26T13:57:19Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.uri | https://di.univ-blida.dz/jspui/handle/123456789/40776 | - |
| dc.description | Ill., bibliogr. Call number: MA-004-1052 | fr_FR |
| dc.description.abstract | The main objective of our project is to develop an effective system for Automated Audio Captioning (AAC), a task that involves describing ambient sounds within an audio clip using a natural language sentence, effectively bridging the gap between auditory perception and linguistic expression. In recent years, AAC has gained significant attention and has seen considerable progress. Despite these advancements, the field still faces many challenges. To achieve this task, our approach follows an encoder-decoder model based on deep learning techniques. Specifically, we employ a novel, fully transformer-based architecture built around BART, which overcomes the limitations of traditional RNN and CNN approaches in AAC. The self-attention mechanism in BART facilitates better modeling of both local and global dependencies in audio signals. Our model integrates VGGish to extract audio embeddings from log-Mel spectrograms, and a BART transformer combining a bidirectional encoder and an autoregressive decoder for generating captions. Word embeddings are produced using a BPE tokenizer, which is adapted to the unique vocabulary of the training dataset, thereby aligning it with the general requirements of the captioning task. In order to improve the quality of the generated audio captions, we performed multiple experiments using the Clotho dataset. The results indicate that our model produces more accurate and diverse descriptions than existing state-of-the-art approaches. Keywords: Automated Audio Captioning, Encoder-Decoder, Deep Learning, Transformer, BART, VGGish, BPE tokenizer, Clotho | fr_FR |
| dc.language.iso | en | fr_FR |
| dc.publisher | Université Blida 1 | fr_FR |
| dc.subject | Automated Audio Captioning | fr_FR |
| dc.subject | Encoder-Decoder | fr_FR |
| dc.subject | Deep Learning | fr_FR |
| dc.subject | Transformer | fr_FR |
| dc.subject | BART | fr_FR |
| dc.subject | VGGish | fr_FR |
| dc.subject | BPE tokenizer | fr_FR |
| dc.subject | Clotho | fr_FR |
| dc.title | Encoder-decoder-based neural network architectures for automatic audio captioning | fr_FR |
| dc.type | Thesis | fr_FR |
| Collection(s): | Master's Theses | |
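
The abstract above outlines the technical pipeline: VGGish turns log-Mel spectrogram patches into 128-dimensional frame embeddings, a BART encoder-decoder generates the caption, and a BPE tokenizer is adapted to the training-caption vocabulary. The record contains no code, so the snippet below is only a minimal sketch of such a pipeline, not the authors' implementation. The `facebook/bart-base` checkpoint, the `AudioCaptioner` class, the linear projection from 128 dimensions to BART's hidden size, the `train_new_from_iterator` adaptation step, and the placeholder caption list are all illustrative assumptions; a reasonably recent Hugging Face `transformers` release is assumed for `generate(inputs_embeds=...)`.

```python
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration, BartTokenizerFast


class AudioCaptioner(nn.Module):
    """Hypothetical sketch: VGGish frame embeddings -> BART encoder-decoder -> caption."""

    def __init__(self, bart_name="facebook/bart-base", vggish_dim=128):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained(bart_name)
        # VGGish emits one 128-d embedding per ~0.96 s log-Mel patch; project it
        # into BART's hidden size so the bidirectional encoder can consume it.
        self.proj = nn.Linear(vggish_dim, self.bart.config.d_model)

    def forward(self, audio_embeds, labels=None):
        # audio_embeds: (batch, n_frames, 128); labels: BPE ids of the reference caption.
        return self.bart(inputs_embeds=self.proj(audio_embeds), labels=labels)

    @torch.no_grad()
    def caption(self, audio_embeds, tokenizer, num_beams=4, max_length=30):
        # Autoregressive beam-search decoding from the projected audio embeddings.
        ids = self.bart.generate(inputs_embeds=self.proj(audio_embeds),
                                 num_beams=num_beams, max_length=max_length)
        return tokenizer.batch_decode(ids, skip_special_tokens=True)


# Adapt BART's BPE tokenizer to the training-caption vocabulary. The single
# sentence below is only a placeholder for the Clotho caption corpus.
captions_corpus = ["a dog barks while rain falls steadily on a tin roof"]
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
tokenizer = tokenizer.train_new_from_iterator(captions_corpus, vocab_size=8000)

model = AudioCaptioner()
# Keep the model's embedding table consistent with the adapted tokenizer.
model.bart.resize_token_embeddings(len(tokenizer))

# Dummy stand-in for VGGish output (e.g. from a torchvggish port, not shown):
# one clip, 10 frames, one 128-d embedding per frame.
dummy_audio = torch.randn(1, 10, 128)
print(model.caption(dummy_audio, tokenizer))
```

In practice the projection layer and BART would be fine-tuned on Clotho audio/caption pairs via the `forward` loss before `caption` produces meaningful text; the untrained demo above only checks that the pieces fit together.
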
File(s) in this document:
| File | Description | Size | Format |
|---|---|---|---|
| Gharouba Hadil et Ben Doumia Kaouther.pdf | | 3.68 MB | Adobe PDF |
All documents in DSpace are protected by copyright, with all rights reserved.