Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/19960
| Title: | ENCODER-DECODER NEURAL NETWORK ARCHITECTURES FOR AUTOMATIC AUDIO CAPTIONING |
| Authors: | Bouchelaram, Ishrak; Chita, Ramzi; Kameche, A. (Supervisor) |
| Keywords: | Audio Captioning; Machine Learning; Encoder Decoder Models; Signal Processing; Natural Language Processing |
| Issue Date: | 25-Sep-2022 |
| Publisher: | Université Blida 1 |
| Abstract: | The main purpose of this project is to design a system that describes general environmental audio content in text: the system accepts an audio signal as input and outputs a textual description of that signal. This task has drawn considerable attention over the past several years as a result of the rapid development of methods that can generate captions for general audio recordings. To accomplish the automatic audio captioning task, we performed multiple experiments using the Clotho dataset. Two deep neural networks, a Recurrent Neural Network and a Gated Recurrent Unit, were employed in the construction of our systems, along with an encoder-decoder architecture and a combination of feature representations based on audio processing techniques such as the Mel spectrogram and on text processing techniques used to decode text from word representations such as one-hot encoding and BERT embeddings. |
| Description: | ill., bibliography. Call number: ma-004-869 |
| URI: | https://di.univ-blida.dz/jspui/handle/123456789/19960 |
| Appears in Collections: | Mémoires de Master |
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| Bouchelaram Ishrak et Chita Ramzi.pdf | | 2.66 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.