Université Blida 1

Detection and tracking of targets in real-time images Case: Arabic Visual Speech Recognition System BlidaAVS10 (Lipreading)

dc.contributor.author Baaloul, Ali
dc.date.accessioned 2024-12-10T10:00:25Z
dc.date.available 2024-12-10T10:00:25Z
dc.date.issued 2024
dc.identifier.uri https://di.univ-blida.dz/jspui/handle/123456789/35139
dc.description Thesis, electronic format fr_FR
dc.description.abstract Automatic visual speech recognition (AVSR) techniques are increasingly prevalent in various domains, including manufacturing, public use, and multimedia devices, and Visual Speech Recognition (VSR) is a promising technology for improving communication accessibility for people with hearing impairments. However, most existing VSR systems are designed for languages such as English, leaving a gap for Arabic, which is spoken by over 400 million people worldwide and has unique linguistic and phonetic characteristics. This thesis presents a novel framework for Arabic Visual Speech Recognition that aims to address this gap and serve the Arabic-speaking hearing-impaired community. The framework integrates state-of-the-art deep learning techniques, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViT), to transcribe Arabic speech from visual cues accurately and efficiently. It relies on a specialized Arabic dataset, carefully curated to capture the diversity and complexity of the Arabic language. This dataset serves as a benchmark for training and evaluating the VSR models, supporting their robustness and reliability in real-world applications. The framework uses YOLO, CNNs, and ViT for robust mouth detection and recognition, which enables the extraction of the visual features needed for accurate speech transcription. The experimental results show that the proposed framework achieves promising performance in enhancing communication accessibility for Arabic speakers with hearing impairments. The framework is also shown to handle various linguistic and phonetic variations of the Arabic language, opening up new possibilities for wider applications in real-world scenarios.
This research contributes to advancing Arabic Visual Speech Recognition technology, enriching the VSR landscape and fostering greater inclusivity in communication for Arabic speakers. fr_FR
dc.language.iso en fr_FR
dc.publisher univ.Blida1 fr_FR
dc.subject Lip reading fr_FR
dc.subject Vision Transformer fr_FR
dc.subject CNN (Convolutional Neural Networks) fr_FR
dc.title Detection and tracking of targets in real-time images Case: Arabic Visual Speech Recognition System BlidaAVS10 (Lipreading) fr_FR
dc.type Other fr_FR

