Detection and tracking of targets in real-time images
Case:
Arabic Visual Speech Recognition System
BlidaAVS10 (Lipreading)

Baaloul, Ali

dc.contributor.author	Baaloul, Ali
dc.date.accessioned	2024-12-10T10:00:25Z
dc.date.available	2024-12-10T10:00:25Z
dc.date.issued	2024
dc.identifier.uri	https://di.univ-blida.dz/jspui/handle/123456789/35139
dc.description	Thesis, electronic format	fr_FR
dc.description.abstract	Automatic visual speech recognition (AVSR) techniques are increasingly used in domains such as manufacturing, public use, and multimedia devices, and Visual Speech Recognition (VSR) is a promising technology for improving communication accessibility for people with hearing impairments. However, most existing VSR systems are designed for languages such as English. This leaves a gap for Arabic, which is spoken by over 400 million people worldwide and has distinctive linguistic and phonetic characteristics. This thesis presents a novel framework for Arabic Visual Speech Recognition that aims to address this gap and serve the needs of the Arabic-speaking hearing-impaired community. The framework integrates state-of-the-art deep learning techniques, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViT), to transcribe Arabic speech from visual cues accurately and efficiently. It relies on a specialized Arabic dataset, curated to capture the diversity and complexity of the Arabic language, which serves as a benchmark for training and evaluating the VSR models and supports their robustness and reliability in real-world applications. The framework uses YOLO, CNNs, and ViT for robust mouth detection and recognition, which enables the extraction of the visual features needed for accurate speech transcription. The experimental results show that the proposed framework achieves promising performance in enhancing communication accessibility for Arabic speakers with hearing impairments. The framework also proves effective in handling linguistic and phonetic variations of the Arabic language, opening up possibilities for wider use in real-world scenarios.
This research contributes significantly to advancing Arabic Visual Speech Recognition technology, enriching the VSR landscape and fostering greater inclusivity in communication for Arabic speakers.	fr_FR
dc.language.iso	en	fr_FR
dc.publisher	univ.Blida1	fr_FR
dc.subject	Lip reading	fr_FR
dc.subject	Vision Transformer	fr_FR
dc.subject	CNN (Convolutional Neural Networks)	fr_FR
dc.title	Detection and tracking of targets in real-time images Case: Arabic Visual Speech Recognition System BlidaAVS10 (Lipreading)	fr_FR
dc.type	Other	fr_FR