In a significant step toward more inclusive technology, researchers at Florida Atlantic University (FAU) have developed a real-time American Sign Language (ASL) interpretation system that uses artificial intelligence (AI) to bridge communication gaps for individuals who are deaf or hard of hearing.
The system, built by a team from FAU’s College of Engineering and Computer Science, leverages the combined strengths of YOLOv11, an advanced object detection model, and MediaPipe, a tool for real-time hand tracking. This integration allows the system to interpret ASL alphabet letters from video input with remarkable accuracy and speed, even under inconsistent lighting or against complex backgrounds, and translate them into readable text.
“What makes this system especially notable is that the entire recognition pipeline—from capturing the gesture to classifying it—operates seamlessly in real time, regardless of varying lighting conditions or backgrounds,” Bader Alsharif, the study’s first author and a PhD candidate in FAU’s Department of Electrical Engineering and Computer Science, said in a press release. “And all of this is achieved using standard, off-the-shelf hardware. This underscores the system’s practical potential as a highly accessible and scalable assistive technology.”
A key innovation lies in the use of skeletal hand mapping. A webcam captures the signer’s hand gestures as video frames, and MediaPipe identifies 21 key points on each hand, including fingertips, knuckles, and the wrist, creating a structural map that YOLOv11 then uses to distinguish between ASL letters, even those that look similar, such as “A” and “T” or “M” and “N.”
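To make the general approach concrete, here is a minimal sketch of such a landmark-assisted recognition loop in Python. It assumes the ultralytics and mediapipe packages and a hypothetical set of custom-trained weights (“asl_yolo11.pt”); it illustrates the overall pipeline described above, not the FAU team’s actual implementation.

```python
import cv2
import mediapipe as mp
from ultralytics import YOLO

# Hypothetical ASL-letter weights; placeholder name, not the study's model file.
model = YOLO("asl_yolo11.pt")
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

cap = cv2.VideoCapture(0)  # standard off-the-shelf webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # MediaPipe returns 21 normalized (x, y, z) landmarks per detected hand.
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        landmarks = [(lm.x, lm.y, lm.z)
                     for lm in result.multi_hand_landmarks[0].landmark]

        # YOLO detects and classifies the letter shown in the frame; the landmark
        # map could also be drawn or used to crop the hand region before inference.
        detections = model(frame, verbose=False)[0]
        for box in detections.boxes:
            letter = detections.names[int(box.cls)]
            print(letter, float(box.conf))

    cv2.imshow("ASL", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```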
This approach helped the system achieve a mean Average Precision (mAP@0.5) of 98.2%, indicating high classification accuracy. The results, published in the journal Sensors, demonstrate minimal latency, making the system ideal for applications requiring real-time communication, such as virtual meetings or interactive kiosks.
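At that threshold, a prediction counts as correct only when its bounding box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5; mAP@0.5 then averages precision across recall levels and classes. The short sketch below, using made-up box coordinates, shows how that overlap check works.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Made-up example: a predicted hand box vs. the labeled ground truth.
predicted = (110, 90, 310, 300)
ground_truth = (100, 100, 300, 310)
print(iou(predicted, ground_truth) >= 0.5)  # True -> counts as a correct detection
```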
The researchers also developed a robust dataset to train and test their model. The ASL Alphabet Hand Gesture Dataset consists of 130,000 images under a wide range of conditions, including various lighting scenarios, hand orientations, and skin tones. Each image was annotated with 21 landmarks to ensure the model could generalize across diverse users and environments.
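One plausible shape for such an annotation record, shown purely for illustration (the field names and structure here are assumptions, not the published dataset’s actual schema), pairs the letter label and capture conditions with the 21 normalized landmark coordinates:

```python
# Hypothetical annotation record for a single training image; illustrative only.
annotation = {
    "image": "images/letter_A_0001.jpg",
    "label": "A",
    "lighting": "low",                    # example condition tag
    "landmarks": [                        # 21 normalized (x, y, z) points
        {"id": 0, "name": "wrist",     "x": 0.52, "y": 0.81, "z": 0.00},
        {"id": 4, "name": "thumb_tip", "x": 0.47, "y": 0.55, "z": -0.02},
        {"id": 8, "name": "index_tip", "x": 0.50, "y": 0.49, "z": -0.03},
        # ... remaining knuckle and fingertip landmarks omitted for brevity
    ],
}
```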
“This project is a great example of how cutting-edge AI can be applied to serve humanity,” Imad Mahgoub, PhD, co-author and Tecore Professor in FAU’s Department of Electrical Engineering and Computer Science, said in the release. “By fusing deep learning with hand landmark detection, our team created a system that not only achieves high accuracy but also remains accessible and practical for everyday use.”
The team later extended the project to explore how another object detection model—YOLOv8—performed when combined with MediaPipe. In a separate study published in Franklin Open, researchers trained the model using a new dataset of nearly 30,000 annotated images. Results from this effort were similarly promising, demonstrating 98% accuracy and recall. The model maintained strong performance across a range of hand gestures and positions, reinforcing its real-world applicability.
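For readers curious how such a model is typically fine-tuned, the sketch below shows a generic YOLOv8 training run with the ultralytics API. The dataset config file (“asl_letters.yaml”) and the hyperparameter values are placeholders, not the configuration reported in the paper.

```python
from ultralytics import YOLO

# Start from pretrained YOLOv8 weights and fine-tune on an ASL letter dataset.
# "asl_letters.yaml" (image paths plus class names A-Z) is a placeholder.
model = YOLO("yolov8n.pt")
model.train(data="asl_letters.yaml", epochs=100, imgsz=640)

# Evaluate on the held-out split; metrics include precision, recall, and mAP@0.5.
metrics = model.val()
print(metrics.box.map50)  # mean Average Precision at IoU 0.5
```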
Beyond academic validation, the system’s practical implications are significant. According to the National Institute on Deafness and Other Communication Disorders, approximately 37.5 million adults in the U.S. report some trouble hearing, while about 11 million are considered deaf or functionally deaf.
“The significance of this research lies in its potential to transform communication for the deaf community by providing an AI-driven tool that translates American Sign Language gestures into text, enabling smoother interactions across education, workplaces, health care, and social settings,” said Mohammad Ilyas, PhD, co-author and professor at FAU.
Future development will focus on expanding the model’s capability from recognizing static letters to interpreting full ASL sentences, as well as optimizing performance on mobile and edge devices. This would allow more natural conversations and greater accessibility on widely used platforms.