University of Bahrain · Senior Project 2026 · ITCE 499
Real-Time · Speech-To-Text · Wearable · Accessible
Smart glasses-based assistive communication for deaf and hard-of-hearing users. Real-time speech-to-text, environmental noise detection, and speaker differentiation, with an offline mode that keeps transcription entirely on-device.
Capabilities
Six intelligent modules working in unison to break communication barriers.
01
Spoken language is transcribed in real time and displayed as live captions on the OLED screen built into the user's glasses, typically within 1–3 seconds, with no need to look down at a phone.
02
Bluetooth Low Energy (BLE) provides wireless, low-latency communication between the smartphone and the ESP32-S3 wearable device, with minimal power draw for all-day use.
03
A hybrid fingerprint + YAMNet model identifies important sounds — doorbells, alarms, sirens — and alerts the user with on-screen labels and app notifications.
04
Using Resemblyzer voice embeddings and cosine similarity, AURIS labels each speaker in a conversation — "Speaker 1", "Speaker 2" — in real time without prior registration.
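The registration-free labelling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `SpeakerLabeler` class, the 0.75 threshold, and the running-average update are all assumptions, and the embedding vectors are assumed to be non-zero and to come from the voice encoder.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class SpeakerLabeler:
    """Hypothetical helper: assigns 'Speaker N' labels without enrollment."""

    def __init__(self, threshold: float = 0.75):
        self.threshold = threshold  # assumed value; would need tuning on real audio
        self.centroids = []         # one running-mean embedding per known speaker
        self.counts = []            # number of utterances seen per speaker

    def label(self, embedding) -> str:
        embedding = np.asarray(embedding, dtype=float)
        if self.centroids:
            sims = [cosine_similarity(embedding, c) for c in self.centroids]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Close enough to a known voice: update its running mean.
                n = self.counts[best]
                self.centroids[best] = (self.centroids[best] * n + embedding) / (n + 1)
                self.counts[best] = n + 1
                return f"Speaker {best + 1}"
        # No sufficiently similar voice yet: open a new speaker slot.
        self.centroids.append(embedding)
        self.counts.append(1)
        return f"Speaker {len(self.centroids)}"
```

Each new utterance embedding is compared against the speakers heard so far; a match reuses the existing label, and anything below the threshold becomes the next numbered speaker.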
05
Online mode uses OpenAI Whisper for highest accuracy across 96+ languages. Offline mode switches to Apple Speech Recognition locally on-device — no internet needed.
06
Heavy processing happens on the smartphone rather than on the glasses. In offline mode nothing is sent to the cloud, so sensitive conversations stay entirely on the user's device when preferred.
A real-time pipeline from sound capture to visual caption display.
Input Layer
MEMS Microphone
Compact, high-sensitivity microphone captures audio from the user's environment.
Processing
Flutter Mobile App
Handles STT, noise detection routing, and BLE communication. Acts as the main compute unit.
Intelligence
Whisper / Apple STT + FastAPI
Dual-mode transcription with cloud Whisper for accuracy or on-device Apple STT for privacy.
Communication
BLE Protocol
Bidirectional low-latency wireless link between smartphone and ESP32-S3 wearable.
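One practical detail of sending captions over BLE is packet size: with the default ATT MTU of 23 bytes, a single write carries at most 20 bytes of payload. The sketch below shows one way caption text could be split into BLE-sized chunks without cutting a multi-byte UTF-8 character in half. The function name `chunk_utf8` is hypothetical, and whether AURIS chunks captions this way (or negotiates a larger MTU) is an assumption. The real app is written in Flutter; Python is used here only for illustration.

```python
def chunk_utf8(text: str, max_bytes: int = 20) -> list[bytes]:
    """Split text into UTF-8 chunks of at most max_bytes, never splitting a character."""
    if max_bytes < 4:
        # A single UTF-8 character can be up to 4 bytes long.
        raise ValueError("max_bytes must be at least 4")
    data = text.encode("utf-8")
    chunks = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # If the cut lands on a continuation byte (0b10xxxxxx), back off
        # until the next chunk starts on a character boundary.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks
```

The receiving side can simply append the chunks in order; because every chunk is valid UTF-8 on its own, the display can also render partial captions as they arrive.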
Embedded
ESP32-S3 Microcontroller
Receives captioned text and drives the OLED display; integrated BLE support.
Output
OLED HUD Display
1.3″ OLED projects captions via beam-splitter optics directly into the user's field of view.
End-to-end Latency
1–3s
From speech input to caption visible on the OLED display.
Online Mode
OpenAI Whisper via backend server — highest accuracy, 96+ languages, noise-robust.
Offline Mode
Apple Speech Recognition — fully on-device, privacy-safe, no internet required.
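The choice between the two modes can be sketched as a small decision function. This is only an illustration: the names `SttMode` and `choose_stt_mode` are hypothetical, and the automatic fallback to offline mode when there is no connectivity is an assumption rather than documented behaviour.

```python
from enum import Enum


class SttMode(Enum):
    ONLINE = "whisper_backend"   # Whisper via the backend server
    OFFLINE = "on_device"        # on-device speech recognition


def choose_stt_mode(has_internet: bool, prefer_privacy: bool) -> SttMode:
    """Pick a transcription mode from connectivity and the user's privacy preference."""
    # A privacy preference always wins; without connectivity, offline is the only option.
    if prefer_privacy or not has_internet:
        return SttMode.OFFLINE
    return SttMode.ONLINE
```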
Database
Firebase
Realtime Database for user data & custom sounds. Firestore for persistent settings.
Noise Detection Engine
Hybrid
Custom fingerprint matching for personalized sounds + YAMNet for general classification.
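A rough sketch of how such a hybrid decision might be ordered: try the user's custom fingerprints first, then fall back to general classifier scores. Everything here is an assumption made for illustration, including the function name `detect_sound`, the cosine-similarity matching, both thresholds, and the `ALERT_LABELS` set. The `general_scores` dictionary stands in for the per-class scores a general model such as YAMNet would produce.

```python
import numpy as np

# Hypothetical set of general-classifier labels that should trigger an alert.
ALERT_LABELS = {"Doorbell", "Alarm", "Siren"}


def detect_sound(features, custom_fingerprints, general_scores,
                 match_threshold=0.9, score_threshold=0.5):
    """Return the label to show the user, or None if nothing important was heard."""
    features = np.asarray(features, dtype=float)

    # Step 1: compare against the user's personalized fingerprints.
    best_name, best_sim = None, -1.0
    for name, fingerprint in custom_fingerprints.items():
        fp = np.asarray(fingerprint, dtype=float)
        sim = float(np.dot(features, fp) /
                    (np.linalg.norm(features) * np.linalg.norm(fp)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is not None and best_sim >= match_threshold:
        return best_name

    # Step 2: fall back to the general classifier, keeping only alert-worthy labels.
    candidates = {label: score for label, score in general_scores.items()
                  if label in ALERT_LABELS and score >= score_threshold}
    if candidates:
        return max(candidates, key=candidates.get)
    return None
```

Checking personalized sounds first lets a user-recorded sound take priority over a generic class, while the general model still covers sounds the user never registered.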
Technology
Mobile & Frontend
AI & Backend
Hardware
The Team
Project Member
Project Member
Project Member
Project Supervisor
Dr. Aysha Al-Sayed
Project Supervisor · College of IT, UoB
Contact
Questions about the project, collaboration requests, or feedback — we'd love to hear from you.
contact@auris-glasses.it.com
Institution
University of Bahrain
Department
Computer Engineering, College of IT
Academic Year
2025–2026 · Semester 2