Vision-aware interactive AI avatar with voice, emotion detection, and gesture recognition. Built for Dubai AI Summit.
SIMO is a real-time interactive AI avatar that sees, hears, thinks, and speaks. Built for a Dubai AI Summit demo, it uses GPT-4 Vision and MediaPipe to detect faces, read emotions, and recognize hand gestures. When someone approaches, it auto-greets them. It holds natural voice conversations using Whisper for speech-to-text, GPT-4o-mini for responses, and ElevenLabs for voice synthesis, all synced with HeyGen avatar lip movements.
Demo'd at Dubai AI Summit. Demonstrates real-time multi-modal AI interaction combining vision, voice, and natural language in a single coherent experience.