
React · Next.js · PWA · TailwindCSS · Shadcn · Vercel · PostgreSQL · Prisma · Node.js · Whisper · ChatGPT · OpenAI
AI Speech Simulator

Overview

The AI Speech Simulator is a progressive web application for speech therapy practice. Instead of traditional isolated exercises, it offers AI-powered conversational scenarios so users can practice speaking in realistic, contextual, interactive situations.

My Role

As the Full-Stack Developer and AI Integration Specialist, I designed and implemented the entire application architecture, from the real-time speech processing pipeline to the AI conversation system. I worked independently to create a comprehensive solution that bridges speech recognition technology with natural language AI.

Key Features / Achievements

  • Engineered a hybrid speech recognition system combining Web Speech API and OpenAI's Whisper for maximum reliability across devices
  • Developed an intelligent conversation management system that maintains context across four key scenarios: workplace interactions, café conversations, job interviews, and social situations
  • Implemented a comprehensive feedback and analysis system that generates personalized recommendations based on speech patterns (see the sketch after this list)
  • Created a Progressive Web App with offline capabilities and cross-device compatibility
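
A minimal sketch of the feedback step, assuming the official openai Node SDK is called from a server-side route with the finished transcript; the model choice, prompt wording, and the generateFeedback helper are illustrative, not the project's actual code.

```ts
// Sketch: turn a finished practice session into personalized recommendations.
// Prompt wording, model choice and types are illustrative assumptions.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface Turn {
  speaker: "user" | "assistant";
  text: string;
}

export async function generateFeedback(scenario: string, transcript: Turn[]) {
  const dialogue = transcript
    .map((t) => `${t.speaker === "user" ? "Learner" : "Partner"}: ${t.text}`)
    .join("\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You are a speech practice coach. Review the learner's turns and give " +
          "concise, encouraging recommendations on clarity, pacing and phrasing.",
      },
      { role: "user", content: `Scenario: ${scenario}\n\n${dialogue}` },
    ],
  });

  return completion.choices[0].message.content;
}
```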

Technical Implementation Details

Speech Recognition Pipeline:

  • Real-time speech-to-text processing with fallback mechanisms (sketched after this list)
  • Client-side audio preprocessing and buffering
  • Intelligent error handling for network and API failures
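
A minimal sketch of the hybrid fallback, assuming the browser's Web Speech API is tried first and a recorded audio blob is sent to Whisper through a Next.js API route only when live recognition is unavailable or errors; the /api/transcribe route and the helper names are illustrative, not the project's actual code.

```ts
// Sketch: hybrid speech-to-text. Prefer the browser's Web Speech API and
// fall back to server-side Whisper. Route and helper names are illustrative.

const BrowserRecognition =
  typeof window !== "undefined"
    ? (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition
    : undefined;

/** Live, on-device recognition of a single utterance (where supported). */
function recognizeWithBrowser(): Promise<string> {
  return new Promise((resolve, reject) => {
    const recognition = new BrowserRecognition();
    recognition.lang = "en-US";
    recognition.interimResults = false;
    recognition.onresult = (e: any) => resolve(e.results[0][0].transcript);
    recognition.onerror = (e: any) => reject(new Error(e.error));
    recognition.start();
  });
}

/** Server-side Whisper transcription of a recorded audio blob. */
async function transcribeWithWhisper(audio: Blob): Promise<string> {
  const form = new FormData();
  form.append("file", audio, "utterance.webm");
  const res = await fetch("/api/transcribe", { method: "POST", body: form });
  if (!res.ok) throw new Error(`Transcription failed: ${res.status}`);
  return (await res.json()).text;
}

/** Pick a strategy per utterance; fall back to Whisper on any browser error. */
export async function transcribe(getRecording: () => Promise<Blob>): Promise<string> {
  if (BrowserRecognition) {
    try {
      return await recognizeWithBrowser();
    } catch {
      // e.g. "network" or "not-allowed"; fall through to the Whisper path.
    }
  }
  return transcribeWithWhisper(await getRecording());
}
```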

Conversation System:

  • Context-aware dialogue management (see the sketch after this list)
  • Scenario-specific conversation tracking
  • Session persistence with full history
  • PDF report generation capabilities
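
A minimal sketch of keeping the dialogue context-aware across turns, assuming a scenario-specific system prompt is replayed together with the accumulated message history on every Chat Completions call; the prompts, model choice, and nextTurn helper are illustrative.

```ts
// Sketch: scenario-specific, context-aware dialogue. The scenario prompt and
// the full turn history are resent on each call so the model keeps context.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Illustrative prompts; the real scenarios cover workplace, café, interview and social settings.
const SCENARIO_PROMPTS: Record<string, string> = {
  "starting-a-new-job":
    "You are a friendly colleague welcoming the user on their first day. Keep replies short and natural.",
  "attending-a-social-event":
    "You are a guest at a social event making small talk with the user.",
};

export async function nextTurn(
  scenario: string,
  history: ChatMessage[],
  userUtterance: string,
): Promise<{ reply: string; history: ChatMessage[] }> {
  const messages: ChatMessage[] = [
    { role: "system", content: SCENARIO_PROMPTS[scenario] },
    ...history,
    { role: "user", content: userUtterance },
  ];

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
  });

  const reply = completion.choices[0].message.content ?? "";
  // Return the updated history so it can be persisted with the session.
  return {
    reply,
    history: [
      ...history,
      { role: "user", content: userUtterance },
      { role: "assistant", content: reply },
    ],
  };
}
```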

Database Architecture:

  • GDPR-compliant voice data storage
  • User session management
  • Conversation transcript archiving
  • Performance-optimized query patterns (see the Prisma sketch after this list)
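
A minimal sketch of session persistence and transcript archiving with the Prisma client, assuming hypothetical Session and Turn models; model and field names are illustrative, not the real schema.

```ts
// Sketch: session persistence and transcript archiving via Prisma.
// The Session/Turn models and their fields are illustrative assumptions.
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Append one transcribed turn to an existing practice session.
export async function archiveTurn(sessionId: string, speaker: string, text: string) {
  return prisma.turn.create({
    data: { sessionId, speaker, text },
  });
}

// Load a session with its ordered transcript for feedback and PDF reports.
// Selecting only the needed columns keeps the query lean on long sessions.
export async function loadSession(sessionId: string) {
  return prisma.session.findUnique({
    where: { id: sessionId },
    include: {
      turns: {
        orderBy: { createdAt: "asc" },
        select: { speaker: true, text: true, createdAt: true },
      },
    },
  });
}
```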

Progressive Web Features:

  • Offline scenario access (see the service-worker sketch after this list)
  • Cross-device compatibility
  • Real-time audio processing
  • Responsive design for mobile access
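
A minimal service-worker sketch for the offline scenario access, assuming core routes and scenario pages are pre-cached at install time and served cache-first; the cache name and URL list are illustrative, and the project could just as well rely on a library such as next-pwa/Workbox.

```ts
// Sketch: cache-first service worker so scenario pages stay usable offline.
// Cache name and pre-cached URLs are illustrative assumptions.
const CACHE = "speech-sim-v1";
const PRECACHE_URLS = [
  "/",
  "/app",
  "/scenario/starting-a-new-job",
  "/scenario/attending-a-social-event",
];

self.addEventListener("install", (event: any) => {
  // Pre-cache the shell and scenario pages while the device is online.
  event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(PRECACHE_URLS)));
});

self.addEventListener("fetch", (event: any) => {
  // Serve from cache when possible; fall back to the network otherwise.
  event.respondWith(
    caches.match(event.request).then((cached) => cached ?? fetch(event.request)),
  );
});
```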

Stack & Tools

Built using Next.js and React for the frontend, with TailwindCSS and Shadcn for UI components. The backend utilizes Node.js with PostgreSQL/Prisma for data management. AI capabilities are powered by OpenAI's ChatGPT and Whisper APIs, with the entire system deployed on Vercel.

Results & Outcomes

The platform successfully delivers a robust speech practice environment that handles complex real-time voice interactions while maintaining natural conversation flow. The hybrid speech recognition approach achieved 95%+ uptime across different devices and browsers, while the context management system maintains coherent dialogues even during extended practice sessions. Key learnings included optimizing AI response times without sacrificing conversation quality and implementing graceful degradation for unreliable browser APIs.

/screens

  • App home
  • Quiz
  • "Attending a Social Event" scenario
  • Scenario feedback views
  • "Starting a New Job" scenario