Integrating Microphone Input with ChatGPT Using JavaScript

Oct 23, 2024

Step-by-Step Guide

1. Set Up HTML Structure

Create a simple HTML structure with buttons to start and stop microphone input, and a display area for the ChatGPT response.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice Input to ChatGPT</title>
</head>
<body>
    <h1>Use Your Voice to Chat with GPT</h1>
    <button id="startButton">Start Listening</button>
    <button id="stopButton" disabled>Stop Listening</button>
    <div id="response"></div>
    <script src="app.js"></script>
</body>
</html>

2. Implement JavaScript for Speech Recognition and API Integration

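Use the Web Speech API to capture voice input in the browser, then send the recognized text on to the OpenAI API. The request to OpenAI should not be made from the page itself: dotenv and process.env exist only in Node.js, and a key shipped to the browser is visible to anyone who opens the developer tools. The script below therefore posts each transcript to an endpoint on your own server, assumed here to be POST /api/chat, which holds the key and forwards the prompt to OpenAI's Chat Completions API.

// app.js (runs in the browser; the API key stays on the server)

// Set up speech recognition (some browsers expose it with a webkit prefix)
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;      // keep listening until Stop is clicked
recognition.interimResults = false; // deliver only final results

const startButton = document.getElementById('startButton');
const stopButton = document.getElementById('stopButton');
const responseDiv = document.getElementById('response');

startButton.addEventListener('click', () => {
    recognition.start();
    startButton.disabled = true;
    stopButton.disabled = false;
});
stopButton.addEventListener('click', () => {
    recognition.stop();
    startButton.disabled = false;
    stopButton.disabled = true;
});
// Also reset the buttons when the browser ends recognition on its own
recognition.onend = () => {
    startButton.disabled = false;
    stopButton.disabled = true;
};

recognition.onresult = async (event) => {
    // In continuous mode results accumulate; the newest one is last
    const transcript = event.results[event.results.length - 1][0].transcript.trim();
    if (!transcript) return;
    console.log('You said:', transcript);
    // Send the recognized text to the server, which calls ChatGPT
    try {
        // Assumption: '/api/chat' is your own endpoint and replies with { reply: "..." }
        const res = await fetch('/api/chat', {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ prompt: transcript }),
        });
        if (!res.ok) {
            throw new Error(`Server responded with status ${res.status}`);
        }
        const data = await res.json();
        responseDiv.innerText = data.reply.trim();
    } catch (error) {
        console.error("Error fetching response from ChatGPT:", error);
        responseDiv.innerText = "Error fetching response. Please try again.";
    }
};
recognition.onerror = (event) => {
    console.error("Speech recognition error detected: " + event.error);
};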
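
If the OpenAI request is made from server-side code, which keeps the key out of the page, it can be sketched as a single function that takes the prompt and returns the reply text. This is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and OpenAI's Chat Completions REST endpoint. The function name askChatGPT, the gpt-4o-mini model choice, and the fetchImpl parameter are illustrative assumptions rather than anything this guide prescribes.

```javascript
// Sketch only: askChatGPT, the model name, and fetchImpl are illustrative assumptions.
const OPENAI_CHAT_URL = 'https://api.openai.com/v1/chat/completions';

// Send one user prompt to the Chat Completions endpoint and return the reply text.
// fetchImpl defaults to the global fetch and can be swapped out for testing.
async function askChatGPT(prompt, apiKey, fetchImpl = fetch) {
    const res = await fetchImpl(OPENAI_CHAT_URL, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${apiKey}`,
        },
        body: JSON.stringify({
            model: 'gpt-4o-mini', // assumption: use any chat model your account can access
            messages: [{ role: 'user', content: prompt }],
            max_tokens: 150,
        }),
    });
    if (!res.ok) {
        // Attach the HTTP status so callers can react to it (for example 429)
        const error = new Error(`OpenAI request failed with status ${res.status}`);
        error.status = res.status;
        throw error;
    }
    const data = await res.json();
    return data.choices[0].message.content.trim();
}
```

A server route (for example a hypothetical POST /api/chat) could read the prompt from the request body, call askChatGPT(prompt, process.env.OPENAI_API_KEY), and return the result to the page as JSON.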

3. Configure Environment Variables

Create a .env file in your project directory containing your OpenAI API key, and keep it out of version control (for example, by listing it in .gitignore):

OPENAI_API_KEY=your_actual_api_key_here
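
A missing or misspelled key tends to show up later as a confusing authentication error. A small guard can fail fast instead. This is a sketch under the assumption that the key is read in Node.js through process.env, which is where dotenv places the values from .env; the helper name requireEnv is hypothetical.

```javascript
// Sketch only: requireEnv is a hypothetical helper, not part of dotenv or the OpenAI SDK.
// Returns the trimmed value of an environment variable, or throws if it is missing or blank.
function requireEnv(name, env = process.env) {
    const value = env[name];
    if (typeof value !== 'string' || value.trim() === '') {
        throw new Error(`Missing environment variable ${name}. Add it to your .env file.`);
    }
    return value.trim();
}

// Example usage in Node.js, after dotenv has loaded the file:
// const apiKey = requireEnv('OPENAI_API_KEY');
```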

4. Run Your Application

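For any code that runs under Node.js and uses import syntax, enable ES modules by either renaming the file to .mjs or adding "type": "module" to your package.json. Then serve the page from a local server and open it in a browser that supports speech recognition. Opening the HTML file directly from disk can be less dependable, because browsers handle microphone permissions for file:// pages inconsistently.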
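
Speech recognition is not implemented in every browser, so it can help to check for it before the buttons are used. The sketch below assumes the same two constructor names used in step 2 (SpeechRecognition and its webkit-prefixed variant) and the element ids from step 1; the helper name getSpeechRecognition is hypothetical.

```javascript
// Sketch only: getSpeechRecognition is a hypothetical helper name.
// Returns the recognition constructor if the given global object provides one, else null.
function getSpeechRecognition(globalObj) {
    if (!globalObj) return null;
    return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// In a browser, disable the Start button and explain why when support is missing.
if (typeof window !== 'undefined') {
    if (!getSpeechRecognition(window)) {
        const startBtn = document.getElementById('startButton');
        const output = document.getElementById('response');
        if (startBtn) startBtn.disabled = true;
        if (output) output.innerText = 'Speech recognition is not supported in this browser.';
    }
}
```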

Conclusion

This setup allows you to use voice input from your microphone as prompts for ChatGPT, providing a seamless way to interact with AI using natural language. Be sure to handle any potential errors gracefully and consider implementing rate-limiting strategies if you encounter quota issues with the OpenAI API.
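
One common way to cope with quota or rate-limit responses is to retry with exponential backoff. The sketch below assumes the failure is signalled by a thrown error carrying a numeric status of 429; the helper name withBackoff, the retry count, and the delays are illustrative choices, and the sleep option exists only so the waiting can be replaced in tests.

```javascript
// Sketch only: withBackoff and its defaults are illustrative assumptions.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async task, retrying when it throws an error with status 429.
// The wait doubles after each failed attempt: baseDelayMs, then 2x, then 4x, and so on.
async function withBackoff(task, { retries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await task();
        } catch (error) {
            const retryable = Boolean(error) && error.status === 429;
            // Give up on non-rate-limit errors or once the retry budget is spent
            if (!retryable || attempt >= retries) throw error;
            await sleep(baseDelayMs * 2 ** attempt);
        }
    }
}
```

Wrapping the request that fetches the ChatGPT response in withBackoff would retry it up to three times before surfacing the error. The task is expected to throw an error whose status property is set, since fetch itself resolves normally on a 429 response.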


Connect with me on social media: https://linktr.ee/mdshamsfiroz

Happy coding! Happy learning!