Building a Speech-to-Text Application with JavaScript

mdshamsfiroz · 3 min read · Oct 17, 2024

In this technical era, speech-to-text features already show up in our everyday lives. Have you ever been curious about building one yourself with a programming language?

Let me tell you how. One powerful tool for the job is the Web Speech API, specifically its SpeechRecognition interface, which lets developers add voice input to web apps with ease.

The Power of the Web Speech API

The Web Speech API is a native browser API that enables developers to incorporate voice data into web applications without additional libraries. It provides a simple way to access a device's microphone and convert speech to text in real time. Browser support varies: at the time of writing, speech recognition is available in Chromium-based browsers and Safari but not in Firefox, so it covers most mainstream browsers and is a practical choice for many web projects.
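
Before wiring anything up, it is worth confirming that the current browser actually exposes the API. A minimal feature check, assuming nothing beyond the standard and webkit-prefixed constructor names, could look like this:

// Look up the constructor under its standard and webkit-prefixed names.
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (!SpeechRecognition) {
  // Degrade gracefully instead of throwing on unsupported browsers.
  console.warn('Speech recognition is not supported in this browser.');
}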

Setting Up the HTML Structure

The basic HTML structure for our speech-to-text application is straightforward:

<button onclick="startListening()">Start Listening</button>
<div id="output"></div>

This structure includes a button to initiate the listening process and a div to display the transcribed text.

Implementing Speech Recognition

Initializing the SpeechRecognition Object

To begin, we create a SpeechRecognition object:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

This code accounts for vendor prefixing, ensuring compatibility across different browsers.

Configuring Recognition Settings

We can configure various settings to customize the recognition process:

recognition.continuous = true;      // keep listening after each phrase
recognition.lang = 'en-US';         // language to recognize
recognition.interimResults = false; // only deliver final transcripts
recognition.maxAlternatives = 1;    // one transcription candidate per result

These settings control continuous recognition, language, interim results, and the number of alternative transcriptions.
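
For reference, here are a few illustrative variants of the same settings (not part of the final code below) showing how the object could be tuned for other use cases:

// Variant: capture a single phrase per button press instead of listening continuously.
recognition.continuous = false;

// Variant: deliver partial guesses while the user is still speaking.
recognition.interimResults = true;

// Variant: recognize Hindi instead of US English.
recognition.lang = 'hi-IN';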

Handling Recognition Events

Key events in the recognition process include:

recognition.onresult = (event) => {
  // Use the transcript of the most recent result.
  const transcript = event.results[event.results.length - 1][0].transcript;
  document.getElementById('output').textContent = transcript;
};

recognition.onerror = (event) => {
  console.error('Speech recognition error:', event.error);
};

recognition.onend = () => {
  console.log('Speech recognition ended');
  // Restart so listening continues indefinitely.
  recognition.start();
};

These event handlers manage successful recognition, errors, and the end of recognition sessions.
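
If you enable interimResults in the configuration above, onresult also fires with in-progress guesses. A small sketch of how you might distinguish them, using the standard isFinal flag:

recognition.onresult = (event) => {
  const result = event.results[event.results.length - 1];
  const text = result[0].transcript;
  // isFinal is false for partial guesses and true once the phrase is settled.
  document.getElementById('output').textContent = result.isFinal ? text : text + ' …';
};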

Starting the Recognition Process

The startListening function initiates speech recognition:

function startListening() {
  recognition.start();
  console.log('Ready to receive speech input');
}
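
Two practical notes on this function: the onend handler above restarts recognition unconditionally, so there is no way to stop listening, and calling start() while a session is already running throws an InvalidStateError. Here is a sketch of one way to address both, using a hypothetical listening flag and a stopListening helper that are not part of the original code:

let listening = false; // hypothetical flag gating the automatic restart

function startListening() {
  listening = true;
  try {
    recognition.start();
    console.log('Ready to receive speech input');
  } catch (err) {
    // start() throws if recognition is already running.
    console.warn('Recognition already started:', err);
  }
}

function stopListening() {
  listening = false;
  recognition.stop();
}

recognition.onend = () => {
  // Only resume automatically while the user still wants to listen.
  if (listening) {
    recognition.start();
  }
};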

Handling Microphone Permissions

It’s crucial to request and handle microphone access:

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    console.log('Microphone permission granted');
  })
  .catch(function(err) {
    console.log('Microphone permission denied:', err);
  });

This requests microphone access up front, so the user sees the permission prompt before pressing Start Listening, and logs whether access was granted or denied.
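
Since this stream is only used to trigger the permission prompt (SpeechRecognition opens the microphone itself when start() is called), it is reasonable to release it right away. A small variation:

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    console.log('Microphone permission granted');
    // Release the tracks so the browser's recording indicator turns off.
    stream.getTracks().forEach((track) => track.stop());
  })
  .catch(function(err) {
    console.log('Microphone permission denied:', err);
  });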

Full code:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Speech to Text</title>
</head>
<body>
  <button onclick="startListening()">Start Listening</button>
  <div id="output"></div>

  <script>
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();

    recognition.continuous = true;
    recognition.lang = 'en-US';
    recognition.interimResults = false;
    recognition.maxAlternatives = 1;

    recognition.onresult = (event) => {
      const transcript = event.results[event.results.length - 1][0].transcript;
      document.getElementById('output').textContent = transcript;
    };

    recognition.onerror = (event) => {
      console.error('Speech recognition error:', event.error);
    };

    recognition.onend = () => {
      console.log('Speech recognition ended');
      recognition.start();
    };

    function startListening() {
      recognition.start();
      console.log('Ready to receive speech input');
    }

    navigator.mediaDevices.getUserMedia({ audio: true })
      .then(function(stream) {
        console.log('Microphone permission granted');
      })
      .catch(function(err) {
        console.log('Microphone permission denied:', err);
      });
  </script>
</body>
</html>
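
To try it out, save the markup above as an .html file and open it in a Chromium-based browser or Safari. Keep in mind that microphone access is only granted in secure contexts (for example an https:// page or localhost), and that in Chrome the recognition itself runs on a remote service, so an internet connection is required.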

Output interface: the page shows a Start Listening button with the most recent transcription rendered below it.

GitHub link: https://github.com/mdshamsfiroz/Arth-4-Tasks/blob/main/FullStack/Task1.html

Potential Applications and Extensions

This basic implementation can be extended for various applications:

  • Voice commands for web navigation (see the sketch after this list)
  • Real-time captioning for video content
  • Voice-controlled forms or searches
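
As a taste of the first idea, here is a minimal sketch (with a hypothetical commands map, not part of the code above) that runs a handful of spoken phrases as page actions inside the existing onresult handler:

// Hypothetical map from spoken phrases to page actions.
const commands = {
  'scroll down': () => window.scrollBy(0, window.innerHeight),
  'scroll up': () => window.scrollBy(0, -window.innerHeight),
  'go home': () => { window.location.href = '/'; },
};

recognition.onresult = (event) => {
  const transcript = event.results[event.results.length - 1][0].transcript.trim().toLowerCase();
  document.getElementById('output').textContent = transcript;
  // Run the matching command, if any.
  if (commands[transcript]) {
    commands[transcript]();
  }
};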

Conclusion

Implementing speech-to-text functionality in web applications using JavaScript and the Web Speech API is both powerful and surprisingly simple. I encourage you to experiment with this code.

So, whether you’re a tech enthusiast, a professional, or just someone who wants to learn more, I invite you to follow me on this journey. Subscribe to my blog and follow me on social media to stay in the loop and never miss a post.

Together, let's explore the exciting world of technology and all it offers. I can't wait to connect with you!

Connect with me on social media: https://linktr.ee/mdshamsfiroz

Happy coding! Happy learning!
