Building a Speech-to-Text Application with JavaScript
Speech-to-text functionality is already part of our everyday lives. Have you ever been curious about building your own with a programming language?
One powerful tool for this is the Web Speech API, specifically its SpeechRecognition interface, which lets developers bring voice input into web apps with very little code.
The Power of the Web Speech API
The Web Speech API is a native browser API that lets developers add voice input to web applications without additional libraries. Its SpeechRecognition interface captures audio from the device’s microphone and converts speech to text in real time. Browser support varies: at the time of writing, Chromium-based browsers and Safari expose the interface under the webkit prefix, while Firefox does not enable it by default, so check for support before relying on it. Some browsers, such as Chrome, send the audio to a server for recognition, so the feature may not work offline.
Setting Up the HTML Structure
The basic HTML structure for our speech-to-text application is straightforward:
<button onclick="startListening()">Start Listening</button>
<div id="output"></div>
This structure includes a button to initiate the listening process and a div to display the transcribed text.
Implementing Speech Recognition
Initializing the SpeechRecognition Object
To begin, we create a SpeechRecognition object:
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
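This code falls back to the webkit-prefixed constructor, which is the name Chromium-based browsers and Safari expose. In a browser that supports neither name, SpeechRecognition is undefined and the new call throws a TypeError.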
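Because the constructor may be missing in some browsers, a guard around the lookup can help. The following is a minimal sketch, assuming a hypothetical helper named createRecognizer (not part of the Web Speech API) that takes the global object as a parameter, so it can also be tried outside a browser, and returns null when the API is unavailable.

```javascript
// Hypothetical helper (not part of the Web Speech API): looks up the
// standard or webkit-prefixed constructor on the given global object.
function createRecognizer(root) {
  const Ctor = root.SpeechRecognition || root.webkitSpeechRecognition;
  // Return null instead of throwing when the environment lacks support.
  return typeof Ctor === 'function' ? new Ctor() : null;
}

// In a page this would normally be called with `window`.
const recognizer = createRecognizer(globalThis);
if (!recognizer) {
  console.log('Speech recognition is not supported in this environment');
}
```

With a guard like this, the page can hide the button or show a message instead of failing with an uncaught error.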
Configuring Recognition Settings
We can configure various settings to customize the recognition process:
recognition.continuous = true;
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
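Here, continuous keeps the session open after the first phrase instead of stopping at the first result, lang sets the recognition language, interimResults controls whether partial results are delivered before they are final, and maxAlternatives caps the number of alternative transcriptions returned per result.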
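If the same settings are reused in several places, they can be grouped into one object. This is only a sketch: DEFAULT_SETTINGS and configureRecognition are hypothetical names introduced for illustration, and a plain object stands in for a real SpeechRecognition instance so the snippet runs anywhere.

```javascript
// Hypothetical defaults mirroring the four properties set above.
const DEFAULT_SETTINGS = {
  continuous: true,      // keep listening after the first phrase
  lang: 'en-US',         // BCP 47 language tag
  interimResults: false, // deliver final results only
  maxAlternatives: 1,    // one transcription per result
};

// Hypothetical helper: copies the defaults, then any overrides, onto the recognizer.
function configureRecognition(recognition, overrides = {}) {
  return Object.assign(recognition, DEFAULT_SETTINGS, overrides);
}

// A plain object stands in for a SpeechRecognition instance here.
const settings = configureRecognition({}, { lang: 'en-GB', interimResults: true });
console.log(settings.lang, settings.interimResults);
```

Passing overrides keeps the defaults in one place while letting a page switch language or turn on interim results.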
Handling Recognition Events
Key events in the recognition process include:
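let restartOnEnd = true;
recognition.onresult = (event) => {
  // Show the best alternative of the most recent result
  const transcript = event.results[event.results.length - 1][0].transcript;
  document.getElementById('output').textContent = transcript;
};
recognition.onerror = (event) => {
  console.error('Speech recognition error:', event.error);
  // Do not restart if microphone access was blocked
  if (event.error === 'not-allowed' || event.error === 'service-not-allowed') {
    restartOnEnd = false;
  }
};
recognition.onend = () => {
  console.log('Speech recognition ended');
  // Restart so listening continues, unless access was blocked
  if (restartOnEnd) {
    recognition.start();
  }
};
These handlers display the most recent transcript, log errors, and restart recognition whenever a session ends. The restartOnEnd flag stops that restart after a permission error, which would otherwise fail again in a loop.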
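The onresult handler above shows only the most recent result. When interimResults is enabled, or when the full text of a continuous session is needed, the results can be split into final and interim text. This is a rough sketch: collectTranscripts is a hypothetical helper, and the mock event only imitates the shape of a real result event (resultIndex, results, isFinal, transcript).

```javascript
// Hypothetical helper: splits the results of one `result` event into
// finalized text and still-changing interim text.
function collectTranscripts(event) {
  let finalText = '';
  let interimText = '';
  // event.resultIndex is the first result that changed in this event.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // best alternative
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Mock event shaped like a speech recognition result event (illustration only).
const mockEvent = {
  resultIndex: 1,
  results: [
    Object.assign([{ transcript: 'hello ' }], { isFinal: true }),
    Object.assign([{ transcript: 'world' }], { isFinal: true }),
    Object.assign([{ transcript: ' again' }], { isFinal: false }),
  ],
};
console.log(collectTranscripts(mockEvent));
```

In a page, the final text could be appended to the output element while the interim text is shown in a lighter style until it is confirmed.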
Starting the Recognition Process
The startListening function initiates speech recognition:
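function startListening() {
  try {
    recognition.start(); // throws InvalidStateError if already running
    console.log('Ready to receive speech input');
  } catch (err) {
    console.warn('Could not start recognition:', err.message);
  }
}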
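Calling start() on a recognizer that is already running throws an InvalidStateError, so it can be useful to track the listening state explicitly and to offer a way to stop. The sketch below assumes a hypothetical wrapper, createListenController, and uses a fake recognizer object so it runs without a browser; only start() and stop() are real SpeechRecognition methods.

```javascript
// Hypothetical wrapper: tracks whether recognition is active so that
// start() is never called twice in a row.
function createListenController(recognition) {
  let active = false;
  return {
    start() {
      if (active) return false; // already listening, nothing to do
      recognition.start();
      active = true;
      return true;
    },
    stop() {
      if (!active) return false; // not listening, nothing to stop
      recognition.stop();
      active = false;
      return true;
    },
    // Call this from the recognizer's end handler to keep the flag in sync.
    markEnded() { active = false; },
    isActive: () => active,
  };
}

// Fake recognizer that records calls (illustration only).
const calls = [];
const fakeRecognition = {
  start: () => calls.push('start'),
  stop: () => calls.push('stop'),
};
const controller = createListenController(fakeRecognition);
controller.start();
controller.start(); // ignored: already active
controller.stop();
console.log(calls);
```

A second button could call controller.stop(), giving the user a way to end the session.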
Handling Microphone Permissions
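Speech recognition needs microphone access. Browsers typically prompt for it when recognition starts, but requesting it up front with getUserMedia lets the page react early if access is refused:
navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    console.log('Microphone permission granted');
    // Release the microphone; the stream was only needed for the prompt
    stream.getTracks().forEach(function(track) { track.stop(); });
  })
  .catch(function(err) {
    console.error('Microphone access failed:', err);
  });
This code logs whether microphone access succeeded or failed and releases the stream once permission is granted. A real application would show the outcome to the user rather than only logging it.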
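A rejected getUserMedia call is not always a denied permission; the error's name says what went wrong. The following sketch maps a few common error names to readable messages. describeMicError and its message texts are hypothetical, and only three of the possible error names are covered.

```javascript
// Hypothetical helper: turns a getUserMedia rejection into a readable message.
function describeMicError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Microphone access was denied. Allow it in the browser settings and try again.';
    case 'NotFoundError':
      return 'No microphone was found on this device.';
    case 'NotReadableError':
      return 'The microphone could not be read; another application may be using it.';
    default:
      return 'Could not access the microphone.';
  }
}

// Plain objects stand in for real DOMException instances here.
console.log(describeMicError({ name: 'NotFoundError' }));
```

The returned string could be written into the output element from the catch handler so the user sees why listening did not start.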
Full code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Speech to Text</title>
</head>
<body>
<button onclick="startListening()">Start Listening</button>
<div id="output"></div>
<script>
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;
recognition.lang = 'en-US';
recognition.interimResults = false;
recognition.maxAlternatives = 1;
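let restartOnEnd = true;
recognition.onresult = (event) => {
  // Show the best alternative of the most recent result
  const transcript = event.results[event.results.length - 1][0].transcript;
  document.getElementById('output').textContent = transcript;
};
recognition.onerror = (event) => {
  console.error('Speech recognition error:', event.error);
  // Do not restart if microphone access was blocked
  if (event.error === 'not-allowed' || event.error === 'service-not-allowed') {
    restartOnEnd = false;
  }
};
recognition.onend = () => {
  console.log('Speech recognition ended');
  // Restart so listening continues, unless access was blocked
  if (restartOnEnd) {
    recognition.start();
  }
};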
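function startListening() {
  try {
    recognition.start(); // throws InvalidStateError if already running
    console.log('Ready to receive speech input');
  } catch (err) {
    console.warn('Could not start recognition:', err.message);
  }
}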
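navigator.mediaDevices.getUserMedia({ audio: true })
  .then(function(stream) {
    console.log('Microphone permission granted');
    // Release the microphone; the stream was only needed for the prompt
    stream.getTracks().forEach(function(track) { track.stop(); });
  })
  .catch(function(err) {
    console.error('Microphone access failed:', err);
  });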
</script>
</body>
</html>
GitHub link: https://github.com/mdshamsfiroz/Arth-4-Tasks/blob/main/FullStack/Task1.html
Potential Applications and Extensions
This basic implementation can be extended for various applications:
- Voice commands for web navigation
- Real-time captioning for video content
- Voice-controlled forms or searches
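As a rough illustration of the first idea, a transcript can be matched against a small table of phrases. Everything here is hypothetical: the COMMANDS table, the action names, and the matchCommand helper are made up for this sketch, and simple substring matching is assumed.

```javascript
// Hypothetical command table: spoken phrase -> action name.
const COMMANDS = {
  'go home': 'navigate-home',
  'scroll down': 'scroll-down',
  'search': 'open-search',
};

// Hypothetical helper: returns the action for the first phrase found
// in the transcript, or null when nothing matches.
function matchCommand(transcript, commands = COMMANDS) {
  const spoken = transcript.trim().toLowerCase();
  for (const [phrase, action] of Object.entries(commands)) {
    if (spoken.includes(phrase)) return action;
  }
  return null;
}

console.log(matchCommand('  Please scroll down '));
```

Substring matching is deliberately simple and can misfire (for example, "research" contains "search"), so a real application would likely match whole words or use a stricter grammar.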
Conclusion
Implementing speech-to-text functionality in web applications using JavaScript and the Web Speech API is both powerful and surprisingly simple. I encourage you to experiment with this code.
So, whether you’re a tech enthusiast, a professional, or just someone who wants to learn more, I invite you to follow me on this journey. Subscribe to my blog and follow me on social media to stay in the loop and never miss a post.
Together, let’s explore the exciting world of technology and all it offers. I can’t wait to connect with you!
Connect with me on social media: https://linktr.ee/mdshamsfiroz
Happy coding! Happy learning!