Instant Voice-to-Text
- Tanu Varshney

- Sep 8
- 2 min read
Click a Button, Speak, and Instantly See Your Words Appear as Text
In this article, you’ll build a small but powerful web app in Python that records your voice in the browser and sends it to OpenAI’s Whisper model for transcription, all in under 5 minutes.
Imagine clicking a button, speaking, and instantly seeing your words appear as text!
What We’ll Build
A simple browser page with:
Record button – Gradio microphone input
Submit button – Sends audio to Whisper
Text output – Displays the transcription

Prerequisites
Before we start, make sure you have:
Python 3.8+
An OpenAI API key (create one in your OpenAI account dashboard)
Basic familiarity with a terminal

Step 1: Project Setup
1. Create a Project Folder
mkdir voice-to-text
cd voice-to-text
python -m venv venv
2. Activate Virtual Environment (Windows PowerShell)
.\venv\Scripts\activate
⚠️ If PowerShell blocks scripts, run this once:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Then reactivate:
.\venv\Scripts\activate
3. Create requirements.txt
Inside your project folder, create a file called requirements.txt and add:
openai
gradio
python-dotenv
soundfile
4. Install Dependencies
With your virtual environment activated, run:
pip install -r requirements.txt
This installs all your project dependencies at once, making setup quick and reproducible.
5. Create a .env File
Create a file called .env in your project root:
OPENAI_API_KEY=sk-YourKeyHere
This keeps your API key out of your source code. If you use Git, also add .env to your .gitignore so the key is never committed.
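As an optional sanity check before writing any app code, this one-liner (run with your virtual environment active) confirms the key loads:
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('key loaded:', bool(os.getenv('OPENAI_API_KEY')))"
It should print key loaded: True.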
Step 2: The Code
Create a file app.py and add:
import os
import tempfile

import gradio as gr
import soundfile as sf
from dotenv import load_dotenv
from openai import OpenAI

# Load OPENAI_API_KEY from .env into the environment
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def transcribe(audio):
    if audio is None:
        return "No audio recorded."
    if isinstance(audio, (tuple, list)):
        # Numpy-style input arrives as (sample_rate, data); write it to a temp WAV
        sr, data = audio
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        sf.write(tmp.name, data, sr)
        filepath = tmp.name
    else:
        # With type="filepath" below, Gradio hands us a path to the recording
        filepath = audio
    with open(filepath, "rb") as f:
        resp = client.audio.transcriptions.create(model="whisper-1", file=f)
    return resp.text

with gr.Blocks() as demo:
    gr.Markdown("## 🎤 Voice-to-Text Mini App")
    # Gradio 4+ uses sources=["microphone"]; the old source="microphone" raises a TypeError
    audio = gr.Audio(sources=["microphone"], type="filepath", label="Speak here")
    btn = gr.Button("Submit")
    output = gr.Textbox(label="Transcribed Text")
    btn.click(fn=transcribe, inputs=audio, outputs=output)

if __name__ == "__main__":
    demo.launch()
Step 3: Run the App
python app.py
Open the local URL Gradio shows (usually http://127.0.0.1:7860). Click Record, speak, then click Submit. Your words appear instantly in the Textbox.
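If you'd like to test from your phone or share the app briefly, Gradio can also generate a temporary public URL; pass share=True when launching:
demo.launch(share=True)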
Common Pitfalls
API Key Error: Make sure .env exists and load_dotenv() is called.
TypeError mentioning proxies: This usually means an outdated openai package clashing with a newer httpx. Upgrade openai:
pip install --upgrade openai
Gradio Errors: Check the terminal for the real traceback.
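If a pitfall persists, it helps to confirm exactly which versions you're running (with your virtual environment active):
pip show openai gradio
python -c "import openai, gradio; print(openai.__version__, gradio.__version__)"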
Next Steps
Save transcripts to a file or database (see the sketch after this list).
Add language detection or translation.
Deploy on Hugging Face Spaces or Render (set your OPENAI_API_KEY in their Secrets panel).
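For the first idea, here is a minimal sketch of appending each transcript to a local file. The save_transcript helper and the transcripts.txt filename are hypothetical, not part of the app above:
from datetime import datetime

def save_transcript(text, path="transcripts.txt"):
    # Append with a timestamp so the file doubles as a simple log
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{datetime.now().isoformat()}  {text}\n")
Calling save_transcript(resp.text) just before transcribe() returns would wire it into the app. For translation to English, the same client also exposes client.audio.translations.create(model="whisper-1", file=f).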