
Instant Voice-to-Text

Click a Button, Speak, and Instantly See Your Words Appear as Text

In this article, you’ll build a small but powerful web app in Python that records your voice in the browser and sends it to OpenAI’s Whisper model for transcription, all in under 5 minutes.

Imagine clicking a button, speaking, and instantly seeing your words appear as text!



What We’ll Build

A simple browser page with:

  • Record button – Gradio microphone input

  • Submit button – Sends audio to Whisper

  • Text output – Displays the transcription



Prerequisites

Before we start, make sure you have:

  • Python 3.8+

  • An OpenAI API key (create one in your OpenAI account dashboard)

  • Basic familiarity with a terminal


Step 1: Project Setup


1. Create a Project Folder

mkdir voice-to-text
cd voice-to-text
python -m venv venv

2. Activate Virtual Environment (Windows PowerShell)

.\venv\Scripts\activate


⚠️ If PowerShell blocks scripts, run this once:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Then reactivate:

.\venv\Scripts\activate
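
On macOS or Linux, the equivalent activation command is:

source venv/bin/activate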

3. Create requirements.txt

Inside your project folder, create a file called requirements.txt and add:

openai
gradio
python-dotenv
soundfile

4. Install Dependencies

With your virtual environment activated, run:

pip install -r requirements.txt

This installs all your project dependencies at once, making setup quick and reproducible.

5. Create a .env File

Create a file called .env in your project root:

OPENAI_API_KEY=sk-YourKeyHere

This keeps your API key secure and separate from your code.
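
If you want a quick sanity check before writing any app code, the short snippet below (an optional, throwaway script, not one of the tutorial files) confirms the key actually loads from .env:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
print("OPENAI_API_KEY loaded:", bool(os.environ.get("OPENAI_API_KEY")))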

Step 2: The Code

Create a file app.py and add:

import os
from dotenv import load_dotenv
from openai import OpenAI
import gradio as gr
import tempfile
import soundfile as sf

load_dotenv()  # pull OPENAI_API_KEY from .env into the environment
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def transcribe(audio):
    if audio is None:
        return "No audio recorded."

    # With type="filepath" Gradio passes a path, but if the component is switched
    # to type="numpy" it returns a (sample_rate, data) tuple, so handle both.
    if isinstance(audio, (tuple, list)):
        sr, data = audio
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        sf.write(tmp.name, data, sr)  # write the samples to a temporary WAV file
        filepath = tmp.name
    else:
        filepath = audio

    # Send the recording to OpenAI's Whisper transcription endpoint
    with open(filepath, "rb") as f:
        resp = client.audio.transcriptions.create(model="whisper-1", file=f)
    return resp.text

with gr.Blocks() as demo:
    gr.Markdown("## 🎤 Voice-to-Text Mini App")
    # Gradio 4+ takes `sources` as a list; older 3.x releases used source="microphone"
    audio = gr.Audio(sources=["microphone"], type="filepath", label="Speak here")
    btn = gr.Button("Submit")
    output = gr.Textbox(label="Transcribed Text")
    btn.click(fn=transcribe, inputs=audio, outputs=output)

if __name__ == "__main__":
    demo.launch()

Step 3: Run the App

python app.py

Open the local URL Gradio shows (usually http://127.0.0.1:7860). Click Record, speak, then click Submit. Your words appear instantly in the Textbox.
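
If you want to test the app from another device, Gradio can also create a temporary public link. Swapping the last line of app.py for the call below should work; note that the generated *.gradio.live URL only stays up for a limited time:

demo.launch(share=True)  # creates a temporary public link in addition to the local URL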


Common Pitfalls

  • API Key Error: Make sure .env exists and load_dotenv() is called (see the guard sketch after this list).

  • TypeError mentioning proxies: Usually caused by an outdated openai package. Upgrade it:

pip install --upgrade openai

  • Gradio Errors: Check the terminal for the real traceback.
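
For the first pitfall, one optional guard you could add to app.py right after load_dotenv() fails fast with a clear message instead of a confusing error from the OpenAI client:

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is missing; check that .env exists and load_dotenv() was called.")
client = OpenAI(api_key=api_key)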


Next Steps

  • Save transcripts to a file or database (a minimal file-append sketch follows this list).

  • Add language detection or translation.

  • Deploy on Hugging Face Spaces or Render (set your OPENAI_API_KEY in their Secrets panel).
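
For the first idea, here is a minimal sketch that appends each result to a local text file; save_transcript and the transcripts.txt filename are illustrative, not part of the app above:

from datetime import datetime

def save_transcript(text, path="transcripts.txt"):
    # Append each transcription with a timestamp (illustrative helper)
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{datetime.now().isoformat()}] {text}\n")

# In transcribe(), call save_transcript(resp.text) just before returning.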

