How I Built and Ran a DIY Transcription Service from My Home

For the past 6 months, I've been running VideoToBe.com, a simple transcription service using a single machine hosted in my house. My DIY setup is small, yet functional. I use OpenAI's Whisper model to convert audio to text and send the transcript via email. You can try my service at VideoToBe.com.

Building My Home AI Lab

First, I had to upgrade my desktop to one with a GPU. I had been exploring LLMs and generative AI, but my old hardware was not up to it. The GPU was the most expensive component of the upgrade. I opted for an NVIDIA RTX 3090 GPU, 62GB of RAM, and an AMD Ryzen 9 CPU.

My Kitchen Table Tech Stack

  • A GPU computer running in my home network

  • A front-end that uploads files to a Storage Bucket and adds the task to a queue.

  • A cron job that downloads these files

  • OpenAI's Whisper model creates the transcripts.

  • A simple email system to deliver results

The service runs on my home internet, a residential AT&T Fiber connection. Since traffic is still low, my home connection can handle it. Requests are added to the queue and processed one by one by the server. The machine can run the Whisper large model. Transcripts are delivered via email. Since the delivery mechanism is email, I have some wiggle room on performance. So far, I have been able to deliver most transcript requests within 30 minutes.
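
To make the flow concrete, here is a rough Python sketch of a worker that drains a queue one task at a time. It is an illustration, not the actual VideoToBe code: the Task fields, the in-memory deque standing in for the real queue, and the download and send callables are all hypothetical, since the bucket and email providers are not covered here. The whisper_transcribe helper assumes the open-source openai-whisper package is installed.

```python
from collections import deque
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Deque


@dataclass
class Task:
    """One upload waiting in the queue (field names are hypothetical)."""
    file_key: str    # object name in the storage bucket
    user_email: str  # where the transcript should be sent


def build_email(task: Task, transcript: str) -> EmailMessage:
    """Wrap a finished transcript in an email message."""
    msg = EmailMessage()
    msg["To"] = task.user_email
    msg["Subject"] = f"Your transcript for {task.file_key}"
    msg.set_content(transcript)
    return msg


def process_queue(
    queue: Deque[Task],
    download: Callable[[str], str],       # bucket key -> local file path
    transcribe: Callable[[str], str],     # local file path -> transcript text
    send: Callable[[EmailMessage], None],
) -> int:
    """Process tasks strictly one by one; return how many were delivered."""
    delivered = 0
    while queue:
        task = queue.popleft()
        try:
            local_path = download(task.file_key)
            transcript = transcribe(local_path)
            send(build_email(task, transcript))
            delivered += 1
        except Exception as exc:
            # One bad file should not block the tasks queued behind it.
            print(f"failed {task.file_key}: {exc}")
    return delivered


def whisper_transcribe(path: str) -> str:
    """Transcribe a file with the openai-whisper package (assumed installed)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    # A long-running worker would load the model once and reuse it.
    model = whisper.load_model("large")
    return model.transcribe(path)["text"]
```

A cron entry could call process_queue with whisper_transcribe plugged in as the transcribe step. Passing the download, transcribe, and send steps in as functions keeps the loop easy to test without a GPU or a mail server.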

VideoToBe Transcription Pipeline

The Joys 😏 of Self-Hosting (Not Really)

Running this from home has led to some memorable experiences.

  • My neighborhood had three outages last year — one power outage and two internet outages. Telling users, "Sorry, my home lost the internet!" just doesn’t feel right.

  • I once knocked out an Ethernet cable while cleaning. I didn’t notice for hours until emails started coming in.

  • Since I have only one machine, I sometimes run my LLM experiments on my production machine. When things break (and they do), it disrupts the service.

My Privacy Policy

I’ve kept my privacy policy simple. I have kept user data secure and private, but I make no such promise in the privacy policy. The money this project makes isn’t enough to justify taking legal risks, and I’m transparent about that.

What Did I Learn?

Scaling is overrated: My small setup has worked fine for 6 months. One day I may need to scale, but that day is not today. Over-engineering kills more projects than a lack of scale does.

Users are nice: I used to think users were demanding and unreasonable. But when you set clear expectations and deliver, people are kind. Some have even given me marketing tips and encouragement!

Monthly Subscription Fatigue: Most people hate monthly subscriptions!

I noticed two types of users:

  1. People with Pro versions of Zoom, Teams, or other podcasting and transcription services: they don’t need my service.

  2. People with occasional transcription needs — they don’t want a subscription but are happy to pay per request. This second group became my target customers.

Why Transcription?

A little bit of history. During the pandemic, I watched a YouTube live stream with over 600 episodes. When I wanted to find specific moments from these shows, it was hard. I built a video search web tool using OpenSearch (ElasticSearch). It worked, but the show's audience had moved on. I ended up shutting down the project after a while.

Still, I believed video search and extracting insights from video had potential. I pitched my idea to people, but interest was lukewarm. The pitch was that companies should organize their webinar and video collections and generate content based on them. The webinars and content had already been created, so why not make them more visible and useful? The pushback was "nah - not a priority."

I decided to rethink. What could I build that was simple yet useful? I realized transcription was the starting point for any video or audio search system. I stripped away complex features — no accounts, no SaaS platform — just "upload a file, get a transcript by email."

That became my MVP.

Where Am I Right Now and What Next?

People actually use my service! We recently crossed 10,000 transcripts. Many users keep coming back and send kind emails. Their encouragement keeps me going!

I’m now building a more complete product. My original goal was to build a system with video search, insights, and chat with videos. So far, I have worked only on the audio part; adding visual context is my next step.

You can try it here: VideoToBe.com

Have a big collection of audio or video? Get in touch! I'd love to help you build your video insights media library. Have feedback? Feel free to get in touch at [email protected]