Simple Pricing - Superior Quality

Transcribe speech to text using the largest and most powerful AI models available, including OpenAI Whisper large-v3. Excellent handling of background noise, multiple accents, and live speech.

🎉 Welcome to the VocalStack Beta release!

Be an early adopter and receive additional free transcription hours each month!

Hobby

$0

No recurring costs

  • 8 hours of free transcriptions each month
  • Same high quality transcription as in the paid plans
  • No credit card required for the free monthly transcriptions

Premium

$40

per month

Billed annually

  • 40 hours of free transcriptions each month
  • $0.35 per additional prerecorded transcription hour
  • $0.80 per additional live transcription hour
  • Unrestricted access to Polyglot
  • API for programmatic access

Enterprise

custom pricing
  • Unlimited transcriptions
  • Unlimited concurrent sessions
  • Dedicated IT support
  • Custom SLA

Price plan comparison

Premium

A plan that scales with your requirements


Transcriptions: first 40 hours free each month
Pre-recorded Transcription: $0.35 per hour
Live Transcription: $0.80 per hour

Developers

Integrate VocalStack functionality into your existing infrastructure using the API or JavaScript SDK.

API Access
Database Access
Managed Services
Transcription Rate Limit: max 50 concurrent sessions
Server Start: warm boot in non-peak times

Dashboard

Ready to use instantly after signing up for an account. No technical skills needed. Accessible from any device with a web browser.

Transcribe Audio from Uploaded File
Transcribe Audio from URL
Transcribe Audio from Microphone
Export Subtitles and Files
Translate Transcriptions
Polyglot

Polyglot

Share your live transcription via a public link so viewers can read it in their preferred language.

Transcribe from Microphone
Transcribe from Live Stream
Real-Time Transcriptions via Public URL
Real-Time Translations via Public URL
Historical Transcriptions via Public URL
Enable Password Protection
Scheduled Livestream Transcriptions

AI Enhancements

At no additional cost, VocalStack leverages a diverse range of AI models to significantly improve the quality of each transcription.

Language Support: 57 languages plus dialects and accents
Automatic Language Detection
Paragraph Segmentation
Summarization
Word-Level Time Stamps
Word-Level Alignment
Speaker Diarization

Support

Help & Support
Email and Live Chat Support
SLA


Frequently Asked Questions

VocalStack uses large AI models to achieve the best possible transcription quality, even in the most challenging audio environments. These include Whisper, which serves as the core model for the VocalStack platform. The large Whisper model is a state-of-the-art speech recognition model that has been trained on a vast amount of data to understand and transcribe speech accurately.

To better understand the impact of an AI model's size, let's use the different Whisper models to transcribe a fictitious excerpt:

Transcription (97% match with the original text):
In a quaint little cafe near the Thames, Claire chuckled as Pierre ate eight eclairs all in one go. Anticipating gastroesophageal reflux, he said, "Nope, they're not worth it!". Later, they called a Lyft to drive them to the park, as Pierre thinks it's cheaper than Uber. As they walked under the glow of the noctilucent sky, they jumped when they'd seen a bear clothed only in his bare fur. Pierre cried out loud, "Mon Dieu!" They both leapt hastily into the river and swam for Chiswick Eyot. "Phew!"

Original Text:
In a quaint little café near the Thames, Claire chuckled as Pierre ate eight eclairs all in one go. Anticipating gastroesophageal reflux, he said "nope, they're not worth it!" Later, they called a Lyft to drive them to the park, as Pierre thinks its cheaper than Uber. As they walked under the glow of the noctilucent sky, they jumped when they'd seen a bear clothed only in his bare fur. Pierre cried out loud, "Mon Dieu!" They both leapt hastily into the river and swam for Chiswick Eyot. Phew!
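The page does not say how the 97% score is calculated. One common way to compare a transcript against a reference text is word error rate (WER). WER is the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the transcript, divided by the number of words in the reference.

The Python sketch below computes WER with a word-level Levenshtein distance. The helper names `normalize` and `word_error_rate` are hypothetical. The choice to lowercase and strip punctuation before comparing is an illustrative assumption, not VocalStack's documented method.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation
    (an assumption: a scorer could also compare the raw text)."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words; only one row of the DP table is kept.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                 # delete a reference word
                curr[j - 1] + 1,             # insert a hypothesis word
                prev[j - 1] + substitution,  # match or substitute
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    original = "a quaint little café near the Thames"
    transcript = "a quaint little cafe near the Thames"
    wer = word_error_rate(original, transcript)
    print(f"WER: {wer:.2%}, word accuracy: {1 - wer:.2%}")
```

Because the text is normalized first, punctuation and capitalization differences (such as "nope" versus "Nope") are not counted as errors, while "café" versus "cafe" still is. Whether the 97% figure is based on WER, character-level similarity, or another measure is not stated.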

No, you will not be billed for the whole hour. Billing is always calculated per second of transcribed audio, whether the transcription is of prerecorded or live audio, so you only pay for what you transcribe. The one exception is a one-minute minimum: audio shorter than one minute is billed as a full minute.

For example, here is what a 30-minute prerecorded transcription costs in each plan (assuming you have used up all your free transcription hours for the month):

30 minutes of prerecorded audio
Hobby plan: $0.3500
Premium plan: $0.1750
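The per-second rule and the one-minute minimum can be expressed as a small calculation. This Python sketch reproduces the 30-minute figures above under the following assumptions:

- The month's free hours are already used up, as in the example.
- Partial seconds are rounded up to the next whole second. This is an assumption; the page only says billing is per second.
- Results are rounded to four decimal places to match the table's format.

The $0.35 Premium rate comes from the pricing table. The $0.70 Hobby rate is only inferred from the $0.3500 Hobby figure for 30 minutes, since the page does not list a Hobby hourly rate. The function names are hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP

MINIMUM_BILLABLE_SECONDS = 60  # audio shorter than one minute is billed as one minute


def billable_seconds(duration_seconds: float) -> int:
    """Round partial seconds up (assumption) and apply the one-minute minimum."""
    return max(math.ceil(duration_seconds), MINIMUM_BILLABLE_SECONDS)


def transcription_cost(duration_seconds: float, rate_per_hour: Decimal) -> Decimal:
    """Cost of one transcription once the month's free hours are used up."""
    seconds = Decimal(billable_seconds(duration_seconds))
    cost = seconds * rate_per_hour / Decimal(3600)
    # Four decimal places mirrors the table's format; actual invoice rounding is not stated.
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    premium_prerecorded = Decimal("0.35")  # stated on the pricing table
    hobby_prerecorded = Decimal("0.70")    # inferred from the 30-minute example, not stated
    print(transcription_cost(30 * 60, hobby_prerecorded))    # 0.3500
    print(transcription_cost(30 * 60, premium_prerecorded))  # 0.1750
    print(transcription_cost(20, premium_prerecorded))       # billed as 60 s: 0.0058
```

Using `Decimal` instead of `float` keeps sub-cent amounts exact until the final rounding step.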

No, there are no hidden costs. You pay only for transcribing your audio content, that is, only the costs listed in the pricing table. Other features such as automatic language detection, translations, summarization, paragraph segmentation, keyword detection, and timestamps are included for free.

Importantly, the number of translations does not affect the transcription cost. For example, if you transcribe an audio file in English and then translate it into Spanish, French, and German, you will only be billed for the transcription of the English audio. This also applies to live transcriptions using Polyglot. You can perform an unlimited number of translations at any time without any additional charges.

Pre-recorded transcription refers to the process of transcribing audio that has been previously recorded. It can be uploaded as an audio file and transcribed at a later time, making it suitable for podcasts, interviews, videos, and other recorded content.

Live transcription refers to the process of transcribing audio in real time as it is being spoken. This is useful for live streams, podcasts, events, meetings, lectures, and other scenarios where immediate transcription (and possibly translation) is required.

VocalStack supports 57 languages, including different dialects and accents: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.