Transcribe from a Microphone or LiveStream

Transcribe live speech from a microphone or live stream. Integrate with Polyglot to create a public shareable link for the transcription which users can read in any language.
To transcribe from a microphone, we must continually send packets of audio stream data to the VocalStack API.
JavaScript
import { LiveTranscription } from '@vocalstack/js-sdk';

const sdk = new LiveTranscription({ apiKey: 'YOUR-API-KEY' });

const stream = await sdk.connect({
  // Optional: Integrate this stream with a Polyglot session
  polyglot_id: 'YOUR-POLYGLOT-SESSION-ID',
  // Optional: language of the speech spoken
  // (this can be used to improve the transcription accuracy)
  language: 'en',
  // Optional: Translate the transcription to these languages
  translations: ['de'],
  // Optional: Stop the stream after this many seconds of inactivity
  timeout_period_s: 60,
  // Optional: Hard stop the stream after this many seconds
  max_duration_s: 300,
});

// Start the stream
stream.start();

// Get audio data from a microphone and send it to the stream
// stream.sendBuffer(buffer);
// *** This is a placeholder for the actual implementation ***

// Manually stop the stream (in this example, after 60 seconds)
// If max_duration_s is set, stopping the stream is optional
setTimeout(() => stream.stop(), 60000);

// Listen for stream transcription data
stream.onData((response) => {
  const { status, data } = response;
  console.log(status); // 'waiting', 'processing', 'done', 'stopping' or 'error'
  if (data) {
    console.log(data.timeline); // an object with the transcription timeline
  }
  if (status === 'done') {
    console.log(data.summary);    // a summary of the transcription
    console.log(data.keywords);   // an array of keywords
    console.log(data.paragraphs); // the entire transcription in paragraph form
  }
});
How you obtain the audio stream data depends on the environment where the transcription runs. Here are a couple of examples of how you might do this:
In Node.js you should install a package that can capture audio data from your device, which you can then forward to the VocalStack API. Here is an example using the mic package:
JavaScript
const mic = require('mic');

// Create a new instance of the microphone utility
const micInstance = mic();

// Get the audio input stream
const micStream = micInstance.getAudioStream();

// Capture the audio data from the microphone
micStream.on('data', (data) => {
  stream.sendBuffer(data); // send the buffer data to the VocalStack API
});

// Start capturing audio from the microphone
micInstance.start();
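If you need to control the audio format, the mic() constructor also accepts an options object. The option names below come from the mic package's README; verify them against the version you have installed:
JavaScript
// Construct the microphone instance with explicit options instead
const micInstance = mic({
  rate: '16000',   // sample rate in Hz, passed as a string
  channels: '1',   // mono audio
  debug: false,    // set to true to log the underlying arecord/sox process
});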
In the web browser you may want to use the MediaRecorder API, as shown in the following example. (It may also be a good idea to use a package like recordrtc, which improves browser compatibility.)
JavaScript
// Request access to the microphone
const mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true });

// Create a MediaRecorder instance to capture audio data
const mediaRecorder = new MediaRecorder(mediaStream);

// Event handler to process audio data packets
mediaRecorder.ondataavailable = async (event) => {
  const blob = event.data; // this is the audio packet (Blob)
  const buffer = await blob.arrayBuffer(); // convert the Blob to an ArrayBuffer
  stream.sendBuffer(buffer); // send the buffer data to the VocalStack API
};

// Start capturing audio, and send it to the stream every second
mediaRecorder.start(1000);
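When you are done, you may also want to stop the recorder and release the microphone. This cleanup step is not shown in the example above, but it uses only standard MediaRecorder and MediaStream methods:
JavaScript
// Stop capturing audio and turn off the browser's recording indicator
mediaRecorder.stop();
mediaStream.getTracks().forEach((track) => track.stop());

// Close the VocalStack stream as well
stream.stop();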
Note that to access the VocalStack API from web clients you will need to use an auth token:
Client Side Authentication Tokens
Create a temporary authentication token for client side requests. Safely implement API requests in web browsers without exposing your API keys.
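For illustration, the flow could look like the sketch below. The /api/vocalstack-token route is hypothetical (your own server endpoint that creates a temporary token using your secret API key), and the sketch assumes the SDK accepts the temporary token wherever an API key is expected; see the page above for the exact flow.
JavaScript
import { LiveTranscription } from '@vocalstack/js-sdk';

// In the browser: never embed your secret API key in client code.
// Instead, ask your own server for a temporary token.
// '/api/vocalstack-token' is a hypothetical route on your server.
const response = await fetch('/api/vocalstack-token');
const { token } = await response.json();

// Assumption: the temporary token is used in place of an API key
const sdk = new LiveTranscription({ apiKey: token });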
The VocalStack API can be used to transcribe any HLS livestream URL, including sources such as YouTube Live, Facebook Live, and Twitch. Please note that the stream URL must have a .m3u8 file extension and represent a valid HLS (HTTP Live Streaming) playlist file.
JavaScript
import { LiveTranscription } from '@vocalstack/js-sdk';

const sdk = new LiveTranscription({ apiKey: 'YOUR-API-KEY' });

const stream = await sdk.connect({
  // must be a valid HLS streaming protocol
  livestream_url: 'http://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/hls/nonuk/sbr_low/ak/bbc_world_service.m3u8',
  // The rest of these options are the same as for microphone live transcriptions
});

stream.start();

stream.onData((response) => {
  // The response object is the same as the one
  // returned by microphone transcriptions
});
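Because livestream_url must point to an HLS playlist, a quick sanity check before connecting can fail fast with a clearer error. This check is only a convenience and is not part of the SDK:
JavaScript
const livestream_url =
  'http://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/hls/nonuk/sbr_low/ak/bbc_world_service.m3u8';

// HLS playlists are served as .m3u8 files
if (!new URL(livestream_url).pathname.endsWith('.m3u8')) {
  throw new Error('livestream_url must point to a .m3u8 HLS playlist');
}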
Integrating live transcriptions with Polyglot is as simple as adding a polyglot_id option to the transcription request, as demonstrated in the examples above.
Polyglot creates a public shareable link associated with your transcriptions (the link can be password protected):
  • Users can read your transcription in real time using the link.
  • Users can choose the language in which to read the transcription in real time.
  • Users can read your transcription at a later time, along with all other transcriptions integrated with your particular Polyglot session.
You are welcome to use the VocalStack API to implement your own white-labelled UI instead of using the one provided by VocalStack. If you do, we would love to hear about it, so we can learn how to make our product better!
Learn more about how Polyglot works at vocalstack.com/polyglot.