Creating Transcripts For Videos

Prosper Otemuyiwa

Legend has it that, long ago, the inhabitants of the earth all spoke one language. They communicated easily and there was peace in the land. Then they came together to build a tower so they could speak face-to-face with their maker. Their maker laughed hysterically and simply did one thing: disrupted their language. The inhabitants started speaking in many different languages and could no longer understand each other. Confusion spread like fire over the face of the earth, and the plan and the tower had to be abandoned. What a shame!

Wait a minute. What if there was a way for these earthly beings to record their conversations and have them translated for other people to understand? What if they had the ability to create transcripts for their videos? Perhaps the mighty edifice would have been completed successfully. In this article, I’ll show how to create transcripts for your videos using Cloudinary. Check out the repo to get the code.

What’s Cloudinary?

Cloudinary is a cloud-based, end-to-end media management solution. As a critical part of the developer stack, Cloudinary automates and streamlines your entire media asset workflow. It handles a variety of media, from images, video, and audio to emerging media types. It also automates every stage of the media management lifecycle, including media selection, upload, analysis and administration, manipulation, optimization, and delivery.

Uploading Videos

Uploading videos with Cloudinary is as simple as using the upload widget.

Cloudinary Upload Widget

Performing server-side uploads is also as simple as:

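    // "my_video.mp4" is a placeholder for the path to your own file
    cloudinary.v2.uploader.upload("my_video.mp4",
        { resource_type: "video" },
        function(error, result) { console.log(result, error); });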

Creating a Transcript For Videos

Cloudinary has a new add-on that supports video transcription using Google Speech. Quickly create a new account with Cloudinary if you don’t have one.

Step 1

Subscribe to the Google Speech add-on plan. Currently, transcription only supports English audio, and 1 add-on unit is equivalent to 15 seconds of video.
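To get a feel for what the 15-second unit means for your own videos, here is a rough estimate of how many units a clip would use. This is only a sketch: the function name is made up for illustration, and rounding a partial unit up to a whole one is my assumption, not something the plan description states.

```javascript
// From the add-on plan: 1 add-on unit covers 15 seconds of video.
const SECONDS_PER_UNIT = 15;

// Hypothetical helper: estimate the add-on units needed for one video.
// Assumption: a partially used unit is billed as a whole unit (rounded up).
function addOnUnitsFor(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds < 0) {
    throw new RangeError('durationSeconds must be a non-negative number');
  }
  return Math.ceil(durationSeconds / SECONDS_PER_UNIT);
}

console.log(addOnUnitsFor(60));  // 4
console.log(addOnUnitsFor(100)); // 7
```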

Step 2

Set up a Node server. Initialize a package.json file:

 npm init

Install the following modules:

 npm install express multer cloudinary cors body-parser --save

express: We need this module for our API routes
multer: Needed for parsing HTTP requests with content-type multipart/form-data
cloudinary: Node SDK for Cloudinary
body-parser: Needed for attaching the request body on express’s req object
cors: Needed for enabling CORS

Step 3

Create a server.js file in your root directory. Require the dependencies we installed:

const express = require('express');
const app = express();
const cloudinary = require('cloudinary');
const cors = require('cors');
const bodyParser = require('body-parser');
const multer = require('multer');
const multerMiddleware = multer({ dest: 'video/' });

// increase upload limit to 50mb
app.use(bodyParser.json({limit: "50mb"}));
app.use(bodyParser.urlencoded({limit: "50mb", extended: true, parameterLimit:50000}));

// configure the SDK with your account credentials
cloudinary.config({
    cloud_name: 'xxxxxxxxxxx',
    api_key: 'xxxxxxxxxxx',
    api_secret: 'xxxxxxxxxxxxx'
});

app.use(cors());

app.post('/upload', multerMiddleware.single('video'), function(req, res) {
  console.log("Request", req.file.path);
  // Upload to Cloudinary
  cloudinary.v2.uploader.upload(req.file.path,
    {
      raw_convert: "google_speech",
      resource_type: "video",
      notification_url: '' // your webhook URL goes here
    },
    function(error, result) {
      if(error) {
        console.log("Error ", error);
        res.json({ error: error });
      } else {
        console.log("Result ", result);
        res.json({ result: result });
      }
    });
});

app.listen(3333);
console.log('Listening on localhost:3333');

Make sure your server is running (nodemon is installed separately; node server.js works too):

nodemon server.js

Once a user makes a POST request to the /upload route, the route grabs the video file from the HTTP request and uploads it to Cloudinary, which in turn makes a request to Google Speech to extract the text from the voice in the video recording.
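To try the route out, a client has to send the file as multipart/form-data under the field name video, because that is what multerMiddleware.single('video') looks for. Here is a minimal client sketch, assuming Node 18 or newer (for the built-in fetch, FormData, and Blob). The helper names buildUploadForm and uploadVideo are mine, not part of any library.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: wrap raw video bytes in a multipart form.
// The field name 'video' must match multerMiddleware.single('video') on the server.
function buildUploadForm(videoBytes, fileName) {
  const form = new FormData();
  form.append('video', new Blob([videoBytes]), fileName);
  return form;
}

// Hypothetical helper: POST a local video file to the /upload route.
// The default endpoint matches the port the server in this article listens on.
async function uploadVideo(filePath, endpoint = 'http://localhost:3333/upload') {
  const bytes = await fs.promises.readFile(filePath);
  const response = await fetch(endpoint, {
    method: 'POST',
    body: buildUploadForm(bytes, path.basename(filePath)),
  });
  // The route answers with { result: ... } on success or { error: ... } on failure.
  return response.json();
}
```

With the server running, calling uploadVideo with the path to one of your own video files should resolve to the JSON the route sends back.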

The notification_url is an HTTP URL to notify your application (a webhook) when the file has completed uploading. In this code demo, I set up a webhook quickly with the ever-efficient requestbin.

As soon as the video finishes uploading, a notification is sent to the webhook.

Inspecting the response sent from Cloudinary to the Webhook

In the image above, you can see the response states that the transcript creation status is pending. Check out the full response.

"info": {
  "raw_convert": {
      "google_speech": {
          "status": "pending"
      }
  }
}
Extract from the full response

Another notification is sent to the webhook once the transcript has been fully extracted from the video recording.

Inspecting the response sent from Cloudinary to the Webhook

Check out the full response below:

{
  "info_kind": "google_speech",
  "info_status": "complete",
  "public_id": "tb5lrftmeurqfmhqvf6h",
  "uploaded_at": "2017-11-23T15:06:55Z",
  "version": 1511449614,
  "url": "",
  "secure_url": "",
  "etag": "47d8aad801c4d7464ddf601f71ebddc7",
  "notification_type": "info"
}
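If you replace requestbin with your own webhook, you need some way to tell the two notifications apart. The sketch below reads the transcript status out of either payload. It relies only on the fields visible in the two responses shown in this article; the function name is hypothetical, and real payloads may carry more fields or other statuses than the two seen here.

```javascript
// Hypothetical helper: pull the Google Speech transcript status out of a
// Cloudinary notification body. Returns the status string, or null when the
// payload does not mention the transcript at all.
function transcriptStatus(payload) {
  // Second notification: flat fields, as in the "complete" response above.
  if (payload && payload.notification_type === 'info' &&
      payload.info_kind === 'google_speech') {
    return payload.info_status;
  }
  // First (upload) notification: status nested under info.raw_convert.
  return payload?.info?.raw_convert?.google_speech?.status ?? null;
}

const uploadNotification = {
  info: { raw_convert: { google_speech: { status: 'pending' } } },
};
const transcriptNotification = {
  info_kind: 'google_speech',
  info_status: 'complete',
  public_id: 'tb5lrftmeurqfmhqvf6h',
  notification_type: 'info',
};

console.log(transcriptStatus(uploadNotification));     // pending
console.log(transcriptStatus(transcriptNotification)); // complete
```

In an Express webhook route you could call this on req.body and only start building subtitle URLs once it returns complete.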

Now, the transcript has been created. Next, attach it to the l_subtitles transformation property.

Step 4

The transcript was created in step 3. Its name always follows the pattern {public_id}.transcript. For example, the public_id of the video I uploaded is tb5lrftmeurqfmhqvf6h, so the transcript is tb5lrftmeurqfmhqvf6h.transcript.

All you need to do now is add it as an overlay to the video URL with the l_subtitles parameter like so:

l_subtitles:tb5lrftmeurqfmhqvf6h.transcript

Finally, the URL of the video with the transcript enabled will be:
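As a rough sketch, the URL can be assembled from Cloudinary's usual delivery pattern: the cloud name, then the transformation, then the public_id. Note that demo-cloud below is a placeholder cloud name rather than a real account, and the helper name is made up for illustration.

```javascript
// Hypothetical helper: build a delivery URL that overlays the generated
// transcript on the video as subtitles.
// cloudName is your own Cloudinary cloud name ('demo-cloud' is a placeholder).
function subtitledVideoUrl(cloudName, publicId, format = 'mp4') {
  const overlay = `l_subtitles:${publicId}.transcript`;
  return `https://res.cloudinary.com/${cloudName}/video/upload/${overlay}/${publicId}.${format}`;
}

console.log(subtitledVideoUrl('demo-cloud', 'tb5lrftmeurqfmhqvf6h'));
// https://res.cloudinary.com/demo-cloud/video/upload/l_subtitles:tb5lrftmeurqfmhqvf6h.transcript/tb5lrftmeurqfmhqvf6h.mp4
```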

Video without transcript

Video with transcript enabled

Note: Cloudinary now supports converting video audio from stereo to mono using the fl_mono transformation.

Check this out:

Here, we used the Cloudinary Node.js library. The raw_convert parameter also works with the other SDKs: PHP, Ruby, Java, etc.

Styling Subtitles

Subtitles are displayed using the default white color and centered alignment. You can use transformation parameters to adjust the color, font, font size, and position. For example, let’s change the color of our example subtitle to green. I’ll also change its alignment.

Transformation: change color to green, change position to south_west. co_green, g_south_west
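Here is one way the styled URL could be put together. Treat it as a sketch under assumptions: demo-cloud is a placeholder cloud name, the helper is hypothetical, and placing co_green and g_south_west in the same comma-separated component as l_subtitles is my reading of Cloudinary's transformation syntax, so check the result against your own account.

```javascript
// Hypothetical helper: build a subtitle URL with extra styling parameters.
// styleParams are appended to the same transformation component as the
// subtitle overlay, separated by commas (an assumption, see the note above).
function styledSubtitleUrl(cloudName, publicId, styleParams = [], format = 'mp4') {
  const component = [`l_subtitles:${publicId}.transcript`, ...styleParams].join(',');
  return `https://res.cloudinary.com/${cloudName}/video/upload/${component}/${publicId}.${format}`;
}

console.log(styledSubtitleUrl('demo-cloud', 'tb5lrftmeurqfmhqvf6h', ['co_green', 'g_south_west']));
// https://res.cloudinary.com/demo-cloud/video/upload/l_subtitles:tb5lrftmeurqfmhqvf6h.transcript,co_green,g_south_west/tb5lrftmeurqfmhqvf6h.mp4
```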


The inhabitants of the earth now have a solution to their initial problem. You are the godsent programmer, problem solver and solution architect. Go, create transcripts for your videos. Programmatically creating video transcripts and applying them to the videos has never been this easy. Check out Cloudinary’s video solution for more insights on automating your video management workflow.

Note that 1 add-on unit is equivalent to 15 seconds of video. If you need more, contact the Cloudinary support team.

This content is sponsored via Syndicate Ads
