Inspired by the following exchange on Twitter, in which someone captures and posts a valuable video onto Twitter, but doesn't have the resources to easily transcribe it for the hearing-impaired, I thought it'd be fun to try out Amazon's AWS Transcribe service to help with this problem, and to see if I could do it all from the bash command-line like a Unix dork.
The instructions and code below show how to use command-line tools/scripting and Amazon's Transcribe service to transcribe the audio from an online video.
tl;dr: AWS Transcribe is surprisingly accurate and efficient. It took about 2 minutes to process a 57-second clip, at a cost of less than 2.5 cents.
See the transcribed text here, and the full prettified JSON response here.
Sign up for Amazon Web Services: https://aws.amazon.com
Install:
- youtube-dl - for fetching video files from social media services
- awscli - for accessing various AWS services, specifically S3 (for storing the video and its processed transcription) and Transcribe
- curl - for downloading from URLs
- jq - for parsing JSON data
- ffmpeg - for media file conversion, e.g. extracting mp3 audio from video
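As an aside (these commands aren't in the original post), on macOS with Homebrew one way to install the whole toolchain might be:

```shell
# macOS/Homebrew; Debian/Ubuntu users can find similar
# packages via apt and pip
brew install awscli curl jq ffmpeg youtube-dl
```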
The best way to learn and use the command line is to practice the UNIX philosophy of "do one thing and do it well," which means breaking the process down into individual steps:
- Find a tweet containing a video you like
- Get that tweet's URL, e.g. https://twitter.com/JordanUhl/status/1085669288051175424
- Use youtube-dl to download the video from that tweet and save it to disk, e.g. cardib.mp4
- Because AWS Transcribe requires we send it an audio file, use ffmpeg to extract the audio from cardib.mp4 and save it to cardib.mp3
- Because AWS Transcribe only works on audio files stored on AWS S3, use awscli to upload cardib.mp3 to an online S3 bucket, e.g. http://data.danwin.com/tmp/cardib.mp3
- Use awscli to access the AWS Transcribe API and start a transcription job
- Wait a couple of minutes, and/or use awscli to occasionally get the details of the transcription job to see if it's finished (sample response JSON from the get-transcription-job endpoint)
- Use curl to download the transcript data from the expected URL, e.g. http://data.danwin.com/cardib-shutdown.json (see pretty preview here)
- Use jq to process the transcript data and extract the transcript value, which contains the transcription text as a single string
Obviously, you should not do this as one big ol' bash script (or maybe even in bash/CLI at all). But I wrote this example up for a talk on how you can learn the CLI by messing around for fun, and it's an elaborate example of the pain you can put yourself through. Maybe later I'll show how a novice might approach it, but this is what it looks like if you're trying not to care too much, while also not wanting it to be too painful:
# Fetch that video and save it to the working directory
# as `cardib-shutdown.mp4`
youtube-dl --output cardib-shutdown.mp4 \
https://twitter.com/JordanUhl/status/1085669288051175424
# extract the audio as a mp3 file
ffmpeg -i cardib-shutdown.mp4 \
-acodec libmp3lame cardib-shutdown.mp3
# upload the mp3 file to a S3 bucket
# (and optionally make it publicly readable)
aws s3 cp --acl public-read \
cardib-shutdown.mp3 s3://data.danwin.com/tmp/cardib-shutdown.mp3
# Start the transcription job and specify that the transcription result data
# be saved to a given bucket, e.g. data.danwin.com
aws transcribe start-transcription-job \
--language-code 'en-US' \
--media-format 'mp3' \
--transcription-job-name 'cardib-shutdown' \
--media '{"MediaFileUri": "s3://data.danwin.com/tmp/cardib-shutdown.mp3"}' \
--output-bucket-name 'data.danwin.com'
# optionally: use this to check the status of the job before attempting
# to download the transcript
aws transcribe get-transcription-job \
--transcription-job-name cardib-shutdown
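Rather than re-running that check by hand, one could wrap it in a small polling loop. This sketch isn't from the original post; it assumes jq is installed and reads the TranscriptionJobStatus field from the get-transcription-job response:

```shell
# Poll the Transcribe API until the named job finishes
wait_for_transcription_job() {
  # $1: the transcription job name, e.g. cardib-shutdown
  while true; do
    status=$(aws transcribe get-transcription-job \
      --transcription-job-name "$1" \
      | jq --raw-output '.TranscriptionJob.TranscriptionJobStatus')
    echo "Job status: $status"
    if [ "$status" = 'COMPLETED' ] || [ "$status" = 'FAILED' ]; then
      break
    fi
    sleep 15   # wait 15 seconds between checks
  done
}
```

Calling `wait_for_transcription_job cardib-shutdown` then blocks until the job reaches a terminal state.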
# Download the JSON at the expected S3 URL, parse it with jq
# and spit it out as raw text
curl -s http://data.danwin.com/cardib-shutdown.json \
| jq '.results.transcripts[0].transcript' --raw-output

Here's what Cardi B said, according to AWS Transcribe, which you can read along with the audio or the original tweet video:
(I've added some paragraph breaks for easier reading)
Hey. Yeah. I just want to remind you because there's been a little bit over three weeks, okay? It's been a little bit over three weeks. Trump is now ordering as his some missing federal government workers to go back to work without getting paid. Now, I don't want to hear your mother focus talking about all but Obama Shut down the government for seventeen days year bitch for health care. So your grandma could check her blood pressure and your business to go take a piss in the gynecologist with no motherfucking problem. Now, I know a lot of guys don't care because I don't work for the government or your partner. They have a job, but this shit is really fucking serious, bro. This city is crazy. Like a country is in a hell hole right now. All for fucking war. And we really need to take this serious. I feel that we need to take some action. I don't know what type of actual base because it is not what I do, but I'm scared. This is crazy. And I really feel bad for these people. They got to go to fucking work, to not get motherfucking paid.
The verdict? Not bad! You can see the word-by-word confidence in the full transcript JSON, but I'm impressed with the simple text output, which contains capitalization of proper nouns (e.g. "Obama") and guesses at where sentences begin, never mind a pretty good understanding of Cardi B's Bronx accent.
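Those word-by-word confidence scores can also be pulled out with jq. The snippet below uses a small hypothetical fragment that mirrors the shape of the real response (the results.items array, where each pronunciation item carries an alternatives list with confidence and content):

```shell
# A made-up fragment in the shape of Transcribe's response JSON
cat > sample-transcript.json <<'EOF'
{"results": {"transcripts": [{"transcript": "hey yeah"}],
 "items": [
   {"type": "pronunciation",
    "alternatives": [{"confidence": "0.99", "content": "hey"}]},
   {"type": "pronunciation",
    "alternatives": [{"confidence": "0.42", "content": "yeah"}]}
 ]}}
EOF

# List each word with its confidence score, lowest confidence first,
# to spot where the transcript is most likely wrong
jq --raw-output '
  .results.items[]
  | select(.type == "pronunciation")
  | "\(.alternatives[0].confidence)\t\(.alternatives[0].content)"
' sample-transcript.json | sort -n
```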
How much did it cost? AWS Transcribe charges $0.0004 per second, and this clip was 57 seconds. Not counting the S3 upload/storage fees, the price for transcription comes out to about 2.3 cents.
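That per-second arithmetic is easy to check from the command line, e.g. with awk:

```shell
# 57 seconds at $0.0004 per second
awk 'BEGIN { printf "$%.4f\n", 57 * 0.0004 }'
```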
