Inspired by the following exchange on Twitter, in which someone captures and posts a valuable video onto Twitter, but doesn't have the resources to easily transcribe it for the hearing-impaired:
The instructions and code below show how to use command-line tools/scripting and Amazon's Transcribe service to transcribe the audio from online video. tl;dr: AWS Transcribe is a pretty amazing service!
Sign up for Amazon Web Services: https://aws.amazon.com
Install:
- youtube-dl - for fetching video files from social media services
- awscli - for accessing various AWS services, specifically S3 (for storing the video and its processed transcription) and Transcribe
- curl - for downloading from URLs
- jq - for parsing JSON data
- ffmpeg - for media file conversion, e.g. extracting mp3 audio from video
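Exactly how you install these depends on your platform; on macOS with Homebrew, for instance, it might look like this (package names are the usual ones, but check your own package manager):

```shell
# macOS / Homebrew; on Debian/Ubuntu, apt-get has similarly-named packages,
# and youtube-dl and awscli can also be installed via pip
brew install youtube-dl awscli curl jq ffmpeg
```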
- Find a tweet containing a video you like
- Get that tweet's URL, e.g. https://twitter.com/JordanUhl/status/1085669288051175424
- Use youtube-dl to download the video from that tweet and save it to disk, e.g. `cardib.mp4`
- Because AWS Transcribe requires we send it an audio file, use ffmpeg to extract the audio from `cardib.mp4` and save it to `cardib.mp3`
- Because AWS Transcribe only works on audio files stored on AWS S3, we use `awscli` to upload `cardib.mp3` to an online S3 bucket, e.g. http://data.danwin.com/tmp/cardib.mp3
- Use `awscli` to access the AWS Transcribe API and start a transcription job
- Wait a couple of minutes, and/or use `awscli` to occasionally get the details of the transcription job to see if it's finished (sample response JSON from the get-transcription-job endpoint)
- Use curl to download the transcript data from the expected URL, e.g. http://data.danwin.com/cardib-shutdown.json (see pretty preview here)
- Use jq to process the transcript data and extract the `transcript` value, which contains the transcription text as a single string
So obviously you should not do this as one big ol' bash script (or even in bash/the CLI at all). But I wrote this example up for a talk about how you can learn the CLI by messing around for fun, and this is an elaborate example of the pain you can put yourself through.
# Fetch that video and save it to the working directory
# as `cardib-shutdown.mp4`
youtube-dl --output cardib-shutdown.mp4 \
https://twitter.com/JordanUhl/status/1085669288051175424
# extract the audio as a mp3 file
ffmpeg -i cardib-shutdown.mp4 \
-acodec libmp3lame cardib-shutdown.mp3
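If you want to sanity-check the extracted audio before uploading it, ffprobe (which ships with ffmpeg) can report its duration; a quick sketch:

```shell
# Print the duration (in seconds) of the extracted mp3,
# with no other wrapper output
ffprobe -v error -show_entries format=duration \
  -of default=noprint_wrappers=1:nokey=1 cardib-shutdown.mp3
```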
# upload the mp3 file to a S3 bucket
# (and optionally make it publicly readable)
aws s3 cp --acl public-read \
cardib-shutdown.mp3 s3://data.danwin.com/tmp/cardib-shutdown.mp3
# Start the transcription job and specify that the transcription result data
# be saved to a given bucket, e.g. data.danwin.com
aws transcribe start-transcription-job \
--language-code 'en-US' \
--media-format 'mp3' \
--transcription-job-name 'cardib-shutdown' \
--media '{"MediaFileUri": "s3://data.danwin.com/tmp/cardib-shutdown.mp3"}' \
--output-bucket-name 'data.danwin.com'
# optionally: use this to check the status of the job before attempting
# to download the transcript
aws transcribe get-transcription-job \
--transcription-job-name cardib-shutdown
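The get-transcription-job response is JSON, so jq can pull out just the status field. A sketch, using a mocked-up response in the same shape as the real API (the values here are illustrative):

```shell
# Mocked-up sample of what `aws transcribe get-transcription-job` returns;
# the field names match the real API, the values are made up
response='{
  "TranscriptionJob": {
    "TranscriptionJobName": "cardib-shutdown",
    "TranscriptionJobStatus": "COMPLETED",
    "LanguageCode": "en-US",
    "MediaFormat": "mp3"
  }
}'

# Extract just the status field
echo "$response" | jq -r '.TranscriptionJob.TranscriptionJobStatus'

# A crude polling loop (sketch) would re-run the real command until done:
# while true; do
#   status=$(aws transcribe get-transcription-job \
#       --transcription-job-name cardib-shutdown \
#     | jq -r '.TranscriptionJob.TranscriptionJobStatus')
#   [ "$status" = "COMPLETED" ] && break
#   [ "$status" = "FAILED" ] && exit 1
#   sleep 15
# done
```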
# Download the JSON at the expected S3 URL, parse it with jq
# and spit it out as raw text
curl -s http://data.danwin.com/cardib-shutdown.json \
| jq '.results.transcripts[0].transcript' --raw-output
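One nice bonus: the transcript JSON also contains per-word timing data under `.results.items`, which is exactly what you'd need to start building captions. A sketch using a mocked-up snippet in the same shape as Transcribe's real output (the values are made up):

```shell
# Mocked-up fragment of a Transcribe results file; real field names,
# illustrative values
transcript_json='{
  "results": {
    "transcripts": [{"transcript": "okay so"}],
    "items": [
      {"start_time": "0.04", "end_time": "0.32",
       "alternatives": [{"confidence": "0.99", "content": "okay"}],
       "type": "pronunciation"},
      {"start_time": "0.32", "end_time": "0.61",
       "alternatives": [{"confidence": "0.98", "content": "so"}],
       "type": "pronunciation"}
    ]
  }
}'

# Print each word with its start time, tab-separated
echo "$transcript_json" \
  | jq -r '.results.items[]
      | select(.type == "pronunciation")
      | [.start_time, .alternatives[0].content]
      | @tsv'
```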