@dannguyen
Last active October 18, 2025 15:51
An example of how to use command-line tools to transcribe a viral video of Cardi B

Transcribing Cardi B's political speech with command-line tools

Inspired by the following exchange on Twitter, in which someone captures and posts a valuable video but doesn't have the resources to easily transcribe it for the hearing-impaired:

Screencap of @jordanuhl's video tweet, followed by a request for a transcript

The instructions and code below show how to use command-line tools/scripting and Amazon's Transcribe service to transcribe the audio from an online video. tl;dr: AWS Transcribe is a pretty amazing service!

Requirements

Sign up for Amazon Web Services: https://aws.amazon.com

Install:

  • youtube-dl - for fetching video files from social media services
  • awscli - for accessing various AWS services, specifically S3 (for storing the video and its processed transcription) and Transcribe
  • curl - for downloading from URLs
  • jq - for parsing JSON data
  • ffmpeg - for media file conversion, e.g. extracting mp3 audio from video
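
Before running anything, it can help to confirm that every tool in the list above is actually on your PATH. Here is a minimal sketch; the function name `check_tools` is made up for this example, and note that the awscli package installs its executable as `aws`.

```shell
# check_tools: hypothetical helper, not part of the original script.
# Prints each missing tool to stderr and returns non-zero if any are missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example call (commented out so that sourcing this file has no side effects):
# check_tools youtube-dl aws curl jq ffmpeg || exit 1
```

`command -v` is the POSIX way to ask the shell whether a command resolves, so this should behave the same in bash, dash and zsh.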

The steps

The script

So obviously you should not do this as a big ol bash script (or even bash/CLI at all). But I wrote this example up for a talk on how you can learn the CLI by messing around for fun, and this is an elaborate example of the pain you can put yourself through.

# Fetch that video and save it to the working directory 
# as `cardib-shutdown.mp4`
youtube-dl --output cardib-shutdown.mp4 \
    https://twitter.com/JordanUhl/status/1085669288051175424

# extract the audio as an mp3 file
ffmpeg -i cardib-shutdown.mp4 \
    -acodec libmp3lame cardib-shutdown.mp3

# upload the mp3 file to an S3 bucket
# (and optionally make it publicly readable)
aws s3 cp --acl public-read \
    cardib-shutdown.mp3 s3://data.danwin.com/tmp/cardib-shutdown.mp3 

# Start the transcription job and specify that the transcription result data
# be saved to a given bucket, e.g. data.danwin.com
aws transcribe start-transcription-job \
    --language-code 'en-US' \
    --media-format 'mp3' \
    --transcription-job-name 'cardib-shutdown' \
    --media '{"MediaFileUri": "s3://data.danwin.com/tmp/cardib-shutdown.mp3"}' \
    --output-bucket-name 'data.danwin.com'

# optionally: use this to check the status of the job before attempting
# to download the transcript
aws transcribe get-transcription-job \
    --transcription-job-name cardib-shutdown

# Download the JSON at the expected S3 URL, parse it with jq
# and spit it out as raw text
curl -s http://data.danwin.com/cardib-shutdown.json \
    | jq '.results.transcripts[0].transcript' --raw-output
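
The script above checks the job status once, by hand. If you'd rather have the shell wait for you, the same `get-transcription-job` call can go in a loop. This is only a sketch: the function names `job_status` and `wait_for_job` and the 10-second default interval are assumptions for illustration, and it relies on the AWS CLI's global `--query` and `--output text` options to pull out just the status string.

```shell
# job_status: hypothetical wrapper that prints only the status string,
# e.g. IN_PROGRESS or COMPLETED.
job_status() {
    aws transcribe get-transcription-job \
        --transcription-job-name "$1" \
        --query 'TranscriptionJob.TranscriptionJobStatus' \
        --output text
}

# wait_for_job NAME [INTERVAL_SECONDS]
# Polls until the job is COMPLETED (returns 0) or FAILED (returns 1).
wait_for_job() {
    job_name=$1
    interval=${2:-10}
    while :; do
        status=$(job_status "$job_name") || return 1
        case "$status" in
            COMPLETED) return 0 ;;
            FAILED)    echo "job $job_name failed" >&2; return 1 ;;
            *)         sleep "$interval" ;;
        esac
    done
}

# Example (commented out so nothing talks to AWS when this file is sourced):
# wait_for_job cardib-shutdown && \
#     curl -s http://data.danwin.com/cardib-shutdown.json \
#         | jq '.results.transcripts[0].transcript' --raw-output
```

Keeping the AWS call in its own small function means the loop itself can be tried out with a stub that just echoes a status.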
{
"TranscriptionJob": {
"TranscriptionJobName": "cardib-shutdown",
"TranscriptionJobStatus": "COMPLETED",
"LanguageCode": "en-US",
"MediaSampleRateHertz": 44100,
"MediaFormat": "mp3",
"Media": {
"MediaFileUri": "s3://data.danwin.com/tmp/cardib-shutdown.mp3"
},
"Transcript": {
"TranscriptFileUri": "https://s3.amazonaws.com/data.danwin.com/cardib-shutdown.json"
},
"CreationTime": 1547795428.734,
"CompletionTime": 1547795570.152,
"Settings": {
"ChannelIdentification": false
}
}
}
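
The completed job description above reports `CreationTime` and `CompletionTime` as Unix timestamps, so the time the job took is just their difference. A tiny sketch with awk, since plain shell arithmetic doesn't handle decimals; the function name `elapsed_seconds` is invented for this example, and the two numbers are copied from the JSON above.

```shell
# elapsed_seconds START END
# Prints END - START rounded to one decimal place.
elapsed_seconds() {
    awk -v start="$1" -v end="$2" 'BEGIN { printf "%.1f\n", end - start }'
}

# CreationTime and CompletionTime from the COMPLETED job above
elapsed_seconds 1547795428.734 1547795570.152   # prints 141.4
```

So this particular job went from created to completed in a bit under two and a half minutes.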
{
"TranscriptionJob": {
"TranscriptionJobName": "cardib-shutdown",
"TranscriptionJobStatus": "IN_PROGRESS",
"LanguageCode": "en-US",
"MediaFormat": "mp3",
"Media": {
"MediaFileUri": "s3://data.danwin.com/tmp/cardib-shutdown.mp3"
},
"CreationTime": 1547795428.734
}
}
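
The transcript JSON that follows has more than the plain text: under `results.items`, every word carries a start time, an end time and a confidence score (stored as a string). That makes it easy to list the words Transcribe was least sure about, which is where a human should proofread first. In this file, for example, "mother" and "focus" come in at 0.3523 and 0.3799. Below is a sketch using jq; the function name `low_confidence_words` and the default cutoff of 0.5 are arbitrary choices for illustration.

```shell
# low_confidence_words [THRESHOLD]
# Reads a Transcribe result JSON on stdin and prints one line per spoken word
# whose confidence is below THRESHOLD: start_time, confidence, word (tab-separated).
low_confidence_words() {
    jq -r --argjson threshold "${1:-0.5}" '
        .results.items[]
        | select(.type == "pronunciation")          # skip punctuation (confidence is null)
        | .alternatives[0] as $alt
        | select(($alt.confidence | tonumber) < $threshold)
        | "\(.start_time)\t\($alt.confidence)\t\($alt.content)"
    '
}

# Example (commented out so sourcing this file makes no network request):
# curl -s http://data.danwin.com/cardib-shutdown.json | low_confidence_words 0.5
```

The `tonumber` call is there because the confidence values in this file are quoted strings rather than JSON numbers.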
{
"jobName": "cardib-shutdown",
"accountId": "263510883111",
"results": {
"transcripts": [
{
"transcript": "Hey. Yeah. I just want to remind you because there's been a little bit over three weeks, okay? It's been a little bit over three weeks. Trump is now ordering as his some missing federal government workers to go back to work without getting paid. Now, I don't want to hear your mother focus talking about all but Obama Shut down the government for seventeen days year bitch for health care. So your grandma could check her blood pressure and your business to go take a piss in the gynecologist with no motherfucking problem. Now, I know a lot of guys don't care because I don't work for the government or your partner. They have a job, but this shit is really fucking serious, bro. This city is crazy. Like a country is in a hell hole right now. All for fucking war. And we really need to take this serious. I feel that we need to take some action. I don't know what type of actual base because it is not what I do, but I'm scared. This is crazy. And I really feel bad for these people. They got to go to fucking work, to not get motherfucking paid."
}
],
"items": [
{
"start_time": "0.09",
"end_time": "0.38",
"alternatives": [
{
"confidence": "0.9769",
"content": "Hey"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "0.38",
"end_time": "0.77",
"alternatives": [
{
"confidence": "1.0000",
"content": "Yeah"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "0.78",
"end_time": "0.89",
"alternatives": [
{
"confidence": "1.0000",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "0.89",
"end_time": "1.09",
"alternatives": [
{
"confidence": "1.0000",
"content": "just"
}
],
"type": "pronunciation"
},
{
"start_time": "1.09",
"end_time": "1.24",
"alternatives": [
{
"confidence": "0.4934",
"content": "want"
}
],
"type": "pronunciation"
},
{
"start_time": "1.24",
"end_time": "1.3",
"alternatives": [
{
"confidence": "0.9752",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "1.31",
"end_time": "1.74",
"alternatives": [
{
"confidence": "1.0000",
"content": "remind"
}
],
"type": "pronunciation"
},
{
"start_time": "1.74",
"end_time": "1.82",
"alternatives": [
{
"confidence": "0.9990",
"content": "you"
}
],
"type": "pronunciation"
},
{
"start_time": "1.83",
"end_time": "2.2",
"alternatives": [
{
"confidence": "1.0000",
"content": "because"
}
],
"type": "pronunciation"
},
{
"start_time": "2.2",
"end_time": "2.33",
"alternatives": [
{
"confidence": "0.9905",
"content": "there's"
}
],
"type": "pronunciation"
},
{
"start_time": "2.33",
"end_time": "2.52",
"alternatives": [
{
"confidence": "1.0000",
"content": "been"
}
],
"type": "pronunciation"
},
{
"start_time": "2.52",
"end_time": "2.59",
"alternatives": [
{
"confidence": "1.0000",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "2.59",
"end_time": "2.81",
"alternatives": [
{
"confidence": "1.0000",
"content": "little"
}
],
"type": "pronunciation"
},
{
"start_time": "2.81",
"end_time": "2.97",
"alternatives": [
{
"confidence": "1.0000",
"content": "bit"
}
],
"type": "pronunciation"
},
{
"start_time": "2.97",
"end_time": "3.16",
"alternatives": [
{
"confidence": "1.0000",
"content": "over"
}
],
"type": "pronunciation"
},
{
"start_time": "3.16",
"end_time": "3.44",
"alternatives": [
{
"confidence": "1.0000",
"content": "three"
}
],
"type": "pronunciation"
},
{
"start_time": "3.44",
"end_time": "4.11",
"alternatives": [
{
"confidence": "1.0000",
"content": "weeks"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "4.18",
"end_time": "4.72",
"alternatives": [
{
"confidence": "0.5594",
"content": "okay"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "?"
}
],
"type": "punctuation"
},
{
"start_time": "4.73",
"end_time": "4.91",
"alternatives": [
{
"confidence": "0.9994",
"content": "It's"
}
],
"type": "pronunciation"
},
{
"start_time": "4.91",
"end_time": "5.06",
"alternatives": [
{
"confidence": "1.0000",
"content": "been"
}
],
"type": "pronunciation"
},
{
"start_time": "5.06",
"end_time": "5.13",
"alternatives": [
{
"confidence": "1.0000",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "5.13",
"end_time": "5.32",
"alternatives": [
{
"confidence": "1.0000",
"content": "little"
}
],
"type": "pronunciation"
},
{
"start_time": "5.32",
"end_time": "5.43",
"alternatives": [
{
"confidence": "0.5651",
"content": "bit"
}
],
"type": "pronunciation"
},
{
"start_time": "5.43",
"end_time": "5.59",
"alternatives": [
{
"confidence": "0.9845",
"content": "over"
}
],
"type": "pronunciation"
},
{
"start_time": "5.6",
"end_time": "5.82",
"alternatives": [
{
"confidence": "1.0000",
"content": "three"
}
],
"type": "pronunciation"
},
{
"start_time": "5.82",
"end_time": "6.15",
"alternatives": [
{
"confidence": "1.0000",
"content": "weeks"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "6.54",
"end_time": "7.15",
"alternatives": [
{
"confidence": "0.9779",
"content": "Trump"
}
],
"type": "pronunciation"
},
{
"start_time": "7.64",
"end_time": "7.86",
"alternatives": [
{
"confidence": "0.7704",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "7.87",
"end_time": "8.3",
"alternatives": [
{
"confidence": "1.0000",
"content": "now"
}
],
"type": "pronunciation"
},
{
"start_time": "8.31",
"end_time": "8.94",
"alternatives": [
{
"confidence": "1.0000",
"content": "ordering"
}
],
"type": "pronunciation"
},
{
"start_time": "8.94",
"end_time": "9.11",
"alternatives": [
{
"confidence": "1.0000",
"content": "as"
}
],
"type": "pronunciation"
},
{
"start_time": "9.11",
"end_time": "9.28",
"alternatives": [
{
"confidence": "0.6055",
"content": "his"
}
],
"type": "pronunciation"
},
{
"start_time": "9.28",
"end_time": "9.54",
"alternatives": [
{
"confidence": "0.9152",
"content": "some"
}
],
"type": "pronunciation"
},
{
"start_time": "9.54",
"end_time": "10.1",
"alternatives": [
{
"confidence": "0.9541",
"content": "missing"
}
],
"type": "pronunciation"
},
{
"start_time": "10.43",
"end_time": "10.86",
"alternatives": [
{
"confidence": "1.0000",
"content": "federal"
}
],
"type": "pronunciation"
},
{
"start_time": "10.86",
"end_time": "11.37",
"alternatives": [
{
"confidence": "0.6420",
"content": "government"
}
],
"type": "pronunciation"
},
{
"start_time": "11.37",
"end_time": "11.72",
"alternatives": [
{
"confidence": "0.9982",
"content": "workers"
}
],
"type": "pronunciation"
},
{
"start_time": "11.72",
"end_time": "11.82",
"alternatives": [
{
"confidence": "0.9979",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "11.82",
"end_time": "12.08",
"alternatives": [
{
"confidence": "1.0000",
"content": "go"
}
],
"type": "pronunciation"
},
{
"start_time": "12.08",
"end_time": "12.39",
"alternatives": [
{
"confidence": "0.9311",
"content": "back"
}
],
"type": "pronunciation"
},
{
"start_time": "12.39",
"end_time": "12.53",
"alternatives": [
{
"confidence": "0.9311",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "12.53",
"end_time": "13.05",
"alternatives": [
{
"confidence": "1.0000",
"content": "work"
}
],
"type": "pronunciation"
},
{
"start_time": "13.73",
"end_time": "14.24",
"alternatives": [
{
"confidence": "0.9964",
"content": "without"
}
],
"type": "pronunciation"
},
{
"start_time": "14.24",
"end_time": "14.52",
"alternatives": [
{
"confidence": "1.0000",
"content": "getting"
}
],
"type": "pronunciation"
},
{
"start_time": "14.52",
"end_time": "14.95",
"alternatives": [
{
"confidence": "0.5376",
"content": "paid"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "15.14",
"end_time": "15.64",
"alternatives": [
{
"confidence": "1.0000",
"content": "Now"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "15.65",
"end_time": "15.77",
"alternatives": [
{
"confidence": "0.9997",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "15.77",
"end_time": "15.91",
"alternatives": [
{
"confidence": "0.9940",
"content": "don't"
}
],
"type": "pronunciation"
},
{
"start_time": "15.91",
"end_time": "16.04",
"alternatives": [
{
"confidence": "0.9951",
"content": "want"
}
],
"type": "pronunciation"
},
{
"start_time": "16.04",
"end_time": "16.1",
"alternatives": [
{
"confidence": "0.9992",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "16.1",
"end_time": "16.21",
"alternatives": [
{
"confidence": "0.9996",
"content": "hear"
}
],
"type": "pronunciation"
},
{
"start_time": "16.21",
"end_time": "16.37",
"alternatives": [
{
"confidence": "0.6916",
"content": "your"
}
],
"type": "pronunciation"
},
{
"start_time": "16.37",
"end_time": "16.56",
"alternatives": [
{
"confidence": "0.3523",
"content": "mother"
}
],
"type": "pronunciation"
},
{
"start_time": "16.56",
"end_time": "16.84",
"alternatives": [
{
"confidence": "0.3799",
"content": "focus"
}
],
"type": "pronunciation"
},
{
"start_time": "16.84",
"end_time": "17.08",
"alternatives": [
{
"confidence": "0.9541",
"content": "talking"
}
],
"type": "pronunciation"
},
{
"start_time": "17.08",
"end_time": "17.25",
"alternatives": [
{
"confidence": "0.9243",
"content": "about"
}
],
"type": "pronunciation"
},
{
"start_time": "17.26",
"end_time": "17.54",
"alternatives": [
{
"confidence": "1.0000",
"content": "all"
}
],
"type": "pronunciation"
},
{
"start_time": "17.54",
"end_time": "17.71",
"alternatives": [
{
"confidence": "0.9833",
"content": "but"
}
],
"type": "pronunciation"
},
{
"start_time": "17.71",
"end_time": "18.13",
"alternatives": [
{
"confidence": "0.9066",
"content": "Obama"
}
],
"type": "pronunciation"
},
{
"start_time": "18.13",
"end_time": "18.34",
"alternatives": [
{
"confidence": "0.9986",
"content": "Shut"
}
],
"type": "pronunciation"
},
{
"start_time": "18.34",
"end_time": "18.51",
"alternatives": [
{
"confidence": "0.9986",
"content": "down"
}
],
"type": "pronunciation"
},
{
"start_time": "18.51",
"end_time": "18.6",
"alternatives": [
{
"confidence": "1.0000",
"content": "the"
}
],
"type": "pronunciation"
},
{
"start_time": "18.6",
"end_time": "19.0",
"alternatives": [
{
"confidence": "0.8789",
"content": "government"
}
],
"type": "pronunciation"
},
{
"start_time": "19.0",
"end_time": "19.08",
"alternatives": [
{
"confidence": "1.0000",
"content": "for"
}
],
"type": "pronunciation"
},
{
"start_time": "19.08",
"end_time": "19.46",
"alternatives": [
{
"confidence": "1.0000",
"content": "seventeen"
}
],
"type": "pronunciation"
},
{
"start_time": "19.46",
"end_time": "19.73",
"alternatives": [
{
"confidence": "1.0000",
"content": "days"
}
],
"type": "pronunciation"
},
{
"start_time": "19.87",
"end_time": "20.36",
"alternatives": [
{
"confidence": "0.6234",
"content": "year"
}
],
"type": "pronunciation"
},
{
"start_time": "20.36",
"end_time": "20.92",
"alternatives": [
{
"confidence": "1.0000",
"content": "bitch"
}
],
"type": "pronunciation"
},
{
"start_time": "21.18",
"end_time": "21.42",
"alternatives": [
{
"confidence": "1.0000",
"content": "for"
}
],
"type": "pronunciation"
},
{
"start_time": "21.42",
"end_time": "21.75",
"alternatives": [
{
"confidence": "0.7029",
"content": "health"
}
],
"type": "pronunciation"
},
{
"start_time": "21.76",
"end_time": "22.25",
"alternatives": [
{
"confidence": "0.7029",
"content": "care"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "22.32",
"end_time": "22.53",
"alternatives": [
{
"confidence": "0.9736",
"content": "So"
}
],
"type": "pronunciation"
},
{
"start_time": "22.53",
"end_time": "22.66",
"alternatives": [
{
"confidence": "0.8489",
"content": "your"
}
],
"type": "pronunciation"
},
{
"start_time": "22.66",
"end_time": "23.06",
"alternatives": [
{
"confidence": "0.7744",
"content": "grandma"
}
],
"type": "pronunciation"
},
{
"start_time": "23.06",
"end_time": "23.25",
"alternatives": [
{
"confidence": "0.9692",
"content": "could"
}
],
"type": "pronunciation"
},
{
"start_time": "23.25",
"end_time": "23.47",
"alternatives": [
{
"confidence": "0.9988",
"content": "check"
}
],
"type": "pronunciation"
},
{
"start_time": "23.47",
"end_time": "23.55",
"alternatives": [
{
"confidence": "0.7296",
"content": "her"
}
],
"type": "pronunciation"
},
{
"start_time": "23.55",
"end_time": "23.89",
"alternatives": [
{
"confidence": "1.0000",
"content": "blood"
}
],
"type": "pronunciation"
},
{
"start_time": "23.89",
"end_time": "24.51",
"alternatives": [
{
"confidence": "1.0000",
"content": "pressure"
}
],
"type": "pronunciation"
},
{
"start_time": "24.67",
"end_time": "24.79",
"alternatives": [
{
"confidence": "0.6264",
"content": "and"
}
],
"type": "pronunciation"
},
{
"start_time": "24.79",
"end_time": "24.92",
"alternatives": [
{
"confidence": "0.6246",
"content": "your"
}
],
"type": "pronunciation"
},
{
"start_time": "24.92",
"end_time": "25.2",
"alternatives": [
{
"confidence": "0.9947",
"content": "business"
}
],
"type": "pronunciation"
},
{
"start_time": "25.2",
"end_time": "25.31",
"alternatives": [
{
"confidence": "0.8231",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "25.31",
"end_time": "25.48",
"alternatives": [
{
"confidence": "0.9360",
"content": "go"
}
],
"type": "pronunciation"
},
{
"start_time": "25.48",
"end_time": "25.73",
"alternatives": [
{
"confidence": "0.8723",
"content": "take"
}
],
"type": "pronunciation"
},
{
"start_time": "25.73",
"end_time": "25.85",
"alternatives": [
{
"confidence": "0.8261",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "25.85",
"end_time": "26.04",
"alternatives": [
{
"confidence": "0.7526",
"content": "piss"
}
],
"type": "pronunciation"
},
{
"start_time": "26.04",
"end_time": "26.15",
"alternatives": [
{
"confidence": "0.9690",
"content": "in"
}
],
"type": "pronunciation"
},
{
"start_time": "26.15",
"end_time": "26.23",
"alternatives": [
{
"confidence": "0.9797",
"content": "the"
}
],
"type": "pronunciation"
},
{
"start_time": "26.23",
"end_time": "26.85",
"alternatives": [
{
"confidence": "0.7954",
"content": "gynecologist"
}
],
"type": "pronunciation"
},
{
"start_time": "26.85",
"end_time": "26.94",
"alternatives": [
{
"confidence": "0.4121",
"content": "with"
}
],
"type": "pronunciation"
},
{
"start_time": "26.94",
"end_time": "27.07",
"alternatives": [
{
"confidence": "0.9952",
"content": "no"
}
],
"type": "pronunciation"
},
{
"start_time": "27.07",
"end_time": "27.56",
"alternatives": [
{
"confidence": "0.9913",
"content": "motherfucking"
}
],
"type": "pronunciation"
},
{
"start_time": "27.57",
"end_time": "28.25",
"alternatives": [
{
"confidence": "1.0000",
"content": "problem"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "28.6",
"end_time": "29.21",
"alternatives": [
{
"confidence": "1.0000",
"content": "Now"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "29.22",
"end_time": "29.39",
"alternatives": [
{
"confidence": "1.0000",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "29.39",
"end_time": "29.6",
"alternatives": [
{
"confidence": "1.0000",
"content": "know"
}
],
"type": "pronunciation"
},
{
"start_time": "29.6",
"end_time": "29.65",
"alternatives": [
{
"confidence": "0.9913",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "29.65",
"end_time": "29.82",
"alternatives": [
{
"confidence": "0.9913",
"content": "lot"
}
],
"type": "pronunciation"
},
{
"start_time": "29.82",
"end_time": "29.88",
"alternatives": [
{
"confidence": "0.9764",
"content": "of"
}
],
"type": "pronunciation"
},
{
"start_time": "29.88",
"end_time": "30.02",
"alternatives": [
{
"confidence": "0.6201",
"content": "guys"
}
],
"type": "pronunciation"
},
{
"start_time": "30.03",
"end_time": "30.25",
"alternatives": [
{
"confidence": "0.9986",
"content": "don't"
}
],
"type": "pronunciation"
},
{
"start_time": "30.25",
"end_time": "30.46",
"alternatives": [
{
"confidence": "1.0000",
"content": "care"
}
],
"type": "pronunciation"
},
{
"start_time": "30.46",
"end_time": "30.8",
"alternatives": [
{
"confidence": "1.0000",
"content": "because"
}
],
"type": "pronunciation"
},
{
"start_time": "30.81",
"end_time": "30.85",
"alternatives": [
{
"confidence": "0.8321",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "30.85",
"end_time": "30.98",
"alternatives": [
{
"confidence": "0.9692",
"content": "don't"
}
],
"type": "pronunciation"
},
{
"start_time": "30.98",
"end_time": "31.16",
"alternatives": [
{
"confidence": "1.0000",
"content": "work"
}
],
"type": "pronunciation"
},
{
"start_time": "31.16",
"end_time": "31.25",
"alternatives": [
{
"confidence": "0.9996",
"content": "for"
}
],
"type": "pronunciation"
},
{
"start_time": "31.25",
"end_time": "31.35",
"alternatives": [
{
"confidence": "1.0000",
"content": "the"
}
],
"type": "pronunciation"
},
{
"start_time": "31.35",
"end_time": "31.77",
"alternatives": [
{
"confidence": "0.5647",
"content": "government"
}
],
"type": "pronunciation"
},
{
"start_time": "31.77",
"end_time": "31.87",
"alternatives": [
{
"confidence": "0.9998",
"content": "or"
}
],
"type": "pronunciation"
},
{
"start_time": "31.87",
"end_time": "32.03",
"alternatives": [
{
"confidence": "0.9808",
"content": "your"
}
],
"type": "pronunciation"
},
{
"start_time": "32.03",
"end_time": "32.41",
"alternatives": [
{
"confidence": "0.7594",
"content": "partner"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "32.41",
"end_time": "32.5",
"alternatives": [
{
"confidence": "0.8450",
"content": "They"
}
],
"type": "pronunciation"
},
{
"start_time": "32.5",
"end_time": "32.66",
"alternatives": [
{
"confidence": "1.0000",
"content": "have"
}
],
"type": "pronunciation"
},
{
"start_time": "32.66",
"end_time": "32.75",
"alternatives": [
{
"confidence": "1.0000",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "32.75",
"end_time": "33.15",
"alternatives": [
{
"confidence": "1.0000",
"content": "job"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "33.34",
"end_time": "33.51",
"alternatives": [
{
"confidence": "1.0000",
"content": "but"
}
],
"type": "pronunciation"
},
{
"start_time": "33.51",
"end_time": "33.67",
"alternatives": [
{
"confidence": "1.0000",
"content": "this"
}
],
"type": "pronunciation"
},
{
"start_time": "33.67",
"end_time": "33.82",
"alternatives": [
{
"confidence": "0.8953",
"content": "shit"
}
],
"type": "pronunciation"
},
{
"start_time": "33.82",
"end_time": "33.96",
"alternatives": [
{
"confidence": "0.9805",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "33.96",
"end_time": "34.19",
"alternatives": [
{
"confidence": "0.8380",
"content": "really"
}
],
"type": "pronunciation"
},
{
"start_time": "34.2",
"end_time": "34.49",
"alternatives": [
{
"confidence": "0.9950",
"content": "fucking"
}
],
"type": "pronunciation"
},
{
"start_time": "34.5",
"end_time": "35.0",
"alternatives": [
{
"confidence": "0.9943",
"content": "serious"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "35.0",
"end_time": "35.35",
"alternatives": [
{
"confidence": "0.9517",
"content": "bro"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "35.69",
"end_time": "35.91",
"alternatives": [
{
"confidence": "1.0000",
"content": "This"
}
],
"type": "pronunciation"
},
{
"start_time": "35.91",
"end_time": "36.11",
"alternatives": [
{
"confidence": "0.5930",
"content": "city"
}
],
"type": "pronunciation"
},
{
"start_time": "36.11",
"end_time": "36.21",
"alternatives": [
{
"confidence": "0.9769",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "36.21",
"end_time": "36.74",
"alternatives": [
{
"confidence": "1.0000",
"content": "crazy"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "36.75",
"end_time": "37.02",
"alternatives": [
{
"confidence": "1.0000",
"content": "Like"
}
],
"type": "pronunciation"
},
{
"start_time": "37.03",
"end_time": "37.42",
"alternatives": [
{
"confidence": "1.0000",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "37.43",
"end_time": "38.05",
"alternatives": [
{
"confidence": "1.0000",
"content": "country"
}
],
"type": "pronunciation"
},
{
"start_time": "38.84",
"end_time": "39.04",
"alternatives": [
{
"confidence": "0.8721",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "39.04",
"end_time": "39.16",
"alternatives": [
{
"confidence": "0.8711",
"content": "in"
}
],
"type": "pronunciation"
},
{
"start_time": "39.16",
"end_time": "39.23",
"alternatives": [
{
"confidence": "0.9954",
"content": "a"
}
],
"type": "pronunciation"
},
{
"start_time": "39.23",
"end_time": "39.41",
"alternatives": [
{
"confidence": "0.8528",
"content": "hell"
}
],
"type": "pronunciation"
},
{
"start_time": "39.41",
"end_time": "39.6",
"alternatives": [
{
"confidence": "0.8528",
"content": "hole"
}
],
"type": "pronunciation"
},
{
"start_time": "39.61",
"end_time": "39.8",
"alternatives": [
{
"confidence": "1.0000",
"content": "right"
}
],
"type": "pronunciation"
},
{
"start_time": "39.8",
"end_time": "40.1",
"alternatives": [
{
"confidence": "1.0000",
"content": "now"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "40.11",
"end_time": "40.4",
"alternatives": [
{
"confidence": "1.0000",
"content": "All"
}
],
"type": "pronunciation"
},
{
"start_time": "40.4",
"end_time": "40.67",
"alternatives": [
{
"confidence": "0.4697",
"content": "for"
}
],
"type": "pronunciation"
},
{
"start_time": "40.67",
"end_time": "40.98",
"alternatives": [
{
"confidence": "0.9511",
"content": "fucking"
}
],
"type": "pronunciation"
},
{
"start_time": "40.98",
"end_time": "41.31",
"alternatives": [
{
"confidence": "1.0000",
"content": "war"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "41.69",
"end_time": "41.81",
"alternatives": [
{
"confidence": "0.8176",
"content": "And"
}
],
"type": "pronunciation"
},
{
"start_time": "41.81",
"end_time": "41.89",
"alternatives": [
{
"confidence": "0.9436",
"content": "we"
}
],
"type": "pronunciation"
},
{
"start_time": "41.89",
"end_time": "42.12",
"alternatives": [
{
"confidence": "0.8468",
"content": "really"
}
],
"type": "pronunciation"
},
{
"start_time": "42.12",
"end_time": "42.27",
"alternatives": [
{
"confidence": "1.0000",
"content": "need"
}
],
"type": "pronunciation"
},
{
"start_time": "42.27",
"end_time": "42.34",
"alternatives": [
{
"confidence": "1.0000",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "42.34",
"end_time": "42.63",
"alternatives": [
{
"confidence": "1.0000",
"content": "take"
}
],
"type": "pronunciation"
},
{
"start_time": "42.63",
"end_time": "42.79",
"alternatives": [
{
"confidence": "1.0000",
"content": "this"
}
],
"type": "pronunciation"
},
{
"start_time": "42.79",
"end_time": "43.38",
"alternatives": [
{
"confidence": "0.7554",
"content": "serious"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "43.63",
"end_time": "43.77",
"alternatives": [
{
"confidence": "0.9593",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "43.77",
"end_time": "43.92",
"alternatives": [
{
"confidence": "0.9835",
"content": "feel"
}
],
"type": "pronunciation"
},
{
"start_time": "43.92",
"end_time": "44.04",
"alternatives": [
{
"confidence": "0.5382",
"content": "that"
}
],
"type": "pronunciation"
},
{
"start_time": "44.04",
"end_time": "44.14",
"alternatives": [
{
"confidence": "0.9993",
"content": "we"
}
],
"type": "pronunciation"
},
{
"start_time": "44.14",
"end_time": "44.26",
"alternatives": [
{
"confidence": "1.0000",
"content": "need"
}
],
"type": "pronunciation"
},
{
"start_time": "44.26",
"end_time": "44.32",
"alternatives": [
{
"confidence": "1.0000",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "44.33",
"end_time": "44.51",
"alternatives": [
{
"confidence": "1.0000",
"content": "take"
}
],
"type": "pronunciation"
},
{
"start_time": "44.51",
"end_time": "44.69",
"alternatives": [
{
"confidence": "1.0000",
"content": "some"
}
],
"type": "pronunciation"
},
{
"start_time": "44.69",
"end_time": "45.1",
"alternatives": [
{
"confidence": "0.9994",
"content": "action"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "45.22",
"end_time": "45.51",
"alternatives": [
{
"confidence": "1.0000",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "45.51",
"end_time": "45.69",
"alternatives": [
{
"confidence": "0.9950",
"content": "don't"
}
],
"type": "pronunciation"
},
{
"start_time": "45.69",
"end_time": "45.87",
"alternatives": [
{
"confidence": "1.0000",
"content": "know"
}
],
"type": "pronunciation"
},
{
"start_time": "45.87",
"end_time": "46.17",
"alternatives": [
{
"confidence": "1.0000",
"content": "what"
}
],
"type": "pronunciation"
},
{
"start_time": "46.17",
"end_time": "46.41",
"alternatives": [
{
"confidence": "1.0000",
"content": "type"
}
],
"type": "pronunciation"
},
{
"start_time": "46.41",
"end_time": "46.48",
"alternatives": [
{
"confidence": "1.0000",
"content": "of"
}
],
"type": "pronunciation"
},
{
"start_time": "46.49",
"end_time": "46.83",
"alternatives": [
{
"confidence": "0.7523",
"content": "actual"
}
],
"type": "pronunciation"
},
{
"start_time": "46.83",
"end_time": "47.07",
"alternatives": [
{
"confidence": "0.2273",
"content": "base"
}
],
"type": "pronunciation"
},
{
"start_time": "47.07",
"end_time": "47.55",
"alternatives": [
{
"confidence": "0.9998",
"content": "because"
}
],
"type": "pronunciation"
},
{
"start_time": "48.24",
"end_time": "48.37",
"alternatives": [
{
"confidence": "0.8595",
"content": "it"
}
],
"type": "pronunciation"
},
{
"start_time": "48.37",
"end_time": "48.55",
"alternatives": [
{
"confidence": "1.0000",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "48.55",
"end_time": "48.76",
"alternatives": [
{
"confidence": "0.9915",
"content": "not"
}
],
"type": "pronunciation"
},
{
"start_time": "48.76",
"end_time": "48.89",
"alternatives": [
{
"confidence": "1.0000",
"content": "what"
}
],
"type": "pronunciation"
},
{
"start_time": "48.89",
"end_time": "49.01",
"alternatives": [
{
"confidence": "1.0000",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "49.02",
"end_time": "49.34",
"alternatives": [
{
"confidence": "1.0000",
"content": "do"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "49.49",
"end_time": "50.06",
"alternatives": [
{
"confidence": "1.0000",
"content": "but"
}
],
"type": "pronunciation"
},
{
"start_time": "50.56",
"end_time": "50.71",
"alternatives": [
{
"confidence": "0.9841",
"content": "I'm"
}
],
"type": "pronunciation"
},
{
"start_time": "50.71",
"end_time": "51.25",
"alternatives": [
{
"confidence": "1.0000",
"content": "scared"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "52.04",
"end_time": "52.34",
"alternatives": [
{
"confidence": "1.0000",
"content": "This"
}
],
"type": "pronunciation"
},
{
"start_time": "52.34",
"end_time": "52.47",
"alternatives": [
{
"confidence": "1.0000",
"content": "is"
}
],
"type": "pronunciation"
},
{
"start_time": "52.47",
"end_time": "52.95",
"alternatives": [
{
"confidence": "1.0000",
"content": "crazy"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "52.95",
"end_time": "53.15",
"alternatives": [
{
"confidence": "1.0000",
"content": "And"
}
],
"type": "pronunciation"
},
{
"start_time": "53.15",
"end_time": "53.23",
"alternatives": [
{
"confidence": "0.9940",
"content": "I"
}
],
"type": "pronunciation"
},
{
"start_time": "53.24",
"end_time": "53.5",
"alternatives": [
{
"confidence": "0.9821",
"content": "really"
}
],
"type": "pronunciation"
},
{
"start_time": "53.5",
"end_time": "53.73",
"alternatives": [
{
"confidence": "1.0000",
"content": "feel"
}
],
"type": "pronunciation"
},
{
"start_time": "53.73",
"end_time": "54.15",
"alternatives": [
{
"confidence": "1.0000",
"content": "bad"
}
],
"type": "pronunciation"
},
{
"start_time": "54.3",
"end_time": "54.45",
"alternatives": [
{
"confidence": "1.0000",
"content": "for"
}
],
"type": "pronunciation"
},
{
"start_time": "54.45",
"end_time": "54.6",
"alternatives": [
{
"confidence": "1.0000",
"content": "these"
}
],
"type": "pronunciation"
},
{
"start_time": "54.6",
"end_time": "54.81",
"alternatives": [
{
"confidence": "1.0000",
"content": "people"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
},
{
"start_time": "54.81",
"end_time": "54.9",
"alternatives": [
{
"confidence": "0.7858",
"content": "They"
}
],
"type": "pronunciation"
},
{
"start_time": "54.9",
"end_time": "55.03",
"alternatives": [
{
"confidence": "0.7654",
"content": "got"
}
],
"type": "pronunciation"
},
{
"start_time": "55.03",
"end_time": "55.09",
"alternatives": [
{
"confidence": "0.7615",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "55.1",
"end_time": "55.24",
"alternatives": [
{
"confidence": "1.0000",
"content": "go"
}
],
"type": "pronunciation"
},
{
"start_time": "55.24",
"end_time": "55.32",
"alternatives": [
{
"confidence": "1.0000",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "55.32",
"end_time": "55.67",
"alternatives": [
{
"confidence": "1.0000",
"content": "fucking"
}
],
"type": "pronunciation"
},
{
"start_time": "55.68",
"end_time": "56.02",
"alternatives": [
{
"confidence": "1.0000",
"content": "work"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": ","
}
],
"type": "punctuation"
},
{
"start_time": "56.2",
"end_time": "56.35",
"alternatives": [
{
"confidence": "0.6794",
"content": "to"
}
],
"type": "pronunciation"
},
{
"start_time": "56.35",
"end_time": "56.57",
"alternatives": [
{
"confidence": "0.9940",
"content": "not"
}
],
"type": "pronunciation"
},
{
"start_time": "56.57",
"end_time": "56.75",
"alternatives": [
{
"confidence": "1.0000",
"content": "get"
}
],
"type": "pronunciation"
},
{
"start_time": "56.75",
"end_time": "57.32",
"alternatives": [
{
"confidence": "0.9937",
"content": "motherfucking"
}
],
"type": "pronunciation"
},
{
"start_time": "57.33",
"end_time": "57.65",
"alternatives": [
{
"confidence": "0.9516",
"content": "paid"
}
],
"type": "pronunciation"
},
{
"alternatives": [
{
"confidence": null,
"content": "."
}
],
"type": "punctuation"
}
]
},
"status": "COMPLETED"
}
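The per-word `confidence` values in the `items` array above make it easy to flag words worth double-checking by ear. Here is a minimal sketch, assuming the job output has been saved to a local file. The function name `low_confidence_words` and the `0.9` default cutoff are illustrative choices, not part of the original script:

```shell
# List words whose top alternative scored below a confidence threshold.
# Usage: low_confidence_words TRANSCRIPT_JSON [THRESHOLD]
# (hypothetical helper; THRESHOLD defaults to 0.9, an arbitrary cutoff)
low_confidence_words() {
    local json_file="$1"
    local threshold="${2:-0.9}"
    jq -r --argjson min "$threshold" '
        .results.items[]
        | select(.type == "pronunciation")      # punctuation has null confidence
        | .alternatives[0] as $alt
        | select(($alt.confidence | tonumber) < $min)
        | [.start_time, $alt.confidence, $alt.content]
        | @tsv
    ' "$json_file"
}
```

Each output line is tab-separated: start time, confidence, word. In the excerpt above, a 0.9 cutoff would flag words such as `it` (0.8595) and the run `They got to` (0.7858, 0.7654, 0.7615).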
@briankung

Wow, I really owe you a coffee sometime. I feel like I find myself in your footsteps all the time. Thanks for posting this!

@briankung

Oh wow, I wish the speaker label was embedded in each item. It'd be a lot nicer for differentiating who said what

@briankung

briankung commented Oct 9, 2020

So in my case, I've used the following to verify that the item start times and the speaker start times are the same:

TRANSCRIPTION_JSON=path/to/downloaded/file
jq '.results.items | map([.start_time]) | flatten | del(.[] | nulls)' "$TRANSCRIPTION_JSON" > item_start_times
jq '.results.speaker_labels.segments | map([.items]) | flatten | map([.start_time]) | flatten' "$TRANSCRIPTION_JSON" > speaker_start_times

# Diff the files backwards and forwards...

diff item_start_times speaker_start_times
diff speaker_start_times item_start_times

# Oh right, I can just compare sha hashes (on macOS):

$ shasum -a 256 *_start_times
2642201229098c2cb926c7cf8999c7a4070745d94a5a47155ac983a56db8f4ca  item_start_times
2642201229098c2cb926c7cf8999c7a4070745d94a5a47155ac983a56db8f4ca  speaker_start_times

# ...and they're identical

So in my case, the speaker start times matched the item start times once the items with no start_time key were dropped (the items list is longer, hence the del(.[] | nulls) jq filter). So I'm pretty confident that they line up.

I'm thinking about using the start times to map the transcribed words to the speakers so I can organize the transcript better.
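One way to do that mapping is sketched below, assuming the job was run with speaker labels enabled (so `.results.speaker_labels` exists) and that each word's `start_time` string appears verbatim in both places, as the check above suggests. The function name `words_with_speakers` is made up for illustration:

```shell
# Print "speaker<TAB>start_time<TAB>word" for every spoken word.
# Usage: words_with_speakers TRANSCRIPT_JSON   (hypothetical helper name)
words_with_speakers() {
    jq -r '
        # Build a lookup table: word start_time -> speaker_label
        ( [ .results.speaker_labels.segments[].items[]
            | {key: .start_time, value: .speaker_label} ]
          | from_entries ) as $speaker_at
        | .results.items[]
        | select(.type == "pronunciation")      # punctuation has no start_time
        | [ ($speaker_at[.start_time] // "unknown"),
            .start_time,
            .alternatives[0].content ]
        | @tsv
    ' "$1"
}
```

Words whose start time has no match in the speaker segments are labeled `unknown` rather than dropped, so any misalignment shows up in the output.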

@dannguyen
Author

@briankung so I think this gist is pretty old, and I'm unsure of how feature-complete the Transcribe API was when I wrote it.

In any case, I just popped open the Transcribe console and did a test of the real-time streaming endpoint (couldn't find the demo interface for uploading audio files for transcription), and it appears that speaker identification is included in the JSON response:

https://gist.github.com/dannguyen/92990a177d511bdd055ec3817da85238

I can't imagine the response for the non-streaming endpoint not having speaker identification. Too lazy right now to look in the API docs but I'm sure there's a flag for it
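For the batch endpoint, speaker identification is requested through the `--settings` option of `aws transcribe start-transcription-job`. A rough sketch follows. The wrapper function `start_speaker_job`, the `AWS_CMD` override, and the default of 2 speakers are assumptions for illustration; the language and media format mirror the mp3 workflow in this gist:

```shell
# Start a Transcribe job with speaker identification switched on.
# Usage: start_speaker_job JOB_NAME S3_MEDIA_URI [MAX_SPEAKERS]
# AWS_CMD is a hypothetical override so the call can be dry-run with `echo`.
start_speaker_job() {
    local job_name="$1" media_uri="$2" max_speakers="${3:-2}"
    "${AWS_CMD:-aws}" transcribe start-transcription-job \
        --transcription-job-name "$job_name" \
        --language-code en-US \
        --media-format mp3 \
        --media "MediaFileUri=$media_uri" \
        --settings "ShowSpeakerLabels=true,MaxSpeakerLabels=$max_speakers"
}
```

Setting `AWS_CMD=echo` before calling the function prints the arguments instead of submitting a job, which is a cheap way to check the command first.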

@briankung

briankung commented Oct 9, 2020 via email

@dannguyen
Author

dannguyen commented Oct 10, 2020

@briankung sorry I'm dumb. didn't even read my old gist that had the Senate example. Looks like there is speaker identification #file-transcript-senate-bennett-json, but you were asking if it could be embedded with each transcribed item instead of its own object in the JSON that you then have to process/align on your own. Yeah I'd be surprised if they've changed the output format of this transcribe-job API to include that now ¯\_(ツ)_/¯

@briankung

Yeah I tried on my own test file, a podcast, and it didn’t include the speaker identification information in the transcribed items. And no worries! Thanks for posting in the first place, helped a lot.

@rlau1115

rlau1115 commented Oct 18, 2020

Thanks for sharing this! Looking to adapt this for a video editing workflow. Do you happen to know if the transcribe API can handle multiple languages at once?

@briankung

After poking around with it for a bit, it doesn't seem like it supports multiple languages in a single audio file, but further research may provide more answers. I think you mostly supply a --language-code for the primary language or let it identify the primary language.

What do you mean by multiple languages at once, though?
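The two options mentioned in this comment map onto `--language-code` (name the language yourself) and `--identify-language` (let the service pick the dominant one). Below is a sketch of the second. The wrapper name `start_autolang_job` and the `AWS_CMD` override are hypothetical, and mp3 input is assumed as in the rest of this gist:

```shell
# Start a Transcribe job that detects the dominant language on its own.
# Usage: start_autolang_job JOB_NAME S3_MEDIA_URI
# AWS_CMD is a hypothetical override so the call can be dry-run with `echo`.
start_autolang_job() {
    local job_name="$1" media_uri="$2"
    "${AWS_CMD:-aws}" transcribe start-transcription-job \
        --transcription-job-name "$job_name" \
        --identify-language \
        --media-format mp3 \
        --media "MediaFileUri=$media_uri"
}
```

To pin the language instead, swap `--identify-language` for `--language-code en-US` (or another supported code).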

@rlau1115

rlau1115 commented Oct 19, 2020 via email

@briankung

briankung commented Oct 19, 2020 via email
