Depending on your needs, and how you look at them, YouTube’s embedded captions are either extremely useful or an irritating extra that people abuse, distracting you from the video. The good news: captions just got a whole lot cleverer.
YouTube notes that captions are good for boosting a clip’s profile in search, since there’s more material to be searched for, and that they also help you jump around within longer clips. Proper transcribed captions are also invaluable for deaf and hard-of-hearing YouTube users. Although there’s a degree of automation already, captioning is still pretty laborious–so it’s not ubiquitous. That’s why Google has added two new functions that boost the automation significantly. And by significantly, I really mean “wow that tech is amazing!” Check out this demo video to see what I mean:
Automatic machine-generated captioning is the big new trick, and it works using technology lifted directly from Google Voice’s voicemail transcription system. Basically it listens to the clip, transcribes the speech, and overlays the captions on the video at just the right moment. Google notes that it’s not perfect, but that the “technology will continue to improve over time”–and, get this, you can even auto-translate the captions into other languages. The second function, auto-timing, is for when you’ve already got a transcript written out, and it’s even more accurate: you just upload the text, and Google runs speech recognition on the video and pins each caption in the right place.
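For the curious, here is a rough sketch of how the auto-timing idea could work in principle. It is not Google's implementation, which isn't public. It assumes a hypothetical speech recognizer has already produced a list of (word, start, end) timestamps, and it uses Python's standard difflib module to line those up against the uploaded transcript. The function names align_captions and normalize are made up for this illustration.

```python
from difflib import SequenceMatcher


def normalize(word):
    """Lowercase a word and strip common punctuation so matching is forgiving."""
    return word.lower().strip(".,!?;:\"'")


def align_captions(caption_lines, recognized):
    """Assign start/end times to each caption line.

    caption_lines: list of strings from the uploaded transcript.
    recognized: list of (word, start_seconds, end_seconds) tuples, assumed to
        come from some speech recognizer (hypothetical input format).
    Returns a list of (start, end, text); times are None if no word matched.
    """
    # Flatten the transcript into words, remembering each word's caption line.
    transcript_words = []
    line_of_word = []
    for index, line in enumerate(caption_lines):
        for word in line.split():
            transcript_words.append(normalize(word))
            line_of_word.append(index)

    heard_words = [normalize(word) for word, _, _ in recognized]

    # Find runs of words that appear in the same order in both sequences.
    matcher = SequenceMatcher(a=transcript_words, b=heard_words, autojunk=False)

    starts = [None] * len(caption_lines)
    ends = [None] * len(caption_lines)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            line = line_of_word[block.a + offset]
            _, start, end = recognized[block.b + offset]
            if starts[line] is None:
                starts[line] = start  # first matched word opens the caption
            ends[line] = end  # last matched word closes it

    return [(starts[i], ends[i], caption_lines[i]) for i in range(len(caption_lines))]


if __name__ == "__main__":
    lines = ["Hello and welcome.", "Today we talk about captions."]
    # Made-up recognizer output; note it mishears "about" as "a bout".
    words = [
        ("hello", 0.0, 0.4), ("and", 0.4, 0.6), ("welcome", 0.6, 1.1),
        ("today", 1.5, 1.9), ("we", 1.9, 2.0), ("talk", 2.0, 2.3),
        ("a", 2.3, 2.35), ("bout", 2.35, 2.5), ("captions", 2.5, 3.1),
    ]
    for start, end, text in align_captions(lines, words):
        print(start, end, text)
```

The point of the sketch is that the recognizer doesn't have to get every word right: as long as enough words match the transcript in order, each caption can still be pinned to a plausible time span, which fits with why the article's auto-timing feature is described as more accurate than fully automatic captioning.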
It’s an experimental feature, so the auto-caption thing is being rolled out for just a few channels, such as MIT and PBS, at the moment. (Presumably it’s in trials because it’s extremely processor-intensive, and its recognition reliability needs a big boost.) The auto-timing feature is, however, being made available for all videos in English on YouTube.
Why should you care about this? Well, if you’re deaf or partially deaf, then it’s obvious. But it’s also an indication of how seriously Google’s taking YouTube, and it hints at just how much search engine power is going to expand–if it can target the spoken words in online video, what will be next? It’s also a reminder of how sophisticated the seemingly light-hearted world of user-uploaded online video can get.