MIT Students Redesign The How-To Video, In 4 Easy Steps

MIT researchers aim to use crowd feedback to build better step-by-step menus for online how-to videos.


Whether you want to learn how to make scrambled eggs like Gordon Ramsay, drive a stick shift without stalling out or just wax the tips of your mustache, you can find plenty of videos online offering step-by-step instructions.


But while YouTube and other websites make how-to videos for almost every conceivable task available on demand, the actual process of learning from those recordings hasn’t changed much from when do-it-yourselfers fast-forwarded and rewound through VHS tapes of Julia Child and Bob Vila. Other than scrolling and clicking through, there’s no good way to find a particular place in a video, making it hard to skip over the steps you know or reexamine the tricky parts of, say, a new dance step or dog trick. A team at MIT is working on using crowdsourced navigational information to build better interfaces for online how-to videos.

“Video interfaces like YouTube are not really designed for learning,” MIT PhD student Juho Kim says. “All of these common tasks while people try to learn seem to be not well supported by existing video interfaces, so that’s sort of how the idea came up.”

Kim and his colleagues built a how-to video player called ToolScape that highlights each step in an instructional video with short descriptions and before-and-after thumbnails. That makes it possible to quickly check whether a video teaches what you want to learn and to focus on the parts of the process that are new or challenging while skipping over the easy steps and tedious introductory remarks.
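The research doesn’t publish ToolScape’s internals, but the core idea of a step-labeled timeline can be sketched as a simple data structure plus a lookup that lets a viewer jump straight to a step instead of scrubbing. All names and fields below are illustrative assumptions, not ToolScape’s actual code:

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One labeled step in a how-to video (fields are illustrative,
    not ToolScape's actual data model)."""
    start: float        # seconds into the video where the step begins
    end: float          # seconds where the step ends
    description: str    # short label, e.g. "Apply a gaussian blur"
    before_thumb: str   # 'before' thumbnail path/URL
    after_thumb: str    # 'after' thumbnail path/URL

def find_step(steps, keyword):
    """Return the first step whose description mentions keyword,
    so the player can seek directly to that point in the video."""
    keyword = keyword.lower()
    for step in steps:
        if keyword in step.description.lower():
            return step
    return None

steps = [
    Step(0.0, 45.0, "Introduction", "t0a.jpg", "t0b.jpg"),
    Step(45.0, 120.0, "Select the subject with the lasso tool", "t1a.jpg", "t1b.jpg"),
    Step(120.0, 200.0, "Apply a gaussian blur to the background", "t2a.jpg", "t2b.jpg"),
]

hit = find_step(steps, "blur")
print(f"Jump to {hit.start:.0f}s: {hit.description}")
```

With labels like these attached to a video, skipping the introduction or replaying one tricky step becomes a lookup rather than a scrub through the timeline.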

When they first tested the new system, the researchers found that ToolScape users studying a Photoshop tutorial reported more confidence in their abilities, and performed better on an image manipulation task judged by outside experts, than users who studied the same tutorial through a traditional online video player, according to a research paper.

“I find that fascinating, because it’s the same video content that you’re interacting with,” says Kim. “[It] results in better learning, even when you’re using the same video content as the original material.”


The MIT team isn’t the first to look into building a better how-to interface, Kim says. Universities have produced plenty of research on building better instructional videos, especially as distance learning and open online courses have grown more popular. At the same time, software companies have realized that many of their customers learn to use their products by studying online tutorials, and have looked into ways of improving them. “Companies like Adobe or Autodesk, companies that make complex software, really want people to learn better,” says Kim, who previously interned at Adobe.

Researchers from Adobe and the University of California at Berkeley published a paper in 2012 showing a system for neatly integrating screen-capture videos into step-by-step Photoshop tutorials, and Adobe has released a tutorial-builder tool that lets users automatically record their actions in the program for others to watch, or even replay in their own copies of the program. Similar research by Autodesk focuses on letting users share and annotate step-by-step processes in the company’s computer-aided design tools.

But those tools and others being researched require integration with specific software or heavy participation from the makers of the tutorial videos. That doesn’t help viewers who want to master skills that don’t involve software, and it doesn’t help viewers of the innumerable instructional videos already online, Kim says.

“We can always ask authors to create these labels, but we have thousands, maybe millions, of those videos on the web, and we cannot possibly expect everyone to add those annotations when they create those videos,” he says. “That’s why we thought crowdsourcing would be a reliable approach.”

Here’s how Kim and his team built ToolScape, in four steps:

1. Experiment with Mechanical Turk.
Kim and his colleagues first experimented with using Amazon’s Mechanical Turk platform to pay Internet viewers to pick out the discrete steps from existing how-to videos on subjects from cooking to makeup to Photoshop; they found the Mechanical Turk users performed comparably to experts in the various fields, he says. “That is great—these are untrained people, not necessarily experts in this domain, but they were still able to produce good enough labels,” he says. But even at Mechanical Turk’s low per-task rates, the cost of annotating those thousands, if not millions, of existing videos would still be quite hefty, he says.


2. Crowdsource.
Kim and colleagues decided to work on a tool called Crowdy that would let viewers annotate videos as they watch them by answering questions designed to generate and refine labels for individual steps.

3. Solicit feedback.
“What we do is we occasionally pause the video and ask people to summarize the part that they just watched, and we combine notes from different people,” he says. “We ask another set of people to verify what other people have done to describe that process.”

4. Vet the results.
Kim and his colleagues found that the user-generated annotations were again comparable to those compiled by experts, and that users reported their own learning improved through the note-taking and editing process, they wrote in a paper scheduled for presentation at a March conference.
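The summarize-combine-verify loop described in the steps above can be sketched as a minimal pipeline. This is a toy illustration, not Crowdy’s actual implementation; the function names and the majority-vote rule are assumptions:

```python
from collections import Counter

def combine_summaries(summaries):
    """Merge free-text summaries from several viewers of the same
    video segment; here the most common phrasing simply wins
    (the real system's merging is presumably more sophisticated)."""
    counts = Counter(s.strip().lower() for s in summaries)
    label, _ = counts.most_common(1)[0]
    return label

def verify_label(votes, threshold=0.5):
    """A second set of viewers votes on the merged label; keep it
    only if more than a threshold fraction agree."""
    if not votes:
        return False
    return sum(votes) / len(votes) > threshold

# Three viewers paused at the same point summarize the segment:
summaries = ["Whisk the eggs", "whisk the eggs ", "Beat the eggs in a bowl"]
label = combine_summaries(summaries)

# A different set of viewers confirms or rejects the merged label:
accepted = verify_label([True, True, False])
print(label, accepted)
```

The key design point the researchers describe is the split between the two crowds: one group generates candidate labels as a side effect of watching, and a second group cheaply verifies them, so no single viewer has to annotate a whole video.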

The team is continuing to add videos for crowdsourced annotation, and while Kim says he has no plans to turn the system into a commercial product, he intends to see it continue to grow as an academic project and as a service to the web community.

“We are trying to get more videos annotated that way, but I’m not particularly interested in starting a company with this,” he says. “But, of course, having thousands of videos annotated this way would be really cool, and it would open up a lot of opportunity.”


Ultimately, once the system is more complete and enough videos are properly annotated, Kim envisions users even being able to quickly jump from one video to another to see similar steps taught by different instructors. Someone confused about how to poach an egg as part of a larger recipe could easily skip around to egg-poaching steps in other cooking videos, then return and finish learning how to make the complete original dish, he says.

“What I’m trying to push forward is for people to be more actively engaged in the video and also contribute to the knowledge of the community,” he says. “In the process of trying to learn better, what you generate can serve a bigger cause.”

About the author

Steven Melendez is an independent journalist living in New Orleans.