Google’s new YouTube-8M dataset includes over 500,000 hours of video

Now the wealth of information in YouTube videos, from mischievous cats and impossible stunts to documentaries and commencement speeches, will be available to researchers. The new YouTube-8M dataset, which includes 8 million YouTube video URLs (representing over 500,000 hours of video), is Google’s newest research breakthrough. The labeled dataset “enables researchers and students without access to big data or big machines to do their research at previously unprecedented scale,” according to Google’s blog. For quality control, Google used only public videos with more than 1,000 views and built a vocabulary of entities (for example, from “acoustic guitar” to “Guitar Hero III: Legends of Rock” in the “Guitars” filter in the “Arts and Entertainment” category).