Date: 2013-11-21

If you’ve ever dug through endless web search results, you have scraped, albeit manually. Web data scraping is tech slang for getting your hands on useful but unstructured information.

Computers can do a lot of this work for us, but those of us who are used to old-fashioned research (ahem, journalists) don’t always realize how easy this can be or how much time we can save.

Scraping is not the same as web indexing, which is how search engines like Google use bots and web crawlers to match search results to your query by looking at metadata. “Data scraping,” by contrast, tries to replicate the human process of manually looking for information that is not as accessible. Think of research papers, where the data is usually just plain text.

Scraping was born out of developers’ need to interact with data from websites without an open API, or data interface. Since there was no easy way for them to extract information their apps needed, they wrote scraping software. (A great example of commercialization of a scraper can be found in the famous case of Craigslist versus Padmapper.)

Craigslist, to this day, chooses not to offer an API for developers, making it nearly impossible for outside apps to do cool things with its data. This left developers such as Padmapper to build software that took the good stuff from plain text and structured it for their own purposes. Padmapper used software that scraped data from Craigslist and mapped out rentals based on zip code, number of rooms, and price. Pretty useful, and way more user-friendly than the Craigslist interface.
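To make "taking the good stuff from plain text and structuring it" concrete, here is a minimal Python sketch. It is not Padmapper's actual code. The listing format, the sample lines, and the names `Rental`, `parse_listing`, and `parse_listings` are all assumptions invented for illustration, and real listings are far messier than this.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical listing lines, invented for this sketch; real listings vary widely.
SAMPLE_LISTINGS = [
    "$1450 / 2br - Sunny apartment near the park (Brooklyn, 11215)",
    "$980 / 1br - Cozy studio, utilities included (Queens, 11101)",
    "Room for rent, call for details",
]


@dataclass
class Rental:
    price: int
    bedrooms: int
    zip_code: str


# One pattern per field, so a listing that lacks any field can simply be skipped.
PRICE_RE = re.compile(r"\$(\d[\d,]*)")
BEDROOMS_RE = re.compile(r"(\d+)\s*br\b", re.IGNORECASE)
# Five digits not directly preceded by "$", so a five-digit price is not read as a zip code.
ZIP_RE = re.compile(r"(?<!\$)\b(\d{5})\b")


def parse_listing(text: str) -> Optional[Rental]:
    """Turn one plain-text listing into a structured record, or None if a field is missing."""
    price = PRICE_RE.search(text)
    bedrooms = BEDROOMS_RE.search(text)
    zip_code = ZIP_RE.search(text)
    if not (price and bedrooms and zip_code):
        return None
    return Rental(
        price=int(price.group(1).replace(",", "")),
        bedrooms=int(bedrooms.group(1)),
        zip_code=zip_code.group(1),
    )


def parse_listings(lines: List[str]) -> List[Rental]:
    """Parse every line and keep only the ones that yielded a complete record."""
    return [rental for rental in map(parse_listing, lines) if rental is not None]


if __name__ == "__main__":
    for rental in parse_listings(SAMPLE_LISTINGS):
        print(rental)
```

Once the listings are records rather than sentences, filtering by zip code, number of rooms, or price becomes an ordinary list operation, which is the part that makes a map view possible.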

Craigslist sued Padmapper for infringement, claiming copyright over its users’ data. But Craigslist only held a license to publicly share its users’ data, which is nowhere near a copyright, and the copyright claim largely failed. This is one of many situations that illustrate why developers build scraping tools when no API is available.

It’s not always so legally entangled. A lot of the time, scraping is used for simple tasks like pulling data out of a table inside a PDF or searching for a new pair of cheap kicks. It’s an especially worthwhile skill to learn if you regularly use information that comes from charts, if you’re an investigative journalist, or if you constantly check sources that update periodically.
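As a rough sketch of one of those simple tasks, the Python below pulls rows out of a table and finds the cheapest pair of shoes. Two caveats: extracting tables from PDFs generally needs a third-party library, so this example swaps in an HTML table and uses only Python's standard `html.parser` module; and the sample page, the product names, and the names `TableScraper` and `scrape_table` are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this sketch.
SAMPLE_HTML = """
<table>
  <tr><th>Model</th><th>Price</th></tr>
  <tr><td>Runner X</td><td>$59.99</td></tr>
  <tr><td>Court Classic</td><td>$45.00</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table body as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    shoes = scrape_table(SAMPLE_HTML)
    cheapest = min(shoes, key=lambda row: float(row["Price"].lstrip("$")))
    print(cheapest["Model"], cheapest["Price"])
```

The same pattern works for any source that updates periodically: run the scraper on a schedule, and compare the rows it returns with the ones from the previous run.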