Here’s the thing about an ad: If you can’t recognize it, it’s worth nothing to the advertiser. That’s the fatal flaw with web-based ads. No matter how much ad technology evades ad-blocking software by disguising itself, it still has to be recognizable to a user and potentially clickable.
Researchers at Princeton and Stanford believe that crucial point shows how to end the escalating blocker/anti-blocker battle, and to end it in favor of user choice. While a “war to win our eyeballs” sounds like the theme of a Guillermo del Toro film, it describes the interplay between advertisers (and ad-technology companies) and the visitors who reject the panoply of tracking techniques and page bloat that come with current online ads.
Some sites go beyond merely routing around the blocking techniques used by Ghostery, AdBlock Plus, and others: they show a scolding message when they detect a blocker in use, and a visitor often has to disable the blocker or add a rules exception to proceed. But the Princeton and Stanford academics have determined that it’s possible to identify ads with an extremely high degree of reliability without any of the current ad-blocking tricks of inspecting underlying page elements, domains, and the like, and also to block counter-defenses from sites and adtech companies.
In a paper currently in draft form, the authors detail an interlocking set of theory, code, and legal reasoning about the state of ad blocking and the response by ad networks and site publishers. It’s been assumed that the blocking and anti-blocking war would escalate indefinitely, with battles fought as a series of measures and countermeasures. The researchers lay out the case that browser users and browser makers have the upper hand, and that in any given skirmish, publishers will quickly lose.
The Telltale Signs Of Advertising
Instead of examining network requests and code, the proof of concept the authors first deployed, a Chrome extension that identifies ads on Facebook, uses computer vision, optical character recognition of text rendered as images, and other cues. It allows ads to load and scripts to run, at which point it can determine what on a page is an ad.
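The perceptual idea can be sketched in miniature. In this toy Python example (the cue list and element text are illustrative, not from the paper), the user-visible text of each rendered page element stands in for OCR output, and an element is flagged as an ad if it carries a disclosure cue a human would see:

```python
# Toy sketch of "perceptual" ad identification. The real system runs
# computer vision and OCR over rendered pixels; here plain strings stand
# in for OCR output, and we look for the disclosure cues a user would see.

DISCLOSURE_CUES = ("sponsored", "advertisement", "promoted", "adchoices")

def looks_like_ad(rendered_text: str) -> bool:
    """Flag an element whose visible text carries an ad-disclosure cue."""
    text = rendered_text.lower()
    return any(cue in text for cue in DISCLOSURE_CUES)

# Simulated rendered text of three page elements.
page_elements = [
    "City council approves new transit budget",
    "Sponsored · Try our meal-kit service free for a week",
    "Promoted by ExampleCorp",
]

ads = [e for e in page_elements if looks_like_ad(e)]
```

The point of the design is that the cue is exactly what regulations require to be visible, so an advertiser who hides it from the detector also hides it from the user.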
Text-based CAPTCHAs, designed to discourage robots from automatically filling out forms, became ever more baroque to stay ahead of scripts that could puzzle out the answers, to the point where they frustrated many users as well as the bots. That kind of escalation can’t work with ads, and it eventually stopped working with CAPTCHAs, too, as scammers adopted deep-learning computer-vision techniques. “So long as advertisements, even malicious advertisements, are recognizable by users, you should be able to use these techniques to find them,” says Grant Storey, a Princeton undergraduate in computer science who coauthored the paper with Arvind Narayanan and Dillon Reisman of Princeton and Jonathan Mayer of Stanford. (Mayer is currently at work in the FCC’s enforcement bureau as chief technologist.)
Their approach relies in part on legitimate advertisers, ad networks, and publishers complying with U.S. regulations and with guidelines for industry self-monitoring. Reputable ads have labels and other attributes that make them stand out. It might be subtle to a user, but it’s obvious to a trained machine-learning system. (Other countries vary in their practices, though some have even stricter laws and industry self-monitoring.)
As the researchers note, “In order to defeat a filter list [such as is used by conventional ad blockers], all that is required is moving an advertisement to a different URL; in order to defeat a perceptual ad blocker, an entirely new ad disclosure standard must be approved.” The researchers limited their testing to ads on Facebook pages and ads that comply with regulations and industry practice. “For this paper, our focus was on this well-behaved universe, where there are certain sort of norms that are being followed,” Storey says.
The researchers’ system is modular and adaptable, and could be trained to recognize unlabeled ads, although the researchers have found that over time more advertising on more sites carries proper labels and disclosures. Their framework doesn’t encompass “malvertising,” the delivery of malware via ads; anti-malware software, Google Safe Browsing, and similar services handle that better, separately from identifying ads as ads. Nor does it block the trackers that often accompany ad serving, which are a concern because of privacy rather than visual interaction.
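That modularity can be illustrated with a hedged sketch (every name below is hypothetical, not from the paper): each detector module recognizes one disclosure convention, so supporting a new standard means adding a module rather than rebuilding the blocker:

```python
# Illustrative sketch of a modular detector pipeline (module names are
# hypothetical, not from the paper). Each detector knows one disclosure
# convention; the pipeline flags an element if any detector matches.

from typing import Callable, List

Detector = Callable[[str], bool]

def facebook_sponsored(visible_text: str) -> bool:
    # Facebook labels paid posts with a "Sponsored" tag.
    return "sponsored" in visible_text.lower()

def adchoices_disclosure(visible_text: str) -> bool:
    # AdChoices ads carry a standardized icon and "AdChoices" label.
    return "adchoices" in visible_text.lower()

def flag_ads(elements: List[str], detectors: List[Detector]) -> List[str]:
    return [e for e in elements if any(d(e) for d in detectors)]

detectors = [facebook_sponsored, adchoices_disclosure]
flagged = flag_ads(
    ["Weather: sunny all week", "AdChoices - New SUV deals"],
    detectors,
)
```

This structure is why the researchers argue publishers lose each skirmish: defeating a filter list takes a new URL, but defeating a detector like this takes an entirely new disclosure standard.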
In their testing, the Facebook extension, in the field for several months, matched 50 out of 50 ads, including those in both the news feed and sidebars. The four researchers also report they saw no false negatives or positives in their personal use over six months.
On the broader web, they tested a module that looks for disclosures under the AdChoices program, used in North America and Europe, which the paper’s authors found in over 60% of a sample of 183 ads from top news websites. The module correctly labeled over 95% of AdChoices ads on 100 sites randomly selected from the top 500 news sites.
The researchers’ technology could create a beneficial feedback loop, too, as users who might employ ad-detection software could complain to advertisers, sites, ad networks, state attorneys general, trade groups, and the FTC about commercial messages that were identifiable as out of compliance with regulations and industry guidelines. (In fact, this approach could be automated by nonprofit and governmental consumer-protection groups to identify out-of-compliance ads.)
These techniques, along with a separate exploration into blocking the execution of anti-blocking code altogether, raise ethical concerns that the paper addresses briefly: such tools could be used in advertising fraud, a large industry in which automated scripts rack up page views and perform clicks while appearing to be legitimate human actions.
The research might offer fraudsters more insight into evading detection via extensions, but, Storey notes, “there are still other ways to detect the ad-fraud bot that should be available,” and these techniques don’t work for fraud systems that load in a browser. The researchers also withheld some details to avoid fully disclosing their technique.
The brain-in-a-jar method could be escalated if browser makers go further and either provide deeper access for extension creators or build in ad blocking directly. Google reportedly is considering changes to Chrome that would prevent certain kinds of irritating ads from loading, or bar all ads from loading on pages that use any of those irritating formats.
The only way to win most wars is to avoid conflict in the first place. As web-ad revenue has slipped away to Facebook, Twitter, and mobile apps, among other places, publishers have developed adtech or signed up with networks that offer it. That’s led to heavier use of invasive techniques such as pop-up ads with hard-to-click Xs to close and auto-play video, as well as large downloads for the web code to support them.
JPMorgan Chase recently discovered that its automated advertising, running on 400,000 sites, drew clicks from only 12,000 of them. It winnowed that list to 5,000 handpicked sites and saw no overall change in results. That indicates aggressive techniques for delivering ads to users aren’t working for advertisers, either.
Princeton and Stanford’s research, combined with results like Chase’s, might force publishers to rethink their ad approaches entirely. That could lead them to step out of the blocking/anti-blocking arms race, finding ways to attract users to well-behaved marketing and leaving the tricks behind.