Google Bombing and the IRS

Tax season is here, and like many Americans I recently went online to download forms. Usually I'd head straight to the Internal Revenue Service Web site, but I find Google a more efficient way of navigating big government sites. When I Googled "IRS form 1065," the results included no direct link to the form on the IRS Web site, just manuals and tax tips. I skimmed the page and the ninth result was:

Form 1065 B

IRS Form 1065-B (Schedule K-1) is the partner's share of income or loss from an electing large partnership. This form is to be filled out by each partner in ...

form-1065-b.bejegsugy.com/

Curious, I clicked on the link, but a warning popped up claiming the site contained malware. That meant the application, if downloaded, could cause serious harm to a PC: anything from surreptitiously installing adware, spyware, and malicious programs to turning it into a zombie that unleashes billions of spam emails, or even wiping out the hard drive. I wondered how a site like this ended up in the top 10 search results, given Google's much-vaunted claims of relevance and reliability. Indeed, Google boasts that it uses "more than 200 signals," including its patented PageRank algorithm, to rank sites.

Yet, here was a site that clearly shouldn't have been in the first 10 results. I entered other forms—1041s, K-1s—and found more suspicious sites appearing within the first 20 results, one of them listed as the fourth result.

For example:



1041-es.jpg

When I used Mozilla Firefox these bogus sites were blocked automatically, part of the security features built into the browser. This was not the case when I visited the sites on Safari. Nor was it so when I switched to a PC running Internet Explorer. Notably, Google's own Chrome browser didn't offer any protection either. I followed the link to a site that warned I had malware on my computer, urging me to click on a program to eradicate it from my hard drive and protect me from future incursions. The only way to get away was to quit the browser; it wouldn't let me close the window or move backward or forward to another page. If you own a PC (Macs are not affected) and download the promised cleansing agent (called Malware Defender 2009), you will be downloading spyware that has been traced to hackers from the Russian Federation. Pretty clever, offering an antivirus tool that is in itself a virus tool.

Over the years, the IRS has issued numerous warnings covering online scams (last updated nine months ago). Usually they are classic identity-theft phishing schemes that rely on official-looking email messages informing you that you are going to be audited, are due a big refund or government stimulus check, or are being offered $80 to participate in a survey. Another version attempts to lure you to a Web site offering free online tax-filing services. In each instance, the message advises you to click on a link that then takes you to a fake IRS site where you're asked for personal information such as social security and credit card numbers.

In this case, however, fraudsters manipulate Google search results to hijack a user's browser. The fact that these sites are lodged high in Google's search rankings gives them the patina of authenticity. That's what makes them so dangerous. (The same didn't appear true for Yahoo or Microsoft's search, which, as far as I could tell, didn't display these bogus sites, at least not in the first several pages of results.)

This Google bomb tactic is not new. Black hat search engine optimization (SEO) has been going on for years. According to Dave Dittrich, a senior security engineer and researcher at the Information School at the University of Washington, a typical approach is to create thousands of web pages running on hundreds of servers that cross-link to one another. Each file contains text that includes a word and strings that result from doing a search for that word. It can then push a product or service on to the first page of results—and that is by far the most valuable search engine real estate, because most people don't bother to venture past the first page.
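To make the cross-linking pattern Dittrich describes concrete, here is a minimal sketch that measures how densely a group of pages link to one another. Everything in it is an assumption for illustration: the `link_density` and `build_link_farm` helpers, the `.example` hostnames, and the idea of using density as a signal are not taken from Dittrich or from Google, and this is not how any search engine is known to detect link farms.

```python
from itertools import permutations


def link_density(pages, links):
    """Return the fraction of possible directed links among `pages` that exist.

    `links` maps each page to the pages it links to. A fully cross-linked
    group scores 1.0; a group with no links among its members scores 0.0.
    """
    pages = list(pages)
    if len(pages) < 2:
        return 0.0
    possible = len(pages) * (len(pages) - 1)
    present = sum(
        1 for src, dst in permutations(pages, 2) if dst in links.get(src, ())
    )
    return present / possible


def build_link_farm(size):
    """Build a toy graph in which every page links to every other page.

    The hostnames are hypothetical placeholders, not real sites.
    """
    names = [f"farm{i}.example" for i in range(size)]
    return {name: [other for other in names if other != name] for name in names}


# A toy "organic" neighborhood: a few pages, only loosely linked.
organic = {
    "a.example": ["b.example"],
    "b.example": [],
    "c.example": ["a.example"],
    "d.example": [],
}

farm = build_link_farm(5)
farm_density = link_density(farm, farm)
organic_density = link_density(organic, organic)
```

In this toy setup the farm is as dense as a link graph can be, while the organic group is sparse. Real sites are messier, so a density figure like this would at most be one hint among many.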

Since at least November 2007, cybercriminals have been borrowing black hat SEO techniques to target popular keywords on Google: everything from "how to teach a dog to play fetch" to recent ones that include terms relating to Easter, March Madness and Barack Obama. Their goal: to disseminate destructive payloads. By one count, more than 1 million links point to a single poisonous domain. A while back Google created a filter in response to this malware frenzy, which earlier this year went haywire, blocking every single site Google turned up for almost an hour and freaking out some users.

With April 15 approaching, it was perhaps inevitable the IRS would also become a prime target. The attackers appear to be taking advantage of a specific PageRank vulnerability that weighs a page's popularity by treating every inbound link as a "vote," with pages attracting lots of links given more weight than pages with just a few. Larry Page and Sergey Brin, Google's founders, view this as a form of democracy on the Web. (Apparently governing through democracy in search is as difficult as it is in the real world.) These digital ne'er-do-wells also found a way around Google's "hypertext-matching analysis" that claims to analyze "the full content of a page" and factors in "the precise location of each word." If a Google searcher clicks on the bogus link, he is either taken directly to a site hosting malicious software or redirected to one.
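The "links as votes" idea can be illustrated with a toy calculation. What follows is a simplified, textbook-style power iteration and not Google's production ranking, which by the company's own account uses more than 200 signals. The page names (`official`, `bogus`, `blog_a`, and so on), the damping value of 0.85, and the graph itself are assumptions made up for this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outbound links spread their score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link passes on an equal share of the score.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its score across every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def build_toy_web(farm_size):
    """Two blogs link to an 'official' page; a farm cross-links to a 'bogus' one."""
    web = {
        "blog_a": ["official"],
        "blog_b": ["official"],
        "official": [],
        "bogus": [],
    }
    farm = [f"farm_{i}" for i in range(farm_size)]
    for page in farm:
        # Every farm page links to the bogus page and to every other farm page.
        web[page] = ["bogus"] + [other for other in farm if other != page]
    if farm:
        # The bogus page links back into the farm, closing the loop.
        web["bogus"] = list(farm)
    return web


without_farm = pagerank(build_toy_web(0))
with_farm = pagerank(build_toy_web(20))
```

In this toy graph the official page outranks the bogus one when no farm exists, and the order flips once 20 cross-linked farm pages "vote" for the bogus page. The real system is far more complicated, but the sketch shows why manufactured links are worth the scammers' effort.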

To see if this IRS Google bomb tactic adhered to this model, I googled "b.bejegsugy.com," which was the first bogus site I'd encountered.

The first four sites listed were:



schedulec.jpg

After clicking on them at different times I was transported to a) a fake YouTube site (carelessly misspelled YuoTube), b) the same Malware Defender 2009 site, or c) a page that looked like this:

form1041schedd.jpg

That's the content the scamsters use to fool PageRank and push their pages to the top of Google's search results.

On the bottom were links to links and more links, such as the ones on this page (which the page above linked to):

form4952.jpg

When I visited bejegsugy.com, I found a semi-legitimate search page with topics like "Film School," "Stock Photos," and "Car Insurance," which offered links to genuine sites. (Later it would morph into a different-looking search site.)

And the search box? It was powered by Google. I found it a tad unnerving that it remembered many of my previous searches. For example, one I recently conducted on the economist Paul Krugman.


bejegsugy.jpg

The domain name Bejegsugy.com was registered to an individual affiliated with a company called Zitoclick on March 26—the day I first encountered the site as a malware host. The registrant information provided an email address: support@zitoclick.com.

Zitoclick.com is a barebones search site that claims to offer for download a toolset that "combines a richer, more intuitive internet search experience" and "works directly with Windows XP or Vista and either Internet Explorer or Firefox." A quick search indicated it was part of the extensive cross-linking network that characterizes a site used to help juice Google rankings, often appearing as a link on a page with no obvious connection. Plus, Zitoclick owns more than 13,438 other domains.

I contacted Google to ask about this latest twist on the IRS scam—namely, how was it possible to so badly fool PageRank? A Google spokesperson, via email, offered the usual corporate boilerplate response. (Below you'll find the entire statement.)

More to the point, it appeared that Google took immediate steps to clean up its search results, eradicating the bogus malware sites from IRS form-related searches, and reprogramming its Chrome browser to block the site that hosted the malware. When I checked later that day, none of the malware sites I'd stumbled on were there anymore.

And the next time I used the Chrome browser to visit the malware scanner Web site, Google had blocked it. The advisory listed the site as "suspicious," and warned that visiting it "could harm your computer." It also reported: "The last time Google visited this site was on 2009-03-26, and the last time suspicious content was found on this site was on 2009-03-26."

That was the day I contacted Google.

Now that Sergey and Larry's engineers were on the case, I figured these bogus IRS form malware sites wouldn't stand a chance.

I was wrong. Two days later I checked again by googling "IRS form 1065."


form1065.jpg

 

The 39th result was another malware site:


formk.jpg

I also tried other keywords, like "IRS Form 940 January 2009."


form940search.jpg

This time three bogus sites appeared on the first page. In other words, three of the top ten results were malware sites. Worse, Google didn't block any of them. And as soon as it did, hackers would unleash another wave of malware sites, and the game would continue round and round.


940ez.jpg

Google has built its billion-dollar empire on search, yet hackers have learned to subvert the system at will. It makes you wonder what other keywords are tainted. If Google search isn't democracy incarnate, which is how the company advertises it, then what is it? In some instances, it is a rigged system: one that rewards not the sites that have earned placement on the most valuable real estate (the first page or two of results) but the scammers who profit from gaming it.

And what if these cybercriminals, like those behind the mysterious Conficker worm, which has been getting heaps of press lately, were to deploy more damaging payloads? For now, they have stuck with basic PC-busting malware that is often sniffed out by antivirus products. If these hackers switch to more damaging Microsoft PC "0days" (pronounced "oh-days" or "zero days," the term generally refers to unknown, or zero-hour, software threats that are easily obtained on the hacker black market), Google could become a most inhospitable place to do your searching. And those responsible for Conficker are not the only ones worth worrying about. A recent report identified a vast cyberespionage campaign dubbed GhostNet that infected 1,295 computers in 103 countries, including embassies, international organizations, ministries of foreign affairs, news media and NGOs. It, too, relied on malware to disseminate an application called Gh0st Rat that transformed PCs into spy devices, pilfering confidential documents and turning on cameras and microphones without users' knowledge. And most antivirus products didn't provide protection.

As for those who plan to download IRS forms, it probably doesn't need to be said that you should skip Google and head straight to the IRS Web site, which will necessarily have "irs.gov" in its address. Accept no substitutes. If you do, you do so at your own peril.
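For readers who like to check such things programmatically, the "irs.gov in its address" rule can be sketched in a few lines. This is a minimal illustration using Python's standard `urllib.parse`; the helper name `is_irs_gov` and the sample URLs other than the bogus one quoted earlier are assumptions for the example, and a hostname check alone says nothing about whether a page is safe.

```python
from urllib.parse import urlsplit


def is_irs_gov(url):
    """Return True only if the URL's hostname is irs.gov or a subdomain of it.

    The check looks at the hostname, not the whole string, so an address that
    merely contains "irs.gov" somewhere (for example as a subdomain label of
    another site) does not pass.
    """
    host = urlsplit(url).hostname
    if host is None:
        # No scheme or no host, e.g. a bare "irs.gov" string: treat as unknown.
        return False
    host = host.rstrip(".")
    return host == "irs.gov" or host.endswith(".irs.gov")


candidates = [
    "https://www.irs.gov/",                # genuine-looking host
    "http://form-1065-b.bejegsugy.com/",   # the bogus result quoted above
    "http://irs.gov.example.com/",         # hypothetical lookalike
    "http://notirs.gov/",                  # hypothetical lookalike
]
results = {url: is_irs_gov(url) for url in candidates}
```

Matching on the end of the hostname, with the leading dot, is the point of the exercise: lookalike addresses tend to put the trusted name at the front or in the middle, where a casual glance finds it.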

Google Spokesperson response:

Hi Adam,

Thanks for getting in touch with us. Feel free to attribute the following information to a Google spokesperson:

Search is a complex problem and the heart of what we do. We use more than 200 signals, including our PageRank technology, to help us rank sites. At the same time, we work hard to protect our users from malware. We've removed many of these types of results from our search index. However, this issue affects more than just Google, as these sites are still part of the general web. In all cases, we actively work to detect and remove sites that serve malware from our index. To do this, we have manual and automated processes in place to enforce our policies. We also flag suspicious sites with malware warnings using our Safe Browsing tools. We'll continue to monitor for these bad results and will remove any as necessary. Additionally, we're always exploring new ways to identify and eliminate malicious sites from our index.

Please let me know if you have any other questions.

All the best,

XXXX

3 Comments

  • Wasim Haque

    Awesome Posting
    Very informative and detail posting, I think it is a great help for the people who wants to know more about these things.
    ventrilo server

  • Bob Kerns

    Bill: Agreed, but the difficult thing with applying human judgment is making it scale.

    Google has had a few rounds of technologies aimed at incorporating human judgment in search. The latest incarnation is Custom Search. There's also Preferred Sites, which lets you list sites you'd prefer, and lets you promote or remove search results. Unfortunately, that's currently a solitary activity.

    Still, none of their facilities have really taken off. I don't think they've found the right way to leverage community and social networking to accomplish useful search organization.

    Part of the problem is creating an incentive for putting in the effort for rating.

    There may well be ways of gaming the "human judgment" piece. Fake humans is one. Perhaps that could be controlled for with reputation management of one sort or another.

    Another problem is that human judgment may be applying a very different context than the one in which you are searching. To take an extreme example, a malware researcher would be looking for exactly the ones most of us would want to exclude.

    Or consider the word "breast", which depending on context, should trigger filtering, or prefer cooking sites, or health sites, or porn sites, or literary, or swimming, or ...

    You get the idea. One of the big strengths AND weaknesses of democratic search is that it is at least neutral about context.

    Still, I can't help but think that in this arms race, eventually we will USUALLY come out ahead, much like the body's immune system usually wins out over invaders. But it will take us a while to evolve that immune system, and it will involve much learning through painful infection.

  • Bill York

    The examples discussed in this article are pretty disturbing. They clearly demonstrate the limits of robotic crawling and heuristic algorithms as ways to find and deliver good search results. That's why I think that the best way to gain the upper hand in this arms race is to take greater advantage of human judgement in selecting and ranking the good stuff.