How Black-hatters Artificially Inflate their Alexa Ranking
Overview

As sure as rain will fall in Seattle, you'll find that not long after a legitimate internet technology is developed, some enterprising soul will develop a method to abuse it. This article addresses one commonly employed technique for artificially increasing Alexa rankings.

Background

Alexa Internet is a subsidiary of Amazon.com that collects information about the traffic patterns of individuals who have installed the Alexa toolbar. Much discussion has occurred regarding the relevance of the results offered by Alexa. For example, it's important to note that the conclusions reached by Alexa are not representative of the internet population as a whole, but only of those individuals who have installed the Alexa toolbar, be it by hook or by crook. Further, there is no reliable information to suggest what percentage of the web-using population actually uses browsers with the toolbar installed. In addition, the toolbar itself is offered only in English, only for the Internet Explorer and Firefox (since July 2007) browsers, and only on the Microsoft Windows platform. Computers and workstations used in workplaces, college laboratories, and internet cafes are often highly managed, and the Alexa toolbar, if ever installed, often doesn't last long.

Despite this, it's not uncommon for website operators to reference Alexa rankings when evaluating their own websites, or the websites of prospective or current clients.

For this article, I used version 7.2 of the Alexa toolbar, the most recent available. For the packet analysis, I used Wireshark v0.99.8, a freely available and open-source network protocol analyzer. For more information about Wireshark, consult their website.

Technical Analysis

Every time a request for a webpage is made, the toolbar sends a parallel request to Alexa's server (data.alexa.com) to retrieve stored information about the site in question.
Below is an example of such a request:

    GET /data/j6HV718Dy0g2GJ?cli=10&dat=snba&ver=7.2&cdt=alx_vw=20&wid=23976&act=00000000000

The server probably records the request and the data it contains for use in its analytics, but it also returns an XML-formatted document that the toolbar uses to display key information about the website, including its Alexa Rank.

Using the above, we can make very reasonable assumptions about how the Alexa toolbar works and what type of information is transmitted to the Alexa servers. More specifically, the first line contains a standard HTTP/1.1 GET request, the same method a browser uses to request a webpage from a server. Coupled with the 'Host' string a few lines below, we see that the particular "web page" being requested is:

    data.alexa.com/data/j6HV718Dy0g2GJ?cli=10&dat=snba&ver=7.2&cdt=alx_vw=20&wid=23976&act=00000000000

We can further break down this URL into smaller chunks by analyzing the query string. By observing successive queries, certain patterns regarding their use become apparent:

    j6HV718Dy0g2GJ
    cli=10
    dat=snba
    ver=7.2
    cdt=alx_vw=20
    wid=23976
    act=00000000000
    ss=1536x960
    bw=749
    t=0
    ttl=4000
    vis=1
    rq=6
    url=http://www.cybernac.com/

Weakness in Design

The Alexa service has no ability to validate the information sent by its clients. That is to say, toolbar reports sent from a client are never verified as actually coming from a valid Alexa Toolbar client. As such, the data can easily be replicated to appear to come from virtually any client, even one without the toolbar installed. The easiest method of replicating the report is simply to embed the URL in a standard HTML <IMG> tag, for example:

    <img src="http://data.alexa.com/data/j6HV718Dy0g2GJ?cli=10&dat=snba&ver=7.2&cdt=alx_vw=20&wid=23976&act=00000000000">

By doing so, the browser will send a report that is structurally identical to the report sent by the Alexa Toolbar.
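To make "structurally identical" concrete, the sketch below splits the example report URL into its host, path token, and query parameters using Python's standard URL parser. It is only an illustration of the breakdown described above: the helper name split_report is hypothetical, the URL is the one quoted in this article, and no meaning is assigned to the individual parameters beyond what the article states.

```python
from urllib.parse import urlsplit, parse_qsl

# The example report URL quoted in the article (toolbar version 7.2).
REPORT_URL = (
    "http://data.alexa.com/data/j6HV718Dy0g2GJ"
    "?cli=10&dat=snba&ver=7.2&cdt=alx_vw=20&wid=23976&act=00000000000"
)


def split_report(url):
    """Hypothetical helper: return (host, path token, [(key, value), ...])."""
    parts = urlsplit(url)
    # The last path segment is the opaque token that precedes the query string.
    token = parts.path.rsplit("/", 1)[-1]
    # Each pair is split on its first "=", so "cdt=alx_vw=20" keeps "alx_vw=20".
    params = parse_qsl(parts.query, keep_blank_values=True)
    return parts.netloc, token, params


host, token, params = split_report(REPORT_URL)
print(host)
print(token)
for key, value in params:
    print(f"{key} = {value}")
```

Because the server sees only these fields, a request produced by any HTTP client that carries the same fields would parse the same way as one produced by the toolbar.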
Logically, one would expect the Alexa Internet organization to take reasonable steps to prevent erroneous reports. It's conceivable that a connection problem could cause a legitimate client to transmit multiple reports. To this end, reports containing essentially identical information would be filtered out. To counteract this, black-hatters will attempt to programmatically randomize certain aspects of the report to reduce or eliminate the possibility of Alexa invalidating it.

Proof of Concept Code

The following PHP code illustrates how a deceptive website operator might artificially inflate their Alexa rank.

Located somewhere inside each page where the Alexa trigger should occur:

    <?php

As part of a separate file named 'alexa.inc':

    <?php

Analysis

So why does this work? The rationale is simple. Let's suppose that 1% of all web users have the Alexa toolbar installed. An overly simplistic logic would suggest that if Alexa receives 10 reports from unique clients visiting your website during a given period, your actual traffic is probably much closer to 1,000 visitors, once all the users without the toolbar installed are accounted for. But because nearly all visitors will load the image supplied by the above-referenced code, the reality is that 100% of your visitors are sending the toolbar report instead of the 1% Alexa is expecting. Alexa, however, still assumes that only 1% of the internet population has the toolbar installed, and still performs the same multiplication to estimate your traffic. In this example, 1,000 real visitors would be counted as roughly 100,000.

By using the above code along with an anonymizing proxy such as Privoxy, a single website operator can simulate traffic coming from hundreds or thousands of unique sources from all over the world, each sending seemingly valid reports to Alexa Internet.

The Black-hatter's Rationale

Why would anybody want to go through the trouble of the above? The reason is rather simple: money.
While a higher Alexa ranking does not affect search engine results, dishonest web marketers may choose to artificially increase their rankings in preparation for a sale, in order to make the property appear more valuable than it really is. By the same token, an unscrupulous search engine marketer might point to inflated Alexa scores as part of a sales pitch to illustrate an increase in web traffic; this goes beyond just filling the log files with illegitimate traffic. By presenting reports from an independent third party, the marketer may attempt to reinforce statistics that in reality aren't true.

In Review

It is unclear why Alexa Internet has chosen not to take reasonable steps to prevent 'ballot-stuffing' of its rankings. Doing so would be rather simple: all that would be required is for the Alexa server to request that each toolbar "verify" any data sent to it. Such a request would simply be ignored by clients without the toolbar installed, and their unverified reports could then be discarded. As it stands, using Alexa to gauge the popularity or value of a given website is discouraged, as the results are relatively easy for a website operator to manipulate.

Correction: Digg user Hijinks was kind enough to point out a typo in the code above, which has been corrected. Thanks.
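As a closing illustration, the extrapolation argument from the Analysis section can be worked through numerically. This is a minimal sketch of the article's deliberately simplified model, not of Alexa's actual ranking method, which is not public here: the function name estimated_visitors is hypothetical, and the 1% toolbar share and the figure of 1,000 visitors are the hypothetical numbers used in the Analysis section.

```python
def estimated_visitors(reports: int, assumed_toolbar_percent: int) -> int:
    """Scale a report count up to a traffic estimate (simplified model)."""
    # If 1% of users are assumed to report, every report stands for 100 visitors.
    return reports * 100 // assumed_toolbar_percent


ASSUMED_TOOLBAR_PERCENT = 1   # article's hypothetical: 1% of users run the toolbar
ACTUAL_VISITORS = 1_000       # article's hypothetical traffic for the period

# Honest case: only the visitors who run the toolbar send a report.
honest_reports = ACTUAL_VISITORS * ASSUMED_TOOLBAR_PERCENT // 100
# Inflated case: the embedded image makes every visitor send a report.
inflated_reports = ACTUAL_VISITORS

honest_estimate = estimated_visitors(honest_reports, ASSUMED_TOOLBAR_PERCENT)
inflated_estimate = estimated_visitors(inflated_reports, ASSUMED_TOOLBAR_PERCENT)

print(honest_reports, honest_estimate)
print(inflated_reports, inflated_estimate)
print(inflated_estimate // honest_estimate)
```

Under these assumptions the inflation factor is simply the reciprocal of the assumed toolbar share, which is why a small assumed share makes the ranking so sensitive to forged reports.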


