Links: Google Bombs and Other Crimes
Simon Pitt |
Tuesday 9th February
In the last article, I got rather distracted by a few etymological oddities, but in this part I'm returning to look at the actual link.
It's all very well having a URI, but if you want to make it clickable in your browser, you have to mark it up in HTML. As I mentioned briefly before, when you see some blue underlined words, behind the scenes what you're actually seeing is something like this (the URL here is illustrative):
<a href="https://example.com/">blue underlined words</a>
The text that appears in your browser, blue and underlined, is called the "anchor text", and it can say anything. There's nothing to ensure that what you write there actually refers to the website you're linking to.
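The markup makes the mismatch easy to see. In this sketch (both the URL and the text are invented for illustration), the href and the anchor text simply disagree, and nothing in HTML will object:

```html
<!-- The visible text promises dogs; the href (an invented URL) points at cats. -->
<!-- HTML places no requirement on the anchor text describing the destination. -->
<a href="https://example.com/cats">pictures of dogs</a>
```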
This has led to some unusual consequences. Most unexpected was the rise of the phrase "click here". Website designer Jutta Degener remarks on the stupidity of this phrase:
If you owned a shop, you'd write 'Welcome' on the door, not 'Open this door to enter the shop.'
People started writing "click here" because they needed to talk users through using the Internet. The concept of clicking on links was not immediately obvious. Many people, for example, weren't used to computer mice. Now we're entirely at home with links. We know without thinking that when something is blue and underlined we can move the little arrow on the screen to it, watch the arrow turn into a hand, and press the left button to see a new page appear on the screen. In fact we are so used to this that making text blue when it isn't a link is seen as a faux pas on a similar scale to playing the bongos on a bald man's head.
However, the nature of links had a number of side effects. The first concerned the anchor text. It wasn't immediately apparent that Google used the anchor text to work out what a page was about. Consequently, if you search Google for "click here", the first match is the Adobe Acrobat Reader home page, despite the fact that this page doesn't contain the words "click here" anywhere. However, thousands of other sites have written "click here to download Adobe Reader", and that led Google to think that "click here" was an accurate description of Adobe Reader.
As with so many things on the Internet, once people saw the power they had, they started to use it maliciously. People began intentionally putting certain words in anchor text to make the website they were linking to rank for those words. In 2006, if you entered "Miserable Failure" into Google, the first match was George W Bush's homepage, despite his page containing neither the word "miserable" nor the word "failure". So many other people had linked to his page on the words "miserable failure" that Google took this as a description of the content. This process has become known as Google Bombing.
Throughout his years in office, poor old Dubya was frequently a victim of this, although he wasn't the first hate figure to suffer this way. One of the first Google Bombs was against Microsoft, when it appeared first for the search term "more evil than Satan himself".
As you might expect, Google were a bit annoyed people were doing this, and so in 2007 they changed their algorithm to prevent it happening. Nevertheless, Google bombs continue to occur, and even in 2009, "trou du cul du web" ("the asshole of the web") returned the official website of French president Nicolas Sarkozy.
This whole thing threw up some interesting questions. Linking is often seen as a passive action: I write a link on my site, and show you something else. It's a bit like pointing, or citing a reference in an essay. However, as the Internet developed, it became clear that linking had effects of its own. As the power of search engines increased, links began to affect the whole Internet. Search engines like Google base a significant part of their ranking on links. While the search algorithm is highly complex, at some level it decides how important a web page is by how many sites link to it. Over time this has become a problem, as people began to cheat the system. Since lots of extra links to a page will increase its Google position, unscrupulous web managers hunted for sites that would allow them to submit HTML text (comment pages on blogs, or Wikipedia) and "spammed" these with links back to their page. It didn't matter that no human would ever click on these links; Google would see them as an extra mark of the site's value and raise its search position.
In early 2005, Google's Matt Cutts and Blogger's Jason Shellen came up with an extra attribute value that could be added to links to tell search engines not to count the link. This was the controversial rel="nofollow" attribute. It's worth pointing out that although it's called "nofollow", it doesn't prevent search engines from following the link; it just stops them counting it towards the target page.
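In the markup, the difference is a single attribute on the link. A sketch, with invented URLs:

```html
<!-- An ordinary link: search engines may treat it as a vote for the target page. -->
<a href="https://example.com/">an ordinary link</a>

<!-- The same link with rel="nofollow": engines are asked not to credit the target. -->
<a href="https://example.com/" rel="nofollow">a discounted link</a>
```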
The use of rel="nofollow" in links has angered many people. It has been described as "a link-condom" and "like reaching to shake someone's hand, but stopping to put on a pair of latex gloves". Either way, it affects the search engine rank of the site you're linking to. Or rather it doesn't, and that's the problem. It may prevent link spam, but it also blocks legitimate, organic link building. Sites that use it add value to their own pages but fail to give the sites they link to the search engine credit they deserve. This becomes a particular problem with Wikipedia. Since Wikipedia has such a huge PageRank, with so many pages linking to it, when Wikipedia mentions something retrieved from another website, it will often appear higher in the search results than the page it got the information from.
This becomes even more of a problem when people "deeplink" into a page. Deeplinking is linking directly to content that may be buried deep within a website. Many people make their money from selling advertising space on their websites, but if someone deeplinks into their site, users can reach the content without viewing any of the adverts that pay for it.
When the W3C were asked to comment on this problem they compared it to going into a building:
A building might have a policy that the public may only enter via the main front door, and only during normal working hours. People employed in the building and in making deliveries to it might use other doors as appropriate. Such a policy would be enforced by a combination of security personnel and mechanical devices such as locks and pass-cards. One would not enforce this policy by hiding some of the building entrances, nor by requesting legislation requiring the use of the front door and forbidding anyone to reveal the fact that there are other doors to the building.
They concluded that it was impossible to prevent deeplinking.
The Web is so large that any policy enforcement requires considerable automated support from software to be practical. Since a deep link looks like any other link to Web software, such automated support is not practical.
Nevertheless, cases of deeplinking have ended up in court. In particular, the Belgian Association of Newspaper Editors successfully sued Google in 2006, demanding they stop deep-linking to articles on their pages. Similarly, in Scotland, The Shetland Times sued another site for deeplinking to pages of their newspaper. The objection was that deeplinking bypassed the adverts on the front page. The case was upheld.
As well as pointing to content, links can also pull in content from remote servers. For example, I could post a link to a lovely picture of Bill Nighy on the BBC website, or, using an IMG tag, I could actually feature it here:
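An inline link of this kind is just an IMG tag whose src attribute points at a file on someone else's server. A sketch (the path below is invented, not a real BBC URL):

```html
<!-- The browser fetches the file from the remote server and renders it here; -->
<!-- this page never hosts a copy of the image itself. -->
<img src="https://www.bbc.co.uk/images/bill-nighy.jpg" alt="Bill Nighy">
```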
As you can see from the link, the picture resides on the BBC's servers but is displayed on this page. This is known as inline linking, and it has various legal consequences. When you write an inline link, you don't actually copy the file you're linking to onto your page; you just point to the file on the original server. The user's browser then goes to the owner's server, grabs the file, and displays it. This brings with it various problems.
This came to a head with the case of Perfect 10, Inc. v. Amazon.com, Inc.
Perfect 10 was a subscription-only website: people had to pay to view its pictures. Unfortunately for Perfect 10, a number of other sites had displayed the images on their own pages, and Google cached these and displayed preview images in its image search. Perfect 10 argued that Google was infringing its copyright by caching the pictures, and Amazon by providing inline links to the site.
The court, however, decided that Google's use of inline linking did not constitute copyright violation.
However, not all courts have been as reasonable. Under different regimes, people have actually been held responsible for the content of the pages to which they link. In Taiwan in 2004, Josephine Ho was charged with "disseminating obscenities" for posting hyperlinks said to corrupt traditional values; she was eventually acquitted. The suggestion was that linking to illegal material is an illegal act in itself, even though a link merely references the material rather than reproducing it.
Throw in the issue of piracy and links, and you get even more of a headache. Take torrent files, for example. A torrent file is not a link as such; it holds metadata that lets a client find other people who are willing to share certain files. This has always been The Pirate Bay's defence: they are not storing files, but simply pointing to them. And, on the whole, this has worked for them. Most recently, the owner of the Oink music sharing website was found not guilty of conspiracy to defraud.
Links have cropped up in the law so often that whole websites have been dedicated to tracking cases involving them.
In the next part of this seemingly endless series on Links, I will talk about what links look like and how different websites create their own styles of linking.