Links: Urns, Earls and a Guy Called Uri
Simon Pitt
Friday 15th January
On 5th December 2003, Josephine Chuen-juei Ho, chair of the English department of the National Central University in Taiwan, was formally charged with "disseminating obscenities" and "corrupt[ing] traditional values". That same year, Perfect 10, a purveyor of "artistic" images, started sending Google copyright notifications. Four years later, they sued Google for copyright infringement. One year before all this, BT claimed that they had invented something that everyone uses every day, and tried to enforce their patent on it.
All of these situations revolved, in some way, around the hyperlink; those clickable, underlined blue words with which we've all become so familiar.
The link has changed our lives. We click on links dozens if not hundreds of times a day. "I read a good story the other day", someone might say. "Send me a link," you'll shout back with a dismissive wave of your hand. Links have become our way of sharing, storing, accessing, telling and even making money.
But how much do we really know about these seemingly simple little blue things? Over the next few articles, a narrative about the link will unfold: how it affects our daily lives, and how it has gone from being an academic concept to being such an important tool that people go to prison over it.
But our journey will begin, as always, with a web address. That's straightforward enough, isn't it?
Only it's not. We only have to mention the name and we run into problems. The phrase "web address" doesn't actually mean anything. "Okay, okay", you say, "I see what you're up to. Their proper name is URLs." Only that's not quite true either. Technically, when people say "web address" what they actually mean is URI (Uniform Resource Identifier) rather than URL (Uniform Resource Locator). In colloquial use, URL has become commonplace, but in technical usage that term has been deprecated in favour of URI. The URL is a type of URI, so when talking about a "web address", you should use the more general URI. As one commentator pointed out: "if the URL describes both the location and name of a resource, the term to use is URI." Next time you hear URL, if you're the sort of person who does this kind of thing, you can smugly correct the speaker, safe in the knowledge that you've just made one less friend.
But, anyway, here is a typical URI:
Clicking this link will bring us to this page. Welcome back. Of course, if you click it, the page will reload and jump back to the top, and you will have to scroll back down to this point again. So, well done on getting back here if you did click it, and well done on resisting the temptation to click it if you didn't.
Let's go through this URI one bit at a time:
The first four letters in our address are the 'scheme name' (also known as the 'protocol'). In this case, they stand for "HyperText Transfer Protocol". This is only one of many possible protocols; two of the most popular others are ftp ("file transfer protocol") and mailto, which is used for sending eMails, for example:
If you're on Windows and you click on that, you'll probably find that Outlook Express pops open and prepares an eMail to us.
Mailto links are particularly clever. It's possible to encode a lot more information than just the address into them. Here's an example:
mailto:email@example.com?subject=You're Just So Brilliant&body=I just wanted to drop you a note to say how flipping wonderful you are
Clicking that link will load up an eMail with a prepared (and entirely unbiased) subject and message already in place, just ready for sending.
This first section of the address (up to the colon) tells your computer what to do with what is coming. "Mailto" is used for composing eMails, "ftp" means you're going to be using the "file transfer protocol", and "http" means you're requesting (hyper) text documents. Or, in more common parlance, web pages.
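Strictly speaking, the mailto specification (RFC 6068) expects characters such as spaces and apostrophes to be percent-encoded, even though most mail programs will forgive you if they aren't. Here is a minimal sketch, using Python's standard urllib.parse module, of building the example link properly; the make_mailto helper is hypothetical, written just for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def make_mailto(address: str, subject: str, body: str) -> str:
    """Build a mailto link with the subject and body percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than "+", as mailto links expect.
    query = urlencode({"subject": subject, "body": body}, quote_via=quote)
    return f"mailto:{address}?{query}"


link = make_mailto(
    "email@example.com",
    "You're Just So Brilliant",
    "I just wanted to drop you a note to say how flipping wonderful you are",
)
print(link)
# Decoding the query part gets the original text back.
print(parse_qs(urlsplit(link).query)["subject"][0])  # You're Just So Brilliant
```

The link comes out as "mailto:email@example.com?subject=You%27re%20Just%20So%20Brilliant&body=I%20just%20wanted...", which is uglier to read but unambiguous to any program that receives it.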
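Reading the scheme off the front of an address is roughly what software does before deciding how to handle it. A minimal sketch using Python's standard urllib.parse module; the example.com addresses are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical addresses, used only to show which part is the scheme.
web = urlsplit("http://www.example.com/articles/links?page=2#part-one")
mail = urlsplit("mailto:email@example.com")

print(web.scheme, web.netloc, web.path)  # http www.example.com /articles/links
print(web.query, web.fragment)           # page=2 part-one
print(mail.scheme, mail.path)            # mailto email@example.com
```

Everything before the first colon becomes the scheme; what the rest of the address means depends on which scheme it was.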
Writing "http://" at the beginning of all your web addresses is a bit of a faff. After all, you've already opened Internet Explorer, haven't you done enough work for one day? Isn't it obvious you want a web page? In most modern browsers you can miss this first section off and just go straight on with the rest. Your browser will assume that this is probably what you meant anyway and pop this on at the beginning.
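The browser's guess can be sketched in a few lines. This is a deliberately simplified, hypothetical helper in Python, assuming that anything without "://" (and that isn't a mailto link) is a bare web address; real browsers use far more elaborate rules.

```python
def add_default_scheme(address: str, default: str = "http") -> str:
    """Put a scheme on the front of a bare address such as "bbc.co.uk".

    Hypothetical helper: real browsers use far more elaborate rules (for
    example, deciding whether you typed an address or a search).
    """
    address = address.strip()
    # Assumption: anything already containing "://" (or a mailto: link) has a scheme.
    if "://" in address or address.lower().startswith("mailto:"):
        return address
    return f"{default}://{address}"


print(add_default_scheme("bbc.co.uk"))         # http://bbc.co.uk
print(add_default_scheme("http://bbc.co.uk"))  # http://bbc.co.uk
```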
The colon following this first section tells your computer that it has finished talking about the scheme now and it's time to get on with something else.
The next thing is two slashes. These largely serve no purpose other than making it difficult to say a web address: "aitch tee tee pee colon (that's two dots above each other) forward slash forward slash". This is less of a problem now, but in the early days, television presenters would fumble around saying things they clearly didn't understand, trying to explain a web address. Now, we're so savvy we can just say, "yeah, like, go on bbc.co.uk", or, more commonly, "just go on the BBC page". Even when they have a URI written down, most people will now Google the site rather than risk mistyping a URI.
This saves all the faff with the http and the punctuation, which really has been quite a faff. So much so, in fact, that in an article in the New York Times, Tim Berners-Lee, the inventor of the World Wide Web, apologised for adding the two forward slashes.
The double slash, though a programming convention at the time, turned out to not be really necessary, Mr. Berners-Lee explained. Look at all the paper and trees, he said, that could have been saved if people had not had to write or type out those slashes on paper over the years - not to mention the human labor and time spent typing those two keystrokes countless millions of times in browser address boxes.
"I could have designed it not to have the //," he said at a symposium on the future of technology. With a shrug of the shoulders he added: "It seemed like a good idea at the time". Who knew, after all, that it would become "so much hassle".
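For what it's worth, the slashes aren't completely idle. In the generic URI syntax (RFC 3986), "//" announces that a host name (the "authority") comes next. A minimal sketch using Python's standard urllib.parse shows what a parser does when they're left out; the addresses are illustrative.

```python
from urllib.parse import urlsplit

with_slashes = urlsplit("http://bbc.co.uk/news")
without_slashes = urlsplit("http:bbc.co.uk/news")

# With "//", the parser knows "bbc.co.uk" is the host.
print(with_slashes.netloc)  # bbc.co.uk
# Without it, the same text is treated as part of the path, and there is no host.
print(repr(without_slashes.netloc), without_slashes.path)  # '' bbc.co.uk/news
```

So the slashes do a job; Berners-Lee's point is simply that a less cumbersome marker could have done it just as well.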
Of course, it could be worse. Take the rather messy method that Windows uses to navigate to files:
If you're on a Windows PC, and you haven't done anything weird, copying and pasting this into the address bar should load a blank page saying:
"For the current version of the Outlook Express Read Me information, please select Help Read Me from the menu bar in Outlook Express"
But look at that URI! All those slashes (three of them), colons and suchlike. Not to mention the fact that this isn't the normal way to link to files in Windows. The normal way to do it is like this:
C:\Program Files\Outlook Express\msoe.txt
If you copy and paste this one, it will take you to exactly the same file! But here, the slashes are the other way round. It's almost as if they're trying to make this confusing. Incidentally, you have to copy and paste these URIs because of a security restriction that stops websites loading files that are already on your PC. So, even though the URI itself works, most browsers won't let you follow it as a link from a web page.
The BBC took Berners-Lee's apology as an opportunity to look at everything the "forward slash" was used for. Of course, we shouldn't call it a "forward slash". The proper name for it is a virgule (although it's also known as a diagonal, a stroke, a right-leaning stroke, an oblique, an oblique dash, an oblique stroke, a slant, a scratch comma, a slak, a whack and, perhaps most improbably of all, a separatrix).
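Getting from the everyday Windows path to the URI form is mostly mechanical: flip the backslashes, percent-encode awkward characters such as spaces (%20) and put "file:///" on the front. The three slashes are the usual "//", then an empty host name (meaning "this computer"), then the slash that starts the path. A simplified sketch in Python; windows_path_to_file_uri is a hypothetical helper that only handles drive-letter paths like the one above.

```python
from urllib.parse import quote, urlsplit


def windows_path_to_file_uri(path: str) -> str:
    """Turn an absolute drive-letter path such as C:\\... into a file: URI.

    Simplified, hypothetical helper: it ignores network (UNC) paths and
    other edge cases that real converters handle.
    """
    forward = path.replace("\\", "/")               # backslashes become virgules
    return "file:///" + quote(forward, safe="/:")   # spaces become %20


uri = windows_path_to_file_uri(r"C:\Program Files\Outlook Express\msoe.txt")
print(uri)                         # file:///C:/Program%20Files/Outlook%20Express/msoe.txt
print(repr(urlsplit(uri).netloc))  # '' (an empty host: this computer)
```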
Oh, and also, it shouldn't be confused with a solidus, which is completely different! See:
In fact, "forward slash" is pretty much the only thing it shouldn't be known as. The term is, according to rumour, a Microsoftism, and it's linked to the reason why the slashes were the other way round in the earlier example of an address to a file in Windows. In MS-DOS, one of Microsoft's first operating systems, the company ran into a problem. When they wanted to write an address to a file, they found they couldn't use the slash, because they had already used it for "options" (command switches such as the /W in DIR /W). This was despite the fact that Unix, among other systems, already used the slash for describing the address of a file. Consequently, Microsoft had to use the backslash for addresses. To try to cover their embarrassment, Microsoft supposedly invented the term "forward slash" to suggest that both slashes had a direction, and both were equally valid.
Stephen Fry, and later John Peel, argued that "slash" sounded much too violent for use in Web addresses and proposed "stroke" instead. Despite the wonderful logic, this has failed to catch on. Yet in the early days of the Internet, "stroke" was used much more frequently than "forward slash". Now, this strange new word has become common and looks like it's here to stay.
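The clash is easy to see with a toy command-line parser. This is a hypothetical sketch in Python, not how MS-DOS's command interpreter actually worked: it simply assumes, as DOS did, that any argument starting with "/" is an option.

```python
def split_switches(args: list[str]) -> tuple[list[str], list[str]]:
    """Toy DOS-style parser: any argument starting with "/" is a switch."""
    switches, paths = [], []
    for arg in args:
        if arg.startswith("/"):
            switches.append(arg)  # e.g. /W for a "wide" directory listing
        else:
            paths.append(arg)
    return switches, paths


print(split_switches(["/W", "\\GAMES"]))  # (['/W'], ['\\GAMES'])
print(split_switches(["/W", "/GAMES"]))   # (['/W', '/GAMES'], [])
```

With backslashes, the directory survives as a path; with forward slashes, it is swallowed as just another option, which is exactly the ambiguity that pushed DOS towards the backslash.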
In the next part of this series, we will move into the domain of the domain, where we will learn what that is, what it means, and a few interesting facts as well.