Links: Spaces, Brackets and Dead Links
Simon Pitt | Wednesday 20th January
As ever Image Dissectors asks the questions that no one wants to know the answers to: if article.php?q=57 shows you the page "article.php" with variable q set to 57, how would you display a file called article.php?q=57?
How would the computer know to display the file called article.php?q=57 rather than loading article.php?
Well, the first thing to do is ask why the hell anyone would call a file article.php?q=57. I mean, that's quite a stupid name. But we live in quite a stupid world, and someone's going to do it. And, there is a solution. You may remember earlier on, when I was mainly talking about my non-existent, but reticent, Uncle Ernie, I explained that the question mark was a reserved character. That means you can't use it as an ordinary character in a URI, because it is set aside to mark where the variables start. If you want to use it as part of a name you have to encode it to %3F. So, that's what you do. You take the file name:
and encode the question mark:
and then you add it to the domain:
And hey presto, there you go.
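The encoding step can be sketched in a few lines of Python using the standard library's urllib.parse module. This is only an illustration: the domain www.example.com is a placeholder, and Python's quote() also encodes the equals sign (to %3D), which is harmless here.

```python
from urllib.parse import quote, unquote

filename = "article.php?q=57"

# safe="" asks quote() to escape every reserved character,
# so "?" becomes %3F and "=" becomes %3D.
encoded = quote(filename, safe="")
print(encoded)  # article.php%3Fq%3D57

# Placeholder domain, used only for illustration.
url = "http://www.example.com/" + encoded
print(url)  # http://www.example.com/article.php%3Fq%3D57

# Decoding gives back the original file name.
print(unquote(encoded))  # article.php?q=57
```

Because the question mark is now written as %3F, nothing in the address is left to mark the start of the variables, and the whole thing is read as a file name.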
The question mark isn't the only character that has to be encoded. There is a whole range of them. The most infamous encoding is "percent twenty" or %20, which seemingly inexplicably litters URIs.
The reason for this is simple. %20 is an encoded space (as in a press of the spacebar). So if you have a space in a file name, it is converted into %20.
Spaces, generally, are a bit of a problem in web addresses. Sometimes, the browser will assume when it reaches a space that that's it for the address and it can start running it immediately. If it does this, it will miss off anything that happens after the space. Some browsers accept spaces, some automatically convert them to %20 and some don't accept them at all.
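As a small sketch of the space-to-%20 conversion, here is what Python's urllib.parse does with a name containing a space. The name "Simon Pitt" is just a convenient sample string.

```python
from urllib.parse import quote, unquote

name = "Simon Pitt"

# A space is not allowed in a URI, so quote() replaces it with %20.
encoded = quote(name)
print(encoded)  # Simon%20Pitt

# unquote() turns %20 back into a space.
decoded = unquote(encoded)
print(decoded)  # Simon Pitt
```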
Here's an example. Let's say I want to find articles by me on Image Dissectors. If I click on my name at the top, it'll take me to the page:
There are several occasions when this will be a problem. Many rich-text, WYSIWYG (What You See Is What You Get; or, more usually, What You See Is What You Wished You Had) editors try to automatically form links from URIs. So, when I type
in Gmail, say, Gmail thinks, "Whoa, that looks like a web address, I'd better make it into one". And so it turns it into:
Which, when seen in your eMail client looks like this:
You can click it and it'll take you to the page.
And here's the problem. This action on the part of the computer is initiated on a press of the space bar (or, a regular expression search for a space at least). This has been the way of doing things since the auto-correct and auto-format features in Microsoft Word. The computer assumes that once you press space you've finished with that word and are moving on to the next one. It then starts work tidying up the mess you've left, probably grumbling to itself about the "good old days" of typewriters.
But let's say there's a space in your URI.
But when you get to Simon and hit the space bar, the computer thinks, "oh, that's the end of the link, I'd better format it", and you end up with this:
<a href="http://www.imagedissectors.com/author/Simon"> http://www.imagedissectors.com/author/Simon</a> Pitt
Which, in your eMail looks like this:
And takes you nowhere.
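The behaviour can be imitated with a deliberately naive auto-linker. This is a sketch of the general idea, not how Gmail or any real editor actually does it: the pattern and the autolink function are made up for illustration, and the %20 version of the address is an assumption about how the encoded link would look.

```python
import re

# Naive rule: a link is "http://" or "https://" followed by
# everything up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def autolink(text: str) -> str:
    """Wrap anything that looks like a web address in an <a> tag."""
    return URL_PATTERN.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', text
    )

# The space ends the match, so "Pitt" is left outside the link.
broken = autolink("http://www.imagedissectors.com/author/Simon Pitt")
print(broken)

# With the space encoded as %20, the whole address is matched.
working = autolink("http://www.imagedissectors.com/author/Simon%20Pitt")
print(working)
```

The first call produces a link that stops at "Simon"; the second keeps the address in one piece because there is no space left for the pattern to trip over.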
Conversely, the opposite can happen if you put a web address just before a closing bracket. Here's an example of something I might send to a friend:
"I read a really good website the other day, (much better than the article I read here: http://www.imagedissectors.com/article/50) I'll send you a link"
You haven't actually pressed the spacebar until after the bracket, and the computer assumes the closing bracket is part of the URI, and does this:
<a href="http://www.imagedissectors.com/article/50)"> http://www.imagedissectors.com/article/50)</a>
Which creates this:
The website loads the page, and hunts for an article with the ID "50)", can't find one, and returns a 404.
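One way an auto-linker might avoid this is to drop a closing bracket that has no opening partner inside the address. The following is a minimal sketch under that assumption; the function names are invented for the example, and real editors may use quite different rules.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def trim_trailing_brackets(url: str) -> str:
    """Remove closing brackets that have no matching "(" in the address."""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url

def autolink(text: str) -> str:
    def replace(match: re.Match) -> str:
        raw = match.group(0)
        url = trim_trailing_brackets(raw)
        trailing = raw[len(url):]  # whatever was trimmed stays as plain text
        return f'<a href="{url}">{url}</a>{trailing}'
    return URL_PATTERN.sub(replace, text)

message = ("(much better than the article I read here: "
           "http://www.imagedissectors.com/article/50) I'll send you a link")
linked = autolink(message)
print(linked)
```

Here the link ends at "50" and the bracket is left outside it. An address that genuinely contains a balanced pair of brackets is left alone, because the counts match.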
On the subject of "stuff at the end of the link", it is worth paying attention to your slashes and hashes. For example:
is not the same page as:
The first is a file called "slash" (without a file extension) in the folder called. The second takes you to a folder called "slash" and then loads the file there called "index". In this case, there isn't a file called index, so it displays a 404 error.
So, what, you may be asking, do you do if you want to go to a file called "slash/" complete with slash? Well, again, you encode the slash; this time to: %2F
And link to that:
The only thing is "slash/" isn't a valid filename, so I can't upload it anyway.
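For what it's worth, the slash encoding looks like this in Python. Note that quote() leaves slashes alone by default, since they normally separate folders, so you have to ask for the slash to be encoded explicitly. This is only an illustration of the encoding, not of how any particular server treats the result.

```python
from urllib.parse import quote

name = "slash/"

# By default "/" is treated as a path separator and left untouched.
default_encoding = quote(name)
print(default_encoding)  # slash/

# With safe="" the slash is treated as part of the name and encoded.
full_encoding = quote(name, safe="")
print(full_encoding)  # slash%2F
```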
It's also worth noting that while the domain name is not case sensitive, anything after the domain is. So:
Is the same as
but is not the same as this:
Up until now, we have always been talking about absolute links. That is, you can type www.imagedissectors.com/article/52 into your browser anywhere, on any site, and you'll end up back here. This is all very well, when you're linking to articles all over the Internet, but very often you want to link to pages on your own site, and it's a bit annoying having to put the full address in.
Consequently, on the current page there are two ways of linking to an Image Dissectors article. As well as providing an absolute link, we could also write a relative link:
The browser knows that this is a relative link, and it takes you to a page relative to the one you're on. Again, we have to be careful with our slashes:
Is not necessarily the same as
The former looks in the current position for a folder called article and looks in there for a file called 57. The initial slash in the latter means, "relative to this domain", so it goes right back to the www folder and looks in there for a folder called article and then in there for a file called 57. These can be the same (i.e., if you're in the root of the domain anyway, these would be the same). So, if I was on a page called:
Would be the same as
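The difference between the two kinds of relative link can be tried out with Python's urljoin, which resolves a relative link against the page you are currently on in roughly the way a browser does. The two "current page" addresses below are hypothetical, picked only to show one page inside a folder and one in the root.

```python
from urllib.parse import urljoin

# Hypothetical current page, one folder down from the root.
page_in_folder = "http://www.imagedissectors.com/author/Simon"

# No leading slash: resolved relative to the current folder.
without_slash = urljoin(page_in_folder, "article/57")
print(without_slash)  # http://www.imagedissectors.com/author/article/57

# Leading slash: resolved relative to the domain.
with_slash = urljoin(page_in_folder, "/article/57")
print(with_slash)  # http://www.imagedissectors.com/article/57

# Hypothetical page in the root: both forms give the same address.
page_in_root = "http://www.imagedissectors.com/index"
print(urljoin(page_in_root, "article/57"))   # http://www.imagedissectors.com/article/57
print(urljoin(page_in_root, "/article/57"))  # http://www.imagedissectors.com/article/57
```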
Automatically loading a file called "index" when you just specify a folder is a mixed blessing. On the one hand, it saves us writing out the full link every time, but, on the other hand, it can create confusion and duplicate file names, especially duplicate "index" files, since any URI that ends in a / must have an index file in that folder.
May look a lot cleaner and be easier to remember than:
But by making things easier for the audience, web developers run into the problem of staring at dozens of files all called the same thing, and can easily edit or replace the wrong one.
To be continued. Next time we'll be looking at another symbol you might find in the address, explaining why the abbreviation for pounds is lb and looking at what you call one of these: #