29 Sep, 2006

Search Engine Friendly URLs

If you hadn't already noticed, whilst my website is database-driven, it doesn't have those annoying URLs that are full of ?, = and & along with meaningless random integers or strings of characters.  Nope, my URLs are clean and friendly not only to search engines, but also to visitors who might want to refer to one of my pages.

I've had friendly URLs since I first created this site, but today I made a copy of this system for a new site and found I had forgotten the subtleties of how my URLs work, so I had to go looking for the combination of tutorials and forum posts I used when creating the system.  So this blog post is mostly a record for myself.

If you hadn't noticed, most URLs on my site have /page/ after the domain.  Normally this would be understood to be a folder, but it's actually a PHP file.  The text that trails after /page/ in the URL is used by the page PHP file to determine which page the visitor is requesting.

You can see an example of how a PHP file can explode a URL to use its elements as parameters in this Sitepoint tutorial.  My code is far more complex as it recognises and preserves the hierarchy of the pages.  It's not until page 3 of that tutorial that you will see the method I used to tell Apache to force my page file to parse as PHP even though it doesn't have a .php file extension.
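To make the idea of exploding a URL concrete, here is a minimal sketch in Python rather than PHP. It is not the code from the tutorial or from this site: the function name explode_page_path and the default script name are made up for illustration, and it ignores the hierarchy handling mentioned above.

```python
def explode_page_path(request_path, script_name="/page"):
    """Return the segments that trail the script name in a request path.

    Both the function name and the "/page" default are hypothetical.
    """
    is_script = request_path == script_name
    is_below_script = request_path.startswith(script_name + "/")
    if not (is_script or is_below_script):
        return None  # this request is not for the page script

    trailing = request_path[len(script_name):]
    # Drop the empty strings produced by leading, trailing or doubled slashes.
    return [segment for segment in trailing.split("/") if segment]


print(explode_page_path("/page/folder/subfolder/"))  # ['folder', 'subfolder']
```

Each element of the returned list can then be treated as one level of the page hierarchy.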

But because my URLs are hierarchical I was having issues dealing with relative URLs.  On a normal site where pages are organised into folders, such as www.domain.com/folder/index.htm, if you type the URL www.domain.com/folder, Apache will recognise that folder is a folder and will automatically add '/' to the end of the URL, changing it to www.domain.com/folder/, and then Apache will look for an index.htm file to load.

This didn't happen with my system because Apache isn't stupid and it knows damn well that what follows after the page PHP file are not folders.  So www.microugly.com/page/folder doesn't get a trailing '/' added to it by Apache.  And then when you click a link with a relative path such as 'subfolder' your web browser will look for www.microugly.com/page/subfolder instead of www.microugly.com/page/folder/subfolder.  Without that trailing '/' after 'folder' the web browser will assume 'folder' is a file.
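The difference the trailing slash makes can be checked with Python's standard urljoin, which resolves a relative link against a base URL in the same way a browser does. This is only an illustration; the http:// scheme is an assumption added so the URLs parse.

```python
from urllib.parse import urljoin

# Base URL without a trailing slash: 'folder' is treated as a file,
# so the relative link replaces it.
without_slash = urljoin("http://www.microugly.com/page/folder", "subfolder")

# Base URL with a trailing slash: 'folder' is treated as a folder,
# so the relative link is appended beneath it.
with_slash = urljoin("http://www.microugly.com/page/folder/", "subfolder")

print(without_slash)  # http://www.microugly.com/page/subfolder
print(with_slash)     # http://www.microugly.com/page/folder/subfolder
```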

So I needed to manually tell Apache to treat my fancy URLs as folders by creating a rewrite rule.  It took some nutting out with the much needed help of jharnois, but this is the result:

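RewriteEngine on
# Match URIs whose last section contains only letters, digits, _ or - (no period)
RewriteCond %{REQUEST_URI} ^(.*/)([a-zA-Z0-9_-]+)$
# Redirect to the same URI with a trailing slash appended
RewriteRule ^.+$ %1%2/ [R]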

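The rewrite rule redirects to the same URL with a slash (/) added whenever the last section of the URL consists only of letters, digits, underscores and hyphens, so it never fires when that section contains a period (.).  A section containing a period is assumed to be a file, not a folder.  Because of the [R] flag the change is sent to the browser as a redirect (a 302 by default), which is what lets the browser resolve relative links against the new URL.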

For example:
www.microugly.com/page/folder becomes www.microugly.com/page/folder/
www.microugly.com/page.php/folder becomes www.microugly.com/page.php/folder/
www.microugly.com/page/folder/file.jpg doesn't change.
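
The examples above can be checked by applying the same regular expression outside Apache. The sketch below is only a simulation in Python: add_trailing_slash is a made-up helper, it works on the path part of the URL alone, and it returns the rewritten path rather than issuing a redirect.

```python
import re

# Same pattern as the RewriteCond: everything up to the last slash,
# then a final section made only of letters, digits, _ or -.
PATTERN = re.compile(r"^(.*/)([a-zA-Z0-9_-]+)$")


def add_trailing_slash(request_uri):
    """Return the path the rule would redirect to, or the path unchanged."""
    match = PATTERN.match(request_uri)
    if match is None:
        return request_uri  # already ends in '/', or last section has a '.'
    return match.group(1) + match.group(2) + "/"


for path in ("/page/folder", "/page.php/folder", "/page/folder/file.jpg"):
    print(path, "->", add_trailing_slash(path))
```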

And that's it.  If you're curious about how I maintain a hierarchy of pages in my database, I followed the example in this tutorial on modified preorder tree traversal.
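
For anyone who hasn't met that technique, here is a rough sketch of the numbering it relies on, written in Python with a plain dictionary standing in for the database table. The page names and both function names are hypothetical, and this is not the code from the tutorial or from this site.

```python
def number_tree(tree, root):
    """Assign (left, right) values to every node in one depth-first walk.

    `tree` maps each page name to the list of its child page names.
    """
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter          # number the node on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)  # and again on the way back up
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    """Every page whose values fall strictly inside the node's own values."""
    left, right = bounds[node]
    return sorted(name for name, (lo, hi) in bounds.items() if left < lo and hi < right)


# Hypothetical page hierarchy, not the site's real data.
pages = {"home": ["folder", "about"], "folder": ["subfolder"]}
bounds = number_tree(pages, "home")
print(bounds["folder"])               # (2, 5)
print(descendants(bounds, "folder"))  # ['subfolder']
```

The appeal is that a whole branch of the hierarchy can be fetched with a single range comparison on the two stored numbers.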
