A few weeks ago we launched our new website design. Working on the website took up the majority of my time in the few weeks prior. As part of the redesign we added a number of new pages, removed some and moved others from one place to another. During this process a question was raised: “Should our URLs have trailing slashes?” What we were talking about is this: http://example.org/page/ versus this: http://example.org/page. The former has a slash on the end, the latter doesn’t. Does it matter? Which one is better?
Does it matter?
It certainly matters if you are allowing both URLs to access the same content. Google’s SEO Starter Guide says “Provide one version of a URL to reach a document” or risk splitting the reputation of the page between all the URLs used to access it. This means that at best it won’t matter for you. At worst you’ll have your page’s reputation diminished, and you might also be penalised for duplicate content.
The best option is to pick one scheme, use that scheme in all your links, and redirect users who access your pages using the other scheme.
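As a minimal sketch of that policy, here is how the per-request decision could look in Python. The helper name `canonical_redirect` and its `trailing_slash` switch are made up for illustration and are not part of our actual setup; it only looks at the path, so query strings would need to be carried over separately.

```python
from typing import Optional


def canonical_redirect(path: str, trailing_slash: bool = False) -> Optional[str]:
    """Return the path to 301-redirect to, or None if `path` is already canonical.

    `trailing_slash` selects the scheme: False for slashless, True for slashful.
    The site root "/" is treated as canonical under both schemes.
    """
    if path == "/":
        return None
    has_slash = path.endswith("/")
    if has_slash == trailing_slash:
        return None  # already follows the chosen scheme
    if trailing_slash:
        return path + "/"
    # Slashless scheme: drop every trailing slash, but never return an empty path.
    return path.rstrip("/") or "/"
```

With the slashless default, `canonical_redirect("/page/")` gives `"/page"` and `canonical_redirect("/page")` gives `None`, meaning the page can be served as-is.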
So it probably matters. Now do we slash or not?
I lean heavily towards slashless URLs, but at first I couldn’t put a finger on why. “They just look right”. So I thought about it for a while, made a list of the factors I could think of, and wrote it up as a blog post.
Semantics
What does it mean for a URL to contain a trailing slash? For me, it traditionally means “this is a directory listing”.
When a request is made for a URL that maps to an actual directory in the file system, and there is no index document, most web servers serve up a listing of the files in that directory. This is similar to doing an ls or dir from a terminal.
Does the same apply when accessing https://www.papercut.com/blog/? That page is serving up the X most recent blog posts, and the view might depend on whether or not you’re logged in. In my opinion it is not a canonical list of sub-items being served up, so the directory analogy doesn’t fully hold.
The more interesting part of how slashes affect semantics is the format of the response. What’s actually happening when a web browser requests /blog is the web server recognising that no specific format was requested in the URL (i.e. there was no .html extension), but because it’s a web browser making the request it makes an assumption that it wants the response in HTML. On the modern web that’s no longer a given.
We have .json, .atom, .rss, .xml and others. If a browser wants a view of something in a format other than HTML, it makes sense to allow that simply by adding the requested extension, e.g. http://example.org/blog.atom. If the URL had a trailing slash that would become http://example.org/blog/.atom, which looks horrible.
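To make the idea concrete, here is a rough sketch of how a server could pull the requested format out of a slashless URL path. The function name `split_format` and the `KNOWN_FORMATS` set are assumptions for illustration only, not something taken from a particular framework.

```python
import posixpath
from typing import Tuple

# Hypothetical set of formats a site might support.
KNOWN_FORMATS = {"html", "json", "atom", "rss", "xml"}


def split_format(path: str, default: str = "html") -> Tuple[str, str]:
    """Split a URL path into (resource, format).

    A recognised extension on the last path segment selects the format;
    anything else falls back to `default` and the path is left untouched.
    """
    root, ext = posixpath.splitext(path)
    fmt = ext[1:].lower()  # drop the leading dot, ignore case
    if fmt in KNOWN_FORMATS:
        return root, fmt
    return path, default
```

Here `split_format("/blog.atom")` yields `("/blog", "atom")`, while a plain `/blog` falls back to `("/blog", "html")`, which mirrors the assumption a server makes for a browser.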
Reddit provides at least one example of where both a trailing slash and format extension are used together!
Role Models
What are the big boys doing? Google sometimes add slashes and sometimes don’t, but they pick one and use redirects to enforce it. Stack Overflow allows both (!), but their own links omit them. Wikipedia omits slashes and doesn’t know what you mean if you add one.
Ruby on Rails put a lot of thought into their URL routing functionality. The result is, in my opinion, the most intuitive and complete definition for URL schemes anywhere. It should serve as a model to other frameworks and to those creating URL schemes by hand. Oh, and trailing slashes aren’t used (unless you go through some extra work to add them back in).
Legacy
For us the main factor was legacy. Most of our website URLs would remain the same (we were just introducing some new ones), and they already had trailing slashes. Would it hurt to redirect all the slashful URLs across to slashless ones? The best we could come up with is “maybe”, but that was enough to can the idea.
Redirecting one URL to another via a 301 redirect (“moved permanently”) is rumoured to result in the page’s reputation flowing to the new URL. In practice, and confirmed at least once by Google, some of that reputation will be lost.
We’ve done a similar thing once in the past when we switched our main domain from papercut.biz to papercut.com. We used 301 redirects for all our URLs but several pages lost some reputation (e.g. from PageRank 6 to 5). Any pages that lost reputation gained it back after a month or two, however.
Implementation
Ask four web devs how to implement pretty URLs or remove/add your slashes and you’ll get seventeen answers. We use Apache with PHP, and implement the redirection in our root .htaccess file via Apache’s mod_rewrite.
Firstly we provide access to .php pages using “directory naming”:
    RewriteCond %{PATH_INFO} ^/$
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule . %{REQUEST_FILENAME}.php [L]
(the above reads “if the requested filename doesn’t exist as a real file or directory, but adding .php on the end results in a real file, serve up that .php file (but don’t change the URL)”).
Then we add a trailing slash if none was present in the request:
    RewriteCond %{PATH_INFO} ^/?$
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME}.html -f [OR]
    RewriteCond %{REQUEST_FILENAME}.php -f
    RewriteRule (.*[^\/])$ $1/ [R=301,NS]
(the above reads “if the requested filename doesn’t exist as a real file or directory, but adding .php or .html on the end results in a real file, if they didn’t add a trailing slash then send the browser a permanent redirect to add the slash”).
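For anyone who finds the rewrite syntax hard to follow, the intent of the two rule sets can be paraphrased as a small Python function. This is only a sketch of the decision logic as described in the two "reads" notes above, not a simulation of mod_rewrite: the function name `decide`, its return values and the way it maps URL paths onto the document root are all assumptions made for illustration.

```python
import os
from typing import Optional, Tuple


def decide(docroot: str, url_path: str) -> Tuple[str, Optional[str]]:
    """Paraphrase the two rule sets as one decision.

    Returns ("serve", file), ("redirect", new_path) or ("pass", None),
    where "pass" means neither rule applies and the server carries on as usual.
    """
    base = os.path.join(docroot, url_path.strip("/"))

    # Both rule sets skip anything that exists as a real file or directory.
    if os.path.isfile(base) or os.path.isdir(base):
        return ("pass", None)

    if url_path.endswith("/"):
        # First rule set: /page/ is served by page.php, URL unchanged.
        if os.path.isfile(base + ".php"):
            return ("serve", base + ".php")
        return ("pass", None)

    # Second rule set: /page gets a permanent redirect to /page/
    # when page.html or page.php exists.
    if os.path.isfile(base + ".html") or os.path.isfile(base + ".php"):
        return ("redirect", url_path + "/")
    return ("pass", None)
```

So with an `about.php` in the document root, `/about` is redirected to `/about/`, and `/about/` is then served by `about.php`.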
Search on this topic and you’ll find hundreds of ways to do similar things, many of which have subtle problems in certain situations. Ours probably isn’t perfect, but it’s been working for us so far.
Summary
- Picking slashful versus slashless URLs probably matters. Pick one or the other, don’t allow both to return the same content.
- Accessing a URL with a slash on the end is not fully analogous with performing a directory listing.
- Slashless URLs look much better when you want to support multiple formats (/page.json not /page/.json).
- In the wild, people do it both ways.
- Ruby on Rails has a very nice and complete system for dealing with URLs, and they don’t use trailing slashes.
- If you already do it one way, your pages will probably lose some reputation if you redirect them to the other way.
- Implementation is a black art. Allow yourself time to understand the details.