As the maker of Akeeba Backup (formerly JoomlaPack), the Open Source utility to back up, restore and migrate your Joomla! site, I often have to face certain challenges. One user told me that as soon as he transferred his site to a different domain, all links in his content still pointed to the “old” site. Fighting the temptation to dismiss it as user error, I did some digging. Along the way I uncovered some of Joomla!’s link handling deficiencies and their repercussions, and I coded a workaround.
In this article I am going to talk about how Joomla! handles the link base and canonical URLs, as well as what happens when you migrate your site to a different domain, subdomain or even a subdirectory.
Updated: Since Johan Janssens pointed out that my views on base tag handling were wrong (and provided enough arguments to convince me of the fallacy), I decided to remove the relevant section altogether. I kept the canonical URL handling and link migration parts because they are essential to moving your site around.
The base tag (mis)handling and why it ruined my day
This section of the post used to talk about the base tag. It turns out that I got it wrong and Joomla! does it right, as in standards-compliant right. So I have to scrap this section of the blog post and apologize to the Joomla! development team for falling short on my research.
Oh, by the way: if you do use my plugin, please turn off the "Base tag fixing" option until I release a new version of it :)
And since you might be curious, please read what Chris Davis wrote on the base tag, the W3C standard and the Joomla! developers' discussion which led to the current Joomla! approach. It is a bit technical and long, but worth reading.
Content here, content there, duplicate content everywhere?
It’s not as if they advertise it, but Google and the other leading search engines hate duplicate content! Duplicating content is a technique that has been abused by spammers trying to push their crappy pages to the top of search results. If you did extensive web searches a few years ago, you know what I mean. In all fairness, when Google has cached two pages with substantially the same content, it tries to identify the definitive URL it should present in its search results. This definitive URL is called the canonical URL of the content.
Even though Google’s top-notch engineers try their hardest to do that, they urge us to help them by putting a “hint” in the source code of our pages. Since your Joomla! content is available from at least two different URLs (the non-SEO one and the SEO one), pointing the canonical hint to the SEO URL would give you an advantage in search indexing and relieve you of the anxiety of having your site bashed because Google thinks it is made up of duplicate content. It would also improve the indexing volume (the number of different pages Googlebot indexes on your site every time it comes by), as Googlebot would need less time to deduce which content is relevant and worth indexing and which is merely an alias of content it has already indexed.
Joomla! does not produce a canonical URL hint at all. This is a huge oversight, because the potential is already in its code base. As any Joomla! developer knows, it exposes the JRoute::_() method, which pretty much converts a non-SEO URL to an SEO URL and takes care of making it a valid relative link to your domain as well.
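To make the idea concrete, here is a minimal Python sketch of what a canonical hint generator does: run the current URL through a router and emit a `<link rel="canonical">` tag pointing at the result. It is not Joomla! code. `sef_route()` is a hypothetical stand-in for JRoute::_() that only understands the made-up com_plants component used as an example later in this post, and `canonical_link_tag()` is likewise an illustrative name.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit


def sef_route(url: str) -> str:
    """Hypothetical stand-in for Joomla!'s JRoute::_(): maps a non-SEO
    index.php?option=... URL to its SEO path. It only knows about the
    made-up com_plants component."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if (
        parts.path.endswith("index.php")
        and query.get("option") == ["com_plants"]
        and "catid" in query
        and "id" in query
    ):
        # Joomla!-style slugs look like "1:fruits"; keep the alias after the colon.
        category = query["catid"][0].split(":", 1)[-1]
        item = query["id"][0].split(":", 1)[-1]
        return f"/{category}/{item}.html"
    # Anything else (including URLs that are already SEO) is left as-is.
    return parts.path + (f"?{parts.query}" if parts.query else "")


def canonical_link_tag(site_root: str, url: str) -> str:
    """Builds the canonical "hint" for the page at `url`."""
    canonical = site_root.rstrip("/") + sef_route(url)
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link_tag(
    "http://www.example.com",
    "/index.php?option=com_plants&catid=1:fruits&id=1:banana",
))
# prints: <link rel="canonical" href="http://www.example.com/fruits/banana.html" />
```

The tag would go in the page's `<head>`. Because the SEO URL maps to itself, both the ugly and the pretty URL of the same item end up announcing the same canonical address.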
Think about this: the missing feature leads to URL proliferation, which is bad. The same content can be accessed from two or more URLs. The ugly, non-SEO, non-canonical URLs should either redirect to their canonical form, so that Google knows they are aliases, or include a canonical “hint” to let Google know that, well, they are aliases! Joomla! does neither, so Google has to try hard to figure out what the *bleep* is going on with your site. From what I gather, it simply assumes that the most linked URL is the canonical one. On a large site with many inbound links, this problem goes unnoticed. On a small site, though, most links are cross-links between content items, and those are usually non-SEO URLs (thanks to the leading content editors), so Google ends up treating the wrong URL as canonical.
And now what?
This mantra of mine is a testimony to the extensibility of the Joomla! CMS. As far as I know, it’s one of the few content management systems which openly acknowledge that they may lack a feature you need and delegate that responsibility to third-party developers. In the hands of a determined developer, Joomla! is like soft clay in the hands of a skilled craftsman.
As you might have guessed, my initial reaction was to search high and low for an extension which would:
- add a canonical “hint” pointing to the best guess of the page’s SEO URL;
- automatically migrate absolute URLs pointing to the old site to URLs pointing to the new site in both links (<a> tags) and images (<img> tags);
- redirect non-canonical URLs to their canonical counterpart with an HTTP 302 redirection;
- rewrite non-canonical URL links in my content items to their canonical form.
You may think the last feature is redundant. It’s not. It means that Googlebot will only see canonical URLs when indexing my site, which allows it to index the site faster and more thoroughly. Two birds with one stone, as the old saying goes.
Alas, I couldn’t find a single extension catering for everything, or even several extensions which would each solve one of the problems. So I did what I do best: I wrote the code myself.
Enter Link-A-Tron
The solution I created, which I aptly named Link-A-Tron, consists of two Joomla! plug-ins.
The “Link-A-Tron System Plug-in” takes care of:
- Base tag fixing. Well, scrap that. My "fix" reverts base tag handling to what it was in Joomla! 1.0.x. As Johan pointed out, it's wrong and breaks URL fragments (page anchors). If you don't care about anchors and standards compliance, use it. If unsure, turn it off.
- Canonical hinting. Adds a canonical URL “hint” inside the page’s <head> for Google and other search engines to use. It uses Joomla!’s router infrastructure to calculate the canonical URL, so it should work with third-party SEO components such as sh404SEF, AceSEF, etc.
- Canonical link redirection. If the current page’s URL differs from the canonical URL, it will promptly redirect the web client (be it a browser or a search engine spider) to the canonical URL, mitigating URL proliferation.
Each of these options can be separately toggled on and off in the plug-in configuration.
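The canonical link redirection decision boils down to "compute the canonical URL; if it differs from what was requested, send the client there". Below is a minimal Python sketch of that decision, not the plug-in's PHP code. The `Router` callable and `toy_router` are hypothetical stand-ins for Joomla!'s router, and limiting redirects to GET/HEAD requests is my own assumption (it keeps form submissions from losing their data).

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical router signature: maps a request URI to its canonical URI.
# Inside Joomla! this job would fall to the router (JRoute::_()).
Router = Callable[[str], str]


def canonical_redirect(
    method: str, request_uri: str, to_canonical: Router
) -> Optional[Tuple[int, str]]:
    """Returns (status, location) when the client should be redirected,
    or None when the page should be served as-is."""
    # Assumption: only safe GET/HEAD requests are redirected, so that form
    # submissions (POST) never lose their data to a redirect.
    if method.upper() not in ("GET", "HEAD"):
        return None
    canonical = to_canonical(request_uri)
    # Already canonical: serve the page. This also rules out redirect loops,
    # as long as the router maps canonical URIs to themselves.
    if canonical == request_uri:
        return None
    # HTTP 302, the redirection type listed earlier in this post.
    return 302, canonical


# Usage with a toy, dictionary-based router (illustrative data only).
ROUTES: Dict[str, str] = {
    "/index.php?option=com_plants&catid=1:fruits&id=1:banana": "/fruits/banana.html",
}


def toy_router(uri: str) -> str:
    return ROUTES.get(uri, uri)


print(canonical_redirect("GET", "/index.php?option=com_plants&catid=1:fruits&id=1:banana", toy_router))
# prints: (302, '/fruits/banana.html')
print(canonical_redirect("GET", "/fruits/banana.html", toy_router))
# prints: None
```

The equality check is what makes the feature safe to switch on: a request for the canonical URL is always served directly instead of bouncing back and forth.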
The “Link-A-Tron Content Plug-In”, on the other hand, takes care of the remaining issues:
- Link migration. You supply a list of addresses where your site used to be installed, and the plug-in replaces every link referencing those sites with one pointing to the current site. Let's say that you had a site on www.example.com and you moved it to www.example.net. You just supply the old domain name (www.example.com) and all links (<a> tags) and images (<img> tags) will be rewritten to replace the old domain with the new one. This is useful when migrating your site to a new domain using JoomlaPack (now Akeeba Backup)!
- Non-SEF URL rewriting. It rewrites links to non-SEO URLs found inside your content items so that they point to their SEO equivalents. For example, it will transform the ugly http://www.example.com/index.php?option=com_plants&catid=1:fruits&id=1:banana URL into the SEO-friendly http://www.example.com/fruits/banana.html URL.
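The link migration step is essentially a search-and-replace restricted to the href of <a> tags and the src of <img> tags. Here is a minimal Python sketch using a regular expression; `migrate_links()` is an illustrative name, and a regex like this is a simplification that does not handle every corner of real-world HTML the way a proper parser would.

```python
import re
from typing import Iterable


def migrate_links(html: str, old_hosts: Iterable[str], new_host: str) -> str:
    """Rewrites absolute <a href> and <img src> URLs pointing at any of
    old_hosts so that they point at new_host instead."""
    hosts = [h for h in old_hosts if h]
    if not hosts:
        return html
    pattern = re.compile(
        # Group 1: the tag up to and including "http://" or "https://".
        r'(<(?:a\b[^>]*?\bhref|img\b[^>]*?\bsrc)\s*=\s*["\']https?://)'
        # Group 2: one of the old host names, matched literally.
        r"(" + "|".join(re.escape(h) for h in hosts) + r")"
        # The host must end here (path, port or closing quote follows),
        # so "www.example.com" never matches "www.example.com.au".
        r"(?=[/:\"'])",
        re.IGNORECASE,
    )
    # A function replacement avoids backslash escapes in new_host being interpreted.
    return pattern.sub(lambda m: m.group(1) + new_host, html)


page = '<a href="http://www.example.com/fruits/banana.html"><img src="http://www.example.com/images/banana.jpg"></a>'
print(migrate_links(page, ["www.example.com"], "www.example.net"))
# prints: <a href="http://www.example.net/fruits/banana.html"><img src="http://www.example.net/images/banana.jpg"></a>
```

Only the host name changes; the scheme and the path are kept, so a page at /fruits/banana.html on the old site keeps its path on the new one.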
This plug-in works independently of the system plug-in and its options can be separately toggled on and off in the plug-in configuration. Since it is a content plug-in, it will be used in Joomla! articles, as well as in any other content extension which makes use of content plug-ins.
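Non-SEF URL rewriting works on the same principle as link migration, except that each matching href is passed through the router instead of having its host swapped. A minimal Python sketch follows. The `route` callable is a hypothetical stand-in for Joomla!'s router, the `index.php?` plus `option=` test is my assumption of what counts as a non-SEF link, and the dictionary in the usage lines is illustrative data.

```python
import re
from html import escape, unescape
from typing import Callable, Dict

# Matches the href attribute of <a> tags; group 2 is the quote character
# so that the closing quote must be the same one.
HREF = re.compile(r"""(<a\b[^>]*?\bhref\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_non_sef_links(html: str, route: Callable[[str], str]) -> str:
    """Replaces non-SEF links (index.php?option=...) inside <a> tags with the
    URL returned by `route`, a hypothetical stand-in for Joomla!'s router."""

    def replace(m: "re.Match[str]") -> str:
        prefix, quote, url = m.group(1), m.group(2), m.group(3)
        # Leave SEO URLs, external links, anchors etc. untouched.
        if "index.php?" not in url or "option=" not in url:
            return m.group(0)
        # Editors store "&" as "&amp;" in HTML, so decode before routing
        # and re-encode the result for the attribute value.
        return f"{prefix}{quote}{escape(route(unescape(url)), quote=True)}{quote}"

    return HREF.sub(replace, html)


ROUTES: Dict[str, str] = {
    "index.php?option=com_plants&catid=1:fruits&id=1:banana": "/fruits/banana.html",
}
content = '<p>Try our <a href="index.php?option=com_plants&amp;catid=1:fruits&amp;id=1:banana">bananas</a>.</p>'
print(rewrite_non_sef_links(content, lambda u: ROUTES.get(u, u)))
# prints: <p>Try our <a href="/fruits/banana.html">bananas</a>.</p>
```

Decoding `&amp;` before routing matters: WYSIWYG editors save query strings HTML-encoded, and a router looking for `catid` would not recognize `amp;catid`.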
Edit, May 2012: Instead of the plugins mentioned in this post, you can now use Admin Tools Core by AkeebaBackup.com for link migration. The canonical URL features are not implemented, for several technical reasons. The main one is that you can't possibly guarantee canonical URLs for each and every Joomla! extension without knowing exactly how each extension produces its own URLs.