Google PageRank Gotchas (MediaWiki 301 Redirects)

Beware of Multiple Pages with the Same Content

Recently I noticed that a page about Web Servers in the Docunext wiki seemed to be missing from Google's search index. It turns out it isn't missing, but it is way down at the end of the search results, even when the search is limited to docunext.com.

Today it seems higher up, at position 14. Why so low? I think it might have been because my MediaWiki setup had two URLs for the same content: "Web Servers" and "Web servers" (maybe even more; I just found a "Web server" page too). Why? I had set up a REDIRECT using MediaWiki, and unfortunately it wasn't responding with a permanent redirect, aka HTTP response code 301. Instead, the redirect page served the target page's content at its own URL, so Google saw two pages with identical content.
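To see what a URL really returns, it helps to request it without following redirects. The following is a minimal sketch using Python's standard `http.client`, which never follows redirects on its own. The helper name `redirect_status` is hypothetical, not part of MediaWiki. A duplicate title served directly shows up as a 200 with no `Location` header, while a proper permanent redirect shows up as a 301.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def redirect_status(url: str) -> Tuple[int, Optional[str]]:
    """Request `url` once, without following redirects, and return the
    HTTP status code plus the Location header (None if absent)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD keeps it cheap; http.client reports the raw status as-is.
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, calling it on the duplicate page's URL (your own wiki's address, substituted in) should report 301 and the canonical page's location once the redirect is permanent.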

I had heard of pages getting penalized in Google's index for duplicate content, but I was surprised at how significant the penalty was. On Bing, a search for "Web Servers" +docunext, not even limited to docunext.com, lists the page in question at the top. That surprises me, and almost makes me want to use Bing instead of Google!

MediaWiki 301 Redirects

So how can this risk of duplicate content be avoided? I hacked up my MediaWiki installation to respond with a permanent 301 redirect and I'm hoping that does the trick.
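The actual change was made inside MediaWiki's PHP code and isn't shown here. Purely as an illustration of the idea, here is a minimal Python WSGI sketch. It assumes a hypothetical `REDIRECTS` table mapping alias titles to canonical titles and a `/wiki/Title` URL layout. Requests for an alias get `301 Moved Permanently` with a `Location` header pointing at the canonical page, and everything else is served normally with a 200.

```python
from urllib.parse import quote

# Hypothetical alias -> canonical title table. In MediaWiki this information
# lives in redirect pages ("#REDIRECT [[Web Servers]]"); here it is a dict.
REDIRECTS = {
    "Web_servers": "Web_Servers",
    "Web_server": "Web_Servers",
}

PREFIX = "/wiki/"


def wiki_app(environ, start_response):
    """Minimal WSGI app: alias titles get a 301 to the canonical page,
    any other title is served with a 200."""
    path = environ.get("PATH_INFO", "")
    title = path[len(PREFIX):] if path.startswith(PREFIX) else ""
    target = REDIRECTS.get(title)
    if target is not None:
        # Permanent redirect: search engines should index only the target URL.
        location = PREFIX + quote(target)
        start_response("301 Moved Permanently",
                       [("Location", location),
                        ("Content-Type", "text/plain; charset=utf-8")])
        return [f"Moved to {location}\n".encode("utf-8")]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Content of {title or 'Main_Page'}\n".encode("utf-8")]
```

The `Location` value here is a relative path, which current HTTP specifications allow. Older clients (and RFC 2616) expect an absolute URL, so a real deployment might prepend the scheme and host.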

I'm surprised MediaWiki doesn't do this already, and that I hadn't noticed it until now. I had previously added MediaWiki redirects to an Apache redirect bdb index I maintain on my servers, but that was for performance reasons, not to avoid duplicate content. It's nice when you discover "added bonuses" like this, isn't it? Ultimately, though, I don't view the Apache bdb hash as a real solution, because it requires periodic manual updates.
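One way to make that map less of a manual chore is to generate it from a list of redirects rather than editing it by hand. The sketch below assumes a hypothetical `redirects` dict (alias title to canonical title), exported from the wiki however is convenient. It emits lines in mod_rewrite's `RewriteMap` text format: one `key value` pair per line, with `#` starting a comment.

```python
def rewritemap_lines(redirects):
    """Return RewriteMap text-format lines ("key value"), sorted by key.

    Keys and values must not contain whitespace, so titles are written in
    their URL form (spaces replaced by underscores). Self-redirects and
    empty entries are skipped.
    """
    lines = ["# alias canonical (generated; do not edit by hand)"]
    for alias, target in sorted(redirects.items()):
        a = alias.strip().replace(" ", "_")
        t = target.strip().replace(" ", "_")
        if a and t and a != t:
            lines.append(f"{a} {t}")
    return lines
```

The resulting text file can then be converted to a DBM map with Apache's `httxt2dbm` utility (for example `httxt2dbm -i redirects.txt -o redirects.map`) and referenced from a `RewriteMap` directive using the `dbm:` type. The file names here are placeholders, and the DBM flavor depends on how Apache was built.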

By Albert on November 30, 2009 7:05 PM
