While we were doing the Flash tests I got to thinking about the other binary blobs out there and how the search engines were coping. While Word docs are perhaps the most common way for people to send text around, they are used less as an end format. Currently the format of choice is Adobe PDF, particularly in the corporate world. So it was time for another round of tests: do PDFs rank for keywords? Will their links pass weight? More importantly, why are they so annoying?
How I conducted the tests
Like the previous Flash test, I used 4 separate domains: 2 are “authority” domains in day to day use, while 2 are my little test areas. Just over 16 PDFs were used, with 6 end pages.
Results first for those who can’t wait; commentary is below.
PDFs Ranking, Indexing, and optimising
Given we have all seen PDF files in the SERPs, it will come as no surprise that all major search engines do indeed index and rank PDFs. None of the major search engines gave high rankings to the PDFs if HTML pages were present in the SERPs to compete with them, but this is not surprising. Attempts to optimise a PDF are still limited to keyword repetition and basic manipulation of some of a PDF’s metadata.
PDF and search friendly Links
Barring anecdotal evidence, I have not until now seen research to support the claim that PDF links carry weight. Our tests indicate that PDF links are followed, crawled and indexed, though as PDFs themselves carry little weight there is little to pass to their links. Ask was the only search engine which did not index our test links; however, once more this may be a time issue, and so yet again Ask is in the sin bin. The rankings in Yahoo and MSN/Live were so poor for our test links that it was hard to tell if they were gaining any weight from their PDF links.
Anchor text did make a small difference in Google and Yahoo, but once again it was hard to tell in MSN/Live.
PDFs and Duplicate content
One of the factors noted in our Flash testing was that copy held inside the SWF file could be a duplicate of a HTML page, and both would be crawled and ranked without a penalty in all of the major engines. This also appears to be the case with PDFs. All the major engines indexed and ranked scraped content held in a PDF, though no PDF outranked its HTML equivalent, even with the same inbound links. Of the engines, only Google filtered duplicate content in PDFs, and it only discounted PDFs which were identical: change the title or meta description, or change the file size by adding any extra code, and both versions would be indexed.
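To illustrate why such small changes could be enough, here is a minimal Python sketch of byte-for-byte duplicate detection. It is purely an assumption that any engine works this way; the tests show the behaviour, not the mechanism. The function names and sample bytes are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def is_exact_duplicate(a: bytes, b: bytes) -> bool:
    """True only when the two files are byte-for-byte identical."""
    return fingerprint(a) == fingerprint(b)


# Hypothetical stand-ins for file contents; real PDFs would be read
# with open(path, "rb").read().
original = b"%PDF-1.4 ... scraped copy ..."
same_copy = bytes(original)
padded_copy = original + b"\n% extra comment"  # any added bytes change the hash
```

With a check like this, `same_copy` would be treated as a duplicate of `original`, while `padded_copy` would count as a different file even though a reader would see the same text.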
What does this mean for me?
Anyone who has searched for hardware manuals will know that PDFs often turn up for long tail search terms, and while it is unlikely a PDF could compete for competitive search terms, they carry enough weight to be of some use in a search engine optimisation campaign.
Unless Google petitions Adobe, a link is a do-follow link and that’s it; there is no way for a writer to mark a link as do-not-crawl except by applying that to the entire document. This of course has implications for those looking to circumvent paid link issues: a PDF version of your review, for example, would provide your links with more (though not a lot more) weight than your no-followed link. Since you have no way to separate your editorial and non-editorial links, Google can’t penalise you without treating all links from a PDF as no-followed.
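For the whole-document case, one option is the X-Robots-Tag HTTP response header, which Google documents for non-HTML files. A rough Python sketch, assuming you control the response headers your server sends; the function name and the example paths are hypothetical.

```python
def headers_for(path: str) -> dict:
    """Build response headers, adding X-Robots-Tag for PDF files only."""
    headers = {}
    if path.lower().endswith(".pdf"):
        headers["Content-Type"] = "application/pdf"
        # Applies to every link in the file; there is no per-link control.
        headers["X-Robots-Tag"] = "nofollow"
    return headers
```

Calling `headers_for("/reviews/widget.pdf")` would include the nofollow directive, while a HTML page would get no such header. It is all or nothing, which is exactly the limitation described above.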
SPDFs or Spam PDFs
Splogs are increasingly becoming accepted; most bloggers accept that their site is going to be scraped, and many are looking at ways to maximise this unauthorised syndication. It’s generally accepted that such scraping is not likely to cause the original site any major problems. Scrapers using PDFs might, however, prove more problematic. First off, the PDFs are much more likely to rank for some terms than a traditional splog. Because PDFs take time to rank, you are much less likely to notice an SPDF, as by the time it is ranking you will have forgotten about your post and its rankings. SPDFs perhaps represent more of a reputation management issue than a splog: apart from the general annoyance of PDFs, the ways to monetise PDFs are limited, and so people may go to more unusual lengths (PDF viruses are not unheard of).
PDF usability on the web sucks
PDFs may be the file format of choice for ebooks and secure documents, but when it comes to viewing them in a browser, for most people they just plain suck! I know that when I accidentally click a link to a PDF, my first reaction is to frantically click the close icon in Firefox! Before someone suggests Foxit, I’ve done it for you, and recommend it for anyone currently using Adobe Reader. Consequently, anyone seriously considering using PDFs as part of a strategy might want to reconsider using them as a primary source. Given that the search engines are relaxed about duplicate content in PDFs, it makes sense, where possible, to include a HTML version of your PDF, which should rank higher than the PDF.
Deep linking in PDFs
Ok, so that is a misleading heading, but here is a useful tip in case you didn’t know it: you can jump to a specific page in a file by adding an anchor-style fragment to the link, much like a named anchor in HTML.
Like any anchor link it is treated as being the same page by the search engines.
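A small Python sketch of building such a link. It assumes the `#page=N` open parameter that Adobe documents for its reader; other viewers may ignore it, and the URL and function name here are placeholders.

```python
from urllib.parse import urldefrag


def pdf_page_link(url: str, page: int) -> str:
    """Return url with a #page=N fragment, replacing any existing fragment."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    base, _ = urldefrag(url)  # drop any fragment already on the URL
    return f"{base}#page={page}"


print(pdf_page_link("http://example.com/manual.pdf", 12))
# prints http://example.com/manual.pdf#page=12
```

Since only the fragment changes, every such link still points at the same file as far as a crawler is concerned.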
The long wait
One of the biggest issues with PDFs seems to be the time they take to rank, which appears to be between 1 and 2 weeks on an authority domain in Google and anywhere up to 6 weeks elsewhere. If you are hoping to make a quick buck, then you might want to plan ahead.
So is PDF SEO worth it?
It’s a lot of work for little gain. I would not be surprised if some clever spammer developed a method to turn their daily scrapings into PDFs; with enough interlinked PDFs they could provide some weight to pages. Personally I am not a fan. I think unless you have a reason to be protecting your content (in which case, should it be available to a search engine?) there is little point to PDFs on the web; HTML does a much more efficient job without upsetting your users. However, those using PDFs can be safe in the knowledge that their PDFs are helping their rankings and that duplicate content should not be an issue.
Do you use PDFs? I would like to hear and gather people’s thoughts.