With the recent (though massively over-hyped) cross-site scripting (XSS) problems at Reddit, I thought it was a good time to discuss its use in SEO.
Before we start, the disclaimer: neither I nor Venture Skills condone or practice XSS or similar techniques except for demonstration purposes; ultimately we make a judgement call as to where the moral line is. Our clients' sites are treated with the utmost respect, and we would do nothing to damage their rankings or the reputation of our clients. Finally, we do not condone or recommend using any of the techniques outlined in this post, most of which are not cutting edge and, as you will see, not worth your time!
The subject of XSS in SEO has been covered before, relatively recently by SEOmoz in "How to get 20 .gov links in 20 minutes", an interesting post and one worth a read.
What is XSS?
Cross-site scripting is when a visitor places a link or other code on your site, normally through an open doorway based around GET forms. To really understand XSS it's best to show an example, so take this one courtesy of the UK Government. [UPDATE: the link no longer works; guess Gordon fixed it.] A perfectly innocent-looking form. Notice the "Drupal" after "Go" to the right of the search box; move your mouse over it to see a link back to this site. Take a closer look at the link and you can see how this was achieved:
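To make the mechanism concrete, here is a minimal sketch of how such a link is built. It assumes a hypothetical search form that reflects its `q` parameter back into the page without escaping it (the domain and parameter name are illustrative, not the real government site):

```python
from urllib.parse import quote

# Hypothetical vulnerable form: the site echoes the "q" parameter
# straight back into the page HTML without escaping it.
payload = '<a href="http://example.com/">Example anchor text</a>'

# URL-encode the HTML so it survives as a single GET parameter value.
url = "http://vulnerable.example.gov/search?q=" + quote(payload)

print(url)  # the "<" becomes %3C, spaces become %20, and so on
```

Anyone who visits the resulting URL sees the injected link rendered in the page, which is exactly what the screenshot above demonstrates.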
Getting Links Crawled
Now we come to the real problem. Most XSS comes from badly written forms, but by the time you have encoded your HTML string the URL is very long, and very long URLs will normally be discounted by most search engines. Long URLs are sometimes found in Google's results, but these normally have lots of inbound links, so it makes sense that the way to get the URL crawled is to throw more than one link at it. The second issue is that Google's algorithm is pretty smart: it can spot orphan pages, and orphan pages with lots of inbound links will surely look suspicious. The third issue is duplicate content: a site with hundreds of pages being searched and crawled every day is going to suffer from it. Google doesn't penalise the site; it just makes a choice about what is and isn't duplicate content, and once it has made the choice about which is the original page it tends to stick with it. Our fourth issue is that the link is just that: a link with some anchor text but no relevant page content.

Four big problems that need to be overcome should you wish to make use of your newly discovered exploit, so how do you get round them?

1) Encode your final URL with a link and anchor text pointing to the desired site (when choosing a keyword for the search, make sure it will return some results).

2) Encode several other URL strings linking to your first encoded string, with search keywords relating to your desired anchor text.

3) Link to the encoded URL strings from several sites, including them in highly contextual, keyword-rich pages which are being crawled.

4) Make sure you weave a decent web so that each URL string is found both internally and externally.

Some particularly badly designed sites will allow you to pass more than straight HTML, offering you the option of including remote text files to add further content to the page, but such problems are rare.
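Steps 1 and 2 above can be sketched as a small chain-building script. Everything here is hypothetical: the vulnerable form URL, the target site, and the keywords are placeholders, and this is a demonstration of the encoding chain, not something to run against a live site:

```python
from urllib.parse import quote

# Hypothetical open search form that reflects "q" unescaped.
BASE = "http://vulnerable.example.gov/search?q="

def encoded_link(target_url: str, anchor: str) -> str:
    """Build a search URL whose reflected payload is a link to target_url."""
    payload = f'<a href="{target_url}">{anchor}</a>'
    return BASE + quote(payload)

# Step 1: the final encoded URL, linking to the desired site
# with the chosen anchor text.
final = encoded_link("http://your-site.example/", "desired keyword")

# Step 2: several intermediate encoded URLs, each linking to the
# first encoded string under a related keyword.
intermediates = [
    encoded_link(final, keyword)
    for keyword in ("related keyword one", "related keyword two")
]
```

Steps 3 and 4 are then a matter of placing the intermediate URLs on pages that are already being crawled, which is where, as the next section argues, the whole scheme falls down.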
Is it worth it?
Nope. Very few sites will return results even with the method outlined; even unpublished exploits take months of development to return a result, and by then someone will have published the exploit and the hard work is gone. XSS may be a scary-looking tool in the blackhat arsenal, but it's all show and no oomph: unless you can get a link indexed it has limited use in driving links to your site, though it can still be put to other nefarious uses. So my advice is that while it's good to know about these things, it's not worth playing with them, though the above hopefully sheds some light on why all those pharmaceutical spammers seem to use .edu domains. Finally, make sure your site doesn't become a victim of this sort of thing by either stripping HTML from URL strings or, if the HTML needs to be there, looking at other workarounds; a good example is the safestring class for PHP, and ha.ckers.org also has some good advice.
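On the defensive side, the core fix is simply to escape anything from the URL before echoing it back into the page. The post mentions the PHP safestring class; here is the same idea as a minimal Python sketch (the function name is mine, not from any library):

```python
import html
from urllib.parse import unquote

def render_search_term(raw_query: str) -> str:
    """Decode a reflected query parameter, then escape it so any
    injected HTML renders as inert text instead of live markup."""
    return html.escape(unquote(raw_query))

# An injected payload comes out as harmless visible text:
print(render_search_term("%3Cscript%3Ealert(1)%3C/script%3E"))
# → &lt;script&gt;alert(1)&lt;/script&gt;
```

Escaping on output like this is usually preferable to stripping HTML on input, since it preserves what the user typed while making it impossible for the browser to execute it.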