Tim Nash "stuff" Blog

Bots meet form, form meet Google


Ok, so very quickly: Google announced via its Webmaster Blog that the ever mighty and slightly naughty Googlebot has been given new directives to boldly go and fill in forms. For now it is pretty restrictive and purely an experiment…

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site

You can read the whole article over on their blog. While I have yet to test this, there are a few obvious things that come to mind…

  • What data does it use to fill the form in? They partially answer that:

    For text boxes, our computers automatically choose words from the site that has the form

    So that’s really clear!

  • By using GET, one presumes it is simply generating URL strings and processing them. Given that this is liable to produce error messages or near-identical pages, how will it cope with duplicate content?
  • What counts as personal data?
  • What is a high-quality site?
  • In what circumstances would you want Google crawling form results?
  • Won’t this increase the chance of weird queries à la a few days ago?
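To make the GET point concrete: a form submitted with method="get" just appends its field values to the action URL as a query string, so a crawler can enumerate candidate URLs without ever "pressing" submit. The Python below is a hypothetical sketch, not Google's method; the function name, the example URLs and the idea of trying every combination are my assumptions, and only the "words from the site" input comes from Google's post. It also shows where the duplicate-content worry comes from: every combination is a distinct URL, even when many of them return the same "no results" page.

```python
from itertools import product
from urllib.parse import urlencode, urljoin


def build_get_urls(page_url, action, text_fields, select_options, site_words):
    """Hypothetical sketch: enumerate the URLs a GET form could produce.

    text_fields: names of free-text inputs, filled from site_words
                 (assumed to be words scraped from the site, per Google's post)
    select_options: mapping of <select> name -> list of option values
    """
    base = urljoin(page_url, action)  # resolve the form's action attribute
    names = list(text_fields) + list(select_options)
    choices = [site_words] * len(text_fields) + [select_options[n] for n in select_options]

    seen = set()
    urls = []
    for combo in product(*choices):
        # Sort pairs for a canonical parameter order; the seen set then drops
        # exact repeats (e.g. a word that appears twice in site_words).
        query = urlencode(sorted(zip(names, combo)))
        url = f"{base}?{query}"
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

For example, `build_get_urls("https://example.com/shop/", "search", ["q"], {"sort": ["price", "name"]}, ["red", "blue"])` returns four URLs, starting with `https://example.com/shop/search?q=red&sort=price`. Removing exact repeats is the easy part; nothing here can tell that two different URLs serve the same error page.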

Ultimately, Google attempting to crawl more of the web is a good thing, right? So why do I feel uneasy? If you are worried about Google gobbling bandwidth or harassing your sales team, you can either fix your form or block Googlebot in your preferred normal way. Which is fine, except what happens when the form is on the front page? Will we need nofollow on submit buttons?
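The "normal way" is usually robots.txt. A minimal sketch, assuming the form submits to a hypothetical /search path: the rule below keeps Googlebot off every generated result URL, and Python's standard urllib.robotparser is used only to check which URLs the rule covers. Blocking the form's target path is easy; blocking the page the form sits on is not, which is exactly the front-page worry.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot away from the form's result pages
# (assumed to live under /search) while leaving the rest of the site crawlable.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search
"""


def googlebot_may_fetch(url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt lets Googlebot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from a string, no network
    return parser.can_fetch("Googlebot", url)
```

Here `googlebot_may_fetch("https://example.com/search?q=red")` is False, while `googlebot_may_fetch("https://example.com/about")` is True. Other crawlers are unaffected because the rule names only Googlebot.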

Have you come across Google’s new crawling experiment? If so, what did it do? Has Google left a comment on your blog because you renamed the fields in your comment form?



2 comments

  • Melanie Prough

    I honestly believe Google should concentrate on perfecting what they already crawl before taking on more problems.
    For example, my sitemap errors in Webmaster Tools show an HTTP error status of 200, yet "page unreachable"! And it shows 9 indexed URLs from the sitemap, when all but 1 of the 119 pages are indexed. Give me a break; I had to resubmit 4 times to clear it.

  • Michele

    Oh, goody just what everyone needs more gibberish coming in via their webforms.

    This seems to be a trend with them. Not too long ago, on their front man’s blog, he pondered/asked for feedback about whether or not their bots should ignore noindex requests and list those pages anyway. Their excuses, sorry, their reasons, were to “improve their customer’s searching experience” and my personal favorite, “because someone else did it first.”

    There are simply so many better things they could be doing to “improve their customer’s searching experience” besides ignoring the wishes of website owners and worrying about some dastardly page hidden behind a form.

    What’s next, we’ll only index your site if it doesn’t have password protected areas?

    I just wish someone would go talk to their search engine team, smack them with a bit of reality, and tell them to stick with what they know (and usually do quite well) and stop trying to be the Internet police.
