Search Engine War Blog

The Implications to Google Indexing Forms

Tuesday, 15 April 2008


Google recently made a post on their official blog about beginning to crawl HTML forms. This is interesting not just because it could lead to the discovery of far more pages, which is a good thing, or because it could potentially save developers a lot of hassle when creating SE-friendly content, but because it could cause reasonably well-SEO'd websites a bit of a headache.

Google has told us that, on a number of larger websites, they will begin indexing pages reached via the sites' form elements, when the form uses a GET request (which places the entire query string in the location bar; POST, the alternative, does not) and when they can make justified queries relevant to the content on the page. As Google puts it:

"For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML."

While this might seem fine, and quite cool at the same time, I can see this causing duplicate-content issues on a large scale if the engines begin to index certain forms. Take, for example:

<form action="form.php" method="GET">
Name: <input type="text" name="product" />
<input type="hidden" name="prev" value="page2" />
<input type="submit" name="formSubmitted" value="Submit" />
</form>

This would generate a URL similar to form.php?product=widget&prev=page2&formSubmitted=Submit (where "widget" stands in for whatever word the crawler chooses for the text box).
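That query-string construction can be sketched in Python; the "widget" value here is a placeholder for whichever word Google's selection logic might pick, which is their choice and not documented anywhere:

```python
from urllib.parse import urlencode

def build_get_url(action, fields):
    """Append a form's name/value pairs to its action URL,
    exactly as a browser submits a method="GET" form."""
    return action + "?" + urlencode(fields)

# "widget" stands in for whatever word the crawler chooses for
# the text box; the other values come from the form markup itself.
url = build_get_url("form.php", {
    "product": "widget",
    "prev": "page2",
    "formSubmitted": "Submit",
})
print(url)  # form.php?product=widget&prev=page2&formSubmitted=Submit
```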

But of course, as a common task, we would either rewrite the URL to make it more elegant for users and search engines or, if that's not possible, remove some of the less important name/value pairs and use only a portion of the URL. While I expect Google to be able to notice that a trimmed URL along the lines of form.php?product=widget is the same as the one above, what about pages that are regularly changing, where the parameter values differ from one visit to the next? Will Google treat those as the same page? I doubt it somehow.
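How an engine (or our own duplicate-spotting script) might decide that two such URLs point at the same page can be sketched as a canonicalisation step. This is a sketch only: which parameters count as insignificant is my own assumption, and Google's real logic is not public.

```python
from urllib.parse import urlparse, parse_qsl, urlencode

# Assumed set of parameters that don't change the page content
INSIGNIFICANT = {"prev", "formSubmitted"}

def canonicalise(url):
    """Strip assumed-insignificant parameters and sort the rest,
    so URLs differing only in those respects compare equal."""
    parts = urlparse(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query)
                    if k not in INSIGNIFICANT)
    return parts.path + "?" + urlencode(params)

long_url = "form.php?product=widget&prev=page2&formSubmitted=Submit"
short_url = "form.php?product=widget"
print(canonicalise(long_url) == canonicalise(short_url))  # True
```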

The Implications?

While in the short term I don't see this having any drastic implications, I can foresee:

  • Duplicate Content Penalties
  • More supplementally indexed pages
  • Link juice being spread more thinly
  • Causing issues with your "nofollow sculpting"

Possible Fixes?
While I hate the theory that Google is yet again going to end up forcing us to use more semantic markup in our HTML, I can see us having to adopt one of the following:

  • Blocking the form pages in robots.txt, though other pages may depend on these?
  • <input type="hidden" name="robots" value="nofollow" />
  • <meta name="robots" content="NOFORMFOLLOW" />
  • Adding our own hidden attribute to our forms and using "Disallow: *specialAttribute*" in robots.txt
  • Not using forms altogether
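Of the options above, the robots.txt route is the easiest to test locally. A sketch using Python's standard-library robot parser to confirm that a rule really does block the form handler; the Disallow rule here is my own example, not anything Google has prescribed:

```python
from urllib.robotparser import RobotFileParser

# Example rule blocking every URL generated through the form handler
rules = """\
User-agent: *
Disallow: /form.php
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Form-generated URLs are blocked; the rewritten pages stay crawlable
print(rp.can_fetch("Googlebot", "http://example.com/form.php?product=widget"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/products/widget/"))         # True
```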

Google has taken a large step forward that we had been expecting for some time, and it is a good progression for them. However, now that it is on our doorstep, it is worth bearing in mind that, from a site-level SEO perspective, it adds a new set of considerations to think about.


Patrick Altoft

I think the key is to track the URLs being indexed and then fix them when you spot them.

Otherwise, just 301 the form pages to a rewritten URL as appropriate.


Nice post...


Drawing forms using JavaScript (or Ajax) would solve this issue, right? And will Googlebot respect the "NOFORMFOLLOW" meta tag? I cannot see any post from Google about it.


