Many readers wrote to me requesting additional information on the process of submitting dynamic sites to robot engines after my article “Registering Dynamic Sites” ran a couple of weeks ago. To review the basic procedures: Precise compliance with the rules on the “add URL” page for each robot engine is important. Because these rules vary widely, you must adapt each submission to the engine in question rather than follow a single recipe. You must track the steps you take, keeping a history of submissions and resubmissions. And you must check whether a robot engine has already indexed a document before submitting it a second time.
In order to provide more specific technical information, I contacted a couple of programmers who have experimented with various methods for modifying and submitting dynamic ColdFusion pages. Dan Miller at WebClay.net was able to provide me with some technical data. John Charlesworth, president of BellaCoola Software Corporation, also contributed information to assist me with registering dynamic pages.
Miller says, “If the page needs one specific variable to load, like ‘Page ID,’ for example, you can replace the ‘?’ with a ‘/’ and pass just the value of the needed variable, like this: http://WebOnTheFly.com/page.cfm/1.
“There are several CGI variables that are generally available to a dynamic application with every pageload. One of these variables is called Path_Info. This variable contains the file name and everything after it. So in this example, Path_Info is ‘page.cfm/1.’
“With a few lines of code, you can read Path_Info, strip off everything before the ‘/’ (the file name), and set a variable called ‘pageid’ equal to 1. It’s been noted that some web servers prefer a variable called ‘Script_Name’ rather than ‘Path_Info.’
“There are variations on this to pass multiple variables, etc., but this is the basic concept. When the robot engines see the URL with no ‘?,’ in theory they’re fooled into thinking the variable is just a subdirectory, and they keep going.
“There is one big caveat to this technique: All links and images on the site must be absolute. This is because the browser is also fooled into seeing the variable as a subdirectory and looks for the files there, causing broken links and images when a relative path is used. For example, a link to ‘home.cfm’ should instead be represented as ‘/home.cfm’.”
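Miller works in ColdFusion, but the parsing step he describes works the same way in any server-side language. Here is a minimal sketch in Python; the function name parse_path_info is my own, and the example values mirror his http://WebOnTheFly.com/page.cfm/1 URL:

```python
def parse_path_info(path_info):
    """Strip the file name from a CGI Path_Info value and return the
    trailing variable value(s), e.g. 'page.cfm/1' -> ['1']."""
    segments = path_info.strip("/").split("/")
    # The first segment is the file name itself; everything after it
    # is a value that would otherwise have arrived after a '?'.
    return segments[1:]

# For the URL http://WebOnTheFly.com/page.cfm/1, the server sees
# a Path_Info of 'page.cfm/1':
values = parse_path_info("page.cfm/1")
pageid = values[0]  # pageid is now "1"
```

Passing multiple variables, as Miller notes, is just a matter of appending more segments (page.cfm/1/42) and reading them back positionally.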
I asked Charlesworth about several other possible scenarios:
- What effect do server-side-include files have on search engines?
- Do the search engines follow client-side links in the included files?
- How does one deal with URL-based parameters on pages submitted to search engines?
- How does one deal with file-name suffixes ending with “.pl” or “.cgi”?
Depending on which scenario fits your situation, the answers are shown below.
Scenario 1 (server-side-include files): If you use server-side-include files, then your web server will fully assemble the page (including meta tags and links) before it gets sent out, regardless of what is requesting it (that is, a browser or search engine). No special action is required.
Scenario 2 (client-side-include files): If you use client-side-include files (for instance, the JavaScript “src=” attribute, or Microsoft Internet Explorer’s “IFRAME” tag), then the search engines will index the pages as if these parts were not included.
Scenario 3 (URL-based parameters): If your dynamically generated pages require URL-based parameters (that is, a “?param=value” suffix), then most engines will not index the page (because they assume that there are endless permutations, based on the input parameters). In this case, you would need to build the parameters into the path name of the page to get the search engines to index it. BellaCoola Software has successfully used this technique.
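Building the parameters into the path name is the same trick Miller describes above, applied from the other direction: the links you publish must be rewritten from query-string form into path form. The following Python sketch illustrates the idea; the function name pathify, the example URL, and the fixed parameter order are my assumptions for illustration, not BellaCoola Software’s actual code:

```python
from urllib.parse import urlsplit, parse_qsl

def pathify(url, param_order):
    """Rewrite a '?param=value' URL into a path-style URL that robot
    engines will treat as an ordinary subdirectory.
    e.g. 'http://example.com/page.cfm?pageid=1'
      -> 'http://example.com/page.cfm/1'"""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # Emit the parameter values in a fixed, agreed-upon order so the
    # server can recover them positionally from Path_Info.
    values = "/".join(params[name] for name in param_order)
    return f"{parts.scheme}://{parts.netloc}{parts.path}/{values}"

print(pathify("http://example.com/page.cfm?pageid=1", ["pageid"]))
# -> http://example.com/page.cfm/1
```

Because the parameter names are dropped from the URL, both sides must agree on the order in which values appear, which is why this approach is usually limited to one or two well-known parameters.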
Scenario 4 (file-name suffixes): If your dynamically generated page names end in “.pl” or “.cgi” then AltaVista will index the pages, but Infoseek (Go.com) won’t. (.cfm pages are fine, as long as they have no parameters.) For Infoseek, there are workarounds, which BellaCoola Software has also successfully used.
Both Miller and Charlesworth have excellent technical backgrounds in the areas of modifying and submitting dynamic documents to robot search engines. As you can see by reading the above information, this is a very technical issue that can’t be adequately covered in an article and should be managed by professionals with the right expertise.
Apart from the reprogramming options above, you can run an entirely separate search engine positioning campaign that doesn’t require changing your dynamic site. I’ll discuss this in future articles on ClickZ. If you need quick answers, don’t hesitate to write or call for more information.