
How to Control Search Engine Robots

# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

ia_archiver is the crawler name for the Wayback Machine, which you may have
heard of, and the / after Disallow tells ia_archiver not to index any of your
site. The # lets you write comments to yourself so you can keep track of what
you typed.
Type the three lines above into Notepad on your computer and save the file to
the root directory of your web site as robots.txt. Well-behaved web crawlers
look for this file first when they visit a site, before doing anything else.
This helps the crawler do its job, and it lets the web site owner tell the
spider what to do.
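Because the file has to sit in the root directory, its address can be worked out from any page on the site. The following is a minimal sketch of that idea in Python; the function name robots_txt_url and the www.example.com address are made up for illustration and are not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt
    # because crawlers look for the file at the top of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/duplicate/page.html"))
# prints http://www.example.com/robots.txt
```

A robots.txt saved inside a subdirectory would not be found at this address, which is why the file belongs in the root.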
Say, for instance, you have some data that you don't want the crawlers to see
(like duplicate content for other browser referrer pages). You can deter
crawlers from indexing the 'duplicate' directory by typing this into your
robots.txt file:

User-agent: *
Disallow: /duplicate/

The * after User-agent says that this rule applies to all crawlers, and
/duplicate/ after Disallow tells all crawlers to ignore this directory and not
search it. Each group of User-agent and Disallow lines must be separated from
the next group by a blank line in order to function correctly. So this is how
you would combine the two rules above in one robots.txt file:

# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/

Or, if you would like to have the robots.txt file created for you, visit
www.rietta.com/robogen. To validate your robots.txt file and make sure it works
properly, you can visit www.searchengineworld.com/cgi-bin/robotcheck.cgi.
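As a rough local check of the combined file above, Python's standard urllib.robotparser module can read the rules and report what a given crawler may fetch. This is only a sketch: the www.example.com addresses and the crawler name SomeOtherBot are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The combined rules from the article, one group per crawler,
# with a blank line separating the groups.
ROBOTS_TXT = """\
# this identifies the wayback machine
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /duplicate/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# ia_archiver is shut out of the whole site.
print(parser.can_fetch("ia_archiver", "http://www.example.com/index.html"))  # False
# Any other crawler may read ordinary pages...
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/index.html"))  # True
# ...but not the /duplicate/ directory.
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/duplicate/page.html"))  # False
```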
One very important thing to note: anyone can read the robots.txt file of a
site. So if you have information that you don't want anyone to see, don't list
it in the robots.txt file. If the directory that you don't want anyone to see
is not linked to from your web site, the crawlers are unlikely to find and
index it anyway.
An alternative to blocking indexing with robots.txt is to put a meta tag into
the page itself. It looks like this:

<meta name="robots" content="noindex,nofollow">

You put this into the <head> section of your web page. This line tells the
robot crawlers not to index (search) the page and not to follow any of the
hyperlinks on the page. As another example,

<meta name="robots" content="noindex,follow">

tells the robot crawlers not to index the page, but to follow the hyperlinks
on it.
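To see how a crawler might read such a tag, here is a minimal sketch using Python's standard html.parser module. It assumes the tag takes the common form of a meta element with name="robots" and a comma-separated content attribute; the names RobotsMetaParser, page_policy, and PAGE are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in robots meta tags (illustrative only)."""

    def __init__(self, crawler="robots"):
        super().__init__()
        self.crawler = crawler.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        # Only look at meta tags addressed to the crawler name we care about.
        if attributes.get("name", "").lower() != self.crawler:
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def page_policy(html, crawler="robots"):
    """Return whether the page may be indexed and its links followed."""
    parser = RobotsMetaParser(crawler)
    parser.feed(html)
    parser.close()
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# A made-up page that asks not to be indexed but allows link following.
PAGE = '<html><head><meta name="robots" content="noindex,follow"></head><body></body></html>'
print(page_policy(PAGE))  # {'index': False, 'follow': True}
```

The crawler argument is there so the same check can be pointed at a tag addressed to one specific crawler instead of the generic robots name.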
Did you know that Google has its own tag?
It looks like this:

<meta name="googlebot" content="noindex,nofollow,noarchive">

This tells the Google robot crawler not to index the page, not to follow any of
the links, and not to store cached versions of the page. You will want to block
caching if you update the content on your site frequently. This prevents the
web user from seeing outdated content that isn't refreshed because it is stored
in the cache.
You can use the googlebot tag to talk specifically to Google's robots, either
to avoid complications or because you are optimizing your site for Google's
search engine.
This concludes this month's article.
Until the next article, have a great day!
Copyright © Michael Rock

(You have permission to copy this article as long as it remains intact with the
author's byline)


About the Author

Web development contractor (Web Design and Hosting)

Internet Presence

www.TheInternetPresence.com

The owner of this registered company has over twenty years of experience with
DOS, Windows business applications, numerous programming languages, artistic
development, and web design. Other areas of interest include web marketing,
web promotion, and business marketing and development. Persuaded by those who
praised his work, he decided to go into business for himself, and he strongly
encourages everyone else to do the same.

Internet Presence was founded in 2003 from a desire to become independent.
Less than a year later, Internet Presence has had accounts in three different
states, ranging from a locally owned auto collision repair shop to a glass
packaging company that sells its products worldwide.
