C & G web

META Tagging for Search Engines

The use of the META tag for specifying, to search engines, how you would like your document to be indexed.

There are now some 100 million publicly available web pages, I understand. Not even the most ardent surfer is going to visit all of those, or even a fraction of them, and discover your wonderful little corner. You can advertise your presence, for free or for a fee. You could have your URL tattooed in big letters on your body and then run naked across the White House lawn. Make sure the media are there and can get a good view. Personally, I prefer to understand how search engines work, how to select and use keywords, etc.

You will need to study search techniques before you can optimize for them. At the very least you should be familiar with Alta Vista: Help for Simple Query. You will also need to understand how to construct an HTML head with META tags that declare the keywords and description to be used by search engines. See the source of this page for an example.

The META tag: Controlling how your Web page is indexed by AltaVista: In the absence of any other information, Alta Vista and some other search engines will index all words in your document (except for comments) and will use the first few words (e.g. 250 characters) as a short abstract to serve back. You can control how your page is indexed by using the META tag to specify additional keywords to index and a short abstract.

This tag can be used to augment documents with 'meta' information that is not normally displayed by browsers. It provides document authors with a mechanism for identifying information that should be included in the response headers for an HTTP request. The markup is stored as attributes of the META tag and is not displayed when the document is loaded into a browser. However, it can be extracted by servers and clients for use in identifying, indexing, and cataloging documents. Here is an example:

<HTML>
<HEAD>
    <TITLE>CERTIFICATE IN TELEMATICS (INTERMEDIATE) Unit 02 Developing web pages
    </TITLE>
    <META	NAME	= "Keywords" 
	CONTENT = "HTML, web, design, CSS, browsers, plugins,
	graphics, HTTP, course, Runshaw, tags, notepad,
	hypertext mark-up language">

    <META	NAME	= "Description"
	CONTENT="A resource for CERTIFICATE IN TELEMATICS
	(INTERMEDIATE) students. Course notes for
	Unit 02 Developing web pages.">
</HEAD>
Use acronyms and spell them out. In general, you will only use a very large number of keywords on the index page of a large site. Do not include spurious, irrelevant keywords: you might attract extra visitors who would not otherwise have come, but they might not thank you for it. If I added 'sex' to the above list I could expect both my traffic and my hate mail to increase dramatically.

One FAQ is: do the various search engines (like Infoseek, Alta Vista, etc.) that normally enter ALL the Web page text into their database enter just the "keywords" from the META tag in place of the HTML body text on the page? Or do they include all the regular HTML text visible on the Web page PLUS the META tag "keywords"?

According to Infoseek's Using META Tags to Define Index Terms for Your Page, when a site is added to Infoseek's index, all the words on the page are included with the exception of any text within an HTML comment, such as:

<!--	This is a comment	-->
The META tag keyword field can be used to specify additional key words or synonyms that describe the contents of a site. META tag keywords are used in the indexing process but will not display on your Web page. The keywords can include up to 1000 characters of text. Be sure that the key words chosen are relevant to the contents of the page. Infoseek Guide indexes your entire page (except any text within comments), regardless of whether or not you include a description or keywords in the tags. The words in the tags are indexed in addition to the rest of the document.

According to Alta Vista, it is possible for you to control how your page is indexed by using the META tag to specify additional keywords to index, and a short abstract.

Both of them say 'additional': using the META tag supplies extra keywords; it does not inhibit the indexing of keywords in the text. Note that the same is not true of the abstract served back to the user: if you supply a description in META, then that is what the user will see. You should do this if for some reason you do not have a descriptive paragraph at the start of your document, e.g. you really believe 1 GIF = 1 Kwords. If you do have a descriptive paragraph at the start of your document (recommended), it is better to omit the META description, because sooner or later you will forget to update one of the duplicates.

I used these tags on all my pages long before AltaVista (AV) and Infoseek Guide (ISG) existed; they were necessary for Aliweb, and at the time this was all so experimental that I only put 'Web Developers Virtual Lib.' in the description. It took me a while to realize that the experimental period was over and that everybody searching for HTML, CGI, etc. was getting only that terse abstract.

Not all engines use META; e.g. Excite says: "Our spider doesn't honor meta tags. We believe our decision protects our users from unreliable information."

So the best advice seems to be: work very hard on selecting your keywords (e.g. use a thesaurus to find other words people might use; brainstorm with friends and colleagues, etc.) and put the most important ones into a carefully crafted paragraph at the start of your HTML document. Put the whole list into a META tag, with the most important or selective words first. Announce your page or site using one of the multiple submission services such as Entity. Keep your clothes on.
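
A minimal sketch of that advice, assuming a hypothetical page of HTML course notes. The title, keyword list and paragraph wording are illustrative only, not taken from any real site:

```html
<HTML>
<HEAD>
    <TITLE>Developing Web Pages: HTML Course Notes</TITLE>
    <!-- The whole keyword list, most important or selective words first -->
    <META NAME="keywords"
          CONTENT="HTML, hypertext mark-up language, web design, tags, CSS, browsers">
</HEAD>
<BODY>
<!-- The opening paragraph carries the most important keywords, so it
     still reads well if an engine serves it back as the abstract -->
<P>Course notes on HTML (hypertext mark-up language) and web design,
covering tags, CSS and browsers.</P>
</BODY>
</HTML>
```

No META description is included here, on the assumption that the opening paragraph is descriptive enough to serve as the abstract.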

The FAQ goes on to ask: Also, does anyone know whether the text within "Comments" tags and "Alt" tags are entered into search engine databases?

The Infoseek Guide goes on to say:

"Infoseek Guide also indexes the ALT attribute in the IMG tag. If your site mainly consists of graphics, you can also use the ALT attribute to describe your page."

So, comments are ignored, ALT text is not.
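
As a small illustration of the difference, assuming an engine that behaves as Infoseek describes (the file name banner.gif and the wording are hypothetical):

```html
<!-- Text inside a comment like this one is not indexed -->

<!-- The ALT text below is indexed, so it should describe the image -->
<IMG SRC="banner.gif" ALT="HTML course notes for web page developers">
```

Engines that do not index ALT text will simply see a page with no words on it, so descriptive ALT text is no substitute for real body text where you can provide it.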

  • The HTML Meta tag.

  • How To Use Meta Tags, from Search Engine Watch.

  • The META Tag Builder is a fill-in form that will build an HTML header with appropriate META tags. These tags allow better indexing by robot-driven search engines, such as AltaVista and Infoseek.

  • PICS is an infrastructure for associating labels (metadata) with Internet content. It was originally designed to help parents and teachers control what children access on the Internet, but it also facilitates other uses for labels, including code signing, privacy, and intellectual property rights management. To generate PICS META tags for adult content, see the RSAC or SafeSurf.

Search Spiders

A spider is an automatic program that searches the Web and indexes what it finds. The index is a list of all the content the search service knows about. This index is what you search when you use a search engine. A search engine is the tool that translates a reader's search request into a query that searches through the indices and returns a search response. You are not directly searching the Web.
Some search services use spiders and some do not.

Feeding the Spiders with Meta Tags
Search spiders do not need meta tags. They will index your site without them and some of them do not take any notice of meta tags at all.

You can, however, use meta tags to help the search spiders that recognise them index your site more accurately and completely.

Suppose you have a site devoted to dinosaurs. Using the different values for the NAME attribute you can help the spider describe and classify the site so that the users you are trying to attract will find your site easily. Here are the most common:

Author
<META NAME="author" CONTENT="Steven Spielberg">
This identifies the author of the page. Sometimes a spider uses this but it is also useful for your own information, especially if you are part of a team working on a site.

Description
<META NAME="description" CONTENT="A site devoted to the appealing habits and wondrous diversity of dinosaurs.">
This description appears after your site's name in a search result. Without a description, a search spider uses the first few words on your page instead.

Keywords
<META NAME="keywords" CONTENT="jurassic, terrible lizard, fossils, prehistoric">
Keywords lets you list words and phrases that relate to your site.

Abstract
<META NAME="abstract" CONTENT="Taphonomy, the study of how fossils form, including all the processes undergone between death and study of the fossil, sheds light on the specimens from the nodule-rich conglomerates. They form in areas where soil-formed nodules and vertebrate remains are a substantial part of the coarse fraction of sediments available.">
The abstract is a longer description that more fully describes the page.

Expiration
<META NAME="expiration" CONTENT="4 June 2001">
The expiration meta tag lets you identify when your page becomes out of date.

So the head of the finished page about dinosaurs looks something like this:

<HTML>
<HEAD>

<META NAME="author" CONTENT="Steven Spielberg">

<META NAME="description" CONTENT="A site devoted to the appealing habits and wondrous diversity of dinosaurs.">

<META NAME="keywords" CONTENT="jurassic, terrible lizard, fossils, prehistoric">

<META NAME="abstract" CONTENT="Taphonomy, the study of how fossils form, including all the processes undergone between death and study of the fossil, sheds light on the specimens from the nodule-rich conglomerates. They form in areas where soil-formed nodules and vertebrate remains are a substantial part of the coarse fraction of sediments available.">

<META NAME="expiration" CONTENT="4 June 2001">

<TITLE>The World of Dinosaurs</TITLE>
</HEAD>

<BODY>

This is my page about dinosaurs. They are big and cool.

</BODY>
</HTML>

With some more actual content, of course.

Another value for NAME that you may see is:

Generator
<META NAME="generator" CONTENT="BBEdit 4.5">
Some commercial HTML editors automatically include a generator meta tag in every page they create. It's just a bit of self-promotion.

Note: Meta tags are important if you use frames on your site as there may be nothing on a frameset page except for a frameset.
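
A sketch of a frameset page with meta tags, reusing the dinosaur values from above. The file names menu.htm and main.htm are hypothetical, and the NOFRAMES section is an addition not covered in these notes; it gives browsers (and spiders) that do not handle frames some ordinary text and a link to follow:

```html
<HTML>
<HEAD>
    <TITLE>The World of Dinosaurs</TITLE>
    <!-- With no body text of its own, the frameset page relies on these -->
    <META NAME="description" CONTENT="A site devoted to the appealing habits and wondrous diversity of dinosaurs.">
    <META NAME="keywords" CONTENT="jurassic, terrible lizard, fossils, prehistoric">
</HEAD>
<FRAMESET COLS="25%,75%">
    <FRAME SRC="menu.htm" NAME="menu">
    <FRAME SRC="main.htm" NAME="main">
    <NOFRAMES>
    <BODY>
    <P>This is my page about dinosaurs.
    <A HREF="main.htm">View the main page without frames</A>.</P>
    </BODY>
    </NOFRAMES>
</FRAMESET>
</HTML>
```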

Stopping Indexing

You may not want to have your page indexed by a spider. For example, you may be creating a site for a select group of people and want to deter casual visitors without going to the trouble of setting up a password system.

You avoid indexing by using the NAME value robots and setting the CONTENT to noindex.

<META NAME="robots" CONTENT="noindex">

Most spiders will now ignore your page. The tag is only a request: a spider that does not recognise it will index the page as usual.
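
For context, a sketch of where the tag sits in a complete page. The title and body text are hypothetical:

```html
<HTML>
<HEAD>
    <TITLE>Members' Page</TITLE>
    <!-- Ask spiders that recognise the tag not to index this page.
         Not covered in these notes: CONTENT="noindex, nofollow" also
         asks them not to follow the links on the page. -->
    <META NAME="robots" CONTENT="noindex">
</HEAD>
<BODY>
<P>Notes for group members only.</P>
</BODY>
</HTML>
```

This only keeps the page out of search results. Anyone who knows the URL can still read it, so it is not a replacement for a password system where privacy really matters.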


Exercise

This is a practical exercise that needs to be added to your Project Folder. Also please record your answer for when you come to complete the exercise form.

  • Choose a topic that interests you and write a short HTML page about it. Name this page meta01.htm and save it to the metatag folder inside your Project Folder (project).

  • Add a series of <META> tags that provide a spider with:
    1. a description
    2. an author
    3. keywords

Site Filtering

Another way to use HTTP-EQUIV is with ratings services, to define the sex and violence level of your content and screen out inappropriate viewers.

Platform for Internet Content Selection, or PICS, is a World Wide Web Consortium-backed technical standard that lets a site self-rate its content.

PICS is applied through a meta tag that looks something like this:

<META http-equiv="PICS-Label" content='(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l comment "RSACi North America Server" for "http://www.foobar.org" on "1996.04.16T08:15-0500" r (n 0 s 0 v 0 l 0))'>


    Computeach International Ltd
