No Fluff Description Of Good HTML

Basic information on writing HTML files. OUT OF DATE: developed for mid-1990s.


Copyright 1995, 2000, 2001 by Gilbert Healton
All rights reserved
If your browser is smart enough, selecting one of the following subjects should jump you to that section of this document. Once there, selecting the hyperlinked section number should return you to this table of contents.
1. Important
2. Introduction
2.1 Document Intent
2.2 Restrictions
2.3 Document Conventions
2.4 Point And Click
2.5 Target Audience
3. WWW History
3.1 WWW Project History
3.2 HTML History
4. Web Growth
5. Generalizations Of Web Pages
5.1 Web Browsers & URLs
5.2 Web Servers
5.3 Where To Put HTML Files
6. Writing Web Pages
6.1 File Names
6.2 HTML Files
6.3 Basic HTML Theory
6.4 Basic HTML Syntax
6.4.1 Element (Command) Chars & Escapes
6.5 Character Sets
6.5.1 Elements (Commands, or Tags)
6.5.2 Document Template
6.6 Elements For The Heading Area
6.6.1 *<HTML> Element
6.6.2 *<HEAD> Element
6.6.3 *<BASE> Base URL
6.6.4 *<TITLE> Element
6.6.5 *<LINK> Element
6.7 Elements For The Text Area
6.7.1 *Comment Elements
6.7.2 *<BODY> Element
6.7.3 *Heading Elements
6.7.4 *Paragraph And Line Break Elements
6.7.5 *<CENTER> Element
6.7.6 *Block Quote, a.k.a. Extended Quote Elements
6.7.7 List Elements
6.7.8 *<PRE> Preformatted Text
6.7.9 <PLAINTEXT> Plain Text
6.7.10 *Character Style Elements
6.7.11 Tricks For Special Chars
6.8 <IMG> In-line Image Element
6.8.1 Basic Image Command
6.8.2 Image Placement
6.8.3 Borrowing Images
6.9 <A> Hypertext Anchors
7. Testing & Debugging Documents
7.1 Local File Opens
7.2 WARNING! Reload Button
7.3 WARNING! Which Host To Login On
7.4 HTML Validation Programs
8. Putting Web Pages On Your ISP
9. Tips On Writing HTML files
9.1 Maximum File Size
9.2 Thumbnail Images
10. Forms and Imagemaps
11. References To Other Documents
12. Footnotes
13. Disclaimer

    ============================================================

1. Important

This information was first written in 1995 to reflect popular, and effective, ways of writing web pages at that time. The contents of this page have not been updated much since then, as the techniques described herein still work.

I hope to get these pages updated to full HTML4 sometime this year.

2. Introduction

2.1 Document Intent

The intent of this document is to provide a practical introduction to directly writing good Web Pages with a minimum of fluff. It is for people who want to understand HTML and/or write their own web pages without using any of the HTML editors out there (which tend to produce verbose and ugly HTML that, all too often, does not look good on all browsers). While this document doesn't describe the entire HTML mark-up language, what it does describe is fairly detailed. Hopefully I've combined the more useful information spread over numerous areas of the Internet into a single document to make it easier to learn how to write good HTML: HTML that will work on many browsers. People wanting more to chew on can look at the references at the end of this document. Some of the documents I drew from are public domain. This is all intensively mixed up with my own words and discoveries. (Borrow from one document, and it is called plagiarism. Borrow from a dozen documents and it is called research.) I've been pushing composition systems beyond their intended limits for twenty years and have stumbled across some tricks in HTML that I haven't seen elsewhere.{1} Some of the less esoteric ones are shared herein.

[NOTE!!] To keep things simple I am not above telling little white lies. If you spot some small technical detail that is wrong, I don't want to hear about it unless either 1) you can think of a simpler way of saying things that I can use, or 2) what I say can get people into trouble.

For the most part, only features widely available on most current browsers are described. Newer, rarely supported HTML features are described only if they are particularly useful.

In practice this document is a direct result of wanting something to hand out during an HTML presentation I made to a local group: something short enough for people to read yet practical enough to be used. While doing HTML consulting I developed a number of tricks not in any documents I've seen. Some of the simpler ones are in this document.

2.2 Restrictions

This document assumes traditional ASCII character sets and the English language are sufficient for your web pages. Other encodings are quite possible, but not covered by this document.

2.3 Document Conventions

Throughout this document additional information is available by footnotes. Such information may be interesting, but is rarely critical. Each footnote uses the format {footnote number}, in curly braces, as hot text. Click on the footnote if you want to read more about the subject.

WARNING! Paragraphs that start with this graphic need to be read with care. They describe features, etc., that are frequent sources of problems to HTML authors.

[NOTE!!] Paragraphs that start with this graphic contain helpful, often overlooked, hints: something that is not required, but often very useful.

Some may argue, not without reason, that these graphics have been overused in this document, making the document uglier than needed. My response is: important information before beauty.

2.4 Point And Click

In the beginning programmers created the Internet. People saw that it was good, but it was also difficult to use. Finding resources required acts of great technical virtuosity. Today, with a decent Web Browser, it takes little more than point-and-click (getting most Web Browsers to work however can still be a technical feat).

2.5 Target Audience

This document is targeted at people who know something about the Internet and have at least moderate "Web Browser" experience.

3. WWW History

3.1 WWW Project History

http://www.w3.org/hypertext/WWW/History.html

The Web is fairly new.

3.2 HTML History

HTML was designed with the original Web software to distribute information. HTML was based on ISO Standard 8879:1986 -- Standard Generalized Markup Language (SGML). Only text information was distributed at that time. There were no provisions for pictures or sounds. However, images were soon implemented and rapidly supported. HTML gave way to standard extensions that became known as HTML2. At the time of the latest revision of this document, HTML 4.01 is the latest version.

Then some people, most notably those at Netscape, added their own extensions to HTML. Although this document is mostly concerned with strict HTML compliance, some popular Netscape enhancements will be described herein. Most remaining Netscape enhancements should be avoided to keep your document readable by most browsers. Others can be used if proper care is taken to prevent them from breaking your documents on non-Netscape browsers.

4. Web Growth

http://www.cc.gatech.edu/gvu/stats/NSF/Packet_3_Area.GIF

From January 1993, when there were about 50 known HTTP servers, to April 1995, with over 12,000 servers, the web has grown exponentially. As of January 1995, WWW packets compose over 13% of the Internet packet count, second only to ftp packets (18%, and I'll bet a lot of those packets are from web browser requests).

For reports on WWW growth, complete with charts:


    ============================================================

5. Generalizations Of Web Pages

5.1 Web Browsers & URLs

Programs that let people read the web are called "Web Browser" programs. There are browsers for most platforms, often many of them. Two popular browsers are Mosaic, the original browser, and Netscape.

All web pages are known by their URL (Uniform Resource Locator). The key here is Uniform. URLs provide a consistent, uniform, way of accessing the diverse types of information on the Internet. No matter

Most URLs are in the format:

method://hostname:port/path/name
Where method identifies how the web browser is to access the data, hostname identifies the computer on the Internet (defaults to current computer if omitted), port is the "port" number (usually omitted to default to 80), and path is the optional path name to the resource you want.
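
As a rough illustration of the format above, the anchors below spell out one URL in full and then again with the port omitted. The host names and file names are placeholders made up for this sketch, not real addresses.

```html
<!-- Full form: method, hostname, explicit port, and path/name.
     The host www.wherever.com and the path are hypothetical. -->
<A HREF="http://www.wherever.com:80/docs/intro.html">Introduction</A>

<!-- The same resource with the port omitted, so the default is used. -->
<A HREF="http://www.wherever.com/docs/intro.html">Introduction</A>

<!-- A different method: ftp instead of http. -->
<A HREF="ftp://ftp.wherever.com/pub/readme.txt">Read me</A>
```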

Most of the time Web users do not directly use URLs. The web pages themselves contain hypertext pointers to the URLs of interest. However, there are times users need to enter URLs directly, so be prepared. URLs can be case sensitive, so copy them exactly as given.

5.2 Web Servers

Each computer providing information to the web is called a "web server". The typical server program is called httpd for "HyperText Transfer Protocol Daemon". Unless you are very serious in your webbing, you will rarely encounter the name of this program. Even this document only mentions it in a very few places.

If you are experiencing trouble on a particular web server, the E-mail address webmaster@www.wherever.com usually works in getting a message to the local "webmaster". Please be sure it really is their problem before sending a message. Describe the tests you made, their results, etc., to help the webmaster fix the problem.

5.3 Where To Put HTML Files

Most UNIX web page servers allow users to create a public_html directory within their home directory. HTML files placed in public_html are available to the web.
The public_html directory, and all files in it that anyone, even yourself, is to access through the web, must be readable by the world. Directories must also have execute permission set for the world (chmod a+rx ~; chmod -R a+rX ~/public_html is one way to get this on UNIX systems). Remember, it is not your browser that reads the files, but the httpd program. So if they are not readable by "other", httpd will likely not be able to read them, even for requests from your own logon.
The alternative to public_html, at least on UNIX systems, is to put the files into a special directory tree. The httpd program is configured to treat some directory as its local "root" directory, where incoming absolute path names start. Unlike public_html, this latter method usually requires the services of a friendly system administrator.
I would love to hear about conventions on other systems. You can E-mail me at ghealton@exit109.com. Please refer to the document title in your E-mail.

    ============================================================

6. Writing Web Pages

6.1 File Names

Generally any file name acceptable to the server can also be made acceptable to the web. The operative word is generally. Restricting file names to the basic special characters of "-", ".", and "_", is strongly recommended. Under pain of extreme headaches, avoid "<", ">", "&", "?", "*", "+", quotes, "#" and "%".

The names of HTML hypertext files must have an extension of ".html" (".htm" on MS-DOS systems).{2} Regular ASCII files, which are displayed without any formatting, have extensions of ".txt". Graphic files can use a variety of data formats, though more web browsers support ".gif" than any other format. Unfortunately .gif files are now subject to a licensing controversy. If you use gif files not produced by a product properly licensed by Unisys you may be subject to legal action. Therefore using the newer, and now very popular, .png image format is suggested.

Web browsers support file types beyond graphics. The one additional format I'll mention here is ".au" for audio messages. Look at the configuration / preferences of your own web browser for additional types.

The name of the file extension identifies the type of data in the file. Your friendly web browser uses this extension to determine how to process the file. Somewhere each browser has a list of file extensions that identify the program needed to process them. If the extension is unknown, the typical default is to ask users if they want to stop the operation or download the file.{3}

A slash (/) is always used to separate directory names, regardless of the server.

The permissions of all .html files must be readable by the world.

[NOTE!!] The file name index.html is reserved by most UNIX web servers (and other servers?). When a URL pointing to a directory name is given, the server first attempts to open the file index.html and return that as your index. If this is not found, then some type of listing of the real directory is made and returned, with each file having a hot anchor to allow its selection. This typically includes "..", or "Up to previous level", to return to the previous directory level.

WARNING! To keep your directories secure, always have an index.html file in your top-level directory. This is a wonderful place to put your welcome to the world. It also provides a shorter web address: /~ghealton/ instead of /~ghealton/index.html. It also helps you to keep your secrets.

6.2 HTML Files

HTML documents are in plain text format (e.g., ASCII). They can be created and updated using your favorite text editor. A few Web browsers include rudimentary HTML editors in them.{4}

Stand alone WYSIWYG editors are also available (e.g. HotMetal for Sun SparcStations, HTML Edit for Macintoshes), along with programs that convert between HTML and the formats of some word processors. However, there is nothing like knowing the basics of HTML to let you know what can and can not be done in HTML.

[NOTE!!] Most browsers provide a way of viewing the "Source": the original raw HTML file before it is formatted. Looking at documents that do things you want is a good way to learn HTML. Finding good documents to mimic however......

This document describes files formatted in the HyperText Mark Up Language, HTML. While some web browsers support PostScript*{5}, HTML documents by far and away allow more people to read them.

6.3 Basic HTML Theory

Hypertext Mark Up Language (HTML) is derived from Standard Generalized Markup Language, SGML.{6}

WARNING! To cope with the truly huge variety of computer hardware used to read, and print, web pages, HTML has been designed from the start to be truly device independent. You have no control, yea not even any idea, about how browsers around the world are configured. You can take some averages, but that is about it.{7} You really can't have tight control over your document the way you may be used to over your local printer. Unlike WYSIWYG word processors, HTML authors only have minimal control over most aspects of the final output.

Web browsers locally format the document and images to get viewable pages each time the document is referenced. Thus HTML is simple to allow quick formatting to the local environment. There are text environments and graphic environments. Environments with narrow windows and wide windows. Environments that default to English and those that don't. Environments for the color blind and environments for the plain blind.

Most styles of the page: text size, heading size, etc., are under the control of the local browser configurations. You can say where things like paragraph breaks are to occur, but you can't control what they look like. Therefore authors must forget about having much control over point size, fonts, etc. True, there is some control which we will get into later, but it is minimal, and in practice, not always known in advance. The only way to have 100% control is to make a graphic with the text you want, and display that. But this greatly degrades performance of your pages, especially to people using dial-up Internet access. It can also make your documents quite difficult to read in some environments and impossible to read in too many others.

HTML was originally designed for text documents. The ability to include graphics was included later and was only intended for occasional graphics. HTML is not a graphics oriented mark-up language. As a result, HTML is not a beautiful language. Documents with lots of elements can be quite clumsy. Personal tastes of authors also affect the way the raw unformatted files look.

For many years, SGML and HTML were generated by manually editing the raw files. Thus many people used to better WYSIWYG editors develop a hatred of these languages. The growth of WYSIWYG HTML editors should help here. However, some advanced people may still wish to do extensive hand editing to retain total control over the files.

6.4 Basic HTML Syntax

6.4.1 Element (Command) Chars & Escapes

HTML documents are made of regular text interspersed with elements (commands) and special escape sequences for character references. Commands are sometimes called tags. Each tag, including any contents and ending tag, is called an element. Escapes are sometimes known as Entities.

All HTML tags are enclosed within "angle brackets", "< >". Escape sequences for special "character references" are introduced by an ampersand character, "&", and end with a semi-colon, ";".

WARNING! As a result of this, the three characters "<", ">", and "&", cannot be keyed directly into a document for printing. The corresponding escape sequences "&lt;", "&gt;", and "&amp;", must be used instead. Other escape sequences are possible, and discussed elsewhere. One of interest is "&quot;", which provides a printable double quote within < > characters. Visit http://www.w3.org/TR/html401/sgml/entities.html for a complete list of references.

WARNING! Never omit the semi-colon that closes the escape sequence!
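
A small sketch of these escapes in use. The sentence, the file name quotes.html, and the TITLE text are made up for illustration.

```html
<!-- The browser shows the angle brackets and the ampersand as ordinary text. -->
<P>Type &lt;P&gt; to start a paragraph &amp; type &lt;BR&gt; to break a line.

<!-- A double quote inside a quoted attribute value. -->
<A HREF="quotes.html" TITLE="The &quot;Exit&quot; sign">More about quotes</A>
```

Each escape ends with its semi-colon, as the warning above requires.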

To simplify formatting, multiple white spaces, which usually includes line endings, are collapsed into a single white space. It doesn't matter what combination of carriage return/line-feed codes are used to end lines. Indeed you don't even have to use these codes at all in your document. At the start of paragraphs, and a few other places, white space is ignored.
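
To make the collapsing rule concrete, here is a sketch of two paragraphs that should format the same way, even though the raw text of the second is scattered over several lines with extra spaces.

```html
<!-- Both paragraphs below should format identically. -->
<P>Now is the time for all good computers to come to the aid of their users.

<P>Now   is the time
for all      good computers
   to come to the aid of their users.
```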

The first part of an element is the HTML tag name, which may be up to 33 letters, digits, hyphens, or periods.

Many tags have optional attributes. Some attributes have values (arguments) to them. Attributes with values have an attribute name, an equal sign, and then the value. The standard requires all values to be quoted with double or single quotes, though most browsers do not enforce this for simple values without strange characters or spaces. (NOTE: this author has yet to meet a browser that enforces the double quote rule... if you know of one, please E-Mail the author about it at ghealton@exit109.com). Values may be up to 1024 characters.

<IMG src="/~ghealton/images/warning1.png" ALT="[WARNING!!]" align=bottom>

WARNING! Browsers are supposed to silently ignore tags, attributes, or entities they do not understand. While this makes it possible for old browsers to read pages with newer commands, care must be taken if the resulting page is to still look appropriate. A lot of this document is dedicated to this facet.

6.5 Character Sets

In general the complete ASCII character set is available to all users. Most graphic browsers support the ISO Latin 1 character set. This includes fractions, section marks, and other useful characters. A general discussion of entities is also available at http://www.w3.org/MarkUp/html3/latin1.html.

If you need to make pages readable by text-only browsers, you need to avoid these characters (but see Tricks For Special Chars).
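
For illustration, a couple of lines using Latin 1 character entities. The sentences are invented for this sketch. On a graphic browser each entity should display as the symbol itself, while a text-only browser may show something else or nothing useful.

```html
<!-- Fraction one half, section mark, copyright sign, accented e, pound sign. -->
<P>Add &frac12; cup of flour (see &sect; 3 of the recipe).
<P>Copyright &copy; 1995. Caf&eacute; menu prices are in &pound;.
```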

6.5.1 Elements (Commands, or Tags)

A non-blank character should immediately follow the <. The tag name, followed by any attribute names, are nOt sENSiTIve To THE CASE oF the LETTERS. Text arguments after "=" attributes are case sensitive only if they contain text to be printed.

Once the initial tag name is given, line breaks may occur wherever spaces are valid.

WARNING! Never forget the closing > nor attempt to use > or < embedded within text arguments of the attribute. Some browsers totally mess up the formatted pages on these omissions. Always close elements requiring closure and properly use escape sequences.

WARNING! Another popular mistake is to omit the closing quote on a string. This too blows up some web browsers while others terminate on the >, even though it is officially in a quote (remember, use &gt; escapes to output magic characters, except in ALT, where some browsers do not service & escapes).

WARNING! WARNING! The mistakes in the previous two warnings are the most common mistakes for both beginning and advanced authors. Heed this warning well.

Some tags are paired (e.g., start and end italic type). In this case, the same tag name is used to both start and stop the element pair, but the closing tag starts with a slash, "/". To close an element, it is very important to use a "/". In the raw file, that last sentence reads:

To close an element, <STRONG>it is very important to use a "/"</STRONG>.

6.5.2 Document Template

The following provides a template for preparing HTML documents. While beyond the absolute minimum needed to make documents, it provides the minimum format considered "good" by various authorities.
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
	<HTML>
	<HEAD>
	<TITLE>Quick Overview To Writing Html</TITLE>
	<META name=author content="Gilbert Healton">
	</HEAD>

	<BODY>
	<CENTER><H1 align=center>QUICK OVERVIEW TO WRITING HTML</H1></CENTER>
	<H1>Introduction</H1>
	<H2>Document Intent</H2>
	The intent of this sample is to show you text
	close to what was used to open this document.

	<EM>The rest of the document</EM>


	<HR>
	</BODY>
	</HTML>

6.6 Elements For The Heading Area

6.6.1 *<HTML> Element

<HTML> should start all HTML files and </HTML> should be the last line of all HTML files.

6.6.2 *<HEAD> Element

<HEAD> starts the heading area for the document. This is information that is NOT part of the normally printed information. While the official specification allows </HEAD> elements to be omitted (their locations can be inferred), this author prefers not to use this shortcut.

6.6.3 *<BASE> Base URL

Allows the base URL of a document, usually a collection of separate files, to be explicitly specified. Normally, when URL's omit selected fields to a file, the default takes the corresponding fields from the file containing the abbreviated URL. Use of <BASE> allows authors to specify the default fields.
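
A minimal sketch of <BASE> at work, assuming a hypothetical document at www.wherever.com; the host and all file names are placeholders. The comments show how each abbreviated URL would be filled in from the base.

```html
<HEAD>
<TITLE>Chapter Two</TITLE>
<!-- Hypothetical base: abbreviated URLs below take their defaults from it. -->
<BASE HREF="http://www.wherever.com/book/chapter2.html">
</HEAD>
<BODY>
<!-- Resolves to http://www.wherever.com/book/chapter3.html -->
<A HREF="chapter3.html">Next chapter</A>
<!-- Resolves to http://www.wherever.com/book/images/map.png -->
<IMG SRC="images/map.png" ALT="[MAP]">
</BODY>
```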

6.6.4 *<TITLE> Element

<TITLE>, which occurs in the heading area, provides a title for the entire document. There must be only one <TITLE> in a document. The title should fit within 60 characters. The title text usually displays at the top of the web browser window and remains in place no matter how the window is scrolled.
WARNING! TITLE text cannot contain most HTML elements. Titles may only contain the minimal escape sequences needed to get the ASCII characters <, >, and &. Remember, the title text is often not rendered in a graphic area, but in a simple character bar/window/something that can only display dumb text characters.

6.6.5 *<LINK> Element

<LINK> allows authors to define relationships between the document containing the <LINK> and the document specified by the corresponding URL. This information is static and is not formatted. Sometimes this is used to make tool bars of navigation buttons or menu items.
HREF=
URL of the other document. Often a href="mailto:owner@somewhere.com" attribute or a link to the author's home page.
REL=
Relationship to other document.
  • Bookmark: References a document providing direct links into an extended document.
  • Copyright: Link references copyright to current document.
  • Home: Top of hierarchy. Can be a home page.
  • Glossary: Link references glossary to current document.
  • Help: Link references document offering help. Intended to help people who have lost their way by offering more links or a wider context.
  • Index: Link references index to current document.
  • Made: The file maker or owner is described.
  • Next: Link references "next" page of a guided tour.
  • Previous: Link references "previous" page of a guided tour. Compare with Up.
  • TITLE: used to label BOOKMARK entries.
  • ToC: Link references table of contents document.
  • Up: Link references "previous" page of a document hierarchy.
REV=
Reverse relationship. How the URL relates to the document.

REV can be any reasonable text. HREF must contain your E-mail address.
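
As a hedged sketch, here is how a heading area might use a few of the relationships listed above. The E-mail address and file names are placeholders, and how (or whether) a browser presents these links varies.

```html
<HEAD>
<TITLE>Chapter Two</TITLE>
<!-- Who made this page; the address is a placeholder. -->
<LINK REV="made" HREF="mailto:owner@somewhere.com">
<!-- Guided tour neighbors and the table of contents. -->
<LINK REL="Previous" HREF="chapter1.html">
<LINK REL="Next" HREF="chapter3.html">
<LINK REL="ToC" HREF="contents.html">
</HEAD>
```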

6.7 Elements For The Text Area

This is just a quick summary of HTML mark up elements currently supported by enough web browsers to make them useful. See A Beginner's Guide to HTML for more details.

6.7.1 *Comment Elements

Comments are introduced by "<!-- " and end with " -->" (note the two hyphens at the beginning and end, and the position of the spaces). Comments should not contain multiple hyphens in a row as part of the comment text as some older browsers get confused over this.

Comments may also appear in the heading area, or most other places. Avoid them within tags, <TITLE> text areas, and text form fields.

WARNING! Nested comments are a no-no as some browsers are reported to stop at any >, not just -->. Also avoid enclosing < > tags within comments as some older browsers definitely do the wrong things here.

Wrong way to comment a tag: <!-- <CENTER> -->
Right way to comment a tag: <!-- CENTER -->

Some browsers do not allow "--" to occur in comments except at the final -->. Thus "--" must also be avoided within comments.

6.7.2 *<BODY> Element

<BODY> starts the elements and text that are to appear on the formatted web pages. Once <BODY> is given, full graphic abilities are available. </BODY> ends the text area of the document.

Three interesting attributes are BACKGROUND="path", BGCOLOR="#rrggbb", and TEXT="#rrggbb".

BACKGROUND= provides the URL to a file that is to be used as a background image onto which the text and regular graphics are to be rendered. This is usually a small graphic which will be tiled by the browser to fill the page.

BGCOLOR= and TEXT= accept six-digit hexadecimal values that provide the Red, Green and Blue levels of the indicated items on the page. A BGCOLOR should be used with most BACKGROUND= requests, set to a value that matches the background color of the background image. This ensures a consistent background color in all cases. There are also LINK=, VLINK=, and ALINK= for the various links. C0C0C0 is close to the grey used by most browsers.

NOTE: earlier browsers, and there are a lot of them still in use, do not support these attributes. Just use them to make your documents look neater... do nothing that depends on them.
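
A sketch of these attributes together. The image name tile.png is hypothetical, and the colors are arbitrary choices for illustration.

```html
<!-- tile.png is a hypothetical small image with a grey background.
     BGCOLOR matches it, so the page looks right while the image loads
     or if images are turned off. -->
<BODY BACKGROUND="images/tile.png" BGCOLOR="#C0C0C0" TEXT="#000000"
      LINK="#0000FF" VLINK="#800080" ALINK="#FF0000">
<H1>Welcome</H1>
<P>Black text on a grey page.
</BODY>
```

Browsers that do not understand the attributes should simply show the page in their default colors.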

6.7.3 *Heading Elements

HTML supplies six levels of headings: <H1> down to <H6>. <H1> generally produces the largest type and <H6> the smallest.

Each heading ensures a line break to start a new line, and usually changes the point size, and sometimes style or typeface, used to typeset the heading. This is under the control of the local configuration.

Skipping heading levels is not recommended by the official documents (e.g., going from <H2> to <H4> without an <H3>).

However, due to the original lack of well received font size elements, many documents go directly to <H6> to print entire paragraphs in small print.

While I am not aware of any browser having trouble with this, it is still not officially blessed. This paragraph is an example of such a <H6> element.
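
A short sketch of heading levels used in order, with no level skipped. The heading text is borrowed loosely from this document's own outline.

```html
<H1>Writing Web Pages</H1>
<H2>File Names</H2>
<P>Text about file names.
<H2>HTML Files</H2>
<H3>Editors</H3>
<P>Text about editors.
```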

6.7.4 *Paragraph And Line Break Elements

Paragraphs are started by <P> elements and simple line breaks by <BR> elements. Paragraphs may end with an optional </P> tag. If the paragraph is not explicitly ended, it ends at the next start or end of a block.

<P> breaks not only force an end to any previous line, but typically insert a small amount of extra vertical white space above the new paragraph.

<BR> elements end any pending line and ensure the start of a new line. Extra space is not formatted.

If one of these elements is given when a partial line is not pending, the result is undefined. Some browsers ignore superfluous break elements while others put out a blank line.

WARNING! If you don't want extra, ugly, white space in your documents then avoid these break codes after codes that imply a break. Just because it doesn't look ugly on your browser is no reason to assume ugly space is never on other browsers.
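
To show the difference between the two breaks, a minimal sketch with invented text: the first paragraph is one block, and the second uses <BR> to stack lines with no extra space between them.

```html
<P>First paragraph. Browsers usually put a little
vertical space above each new paragraph.

<P>First line of the second paragraph<BR>
Second line of the same paragraph<BR>
Third line, with no extra space between the lines.
```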

For the rest of this document, each element that produces a break will be identified. So if the description includes "this element breaks", it is not referring to built-in bugs.

6.7.5 *<CENTER> Element

<CENTER> is an element that starts one or more centered paragraphs. As a paragraph element, it breaks a line. The kicker is that this is an Enhanced Netscape element and is ignored by many other web browser programs.
WARNING! To ensure documents don't look too ugly on systems that don't support <CENTER>, a break should be forced with both <CENTER> and </CENTER> (e.g., <BR><CENTER> and </CENTER><BR>). This may put extra space into browsers supporting <CENTER>, but it will ensure other browsers don't wrap lines in unfortunate ways. Omit explicit <BR>s only if there already is an element that guarantees line breaks (e.g., <HR> or <BLOCKQUOTE>).
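
The defensive pattern from the warning above, sketched with invented text:

```html
<P>Text before the centered block.
<BR><CENTER>This line is centered where CENTER is supported.</CENTER><BR>
Text after the centered block starts on its own line either way.
```

A browser that ignores <CENTER> still sees the two <BR> elements, so the middle line stays on a line of its own.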

6.7.6 *Block Quote, a.k.a. Extended Quote Elements

The <BLOCKQUOTE>, and matching </BLOCKQUOTE>, delimit paragraphs that are to be indented on both the left and right margins. Often used to quote a block of text from another document.

The standard break characters of <BR> and <P> are used within block quotes for longer paragraphs.

While <BLOCKQUOTE> results in a paragraph break, </BLOCKQUOTE> should be followed by an element that issues an appropriate break for the next paragraph.
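
A small sketch of a block quote, using a line from this document's own introduction as the quoted text. The <P> after the closing tag supplies the break for the following paragraph.

```html
<P>As the introduction puts it:
<BLOCKQUOTE>
Borrow from one document, and it is called plagiarism.<BR>
Borrow from a dozen documents and it is called research.
</BLOCKQUOTE>
<P>The explicit paragraph element above starts this text cleanly.
```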

6.7.7 List Elements

The various list elements allow authors to present lists of information. All of the list elements break. There are three types of lists.
  1. <OL> Ordered List Elements produce lists where each paragraph starts with a number. These numbers are generated locally by the web browser, but should be consistent across all browsers.
    Ordered lists are enclosed in <OL> and </OL> elements.
    1. First item
    2. Second item
  2. <UL> Unordered List Elements produce lists that have bullets or other characters before each paragraph.
    Unordered lists are enclosed in <UL> and </UL> elements.

    <LI> List Item Elements start each item within the above lists. A matching </LI> is not required.

    [NOTE!!] There should be no spaces between the <LI> element and the first word of text. In particular <LI> should not be on a line by itself. Otherwise an extra, ugly, space may be inserted before the first word.

    To get additional line breaks within lists, use <BR> and <P> in the normal way.

    WARNING! Some authors use <LI> elements without enclosing them in a proper list. On some browsers this gives you bulleted lists without indents. On other browsers it brings disaster.

  3. Definition Lists are used to provide a list of definitions. Instead of bulleting, etc., each item, the definition is placed in the left margin and the text indented to the right.
        <DL>
        <DT>Def1<DD>Text to definition.
        <DT>Def2<DD>Text to definition.
        </DL>
    
    Produces:
    Def1
    Text to definition.
    Def2
    Text to definition.

    WARNING! Don't forget to end the list with </DL>.

    WARNING! Each <DT> element may be followed by no more than one <DD> element. To get line breaks within lists, use <BR> and <P> in the normal way.

    NOTE: the DL element on some browsers supports a COMPACT attribute, <DL COMPACT>. When used, it closes up what vertical white space it can. Without COMPACT, each string of <DT> text is guaranteed to be on a separate line from the <DD> text. With COMPACT, they can share the same line, given room.

    Without COMPACT:

    Text to definition.
    Another definition.

    With COMPACT:

    Text to definition.
    Another definition.
    If the above two paragraphs look the same on your browser, then your browser doesn't support COMPACT.
List elements may be arbitrarily nested, though some authorities recommend limiting nesting to three levels.
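
The list elements described above can be sketched together as follows. The item text is invented, and the first word follows each <LI> with no space in between, as the note recommends.

```html
<!-- An ordered list with an unordered list nested one level deep. -->
<OL>
<LI>First item
<LI>Second item, with a nested bulleted list:
    <UL>
    <LI>A bullet
    <LI>Another bullet
    </UL>
<LI>Third item
</OL>

<!-- A compact definition list; short terms may share a line with their text. -->
<DL COMPACT>
<DT>Def1<DD>Text to definition.
<DT>Def2<DD>Another definition.
</DL>
```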

6.7.8 *<PRE> Preformatted Text

Blocks of preformatted text are enclosed within <PRE> and </PRE> elements, both of which force line breaks. The text is typeset in a monospace font.

Use of < > elements within <PRE> blocks should be limited so as not to break the preformatting of text.

When elements in a line result in painfully long lines, you can break lines without breaking the preformatting. Simply put the extra line break (one that keeps the raw file workable, but that you don't want formatted) within a < > element.

    <PRE>
    This is preformatted text
    To return <A HREF="http://www.exit109.com/~ghealton/"
	><IMG SRC="/~ghealton/images/smbtprev.png" 
	ALT="[RETURN]" align=middle
	>to home page</A>
    Now is the time for all good computers 
    to come to the aid of their users.
    </PRE>

The above would produce:

    This is preformatted text
    To return [RETURN]to home page
    Now is the time for all good computers 
    to come to the aid of their users.

Note how "To return [RETURN]to home page" appears on a single line, even though it is spread over multiple lines in the preformatted block. If nothing else, line breaks could be contained within <!-- and --> comment elements.
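
As a rough sketch of hiding a line break inside a comment (the sample text is only for illustration, and comment handling varied between browsers of the period, so test before relying on it), the break below sits between <!-- and --> and so should not reach the output:

```html
<PRE>
Now is the time for all good computers <!--
-->to come to the aid of their users.
</PRE>
```

On a browser that handles comments correctly, this should display as one preformatted line.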

Another Note: if you put a <PRE> on a line by itself to start a block of preformatted text, the end of line after the <PRE> produces a blank line on the output.

6.7.9 <PLAINTEXT> Plain Text

WARNING! Some browsers support </PLAINTEXT>, though it is not an official element. Avoid </PLAINTEXT>.

6.7.10 *Character Style Elements

Once you have characters, the need arises to bold and italicize them for various reasons. While there are <B>bold</B>, <I>italic</I>, and <TT>typewriter text</TT> elements, use of such "physical styles" is not recommended. Remember, not every output device supports such attributes, and requesting them may produce results that do not suit your needs.

Some browsers also support <U>for underline, but don't count too heavily on underlines as many browsers ignore underline requests.</U> Some authorities discourage underlines due to the potential confusion between hot text and non-hot text, especially on monochrome screens. This author also votes this way. Always avoid explicit underlines within <A> anchor commands.

Rather than assign specific physical styles, logical styles should be used to allow greater flexibility in matching your requests to the local output device. A line showing a sample of the style follows each *:

<ADDRESS>
    for E-mail addresses. Results in a break.
    * Please send comments to
      ghealton@exit109.com
      for consideration.
<CITE>
    for titles of books, magazines, etc. Typically displayed in italics, but very much subject to change for the local nation's conventions.
    * The Basics Of Good HTML
<CODE>
    for snippets of computer code. Always displayed in a monospace (fixed-width) font.
    * chmod -R a+rX public_html is one way
<DFN>
    for a defined word. Typically displayed in italics.
    * HTML stands for Hypertext Markup Language
    WARNING! Not every browser supports <DFN>.
<EM>
    for emphasis. Typically displayed in italics on browsers that support italic. This is the primary italic substitute.
    * The public_html directory, and all files in it that anyone,
<KBD>
    for signaling user keyboard entry. Should be displayed in a monospace (fixed-width), and sometimes bold, font.
    * Enter the URL http://www.exit109.com/~ghealton/
<SAMP>
    for samples of computer output. Displayed in a fixed-width font.
    * can result in Permission denied errors
<STRONG>
    for strong emphasis. Typically displayed in bold on browsers that support bold. This is the primary bold request.
    * The file name index.html is reserved by
<VAR>
    for "metasyntactic" variables. Shows where users are to replace a string with one of their own. Typically displayed in italic fonts.
    * Type in HREF="mailto:yourcode@somewhere.xxx".

WARNING! Nested styles must be closed in the reverse of the order in which they were opened:

WRONG: <I><B>Bold Italic </I></B>
CORRECT: <I><B>Bold Italic </B></I>

WARNING! Some browsers do not support nested styles, even when they are closed correctly as above. There are two major ways nested styles fail:

  1. Selecting a nested style abandons the previous style.
  2. Clearing the innermost style also clears all other styles.

When using nested styles, be sure that all styles clear at once. Using one style inside another style, with text continuing in the outer style afterward, will fail on some browsers: the text after the inner style may lose the outer style as well. Ending nested styles at a common location is OK if you don't mind the nested text rendering in only the inner style on browsers that abandon the outer one.
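
A minimal sketch of the two cases, using made-up sample text. The first line leaves text in the outer style after the inner style closes, which is the risky form; the second ends both styles at the same place:

```html
<!-- Risky: "inside another style" still depends on <B> after </I> -->
<B>Using <I>one style</I> inside another style</B>

<!-- Safer: both styles end at a common location -->
<B>Ending nested styles <I>at a common location</I></B>
```

Either form can still degrade on a browser that does not nest styles at all, so check the result on more than one browser.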

Use of styles in headings is discouraged as headings already imply assorted font changes.

6.7.11 Tricks For Special Chars

Using the special characters available through escape sequences can cause problems when people on non-graphic browsers read the documents. Text browsers, and there are still lots of them out there, simply can't display the non-ASCII characters. Some characters on popular lists also fail on some graphic browsers, as most lists extend the formal standards by defining their own versions of officially "undefined" characters.

The copyright and trademark characters, which have legal implications when they are improperly published, pose particular problems. Use of (C) and (R) is not legally sufficient. Use of TM (or (TM)) works for non-registered trademarks, but looks ugly. With Netscape, <FONT SIZE=-2>(TM)</FONT>, which produces (TM), can help, but still looks poor. Other browsers support <SUP>TM</SUP>, which produces TM and is really nice when it works. The problem is that many browsers do not support either of these commands.

My solution is to make special png files containing these symbols with transparent grey backgrounds. To cope with text-based browsers, an appropriate ALT= attribute is provided.

<IMG src="/~ghealton/images/c12.png" ALT="(Copyright)" >(Copyright)
<IMG src="/~ghealton/images/r12.png" ALT="(Reg. Trademark)" >(Reg. Trademark)
<IMG src="/~ghealton/images/tm12.png" ALT="(TM)">(TM)

6.8 <IMG> In-line Image Element

6.8.1 Basic Image Command

To place an image into a document, the <IMG> element is used. <IMG> has three basic attributes:
SRC=
    URL of the actual image file. Should be enclosed in double quotes if it contains odd characters.
ALIGN=
    Controls how the image is to align with the baseline of the line.
        BOTTOM is the default.
        MIDDLE aligns to the middle.
        TOP aligns to the top.
ALT=
    Provides alternate text for use by text-based browsers that do not support graphics. Without it, such browsers usually default to "[IMAGE]", which can look ugly, so alternate text should always be provided.

NOTE: ALT="" will not stop the default. Nor will ALT=" " (space between the quotes). There must be at least one non-white-space character to override the default.

Avoid all & escape codes within ALT text. Some text browsers, particularly the popular lynx, do not process escapes in alternate text, at least in the revisions found on the various systems I use for testing. Other browsers do.

Some tips on constructing URLs are described in the next section on anchors.
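
As a small illustration of the three attributes together (the image path and ALT text are hypothetical), an image call might look like this:

```html
<!-- SRC is quoted, ALIGN picks the vertical alignment, and ALT holds
     plain text with no & escapes for browsers without graphics -->
<IMG SRC="images/logo.png" ALIGN=MIDDLE ALT="[Company logo]">
```

A text-based browser would then show [Company logo] in place of its default image marker.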

6.8.2 Image Placement

Images generally appear flush left against whatever precedes them on the line.

WARNING! Browsers treat an in-line image as a single, and perhaps very large, character. Any text or further images that follow the image call continue to the right in the normal fashion. Once the right margin is reached, the text starts a new line below the image.

     __________
    |          |   Although you cries and whines,
    |          |   You can not put many lines
    |          |   To the image right like this
    |  IMAGE   |   As many browsers may miss
    |          |   Displaying your pages true
    |          |   the way you intended them to.
    |__________|   At least in a portable manner.

Some browsers allow it!
However, if you are careful, it can be done, at least with some browsers. You must be very careful when constructing these or the document will look truly horrible on any browser. There are lots of "gotchas" here!

Instructions on the proper way to construct these will be provided to registered users of this document.

6.8.3 Borrowing Images

If you borrow images from many of the public domain or freely distributable archives on the net, please copy the image to your server rather than point your document to the archives. Pointing your document to the archive only annoys the archive's webmaster by needlessly increasing their local Internet traffic. If it gets too bad they will take the images offline, which will break your web pages. The main exception to this rule is for images that by their nature are always being updated and need to be frequently reloaded. In this case the owner will explicitly ask you to link to their pages.

This author maintains a list of pointers to public image libraries he has used at /~ghealton/.computer_web.html.

6.9 <A> Hypertext Anchors

Hypertext anchors are used to reference other web pages, external images, sounds, animations, and even E-mail requests. Anchors can also provide optional reference points for other anchors.

Anchors are bounded by <A> and </A> elements. The <A> element has two attributes. At least one attribute must be used.

HREF=
    URL to the resource to call. The extension on the file identifies how the file is to be processed by the browser. The most widely supported formats are .html for other HTML files, .png for images, and .au for audio files.
NAME=
    Label to assign to this particular location in the document. An anchor in another part of the document may reference this by use of HREF="#Label" (note the pound sign before the label). This # only occurs in the HREF= attribute and is not duplicated in the NAME= attribute.

The footnote calls in this document make use of these features.

The # may be used to select a specific part of any document.
HREF="/~ghealton/nofhtml.html#foot-"

NOTE: HTML3 replaces name= with ID=. NAME is still supported under HTML3, but is "deprecated". However, name= must still be used at this time if you want to write portable HTML files.

The text between the anchor elements is the hot text that users may select to go to the described page. This may include inline image calls.
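
A short sketch of both attributes in use. The label "install", the file "guide.html", and the image path are all made up for this example:

```html
<!-- Mark a spot in the document with a label -->
<H2><A NAME="install">Installation</A></H2>

<!-- Elsewhere in the same document: jump to that spot (# appears in HREF only) -->
See the <A HREF="#install">installation notes</A>.

<!-- From another document: file name, then # and the label -->
See the <A HREF="guide.html#install">installation notes in the guide</A>.

<!-- Hot text may include an in-line image -->
<A HREF="guide.html"><IMG SRC="images/book.png" ALT="[Guide]">Read the guide</A>
```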

Some points about anchor use:


    ============================================================

7. Testing & Debugging Documents

7.1 Local File Opens

Most browsers allow you to call up local files in any directory tree. This can be very useful when developing new web pages. Simply put your draft pages within a special directory, away from live directories like public_html. Once you are happy with the pages, they can be moved to the live area.

People stuck with developing pages under dial-up connections to the Internet can find local files particularly useful. Once any images you need are saved on the local disk, and relative path names are used to call all in-line images and related documents, you can happily develop the pages without an active dial-up connection.
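
To illustrate the difference (the file names and the host www.example.com are hypothetical), relative paths are resolved against the directory holding the current file, so the same markup should work from a local disk and from the live server:

```html
<!-- Relative paths: found next to this file, no connection needed -->
<IMG SRC="images/logo.png" ALT="[Logo]">
<A HREF="page2.html">Next page</A>

<!-- Absolute URL: always fetched from the named host -->
<A HREF="http://www.example.com/page2.html">Next page</A>
```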

7.2 WARNING!Reload Button

To reduce download times, many browsers "cache" images, and perhaps even actual .html files. This is particularly true of browsers intended to work over dial-up lines.

As a result of the cache, when corrections or other changes are made to the original document, you may not see them immediately if you browse to the page again. To see the file as it currently exists, discarding any cache, most browsers have some type of Reload button. Pressing Reload should throw away the cache for the current page and force the browser to reload all elements of the page.

7.3 WARNING!Which Host To Login On

Depending on local conventions, most users update files on their web server by using telnet or ftp. The kicker is, you need to log in to the actual host serving the web. On some networks it usually doesn't seem to matter which host you log in on: the network service makes all files available to all hosts. This can even appear to work with your web files. However, if you have not logged in on the actual web server host, it may take a short span of time before updates made on another host are flushed to disk and become available to the server.

If you make changes to a file, write them out to disk, press Reload, but don't see the changes, this host issue may be your problem. Logging into the web server host avoids this problem.

7.4 HTML Validation Programs

One of the biggest headaches in developing HTML pages is the fact that you rarely, if ever, get error reports out of browsers, no matter how bad the HTML file. If browsers encounter something they don't understand, the problem is silently ignored. This allows new HTML elements, features, etc., to be added to HTML without having to change every old browser on the net. Hopefully, by having browsers ignore new elements, the documents can still be displayed in a somewhat readable format. It may look ugly compared to the intended format, but all text and images should still be there.

While very useful, this can make it difficult to find problems in your web pages, especially if you are using a smart browser that does a decent job of recovering from many common errors (omitted > chars and omitted quotes within < > elements being the most common). Other browsers definitely do not recover from such errors.
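
For illustration only, here are the two slips named above next to the correct form. The file name is hypothetical, and the wrong forms are kept inside comments so that the sample itself stays valid:

```html
<!-- WRONG, closing quote omitted:  <A HREF="page2.html>Next page</A>  -->
<!-- WRONG, closing > omitted:      <A HREF="page2.html"Next page</A>  -->

<!-- CORRECT: -->
<A HREF="page2.html">Next page</A>
```

A forgiving browser may display the wrong forms as if nothing were amiss, which is exactly why they are hard to spot.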

To help with this, several people have developed programs that not only check that web pages comply with HTML standards, whatever those are, but also point out many common mistakes. If you use the WYSIWYG editors, you should not normally need these validation programs.


    ============================================================

8. Putting Web Pages On Your ISP

This document assumes you have a UNIX based ISP that uses the traditional "public_html" directory in your home directory. Other conventions exist, but this is by far and away the most popular convention.

Your "home page" file name is typically "index.html". If necessary, you can store it under the DOSish name "index.htm" and rename it on the fly as it gets moved to the ISP.

In general the remaining HTML files may use .html or .htm, but be consistent everywhere. Use of .html is strongly suggested.

A file transfer program named FTP is the traditional program for updating the web pages on your ISP. There are windows based FTP programs and DOS based FTP programs. The use of DOS based FTP is described here (command line FTP). The beauty of command line FTP programs is they are mostly the same no matter which operating system you are using. If you have a windows based FTP that you prefer to use, go ahead and use it. It shouldn't be too hard to figure out how to use a decent windows FTP program based on the following instructions.


    ============================================================

9. Tips On Writing HTML files

9.1 Maximum File Size

For a practical limit, especially considering the number of dial-up browsers out in the world, this author likes to limit file sizes to 40K to 50K bytes. This includes not only the raw HTML file, but any in-line graphic files.

When building a hierarchy of web pages, this does not need to count the small icons that are used over and over again on most pages. Due to caching, these only need to be downloaded once by the browser, and from then on repeated use should not incur a download penalty.

This page is larger than my traditional limits. By keeping it as a single page it is much easier to print. Some people keep their files as both a single page and a series of hypertext links. I've done it too, but this page is so new, and no one is paying me to maintain it, so it is kept as a single file. At least for now.

9.2 Thumbnail Images

As large images incur large download penalties for dial-up browsers, having large images, especially many large images, is not advised. If you have a T1 line, this is not a concern. But dial-up lines can load only about 1,000 bytes per second, so a 250K image takes over four minutes to arrive. 250K images are not appreciated.

The typical way out is to have a small image, as small as you can manage, perhaps with a resolution of 72 lines-per-inch, that is called up on the page. By placing this in an anchor, users can click on the thumbnail to get an enlarged image. So, if they want to take the time to download an image, they can. If they don't, they are not forced to wait.

This author prefers not to have the anchor point directly to the enlarged image, but to another .html file that calls the enlarged image in-line. This allows a TITLE, and other useful information, to be included in the call. If this is done, you should also have this second HTML file include an anchor that directly calls the image. Some dial-up web browsers don't automatically display in-line images, and this direct image call allows them to see the image.
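
A sketch of this two-file arrangement, with hypothetical file names and title throughout. The listing shows two pieces: first the thumbnail anchor as it would appear on the main page, then the whole of the second file:

```html
<!-- On the main page: a small thumbnail that links to a second page -->
<A HREF="sunset.html"><IMG SRC="images/sunset-small.png" ALT="[Sunset thumbnail]"></A>

<!-- sunset.html: a separate file that shows the full-size image in-line -->
<HTML>
<HEAD>
<TITLE>Sunset Over The Bay</TITLE>
</HEAD>
<BODY>
<H1>Sunset Over The Bay</H1>
<IMG SRC="images/sunset-large.png" ALT="[Sunset, full size]">
<P>
<A HREF="images/sunset-large.png">Load the image by itself</A>
</BODY>
</HTML>
```

The last anchor gives browsers that skip in-line images a direct way to fetch the picture.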

To be finished.


    ============================================================

10. Forms and Imagemaps

UNIX web servers support a Common Gateway Interface "bin" directory called "cgi-bin".

To be finished.


    ============================================================

11. References To Other Documents

http://www.cc.ukans.edu/~acs/docs/other/HTML_quick.shtml
    HTML Quick Reference
http://www.netscape.com/
    Netscape Corporation Home Page
    Where to get information on Netscape, including the program itself.
http://www.w3.org/MarkUp/
    HyperText Markup Language Home Page
    General information about HTML including plans for new versions.
http://www.w3.org/MarkUp/SGML/
    Overview of SGML
    Standard Generalized Markup Language.

    ============================================================

12. Footnotes

The following footnotes are used throughout the document. Selecting {footnote number} in the footnote should return you to the spot in the document the footnote was first referenced (typically where you came from).

{1} Of course this doesn't mean that other people haven't discovered them. Just that I haven't seen them.

{2} Unfortunately using extensions of .html to call out files on local DOS systems that actually have extensions of .htm fails on some DOS browsers. Usually the DOS browser can be configured to accept both .htm and .html, but for some reason, this was not the original default. The real .html files from the web world don't have problems.

{3} Some web pages provide their contents as a zip file to reduce both local archival storage and download times. Very handy for groups of pages spread over many different files. By not having a reader configured for ".zip" format, you get the option of downloading the file. Thus most people don't configure their browsers to have ".zip" readers.

{4} I read in 1996 that this includes CERN's Web browser for NeXT computers and tkWWW for X Windows.

{5} PostScript* files are static. They can't make use of hypertext, or the other HTML features, that make HTML so powerful. However, they look nice as you get much more control of the pages. PostScript is a registered trademark of Adobe Systems, Inc.

{6} The term "markup" goes back to the time of hand-type when people would "mark up" galley proofs for corrections before final typesetting. Compared to interactive WYSIWYG programs, such as many popular word processor programs, "markup" languages are "batch" languages. You throw the files at the computer and it spits out the final result as a separate process. For many years the only composition programs on computers were "mark up" languages where someone had to insert lots of complex commands into the document. Usually the author wrote it on a typewriter and an editor marked it up for the data entry people. Author's Note: After twenty years of calling mark up instructions "composition commands" or "mark up commands", I frequently slip and say "HTML commands" instead of HTML elements.... I ask HTML gurus to pardon me here.

{7} The desire to finely tune web pages to your browser needs to be suppressed. Strongly suppressed. Unless you really know what you are doing such tuning often makes your pages look ugly on other browsers. Despite official specifications, getting documents to look good on as many browsers as possible without being dull is a fine art. I speak from experience.


    ============================================================

13. Disclaimer

While this information is provided in good faith, no warranty can be made for its accuracy. This reflects my own understanding of WWW or may be the opinions of other netters, none of which can be taken to guarantee a reflection of a constantly changing reality, with bugs contributed from thousands of sources. I reserve the right to change, modify, even attack, my opinions at any time without any notice whatever.

Permission to quote short passages is freely given providing proper credit is included with the quote. Hypertext documents should point to my original document.

For any comment, error reports, you-blew-it-baby, etc., please contact Gilbert Healton at ghealton@exit109.com.
 

   ============================================================

http://www.exit109.com/~ghealton/writings/Nofhtml.html
$Id: Nofhtml.hmac,v 1.9 2006/04/09 00:28:35 ghealton Exp $
Last formatted 2008-10-15