A Very Brief Introduction to
Markup Languages

TeX and HTML are two markup languages that most modern mathematicians should learn. Many mathematicians know one of these languages but not the other. This document gives an extremely brief introduction to both; it may be useful for learning one or both.

We should contrast three main types of documents:

  1. Plain text documents. These are edited using a program such as Emacs, VI, Windows Notepad, MS-DOS Edit, Pico, etc. These documents do not contain any formatting, such as boldface or italics.

  2. WYSIWYG documents. The name stands for "What You See Is What You Get." The best-known WYSIWYG editors are WordPerfect, Microsoft Word, Windows Write, and Windows WordPad. The documents include formatting such as boldface and italics. The formatting is built into the document file using special "invisible" characters that are seen by the program but not by the user of the program. For instance, the file may contain a special character or string of characters that means "begin boldface here", and another string that means "end boldface here". These special characters can sometimes be revealed (for instance, in WordPerfect, use the "reveal codes" command). But most of the time, all the user of the program sees is that the text begins boldface at some point, and ends boldface at another point. The invisible characters can sometimes cause problems: for instance, if you delete all the letters between a "begin boldface" marker and an "end boldface" marker, those markers might still remain in your file, and they may cause trouble later. To understand how this works, it may be helpful to think of the file (which contains the "invisible characters") and the screen display (which does not display those characters) as two separate stages of the process. Only the first stage is stored; only the second stage is ordinarily displayed.

  3. Markup language documents. In this arrangement, the two stages are separated more completely. We have a source file which can be edited with any plain text editor (VI, Emacs, etc.); it looks something like a WordPerfect document with the "reveal codes" command turned on. We send it through a "compiler" or "viewer" or "interpreter" or "browser" to form the display; in some cases (e.g., for TeX) the display is produced via a second file (different from the source file).

So much for the general theory. Now let's look at some very simple examples.

A sample TeX source code:

    This example shows how to do {\bf boldface}, {\em italics}, and a greek letter $\alpha$. Also $su_bscr_{ip}ts$ and $su^{per}scripts$.

A sample HTML source code:

    This example shows how to do <B>boldface</B>, <I>italics</I>, and a greek letter <font face=symbol>a</font>. Also su<sub>b</sub>scr<sub>ip</sub>ts and su<sup>per</sup>scripts.

The resulting TeX display:

    [picture of tex output]

The resulting HTML display:

    This example shows how to do boldface, italics, and a greek letter a. Also subscripts and superscripts.
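
The TeX sample above is only a fragment; before a compiler will accept it, it needs a little scaffolding around it. Here is a minimal sketch of a complete source file, assuming the LaTeX macro package (the {\em ...} command in the sample suggests LaTeX, but the document class and the layout of the file are assumptions, not part of the original example):

```latex
% Minimal complete file wrapping the sample sentence.
% The "article" class is an assumption chosen for illustration.
\documentclass{article}
\begin{document}
This example shows how to do {\bf boldface}, {\em italics},
and a greek letter $\alpha$.
Also $su_bscr_{ip}ts$ and $su^{per}scripts$.
\end{document}
```

Saved under a name such as "myfile.tex", a file like this would be sent through the "latex" program rather than plain "tex".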

The particulars are different, but the basic concept is the same. (In fact, HTML is defined as an application of SGML; TeX is not, although it rests on the same idea of marking up plain text. But that's probably more technical and less useful than is appropriate for most readers of this document.) Here are some other main differences between TeX and HTML:

  TeX: Designed to work well with the complicated spacing required by mathematical expressions. For instance, it can handle fractions and matrices quite well.

  HTML: Includes special tools for web pages, for instance links.

  TeX: The source file "myfile.tex" is sent through a program called "tex", which creates an output file called "myfile.dvi". The "dvi" originally stood for "device independent", but that is somewhat misleading: the file may behave differently on different computers, depending on what TeX fonts and other information are installed on those computers. The dvi file can be used to create another file, such as "myfile.ps" (for "PostScript"). Some printers are equipped with software to print dvi or ps files. Some operating systems, such as Windows, can be equipped with software to display dvi or ps files.

  HTML: The only file stored is the "myfile.htm" or "myfile.html". It can be sent through a "browser" program such as Netscape or Microsoft Internet Explorer, which displays the result on the screen but generally does not save that display as a file.
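
The comparison above notes that TeX copes well with fractions and matrices. As a rough illustration, here is a small sketch written in LaTeX (an assumption; the text speaks only of TeX, and plain TeX spells these constructions differently). The particular fraction and matrix entries are made up for the example:

```latex
% A displayed fraction and a 2-by-2 matrix, using only standard LaTeX.
\documentclass{article}
\begin{document}
A fraction and a matrix, set as displayed math:
\[
  \frac{a+b}{c+d}
  \qquad
  \left( \begin{array}{cc}
    1 & 2 \\
    3 & 4
  \end{array} \right)
\]
\end{document}
```

The source only names the pieces (numerator, denominator, rows, and columns); the compiler works out the spacing, the length of the fraction bar, and the height of the parentheses.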