About the Text to HTML Conversion
While the explicit aim of markup languages is to signal
structure in text intended for WWW publication, the same structure can also be recognized in plain text.
The conversion takes notice of a few conventions often
used in everyday writing and electronic messages.
Translation rules
- A text string ending in a carriage return and followed by an empty line is interpreted as a paragraph
- A group of text lines followed by an empty line is marked as a paragraph with line breaks
- A group of lines, all with an initial star, hyphen, degree sign or middle dot character ('*', '-', '°', '·') is marked as an unordered list
- A group of lines, all with initial line numbers ('1.', '2.', '3.' etc.) is marked as an ordered list
- A line starting with hyphen, underline or macron characters ('-----', '_____', '¯¯¯¯¯') is interpreted as a horizontal rule
- The shorthands for copyright, registered trademark, trademark and dash, (c), (r), (tm) and '--', are converted to '©', '®', '™' and '—' respectively
- Any special characters (e.g. ampersand '&' or euro sign '€') are translated into HTML character entity references
- A URL (e.g. http://<host>/<directory>/<file>) is translated into an anchor that links to it (<a href="http://<host>/<directory>/<file>">http://<host>/<directory>/<file></a>)
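
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the converter's actual implementation: the function names (convert, inline, plain) are hypothetical, a horizontal rule is assumed to need at least three rule characters and to stand in its own block, only the lowercase shorthands are handled, and entity names come from Python's HTML 4 entity table.

```python
import html
import re
from html.entities import codepoint2name

BULLETS = ("*", "-", "°", "·")                   # unordered list markers
RULE = re.compile(r"(-{3,}|_{3,}|¯{3,})")        # assumption: 3 or more characters
NUMBERED = re.compile(r"^\d+\.\s*")              # '1.', '2.', '3.' etc.
URL = re.compile(r"https?://[^\s<>\"]+")
SHORTHANDS = {"(c)": "&copy;", "(r)": "&reg;", "(tm)": "&trade;", "--": "&mdash;"}


def plain(text):
    """Replace special characters by entity references, then expand shorthands."""
    out = "".join(
        "&%s;" % codepoint2name[ord(ch)] if ord(ch) in codepoint2name else ch
        for ch in text
    )
    for short, entity in SHORTHANDS.items():
        out = out.replace(short, entity)
    return out


def inline(text):
    """Convert one line: URLs become anchors, everything else goes through plain()."""
    parts, pos = [], 0
    for m in URL.finditer(text):
        parts.append(plain(text[pos:m.start()]))
        url = html.escape(m.group(0))            # escape '&' inside the URL
        parts.append('<a href="{0}">{0}</a>'.format(url))
        pos = m.end()
    parts.append(plain(text[pos:]))
    return "".join(parts)


def convert(text):
    """Convert plain text to HTML, one block (lines up to an empty line) at a time."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        if all(RULE.match(ln) for ln in lines):
            out.extend("<hr>" for _ in lines)
        elif all(ln.startswith(BULLETS) for ln in lines):
            items = "".join("<li>%s</li>" % inline(ln[1:].strip()) for ln in lines)
            out.append("<ul>%s</ul>" % items)
        elif all(NUMBERED.match(ln) for ln in lines):
            items = "".join(
                "<li>%s</li>" % inline(NUMBERED.sub("", ln, count=1)) for ln in lines
            )
            out.append("<ol>%s</ol>" % items)
        else:
            # One line gives a plain paragraph, several give line breaks.
            out.append("<p>%s</p>" % "<br>\n".join(inline(ln) for ln in lines))
    return "\n".join(out)
```

Under these assumptions, convert("* first\n* second") would yield a single unordered list, and plain("A & B (c)") would yield "A &amp; B &copy;". Special characters are escaped before the shorthands are expanded so that the ampersands of the inserted entities are not escaped a second time.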