HTML Document Conversion: Converting to ASCII

Creating Formatted Text for HTML Documents


The next step is to convert your old documents to HTML. Many software packages will do this for you, with varying degrees of success. Some do a pretty good job and others are more trouble than they are worth.

For my conversion of a few hundred pages, I did it manually.

If you decide to try an automated converter, start with this list of Conversion Tools.

If you decide (or are forced) to go it by hand, here's what to do:

Overview:

You need to get your document into ASCII text format. Ideally, you simply load the original and save it as text. But this is not always possible, for example if you have old documents from long-gone programs.
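Once a file has been saved as text, it is worth checking that it really is plain ASCII, since word processors often leave behind smart quotes and other extended characters. The following is a minimal sketch in Python, not part of the original procedure; the function name `find_non_ascii` and the sample text are made up for illustration.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for each non-ASCII character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII covers code points 0 through 127
                problems.append((line_no, col, ch))
    return problems


# Hypothetical sample: the second line contains curly "smart" quotes.
sample = "Plain line\nSmart \u201cquotes\u201d here\n"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r}")
```

To check a real file, read it into a string first and pass the contents to the function. An empty result means the text is safe to treat as ASCII.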

Different conversion techniques and software are available:
Which is best depends on your desired format.
WP-2-ANYTHING is an example of a not-so-great solution
Word for Windows 2.0 and 6.0 work very well for converting documents from many formats
Save directly into formatted ASCII or TXT
The Chinese University of Hong Kong has a very good MS Word to HTML converter
Microsoft has its own Word add-on that works OK.

When I converted JPL STD00009 from WP 5.1, I didn't like the way the tables were saved as text, so I used MS Word 2.0 instead. I:

Imported the file into Word
Changed text to Courier font, 13 pts
Reformatted text and borders
Resized and/or Replaced tables and columns
Inserted Embedded Links and Dead-Links to graphics, images, etc.
Saved as DOS-Text with Layout

This is all dependent on your desired format!

Tips:

Most document tasks are repetitive. Develop macros to do them and save time and energy!

Remember to keep the original .DOC format as a backup.

OCR and Scanning Text: If you don't have an electronic copy of the original, you may want to scan it using OCR (Optical Character Recognition).

After OCR, run a spell-check
Look for common OCR mistakes and character errors
Examples: stray ~, $, and | characters, the digit 0 replaced with the letter O, etc.
Search and Replace all repetitive OCR errors
Do NOT scan in a page and simply post the image!
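The search-and-replace step for repetitive OCR errors can be scripted. Here is a rough sketch in Python under some loud assumptions: the table `OCR_FIXES` and the function `clean_ocr` are hypothetical, and the rules shown (a letter O next to a digit becomes a zero, stray ~ and | marks are deleted) are only examples. Build your own table from the errors you actually see, and do not delete | if your text uses it to draw tables.

```python
import re

# Hypothetical fix table: (pattern, replacement) pairs applied in order.
OCR_FIXES = [
    # Letter O directly before or after a digit is assumed to be a zero.
    (re.compile(r"(?<=\d)O|O(?=\d)"), "0"),
    # Stray ~ and | marks are assumed to be scanner noise and are removed.
    (re.compile(r"[~|]"), ""),
]


def clean_ocr(text):
    """Apply every rule in OCR_FIXES to the text and return the result."""
    for pattern, replacement in OCR_FIXES:
        text = pattern.sub(replacement, text)
    return text


print(clean_ocr("JPL STD0O009, page 1O~ of 25|"))
# prints: JPL STD00009, page 10 of 25
```

Automated fixes like these can introduce new mistakes, so still proof the result by eye and run the spell-check afterwards.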

Keep in Mind:

Develop Macros to Automate Procedure
Visually proof the document before and after conversion
Use special characters as tags which you can later Search and Replace with HTML codes
Beware of In-line graphics and Links
Beware of cut off lines and mis-formatted columns!
Periodically view .asc files to ensure correct conversion
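As an illustration of the special-characters-as-tags tip above, here is a small Python sketch. The marker scheme (`@H1@`, `@B@`, `@P@` and so on) and the names `TAG_MAP` and `markers_to_html` are invented for this example; any character sequence that never occurs in your real text would work as a marker. The sketch escapes &, < and > first, so that literal characters in the text are not mistaken for HTML, and only then swaps the markers for tags.

```python
import html

# Hypothetical markers typed into the text during editing, mapped to HTML tags.
TAG_MAP = {
    "@H1@": "<h1>",
    "@/H1@": "</h1>",
    "@B@": "<b>",
    "@/B@": "</b>",
    "@P@": "<p>",
}


def markers_to_html(text):
    """Escape literal &, <, > and then replace each marker with its HTML tag."""
    text = html.escape(text, quote=False)
    for marker, tag in TAG_MAP.items():
        text = text.replace(marker, tag)
    return text


sample = "@H1@Scope@/H1@\n@P@Use values < 5 & keep @B@backups@/B@."
print(markers_to_html(sample))
```

Because the markers survive a save to plain text, they can be typed in the word processor before conversion and replaced in one pass afterwards.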

Last updated on: Thu Jun 15 10:04:47 PDT 1995