"Dirty HTML"

2001

July 21, 2001
      "Dirty HTML" is an industry term that refers to that funny string of empty HTML commands that do nothing but take up space. They are the bane of a webmaster's existence.
      And, YES, they are parsed by the browser, thereby wasting processing time. And, YES, they take up storage space.
      Where do they come from?
      WYSIWYG editors.
      When you edit something you formatted before, the editor chains the formatting commands.
      It leaves you with strings of multiple commands, <FONT....><FONT....><FONT...., in front of text, and then a string of matched closing commands at the end.
      It may even have strings of open and close brackets with no text in between!
      You may delete a sentence and unknowingly leave the paragraph commands behind.
      You may think you inserted a single vertical space, <P></P><BR>, when you actually inserted <P><BR></P>, which is a whole lot more!
      You may have chained <BR><BR><BR>.
      You may have empty headers <H2></H2>.
      In the extreme you could double-bang a table, <table><tr><td><table><tr><td>, when you did not do it on purpose. (Sometimes you mean to do that.)
      In short, they produce terrible code structures you would never, in your worst coding nightmare, think of writing if you were coding HTML directly.
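      The symptoms listed above can be hunted down mechanically. What follows is only a rough Python sketch, not a real HTML parser: the function name find_dirty_html, the pattern names, and the choice to flag three or more chained <BR> tags are all assumptions made for illustration.

```python
import re

# Illustrative regexes, one per symptom described above.
# These are assumptions for the sketch, not an exhaustive or official list.
DIRTY_PATTERNS = {
    # Two or more opening FONT tags in a row, e.g. <FONT ...><FONT ...>
    "chained FONT tags": re.compile(r"(?:<font\b[^>]*>\s*){2,}", re.IGNORECASE),
    # Three or more BR tags in a row (the threshold is a guess)
    "chained BR tags": re.compile(r"(?:<br\s*/?>\s*){3,}", re.IGNORECASE),
    # An opening tag followed directly by its own closing tag, e.g. <h2></H2>
    "empty element": re.compile(
        r"<(p|h[1-6]|font|b|i)\b[^>]*>\s*</\1\s*>", re.IGNORECASE
    ),
}


def find_dirty_html(html):
    """Return a dict mapping each pattern name to its number of matches."""
    return {
        name: sum(1 for _ in rx.finditer(html))
        for name, rx in DIRTY_PATTERNS.items()
    }


if __name__ == "__main__":
    sample = (
        '<FONT size=2><FONT color="red"><FONT face="Arial">Hello'
        "</FONT></FONT></FONT><BR><BR><BR><h2></H2><P></P>"
    )
    for name, count in find_dirty_html(sample).items():
        print(f"{name}: {count}")
```

      A report like this only tells you where to look. Whether a given match is junk or something you did on purpose is still the webmaster's call.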
      And WYSIWYG editors occasionally go brain dead.
      No matter what, they will not do the edit you want them to do. They won't outdent, indent, change font, whatever.
      They have become lost in "dirty HTML".
      The user is left frustrated and confused.
      The novice is convinced the software is broken (it is).
      The webmaster knows to "go inside" and tune-up.
      It is why all WYSIWYG editors have that sneaky little "look at the code" button.
      This allows you to sneak in behind their back and "fix" them.
      ALL WYSIWYG EDITORS DO THIS.
      Some more than others.
      It has to do with garbage collection.
      For the most part, it can be tolerated, even ignored, unless it gets in the way. Every webmaster knows about it.
      There is even a software package that "cleans up code" - called MISER.
      It also smooshes GIF and JPG files.
      That matters because dirty code can cost runtime and disk space, and cause other problems.
      With a vendor I am working with, the color bar pages (graphic) have had the lesson objectives edited in.
      (Ooops. We said the bad word - edit.)
      The color bar is formatted in a table, because that's how the vendor chose to do it.
      Tables have auto spacing (after the </table> command).
      The Lesson x Objective line is in a header, also with auto spacing top and bottom.
      Therefore, additional spacing between the color bar and the header line is not required.
      In fact, the additional spacing is causing the objective to appear too far down the page, needing scrolling to view.
      On viewing the code (View Frame Source), I consistently found:
            <P><BR>
            <BR></P><BR>
            <P></P>
      This is - yep - "dirty HTML".
      In this instance, it cannot be ignored; it needs to be removed.
      The fix is a search and replace on the entire HTML database contents (grep?). Quick and dirty fix. Five minutes. Tops.
      Dreamweaver 4 even has advanced search and replace capabilities.
      Now I need to learn them.
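      For anyone without such a tool, the same search and replace could be scripted. The following is a minimal Python sketch under stated assumptions: the names SPACER, strip_spacer and clean_tree are made up here, the pages are assumed to live as .htm or .html files under one folder, and the pattern targets only the spacer block quoted above (allowing for whitespace between the tags).

```python
import re
from pathlib import Path

# The spacer block found in the pages, tolerant of whitespace and line
# breaks between tags. A bytes pattern is used so files are rewritten
# without touching their encoding or line endings.
SPACER = re.compile(
    rb"<p>\s*<br>\s*<br>\s*</p>\s*<br>\s*<p>\s*</p>\s*",
    re.IGNORECASE,
)


def strip_spacer(html):
    """Remove every occurrence of the spacer block from a bytes string."""
    return SPACER.sub(b"", html)


def clean_tree(root, pattern="*.htm*"):
    """Rewrite matching files under root in place; return the paths changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        original = path.read_bytes()
        cleaned = strip_spacer(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed.append(path)
    return changed
```

      Because clean_tree overwrites files in place, run it on a copy of the pages first and look at a few results in the browser before touching the real ones.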
     
     

Copyright 2000, 2001 Donnamaie E.White.
Material may not be reproduced without written permission of the author.

For information about this file or to report problems in its use email dewhite@best.com