Sep 03

Josh Duck has put together a fun and useful list of the 104 elements currently in the HTML5 working draft, organized like a periodic table of elements.

When you click on one of the tags, more information appears.

Who says chemistry can’t be fun?

[via Jackson Harper]

Apr 16

Chris Winberry needed an HTML parser for a project he was working on and started to use John Resig's parser, but found it a touch too strict for some of the HTML he was using (sloppy HTML? never). It was also too heavy to run on a server that would see considerable traffic, and so, being lazy, he wrote a new one from the ground up that is both lightweight (extremely simple DOM) and very forgiving.

Which brings us to node-htmlparser, which works both in Node:

JAVASCRIPT:

  var util = require("util");
  var htmlparser = require("node-htmlparser");
  // Deliberately sloppy input: unquoted attribute, odd closing tag, nested comment
  var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';</  script><!--<!-- Waah! -- -->";
  var handler = new htmlparser.DefaultHandler(function (error) {
      if (error) {
        // [do something for errors…]
      } else {
        // [parsing done, do something…]
      }
  });
  var parser = new htmlparser.Parser(handler);
  parser.ParseComplete(rawHtml);
  // Print the whole DOM (a depth of null means no nesting limit)
  console.log(util.inspect(handler.dom, false, null));

and in a modern browser:

JAVASCRIPT:

  var handler = new Tautologistics.NodeHtmlParser.DefaultHandler(function (error) {
      if (error) {
        // [do something for errors…]
      } else {
        // [parsing done, do something…]
      }
  });
  var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
  // Parse the current page's own markup
  parser.ParseComplete(document.body.innerHTML);
  alert(JSON.stringify(handler.dom, null, 2));
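Both snippets end by dumping handler.dom, so it may help to see what walking such a tree could look like. The sketch below does not call the library at all: it uses a hand-written sample array, and the node fields it relies on (type, name, data, children) are assumptions about the shape of the "extremely simple DOM", not something stated in this post. collectNames and sampleDom are hypothetical names.

```javascript
// Hypothetical sample shaped like handler.dom. The field names
// (type, name, data, children) are assumptions, not taken from the post.
var sampleDom = [
  { type: "text", data: "Xyz " },
  { type: "script", name: "script", children: [
    { type: "text", data: "var foo = '<<bar>>';" }
  ] },
  { type: "comment", data: "<!-- Waah! -- " }
];

// Recursively collect the name of every node that has one.
function collectNames(nodes, found) {
  found = found || [];
  nodes.forEach(function (node) {
    if (node.name) {
      found.push(node.name);
    }
    if (node.children) {
      collectNames(node.children, found);
    }
  });
  return found;
}

console.log(collectNames(sampleDom));
```

If the real tree uses different field names, only the two property checks inside collectNames would need to change.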

Mar 10

Good old Kangax has been playing with HTML minification and has shared an early version of his new tool.

What does it do?

Kangax has forked John Resig's HTML parser, which parses the HTML and feeds the result into the minifier. The minifier has rules for things like whitespace optimization, comment removal, and collapsing boolean attributes (e.g. disabled="true" -> disabled).
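To make the three rules concrete, here is a rough sketch that applies them with regular expressions to a plain string. It is not Kangax's implementation: his tool works on parser output, while this stand-in would happily mangle the contents of pre or textarea elements and cannot tell markup from text. minifySketch and BOOLEAN_ATTRS are hypothetical names, and the attribute list is only an illustrative subset.

```javascript
// Simplified, regex-based stand-in for parser-driven minification rules.
// BOOLEAN_ATTRS is an illustrative subset, not the tool's actual list.
var BOOLEAN_ATTRS = ["disabled", "checked", "selected", "readonly"];

function minifySketch(html) {
  // Comment removal.
  var out = html.replace(/<!--[\s\S]*?-->/g, "");

  // Collapse boolean attributes: disabled="true" -> disabled.
  var boolRe = new RegExp(
    "\\s(" + BOOLEAN_ATTRS.join("|") + ")\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)",
    "gi"
  );
  out = out.replace(boolRe, " $1");

  // Whitespace optimization: squeeze each run of whitespace to one space.
  out = out.replace(/\s+/g, " ");
  return out.trim();
}

console.log(minifySketch('<input  disabled="true">  <!-- note -->\n<p>Hi   there</p>'));
// prints: <input disabled> <p>Hi there</p>
```

The weaknesses of the string approach are a fair argument for doing what the post describes instead: parse first, then apply each rule to tags, attributes, and text separately.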

He also has a linter going:

"While working on minifier, I realized that oftentimes the most wasteful part of the markup is not white space, comments or boolean attributes, but inline styles, scripts, presentational or deprecated elements and attributes. None of these can be simply stripped, as that could affect state of the document and is just too obtrusive. What can be done, however, is reporting of these occurrences to the user. HTMLLint is even a smaller script, whose job is exactly that: to log any deprecated or presentational elements/attributes encountered during parsing. Additionally, it detects event attributes (e.g. onclick, onmouseover, etc.). The rationale for this is that moving contents of event attributes to external script allows to take advantage of resource caching."
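The reporting idea can be illustrated with a small sketch that logs, rather than strips, the wasteful markup it finds. This is not HTMLLint itself: it scans a string with regular expressions instead of hooking into the parser, so an attribute value containing ">" would confuse it. lintSketch and FLAGGED_TAGS are hypothetical names, and the tag list is an illustrative subset chosen for the example.

```javascript
// Illustrative subset of deprecated or presentational elements;
// the real tool's lists are not given in the post.
var FLAGGED_TAGS = ["font", "center", "big", "strike"];

function lintSketch(html) {
  var warnings = [];
  // Match opening tags only; closing tags start with "</" and are skipped.
  var tagRe = /<([a-z][a-z0-9]*)\b([^>]*)>/gi;
  var match;
  while ((match = tagRe.exec(html)) !== null) {
    var tag = match[1].toLowerCase();
    var attrs = match[2];

    if (FLAGGED_TAGS.indexOf(tag) !== -1) {
      warnings.push("deprecated/presentational element: <" + tag + ">");
    }

    // Event attributes such as onclick or onmouseover.
    var attrRe = /\s(on[a-z]+)\s*=/gi;
    var attrMatch;
    while ((attrMatch = attrRe.exec(attrs)) !== null) {
      warnings.push("event attribute: " + attrMatch[1].toLowerCase() + " on <" + tag + ">");
    }

    if (/\sstyle\s*=/i.test(attrs)) {
      warnings.push("inline style on <" + tag + ">");
    }
  }
  return warnings;
}

lintSketch('<center><a href="#" onclick="go()" style="color:red">x</a></center>')
  .forEach(function (warning) { console.log(warning); });
```

Nothing in the input is modified; the function only returns messages, which matches the quoted point that these constructs are too risky to remove automatically.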
