
SAX-compliant HTML parser written in Java

TagSoup is a SAX-compliant parser written in Java that, instead of parsing
well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty
and brutish, though quite often far from short. TagSoup is designed for people
who have to process this stuff using some semblance of a rational application
design. By providing a SAX interface, it allows standard XML tools to be
applied to even the worst HTML. TagSoup also includes a command-line processor
that reads HTML files and can generate either clean HTML or well-formed XML
that is a close approximation to XHTML.
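
As a rough illustration of what "providing a SAX interface" makes possible, the sketch below writes an ordinary SAX handler that collects link targets and runs it against any XMLReader. The class name HrefCollector and the method collect are hypothetical names chosen for this example. To stay runnable without extra dependencies, the demo uses the JDK's built-in XML parser on well-formed input; assuming the TagSoup jar is on the classpath, its org.ccil.cowan.tagsoup.Parser class (which implements XMLReader) could be passed in instead to handle messy HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: a plain SAX handler that records href values.
public class HrefCollector extends DefaultHandler {
    private final List<String> hrefs = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // qName is checked because it is filled in by the JDK parser even
        // when namespace processing is switched off (the factory default).
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                hrefs.add(href);
            }
        }
    }

    public List<String> getHrefs() {
        return hrefs;
    }

    // Works with any XMLReader; the handler does not care which parser feeds it.
    public static List<String> collect(XMLReader reader, String markup) throws Exception {
        HrefCollector handler = new HrefCollector();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return handler.getHrefs();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in reader from the JDK. Assumption: with TagSoup available,
        // "new org.ccil.cowan.tagsoup.Parser()" could be used here instead.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><a href=\"https://example.org/\">x</a></body></html>";
        System.out.println(collect(reader, markup));
    }
}
```

The handler depends only on the standard org.xml.sax interfaces, so swapping parsers changes a single line in main and nothing in the handler itself.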

Source Files

Filename          Size
_service          385 Bytes
tagsoup.changes   144 Bytes
tagsoup.spec      4.04 KB
