Build and scan parse-trees of HTML
HTML-Tree is a suite of Perl modules for making parse trees out of HTML
source. It consists mainly of two modules, whose documentation you should
refer to: HTML::TreeBuilder and HTML::Element.
HTML::TreeBuilder is the module that builds the parse trees. (It uses
HTML::Parser to do the work of breaking the HTML up into tokens.)
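As a rough illustration of building and scanning a tree, here is a minimal sketch. The sample markup and the variable names are made up for the example; the calls used (new_from_content, look_down, as_text, attr, delete) are documented in HTML::TreeBuilder and HTML::Element, which remain the authoritative reference.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample markup, invented for this sketch; any HTML string would do.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><h1>Hello</h1>'
         . '<p>See <a href="http://example.com/">this</a>.</p></body></html>';

# Build the parse tree from a string in one step.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Each node of the tree is an HTML::Element object.
# In scalar context, look_down returns the first matching element.
my $title      = $tree->look_down(_tag => 'title');
my $title_text = $title->as_text;

# In list context, look_down returns every matching element.
my @links = map { $_->attr('href') } $tree->look_down(_tag => 'a');

print "Title: $title_text\n";
print "Link: $_\n" for @links;

# Elements hold links to their parents, so release the tree when done.
$tree->delete;
```

The explicit delete at the end is the conservative choice: it is only strictly needed when the tree is not using weak references, but calling it is harmless either way.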
The tree that TreeBuilder builds for you is made up of objects of the class
HTML::Element.
If you find that you do not properly understand the documentation for
HTML::TreeBuilder and HTML::Element, it may be because you are unfamiliar
with tree-shaped data structures, or with object-oriented modules in
general. Sean Burke has written some articles for _The Perl Journal_
('www.tpj.com') that seek to provide that background. The full text of
those articles is contained in this distribution, as:
"User's View of Object-Oriented Modules" from TPJ17
"Trees" from TPJ18
"Scanning HTML" from TPJ19
Readers already familiar with object-oriented modules and tree-shaped data
structures should read just the last article. Readers without that
background should read the first, then the second, and then the third.
new: redirects to HTML::TreeBuilder::new
new_from_file: redirects to HTML::TreeBuilder::new_from_file
new_from_content: redirects to HTML::TreeBuilder::new_from_content
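The three HTML::TreeBuilder constructors that these redirects point to can be compared side by side. The following is a sketch only: the sample markup and the temporary file are assumptions made for the example, and it calls HTML::TreeBuilder directly rather than going through the redirects.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Sample markup, invented for this sketch.
my $html = '<p>Hello, <b>world</b></p>';

# 1. new: create an empty builder, feed it text, then signal end of input.
my $t1 = HTML::TreeBuilder->new;
$t1->parse($html);
$t1->eof;

# 2. new_from_content: build the tree from a string in one call.
my $t2 = HTML::TreeBuilder->new_from_content($html);

# 3. new_from_file: build the tree from a file on disk.
#    A temporary file stands in for a real document here.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} $html;
close $fh or die "close failed: $!";
my $t3 = HTML::TreeBuilder->new_from_file($path);

# All three trees should carry the same text content.
my @texts = map { $_->as_text } ($t1, $t2, $t3);
print "$_\n" for @texts;

# Release the trees when done.
$_->delete for ($t1, $t2, $t3);
```

Plain new is the one to reach for when the input arrives in pieces, since parse can be called repeatedly before eof; the other two are conveniences for input that is already complete.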