File perl-HTML-SimpleParse.spec of Package perl-HTML-SimpleParse

#
# spec file for package perl-HTML-SimpleParse
#
# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#

Name:           perl-HTML-SimpleParse
Version:        0.12
Release:        0
%define cpan_name HTML-SimpleParse
Summary:        Bare-bones HTML parser
License:        Artistic-1.0 OR GPL-1.0-or-later
Group:          Development/Libraries/Perl
Url:            https://metacpan.org/release/%{cpan_name}
Source0:        https://cpan.metacpan.org/authors/id/K/KW/KWILLIAMS/%{cpan_name}-%{version}.tar.gz
BuildArch:      noarch
BuildRoot:      %{_tmppath}/%{name}-%{version}-build
BuildRequires:  perl
BuildRequires:  perl-macros
BuildRequires:  perl(Module::Build)
%{perl_requires}

%description
This module is a simple HTML parser. It is similar in concept to
HTML::Parser, but it differs from HTML::TreeBuilder in a couple of
important ways.

First, HTML::TreeBuilder knows which tags can contain other tags, which
start tags have corresponding end tags, which tags can exist only in the
<HEAD> portion of the document, and so forth. HTML::SimpleParse does not
know any of these things. It just finds tags and text in the HTML you give
it; it does not care about the specific content of these tags (though it
does distinguish between different _types_ of tags, such as comments,
starting tags like <b>, ending tags like </b>, and so on).

Second, HTML::SimpleParse does not create a hierarchical tree of HTML
content, but rather a simple linear list. It does not pay any attention to
balancing start tags with corresponding end tags, or which pairs of tags
are inside other pairs of tags.
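
As a rough sketch of what that flat list looks like, the following walks
the parsed nodes in document order. It assumes the 'tree()' method returns
a list of hashes with 'type' and 'content' keys, as described in the
module's documentation; the $html value is only an illustration:

 use HTML::SimpleParse;
 my $html = '<p>Hello <b>world</b></p>';
 my $p = HTML::SimpleParse->new($html);
 # One entry per tag or text run, in document order, with no nesting.
 for my $node ($p->tree) {
   print "$node->{type}: $node->{content}\n";
 }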

Because of these characteristics, you can make a very effective HTML filter
by sub-classing HTML::SimpleParse. For example, to remove all comments from
HTML:

 package NoComment;
 use HTML::SimpleParse;
 @ISA = qw(HTML::SimpleParse);
 sub output_comment {}
 
 package main;
 NoComment->new($some_html)->output;

Historically, I started the HTML::SimpleParse project in part because of a
misunderstanding about HTML::Parser's functionality. Many aspects of these
two modules actually overlap. I continue to maintain the HTML::SimpleParse
module because people seem to be depending on it, and because beginners
sometimes find HTML::SimpleParse to be simpler than HTML::Parser's more
powerful interface. People also seem to get a fair amount of usage out of
the 'parse_args()' method directly.

%prep
%setup -q -n %{cpan_name}-%{version}

%build
perl Build.PL installdirs=vendor
./Build build flags=%{?_smp_mflags}

%check
./Build test

%install
./Build install destdir=%{buildroot} create_packlist=0
%perl_gen_filelist

%files -f %{name}.files
%defattr(-,root,root,755)
%doc Changes README

%changelog
