# File linkchecker.spec of Package linkchecker
#
# spec file for package linkchecker (Version 4.9)
#
# Copyright (c) 2008 SUSE LINUX Products GmbH, Nuernberg, Germany.
# This file and all modifications and additions to the pristine
# package are under the same license as the package itself.
#
# Please submit bugfixes or comments via http://bugs.opensuse.org/
#
# norootforbuild
Name: linkchecker
%define _prefix /usr
BuildRequires: python-devel python-xml
Requires: tidy
Url: http://linkchecker.sourceforge.net
License: GPL v2 or later
Group: Productivity/Networking/Web/Utilities
Summary: Check Websites and HTML Documents for Broken Links
Version: 4.9
Release: 1
Source0: %{name}-%{version}.tar.bz2
BuildRoot: %{_tmppath}/%{name}-%{version}-build
Prefix: %{_prefix}
%description
LinkChecker checks websites and HTML documents for broken links.
Features are:
* recursive checking
* multithreaded
* output in colored or normal text, HTML, SQL, CSV, XML or a sitemap
graph in different formats
* HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Gopher, Telnet and
local file links support
* restriction of link checking with regular expression filters for
URLs
* proxy support
* username/password authorization for HTTP and FTP
* robots.txt exclusion protocol support
* i18n support
* a command line interface
* a (Fast)CGI web interface (requires HTTP server)
Authors:
--------
Bastian Kleineidam <calvin@users.sourceforge.net>
%prep
%setup -q
cp -a doc/examples .
# Replace hard-coded python2.4 references with python2.5.
for file in $(grep -rl 'python2\.4' .) ; do
sed -i -e 's@python2\.4@python2.5@g' "$file"
done
%build
env CFLAGS="%{optflags} -fno-strict-aliasing" python setup.py build
%install
python setup.py install --root=%{buildroot} --prefix=%{_prefix} --record-rpm=INSTALLED_FILES.in
# 'brp-compress' compresses the manpages without distutils knowing.
# So append ".gz" suffixes to the affected manpage filenames.
sed -i -e 's@/usr/share/man/man\([[:digit:]]\)/\(.\+\.[[:digit:]]\)$@%doc /usr/share/man/man\1/\2.gz@g' INSTALLED_FILES.in
sed -i -e 's@/usr/share/man/de/man\([[:digit:]]\)/\(.\+\.[[:digit:]]\)$@%doc /usr/share/man/de/man\1/\2.gz@g' INSTALLED_FILES.in
sed -i -e 's@/usr/share/man/fr/man\([[:digit:]]\)/\(.\+\.[[:digit:]]\)$@%doc /usr/share/man/fr/man\1/\2.gz@g' INSTALLED_FILES.in
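# Illustration of the rewrites above (the file name is an assumed example,
# not taken from the actual file list). A recorded entry such as
#   /usr/share/man/man1/linkchecker.1
# should end up in INSTALLED_FILES.in as
#   %%doc /usr/share/man/man1/linkchecker.1.gz
# Entries under man/de and man/fr are handled the same way by the second
# and third expressions; all other lines are left untouched.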
rm -rf examples
mv %{buildroot}%{_datadir}/%{name}/examples ./
grep -F -v /usr/share/linkchecker/examples INSTALLED_FILES.in > INSTALLED_FILES
rm -rf doc/examples
#%%clean
#rm -rf %%{buildroot}
%files -f INSTALLED_FILES
%defattr(-,root,root)
%doc examples TODO.txt doc/ cgi-bin/lconline/ README.txt
%changelog
* Tue Apr 29 2008 pth@suse.de
- Update to 4.9:
* Parse Shockwave Flash (SWF) for URLs to check
* Don't parse <script for=""> attributes since they specify IDs,
not URLs.
* Fix bash filename completion script:
- add missing COMPREPLY variable
- support whitespace in files using "-o filenames" bash completion
option
- support subdirs by adding a FileCompleter argument matcher to
optcomplete.autocomplete()
* Prevent unicode errors when an email address contains non-ascii
characters.
* Workaround for buggy servers that break protocol synchronization of
persistent HTTP connections.
* Properly fall back to DNS A requests when no MX host could be found
for a mailto: URL.
* Double Ctrl-C aborts checking immediately, without cleanup.
* Internal patterns now accept URLs with and without "www." prefixes
by default. This allows checking sites that use both variants.
* Added --check-html and --check-css options to enable HTML and CSS
syntax checking. Uses third-party modules "tidy" and "cssutils"
for the actual check.
* Mon Jan 07 2008 pth@suse.de
- Update to 4.8. Changes since 4.5:
* Fixed default config file syntax by not indenting comment lines
* Don't set the URL result on redirections when getting the content.
* Ignore errors when opening the log file output, and display a warning
instead.
* Added some more examples.
* Pull in changes from Python subversion repository to locally stored
gzip and httplib modules.
* Mention in the documentation that --anchors enables logging of
the anchor warning.
* Make sure --anchors and --no-warnings play along in the configuration.
* Check that charset is not None before lowering it in set_encoding().
* Use standard "utf-8" charset name instead of "utf8" for the XML output
encoding.
* Added "created" attribute in XML output root element.
Added "result" attribute in XML output valid element.
* Fix printing of unicode names. Thanks to Frank Bennet for the hint.
* Deprecate gopher: URLs. They do not really exist anymore and the
gopherlib module in Python 2.5 is deprecated and will vanish soon.
* Fix message typo for not disclosing information.
* Always read the request body data on persistent HTTP connections, else
subsequent calls will get data from the previous request.
* Zope server workaround: assume missing HEAD support when receiving
text/plain on a HEAD request. Switch to GET request in this case.
* Prevent double encoding in HTML info output.
* Honor urllib.proxy_bypass() when ignoring proxy settings.
This only affected Windows systems, since on other platforms
the proxy_bypass() function always returns False (on Python <= 2.5
that is).
* Document the --configfile option in the man page.
* Remove comments from CSS content before searching for links.
* Try to detect unknown URL schemes from the command line, e.g. URLs
like "rtsp://foo".
* Fix typo in warnings and use constants for the warning strings
to avoid this in the future.
* Make sure LinkChecker does not check paths that are not prefixed
with the start URL.
* Try to solve the "Too many open files" errors that users have
encountered.
+ Ensure that the connections of a checked URL are closed after checking
(except for reused connections in the connection pool).
+ Regularly close expired connections from the connection pool, and
finally close all of them when the program is finished.
Closes: SF #1758338, SF #1678055, SF #1631042
* Add man page linkcheckerrc(5) for the configuration file format.
* Drop French translations; they have been less than 20%% complete for
years now.
* Correct misnamed columns in create.sql script: r/*string/\1/g
* Improved cookie parsing:
+ Allow spaces in attribute values. Example:
"Set-Cookie: expires=Wed, 12-Dec-2001 19:27:57 GMT"
is now parsed correctly
+ Add an optional leading dot for domain names, and account for that
in the domain checking routine.
* Don't print cached errors or warnings unless verbose output is
requested.
* Mon Apr 02 2007 ro@suse.de
- added non-English manpage directories to filelist
* Wed Oct 11 2006 ro@suse.de
- update to 4.5
- Don't ignore robots.txt entries consisting only of Allow: directives
- Don't rely on HTTP HEAD requests to generate the same response status
as HTTP GET. So we have to follow redirections when using HTTP GET to
get page contents.
- Document proxy URL syntax
- Print active URLs on Ctrl-C interrupt
- Replace all old "entry1, entry2" configuration entries with
multiline "entry" config entry. The old syntax is still supported,
but deprecated.
- If LinkChecker was not able to spawn the initial checker and status
threads, print an informative error instead of an internal error.
- update to 4.4
- The JavaScript URL syntax check now allows digits and underscores
- Add "internlinks" documentation and example to the default config
file linkcheckerrc.
- Detect more cases when an HTTP connection cannot be reused and
must be closed, and close response objects after use
- Only wait before a new connection to a host, not when reusing
a previous connection.
- Add more information to various HTTP errors. Don't close connection when
the response object is still open.
- Ignore keyboard interrupts during shutdown
- Removed old Psyco references from man page and documentation
- fix dependencies for old python2.4
* Wed Sep 13 2006 pth@suse.de
- Pass --record-rpm instead of --record to setup.py to get
everything needed recorded.
- Compile with -fno-strict-aliasing
* Sat Sep 02 2006 pth@suse.de
- Fix file list.
* Thu Aug 31 2006 pth@suse.de
- Initial package creation at version 4.3