Removes HTML Constructs from Documents
Dehtml removes HTML constructs from documents for indexing, spell checking, and so on. My own implementation is a little smarter than the other implementations I have seen, because it knows about certain tags and expands entities to Latin-1 characters. It can generate a word list for spell-checking tools and omit headers for sentence-analysis tools.
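The general idea can be sketched in a few lines of Python. This is not dehtml's code or its command-line interface, only an illustration of tag-aware stripping with entity expansion and a word list. The names `TextExtractor`, `strip_html`, `word_list`, and the `BLOCK` and `SKIP` tag sets are assumptions made up for the example. It relies on the standard `html.parser` module and decodes entities to Unicode rather than Latin-1.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text, drops <script>/<style> bodies, breaks lines at block tags."""

    # Hypothetical tag sets chosen for this sketch, not taken from dehtml.
    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "td",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True expands entities such as &eacute; in text data.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # keep words in adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_html(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def word_list(text):
    """Return the sorted set of alphabetic words, e.g. for a spell checker."""
    return sorted(set(re.findall(r"[^\W\d_]+", text)))


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>caf&eacute; &amp; bar</p><script>var x = 1;</script>"
    for word in word_list(strip_html(sample)):
        print(word)
```

Inline tags such as `<b>` are dropped without adding whitespace, so a word that is only partly emphasized stays in one piece, while block-level tags separate their contents. Omitting headers is not covered by this sketch.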
Checkout Package
osc -A https://api.opensuse.org checkout home:zhonghuaren/dehtml && cd $_
Source Files
Filename | Size |
---|---|
dehtml-1.8.tar.gz | 93.4 KB |
dehtml-add_destdir_and_remove_strip.patch | 1.29 KB |
dehtml-fix_missing_return_in_nonvoid_function.patc | 271 Bytes |
dehtml.spec | 1.5 KB |
Latest Revision
Huaren Zhong (zhonghuaren) committed (revision 8)