Removes HTML Constructs from Documents

Dehtml removes HTML constructs from documents so that they can be indexed, spell checked, and so on. My own implementation is a little smarter than the others I have seen, because it knows about certain tags and expands entities to Latin-1 characters. It can generate a word list for spell-checking tools and omit headers for sentence-analysis tools.
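
To illustrate the general technique described above (dropping tags, expanding entities, and producing a word list), here is a minimal Python sketch. It is not dehtml's actual code and makes no claim about how dehtml works internally; the names TextExtractor, strip_html, and word_list are hypothetical, and skipping the contents of script and style elements is an assumption about what "knowing about certain tags" might involve.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content while discarding tags (hypothetical helper)."""

    # Assumption: the bodies of these elements are never useful as text.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True expands entities such as &eacute; to characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(markup):
    """Return the text of an HTML fragment with tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def word_list(text):
    """Split plain text into words made of letters only."""
    return re.findall(r"[^\W\d_]+", text)


if __name__ == "__main__":
    text = strip_html("<p>Caf&eacute; <b>au</b> lait</p>")
    print(text)
    print(word_list(text))
    # Encoding to Latin-1 succeeds here because every character is in range.
    print(text.encode("latin-1"))
```

The word list could then be handed to a spell checker, one word per line.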

Source Files

Filename                                               Size
dehtml-1.8.tar.gz                                      93.4 KB
dehtml-add_destdir_and_remove_strip.patch              1.29 KB
dehtml-fix_missing_return_in_nonvoid_function.patch    271 Bytes
dehtml.spec                                            1.5 KB
