python3-html-text - Fedora Packages

How is html_text different from .xpath('//text()') from lxml or .get_text() from Beautiful Soup?
- Text extracted with html_text does not contain inline styles, JavaScript, comments, or other text that is not normally visible to users.
- html_text normalizes whitespace, but more carefully than .xpath('normalize-space()'): it adds spaces around inline elements (which are often used as block elements in HTML markup) and tries to avoid adding extra spaces before punctuation.
- html_text can add newlines (e.g. after headers or paragraphs), so that the output text looks more like how it is rendered in browsers.
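
The points above can be illustrated with a small standard-library sketch. This is not html_text's implementation and does not use its API; the names `VisibleTextParser`, `visible_text`, `HIDDEN_TAGS`, and `BLOCK_TAGS` are hypothetical, and the tag sets are assumptions chosen for illustration. It only shows the general idea of dropping invisible content, collapsing whitespace, and breaking lines after block elements. It does not attempt the spacing heuristics for inline elements and punctuation.

```python
import re
from html.parser import HTMLParser

# Assumed, illustrative tag sets; html_text's real rules are more detailed.
HIDDEN_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class VisibleTextParser(HTMLParser):
    """Collects text a user would normally see, one line per block element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._hidden_depth = 0  # > 0 while inside a hidden element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            # Line break after a block element, similar to browser rendering.
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._hidden_depth == 0:
            # Collapse whitespace runs (including source newlines) to one space.
            self._chunks.append(re.sub(r"\s+", " ", data))

    # HTML comments are dropped: the default handle_comment does nothing.

    def text(self):
        lines = "".join(self._chunks).split("\n")
        cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
        return "\n".join(line for line in cleaned if line)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<h1>Title</h1><p>Hello,   <b>world</b>!</p>"
    "<script>var x = 1;</script><!-- note -->"
    "<style>p {color: red}</style><p>Second\n line</p>"
)
print(visible_text(sample))
```

With the package itself installed, the equivalent call is `html_text.extract_text(html)`, which applies the library's own, more complete rules.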

Releases Overview

Release	Stable	Testing
Fedora Rawhide	0.6.2-8.fc45	-
Fedora 44	0.6.2-7.fc44	-
Fedora 43	0.6.2-6.fc43	0.6.2-6.fc43


Package Info

Upstream: https://github.com/zytedata/html-text
License(s): MIT
Maintainer: fed500

You can contact the maintainers of this package via email at python-html-text dash maintainers at fedoraproject dot org.

python3-html-text Subpackage of python-html-text
