lxml
Powerful and Pythonic XML processing library combining libxml2/libxslt with the ElementTree API.
This package has a good security score with no known vulnerabilities.
Community Reviews
Powerful XML library with critical security considerations for untrusted input
The security story requires careful attention. XML External Entity (XXE) attacks are a constant concern: when handling untrusted input, you must explicitly configure XMLParser with resolve_entities=False and no_network=True. The defaults have improved over time, but the library is still not fully secure by default. Error messages can leak filesystem paths during parsing failures, which I've had to sanitize in production logs. The C extension dependency means you're trusting the libxml2/libxslt supply chain, and building from source requires development headers.
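For readers who want to see what that defensive configuration might look like, here is a minimal sketch. The helper names `make_hardened_parser` and `parse_untrusted` are hypothetical, and the extra options beyond `resolve_entities` and `no_network` (`load_dtd`, `dtd_validation`, `huge_tree`) are an assumption about what a cautious setup would also pin down, not something the review prescribes.

```python
from lxml import etree


def make_hardened_parser():
    """Hypothetical helper: build a parser for untrusted XML."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # no network access for related files
        load_dtd=False,          # do not load an external DTD
        dtd_validation=False,    # do not validate against a DTD
        huge_tree=False,         # keep libxml2's built-in size/depth limits
    )


def parse_untrusted(data: bytes):
    """Hypothetical helper: parse bytes with a fresh hardened parser."""
    return etree.fromstring(data, parser=make_hardened_parser())


root = parse_untrusted(b"<order><item id='1'>widget</item></order>")
print(root.tag, root[0].get("id"))
```

Passing the parser explicitly on every call, rather than relying on module defaults, keeps the hardening visible at each place untrusted input enters.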
Documentation is comprehensive with good security sections, but the onus is on developers to read and apply them. Built-in input validation helpers, such as defusedxml integration, would be welcome. For trusted internal XML processing, it's excellent. For user-facing parsers, you need defensive configuration and thorough testing.
Best for: High-performance XML processing in trusted environments or when you need full XSLT/XPath capabilities with careful security hardening.
Avoid if: You're parsing untrusted XML without deep understanding of XXE mitigations, or need pure-Python portability without compilation requirements.
Powerful XML/HTML processing with excellent API design and solid docs
The documentation is comprehensive with practical examples that you can actually copy and adapt. The tutorial walks you through common patterns step-by-step, and the API reference is detailed without being overwhelming. Error messages are generally helpful, though encoding issues can sometimes produce cryptic libxml2 errors that require digging.
Debugging is straightforward thanks to clean stack traces and the ability to pretty-print XML at any point. The cssselect integration for HTML parsing is a killer feature that makes web scraping much more pleasant than dealing with raw XPath. Community support is solid: most common issues have Stack Overflow answers, and the mailing list archives are searchable when you hit edge cases.
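As a rough illustration of the pretty-printing point, the sketch below serializes a small tree with indentation. The sample document is invented for the example.

```python
from lxml import etree

# Invented sample document, parsed from a compact single-line string.
root = etree.fromstring(b"<feed><entry><title>Hello</title></entry></feed>")

# pretty_print=True indents nested elements; encoding="unicode" returns str
# instead of bytes, which is convenient for printing or logging.
pretty = etree.tostring(root, pretty_print=True, encoding="unicode")
print(pretty)
```

The same call works on any subtree, so it can be dropped in wherever the structure needs a quick look during debugging.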
Best for: Projects requiring robust XML/HTML parsing, web scraping, or XSLT transformations with good performance.
Avoid if: You need pure Python with zero compiled dependencies or only process simple JSON-like XML structures.
Rock-solid XML processing with excellent XPath support and clear APIs
Error messages are generally helpful, pointing to line numbers in malformed XML and providing context. The documentation is comprehensive with a tutorial that covers 80% of common use cases. I particularly appreciate the detailed examples for XPath expressions and CSS selectors via `cssselect`. Debugging is straightforward since you can inspect the tree structure intuitively.
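To make the XPath and error-reporting points concrete, here is a small sketch. The catalog document and the variable names are made up for illustration; only `etree.fromstring`, `xpath()`, and `etree.XMLSyntaxError` come from lxml itself.

```python
from lxml import etree

# Invented sample data for the XPath queries below.
doc = etree.fromstring(
    b"<catalog>"
    b"<book id='b1'><title>XML Basics</title><price>20</price></book>"
    b"<book id='b2'><title>Advanced XPath</title><price>35</price></book>"
    b"</catalog>"
)

# Titles of books whose price is above 25 (XPath compares numerically).
titles = doc.xpath("//book[price > 25]/title/text()")

# All id attributes, in document order.
ids = doc.xpath("//book/@id")
print(titles, ids)

# Malformed input: <b> is never closed before </a>.
error_line = None
try:
    etree.fromstring(b"<a>\n  <b>\n</a>")
except etree.XMLSyntaxError as exc:
    # XMLSyntaxError derives from SyntaxError, so it carries a line number.
    error_line = exc.lineno
    print(f"parse failed near line {error_line}: {exc}")
```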
The main learning curve is installation-related (C dependencies) and understanding when to use lxml vs standard library xml.etree. Performance is excellent even with large documents. Stack Overflow has extensive coverage of common issues, and the official docs answer most questions before you need to search elsewhere.
Best for: Projects requiring robust XML/HTML parsing, XPath queries, XSLT transformations, or processing large XML documents with good performance.
Avoid if: You need pure Python with zero C dependencies or only have trivial XML parsing needs covered by stdlib.