A high-performance, drop-in replacement for BeautifulSoup written in Rust, exposed to Python via PyO3. 450 tests passing. Full BS4 API compatibility.
BeautifulSoup is the most widely used HTML parsing library in Python. It has an elegant, readable API and handles malformed real-world HTML gracefully. But its performance is fundamentally limited by how it was built.
Every node in a BeautifulSoup tree is a full Python object. On a typical page with 5,000 nodes, that is 5,000 heap allocations, 5,000 reference-counted objects the GIL must protect, and roughly 2.5 GB of memory per 1,000 concurrent documents. Parsing is entirely single-threaded and Python-bound. CSS selector evaluation re-parses the selector string on every call.
Each Tag in turn holds several more Python objects, such as its attribute dict and its child list.
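The memory figure above can be checked with a quick back-of-the-envelope calculation. This is only a sketch that plugs in the approximate per-node costs quoted in this document (about 500 bytes per BeautifulSoup Tag, about 40 bytes per arena node); the constant and function names are illustrative.

```python
# Back-of-the-envelope memory estimate using the figures quoted in this document.
NODES_PER_PAGE = 5_000        # "typical page" from the text
BYTES_PER_BS4_TAG = 500       # approximate per-Tag cost quoted in the text
BYTES_PER_ARENA_NODE = 40     # approximate per-node cost quoted in the text
CONCURRENT_DOCS = 1_000


def total_bytes(bytes_per_node: int) -> int:
    """Total node memory for all concurrent documents."""
    return NODES_PER_PAGE * bytes_per_node * CONCURRENT_DOCS


bs4_gb = total_bytes(BYTES_PER_BS4_TAG) / 1e9
arena_gb = total_bytes(BYTES_PER_ARENA_NODE) / 1e9
print(f"BeautifulSoup: {bs4_gb:.1f} GB, arena: {arena_gb:.1f} GB")
# BeautifulSoup: 2.5 GB, arena: 0.2 GB
```

The estimate counts node storage only; text content and interpreter overhead are not included.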
For scripts that scrape a few pages this is fine. For production pipelines that process tens of thousands of documents per second, it becomes the binding constraint — requiring more machines, more RAM, and more engineering to work around.
Teams hitting BeautifulSoup's limits typically report three things:
The usual escape routes (switching to lxml, writing custom C extensions, or moving to a different language) all require giving up the BeautifulSoup API that the rest of the codebase depends on.
WhiskeySour replaces BeautifulSoup's Python internals with a Rust library while keeping the API identical. Existing code needs no changes beyond the import line.
# Before
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
# After: everything else stays the same
from whiskeysour import WhiskeySour as BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
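For a gradual rollout it can be convenient to prefer WhiskeySour where it is installed and fall back to bs4 elsewhere. The helper below is a hypothetical sketch, not part of either library; `first_available` and `PARSER_CANDIDATES` are names invented for illustration, and the `whiskeysour` import path is taken from the snippet above.

```python
import importlib


def first_available(candidates):
    """Return the first importable (module, attribute) pair as an object."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # not installed, try the next candidate
        return getattr(module, attr)
    raise ImportError("none of the candidate parsers is installed")


# Prefer WhiskeySour, fall back to bs4 when it is not installed.
PARSER_CANDIDATES = [
    ("whiskeysour", "WhiskeySour"),
    ("bs4", "BeautifulSoup"),
]

# Usage (left commented out so the sketch runs without either package):
# BeautifulSoup = first_available(PARSER_CANDIDATES)
# soup = BeautifulSoup(html, "html.parser")
```

Because both classes are bound to the same local name, the rest of the code does not need to know which implementation was picked.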
The library is built on five technical pillars, each targeting a specific weakness in BeautifulSoup's design.
html5ever, Servo's battle-tested, spec-compliant HTML5 parser written in Rust, parses in a single pass without any Python overhead.
All nodes live in a single pre-allocated slab. ~40 bytes per node versus ~500 bytes per BeautifulSoup Tag.
Selectors are compiled to a deterministic finite automaton once and cached. Repeated queries cost only a single DFA traversal.
The GIL is released during all Rust operations. find_all on large trees runs across all CPU cores simultaneously.
Text scanning uses CPU vector instructions (SSE2 / AVX2 / NEON). 16–32 bytes are inspected per instruction.
Zero-copy bridge between Rust and Python. Rust objects are exposed as native Python types with no serialisation overhead.
All numbers below are medians over 100–200 rounds on Apple Silicon (M-series). Documents are synthetic but structurally representative of real scraping targets. Measured with a dev build (maturin develop); release builds are typically 2–3× faster.
| Operation | WhiskeySour | BeautifulSoup 4 | Speedup |
|---|---|---|---|
| parse() | 4.08 ms | 42.87 ms | 11× |
| find(id=…) | 0.21 ms | 2.21 ms | 11× |
| find_all(class_=…) | 0.62 ms | 4.41 ms | 7× |
| select("div.item") | 0.64 ms | 8.92 ms | 14× |
| get_text() | 0.17 ms | 0.68 ms | 4× |
| str() — full serialise | 0.43 ms | 21.58 ms | 50× |
| tag.get("class") | 0.29 µs | 7.0 µs | 24× |
Serialisation shows the largest gain because the output is written into a Rust String buffer with no Python involvement.
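The table reports medians over repeated rounds. A minimal harness in that style might look like the sketch below; `median_ms` and `speedup` are hypothetical helper names, and this is not the project's actual benchmark script.

```python
import statistics
import time


def median_ms(func, rounds=100):
    """Run func() `rounds` times and return the median wall time in milliseconds."""
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Checking one row of the table: parse() at 42.87 ms versus 4.08 ms.
print(round(speedup(42.87, 4.08)))  # 11
```

The median is less sensitive than the mean to occasional slow rounds caused by garbage collection or scheduler noise.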
BeautifulSoup delegates parsing to a pluggable backend. The default backend, Python's built-in html.parser, is a pure-Python tokeniser that calls back into Python for every token. Even when using lxml as the backend, BeautifulSoup still converts the resulting tree into Python objects node-by-node.
WhiskeySour uses html5ever, the HTML parser from the Servo browser engine. It implements the full HTML5 parsing specification and is written entirely in Rust. The parser feeds tokens directly into WhiskeySour's arena-allocated node tree without ever creating a Python object. The tree is built once, in Rust, and Python only receives a handle to the root.
html5ever also handles malformed HTML correctly by following the HTML5 error-recovery specification. This means WhiskeySour handles unclosed tags, mismatched nesting, and invalid attributes the same way every major browser does — not in an ad-hoc way.
# html5ever produces the same tree a browser would build:
soup = WhiskeySour("<p>text<b>bold</p>")
# → <html><body><p>text<b>bold</b></p></body></html>
# Correct per HTML5 spec — implicit </b> before </p>
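To see why a tree builder needs explicit recovery rules, it helps to look at what a bare tokeniser reports for the same input. The sketch below uses only Python's standard-library `html.parser`; `EventLogger` is a name chosen for illustration. The tokeniser never reports an end tag for `<b>`, so deciding where that element closes is left entirely to whatever builds the tree.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record tokeniser events without building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


logger = EventLogger()
logger.feed("<p>text<b>bold</p>")
logger.close()
print(logger.events)
# [('start', 'p'), ('data', 'text'), ('start', 'b'), ('data', 'bold'), ('end', 'p')]
```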
In BeautifulSoup, every Tag is a Python object that holds a dict of attributes, a list of children, a reference to its parent, and several other fields. Python's object header alone is 16–24 bytes, and the surrounding data structures add several hundred more. A typical node costs around 500 bytes.
WhiskeySour stores every node in a compact Rust struct inside a single pre-allocated arena, a contiguous block of memory. A node stores its tag name as an interned 32-bit ID, its attributes as a flat SmallVec (inline for up to 8 attributes, heap-allocated only when needed), and its tree position as integer indices. A typical node costs around 40 bytes.
| Field | BeautifulSoup Tag | WhiskeySour Node |
|---|---|---|
| Object header | 24 bytes | 0 (Rust struct) |
| Tag name | ~50 bytes (str) | 4 bytes (interned u32) |
| Attributes | ~240 bytes (dict) | ~24 bytes inline SmallVec |
| Children | ~56 bytes (list) | 4 bytes (index) |
| Parent ref | 8 bytes (pointer) | 4 bytes (index) |
| Total (typical) | ~500 bytes | ~40 bytes |
Beyond the per-node size, arena allocation has a second benefit: cache locality. When find_all walks the tree, all nodes are adjacent in memory. The CPU prefetcher can predict and load the next nodes before they are needed. BeautifulSoup's scattered heap objects cause frequent cache misses.
Freeing the entire document is also O(1): the arena is dropped as a single allocation, rather than recursively garbage-collecting thousands of Python objects.
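The index-based layout described above can be illustrated with a toy arena in Python. This is a sketch of the general technique only, not WhiskeySour's actual Rust struct: the field names, the `NO_NODE` sentinel, and the use of parallel `array` columns are all assumptions made for the example.

```python
from array import array

NO_NODE = 0xFFFFFFFF  # sentinel index meaning "no such node"


class Arena:
    """Toy arena: each node is one row across parallel arrays of unsigned ints."""

    def __init__(self):
        self.names = {}        # tag name -> interned id
        self.name_list = []    # interned id -> tag name
        self.tag_id = array("I")        # "I" is typically a 32-bit unsigned int
        self.parent = array("I")
        self.first_child = array("I")
        self.last_child = array("I")
        self.next_sibling = array("I")

    def intern(self, name):
        """Store each distinct tag name once and hand out a small integer id."""
        if name not in self.names:
            self.names[name] = len(self.name_list)
            self.name_list.append(name)
        return self.names[name]

    def add(self, name, parent=NO_NODE):
        """Append a node and link it as the last child of `parent`."""
        idx = len(self.tag_id)
        self.tag_id.append(self.intern(name))
        self.parent.append(parent)
        self.first_child.append(NO_NODE)
        self.last_child.append(NO_NODE)
        self.next_sibling.append(NO_NODE)
        if parent != NO_NODE:
            if self.first_child[parent] == NO_NODE:
                self.first_child[parent] = idx
            else:
                self.next_sibling[self.last_child[parent]] = idx
            self.last_child[parent] = idx
        return idx

    def children(self, idx):
        """Walk the sibling chain using integer indices only."""
        child = self.first_child[idx]
        while child != NO_NODE:
            yield child
            child = self.next_sibling[child]


arena = Arena()
body = arena.add("body")
for _ in range(3):
    arena.add("p", parent=body)
print([arena.name_list[arena.tag_id[c]] for c in arena.children(body)])
# ['p', 'p', 'p']
```

Dropping the `arena` object releases every column at once, which mirrors the single-allocation teardown described above.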
BeautifulSoup's CSS selector support comes from the soupsieve library. Every call to soup.select("div.item > a") tokenises and parses the selector string, builds an AST, then walks the Python node tree evaluating each selector predicate in Python.
WhiskeySour uses Mozilla's cssparser crate (the same CSS parser used in Firefox) to parse each selector, then compiles it once into a deterministic finite automaton. Once compiled, matching a node against a selector is a state-machine lookup: no string parsing, no AST traversal, no heap allocation. Compiled selectors are cached in an LRU cache keyed by the selector string, so the compilation cost is paid at most once per unique selector for as long as it stays in the cache.
# First call: selector is parsed, compiled to DFA, cached
results = soup.select("div.card > h3.title + p")
# All subsequent calls: pure DFA lookup, no re-parsing
results = soup.select("div.card > h3.title + p") # ~17× faster than bs4
When select() runs on many documents with the same selector (e.g. a scraper extracting prices from product pages), the speedup compounds with every document. The selector is compiled once in the first call and reused for every document thereafter.
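The compile-once, cache-by-string behaviour can be sketched in a few lines of Python. This example covers only the caching side: the toy `compile_selector` handles bare `tag.class` selectors and returns a closure rather than a DFA, and all names in it are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def compile_selector(selector: str):
    """Compile a toy 'tag.class' selector into a matcher (once per distinct string)."""
    tag, _, cls = selector.partition(".")

    def matches(node: dict) -> bool:
        if tag and node["name"] != tag:
            return False
        return not cls or cls in node.get("class", ())

    return matches


def select(nodes, selector):
    matcher = compile_selector(selector)  # cache hit after the first call
    return [n for n in nodes if matcher(n)]


doc = [
    {"name": "div", "class": ["item"]},
    {"name": "div", "class": ["other"]},
    {"name": "p", "class": ["item"]},
]
for _ in range(3):
    hits = select(doc, "div.item")

info = compile_selector.cache_info()
print(len(hits), info.misses, info.hits)  # 1 1 2
```

Three calls with the same string produce one compilation and two cache hits, which is the pattern a scraper gets when it applies one selector to many documents.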
WhiskeySour also exposes compiled selectors as first-class objects for cases where the caching is not sufficient:
# Explicit pre-compilation: zero overhead on every use
q = soup.compile("div.card > h3.title + p")
for document in documents:
    results = q.select(document)
Python's Global Interpreter Lock (GIL) prevents more than one thread from executing Python bytecode at a time. This makes CPU-bound Python code fundamentally single-threaded regardless of how many cores are available.
Because WhiskeySour's tree lives entirely in Rust, traversal and matching can happen outside the GIL. PyO3 provides the allow_threads mechanism to release the GIL for the duration of a Rust call. WhiskeySour releases the GIL before every tree operation and reacquires it only when constructing the Python result list.
For find_all on large trees, the work is split across all available CPU cores using Rayon, Rust's data-parallelism library. The tree is partitioned into subtrees, each core searches its partition independently, and the results are merged.
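The partition, search, and merge steps have the following general shape. This Python sketch uses `ThreadPoolExecutor` as a stand-in for Rayon and a nested-tuple tree as a stand-in for the arena; it shows the structure of the algorithm only, since Python threads running pure-Python code do not execute in parallel under the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


# A node is (tag, [children]); a toy stand-in for the real tree.
def find_all(node, tag):
    """Sequential depth-first search, returning matches in document order."""
    name, children = node
    found = [node] if name == tag else []
    for child in children:
        found.extend(find_all(child, tag))
    return found


def parallel_find_all(root, tag, workers=4):
    """Partition at the root's children, search each subtree, merge in order."""
    name, children = root
    found = [root] if name == tag else []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so document order is preserved.
        for partial in pool.map(lambda sub: find_all(sub, tag), children):
            found.extend(partial)
    return found


tree = ("body", [
    ("div", [("p", []), ("span", [("p", [])])]),
    ("p", []),
    ("div", [("p", [])]),
])
assert parallel_find_all(tree, "p") == find_all(tree, "p")
print(len(parallel_find_all(tree, "p")))  # 4
```

Merging the partial results in partition order is what keeps the output identical to a sequential search.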
# The GIL is released for the full duration of every tree operation
results = soup.find_all("article", class_="featured")
text = soup.get_text()
html = str(soup)
# Other Python threads (e.g. an async event loop) run freely
# while WhiskeySour is traversing or serialising the tree.
In an asyncio application, a find_all over a 500 KB document that is off-loaded to a worker thread (for example with asyncio.to_thread) does not stall the event loop: other coroutines execute while the Rust traversal runs.
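One way to get this behaviour in practice is to run the call on a worker thread, since a synchronous call made directly inside a coroutine occupies the event loop thread for its whole duration. The sketch below assumes Python 3.9+ for `asyncio.to_thread`, and `blocking_find_all` is a hypothetical stand-in that uses `time.sleep` (which also releases the GIL) instead of a real parser call.

```python
import asyncio
import time


def blocking_find_all():
    """Stand-in for a long GIL-releasing call; time.sleep also releases the GIL."""
    time.sleep(0.2)
    return ["<article>"]


async def heartbeat(ticks):
    """Another coroutine that should keep running during the search."""
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Off-loading to a worker thread keeps the event loop thread free.
    results = await asyncio.to_thread(blocking_find_all)
    beat.cancel()
    return results, len(ticks)


results, tick_count = asyncio.run(main())
print(results, tick_count > 1)
```

The heartbeat coroutine keeps ticking while the worker thread is busy, which would not happen if `blocking_find_all()` were called directly inside `main`.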
Many HTML operations reduce to searching a sequence of bytes for a specific character: the tokeniser looking for <, text extraction scanning for whitespace, attribute search walking a flat list. In Python, even simple loops have significant per-iteration overhead from bytecode dispatch.
WhiskeySour uses the memchr crate by Andrew Gallant, which provides byte-search routines backed by platform-specific SIMD instructions: SSE2 and AVX2 on x86/x64, NEON on ARM/Apple Silicon. Instead of checking one byte per loop iteration, these instructions compare 16 or 32 bytes at a time.
| Method | Bytes checked per step | Platform |
|---|---|---|
| Python loop | 1 | all |
| Scalar Rust | 1–4 | all |
| memchr SSE2 | 16 | x86/x64 |
| memchr AVX2 | 32 | modern x64 |
| memchr NEON | 16 | ARM / Apple Silicon |
The effect is most visible in get_text(): extracting all text from a large document requires scanning every text node for whitespace and newline characters. WhiskeySour's SIMD-backed implementation is consistently 4–5× faster than BeautifulSoup's Python equivalent.
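The gap between a per-byte interpreter loop and a native search routine is easy to observe from Python itself. In the sketch below, `bytes.find` (implemented in C inside CPython) serves as a stand-in for the memchr crate; whether it is actually vectorised depends on the interpreter build and C library, so the timings are only indicative. The function names are illustrative.

```python
import timeit


def find_byte_loop(haystack: bytes, needle: int) -> int:
    """One byte per iteration, each iteration paying interpreter dispatch overhead."""
    for i, b in enumerate(haystack):
        if b == needle:
            return i
    return -1


def find_byte_native(haystack: bytes, needle: int) -> int:
    """A single call into CPython's C implementation of bytes.find."""
    return haystack.find(bytes([needle]))


html = b"x" * 100_000 + b"<p>"
assert find_byte_loop(html, ord("<")) == find_byte_native(html, ord("<")) == 100_000

loop_s = timeit.timeit(lambda: find_byte_loop(html, ord("<")), number=20)
native_s = timeit.timeit(lambda: find_byte_native(html, ord("<")), number=20)
print(f"loop: {loop_s:.4f}s  native: {native_s:.4f}s")
```

Both functions return the same index; only the cost per byte differs.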
WhiskeySour passes a 450-test unit suite and 508 tests including integration coverage, all modelled directly on BeautifulSoup's documented behaviour. The shim layer in python/whiskeysour/__init__.py translates between the Rust node types and Python objects that satisfy every public BeautifulSoup API contract.
The compatibility strategy is deliberate: the Rust layer exposes only what it is fast at (tree storage, traversal, matching), and the Python shim handles the ergonomic API surface (property access, string formatting, lazy wrapping). No Python object is created until explicitly requested by the caller.
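The lazy-wrapping idea can be sketched as a thin handle that stores only a tree reference and an integer node id, created solely for nodes handed back to the caller. This is an illustration of the pattern under assumed names (`Tree`, `NodeHandle`, `find_ids`), not the contents of the actual shim.

```python
class Tree:
    """Stand-in for the Rust-side tree: plain lists indexed by node id."""

    def __init__(self, names, parents):
        self.names = names      # node id -> tag name
        self.parents = parents  # node id -> parent id, or -1 for the root

    def find_ids(self, name):
        # Traversal works on integers only; no wrapper objects are created here.
        return [i for i, n in enumerate(self.names) if n == name]


class NodeHandle:
    """Thin Python wrapper created only for nodes returned to the caller."""

    __slots__ = ("_tree", "_id")
    created = 0  # class-level counter, only for demonstration

    def __init__(self, tree, node_id):
        self._tree, self._id = tree, node_id
        NodeHandle.created += 1

    @property
    def name(self):
        return self._tree.names[self._id]

    @property
    def parent(self):
        # The parent wrapper is built on access, not when the child is created.
        pid = self._tree.parents[self._id]
        return NodeHandle(self._tree, pid) if pid >= 0 else None


tree = Tree(["html", "body", "p", "p", "div"], [-1, 0, 1, 1, 1])
handles = [NodeHandle(tree, i) for i in tree.find_ids("p")]
print(len(handles), NodeHandle.created)  # 2 2
print(handles[0].parent.name)            # body
```

Five nodes were scanned but only two wrappers were built for the result set; the parent wrapper appears only when `.parent` is read.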
| BeautifulSoup API | Status |
|---|---|
| find(), find_all(), find_one() | ✓ Full |
| select(), select_one() | ✓ Full (CSS3 + :has, :is, :where) |
| get_text(), .string, .strings | ✓ Full |
| Tree navigation: .parent, .children, .siblings | ✓ Full |
| Mutation: append(), prepend(), insert(), decompose(), replace_with() | ✓ Full |
| NavigableString, Comment, CData | ✓ Full: .name is None (identical to bs4) |
| Multi-valued attributes (class, rel, …) | ✓ Full |
| Encoding detection and encode() | ✓ Full |
| prettify(indent_width=) / prettify(indent=) | ✓ Both forms supported |
| decode_contents(), encode_contents() | ✓ Full |
| Streaming parser (StreamParser, parse_stream()) | ✓ Full |
| Compiled selectors (soup.compile()) | ✓ Full: CompiledSelector object |
Code that filters out text nodes with if child.name: works identically in WhiskeySour because NavigableString.name is None, just as in BeautifulSoup.
lxml and Selectolax are the two libraries most commonly cited as "fast alternatives to BeautifulSoup." Both are genuinely fast. But they solve a narrower problem, and each has architectural constraints that WhiskeySour does not share.
lxml wraps libxml2, a C library originally written for the GNOME project in 1999. Its parsing speed is excellent, and it supports both XPath and CSS selectors via the cssselect add-on. For many use cases it is the right choice.
The key limitation is that lxml is not HTML5-compliant. libxml2 has its own error-recovery heuristics that diverge from the HTML5 parsing specification in several hundred edge cases. This means lxml and a browser can produce different trees from the same malformed HTML — a real problem for scraping, where the HTML you receive is almost never well-formed. html5ever, which WhiskeySour uses, implements the exact same tree-construction algorithm as Chrome, Firefox, and Safari.
The second issue is that lxml's Python bindings create a Python wrapper object for every node on access. The underlying tree is compact C memory, but as soon as you call .cssselect() or iterate .getchildren(), Python objects are allocated for each result. WhiskeySour's Rust tree is accessed via integer indices; Python objects are only created for the final result set, not for every intermediate node touched during traversal.
lxml also has no parallel traversal. Its C internals are not thread-safe for concurrent reads, so the GIL cannot safely be released during tree operations. WhiskeySour's arena-allocated, immutable-during-search tree allows the GIL to be released for the full duration of any find_all or select call.
Finally, lxml's API is fundamentally different from BeautifulSoup's. Teams using lxml directly cannot simply swap it out — the tree navigation model (getparent(), getchildren(), XPath), the attribute access pattern, and the serialisation methods are all different. BeautifulSoup can use lxml as a backend, but then lxml's speed advantage mostly disappears because BeautifulSoup still converts the entire lxml tree into Python objects.
| Property | WhiskeySour | lxml |
|---|---|---|
| Parser core | html5ever (Rust, HTML5 spec) | libxml2 (C, own heuristics) |
| HTML5 compliant | Yes, matches browsers exactly | Partial, diverges on edge cases |
| Memory model | ~40 bytes/node, arena | C heap + Python wrappers on access |
| CSS selectors | Compiled DFA, LRU-cached | cssselect add-on, re-parsed each call |
| Parallel traversal | Yes (Rayon, GIL released) | No (C internals not thread-safe) |
| BS4-compatible API | Yes, drop-in replacement | No, different API entirely |
| Tree mutation | Full BS4 mutation API | Different API, limited via BS4 wrapper |
Selectolax is a Python library wrapping Lexbor, a C HTML parsing and CSS matching library. It is genuinely fast — parse times are comparable to lxml, and CSS selection is very quick. For pipelines that only need to extract nodes by CSS selector and read attribute values, it is an excellent tool.
The constraint is that Selectolax's API is intentionally minimal. It has no find(), no find_all(), no NavigableString, no get_text() with separator control, no tree mutation, no prettify(), and no attribute list handling (multi-valued class attributes are returned as raw strings). It is a selector engine with a thin Python wrapper, not a document manipulation library.
Selectolax also wraps a C library. This means it shares lxml's constraints around thread safety and GIL release: the underlying Lexbor tree is not designed for concurrent access, so parallel traversal is not possible. WhiskeySour's Rust ownership model makes the safety of concurrent reads statically verified at compile time.
Like lxml, Selectolax uses Lexbor's own HTML5-like parser rather than a fully spec-compliant implementation. The gap is smaller than libxml2 but still present for certain error-recovery cases.
| Property | WhiskeySour | Selectolax |
|---|---|---|
| Parser core | html5ever (Rust, HTML5 spec) | Lexbor (C, HTML5-like) |
| HTML5 compliant | Yes | Mostly, with some gaps |
| find() / find_all() | Yes | No |
| Tree navigation API | Full (.parent, .children, …) | Minimal (.parent, .next) |
| Tree mutation | Yes (append, insert, decompose) | No |
| NavigableString | Yes | No |
| get_text() control | separator, strip, types | .text() only, no options |
| Multi-valued attrs | Yes (class → list) | No (raw string only) |
| Parallel traversal | Yes | No |
| BS4-compatible API | Yes, drop-in replacement | No, requires rewrite |
Each library occupies a different position in the trade-off space:
WhiskeySour demonstrates that a full, spec-compliant BeautifulSoup replacement can be built in Rust without sacrificing the Python API that makes BeautifulSoup worth using in the first place.
The performance improvements are not incremental. Parsing is 11× faster in a dev build (2–3× more in a release build), CSS selectors are 14× faster, serialisation is 50× faster, and memory consumption per node is reduced by 12×. These are structural gains that come from the architecture — they apply to every document, every operation, and every version of the application that uses WhiskeySour.
The core API is complete and verified by a 450-unit / 508-total test suite that covers parsing, finding, CSS selectors, tree navigation, mutation, serialisation, encoding, streaming, and compiled queries — all modelled on BeautifulSoup's documented behaviour. The streaming parser (StreamParser, parse_stream()), compiled selectors (compile()), and all tree mutation operations are fully implemented.
WhiskeySour is the answer to: "I need BeautifulSoup to be fast enough to use in production."