Several of my projects do heavy markdown parsing: comment rendering, documentation pipelines, content management. The volume keeps growing, and I've hit the point where pure-PHP parsers (Parsedown, league/commonmark, cebe/markdown, michelf) can't keep up. They're solid libraries, but parsing thousands of documents per request or chewing through 200 KB files in interpreted PHP is slow no matter how well the code is written.

I wanted something 10x+ faster that could serve as a drop-in replacement for the common cases. The result is mdparser, a native C extension that wraps cmark-gfm (GitHub's CommonMark parser) and exposes it through a clean PHP 8.3+ OO API. I'm releasing it today.

How it works

mdparser vendors a copy of cmark-gfm 0.29.0.gfm.13 directly into the extension's shared object. No external library to link against, no cmake, no runtime dependencies. The entire cmark-gfm codebase compiles alongside the PHP wrapper into a single .so (or .dll on Windows). Four cherry-picked commits from cmark upstream close the 0.29-to-0.31 spec gap, giving full CommonMark 0.31 conformance: 652 out of 652 spec examples pass.

The PHP API is intentionally small. Two classes, one exception:

use MdParser\Parser;
use MdParser\Options;

// Defaults: safe mode on, GFM extensions on.
$parser = new Parser();
echo $parser->toHtml('# Hello');

// Or the static shorthand:
echo Parser::html('# Hello');

// Custom options via named arguments:
$parser = new Parser(new Options(
    smart: true,
    footnotes: true,
    sourcepos: true,
));

// Three output formats:
$html = $parser->toHtml($markdown);
$xml = $parser->toXml($markdown);
$ast = $parser->toAst($markdown); // nested PHP arrays
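
Since toAst() hands back plain nested arrays, consuming it is ordinary array recursion. Here is a minimal sketch of a depth-first walk. The key names ('type', 'children', 'level', 'literal') and the hand-built sample tree are assumptions for illustration, not the documented shape, so var_dump() a real result before relying on them.

```php
<?php
declare(strict_types=1);

/**
 * Depth-first walk over a nested-array tree, yielding [depth, node] pairs.
 * Assumes child nodes live under a 'children' key (hypothetical key name).
 */
function walkNodes(array $node, int $depth = 0): Generator
{
    yield [$depth, $node];
    foreach ($node['children'] ?? [] as $child) {
        yield from walkNodes($child, $depth + 1);
    }
}

// Hand-built stand-in for what toAst('# Hello') might return (assumed shape).
$sampleAst = [
    'type' => 'document',
    'children' => [
        [
            'type' => 'heading',
            'level' => 1,
            'children' => [
                ['type' => 'text', 'literal' => 'Hello'],
            ],
        ],
    ],
];

// Print an indented outline of node types.
foreach (walkNodes($sampleAst) as [$depth, $node]) {
    echo str_repeat('  ', $depth), $node['type'], "\n";
}
```

The same walker can collect headings for an outline or pull out link targets by filtering on the node type inside the loop.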

Options is final readonly with 17 boolean fields. The Parser constructor translates those bools into cmark's internal bitmask once, so subsequent parse calls go straight into cmark without re-reading the options. Static factory presets (Options::strict(), Options::github(), Options::permissive()) cover common deployment patterns.
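
To make the bools-to-bitmask idea concrete, here is a rough userland sketch of the folding step. The real translation happens in C inside the extension. SketchOptions, the SKETCH_OPT_* constants, and sketchOptionsToMask() are hypothetical names, only four of the 17 fields are shown, and the bit positions are modelled on cmark-gfm's CMARK_OPT_* flags but should be treated as illustrative.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for MdParser\Options, reduced to four fields.
final class SketchOptions
{
    public function __construct(
        public readonly bool $sourcepos = false,
        public readonly bool $smart = false,
        public readonly bool $footnotes = false,
        public readonly bool $unsafe = false,
    ) {}
}

// Illustrative bit positions (assumed, modelled on cmark-gfm's option flags).
const SKETCH_OPT_SOURCEPOS = 1 << 1;
const SKETCH_OPT_SMART     = 1 << 10;
const SKETCH_OPT_FOOTNOTES = 1 << 13;
const SKETCH_OPT_UNSAFE    = 1 << 17;

// Fold the booleans into a single integer. The point is that this runs
// once per parser, not once per parse call.
function sketchOptionsToMask(SketchOptions $o): int
{
    $mask = 0;
    if ($o->sourcepos) { $mask |= SKETCH_OPT_SOURCEPOS; }
    if ($o->smart)     { $mask |= SKETCH_OPT_SMART; }
    if ($o->footnotes) { $mask |= SKETCH_OPT_FOOTNOTES; }
    if ($o->unsafe)    { $mask |= SKETCH_OPT_UNSAFE; }
    return $mask;
}

$mask = sketchOptionsToMask(new SketchOptions(sourcepos: true, smart: true, footnotes: true));
echo $mask, "\n"; // 9218 (2 | 1024 | 8192)
```

Because the options object is immutable, the cached integer can never drift out of sync with the fields it was built from.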

If you're migrating from Parsedown's line() or cebe/markdown's parseParagraph(), there's toInlineHtml(): inline-only HTML without the wrapping <p> tags. Useful for chat messages, table cells, and short user-facing strings.
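
During a migration it can help to hide the call behind one function, so code keeps working on machines where the extension isn't installed yet. A minimal sketch: renderInline() is a hypothetical helper, it assumes toInlineHtml() is an instance method like toHtml(), and the fallback only escapes the text rather than parsing it.

```php
<?php
declare(strict_types=1);

/**
 * Render a short string as inline HTML (hypothetical helper).
 * Uses mdparser when its Parser class is available; otherwise falls back
 * to plain HTML escaping, which produces no markdown formatting at all.
 */
function renderInline(string $text): string
{
    if (class_exists(\MdParser\Parser::class)) {
        // Assumption: toInlineHtml() is called on a Parser instance.
        return (new \MdParser\Parser())->toInlineHtml($text);
    }

    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo renderInline('Deploy **done**, see `build.log`'), "\n";
```

In a hot path you would build the Parser once and reuse it instead of constructing one per call.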

Performance

This was the primary motivation. Measured on PHP 8.4 with each parser in its default configuration:

| Parser | Small (200 B) | Medium (1.8 KB) | Large (200 KB) |
|---|---|---|---|
| mdparser | 30,447 ops/s | 5,697 ops/s | 105 ops/s |
| Parsedown | 1,651 ops/s (18x slower) | 325 ops/s (17x) | 6 ops/s (17x) |
| cebe/markdown (GFM) | 1,350 ops/s (22x) | 374 ops/s (15x) | 6 ops/s (16x) |
| michelf (Extra) | 1,006 ops/s (30x) | 209 ops/s (27x) | 5 ops/s (19x) |

That's 15-30x faster across the board, from 200-byte chat messages to 200 KB documents. Your absolute numbers will differ by hardware, but the ratios should stay in the same range. mdparser processes roughly 100 full CommonMark-spec-sized documents per second on a single core. The pure-PHP parsers manage 5-6.
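
To put the small-document column in per-request terms, the throughput figures can be turned into time spent parsing a batch. This is plain arithmetic on the numbers from the table; batchMilliseconds() is a helper named here for illustration, and it assumes a real request parses at the same rate as the isolated benchmark.

```php
<?php
declare(strict_types=1);

// Small-document (200 B) throughput from the benchmark table, in parses per second.
$opsPerSecond = [
    'mdparser'            => 30447,
    'Parsedown'           => 1651,
    'cebe/markdown (GFM)' => 1350,
    'michelf (Extra)'     => 1006,
];

// Milliseconds needed to parse $documents inputs at a given throughput.
function batchMilliseconds(int $opsPerSecond, int $documents): float
{
    return $documents / $opsPerSecond * 1000.0;
}

$baseline = $opsPerSecond['mdparser'];
foreach ($opsPerSecond as $name => $ops) {
    printf(
        "%-20s %7.1f ms per 1,000 parses (%.1fx mdparser's time)\n",
        $name,
        batchMilliseconds($ops, 1000),
        $baseline / $ops
    );
}
```

With those inputs, a page that renders 1,000 short comments spends about 33 ms parsing with mdparser and roughly 606 ms with Parsedown.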

The benchmark uses hrtime(true) around each parse call, 200 iterations with warm-up, trimmed mean to filter GC pauses. Reproducible scripts are in the bench/ directory.
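
For readers who want the shape of that measurement without opening the repo, here is a minimal sketch of the same idea. It is not the script from bench/. The function names, the 20-call warm-up, and the 10% trim on each side are my assumptions for illustration, and strtoupper() stands in for a real parser.

```php
<?php
declare(strict_types=1);

/**
 * Mean of the samples after dropping the lowest and highest $trim fraction.
 */
function trimmedMean(array $samples, float $trim = 0.1): float
{
    if ($samples === [] || $trim < 0.0 || $trim >= 0.5) {
        throw new InvalidArgumentException('need samples and 0 <= trim < 0.5');
    }
    sort($samples);
    $n    = count($samples);
    $drop = (int) floor($n * $trim);          // samples removed from each end
    $kept = array_slice($samples, $drop, $n - 2 * $drop);

    return array_sum($kept) / count($kept);
}

/**
 * Time each call separately with hrtime(true) and return operations per second.
 */
function benchOpsPerSecond(callable $parse, string $input, int $iterations = 200, int $warmup = 20): float
{
    // Warm-up calls are not measured.
    for ($i = 0; $i < $warmup; $i++) {
        $parse($input);
    }

    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);                // nanoseconds, monotonic
        $parse($input);
        $samples[] = hrtime(true) - $start;
    }

    // Trimming discards outliers such as a GC pause landing inside one sample.
    $meanNs = trimmedMean($samples);

    return 1e9 / max($meanNs, 1.0);
}

// Stand-in workload; swap in a real parser call to compare libraries.
$input = str_repeat("# Hello\n\nSome *text*.\n", 10);
$ops   = benchOpsPerSecond(static fn (string $md): string => strtoupper($md), $input);
printf("%.0f ops/s\n", $ops);
```

Timing each call individually, rather than the whole loop, is what makes the trimming possible: one slow sample gets dropped instead of skewing the average.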

Feature comparison

mdparser covers CommonMark core plus all five GFM extensions. Here's how it stacks up against the pure-PHP alternatives:

| Feature | mdparser | Parsedown | league/cm | cebe GFM | michelf Extra |
|---|---|---|---|---|---|
| CommonMark core | full | partial | full | partial | partial |
| GFM tables | yes | yes | via ext | yes | via Extra |
| Strikethrough | yes | yes | via ext | yes | no |
| Task lists | yes | no | via ext | no | no |
| Autolinks | yes | yes | via ext | yes | no |
| Tag filter | yes | yes | via ext | partial | no |
| Smart punctuation | yes | no | via ext | no | no |
| Footnotes | yes | Extra | via ext | no | yes |
| Sourcepos | yes | no | yes | no | no |
| XML output | yes | no | no | no | no |
| AST output | yes (arrays) | no | yes (objects) | no | no |

What mdparser doesn't do

mdparser is scoped to what cmark-gfm supports: CommonMark core plus five GFM extensions. It doesn't cover definition lists, abbreviations, attribute syntax, heading permalinks, table of contents, YAML front matter, mentions, LaTeX math, emoji shortcodes, or custom containers. If you need those, league/commonmark is the right choice. It's the most featureful pure-PHP option and actively maintained. Speed doesn't help if the feature you need isn't there.

Compatibility

mdparser builds and tests on PHP 8.3, 8.4, and 8.5 across Linux (x86_64), macOS (arm64/x86_64), and Windows (x86/x64, both TS and NTS). CI runs on all three platforms, with an ASAN job on Linux to catch memory issues. Pre-built Windows DLLs ship with each GitHub release.

Installation

pie install iliaal/mdparser

PIE handles the download, phpize, configure, make, and install. On a minimal PHP image you'll need git, bison, and libtool-bin as build dependencies.

From source:

git clone https://github.com/iliaal/mdparser.git
cd mdparser
phpize && ./configure --enable-mdparser
make -j && sudo make install
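
After either install route, a quick check from PHP confirms the extension is actually loaded. A small sketch: it assumes the extension registers under the name "mdparser" (matching the --enable-mdparser flag), and mdparserStatus() is just a helper name for this example.

```php
<?php
declare(strict_types=1);

// Report whether the extension is loaded (extension name "mdparser" is assumed).
function mdparserStatus(): string
{
    if (!extension_loaded('mdparser')) {
        return 'mdparser is not loaded; check that php.ini enables the extension';
    }

    // phpversion() returns false when an extension reports no version.
    $version = phpversion('mdparser') ?: 'unknown version';

    return "mdparser {$version} is loaded";
}

echo mdparserStatus(), "\n";
```

Run it with the same PHP binary and SAPI your application uses, since CLI and FPM often read different ini files.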
