OLPDF
Word for PDFs.
A structure-first operating system for PDFs. Reconstruct semantic layouts from raw coordinates and edit with AI. Completely free.
The Future of Documents
This document was once a static picture of text. Now it is an editable, semantic structure.
Just like a Word processor, you can now seamlessly edit PDFs using AI.
Mission
PDFs shouldn't be read-only forever.
Most PDFs are digital paper: unstructured, unsearchable, and impossible to edit without destroying the layout. OLPDF turns every document into a machine-readable, AI-editable structure that can always be re-exported to the exact format it came from.
01
Open Source
The extraction heuristics, block model specification, and export pipeline are all public. Audit the logic, fork it, run it yourself. No black boxes, no paywalls on the core engine.
02
Structure First
We don't OCR and call it done. Every page is classified before extraction runs — native text, scanned, table-heavy — and each block is assigned a type, confidence score, and column index.
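As a rough illustration of the block model described above, here is a minimal Python sketch. The class names, field names, block-type labels, and classifier thresholds are all assumptions made for this example; the published block model specification may differ.

```python
from dataclasses import dataclass
from enum import Enum


class PageKind(Enum):
    """Page classes named in the text above."""
    NATIVE_TEXT = "native_text"
    SCANNED = "scanned"
    TABLE_HEAVY = "table_heavy"


@dataclass(frozen=True)
class Block:
    """One extracted block (hypothetical field names)."""
    block_type: str    # e.g. "paragraph", "table", "list" (assumed labels)
    confidence: float  # classifier confidence in [0, 1]
    column_index: int  # 0-based column the block belongs to
    text: str

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.column_index < 0:
            raise ValueError("column_index must be non-negative")


def classify_page(char_count: int, image_area_ratio: float, ruling_lines: int) -> PageKind:
    """Toy page classifier; the thresholds are invented for illustration."""
    if char_count == 0 and image_area_ratio > 0.5:
        return PageKind.SCANNED        # no text layer, mostly image
    if ruling_lines >= 10:
        return PageKind.TABLE_HEAVY    # many drawn lines suggest tables
    return PageKind.NATIVE_TEXT
```

Classifying the page first lets a pipeline pick the extraction strategy before any block is built, which is the ordering the paragraph above describes.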
03
AI as a Tool
The model is never given a blank document and told to rewrite it. Every AI action goes through a validated tool call. The diff is logged, reversible, and requires explicit user acceptance.
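The propose, accept, and undo flow described above could look something like the following sketch. The tool name `replace_text`, the `EditSession` class, and the flat `{block_id: text}` document shape are hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass, field

ALLOWED_TOOLS = {"replace_text"}  # hypothetical tool name


@dataclass
class EditSession:
    document: dict                       # assumed shape: {block_id: text}
    log: list = field(default_factory=list)

    def propose(self, call: dict) -> dict:
        """Validate a tool call and return a pending diff; the document is untouched."""
        if call.get("tool") not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {call.get('tool')!r}")
        block_id = call.get("block_id")
        if block_id not in self.document:
            raise ValueError(f"unknown block: {block_id!r}")
        new_text = call.get("text")
        if not isinstance(new_text, str):
            raise ValueError("text must be a string")
        return {"block_id": block_id, "before": self.document[block_id], "after": new_text}

    def accept(self, diff: dict) -> None:
        """Apply a diff after explicit user acceptance and record it in the log."""
        self.document[diff["block_id"]] = diff["after"]
        self.log.append(diff)

    def undo(self) -> None:
        """Revert the most recently accepted diff."""
        diff = self.log.pop()
        self.document[diff["block_id"]] = diff["before"]
```

Because `propose` only returns a diff, nothing changes until the caller passes that diff to `accept`, and storing the `before` text in the log is what makes each edit reversible.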
Pipeline
The Extraction Loop
We reconstruct the semantic DOM from raw coordinates.
Extract
Raw PDF binaries are parsed to extract untagged text and coordinates.
Reconstruct
A Python heuristics engine rebuilds paragraphs, tables, and lists.
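To give a feel for what rebuilding paragraphs from raw coordinates can involve, here is a small sketch of one common heuristic: start a new paragraph whenever the vertical gap between lines is large relative to the line height. The `Line` type, the `gap_factor` value, and the rule itself are assumptions for illustration, not the engine's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    y_top: float   # distance from the top of the page, in points
    height: float  # line height, in points


def group_paragraphs(lines: list[Line], gap_factor: float = 0.6) -> list[str]:
    """Merge lines top to bottom; break when the gap to the previous line
    exceeds gap_factor times the previous line's height."""
    paragraphs: list[list[str]] = []
    previous = None
    for line in sorted(lines, key=lambda item: item.y_top):
        if previous is None:
            paragraphs.append([line.text])
        else:
            gap = line.y_top - (previous.y_top + previous.height)
            if gap > gap_factor * previous.height:
                paragraphs.append([line.text])    # large gap: new paragraph
            else:
                paragraphs[-1].append(line.text)  # small gap: same paragraph
        previous = line
    return [" ".join(parts) for parts in paragraphs]
```

A real engine would also need to handle columns, indentation, and hyphenation; this sketch covers only single-column vertical spacing.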
AI Edit
Gemini applies targeted mutations to the JSON document tree.
Export
Compiled back into a pristine PDF/A or EPUB3 document.
Integrate in Minutes
Skip the UI entirely. Hook into our public API with a free rate-limited key and parse documents from your own backend.
- Semantic block extraction
- AI-powered rewrite endpoint
- PDF → EPUB3 conversion
curl -X POST https://api.olpdf.xyz/v1/extract \
  -H "Authorization: Bearer free_beta_key" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com/invoice.pdf", "mode": "semantic" }'
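For backends written in Python, the same request can be built with the standard library. This sketch mirrors the endpoint, headers, and JSON body of the curl example above; the helper name `build_extract_request` is made up for this example, and the response format is not shown because it is not documented here.

```python
import json
import urllib.request


def build_extract_request(pdf_url: str, api_key: str, mode: str = "semantic") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": pdf_url, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        "https://api.olpdf.xyz/v1/extract",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_extract_request("https://example.com/invoice.pdf", "free_beta_key")
```

Passing `request` to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes the request easy to inspect or test without network access.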
Embed OLPDF anywhere.
Drop a full AST-driven PDF editor into any web app in three lines. Works with every major framework.
import { OlPDFEmbed } from '@olpdf/embed';

const editor = new OlPDFEmbed(container, {
  host: 'https://olpdf.xyz',
  documentId: 'doc_abc123',
  token: userToken,
});

editor.on('MODEL_UPDATE', ({ documentModel }) => {
  myDB.save(documentModel);
});
No Paywalls.
Just Documents.
Document intelligence should be a public good. Free for individuals and open-source projects, always.
Unlimited Projects
No cap on documents or books. Create without limits.
Full AI Access
Gemini-powered structural editing, no subscription needed.
Open API
Integrate our extraction engine into your own apps for free.
How do we survive?
“Supported by infrastructure grants and contributors. We don't want your credit card — we want your feedback and pull requests.”
Open Source & Community Driven
Built by developers, for developers.
Check out our good first issues, sponsor the project, or build your own custom extraction plugins.
Built on open infrastructure — no black boxes
The core extraction engine, block model spec, and export pipeline are open source. Audit it, fork it, self-host it.