Private PDF APIs for parsing, conversion, and secure document automation.
DocuShell combines browser-first PDF tools with authenticated API workflows. Parse PDFs into structured JSON, extract tables, run OCR-capable jobs, render webpages, compress files, and stream artifacts from secure queued workers.
Request accepted
201 Created: multipart upload validated, idempotency key captured, job id returned.
Job queued
Status queued: the request waits in the worker queue instead of blocking the API connection.
Polling
Status processing: the client checks the job status endpoint until the parser or renderer completes.
Complete
Status completed: download JSON, Markdown, HTML, text, or PDF artifacts through a signed API request.
Base URL
https://api.docushell.com/api
Parse output
JSON, Markdown, HTML, text
Job model
Queued, polled, streamed
Retention
Temporary files swept within 1 hour
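The base URL already ends in an /api path segment, which is easy to lose when composing versioned routes. Below is a minimal Python sketch for building the URLs used in the examples on this page. The helper names endpoint, job_url, and download_url are hypothetical, and the /v1/jobs/{id} and /download?format= paths are taken from the sample URLs shown here rather than from a published route list.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.docushell.com/api"  # base URL listed on this page
API_VERSION = "v1"


def endpoint(path: str, **query: str) -> str:
    """Build a versioned URL under BASE_URL (hypothetical helper).

    Plain string joining is deliberate: urllib.parse.urljoin(BASE_URL, "v1/parse")
    resolves the relative path against "/api" and drops that segment.
    """
    url = f"{BASE_URL.rstrip('/')}/{API_VERSION}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    return url


def job_url(job_id: str) -> str:
    # Percent-encode the id so an unexpected character cannot change the path
    return endpoint(f"jobs/{quote(job_id, safe='')}")


def download_url(job_id: str, fmt: str) -> str:
    return endpoint(f"jobs/{quote(job_id, safe='')}/download", format=fmt)
```

For example, download_url("job_7f2c9a", "json") reproduces the download link shown in the poll response on this page.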
Use a PDF tool now
Compress, merge, split, OCR, convert, and protect PDFs from the browser-first DocuShell tool surface.
Build a PDF workflow
Create an API key, send versioned requests, poll queued jobs, and stream generated artifacts into your product.
What The API Understands
Structured extraction without flattening the document.
The Parse API is designed for systems that need more than raw text. It keeps structure, tables, coordinates, and debug artifacts visible so downstream automation can be reviewed and trusted.
Layout-aware parsing
Preserve readable order across multi-column reports, manuals, statements, research papers, and dense business PDFs.
Table extraction
Extract spreadsheet-like tables, row and column structure, merged cells, and multi-page table context where available.
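Spreadsheet and CSV consumers usually need a rectangular grid, so merged cells have to be expanded somewhere downstream. Here is a minimal Python sketch of that step. It assumes a hypothetical cell shape ({"row", "col", "text", optional "row_span" and "col_span"}), not DocuShell's documented table schema. The merged value is repeated into every slot it covers. That is one common choice; leaving the extra slots blank is another.

```python
import csv
import io


def expand_cells(cells: list[dict]) -> list[list[str]]:
    """Expand merged cells into a rectangular grid.

    Hypothetical input shape, for illustration only:
    {"row": 0, "col": 0, "text": "...", "row_span": 1, "col_span": 1}
    """
    if not cells:
        return []
    n_rows = max(c["row"] + c.get("row_span", 1) for c in cells)
    n_cols = max(c["col"] + c.get("col_span", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell["row"], cell["row"] + cell.get("row_span", 1)):
            for c in range(cell["col"], cell["col"] + cell.get("col_span", 1)):
                # Repeat the merged value so every CSV row stands on its own
                grid[r][c] = cell["text"]
    return grid


def to_csv(grid: list[list[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf).writerows(grid)
    return buf.getvalue()
```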
Structured JSON tree
Return headings, paragraphs, lists, captions, tables, metadata, page numbers, and source coordinates for downstream systems.
Coordinates and citations
Use bounding boxes and page positions to build review screens, source-highlighting, audit trails, and RAG citations.
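Source highlighting on a review screen usually means drawing a returned bounding box over a rendered page image. The Python sketch below assumes boxes arrive as (x0, y0, x1, y1) in PDF points with a bottom-left origin. That is the PDF default coordinate space, but it is an assumption about DocuShell's output, so confirm the convention in the Parse PDF docs. The function name and the 144 DPI default are also hypothetical.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # one PDF user-space unit is 1/72 inch


@dataclass
class PixelRect:
    left: float
    top: float
    width: float
    height: float


def bbox_to_pixels(bbox: tuple[float, float, float, float],
                   page_height_pt: float, dpi: int = 144) -> PixelRect:
    """Map a PDF-space bbox onto a page image rendered at `dpi`.

    Assumes bbox = (x0, y0, x1, y1) in points with a bottom-left origin.
    """
    x0, y0, x1, y1 = bbox
    scale = dpi / POINTS_PER_INCH
    return PixelRect(
        left=x0 * scale,
        top=(page_height_pt - y1) * scale,  # flip y: image rows grow downward
        width=(x1 - x0) * scale,
        height=(y1 - y0) * scale,
    )
```

On a US Letter page (792 pt tall) rendered at 144 DPI, a box at (72, 700, 300, 720) maps to a rectangle 144 px from the left and 144 px from the top.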
OCR-capable worker path
Route scanned and image-heavy PDFs through OCR-capable processing when configured, while native PDFs stay on the faster path.
AI-safe extraction
Filter hidden, tiny, off-page, and machine-only text so downstream agents receive content closer to what humans can see.
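The Python sketch below shows the kind of visibility filter described above. It is not DocuShell's filter. It assumes a hypothetical text-span shape with text, bbox, font_size, and render_mode fields. Text rendering mode 3 is the PDF mode that neither fills nor strokes glyphs, so such text is invisible. The 2 pt "tiny" threshold is an assumed value.

```python
INVISIBLE_RENDER_MODE = 3  # PDF text rendering mode 3: neither fill nor stroke
MIN_FONT_SIZE_PT = 2.0     # assumed threshold for "tiny" text


def is_human_visible(span: dict, page_w: float, page_h: float) -> bool:
    x0, y0, x1, y1 = span["bbox"]
    if span.get("render_mode", 0) == INVISIBLE_RENDER_MODE:
        return False
    if span.get("font_size", 12.0) < MIN_FONT_SIZE_PT:
        return False
    # Off-page: the box does not overlap the page rectangle at all
    if x1 <= 0 or y1 <= 0 or x0 >= page_w or y0 >= page_h:
        return False
    return True


def visible_text(spans: list[dict], page_w: float, page_h: float) -> str:
    return " ".join(s["text"] for s in spans if is_human_visible(s, page_w, page_h))
```

A production filter would also need checks this sketch skips, such as text drawn in the background color or covered by images.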
RAG-ready artifacts
Generate Markdown, HTML, plain text, JSON, annotated PDF, and image-aware artifacts for indexing and knowledge workflows.
Deterministic job contracts
Build against clear async states, predictable error codes, idempotency keys, authenticated downloads, and credit accounting.
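The Node.js quickstart polls at a fixed interval with no deadline. Below is a minimal Python sketch of a polling client built on the documented states (queued, processing, completed, failed). It adds capped exponential backoff and an overall timeout. The get_job callable is hypothetical and stands in for the authenticated status request. The error.message field mirrors the Node.js example, and the delay values are assumptions.

```python
import time
from typing import Callable

PENDING = {"queued", "processing"}


class JobFailed(RuntimeError):
    """Raised when the job reaches the failed state."""


def wait_for_job(get_job: Callable[[], dict], timeout_s: float = 300.0,
                 initial_delay_s: float = 1.0, max_delay_s: float = 10.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job completes, fails, or the deadline passes."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        job = get_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            message = (job.get("error") or {}).get("message") or "job failed"
            raise JobFailed(message)
        if status not in PENDING:
            # Fail loudly on states this client does not know about
            raise ValueError(f"unexpected job status: {status!r}")
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap
```

The sleep and clock parameters are injectable so the loop can be tested without waiting.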
Interactive Playground Preview
Show the queue before developers write code.
The playground should make the async lifecycle obvious: submit, watch the job enter the queue, poll while workers process it, then download the result artifact.
formats=json,markdown,html,text,annotated_pdf
Poll response
```json
{
  "jobId": "job_7f2c9a",
  "status": "processing",
  "progress": 62,
  "queue": {
    "state": "active",
    "worker": "parse-pdf"
  },
  "downloads": {
    "json": "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
  }
}
```
API Quickstart
Submit a PDF, poll the job, download structured output.
DocuShell uses a deliberate async contract so uploads, workers, retries, and downloads stay predictable for production integrations.
Parse endpoint
https://api.docushell.com/api/v1/parse
1. Submit
cURL
```bash
curl -X POST "https://api.docushell.com/api/v1/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Idempotency-Key: parse-demo-001" \
  -F "file=@./quarterly-report.pdf;type=application/pdf" \
  -F "formats=json,markdown,html,text,annotated_pdf" \
  -F "reading_order=xycut" \
  -F "table_method=cluster"
```
2. Poll
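Node.js
```javascript
// Run as an ES module (for example poll.mjs on Node 18+) so top-level await and global fetch are available.
const statusUrl = "https://api.docushell.com/api/v1/jobs/job_7f2c9a";

while (true) {
  const res = await fetch(statusUrl, {
    headers: { Authorization: "Bearer YOUR_API_KEY" },
  });
  // Surface HTTP errors (bad key, unknown job) instead of parsing an error body as a job
  if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);

  const job = await res.json();
  if (job.status === "completed") break;
  if (job.status === "failed") throw new Error(job.error?.message ?? "Job failed");

  // Still queued or processing: wait 1.5 s before polling again
  await new Promise((resolve) => setTimeout(resolve, 1500));
}
```
3. Download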
Python
```python
import requests

url = "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Stream the artifact to disk in chunks instead of loading it into memory
with requests.get(url, headers=headers, stream=True, timeout=60) as response:
    response.raise_for_status()  # stop on 4xx/5xx before writing anything
    with open("quarterly-report.json", "wb") as output:
        for chunk in response.iter_content(chunk_size=8192):
            output.write(chunk)
```
Privacy And Worker Architecture
Browser-first when possible. Secure cloud workers when necessary.
DocuShell separates instant browser tools from heavier API jobs. That gives users fast on-device utilities and gives developers a hardened worker pipeline for tasks that need server-side parsing, rendering, or conversion.
Validated intake
Requests are sanitized before queueing with schema validation, MIME checks, magic-byte checks, page preflight, and plan limits.
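The Python sketch below shows how the MIME and magic-byte checks complement each other. The declared type comes from the client (the quickstart sends type=application/pdf). The %PDF- header comes from the bytes themselves. The function name and max_bytes parameter are hypothetical stand-ins for a plan limit. The strict flag reflects the fact that many readers tolerate a header appearing anywhere in the first 1 KiB rather than only at offset 0.

```python
PDF_MAGIC = b"%PDF-"
HEADER_WINDOW = 1024  # lenient mode: header may appear within the first 1 KiB


def validate_pdf_upload(declared_type: str, data: bytes, max_bytes: int,
                        strict: bool = True) -> list[str]:
    """Return a list of intake errors; an empty list means the upload passes."""
    errors = []
    # MIME check: compare the declared media type, ignoring parameters
    media_type = declared_type.split(";", 1)[0].strip().lower()
    if media_type != "application/pdf":
        errors.append("declared MIME type is not application/pdf")
    # Size limit: max_bytes stands in for a plan limit
    if len(data) > max_bytes:
        errors.append("file exceeds the size limit")
    # Magic-byte check: trust the bytes, not the filename or declared type
    head = data[:HEADER_WINDOW]
    has_header = head.startswith(PDF_MAGIC) if strict else PDF_MAGIC in head
    if not has_header:
        errors.append("missing %PDF- header")
    return errors
```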
Isolated workers
Heavy parsing, rendering, compression, and conversion happen in worker services instead of the Next.js request path.
URL safety checks
Webpage capture blocks private networks, metadata addresses, intranet targets, and unsafe URL patterns before Chromium runs.
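A minimal Python sketch of this kind of pre-flight check follows. It is an illustration, not DocuShell's implementation. It allows only http and https, rejects single-label hostnames as a stand-in for intranet targets, and requires every resolved address to be globally routable. Python's ipaddress module reports private, loopback, and link-local ranges as non-global, and link-local covers the 169.254.169.254 cloud metadata address. The resolve parameter is injectable for testing.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def default_resolve(host: str) -> list[str]:
    # Every address the host resolves to must pass, not just the first one
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_address(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # is_global is False for private, loopback, link-local, and other
    # special-purpose ranges; multicast is excluded explicitly
    return ip.is_global and not ip.is_multicast


def is_safe_capture_url(url: str, resolve=default_resolve) -> bool:
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        addrs = [str(ipaddress.ip_address(host))]  # IP literal: no DNS lookup
    except ValueError:
        if "." not in host:
            return False  # single-label names such as "intranet" or "localhost"
        addrs = resolve(host)
    return bool(addrs) and all(is_public_address(a) for a in addrs)
```

A name can resolve differently when Chromium later fetches it (DNS rebinding). A check like this therefore needs to be paired with pinning the vetted address or with egress rules at the network layer.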
Ephemeral files
Uploads and generated artifacts are temporary, scoped to one-time delivery, and swept from storage within the retention window.
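Below is a minimal Python sketch of a retention sweep over a local scratch directory, assuming file modification time marks when an upload or artifact was written. The function name and directory layout are hypothetical. Object storage would use its own listing and deletion calls instead of os.scandir.

```python
import os
import time
from typing import Optional

RETENTION_SECONDS = 60 * 60  # the one-hour window stated on this page


def sweep_expired(directory: str, retention_s: float = RETENTION_SECONDS,
                  now: Optional[float] = None) -> list[str]:
    """Delete regular files older than retention_s; return their names."""
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        # Collect first, then delete, rather than deleting mid-iteration
        expired = [
            entry for entry in entries
            if entry.is_file(follow_symlinks=False)
            and now - entry.stat(follow_symlinks=False).st_mtime >= retention_s
        ]
    for entry in expired:
        os.remove(entry.path)
    return sorted(entry.name for entry in expired)
```

If the sweeper runs every N minutes with a 60-minute cutoff, a file can survive for up to 60 + N minutes. Keeping every file under one hour means using a cutoff of 60 - N minutes.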
Authenticated downloads
API artifacts stay behind authenticated status and download routes, with ownership checks on the server side.
Rate limits and credits
Fingerprint rate limits, API keys, idempotency, and hard credit balances keep production usage predictable.
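One common way to implement per-key rate limits is a token bucket. The Python sketch below keeps one bucket per caller key (an API key or a request fingerprint). The class names and the in-memory dictionary are assumptions. Several API instances would need a shared store instead.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """One bucket per caller key (an API key or a request fingerprint)."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[key].allow()
```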
Use Cases
One API surface for document-heavy workflows.
The page should rank for concrete extraction and automation questions while giving buyers enough product context to choose the correct API path.
| Use case | Typical documents | API | Output |
| --- | --- | --- | --- |
| AI and RAG ingestion | Research PDFs, manuals, policy docs | Parse PDF | JSON, Markdown, coordinates |
| Finance operations | Statements, reports, invoices | Parse PDF plus table extraction | Structured rows and audit artifacts |
| Recruiting and ATS | Resumes and candidate PDFs | Resume Parse | Candidate, skills, roles, education |
| Web archiving | Public URLs and dashboards | Webpage to PDF | Rendered PDFs from secure Chromium workers |
| Document delivery | Large PDFs, client packets | Compress PDF, PDF to Word | Smaller files and editable documents |
Developer Experience
A PDF API should be easy to test and hard to misuse.
The developer story is intentionally boring where it matters: stable routes, explicit statuses, authenticated downloads, documented failures, and plan limits that are visible before launch.
Production checklist
The page should make these contract points visible before signup.
- Versioned endpoints under /api/v1
- Bearer API key authentication
- Idempotency keys for retried submissions
- Shared job status and download model
- Predictable credit accounting
- Dashboard visibility for recent API activity
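An idempotency key only protects retries if the same key is reused for every attempt of one logical submission. The quickstart uses a fixed key, parse-demo-001. The Python sketch below generates one key per submission and retries transient failures with backoff. The send callable, the TransientError class, and the "parse-" key prefix are hypothetical. The page states that idempotency keys cover retried submissions but does not specify exactly how a replayed key is answered.

```python
import time
import uuid


class TransientError(Exception):
    """Raised by `send` for retryable failures (timeouts, 5xx, 429)."""


def submit_with_retries(send, max_attempts: int = 4, base_delay_s: float = 0.5,
                        sleep=time.sleep):
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical submission, reused on every retry so the server
    # can recognise a repeat instead of queueing a duplicate job
    key = f"parse-{uuid.uuid4()}"
    for attempt in range(max_attempts):
        try:
            return send({"Idempotency-Key": key})
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
```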
Tools And APIs
Try manually, automate when the workflow repeats.
DocuShell should feel useful before a developer ever creates a key. The API landing page connects the public tools to the production automation path.
1. Try a document in a browser tool or playground.
2. Inspect the result shape, warnings, and downloadable artifacts.
3. Move the same workflow into API code with keys and idempotency.
4. Monitor job status, credit usage, and recent API activity in the dashboard.
Pricing
Free browser tools. Paid API credits.
The pricing story should stay direct: browser tools are available for everyday use, while API plans unlock keys, queued workers, and monthly credits for production jobs.
Free
$0
Browser tools and starter credits for account exploration. Public API keys start on paid plans.
- Browser PDF tools
- 500 monthly credits
- No API key access
Starter
$9/mo
Light production automations for small internal workflows and prototype integrations without webhooks.
- 5,000 monthly credits
- API keys included
- Webhooks start on Pro
Pro
$19/mo
More credits for developers shipping regular PDF automation, parsing workflows, and callbacks.
- 12,000 monthly credits
- API access + Webhooks
- Good for recurring jobs
Scale
$79/mo
A larger credit pool for sustained API usage, higher volume parsing, and operational workloads.
- 60,000 monthly credits
- API access + Webhooks
- Built for sustained volume
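Choosing a plan comes down to simple arithmetic over the tiers above. The Python sketch below encodes only the prices, credit pools, and API/webhook availability listed on this page. Per-operation credit costs are not listed here, so estimating credits_needed from job counts is left to the caller using the documented operation costs. Overage and rollover behaviour are not modeled.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    price_usd: int
    monthly_credits: int
    api_keys: bool
    webhooks: bool


PLANS = [
    Plan("Free", 0, 500, api_keys=False, webhooks=False),
    Plan("Starter", 9, 5_000, api_keys=True, webhooks=False),
    Plan("Pro", 19, 12_000, api_keys=True, webhooks=True),
    Plan("Scale", 79, 60_000, api_keys=True, webhooks=True),
]


def cheapest_plan(credits_needed: int, need_api: bool = True,
                  need_webhooks: bool = False) -> Optional[Plan]:
    """Cheapest listed plan whose monthly pool and features cover the need."""
    fits = [
        p for p in PLANS
        if p.monthly_credits >= credits_needed
        and (p.api_keys or not need_api)
        and (p.webhooks or not need_webhooks)
    ]
    return min(fits, key=lambda p: p.price_usd, default=None)


def usd_per_1k_credits(plan: Plan) -> float:
    return plan.price_usd * 1000 / plan.monthly_credits
```

By this arithmetic, Starter works out to $1.80 per 1,000 credits, Pro to about $1.58, and Scale to about $1.32.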
Documentation
Every conversion path should have a next click.
A search visitor should be able to land here, answer their question, test the API, compare pricing, or jump directly into docs without hunting through navigation.
Getting started
Create keys, make the first request, and understand the base URL.
Parse PDF
Request fields, output formats, artifacts, and parsing options.
Job lifecycle
Queued, processing, completed, failed, and download behavior.
Security model
Validation, file retention, URL protection, and worker boundaries.
Playgrounds
Try request shapes and output formats before writing integration code.
Pricing
Monthly API credits, browser tools, operation costs, and plan limits.
FAQ For Search And Answer Engines
Direct answers to the questions developers actually ask.
These questions should be mirrored in the route-level FAQPage schema so answer engines can quote the API contract accurately.
What is DocuShell API?
DocuShell API is a developer API for private PDF processing workflows. It accepts versioned API requests, queues work on secure cloud workers, and returns status, download, and structured output endpoints for parsing, conversion, compression, and webpage capture.
Can DocuShell convert PDF to JSON?
Yes. The Parse PDF API returns structured JSON for document content, including headings, paragraphs, lists, tables, metadata, page numbers, and source coordinates when available.
Can DocuShell extract tables from PDFs?
Yes. DocuShell highlights table extraction for bordered tables, complex layouts, multi-page reports, and spreadsheet-style downstream workflows. Results can be requested as structured JSON and companion artifacts such as Markdown, HTML, text, and annotated PDF.
Does DocuShell support scanned PDFs and OCR?
DocuShell can route scanned or image-heavy PDFs through an OCR-capable hybrid worker path when the OCR backend is available. Text-native PDFs continue through the faster structured parser path.
Does DocuShell store uploaded PDFs?
DocuShell uses ephemeral storage for server-side API jobs. Uploaded and generated files are retained only long enough to process and stream the requested result, then cleanup removes temporary files within the one-hour retention window.
How does the DocuShell async job lifecycle work?
A production API request returns a job id and status URL. Clients poll the job endpoint while the status is queued or processing, then stream the result through the authenticated download endpoint after completion.
Can I try DocuShell without writing code?
Yes. DocuShell includes browser tools and API playgrounds. You can process common PDF tasks manually first, then move the same workflow into API automation when you need server-side integration.
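The FAQ introduction notes that these questions should be mirrored in a route-level FAQPage schema. The Python sketch below generates that markup from question and answer pairs using the schema.org FAQPage, Question, and Answer types. The function names are hypothetical, and embedding the result in an application/ld+json script tag is an assumption about how the route renders it.

```python
import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build schema.org FAQPage structured data from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data: dict) -> str:
    # Escape "<" so answer text cannot close the <script> element early
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'
```

Generating the markup from the same strings that render the visible answers keeps the two from drifting apart.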
Open a playground before you wire production code.
Validate request shape, queued status, polling behavior, output formats, and downloads before moving into a backend integration.