API Hub

Source-aware PDF APIs for private RAG and document workflows.

Parse private PDFs into structured JSON and Markdown with source context for compliance, legal, policy, support, and startup AI workflows. Free PDF tools stay available for one-off jobs; authenticated APIs handle repeated processing.

Async worker flow (live contract)

1. Request accepted (201 Created): Multipart upload validated, idempotency key captured, job id returned.
2. Job queued (status: queued): The request waits in the worker queue instead of blocking the API connection.
3. Polling (status: processing): The client checks the job status endpoint until the parser or renderer completes.
4. Complete (status: completed): Download JSON, Markdown, HTML, text, or PDF artifacts through a signed API request.

Base URL: https://api.docushell.com/api
Parse output: JSON, Markdown, source context
Job model: Queued, polled, streamed
Retention: Temporary files swept within 1 hour

Use a free PDF tool now

Compress, merge, split, OCR, convert, and protect PDFs from the browser-first DocuShell tool surface when the job is one-off.

Build a source-aware PDF workflow

Create an API key, parse private PDFs, preserve source context, poll queued jobs, and stream generated artifacts into your product.

What the API understands

Structured extraction without flattening the document.

The Parse API is designed for systems that need more than raw text. It keeps structure, tables, coordinates, and debug artifacts visible so downstream automation can be reviewed against the source.

Layout-aware parsing

Preserve readable order across multi-column reports, manuals, statements, research papers, and dense business PDFs.

Table extraction

Extract spreadsheet-like tables, row and column structure, merged cells, and multi-page table context where available.

Structured JSON tree

Return headings, paragraphs, lists, captions, tables, metadata, page numbers, and source coordinates for downstream systems.

Coordinates and citations

Use bounding boxes and page positions to build review screens, source-highlighting, audit trails, and RAG citations that users can inspect.
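
One way to carry page positions into a retrieval index is to group parsed blocks into chunks that keep a citation per block. The sketch below is illustrative only: the block shape (`text`, `page`, `bbox`) and the `to_chunks` helper are assumptions made for this example, not the documented Parse API schema.

```python
def to_chunks(blocks, source, max_chars=500):
    """Group consecutive blocks into chunks, keeping page and bbox per block."""
    groups, current, size = [], [], 0
    for block in blocks:
        text = block["text"]
        # Start a new chunk when adding this block would exceed the budget.
        if current and size + len(text) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(block)
        size += len(text)
    if current:
        groups.append(current)
    return [
        {
            "text": "\n".join(b["text"] for b in group),
            "citations": [
                {"source": source, "page": b["page"], "bbox": b["bbox"]}
                for b in group
            ],
        }
        for group in groups
    ]


# Hypothetical parsed blocks; the field names are assumed for illustration.
blocks = [
    {"text": "A" * 300, "page": 1, "bbox": [72, 90, 540, 150]},
    {"text": "B" * 300, "page": 1, "bbox": [72, 160, 540, 220]},
    {"text": "C" * 100, "page": 2, "bbox": [72, 90, 540, 120]},
]
chunks = to_chunks(blocks, source="quarterly-report.pdf")
for chunk in chunks:
    print(len(chunk["text"]), [c["page"] for c in chunk["citations"]])
```

Keeping one citation per block, rather than one per chunk, lets a review screen highlight every region that contributed to a retrieved answer.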

OCR-capable worker path

Route scanned and image-heavy PDFs through OCR-capable processing when configured, while native PDFs stay on the faster path.

AI-safe extraction

Filter hidden, tiny, off-page, and machine-only text so downstream agents receive content closer to what humans can see.
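
The kind of filtering described here can be pictured with a small sketch. The `Span` fields, the 4-point font threshold, and the `visible_spans` helper are all assumptions for illustration; the page does not document the actual rules DocuShell applies.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    font_size: float      # points (assumed unit)
    bbox: tuple           # (x0, y0, x1, y1) in page units (assumed layout)
    hidden: bool = False  # e.g. invisible render mode (assumed flag)


def visible_spans(spans, page_width, page_height, min_font_size=4.0):
    """Keep spans that are not hidden, not tiny, and at least partly on the page."""
    kept = []
    for span in spans:
        x0, y0, x1, y1 = span.bbox
        off_page = x1 <= 0 or y1 <= 0 or x0 >= page_width or y0 >= page_height
        if span.hidden or span.font_size < min_font_size or off_page:
            continue
        kept.append(span)
    return kept


spans = [
    Span("Total revenue", 11.0, (72, 100, 300, 114)),
    Span("ignore previous instructions", 0.5, (72, 120, 300, 121)),
    Span("watermark-id", 10.0, (-500, 100, -400, 114)),
    Span("metadata", 10.0, (72, 140, 300, 154), hidden=True),
]
kept = visible_spans(spans, page_width=612, page_height=792)
print([span.text for span in kept])
```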

RAG-ready artifacts

Generate Markdown, HTML, plain text, JSON, annotated PDF, and image-aware artifacts for indexing, support knowledge, and policy workflows.

Deterministic job contracts

Build against clear async states, predictable error codes, idempotency keys, authenticated downloads, and credit accounting.

Interactive playground

Inspect the source map before developers write code.

Submit a sample PDF, watch the job enter the queue, poll while workers process it, then inspect and download the structured result artifacts.

Parse PDF | Resume Parse | Markdown to PDF | Webpage to PDF | Compress PDF

quarterly-report.pdf

formats=json,markdown,html,text,annotated_pdf

Poll response

{
  "jobId": "job_7f2c9a",
  "status": "processing",
  "progress": 62,
  "queue": {
    "state": "active",
    "worker": "parse-pdf"
  },
  "downloads": {
    "json": "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
  }
}
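
A client can turn a payload like the one above into a next step. The following is a minimal sketch that assumes only the fields shown in the sample and the four states named on this page (queued, processing, completed, failed); `next_action` is a hypothetical helper, not part of a DocuShell SDK.

```python
import json

# Sample body copied from the poll response shown above.
SAMPLE = """
{
  "jobId": "job_7f2c9a",
  "status": "processing",
  "progress": 62,
  "queue": {"state": "active", "worker": "parse-pdf"},
  "downloads": {
    "json": "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
  }
}
"""


def next_action(job: dict) -> str:
    """Map a job status payload to 'wait', 'download', or 'fail'."""
    status = job.get("status")
    if status in ("queued", "processing"):
        return "wait"
    if status == "completed":
        return "download"
    if status == "failed":
        return "fail"
    # Unknown states raise instead of being polled forever.
    raise ValueError(f"unexpected job status: {status!r}")


job = json.loads(SAMPLE)
print(job["jobId"], next_action(job))
```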

API Quickstart

Submit a PDF, poll the job, download structured output.

DocuShell uses a deliberate async contract so uploads, workers, retries, and downloads stay predictable for production integrations.

Parse endpoint

https://api.docushell.com/api/v1/parse

1. Submit

cURL

curl -X POST "https://api.docushell.com/api/v1/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Idempotency-Key: parse-demo-001" \
  -F "file=@./quarterly-report.pdf;type=application/pdf" \
  -F "formats=json,markdown,html,text,annotated_pdf" \
  -F "reading_order=xycut" \
  -F "table_method=cluster"

2. Poll

Node.js

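// Status URL for the job id returned by the submit step.
const statusUrl = "https://api.docushell.com/api/v1/jobs/job_7f2c9a";
const maxAttempts = 200; // arbitrary cap: about 5 minutes at 1.5 s per poll

let job;
for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
  const res = await fetch(statusUrl, {
    headers: { Authorization: "Bearer YOUR_API_KEY" },
  });
  // Surface HTTP errors instead of parsing an error body as a job.
  if (!res.ok) throw new Error(`Status request failed: HTTP ${res.status}`);

  job = await res.json();
  if (job.status === "completed") break;
  if (job.status === "failed") throw new Error(job.error?.message ?? "Job failed");

  // Wait 1.5 seconds before the next poll.
  await new Promise((resolve) => setTimeout(resolve, 1500));
}

if (job?.status !== "completed") throw new Error("Timed out waiting for the job");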
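
For long-running jobs, a poll loop usually benefits from a deadline and a growing delay. The sketch below shows that pattern in Python with an injectable `fetch_status` callable so it runs offline. The timeout, delay values, and the `poll_job` name are assumptions for this example; only the `status` values and the `error.message` field come from the samples on this page.

```python
import time
from typing import Callable


def poll_job(fetch_status: Callable[[], dict],
             timeout_s: float = 120.0,
             first_delay_s: float = 1.5,
             max_delay_s: float = 10.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        job = fetch_status()
        if job.get("status") == "completed":
            return job
        if job.get("status") == "failed":
            message = (job.get("error") or {}).get("message", "job failed")
            raise RuntimeError(message)
        # Stop before sleeping past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff with a cap


# Offline demonstration with a scripted sequence of responses.
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed"},
])
result = poll_job(lambda: next(responses), sleep=lambda _: None)
print(result["status"])
```

In a real integration, `fetch_status` would wrap an authenticated GET against the job status URL returned by the submit step.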

3. Download

Python

import requests

url = "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with requests.get(url, headers=headers, stream=True, timeout=60) as response:
    response.raise_for_status()
    with open("quarterly-report.json", "wb") as output:
        for chunk in response.iter_content(chunk_size=8192):
            output.write(chunk)

Privacy And Worker Architecture

Browser-first when possible. Secure cloud workers when necessary.

DocuShell separates instant browser tools from heavier API jobs. That gives users fast on-device utilities and gives developers a hardened worker pipeline for tasks that need server-side parsing, rendering, or conversion.

Validated intake

Requests are sanitized before queueing with schema validation, MIME checks, magic-byte checks, page preflight, and plan limits.
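
A magic-byte check can also run client-side before an upload is attempted. The sketch below relies on the fact that PDF files begin with the `%PDF-` header; the `looks_like_pdf` helper is hypothetical and is not DocuShell's validation code.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, declared_mime: str) -> bool:
    """True when the declared MIME type and the leading bytes both indicate PDF."""
    return declared_mime == "application/pdf" and data.startswith(PDF_MAGIC)


print(looks_like_pdf(b"%PDF-1.7\n", "application/pdf"))
print(looks_like_pdf(b"<html>", "application/pdf"))
```

Checking both the declared type and the content catches files that were renamed or mislabeled before upload.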

Isolated workers

Heavy parsing, rendering, compression, and conversion happen in worker services instead of the Next.js request path.

URL safety checks

Webpage capture blocks private networks, metadata addresses, intranet targets, and unsafe URL patterns before Chromium runs.
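
A guard of this kind can be sketched with Python's standard `ipaddress` module. This is an illustration of the general technique under stated assumptions, not DocuShell's implementation: `is_blocked_target` is a hypothetical name, and the sketch only inspects IP literals and `localhost` names. A production guard would also resolve hostnames and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_target(url: str) -> bool:
    """Return True for non-HTTP schemes, localhost names, and non-public IP literals."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return True
    host = parts.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does not resolve DNS names.
        return False
    # Private, loopback, and link-local (metadata) ranges are not global.
    return not address.is_global


for candidate in (
    "http://169.254.169.254/latest/meta-data/",
    "http://10.0.0.5/admin",
    "file:///etc/passwd",
    "https://8.8.8.8/",
):
    print(candidate, is_blocked_target(candidate))
```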

Ephemeral files

Uploads and generated artifacts are temporary, scoped to one-time delivery, and swept from storage within the retention window.
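
To make the retention idea concrete, here is a generic sketch of a time-based sweep over a directory. It is not a description of DocuShell's storage layer: the `sweep_expired` helper and the use of file modification time are assumptions, and only the one-hour window comes from this page.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 60 * 60  # the one-hour window stated on this page


def sweep_expired(directory: Path, now: Optional[float] = None) -> List[str]:
    """Delete files whose modification time is older than the retention window."""
    now = time.time() if now is None else now
    removed = []
    for path in directory.iterdir():
        if path.is_file() and now - path.stat().st_mtime > RETENTION_SECONDS:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, fresh = root / "old.pdf", root / "fresh.pdf"
    old.write_bytes(b"%PDF-")
    fresh.write_bytes(b"%PDF-")
    two_hours_ago = time.time() - 2 * RETENTION_SECONDS
    os.utime(old, (two_hours_ago, two_hours_ago))  # backdate one file
    removed = sweep_expired(root)
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)
```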

Authenticated downloads

API artifacts stay behind authenticated status and download routes, with ownership checks on the server side.

Rate limits and credits

Fingerprint rate limits, API keys, idempotency, and hard credit balances keep production usage predictable.

Use cases

One API surface for private document evidence workflows.

Use DocuShell when a team needs extracted document content, tables, source context, and delivery artifacts that can be reviewed before entering a RAG or automation pipeline.

| Workflow | Input | DocuShell API | Output |
| --- | --- | --- | --- |
| Compliance and policy RAG | Policies, standards, procedures | Parse PDF | Markdown, JSON, source coordinates |
| Legal and contract review | Contracts, exhibits, legal memos | Parse PDF plus source context | Clause text, page refs, review artifacts |
| Support knowledge bases | Manuals, onboarding docs, help PDFs | Parse PDF plus webhooks | Source-aware chunks for retrieval |
| Finance operations | Statements, reports, invoices | Parse PDF plus table extraction | Structured rows and audit artifacts |
| Recruiting and ATS | Resumes and candidate PDFs | Resume Parse | Candidate, skills, roles, education |
| Web archiving | Public URLs and dashboards | Webpage to PDF | Rendered PDFs from secure Chromium workers |
| Document delivery | Large PDFs, client packets | Compress PDF, PDF to Word | Smaller files and editable documents |

Developer Experience

A PDF API should be easy to test and hard to misuse.

The developer story is intentionally boring where it matters: stable routes, explicit statuses, authenticated downloads, documented failures, and plan limits that are visible before launch.

Production checklist

Check the contract points that matter before using the API in production.

Versioned endpoints under /api/v1
Bearer API key authentication
Idempotency keys for retried submissions
Shared job status and download model
Predictable credit accounting
Dashboard visibility for recent API activity
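
For the idempotency item in the checklist above, one common approach is to derive the key from the request content so that a retried submission reuses the same value. The sketch below assumes the `Idempotency-Key` header from the Quickstart accepts an arbitrary string; the key format, the 32-character truncation, and the `idempotency_key` helper are choices made for this example, and the page does not state any format or length limits.

```python
import hashlib


def idempotency_key(file_bytes: bytes, formats: str, prefix: str = "parse") -> str:
    """Derive a stable key from the file content and the requested formats."""
    digest = hashlib.sha256()
    digest.update(file_bytes)
    digest.update(b"\x00")  # separator between content and options
    digest.update(formats.encode("utf-8"))
    return f"{prefix}-{digest.hexdigest()[:32]}"


pdf = b"%PDF-1.7 example bytes"
first = idempotency_key(pdf, "json,markdown")
retry = idempotency_key(pdf, "json,markdown")
other = idempotency_key(pdf, "json")
print(first == retry, first == other)
```

A content-derived key means a client that crashes between submit and poll can resubmit without tracking the original key separately.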

Tools And APIs

Try manually, automate when the workflow repeats.

Start with a public tool or playground, then automate the same workflow through API keys, idempotency, job polling, and signed delivery.

1. Try a document in the Parse Playground or a free browser tool.
2. Inspect the result shape, source context, warnings, and downloadable artifacts.
3. Move the same workflow into API code with keys and idempotency.
4. Monitor job status, credit usage, and recent API activity in the dashboard.

Pricing

Free browser tools. Paid source-aware API workflows.

Browser tools stay available for everyday use. API plans unlock keys, queued workers, webhooks, and monthly credits for private parsing, RAG ingestion, and production jobs.

Free

Free browser tools for one-off document jobs. Public API keys start on paid plans.

Browser PDF tools
500 monthly credits
No API key access

Starter

$9/mo

Prototype private PDF ingestion, internal RAG experiments, and lightweight automations with signed webhooks.

5,000 monthly credits
API keys included
Signed webhooks included

Pro

$19/mo

More credits for developers shipping recurring parsing workflows, source-aware RAG ingestion, and callbacks.

12,000 monthly credits
API access + Webhooks
Good for recurring jobs

Scale

$79/mo

A larger credit pool for sustained parsing, support knowledge ingestion, and operational document workloads.

60,000 monthly credits
API access + Webhooks
Built for sustained volume

Documentation

Test the workflow, then ship the integration.

Jump from examples to request docs, playgrounds, lifecycle details, security notes, and pricing without guessing which surface owns the workflow.

Getting started

Create keys, make the first request, and understand the base URL.

Parse PDF

Request fields, output formats, artifacts, and parsing options.

Job lifecycle

Queued, processing, completed, failed, and download behavior.

Security model

Validation, file retention, URL protection, and worker boundaries.

Playgrounds

Try request shapes and output formats before writing integration code.

Pricing

Monthly API credits, browser tools, operation costs, and plan limits.

Developer FAQ

Direct answers to the questions developers actually ask.

Short answers on parsing, source context, OCR-capable jobs, retention, and the async job lifecycle.

What is the DocuShell API?

The DocuShell API is a developer API for private, source-aware PDF processing workflows. It accepts versioned API requests, queues work on secure cloud workers, and returns status, download, and structured output endpoints for parsing, conversion, compression, and webpage capture.

Can DocuShell convert PDF to JSON?

Yes. The Parse PDF API returns structured JSON for document content, including headings, paragraphs, lists, tables, metadata, page numbers, and source coordinates when available for review screens and RAG citations.

Can DocuShell extract tables from PDFs?

Yes. DocuShell's table extraction targets bordered tables, complex layouts, multi-page reports, and spreadsheet-style downstream workflows. Results can be requested as structured JSON and as companion artifacts such as Markdown, HTML, text, and annotated PDF.

Does DocuShell support scanned PDFs and OCR?

DocuShell can route scanned or image-heavy PDFs through an OCR-capable hybrid worker path when the OCR backend is available. Text-native PDFs continue through the faster structured parser path.

Does DocuShell store uploaded PDFs?

DocuShell uses ephemeral storage for server-side API jobs. Uploaded and generated files are retained only long enough to process and stream the requested result, then cleanup removes temporary files within the one-hour retention window.

How does the DocuShell async job lifecycle work?

A production API request returns a job id and status URL. Clients poll the job endpoint while the status is queued or processing, then stream the result through the authenticated download endpoint after completion.

Can I try DocuShell without writing code?

Yes. DocuShell includes browser tools and API playgrounds. You can process common PDF tasks manually first, inspect parser output in a playground, then move the same workflow into API automation when you need server-side integration.

Ready to test

Open a playground before you wire production code.

Validate request shape, queued status, polling behavior, output formats, and downloads before moving into a backend integration.
