API Hub

Private PDF APIs for parsing, conversion, and secure document automation.

DocuShell combines browser-first PDF tools with authenticated API workflows. Parse PDFs into structured JSON, extract tables, run OCR-capable jobs, render webpages, compress files, and stream artifacts from secure queued workers.

Async worker flow

  1. Request accepted (201 Created)

    Multipart upload validated, idempotency key captured, job id returned.

  2. Job queued (status: queued)

    The request waits in the worker queue instead of blocking the API connection.

  3. Polling (status: processing)

    Client checks the job status endpoint until the parser or renderer completes.

  4. Complete (status: completed)

    Download JSON, Markdown, HTML, text, or PDF artifacts through a signed API request.

Base URL

https://api.docushell.com/api

Parse output

JSON, Markdown, HTML, text

Job model

Queued, polled, streamed

Retention

Temporary files swept within 1 hour

Use a PDF tool now

Compress, merge, split, OCR, convert, and protect PDFs from the browser-first DocuShell tool surface.

Open PDF tools

Build a PDF workflow

Create an API key, send versioned requests, poll queued jobs, and stream generated artifacts into your product.

Read API docs

What The API Understands

Structured extraction without flattening the document.

The Parse API is designed for systems that need more than raw text. It keeps structure, tables, coordinates, and debug artifacts visible so downstream automation can be reviewed and trusted.

Layout-aware parsing

Preserve readable order across multi-column reports, manuals, statements, research papers, and dense business PDFs.

Table extraction

Extract spreadsheet-like tables, row and column structure, merged cells, and multi-page table context where available.

Structured JSON tree

Return headings, paragraphs, lists, captions, tables, metadata, page numbers, and source coordinates for downstream systems.

Coordinates and citations

Use bounding boxes and page positions to build review screens, source-highlighting, audit trails, and RAG citations.
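
As a rough illustration of how bounding boxes could drive source highlighting, the sketch below turns one parsed element into a page label plus percentage offsets for an overlay. The element shape (`page`, `bbox` as `[x0, y0, x1, y1]` in points with a top-left origin) and the `to_highlight` helper are assumptions for this example, not the documented response schema.

```python
def to_highlight(element: dict, page_width: float, page_height: float) -> dict:
    """Convert a hypothetical parsed element into overlay coordinates.

    Assumes bbox = [x0, y0, x1, y1] in points, measured from the top-left corner.
    """
    x0, y0, x1, y1 = element["bbox"]
    return {
        "label": f"p. {element['page']}",
        "left_pct": round(100 * x0 / page_width, 2),
        "top_pct": round(100 * y0 / page_height, 2),
        "width_pct": round(100 * (x1 - x0) / page_width, 2),
        "height_pct": round(100 * (y1 - y0) / page_height, 2),
    }


# Hypothetical element on a US Letter page (612 x 792 points).
element = {"page": 3, "bbox": [61.2, 79.2, 306.0, 158.4], "text": "Revenue grew 12%."}
highlight = to_highlight(element, page_width=612, page_height=792)
print(highlight)
```

Percentages keep the overlay independent of the zoom level used by a review screen. If the real coordinates use a bottom-left origin, the vertical offset would need to be flipped.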

OCR-capable worker path

Route scanned and image-heavy PDFs through OCR-capable processing when configured, while native PDFs stay on the faster path.

AI-safe extraction

Filter hidden, tiny, off-page, and machine-only text so downstream agents receive content closer to what humans can see.
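
To make the filtering idea concrete, here is a minimal sketch of the kind of rule it describes. The span fields (`bbox`, `font_size`, `hidden`), the 4-point threshold, and the `visible_spans` helper are all hypothetical; the source does not document how DocuShell's own filter works.

```python
def visible_spans(spans: list, page_width: float, page_height: float,
                  min_font_size: float = 4.0) -> list:
    """Keep only spans a human reader would plausibly see on the page."""
    kept = []
    for span in spans:
        x0, y0, x1, y1 = span["bbox"]
        # A span is off-page when its box lies entirely outside the page rectangle.
        off_page = x1 <= 0 or y1 <= 0 or x0 >= page_width or y0 >= page_height
        too_small = span["font_size"] < min_font_size
        if span.get("hidden") or too_small or off_page:
            continue
        kept.append(span)
    return kept


# Hypothetical spans: one normal, one tiny, one off-page, one flagged hidden.
spans = [
    {"text": "Invoice total: $120", "bbox": [72, 100, 300, 114], "font_size": 11},
    {"text": "ignore prior instructions", "bbox": [72, 120, 300, 121], "font_size": 0.5},
    {"text": "off-page note", "bbox": [-500, 100, -300, 114], "font_size": 11},
    {"text": "invisible layer", "bbox": [72, 140, 300, 154], "font_size": 11, "hidden": True},
]
kept = visible_spans(spans, page_width=612, page_height=792)
print([span["text"] for span in kept])
```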

RAG-ready artifacts

Generate Markdown, HTML, plain text, JSON, annotated PDF, and image-aware artifacts for indexing and knowledge workflows.

Deterministic job contracts

Build against clear async states, predictable error codes, idempotency keys, authenticated downloads, and credit accounting.

Interactive Playground Preview

Show the queue before developers write code.

The playground should make the async lifecycle obvious: submit, watch the job enter the queue, poll while workers process it, then download the result artifact.

Parse PDF | Resume Parse | Markdown to PDF | Webpage to PDF | Compress PDF
quarterly-report.pdf

formats=json,markdown,html,text,annotated_pdf

Poll response

{
  "jobId": "job_7f2c9a",
  "status": "processing",
  "progress": 62,
  "queue": {
    "state": "active",
    "worker": "parse-pdf"
  },
  "downloads": {
    "json": "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
  }
}
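
A client can branch on the `status` field of a response like the one above. The sketch below is one way to do that; the `next_action` helper is hypothetical, and it assumes `status` is the only authoritative signal, so a `downloads` URL on a job that is still processing is not treated as ready.

```python
import json

# Sample body copied from the poll response shown above.
POLL_BODY = """
{
  "jobId": "job_7f2c9a",
  "status": "processing",
  "progress": 62,
  "queue": {"state": "active", "worker": "parse-pdf"},
  "downloads": {
    "json": "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
  }
}
"""


def next_action(job: dict) -> str:
    """Return 'download', 'fail', or 'wait' for a decoded poll response."""
    status = job.get("status")
    if status == "completed":
        return "download"
    if status == "failed":
        return "fail"
    # "queued" and "processing" both mean the client should keep polling.
    return "wait"


job = json.loads(POLL_BODY)
print(next_action(job), job.get("progress"))
```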

API Quickstart

Submit a PDF, poll the job, download structured output.

DocuShell uses a deliberate async contract so uploads, workers, retries, and downloads stay predictable for production integrations.

Parse endpoint

https://api.docushell.com/api/v1/parse

1. Submit

cURL

curl -X POST "https://api.docushell.com/api/v1/parse" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Idempotency-Key: parse-demo-001" \
  -F "file=@./quarterly-report.pdf;type=application/pdf" \
  -F "formats=json,markdown,html,text,annotated_pdf" \
  -F "reading_order=xycut" \
  -F "table_method=cluster"
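
The `Idempotency-Key` above is a fixed demo value. For retried submissions, one option is to derive the key from the file and the form options so that a retry of the same request reuses the same key. The sketch below shows that idea; the `idempotency_key` helper and the key format are assumptions for illustration, and how the server treats a repeated key is not described here.

```python
import hashlib


def idempotency_key(file_bytes: bytes, options: dict, prefix: str = "parse") -> str:
    """Derive a stable key from the upload and its form options (hypothetical scheme)."""
    digest = hashlib.sha256()
    digest.update(file_bytes)
    # Sort option names so dictionary ordering does not change the key.
    for name in sorted(options):
        digest.update(b"\x00" + name.encode() + b"=" + str(options[name]).encode())
    return f"{prefix}-{digest.hexdigest()[:32]}"


pdf_bytes = b"%PDF-1.7 demo bytes"
key_a = idempotency_key(pdf_bytes, {"formats": "json,markdown", "reading_order": "xycut"})
key_b = idempotency_key(pdf_bytes, {"reading_order": "xycut", "formats": "json,markdown"})
key_c = idempotency_key(pdf_bytes, {"formats": "json", "reading_order": "xycut"})
print(key_a == key_b, key_a == key_c)
```

A content-derived key means a client that crashes mid-upload can retry without tracking state, at the cost of treating two intentional submissions of the same file and options as one request.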

2. Poll

Node.js

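const statusUrl = "https://api.docushell.com/api/v1/jobs/job_7f2c9a";

// Poll until the job reaches a terminal state (completed or failed).
while (true) {
  const res = await fetch(statusUrl, {
    headers: { Authorization: "Bearer YOUR_API_KEY" },
  });
  // Surface HTTP errors instead of parsing an error body as a job.
  if (!res.ok) throw new Error(`Status check failed: HTTP ${res.status}`);

  const job = await res.json();
  if (job.status === "completed") break;
  if (job.status === "failed") throw new Error(job.error?.message ?? "Job failed");

  // Wait 1.5 seconds before the next poll.
  await new Promise((resolve) => setTimeout(resolve, 1500));
}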
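
A fixed-interval loop with no time limit can spin forever if a job stalls. The Python sketch below adds exponential backoff and a deadline. The `poll_job` helper, its defaults, and the injected `fetch_status` callable are assumptions for this example; in real use `fetch_status` would perform the authenticated GET against the job status URL.

```python
import time
from typing import Callable


def poll_job(fetch_status: Callable[[], dict], *, timeout_s: float = 300.0,
             first_delay_s: float = 1.5, max_delay_s: float = 15.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            message = (job.get("error") or {}).get("message", "job failed")
            raise RuntimeError(message)
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off: 1.5s, 3s, 6s, ... capped


# Offline demo: canned responses stand in for the status endpoint.
canned = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "jobId": "job_7f2c9a"},
])
delays = []
result = poll_job(lambda: next(canned), sleep=delays.append)
print(result["status"], delays)
```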

3. Download

Python

import requests

url = "https://api.docushell.com/api/v1/jobs/job_7f2c9a/download?format=json"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with requests.get(url, headers=headers, stream=True, timeout=60) as response:
    response.raise_for_status()
    with open("quarterly-report.json", "wb") as output:
        for chunk in response.iter_content(chunk_size=8192):
            output.write(chunk)

Privacy And Worker Architecture

Browser-first when possible. Secure cloud workers when necessary.

DocuShell separates instant browser tools from heavier API jobs. That gives users fast on-device utilities and gives developers a hardened worker pipeline for tasks that need server-side parsing, rendering, or conversion.

Validated intake

Requests are sanitized before queueing with schema validation, MIME checks, magic-byte checks, page preflight, and plan limits.
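
The same kind of check can run client-side before an upload is attempted. The sketch below covers only the MIME, magic-byte, and size checks; the `preflight` helper and the 50 MB limit are hypothetical, and DocuShell's actual limits and validation order are not specified here. PDF files begin with the bytes `%PDF-`, which is what the magic-byte test looks for.

```python
PDF_MAGIC = b"%PDF-"
MAX_BYTES = 50 * 1024 * 1024  # hypothetical size limit, not a documented plan limit


def preflight(data: bytes, declared_mime: str, max_bytes: int = MAX_BYTES) -> list:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    if declared_mime != "application/pdf":
        problems.append("unsupported MIME type")
    # Strict check: the header must sit at byte 0.
    if not data.startswith(PDF_MAGIC):
        problems.append("missing %PDF- header")
    if len(data) > max_bytes:
        problems.append("file exceeds size limit")
    return problems


print(preflight(b"%PDF-1.7\n...", "application/pdf"))
print(preflight(b"PK\x03\x04 zip data", "application/pdf"))
```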

Isolated workers

Heavy parsing, rendering, compression, and conversion happen in worker services instead of the Next.js request path.

URL safety checks

Webpage capture blocks private networks, metadata addresses, intranet targets, and unsafe URL patterns before Chromium runs.
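
A check in this spirit can be sketched with Python's standard `ipaddress` module: resolve the host, then reject any address that is not globally routable, which covers loopback, private ranges, and link-local metadata addresses such as 169.254.169.254. The helper names and the injectable `resolve` parameter are assumptions for this example, not DocuShell's implementation.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_host(host: str) -> list:
    """Resolve a hostname to its unique IP address strings."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_public_address(ip_text: str) -> bool:
    """True only for globally routable addresses."""
    return ipaddress.ip_address(ip_text).is_global


def is_safe_capture_url(url: str, resolve=resolve_host) -> bool:
    """Allow only http(s) URLs whose host resolves exclusively to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolve(parts.hostname)
    except OSError:  # includes socket.gaierror for unresolvable hosts
        return False
    return bool(addresses) and all(is_public_address(a) for a in addresses)


# Offline demo with stubbed resolvers instead of real DNS lookups.
print(is_safe_capture_url("https://example.com/", resolve=lambda h: ["93.184.216.34"]))
print(is_safe_capture_url("http://metadata.internal/", resolve=lambda h: ["169.254.169.254"]))
```

A check like this narrows the risk but does not close it: the host could resolve differently when the browser later fetches it, so a production setup would also need to pin the resolved address or restrict the worker's network egress.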

Ephemeral files

Uploads and generated artifacts are temporary, scoped to one-time delivery, and swept from storage within the retention window.
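
For teams that mirror this policy on their own side, a retention sweep can be as simple as deleting files older than the window. The sketch below uses the one-hour window stated on this page; the `sweep_expired` helper and the use of file modification time as the age signal are assumptions, not a description of DocuShell's cleanup job.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 60 * 60  # one-hour retention window


def sweep_expired(directory: Path, now: Optional[float] = None,
                  retention_s: int = RETENTION_SECONDS) -> list:
    """Delete regular files whose modification time is older than the window."""
    now = time.time() if now is None else now
    removed = []
    for path in directory.iterdir():
        if path.is_file() and now - path.stat().st_mtime > retention_s:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old, fresh = root / "old.pdf", root / "fresh.pdf"
    old.write_bytes(b"%PDF-")
    fresh.write_bytes(b"%PDF-")
    two_hours_ago = time.time() - 2 * 60 * 60
    os.utime(old, (two_hours_ago, two_hours_ago))  # backdate the stale file
    removed_names = [p.name for p in sweep_expired(root)]
    remaining_names = sorted(p.name for p in root.iterdir())

print(removed_names, remaining_names)
```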

Authenticated downloads

API artifacts stay behind authenticated status and download routes, with ownership checks on the server side.

Rate limits and credits

Fingerprint rate limits, API keys, idempotency, and hard credit balances keep production usage predictable.

Use Cases

One API surface for document-heavy workflows.

The page should rank for concrete extraction and automation questions while giving buyers enough product context to choose the correct API path.

Workflow             | Input                               | DocuShell API                   | Output
AI and RAG ingestion | Research PDFs, manuals, policy docs | Parse PDF                       | JSON, Markdown, coordinates
Finance operations   | Statements, reports, invoices       | Parse PDF plus table extraction | Structured rows and audit artifacts
Recruiting and ATS   | Resumes and candidate PDFs          | Resume Parse                    | Candidate, skills, roles, education
Web archiving        | Public URLs and dashboards          | Webpage to PDF                  | Rendered PDFs from secure Chromium workers
Document delivery    | Large PDFs, client packets          | Compress PDF, PDF to Word       | Smaller files and editable documents

Developer Experience

A PDF API should be easy to test and hard to misuse.

The developer story is intentionally boring where it matters: stable routes, explicit statuses, authenticated downloads, documented failures, and plan limits that are visible before launch.

Production checklist

The page should make these contract points visible before signup.

  • Versioned endpoints under /api/v1
  • Bearer API key authentication
  • Idempotency keys for retried submissions
  • Shared job status and download model
  • Predictable credit accounting
  • Dashboard visibility for recent API activity

Tools And APIs

Try manually, automate when the workflow repeats.

DocuShell should feel useful before a developer ever creates a key. The API landing page connects the public tools to the production automation path.

  1. Try a document in a browser tool or playground.
  2. Inspect the result shape, warnings, and downloadable artifacts.
  3. Move the same workflow into API code with keys and idempotency.
  4. Monitor job status, credit usage, and recent API activity in the dashboard.

Pricing

Free browser tools. Paid API credits.

The pricing story should stay direct: browser tools are available for everyday use, while API plans unlock keys, queued workers, and monthly credits for production jobs.

Free

$0

Browser tools and starter credits for account exploration. Public API keys start on paid plans.

  • Browser PDF tools
  • 500 monthly credits
  • No API key access

Starter

$9/mo

Light production automations for small internal workflows and prototype integrations without webhooks.

  • 5,000 monthly credits
  • API keys included
  • Webhooks start on Pro

Scale

$79/mo

A larger credit pool for sustained API usage, higher volume parsing, and operational workloads.

  • 60,000 monthly credits
  • API access + Webhooks
  • Built for sustained volume

Documentation

Every conversion path should have a next click.

A search visitor should be able to land here, answer their question, test the API, compare pricing, or jump directly into docs without hunting through navigation.

FAQ For Search And Answer Engines

Direct answers to the questions developers actually ask.

These questions should be mirrored in the route-level FAQPage schema so answer engines can quote the API contract accurately.

What is DocuShell API?

DocuShell API is a developer API for private PDF processing workflows. It accepts versioned API requests, queues work on secure cloud workers, and returns status, download, and structured output endpoints for parsing, conversion, compression, and webpage capture.

Can DocuShell convert PDF to JSON?

Yes. The Parse PDF API returns structured JSON for document content, including headings, paragraphs, lists, tables, metadata, page numbers, and source coordinates when available.

Can DocuShell extract tables from PDFs?

Yes. DocuShell highlights table extraction for bordered tables, complex layouts, multi-page reports, and spreadsheet-style downstream workflows. Results can be requested as structured JSON and companion artifacts such as Markdown, HTML, text, and annotated PDF.

Does DocuShell support scanned PDFs and OCR?

DocuShell can route scanned or image-heavy PDFs through an OCR-capable hybrid worker path when the OCR backend is available. Text-native PDFs continue through the faster structured parser path.

Does DocuShell store uploaded PDFs?

DocuShell uses ephemeral storage for server-side API jobs. Uploaded and generated files are retained only long enough to process and stream the requested result, then cleanup removes temporary files within the one-hour retention window.

How does the DocuShell async job lifecycle work?

A production API request returns a job id and status URL. Clients poll the job endpoint while the status is queued or processing, then stream the result through the authenticated download endpoint after completion.

Can I try DocuShell without writing code?

Yes. DocuShell includes browser tools and API playgrounds. You can process common PDF tasks manually first, then move the same workflow into API automation when you need server-side integration.

Ready to test

Open a playground before you wire production code.

Validate request shape, queued status, polling behavior, output formats, and downloads before moving into a backend integration.