Developer API · June 20, 2026 · 12 min read

Document Extraction API — Invoices, Receipts & Statements

The FlowParse document extraction API turns financial documents — invoices, receipts and bank statements — into clean, structured JSON over one REST endpoint. `POST /api/v1/extract` classifies the document, extracts every field and line item or transaction, and returns a stable, typed schema you can store or pipe into validation, export, reconciliation and Smart Merge. No per-vendor templates, no OCR plumbing — one key, one contract, every financial document type.

FlowParse
flowparse.io

One API for every financial document

A document extraction API replaces a wall of bespoke parsers with a single endpoint that understands documents by meaning. FlowParse's `POST /api/v1/extract` ingests a PDF, scan, image, XLSX or CSV, classifies it as an invoice, receipt, bank statement (or a mixed document), and returns typed JSON for that type: supplier, totals and `line_items` for an invoice; the account header and `transactions` for a statement.

It's intelligent document processing (IDP) you can call from code: classification, OCR, field extraction, table reconstruction and validation behind one bearer key. Because it generalises across layouts, a supplier or bank you've never seen works on the first request — no template to author, no rule to maintain. For statement-specific behaviour see the bank statement API; for the generic contract see the PDF to JSON API.

Every response shares one envelope — `{ type, pages, billedPages, data }` — and the inner `data` is the same snake_case schema the rest of the API consumes. That uniformity is the point: classify, extract, validate, export and reconcile all speak the same shape, so your integration is a short, linear pipeline rather than a pile of adapters.


Extract any document in one call

Base64-encode the document and POST it. You don't need to tell the API what it is — classification is automatic and returned in the response `type`. Create a key in the API dashboard.

POST /api/v1/extract
curl -X POST https://flowparse.io/api/v1/extract \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "file": "JVBERi0xLjcK...", "filename": "receipt.pdf" }'
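The same request from Python, as a minimal standard-library sketch. It only builds the request and does not send it; the helper name `build_extract_request` is hypothetical, and the body keys (`file`, `filename`) and headers are copied from the curl call above.

```python
import base64
import json
import urllib.request

EXTRACT_URL = "https://flowparse.io/api/v1/extract"


def build_extract_request(file_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/extract request.

    Hypothetical helper mirroring the curl example: the raw document bytes
    are base64-encoded into "file", and the key travels as a bearer token.
    """
    body = {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Usage (reads a local file and calls the network; not executed here):
# with open("receipt.pdf", "rb") as fh:
#     req = build_extract_request(fh.read(), "receipt.pdf", "pf_live_xxx")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     envelope = json.load(resp)
```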

What it extracts, by type

The API auto-classifies and returns the fields that matter for each type. A mixed document (an invoice with an attached statement, say) returns both objects.

| Type | Extracted fields | Array |
|---|---|---|
| `invoice` | `supplier_name`, `invoice_number`, `invoice_date`, `subtotal`, `tax_amount`, `total`, `currency` | `line_items[]` |
| `receipt` | `merchant`, `date`, `total`, `tax`, `payment_method` | `line_items[]` |
| `bank_statement` | `bank_name`, `account_holder`, `currency`, `opening_balance`, `closing_balance` | `transactions[]` |
| `mixed` | an `invoice` object and a `bank_statement` object together | both |

A receipt, structured

200 OK — receipt
{
  "type": "receipt",
  "pages": 1,
  "billedPages": 1,
  "data": {
    "type": "invoice",
    "data": {
      "supplier_name": "Corner Cafe",
      "invoice_date": "2024-10-02",
      "currency": "USD",
      "tax_amount": 1.20,
      "total": 15.20,
      "line_items": [
        { "description": "Flat white", "quantity": 2, "unit_price": 5.00, "amount": 10.00 },
        { "description": "Croissant",  "quantity": 1, "unit_price": 4.00, "amount": 4.00 }
      ]
    }
  }
}
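A quick local cross-check of the receipt above, as a sketch rather than a substitute for `/api/v1/validate`. It assumes the nesting in this sample (the envelope's `data` holds its own `type` and `data`) and expects numbers parsed as `Decimal` so currency sums compare exactly; `unwrap` and `check_invoice_math` are hypothetical helper names.

```python
import json
from decimal import Decimal
from typing import List, Tuple


def unwrap(envelope: dict) -> Tuple[str, dict]:
    """Return (schema type, fields), assuming envelope["data"] = {"type", "data"}."""
    inner = envelope["data"]
    return inner["type"], inner["data"]


def check_invoice_math(fields: dict) -> List[str]:
    """Recompute line amounts and totals; return a list of mismatches (empty = OK)."""
    problems = []
    lines_sum = Decimal("0")
    for i, item in enumerate(fields.get("line_items") or []):
        expected = item["quantity"] * item["unit_price"]
        if expected != item["amount"]:
            problems.append(f"line {i}: {item['quantity']} x {item['unit_price']} != {item['amount']}")
        lines_sum += item["amount"]
    subtotal = fields.get("subtotal")
    if subtotal is not None and subtotal != lines_sum:
        problems.append(f"subtotal {subtotal} != sum of lines {lines_sum}")
    tax = fields.get("tax_amount") or Decimal("0")
    if lines_sum + tax != fields["total"]:
        problems.append(f"lines {lines_sum} + tax {tax} != total {fields['total']}")
    return problems


# Parse with parse_float=Decimal so 14.00 + 1.20 compares exactly to 15.20:
# envelope = json.loads(response_text, parse_float=Decimal)
# kind, fields = unwrap(envelope)
# print(check_invoice_math(fields))
```

On the sample, 2 × 5.00 + 1 × 4.00 = 14.00 and 14.00 + 1.20 = 15.20, so `check_invoice_math` returns an empty list.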

Validation as a quality gate

Extraction is only useful if you can trust it, so the API ships a deterministic validator. `POST /api/v1/validate` returns a 0–100 quality score, a grade and concrete checks: invoice totals and tax math, statement balance reconciliation, duplicate and out-of-order rows, low-confidence fields. Auto-accept high grades, queue the rest. The full rule set is documented on the validation engine page, and the AI VAT auditor adds tax-specific review on top.

POST /api/v1/validate
curl -X POST https://flowparse.io/api/v1/validate \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "type": "invoice", "data": { ... } }'
# → { "validations": [ { "score": { "value": 96, "grade": "A" }, "checks": [ ... ] } ] }
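Branching on the result can be as small as the function below, a sketch that assumes the response shape in the comment above. The accept policy (`ACCEPT_GRADES`, `MIN_SCORE`) is a hypothetical example to tune to your own risk tolerance. `validate_body` also assumes the inner `{ type, data }` object from an extract response can be sent to `/api/v1/validate` unchanged, which matches the request body shown here but is worth confirming in the API docs.

```python
ACCEPT_GRADES = {"A"}  # hypothetical policy: only grade A goes straight through
MIN_SCORE = 90         # hypothetical floor on score.value


def validate_body(envelope: dict) -> dict:
    """Assumed mapping: the extract envelope's inner object is the validate body."""
    inner = envelope["data"]
    return {"type": inner["type"], "data": inner["data"]}


def decide(validate_response: dict) -> str:
    """Return "accept" or "review" for a /api/v1/validate response.

    Every entry in "validations" must pass; an empty list goes to review
    rather than being auto-accepted.
    """
    validations = validate_response.get("validations") or []
    if not validations:
        return "review"
    for entry in validations:
        score = entry["score"]
        if score["grade"] not in ACCEPT_GRADES or score["value"] < MIN_SCORE:
            return "review"
    return "accept"
```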

Classify → extract → act

1. Authenticate: send your key as `Authorization: Bearer pf_live_…`.
2. Extract: POST the base64 document to `/api/v1/extract`; read the classified `type` and typed `data`.
3. Validate: score `data` with `/api/v1/validate` and branch on the grade for straight-through processing versus human review.
4. Export: turn invoices and statements into XLSX/CSV/XML or accounting files (QBO/QFX/OFX/Xero/DATEV/1С) via `/api/v1/export`.
5. Reconcile: match invoices to bank payments with `/api/v1/reconcile`, or merge a batch with `/api/v1/merge`.
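The steps compose into one function. This sketch covers steps 1 to 3 and takes a `post(path, body)` callable (hypothetical) that attaches the bearer key and returns decoded JSON, so it can be exercised with a stub. The `data.type`/`data.data` nesting follows the receipt sample; export and reconcile are left as a comment because their request bodies aren't given on this page.

```python
import base64
from typing import Callable

# Hypothetical transport: (path, JSON body) -> decoded JSON response.
# Step 1 (the Authorization: Bearer header) lives inside this callable.
Post = Callable[[str, dict], dict]


def process_document(post: Post, file_bytes: bytes, filename: str, min_score: int = 90) -> dict:
    """Run extract -> validate -> route for one document."""
    # Step 2: extract. The classified type comes back on the envelope.
    envelope = post("/api/v1/extract", {
        "file": base64.b64encode(file_bytes).decode("ascii"),
        "filename": filename,
    })
    inner = envelope["data"]  # assumed {"type": ..., "data": {...}}, as in the sample

    # Step 3: validate the inner object and branch on the lowest score.
    result = post("/api/v1/validate", {"type": inner["type"], "data": inner["data"]})
    scores = [v["score"]["value"] for v in result.get("validations", [])]
    route = "accept" if scores and min(scores) >= min_score else "review"

    # Steps 4-5 (export, reconcile) would run here for accepted documents.
    return {
        "type": envelope["type"],
        "billed_pages": envelope.get("billedPages"),
        "scores": scores,
        "route": route,
    }
```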


Beyond extraction: export and reconcile

Most pipelines need more than JSON. `POST /api/v1/export` converts any extracted document into XLSX, CSV, XML, QuickBooks/Quicken (`.QBO`/`.QFX`/`.OFX`), Xero, DATEV or 1С (base64-encoded; previews are free). `POST /api/v1/reconcile` matches a set of invoices against bank payments and returns the matched and unmatched items plus a reconciliation report, using the same engine as the reconciliation feature. And `POST /api/v1/merge` consolidates up to 100 documents into one reconciled Excel workbook.

Together these turn the extraction API into a full back-office automation surface: capture an invoice, validate it, export it to your ledger, then reconcile it against the bank statement you extracted from the same API — no manual re-keying anywhere in the chain.
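Because a merge accepts at most 100 documents, larger runs need splitting first. A minimal sketch; `merge_batches` is a hypothetical helper, and how each batch is then sent to `/api/v1/merge` is left to the request format in the API docs.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MERGE_LIMIT = 100  # documents per /api/v1/merge call, per this page


def merge_batches(docs: Sequence[T], limit: int = MERGE_LIMIT) -> Iterator[List[T]]:
    """Yield consecutive, order-preserving batches of at most `limit` documents."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(docs), limit):
        yield list(docs[start:start + limit])
```

For 250 extracted documents this yields three merges of 100, 100 and 50, each producing its own workbook.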


Pricing, keys and limits

Extraction and file exports bill per page from your page balance; classification is part of extraction, and validation, reconciliation and previews are free. Over-budget calls return `429` with the exact shortfall. Manage keys, rotate per environment, and watch per-key usage in the dashboard. The per-page rate and plans are on the pricing page, the complete reference is in the API docs, and you can try requests in the playground.

| Capability | Endpoint | Billing |
|---|---|---|
| Extract any document → JSON | `POST /api/v1/extract` | Per page |
| Validate / quality score | `POST /api/v1/validate` | Free |
| Export to file / accounting | `POST /api/v1/export` | Per page (preview free) |
| Reconcile invoices ↔ payments | `POST /api/v1/reconcile` | Free |
| Merge many → one Excel | `POST /api/v1/merge` | Per page (preview free) |
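Since only some calls draw on the page balance, you can estimate a run before starting it and avoid a mid-batch `429`. A sketch that encodes the billing column above in pages; it assumes (hypothetically) that a billed export or merge costs the same page count as its source documents, which you should confirm against the pricing page. `pages_to_bill` and `shortfall` are illustrative names.

```python
from typing import List, Tuple

# Billing column from the table above: True = draws on the page balance.
BILLED = {
    "/api/v1/extract": True,
    "/api/v1/validate": False,
    "/api/v1/export": True,     # previews are free
    "/api/v1/reconcile": False,
    "/api/v1/merge": True,      # previews are free
}


def pages_to_bill(endpoint: str, pages: int, preview: bool = False) -> int:
    """Pages a single call would draw; assumes billed calls cost `pages` (hypothetical)."""
    if endpoint not in BILLED:
        raise KeyError(f"unknown endpoint: {endpoint}")
    return 0 if preview or not BILLED[endpoint] else pages


def shortfall(plan: List[Tuple[str, int, bool]], balance: int) -> int:
    """Pages missing from `balance` to run every (endpoint, pages, preview) call."""
    needed = sum(pages_to_bill(ep, pages, preview) for ep, pages, preview in plan)
    return max(0, needed - balance)
```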

What teams automate with it

Accounts payable

Extract supplier invoices, validate totals and tax, and post to the ledger without manual entry.

Expense & receipts

Turn receipt photos into line-item data for expense reports and reimbursement.

Lending & finance ops

Combine statement and invoice extraction to assess cash flow and obligations.

Vertical SaaS

Embed document capture in your product so customers upload PDFs and you get structured data.

Why AI extraction, not templates or rules

Traditional intelligent-document-processing stacks are built from per-template rules: you define where each field lives for each vendor or document layout, and maintain that library forever. It works until a supplier redesigns an invoice, a bank moves a column, or a customer uploads a format nobody anticipated — then the rule misfires silently and bad data flows downstream. The whole approach scales badly because every new layout is an engineering ticket.

An AI document extraction API generalises instead of memorising. It understands that 'the total is the largest tax-inclusive amount near the bottom' or 'this column of dated rows is a transaction table' regardless of exact position, font or wording, so a document it has never seen returns the same clean schema on the first request. That's the difference between an automation that needs constant babysitting and one that just keeps working as your document mix grows. Pair it with the deterministic validation engine and you get generalisation *and* a hard correctness check — the combination most rule-based stacks lack.


Integrate from any stack

There's no SDK to adopt and no language constraint: the API is JSON over HTTPS, so any HTTP client works. The shape is always the same: base64-encode the document, POST it to `/api/v1/extract`, read the classified `type` and typed `data`, then branch your logic on the type. Because every endpoint shares that schema, a thin internal wrapper of `extract`, `validate`, `export` and `reconcile` functions is usually all you need, and it keeps working because the contract is versioned under `/api/v1`.
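A thin wrapper in that spirit, sketched with Python's standard library. `FlowParseClient` and `FlowParseError` are hypothetical names; `extract` and `validate` use the request bodies from the curl examples earlier, while `export` and `reconcile` pass their bodies through unchanged because this page doesn't spell out those schemas.

```python
import json
import urllib.error
import urllib.request


class FlowParseError(Exception):
    """Raised for non-2xx responses; `status` is the HTTP code (400, 401, 422, 429, 503...)."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


class FlowParseClient:
    """Hypothetical thin wrapper: one method per endpoint, all sharing post()."""

    def __init__(self, api_key: str, base_url: str = "https://flowparse.io", timeout: float = 120.0):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def post(self, path: str, body: dict) -> dict:
        req = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {self.api_key}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req, timeout=self.timeout) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # The error body carries details, e.g. the page shortfall on a 429.
            raise FlowParseError(err.code, err.read().decode("utf-8", "replace")) from err

    def extract(self, file_b64: str, filename: str) -> dict:
        return self.post("/api/v1/extract", {"file": file_b64, "filename": filename})

    def validate(self, doc: dict) -> dict:
        return self.post("/api/v1/validate", doc)

    def export(self, body: dict) -> dict:
        return self.post("/api/v1/export", body)      # schema: see the API docs

    def reconcile(self, body: dict) -> dict:
        return self.post("/api/v1/reconcile", body)   # schema: see the API docs
```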

A clean pattern at volume is a queue plus workers: enqueue each uploaded document, have a worker call extract then validate, write the result and notify your own system — effectively your own webhook, fully under your control. Make jobs idempotent on a file hash so retries don't duplicate records, cap worker concurrency so a big batch can't drain your page budget at once, and log the billed pages and validation grade for a complete audit trail. The API docs have copy-paste curl, and the playground runs real requests in the browser.
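An in-process sketch of the idempotency and concurrency parts, with `ThreadPoolExecutor` standing in for a real job queue and a plain dict standing in for your database. `run_batch` and the `process` callback (for example, extract then validate) are hypothetical; a production setup would use a durable queue so jobs survive restarts.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

Process = Callable[[bytes], dict]  # hypothetical: e.g. extract, then validate


def run_batch(files: Iterable[bytes], process: Process, done: Dict[str, dict],
              max_workers: int = 4) -> Dict[str, Exception]:
    """Process each unique file once, with at most `max_workers` calls in flight.

    `done` maps a SHA-256 hex digest to a stored result and stands in for your
    database: files already in it are skipped, so a rerun never re-extracts
    (or re-bills) them. Failures are returned and left out of `done`, so the
    next run retries them.
    """
    pending: Dict[str, bytes] = {}
    for data in files:
        key = hashlib.sha256(data).hexdigest()
        if key not in done:
            pending.setdefault(key, data)  # duplicates within the batch collapse here

    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(process, data) for key, data in pending.items()}
        for key, future in futures.items():
            try:
                done[key] = future.result()
            except Exception as exc:
                failures[key] = exc
    return failures
```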

Status codes, billing and limits

Standard HTTP codes make error handling simple. `200` carries the structured JSON; `400` is a malformed request; `401` is a bad or missing key; `422` means the document was unreadable or had nothing extractable (the file is not billed); `429` means the page budget is exhausted, with the exact shortfall in the message; and `503` is transient, so retry with backoff. Extraction and file exports bill per page from your page balance; classification is included in extraction, and validation, reconciliation and previews are free.

Documents up to 20 MB, including multi-page files, are supported. For high throughput, extract each document individually and in parallel rather than in one giant call, and consolidate the results with Smart Merge when you need a single workbook. You can build and test the entire flow for free against validation and previews, then enable billed extraction at go-live; plans and the per-page rate are on the pricing page.

| Code | Meaning | Action |
|---|---|---|
| `200` | Success: classified JSON returned | Branch on `type`, then process |
| `400` | Bad request (file/base64) | Fix the request body |
| `401` | Invalid or missing key | Check `Authorization` / rotate key |
| `422` | Unreadable / nothing extractable (not billed) | Re-scan or send a cleaner file |
| `429` | Page budget exhausted | Top up or upgrade, then retry |
| `503` | Temporarily unavailable | Retry with backoff |
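The table translates into a small retry policy: back off on `503`, stop on everything else and surface the action. A sketch with capped exponential backoff and full jitter; `HTTPStatusError` is a hypothetical exception your HTTP layer would raise, and the attempt count and delays are illustrative defaults rather than values recommended by the API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Actions from the status table; only 503 is retried automatically.
ACTIONS = {
    400: "fix the request body",
    401: "check Authorization / rotate key",
    422: "re-scan or send a cleaner file",
    429: "top up or upgrade, then retry",
    503: "retry with backoff",
}
RETRYABLE = {503}


class HTTPStatusError(Exception):
    """Hypothetical error your HTTP layer raises for non-2xx responses."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}: {ACTIONS.get(status, 'unexpected status')}")
        self.status = status


def with_backoff(call: Callable[[], T], attempts: int = 5, base: float = 1.0,
                 cap: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying 503s with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except HTTPStatusError as err:
            if err.status not in RETRYABLE or attempt == attempts - 1:
                raise  # 429 and friends need a fix or a top-up, not a tight loop
            # Wait a random time in [0, min(cap, base * 2**attempt)] seconds.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("unreachable")
```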

Classification and accuracy you can gate on

A document extraction API earns its place only if you can trust what it returns, so two things matter: correct classification and correct fields. Classification comes first — the engine decides whether a file is an invoice, a receipt, a bank statement or a mixed document, and returns that in the response `type` so your code branches correctly without you sniffing the file yourself. Get this wrong and everything downstream is wrong; get it right and each document flows to the right handler automatically.

Field accuracy is then protected the same way across every type. The engine reconstructs tables from the document's own geometry rather than guessing from flat text, types every value (numbers as numbers, dates as ISO-8601), and records lower confidence instead of inventing data when a source is ambiguous. You convert that into a decision with `/api/v1/validate`, which scores invoices on totals and tax math and statements on balance reconciliation, duplicates and date order. Auto-accept the clean ones, review the rest — and keep the original values in `raw_table` for a complete audit trail. That combination of broad coverage and a hard, per-document check is what makes the API safe to run unattended.


Best practices for document automation

Treat the extraction API as one step in a pipeline, not a magic box. Validate every document and branch on the grade so a human only ever sees the genuinely ambiguous cases. Prefer original digital files over scans where you can, since the text layer is read exactly. Persist the classified type, the validation score and `billedPages` with each record so you have a queryable audit trail of what was captured, what it cost and what was auto-accepted versus reviewed — invaluable when finance or compliance asks how a number got into the books.
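One way to shape that audit row, as a sketch. It keeps only a file hash, the classified type, `billedPages`, the lowest validation score and the decision, so no document contents (and no PII) end up in logs. `AuditRecord` and its field names are hypothetical; the response shapes follow the examples earlier on this page.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    file_sha256: str        # identifies the file without storing its contents
    doc_type: str           # classified `type` from the extract envelope
    billed_pages: int       # `billedPages` from the extract envelope
    score: Optional[int]    # lowest validation score, None if nothing was scored
    grade: Optional[str]
    decision: str           # "accept" or "review"
    recorded_at: str        # ISO-8601, UTC


def audit_record(file_sha256: str, envelope: dict, validation: dict, decision: str,
                 now: Optional[datetime] = None) -> AuditRecord:
    entries = validation.get("validations") or []
    worst = min(entries, key=lambda v: v["score"]["value"], default=None)
    return AuditRecord(
        file_sha256=file_sha256,
        doc_type=envelope["type"],
        billed_pages=envelope.get("billedPages", 0),
        score=worst["score"]["value"] if worst else None,
        grade=worst["score"]["grade"] if worst else None,
        decision=decision,
        recorded_at=(now or datetime.now(timezone.utc)).isoformat(),
    )


# One JSON line per document, e.g. for an append-only audit log:
# print(json.dumps(asdict(audit_record(digest, envelope, validation, "accept"))))
```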

Run extraction behind a queue with a capped worker pool rather than inline on a request, make jobs idempotent on a file hash so retries don't duplicate records, and handle `429` (top up and resume) and `503` (exponential backoff) explicitly. Store only the fields you need and keep PII out of logs. Build and test the whole flow for free against validation and previews, then switch on billed extraction at go-live. Done this way, a document extraction API turns accounts payable, expense capture and statement onboarding into reliable, unattended automation; the guide walks through the full pattern with code.

Monitoring usage and controlling cost

Running extraction at volume means watching two things: spend and quality. Every API key tracks its own request count, page total and cost, visible in the dashboard, so you can see exactly what each integration or customer is consuming and spot anomalies — a sudden spike usually means a retry loop or a malformed batch. Setting a budget that matches your plan, and capping worker concurrency so a single run can't exhaust it, keeps cost predictable rather than surprising.

On the quality side, log the validation grade and `billedPages` with every document and chart the auto-accept rate over time. A falling auto-accept rate is an early signal that input quality has dropped — a new scanner, a worse photo flow, a new bank format — and lets you act before bad data reaches the books. Together these two habits turn the API from a black box into an observable, controllable part of your pipeline; the per-page rate and plan limits are on the pricing page.
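Charting the auto-accept rate needs only a daily aggregate of the accept/review decisions you already log. A minimal sketch; the 7-day window and 10-point drop threshold in `falling` are hypothetical starting values, not recommendations from the API.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_auto_accept_rate(events: Iterable[Tuple[date, str]]) -> Dict[date, float]:
    """Map each day to the share of documents auto-accepted that day."""
    totals: Dict[date, int] = defaultdict(int)
    accepted: Dict[date, int] = defaultdict(int)
    for day, decision in events:
        totals[day] += 1
        if decision == "accept":
            accepted[day] += 1
    return {day: accepted[day] / totals[day] for day in sorted(totals)}


def falling(rates: Dict[date, float], window: int = 7, drop: float = 0.10) -> bool:
    """True if the latest day is `drop` or more below the mean of the prior `window` days."""
    days = sorted(rates)
    if len(days) < 2:
        return False
    history = [rates[d] for d in days[-window - 1:-1]]
    return rates[days[-1]] <= sum(history) / len(history) - drop
```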


Choose where to start

If statements are your focus, start with the bank statement API or the bank statement OCR API for scans. For the generic PDF contract, see the PDF to JSON API. To build a complete integration with batching and error handling, follow the guide to parsing bank statements with an API. Everything is documented at /api-docs.


Automate document capture end to end

One endpoint to classify and extract invoices, receipts and statements — then validate, export and reconcile over the same API key.
