How do I convert a bank statement to JSON?

POST the file (base64) to the extraction API with your key. The engine runs layout detection, OCR fallback, extraction and balance validation, and returns structured JSON with a transactions array in one call.

What does the JSON look like?

An envelope of { type, pages, billedPages, data }, where data.data holds bank_name, account_holder, currency, opening/closing balance and a transactions[] array of { date, description, amount (signed), balance }. The original table is kept in raw_table.

Is the schema stable?

Yes. It's versioned under /api/v1 and consistent across document types — switch on the inner type to read invoice, receipt or bank_statement fields with the same envelope.

Does it work for any bank?

Yes. The AI normalises any layout to one schema, so there's no per-bank template or branching. US, UK, Canada, Australia and EU banks and neobanks all return the same shape.

Do scanned statements return JSON?

Yes. Image-only PDFs fall back to OCR, then the recognised text is structured into the same JSON schema, with confidence scores on uncertain fields.

Are amounts signed numbers?

Yes. Each transaction amount is a signed number (negative for money out, positive for money in), so you can sum and compute without parsing DR/CR text.

Is the data validated?

Yes. Every statement is balance-checked (opening + transactions = closing), so a dropped or duplicated row is caught before the JSON reaches you.

How is this better than regex or templates?

Regex and positional scraping break on layout changes, wrapped descriptions and new banks. An AI converter reads layout semantically and emits one schema for all banks, so there's no per-format parser to maintain.

How is usage billed?

Per page. Every response includes pages and billedPages, and usage is metered per API key so you can attribute cost precisely.

Is the original table preserved?

Yes. Alongside the typed fields, raw_table keeps the statement's original columns and rows 1:1, so you never lose a column the schema doesn't explicitly name.

Can I also get Excel or CSV?

Yes. The same conversion exports to Excel, CSV or Google Sheets for human use, or pushes to QuickBooks/Xero — JSON is just one destination.

How accurate is the extraction?

Around 98% field-level accuracy on standard formats, with balance validation and confidence scoring so you can trust or route each row programmatically.

Where are the API docs?

See the PDF-to-JSON API and document extraction API pages for the full reference, endpoints and examples.

Can I test without being billed?

Yes. A preview mode lets you wire up and test the integration without incurring page charges; pages are billed only on real extraction.

Bank Statement to JSON: Convert PDF Statements to Structured Data

Why developers want statements as JSON

A bank statement PDF is the worst possible input for software: it is a page layout, not data. Tables wrap across pages, columns drift, amounts are formatted for humans (£1,200.00 DR), and two banks never lay things out the same way. JSON is the opposite — a predictable, typed structure your code can iterate, validate and store. Converting the statement once, into a clean `transactions` array, removes every downstream parsing headache.

The naive approaches all fail at scale. Regex breaks the moment a bank tweaks its template; positional PDF scraping shatters on a multi-line description or a new column order; and hand-built per-bank parsers become an unbounded maintenance burden as you add banks. A converter that reads layout the way a person does — and emits one consistent JSON schema regardless of the source — is what lets you support any bank without writing a parser per format.

Once you have JSON, everything downstream gets simpler: persist it to a database, run it through your categorisation or reconciliation logic, feed it to a lending model, or render it in your own UI. The statement stops being a document you have to read and becomes data you can compute on.

flowparse.io

What the JSON looks like

Every response follows the same envelope: `{ type, pages, billedPages, data }`, where `data` is `{ type, data }`. For a statement the inner `type` is `bank_statement` and the inner `data` carries the account-level fields plus a `transactions` array. Each transaction is an object with a normalised `date`, the full `description`, a signed `amount` (negative for money out, positive for money in) and, where the statement prints it, a running `balance`.

Crucially, the original table is preserved alongside the typed fields in `raw_table` (columns and rows, 1:1 with the PDF), so you never lose a source column the schema doesn't name. You get the best of both: clean typed data to build on, and the untouched original for audit or edge cases.

200 OK — bank_statement

{
  "type": "bank_statement",
  "pages": 12,
  "billedPages": 12,
  "data": {
    "type": "bank_statement",
    "data": {
      "bank_name": "Barclays",
      "account_holder": "ACME LTD",
      "currency": "GBP",
      "opening_balance": 4120.55,
      "closing_balance": 3118.20,
      "transactions": [
        { "date": "2026-03-14", "description": "CARD PAYMENT TfL TRAVEL", "amount": -42.50, "balance": 4078.05 },
        { "date": "2026-03-15", "description": "FASTER PAYMENT FROM CLIENT", "amount": 1200.00, "balance": 5278.05 }
      ]
    }
  }
}
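
A minimal Python sketch of reading this envelope, assuming the response body has already arrived as text. The `Transaction` dataclass and the `parse_statement` helper are illustrative names, not part of the API, and parsing amounts as `Decimal` is a choice made here to avoid float rounding rather than something the API requires.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Transaction:
    date: str
    description: str
    amount: Decimal             # signed: negative out, positive in
    balance: Optional[Decimal]  # None when the statement prints no running balance


def parse_statement(body: str):
    """Return (account_fields, transactions) from a bank_statement response body."""
    # parse_float=Decimal keeps decimal amounts exact; whole numbers still arrive as int.
    envelope = json.loads(body, parse_float=Decimal)
    inner = envelope["data"]
    if inner["type"] != "bank_statement":
        raise ValueError(f"expected bank_statement, got {inner['type']!r}")
    fields = inner["data"]
    transactions = [
        Transaction(t["date"], t["description"], t["amount"], t.get("balance"))
        for t in fields["transactions"]
    ]
    return fields, transactions


# The sample response from above, condensed.
SAMPLE = """
{"type": "bank_statement", "pages": 12, "billedPages": 12,
 "data": {"type": "bank_statement", "data": {
   "bank_name": "Barclays", "account_holder": "ACME LTD", "currency": "GBP",
   "opening_balance": 4120.55, "closing_balance": 3118.20,
   "transactions": [
     {"date": "2026-03-14", "description": "CARD PAYMENT TfL TRAVEL", "amount": -42.50, "balance": 4078.05},
     {"date": "2026-03-15", "description": "FASTER PAYMENT FROM CLIENT", "amount": 1200.00, "balance": 5278.05}
   ]}}}
"""

fields, transactions = parse_statement(SAMPLE)
print(fields["bank_name"], len(transactions), transactions[0].amount)
```

Using `t.get("balance")` reflects the note that a running balance is only present where the statement prints one.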

A stable, typed, versioned schema

The contract is versioned under `/api/v1`, so you can build against it with confidence — fields don't disappear from under you. The same envelope covers other document types too, which means one integration handles invoices, receipts and statements without branching parsers: switch on the inner `type` and read the typed fields. For the full developer reference, see the PDF-to-JSON API and the extraction API.

Because the schema is consistent across banks, your code path is the same whether the upload was a Chase PDF, a Barclays scan or a neobank export. There is no per-bank branching, no template registry to maintain, and no special case for a layout you haven't seen — the AI normalises all of them to this one shape.

| Field | Type | Notes |
| --- | --- | --- |
| `bank_name` / `account_holder` | string | From the statement header where present |
| `currency` | string | ISO 4217-style code, e.g. GBP, USD, EUR |
| `opening_balance` / `closing_balance` | number | Used for the balance reconciliation check |
| `transactions[].date` | string | Normalised date, ISO 8601-style (e.g. 2026-03-14) |
| `transactions[].amount` | number | Signed: negative out, positive in |
| `raw_table` | object | Original columns/rows preserved 1:1 |
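
To make "switch on the inner `type`" concrete, here is a small Python sketch. The `describe` function is a hypothetical helper, and the type strings `invoice` and `receipt` are assumed from the document types named above; their field names are not listed here, so the sketch does not read any.

```python
def describe(envelope: dict) -> str:
    """Branch on the inner type; the envelope handling is shared by every document type."""
    inner = envelope["data"]
    kind, fields = inner["type"], inner["data"]
    if kind == "bank_statement":
        return f"{fields['bank_name']} statement, {len(fields['transactions'])} transactions"
    if kind in ("invoice", "receipt"):
        # Field names for these types are not covered in this article, so none are read.
        return f"{kind} with {len(fields)} top-level fields"
    raise ValueError(f"unhandled document type: {kind!r}")


statement = {
    "type": "bank_statement", "pages": 1, "billedPages": 1,
    "data": {"type": "bank_statement",
             "data": {"bank_name": "Barclays", "transactions": []}},
}
print(describe(statement))
```

Raising on an unknown type, rather than ignoring it, keeps a new document type from slipping through unnoticed.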

How to get JSON from a statement

Send the file and read the response: that's the whole flow. POST a base64-encoded PDF (or scan, or XLSX/CSV) with your API key, and the same engine that powers the app runs end to end: layout detection, OCR fallback for image-only pages, field extraction, classification and balance validation. The JSON comes back in one call, already structured and checked. A free preview mode lets you wire up your integration before any page is billed.

For interactive or one-off needs you can also convert in the app and export, but for anything programmatic the API is the right tool: it's stateless, authenticated per key, billed per page, and returns the JSON directly so you can pipe it straight into your system.

Get an API key

Create a key from your dashboard — every request authenticates with it. See the bank statement API docs.

POST the file

Send the base64 file to the extraction endpoint; use preview mode first to integrate without billing.

Read the JSON

Switch on the inner `type`, then iterate `data.data.transactions` — dated, signed, balance-checked.

Store or compute

Persist to your DB, run categorisation/reconciliation, or feed a model — the data is ready to use.
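
The steps above can be sketched in Python with only the standard library. Everything endpoint-specific here is an assumption for illustration: the URL, the `file` and `preview` body keys, and the Bearer-style `Authorization` header are placeholders, so check the API docs for the real names before using this.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: only the /api/v1 prefix comes from this article.
API_URL = "https://api.example.com/api/v1/extract"


def build_extraction_request(pdf_bytes: bytes, api_key: str,
                             preview: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the base64-encoded file."""
    payload = {
        "file": base64.b64encode(pdf_bytes).decode("ascii"),  # assumed key name
        "preview": preview,                                    # assumed flag name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_extraction_request(b"%PDF-1.7 test", "my-key", preview=True)
print(request.get_method(), request.full_url)
```

Sending is then a matter of passing the request to `urllib.request.urlopen` and decoding the JSON body; building the request separately keeps it easy to test without touching the network.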

Scanned statements become JSON too

Plenty of statements arrive as scans or phone photos — image-only PDFs with no text layer at all. The pipeline handles them: pages with no extractable text fall back to OCR, and the recognised text is then structured into the same JSON schema as a digital PDF. From your code's perspective there's no difference — you get the same `transactions` array either way.

Confidence scores accompany uncertain fields so you can decide how to handle low-certainty rows programmatically — auto-accept the high-confidence majority, route the rest for review. That makes it safe to build automated pipelines on real-world documents, including the messy scanned ones, without silently trusting a bad read.
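
A rough Python sketch of the accept-or-review split. The per-transaction `confidence` key and the `0.9` threshold are both assumptions made for illustration; this article does not say where the scores sit in the response or how they are scaled.

```python
def route_by_confidence(transactions, threshold=0.9):
    """Split rows into (auto_accepted, needs_review) by an assumed confidence score."""
    accepted, review = [], []
    for tx in transactions:
        # Assumption: a row without a score was not flagged as uncertain.
        score = tx.get("confidence", 1.0)
        (accepted if score >= threshold else review).append(tx)
    return accepted, review


rows = [
    {"date": "2026-03-14", "amount": -42.50},
    {"date": "2026-03-15", "amount": 1200.00, "confidence": 0.62},
    {"date": "2026-03-16", "amount": -5.00, "confidence": 0.97},
]
accepted, review = route_by_confidence(rows)
print(len(accepted), "accepted;", len(review), "for review")
```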

JSON you can trust: balance-validated

Structured doesn't automatically mean correct, so validation is built into the response. Every statement is balance-checked — opening balance plus the sum of transactions must equal the closing balance — which catches a dropped or duplicated row before the JSON ever reaches your system. For a developer that's the difference between data you can trust and data you have to re-check.

The same deterministic Validation Engine that powers the app runs here, so the JSON you ingest has already passed the checks a careful human would run. You can surface the validation result in your own pipeline, gate on it, or log it — but you're not starting from raw, unverified extraction.
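
If you want to repeat the check on your side, the arithmetic is short. This is a sketch of the rule as stated above (opening plus the sum of transactions equals closing), not the engine's own implementation; `balance_reconciles` is an illustrative name, and going through `Decimal` is a choice made here to avoid float rounding.

```python
from decimal import Decimal


def balance_reconciles(opening, closing, amounts, tolerance=Decimal("0")) -> bool:
    """True when opening + sum(amounts) lands on closing, within the tolerance."""
    # str() first, so a float such as 4120.55 becomes the decimal it was printed as.
    total = sum((Decimal(str(a)) for a in amounts), Decimal("0"))
    difference = Decimal(str(opening)) + total - Decimal(str(closing))
    return abs(difference) <= tolerance


# The two sample transactions, checked against the running balance after the second.
print(balance_reconciles(4120.55, 5278.05, [-42.50, 1200.00]))
# A duplicated row breaks the equation.
print(balance_reconciles(4120.55, 5278.05, [-42.50, -42.50, 1200.00]))
```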

What developers build on statement JSON

Lending and underwriting platforms ingest applicant statements to compute income, expense and balance signals for their own risk models — JSON in, decision features out, with the raw transactions retained for audit. See bank statement analysis for loans for the analysis angle. Accounting and bookkeeping tools use the JSON to auto-import transactions and reconcile, skipping manual entry entirely.

Personal-finance and budgeting apps turn the `transactions` array into categorised spending views; expense and travel tools match transactions to receipts; and internal finance teams build month-end automation that pulls every account into one normalised dataset. In each case the value is the same: a clean, consistent schema across every bank means one integration instead of an endless backlog of per-format parsers.

Status codes and error handling

Building a reliable pipeline means handling the unhappy paths, not just the `200`. The API uses standard HTTP status codes so your integration can branch cleanly: a successful extraction returns `200` with the JSON body; an authentication problem returns `401`; a file that genuinely can't be read as a financial document returns `422` and isn't billed; and hitting your balance or rate limit returns `429`. Because an unreadable file isn't charged, you can fail fast without paying for a bad upload.

The practical pattern is to switch on the status first, then on the inner `type`. Treat `401` as a configuration error (rotate or check the key), `422` as "this upload isn't usable" (surface it to the user rather than retrying blindly), and `429` as back-pressure (slow down or top up). On `200`, read `billedPages` to track cost and iterate the transactions. These four cases cover the outcomes listed here; log any other status as unexpected rather than assuming success.

| Status | Meaning | What to do |
| --- | --- | --- |
| `200` | Extraction succeeded | Read `data.data.transactions`; log `billedPages` |
| `401` | Missing or invalid API key | Check or rotate the key (config error) |
| `422` | File not a readable financial doc | Surface to user; not billed, don't retry blindly |
| `429` | Rate or balance limit reached | Back off or top up, then retry |
| preview mode | Free integration test | Build and verify without billing a page |
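
The table maps naturally onto a small dispatch function. A minimal Python sketch follows; the `Action` names are made up for this example, and the `UNEXPECTED` fallback for any other status is a defensive choice added here, not documented API behaviour.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "read transactions, log billedPages"
    FIX_CONFIG = "check or rotate the API key"
    REJECT_UPLOAD = "surface to the user, do not retry blindly"
    BACK_OFF = "slow down or top up, then retry"
    UNEXPECTED = "log and investigate"


_ACTIONS = {
    200: Action.PROCESS,
    401: Action.FIX_CONFIG,
    422: Action.REJECT_UPLOAD,
    429: Action.BACK_OFF,
}


def action_for_status(status: int) -> Action:
    """Map an HTTP status to the handling suggested in the table above."""
    return _ACTIONS.get(status, Action.UNEXPECTED)


for code in (200, 401, 422, 429, 503):
    print(code, action_for_status(code).name)
```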

Reading the transactions array

Once you have the `200` body, the data you most often want lives at `data.data.transactions` — an ordered array of objects you can iterate directly. Each carries a normalised `date`, a `description`, a signed `amount` and, where present, a `balance`. Because amounts are signed numbers, summing inflows and outflows is a one-line reduce rather than a parse-and-clean exercise, and because dates are normalised you can group by month without wrangling a dozen regional formats.
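
As a sketch of that summing and grouping in Python: the helper names are illustrative, and the month grouping assumes dates arrive as `YYYY-MM-DD` strings, as they do in the sample response.

```python
from collections import defaultdict
from decimal import Decimal


def inflow_outflow(transactions):
    """Return (total money in, total money out) from signed amounts."""
    amounts = [Decimal(str(t["amount"])) for t in transactions]
    money_in = sum((a for a in amounts if a > 0), Decimal("0"))
    money_out = sum((a for a in amounts if a < 0), Decimal("0"))
    return money_in, money_out


def net_by_month(transactions):
    """Net movement per calendar month, keyed by the 'YYYY-MM' prefix of each date."""
    months = defaultdict(lambda: Decimal("0"))
    for t in transactions:
        months[t["date"][:7]] += Decimal(str(t["amount"]))
    return dict(months)


transactions = [
    {"date": "2026-03-14", "description": "CARD PAYMENT TfL TRAVEL", "amount": -42.50},
    {"date": "2026-03-15", "description": "FASTER PAYMENT FROM CLIENT", "amount": 1200.00},
]
print(inflow_outflow(transactions))
print(net_by_month(transactions))
```

Because ISO-style date strings sort chronologically, a newest-first view is just `sorted(transactions, key=lambda t: t["date"], reverse=True)`.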

The account-level fields sit one level up alongside the array — `bank_name`, `account_holder`, `currency`, `opening_balance`, `closing_balance` — which is what you need to label the data and to run your own reconciliation if you want to double-check the engine's. And when you hit a field the typed schema doesn't name, `raw_table` has the original columns and rows verbatim, so an unusual statement never leaves you stuck. In practice most integrations read a handful of fields and the transactions array, and ignore the rest until they need it.
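
For the `raw_table` fallback, a small Python sketch. The exact shape is an assumption: this article only says the table keeps "columns and rows", so the `columns` and `rows` keys, and the sample column names below, are hypothetical.

```python
def raw_rows_as_dicts(raw_table: dict) -> list[dict]:
    """Pair each raw row with the original column headers (assumed shape)."""
    columns = raw_table["columns"]
    return [dict(zip(columns, row)) for row in raw_table["rows"]]


# Hypothetical raw_table with a "Ref" column that the typed schema does not name.
raw_table = {
    "columns": ["Date", "Details", "Paid out", "Paid in", "Balance", "Ref"],
    "rows": [["14 Mar", "CARD PAYMENT TfL TRAVEL", "42.50", "", "4,078.05", "TX123"]],
}
rows = raw_rows_as_dicts(raw_table)
print(rows[0]["Ref"])
```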

Ordering is reliable too: transactions come back in statement order, so you can present or persist them without re-sorting, and a stable order makes diffing or de-duplicating across overlapping statements straightforward. If you need a different arrangement — newest-first for a UI, grouped by month for a report — you sort the array yourself once it's in memory, which is trivial because the dates are already normalised. The combination of a predictable shape, signed numbers, normalised dates and preserved order is what makes the array genuinely pleasant to build on, rather than something you have to defensively clean before you can trust it.

Why not regex, templates or scraping

It's tempting to start with a regex against the PDF text, and it works — for exactly one bank's current template. The moment that bank changes its layout, or you add a second bank, the approach collapses into a thicket of special cases. Positional scraping (reading text by x/y coordinates) is even more fragile: a wrapped description or a shifted column throws the whole row off, silently.

An AI converter sidesteps this by reading the statement the way a person does — understanding that this column is the amount and that block is a description, regardless of pixel position — and emitting one schema for all of them. You trade a growing pile of brittle parsers for a single API call. The maintenance cost of supporting a new bank drops to zero, because there's nothing bank-specific to maintain.
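
The fragility is easy to reproduce. Below is a deliberately naive Python row parser, written for this example only (the pattern and the sample lines are made up, not taken from any real bank template). It handles a single-line row and silently drops the same row once the description wraps.

```python
import re

# One row per line: date, free-text description, amount at the end of the line.
ROW = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+(-?[\d,]+\.\d{2})$")


def parse_rows(text: str):
    """Return (date, description, amount) tuples for every line the pattern matches."""
    return [m.groups() for line in text.splitlines() if (m := ROW.match(line))]


single_line = "14/03/2026 CARD PAYMENT TfL TRAVEL 42.50"
wrapped = "14/03/2026 CARD PAYMENT TfL\nTRAVEL 42.50"  # same row, description wrapped

print(parse_rows(single_line))
print(parse_rows(wrapped))
```

Neither half of the wrapped row matches on its own, so the parser returns nothing for it and raises no error, which is exactly the silent failure described above.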

Running it at volume

The API is stateless and authenticated per key, so it scales horizontally with your workload: send requests concurrently and pay per page. A preview mode lets you build and test the integration without incurring page charges, and usage is metered per key so you can attribute cost to a customer or a job. For pipelines processing thousands of statements a month, that predictability matters as much as the extraction itself.

Because every response carries `pages` and `billedPages`, your own accounting of cost is exact, and the consistent schema means you can batch-process a mixed pile of banks through one code path. Pair it with Smart Merge when you need many statements consolidated rather than returned individually.
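
A sketch of that cost accounting in Python, assuming you tag each response with your own job or customer label as it comes back. The `job_id` label and the `billed_pages_by_job` helper are local conventions invented for this example; only `billedPages` comes from the response.

```python
from collections import Counter


def billed_pages_by_job(results):
    """Sum billedPages per job from (job_id, envelope) pairs."""
    totals = Counter()
    for job_id, envelope in results:
        totals[job_id] += envelope["billedPages"]
    return dict(totals)


results = [
    ("customer-a", {"type": "bank_statement", "pages": 12, "billedPages": 12}),
    ("customer-b", {"type": "bank_statement", "pages": 3, "billedPages": 3}),
    ("customer-a", {"type": "bank_statement", "pages": 5, "billedPages": 5}),
]
print(billed_pages_by_job(results))
```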

Security and data handling

Financial data demands care, and the handling reflects that: requests run over TLS to EU-hosted infrastructure, the uploaded file is deleted right after processing, and documents are never used to train AI models. API keys are yours to rotate and revoke, and usage is logged per key so you have a clear audit trail of what was processed.

Nothing about the JSON path weakens this — the data goes in, the structured result comes back, and the source file doesn't linger. That makes the API suitable for handling your customers' statements within your own product, where you carry the trust and need a processor that won't become a liability.

JSON, or Excel, CSV and Sheets

JSON is the right output for code, but the same conversion feeds every other format too. When a human needs the data, export the same statement to Excel, CSV or Google Sheets; when your accounting lives elsewhere, push it to QuickBooks or Xero. One accurate extraction, many destinations — you choose per use case rather than re-converting.

For teams that are partly technical and partly not, this matters: developers consume the JSON API while finance staff use the app's spreadsheet exports, both backed by the same engine and the same validated data. There's no divergence between what the code sees and what the spreadsheet shows.

Get clean JSON from any statement

Send a PDF or scan to the API and get a typed, balance-validated transactions array back — one schema for every bank, no templates to maintain.
