JSON / Developers June 23, 2026 12 min read

Convert a bank statement to JSON

Turn a PDF bank statement into clean, typed JSON your code can use — a `transactions` array with dated, signed amounts and a balance, under a stable, versioned schema. Send a file to the extraction API and get structured JSON back, or use the PDF-to-JSON API for high-volume pipelines. No regex, no per-bank templates, no brittle PDF scraping.

FlowParse
flowparse.io

Why developers want statements as JSON

A bank statement PDF is a poor input for software: it is a page layout, not data. Tables wrap across pages, columns drift, amounts are formatted for humans (£1,200.00 DR), and no two banks lay things out the same way. JSON is the opposite: a predictable, typed structure your code can iterate, validate and store. Converting the statement once, into a clean `transactions` array, means downstream code never has to parse the layout again.

The naive approaches all fail at scale. Regex breaks the moment a bank tweaks its template; positional PDF scraping shatters on a multi-line description or a new column order; and hand-built per-bank parsers become an unbounded maintenance burden as you add banks. A converter that reads layout the way a person does — and emits one consistent JSON schema regardless of the source — is what lets you support any bank without writing a parser per format.

Once you have JSON, everything downstream gets simpler: persist it to a database, run it through your categorisation or reconciliation logic, feed it to a lending model, or render it in your own UI. The statement stops being a document you have to read and becomes data you can compute on.

What the JSON looks like

Every response follows the same envelope: `{ type, pages, billedPages, data }`, where `data` is `{ type, data }`. For a statement the inner `type` is `bank_statement` and the inner `data` carries the account-level fields plus a `transactions` array. Each transaction is an object with a normalised `date`, the full `description`, a signed `amount` (negative for money out, positive for money in) and, where the statement prints it, a running `balance`.

Crucially, the original table is preserved alongside the typed fields in `raw_table` (columns and rows, 1:1 with the PDF), so you never lose a source column the schema doesn't name. You get the best of both: clean typed data to build on, and the untouched original for audit or edge cases.

200 OK — bank_statement
{
  "type": "bank_statement",
  "pages": 12,
  "billedPages": 12,
  "data": {
    "type": "bank_statement",
    "data": {
      "bank_name": "Barclays",
      "account_holder": "ACME LTD",
      "currency": "GBP",
      "opening_balance": 4120.55,
      "closing_balance": 3118.20,
      "transactions": [
        { "date": "2026-03-14", "description": "CARD PAYMENT TfL TRAVEL", "amount": -42.50, "balance": 4078.05 },
        { "date": "2026-03-15", "description": "FASTER PAYMENT FROM CLIENT", "amount": 1200.00, "balance": 5278.05 }
      ]
    }
  }
}
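
A response like the one above can be loaded into typed Python objects. The sketch below is one possible approach, not an official client: `Transaction`, `Statement` and `parse_statement` are illustrative names, and it assumes only the fields shown in the sample.

```python
import datetime
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class Transaction:
    date: datetime.date
    description: str
    amount: Decimal             # signed: negative for money out, positive for money in
    balance: Optional[Decimal]  # None when the statement prints no running balance


@dataclass(frozen=True)
class Statement:
    bank_name: Optional[str]
    account_holder: Optional[str]
    currency: str
    opening_balance: Decimal
    closing_balance: Decimal
    transactions: List[Transaction]


def _money(value) -> Optional[Decimal]:
    """Convert a JSON number (already Decimal or int) to Decimal, keeping None."""
    return None if value is None else Decimal(value)


def parse_statement(body: str) -> Statement:
    # parse_float=Decimal keeps amounts exact instead of passing through binary floats.
    envelope = json.loads(body, parse_float=Decimal)
    inner = envelope["data"]
    if inner["type"] != "bank_statement":
        raise ValueError(f"unexpected document type: {inner['type']!r}")
    fields = inner["data"]
    transactions = [
        Transaction(
            date=datetime.date.fromisoformat(tx["date"]),
            description=tx["description"],
            amount=_money(tx["amount"]),
            balance=_money(tx.get("balance")),
        )
        for tx in fields["transactions"]
    ]
    return Statement(
        bank_name=fields.get("bank_name"),
        account_holder=fields.get("account_holder"),
        currency=fields["currency"],
        opening_balance=_money(fields["opening_balance"]),
        closing_balance=_money(fields["closing_balance"]),
        transactions=transactions,
    )
```

Using `Decimal` rather than `float` is a design choice for money values; if your storage layer expects floats or integer minor units, convert at that boundary instead.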

A stable, typed, versioned schema

The contract is versioned under `/api/v1`, so you can build against it with confidence — fields don't disappear from under you. The same envelope covers other document types too, which means one integration handles invoices, receipts and statements without branching parsers: switch on the inner `type` and read the typed fields. For the full developer reference, see the PDF-to-JSON API and the extraction API.

Because the schema is consistent across banks, your code path is the same whether the upload was a Chase PDF, a Barclays scan or a neobank export. There is no per-bank branching, no template registry to maintain, and no special case for a layout you haven't seen — the AI normalises all of them to this one shape.
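
Switching on the inner `type` can be as small as a lookup table. This is a minimal sketch under assumptions: `HANDLERS`, `summarise_statement` and `dispatch` are illustrative names, and only `bank_statement` is registered because that is the only type string this article shows.

```python
from typing import Any, Callable, Dict


def summarise_statement(fields: Dict[str, Any]) -> str:
    """Example handler: describe a bank statement's typed fields."""
    bank = fields.get("bank_name") or "unknown bank"
    return f"{bank}: {len(fields['transactions'])} transactions"


# Map each inner `type` to the function that knows how to read its fields.
# Add further document types here using the type strings from the API reference.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "bank_statement": summarise_statement,
}


def dispatch(envelope: Dict[str, Any]) -> str:
    inner = envelope["data"]
    handler = HANDLERS.get(inner["type"])
    if handler is None:
        raise ValueError(f"no handler registered for type {inner['type']!r}")
    return handler(inner["data"])
```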

| Field | Type | Notes |
| --- | --- | --- |
| `bank_name` / `account_holder` | string | From the statement header where present |
| `currency` | string | ISO-style code, e.g. GBP, USD, EUR |
| `opening_balance` / `closing_balance` | number | Used for the balance reconciliation check |
| `transactions[].date` | string | Normalised date (ISO-style) |
| `transactions[].amount` | number | Signed: negative out, positive in |
| `raw_table` | object | Original columns/rows preserved 1:1 |

How to get JSON from a statement

Send the file and read the response: that's the whole flow. POST a base64-encoded PDF (or scan, or XLSX/CSV) with your API key, and the same engine that powers the app runs end to end: layout detection, OCR fallback for image-only pages, field extraction, classification and balance validation. The JSON comes back in one call, already structured and checked. A free preview mode lets you wire up your integration before you are billed for a page.

For interactive or one-off needs you can also convert in the app and export, but for anything programmatic the API is the right tool: it's stateless, authenticated per key, billed per page, and returns the JSON directly so you can pipe it straight into your system.

1. Get an API key. Create a key from your dashboard; every request authenticates with it. See the bank statement API docs.
2. POST the file. Send the base64 file to the extraction endpoint; use preview mode first to integrate without billing.
3. Read the JSON. Switch on the inner `type`, then iterate `data.data.transactions`: dated, signed, balance-checked.
4. Store or compute. Persist to your DB, run categorisation/reconciliation, or feed a model; the data is ready to use.
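
The POST step can be sketched with the Python standard library. Everything request-specific here is an assumption made for illustration, not the documented contract: the placeholder URL and `/extract` path, the `Authorization: Bearer` header, and the `file` and `preview` body fields. Check the API reference for the real names before using it.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: only the /api/v1 prefix comes from this article.
API_URL = "https://example.invalid/api/v1/extract"


def build_extract_request(pdf_bytes: bytes, api_key: str, preview: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a base64-encoded file."""
    payload = {
        "file": base64.b64encode(pdf_bytes).decode("ascii"),  # assumed field name
        "preview": preview,                                    # assumed flag for preview mode
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`; non-2xx responses raise `urllib.error.HTTPError`, whose `code` attribute carries the status.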

Scanned statements become JSON too

Plenty of statements arrive as scans or phone photos — image-only PDFs with no text layer at all. The pipeline handles them: pages with no extractable text fall back to OCR, and the recognised text is then structured into the same JSON schema as a digital PDF. From your code's perspective there's no difference — you get the same `transactions` array either way.

Confidence scores accompany uncertain fields so you can decide how to handle low-certainty rows programmatically — auto-accept the high-confidence majority, route the rest for review. That makes it safe to build automated pipelines on real-world documents, including the messy scanned ones, without silently trusting a bad read.
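
The accept-or-review split might look like the sketch below. The article does not show how confidence is represented, so the per-transaction `confidence` key and the `0.9` threshold are both hypothetical; rows without a score are treated as not flagged.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_confidence(transactions: List[Row], threshold: float = 0.9) -> Tuple[List[Row], List[Row]]:
    """Return (auto_accepted, needs_review) based on a hypothetical `confidence` key."""
    accepted: List[Row] = []
    review: List[Row] = []
    for tx in transactions:
        score = tx.get("confidence")  # assumed field name; absent means not flagged
        if score is not None and score < threshold:
            review.append(tx)
        else:
            accepted.append(tx)
    return accepted, review
```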

JSON you can trust: balance-validated

Structured doesn't automatically mean correct, so validation is built into the response. Every statement is balance-checked — opening balance plus the sum of transactions must equal the closing balance — which catches a dropped or duplicated row before the JSON ever reaches your system. For a developer that's the difference between data you can trust and data you have to re-check.

The same deterministic Validation Engine that powers the app runs here, so the JSON you ingest has already passed the checks a careful human would run. You can surface the validation result in your own pipeline, gate on it, or log it — but you're not starting from raw, unverified extraction.
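
If you want to repeat the balance check on your side, the rule stated above (opening balance plus the sum of transactions equals the closing balance) is a few lines. This is a sketch of that rule only, not the engine's implementation; `balance_reconciles` and the one-penny tolerance are illustrative choices.

```python
from decimal import Decimal
from typing import Any, Dict


def balance_reconciles(fields: Dict[str, Any], tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check opening_balance + sum(amounts) == closing_balance within a tolerance."""
    # Going through str() avoids carrying binary float error into the Decimal.
    opening = Decimal(str(fields["opening_balance"]))
    closing = Decimal(str(fields["closing_balance"]))
    movement = sum(
        (Decimal(str(tx["amount"])) for tx in fields["transactions"]),
        Decimal("0"),
    )
    return abs(opening + movement - closing) <= tolerance
```

The check only makes sense over the complete `transactions` array for the statement period; running it on a truncated or filtered list will fail by design.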

What developers build on statement JSON

Lending and underwriting platforms ingest applicant statements to compute income, expense and balance signals for their own risk models — JSON in, decision features out, with the raw transactions retained for audit. See bank statement analysis for loans for the analysis angle. Accounting and bookkeeping tools use the JSON to auto-import transactions and reconcile, skipping manual entry entirely.

Personal-finance and budgeting apps turn the `transactions` array into categorised spending views; expense and travel tools match transactions to receipts; and internal finance teams build month-end automation that pulls every account into one normalised dataset. In each case the value is the same: a clean, consistent schema across every bank means one integration instead of an endless backlog of per-format parsers.

Status codes and error handling

Building a reliable pipeline means handling the unhappy paths, not just the `200`. The API uses standard HTTP status codes so your integration can branch cleanly: a successful extraction returns `200` with the JSON body; an authentication problem returns `401`; a file that genuinely can't be read as a financial document returns `422` and isn't billed; and hitting your balance or rate limit returns `429`. Because an unreadable file isn't charged, you can fail fast without paying for a bad upload.

The practical pattern is to switch on the status first, then on the inner `type`. Treat `401` as a configuration error (rotate or check the key), `422` as "this upload isn't usable" (surface it to the user rather than retrying blindly), and `429` as back-pressure (slow down or top up). On `200`, read `billedPages` to track cost and iterate the transactions. These four cases cover the outcomes the API defines; treat anything else, such as a timeout or an unexpected status, as a failure to log and investigate rather than as a successful extraction.

| Status | Meaning | What to do |
| --- | --- | --- |
| 200 | Extraction succeeded | Read `data.data.transactions`; log `billedPages` |
| 401 | Missing or invalid API key | Check or rotate the key (config error) |
| 422 | File not a readable financial doc | Surface to user; not billed, don't retry blindly |
| 429 | Rate or balance limit reached | Back off or top up, then retry |
| preview mode | Free integration test | Build and verify without billing a page |
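
The status table above maps directly onto a small lookup. A minimal sketch: the `Action` names and `action_for_status` are illustrative, and the `UNEXPECTED` fallback for any other status is an added assumption rather than documented behaviour.

```python
from enum import Enum


class Action(Enum):
    PROCESS = "process"              # 200: read transactions, log billedPages
    FIX_KEY = "fix_key"              # 401: configuration error
    REJECT_UPLOAD = "reject_upload"  # 422: not billed, don't retry blindly
    BACK_OFF = "back_off"            # 429: slow down or top up, then retry
    UNEXPECTED = "unexpected"        # anything else: log and investigate


_ACTIONS = {
    200: Action.PROCESS,
    401: Action.FIX_KEY,
    422: Action.REJECT_UPLOAD,
    429: Action.BACK_OFF,
}


def action_for_status(status: int) -> Action:
    """Translate an HTTP status code into the pipeline's next step."""
    return _ACTIONS.get(status, Action.UNEXPECTED)
```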

Reading the transactions array

Once you have the `200` body, the data you most often want lives at `data.data.transactions` — an ordered array of objects you can iterate directly. Each carries a normalised `date`, a `description`, a signed `amount` and, where present, a `balance`. Because amounts are signed numbers, summing inflows and outflows is a one-line reduce rather than a parse-and-clean exercise, and because dates are normalised you can group by month without wrangling a dozen regional formats.
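
As a sketch of what signed amounts and normalised dates buy you, the functions below total inflows and outflows and net the transactions by month. The names are illustrative, and the month grouping assumes dates arrive as `YYYY-MM-DD` strings, as in the sample response.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Any, Dict, List, Tuple


def inflow_outflow(transactions: List[Dict[str, Any]]) -> Tuple[Decimal, Decimal]:
    """Return (total money in, total money out); the outflow total is negative."""
    amounts = [Decimal(str(tx["amount"])) for tx in transactions]
    inflow = sum((a for a in amounts if a > 0), Decimal("0"))
    outflow = sum((a for a in amounts if a < 0), Decimal("0"))
    return inflow, outflow


def net_by_month(transactions: List[Dict[str, Any]]) -> Dict[str, Decimal]:
    """Net movement per calendar month, keyed by the 'YYYY-MM' prefix of the date."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for tx in transactions:
        totals[tx["date"][:7]] += Decimal(str(tx["amount"]))
    return dict(totals)
```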

The account-level fields sit one level up alongside the array — `bank_name`, `account_holder`, `currency`, `opening_balance`, `closing_balance` — which is what you need to label the data and to run your own reconciliation if you want to double-check the engine's. And when you hit a field the typed schema doesn't name, `raw_table` has the original columns and rows verbatim, so an unusual statement never leaves you stuck. In practice most integrations read a handful of fields and the transactions array, and ignore the rest until they need it.
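
Reading a column the typed schema doesn't name could look like this. The article says `raw_table` holds columns and rows but does not show its exact shape, so the `columns` and `rows` keys and the list-of-lists layout below are assumptions; adjust them to the real payload.

```python
from typing import Any, Dict, List


def raw_rows_as_dicts(raw_table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each raw row with the column headers so cells can be read by name."""
    columns = raw_table["columns"]  # assumed key: list of header strings
    rows = raw_table["rows"]        # assumed key: list of rows, each a list of cells
    return [dict(zip(columns, row)) for row in rows]
```

With that in place, a source-only column is just a dictionary lookup on each row, using whatever header text the original statement printed.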

Ordering is reliable too: transactions come back in statement order, so you can present or persist them without re-sorting, and a stable order makes diffing or de-duplicating across overlapping statements straightforward. If you need a different arrangement — newest-first for a UI, grouped by month for a report — you sort the array yourself once it's in memory, which is trivial because the dates are already normalised. The combination of a predictable shape, signed numbers, normalised dates and preserved order is what makes the array genuinely pleasant to build on, rather than something you have to defensively clean before you can trust it.
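
A sketch of both operations, newest-first sorting and de-duplicating across overlapping statements, follows. The function names are illustrative, and the de-duplication key (date, description, amount, balance) is a heuristic of this example: two genuinely identical transactions on the same day with no printed balance would collapse into one.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_statements(*transaction_lists: List[Row]) -> List[Row]:
    """Concatenate transaction lists in order, dropping rows already seen."""
    seen = set()
    merged: List[Row] = []
    for transactions in transaction_lists:
        for tx in transactions:
            key = (tx["date"], tx["description"], tx["amount"], tx.get("balance"))
            if key not in seen:
                seen.add(key)
                merged.append(tx)
    return merged


def newest_first(transactions: List[Row]) -> List[Row]:
    """Sort by date descending; ISO-style date strings sort correctly as text."""
    # sorted() is stable, so same-day rows keep their statement order.
    return sorted(transactions, key=lambda tx: tx["date"], reverse=True)
```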

Why not regex, templates or scraping

It's tempting to start with a regex against the PDF text, and it works — for exactly one bank's current template. The moment that bank changes its layout, or you add a second bank, the approach collapses into a thicket of special cases. Positional scraping (reading text by x/y coordinates) is even more fragile: a wrapped description or a shifted column throws the whole row off, silently.
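
To make the fragility concrete, here is a deliberately naive regex parser. Both row layouts are invented for illustration and are not taken from any real bank; the point is only that a pattern written for the first silently returns nothing for the second.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Written for one invented layout: "DD/MM/YYYY  description  £amount  DR|CR".
ROW = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.+?)\s+£([\d,]+\.\d{2})\s+(DR|CR)$")


def parse_row(line: str) -> Optional[Tuple[str, str, Decimal]]:
    """Return (date, description, signed amount), or None if the line doesn't match."""
    match = ROW.match(line)
    if match is None:
        return None
    date, description, amount, direction = match.groups()
    value = Decimal(amount.replace(",", ""))
    return date, description, -value if direction == "DR" else value


layout_a = "14/03/2026 CARD PAYMENT TfL TRAVEL £42.50 DR"
layout_b = "2026-03-14 CARD PAYMENT TfL TRAVEL -42.50"  # same transaction, different layout

print(parse_row(layout_a))  # parsed tuple
print(parse_row(layout_b))  # None: the row is dropped without any error
```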

An AI converter sidesteps this by reading the statement the way a person does, understanding that this column is the amount and that block is a description regardless of pixel position, and emitting one schema for all of them. You trade a growing pile of brittle parsers for a single API call. The maintenance cost of supporting a new bank drops to near zero on your side, because there's nothing bank-specific in your code to maintain.

Running it at volume

The API is stateless and authenticated per key, so it scales horizontally with your workload: send requests concurrently and pay per page. A preview mode lets you build and test the integration without incurring page charges, and usage is metered per key, so issuing separate keys lets you attribute cost to a customer or a job. For pipelines processing thousands of statements a month, that predictability matters as much as the extraction itself.

Because every response carries `pages` and `billedPages`, your own accounting of cost is exact, and the consistent schema means you can batch-process a mixed pile of banks through one code path. Pair it with Smart Merge when you need many statements consolidated rather than returned individually.
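
Concurrent processing with exact cost tracking can be sketched as below. `process_batch` is an illustrative name, and `extract` stands for whatever function you write to call the API and return the parsed envelope; it is passed in rather than defined here because the request details are not specified in this article. The worker count of 8 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence, Tuple

Envelope = Dict[str, Any]


def process_batch(
    files: Sequence[bytes],
    extract: Callable[[bytes], Envelope],
    max_workers: int = 8,
) -> Tuple[List[Envelope], int]:
    """Run `extract` over all files concurrently and total the billed pages."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so envelopes[i] belongs to files[i].
        envelopes = list(pool.map(extract, files))
    billed_pages = sum(envelope["billedPages"] for envelope in envelopes)
    return envelopes, billed_pages
```

Note that `pool.map` re-raises the first exception when its results are consumed, so a production version would likely catch errors per file (for example a `422` on one upload) instead of failing the whole batch.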

Security and data handling

Financial data demands care, and the handling reflects that: requests run over TLS to EU-hosted infrastructure, the uploaded file is deleted right after processing, and documents are never used to train AI models. API keys are yours to rotate and revoke, and usage is logged per key so you have a clear audit trail of what was processed.

Nothing about the JSON path weakens this — the data goes in, the structured result comes back, and the source file doesn't linger. That makes the API suitable for handling your customers' statements within your own product, where you carry the trust and need a processor that won't become a liability.

JSON, or Excel, CSV and Sheets

JSON is the right output for code, but the same conversion feeds every other format too. When a human needs the data, export the same statement to Excel, CSV or Google Sheets; when your accounting lives elsewhere, push it to QuickBooks or Xero. One accurate extraction, many destinations — you choose per use case rather than re-converting.

For teams that are partly technical and partly not, this matters: developers consume the JSON API while finance staff use the app's spreadsheet exports, both backed by the same engine and the same validated data. There's no divergence between what the code sees and what the spreadsheet shows.

Get clean JSON from any statement

Send a PDF or scan to the API and get a typed, balance-validated transactions array back — one schema for every bank, no templates to maintain.
