Overview: what you're building
Parsing a bank statement means going from an unstructured PDF — pages of dates, descriptions and amounts laid out in a bank-specific table — to structured data your code can use: a list of transactions with typed fields, plus the account header and period balances. Doing that by hand, or with brittle regex, doesn't scale. An API does: you POST the file and get back JSON. This guide walks the whole loop with the FlowParse bank statement API — extract, validate, export, reconcile — so by the end you have a production-ready integration, not just a hello-world call.
You'll use one endpoint to read statements (/api/v1/extract) and a handful more to act on the result. Everything authenticates with a single key, bills per page, and shares one stable JSON schema, so the output of one call is the input of the next. If you want the conceptual background on the data itself, the PDF to JSON API page covers the schema in depth.
A quick note on scope before we dive in: "parsing" here means the full journey from an opaque PDF to data your application can act on, not just pulling out raw text. Plenty of tools can dump the words on a page; almost none rebuild the transaction table correctly, keep debits and credits straight, and prove the result reconciles. Those last three are where statements go wrong in practice, so this guide treats extraction, validation and a clean hand-off to your ledger or accounting software as one connected problem rather than five disconnected scripts. Everything below uses calls you can run today against the live API, with no SDK to install.
Why use an API instead of templates
Every bank formats statements differently, and they change those formats without warning. Template- or regex-based parsers break the moment a column moves or a new bank appears, so you end up maintaining an ever-growing pile of fragile rules. An AI extraction API generalises across layouts — it reads a statement by understanding what each field means, not where it sits in pixels — so a bank you've never seen works on the first request. That's the difference between a one-off script and a bank statement converter you can depend on.
- No per-bank templates to author or maintain — new layouts just work.
- Every row preserved, even on long multi-page statements where naive parsers drop lines.
- Debits and credits normalised to one signed amount, dates to ISO-8601.
- A built-in correctness check (opening + transactions = closing) you can gate on.
- One schema flows straight into validation, export and reconciliation.
Before you start
You need three things: an account, an API key, and a statement file to test with. Sign up, then create a key from the API dashboard — keys are revealed once, so copy it somewhere safe. Use a separate key per environment (dev, staging, prod) so you can rotate without downtime. Keep the key server-side; never ship it in a browser or mobile app.
For the file, any text-based PDF, scanned PDF, image (PNG/JPG), XLSX or CSV statement works. Validation and previews are free, so you can build and test the entire pipeline before you spend a single page — see the pricing page for how the page balance works. Full reference and live calls are in the API docs and the playground.
Step 1 — Authenticate
Every request carries your key as a bearer token in the Authorization header (an X-API-Key header is also accepted). The base URL is https://flowparse.io/api/v1. A quick way to confirm your key works is a free validate call:
curl -X POST https://flowparse.io/api/v1/validate \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "type": "bank_statement", "data": { "transactions": [] } }'
# 200 → { "validations": [ ... ] }   401 → invalid key
Step 2 — Extract the statement
Base64-encode the statement and POST it to /api/v1/extract. The whole request is plain JSON — the file travels as a base64 string in the file field, with an optional filename so the type is detected. No multipart handling required.
# encode the PDF (shell example; -w0 is the GNU coreutils flag, on macOS use: base64 -i october.pdf)
B64=$(base64 -w0 october.pdf)
curl -X POST https://flowparse.io/api/v1/extract \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d "{ \"file\": \"$B64\", \"filename\": \"october.pdf\" }"
The same call in Node uses any HTTP client: read the file, base64-encode it, and send JSON:
import { readFileSync } from "node:fs"

// Read the PDF and base64-encode it for the JSON body
const file = readFileSync("october.pdf").toString("base64")
const res = await fetch("https://flowparse.io/api/v1/extract", {
  method: "POST",
  headers: {
    Authorization: "Bearer " + process.env.FLOWPARSE_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ file, filename: "october.pdf" }),
})
// Surface 4xx/5xx responses instead of parsing them as results
if (!res.ok) throw new Error("extract failed with status " + res.status)
const json = await res.json()
Step 3 — Read the structured JSON
A successful call returns { type, pages, billedPages, data }. In the example below the statement itself sits one level down, under data.data: the account header plus a transactions array where every row is already normalised, with an ISO date, a single signed amount (credits positive, debits negative) and the running balance.
{
"type": "bank_statement",
"pages": 4,
"billedPages": 4,
"data": {
"type": "bank_statement",
"data": {
"bank_name": "Sterling Bank",
"account_holder": "ACME TRADING LTD",
"currency": "GBP",
"opening_balance": 4120.55,
"closing_balance": 6134.80,
"transactions": [
{ "date": "2024-10-03", "description": "STRIPE PAYMENTS UK LTD", "amount": 2480.00, "balance": 6600.55 },
{ "date": "2024-10-05", "description": "AWS EMEA", "amount": -312.40, "balance": 6288.15 }
],
"raw_table": { "columns": ["Date","Description","Money Out","Money In","Balance"], "rows": [ ] }
}
}
}
Insert the rows straight into your ledger table; no post-processing is needed. If you also want the original column layout for audit (reference codes, transaction types, card last-fours), read data.data.raw_table, which preserves every source column 1:1. The key fields are:
| Field | Type | Notes |
|---|---|---|
| bank_name / account_holder | string | Header, as printed |
| currency | string | ISO code (GBP, USD, EUR…) |
| opening_balance / closing_balance | number | Used by the reconciliation check |
| transactions[].date | string | ISO-8601 (YYYY-MM-DD) |
| transactions[].amount | number | Signed: credit +, debit − |
| transactions[].balance | number | Running balance after the row |
| raw_table | object | Original columns + rows, 1:1 |
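As a rough sketch of reading these fields in Node, the helper below flattens a response into rows for a ledger table. It assumes the nested data.data shape shown in the example. The function name, the row field names and the choice to store amounts as integer minor units are illustrative, not part of the API.

```javascript
// Hypothetical helper: flatten an extract response into rows for a ledger table.
// Assumes the nested shape from the example response (response.data.data).
function toLedgerRows(response) {
  const statement = response.data.data
  return statement.transactions.map((tx, index) => ({
    line: index + 1,                            // position within the statement
    date: tx.date,                              // already ISO-8601 (YYYY-MM-DD)
    description: tx.description,
    amountMinor: Math.round(tx.amount * 100),   // signed; assumes a 2-decimal currency
    balanceMinor: Math.round(tx.balance * 100),
    currency: statement.currency,
  }))
}
```

Storing integer minor units is a design choice rather than a requirement: it avoids floating-point drift when you later sum the amount column.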
Step 4 — Validate before you trust it
Structured doesn't automatically mean correct — especially for scanned statements where OCR can misread a digit. Pipe the returned data straight into /api/v1/validate to get a 0–100 quality score, a letter grade and concrete checks: balance reconciliation, duplicate detection, date order and low-confidence fields. Validation is free.
curl -X POST https://flowparse.io/api/v1/validate \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "type": "bank_statement", "data": { ... } }'
# → { "validations": [ { "score": { "value": 100, "grade": "A" }, "checks": [ ... ] } ] }
Use the grade as a gate: auto-accept high scores and route only the genuinely ambiguous statements to a human. That's how you run extraction at volume without quietly importing wrong numbers. The validation engine lists every rule it applies.
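A minimal sketch of such a gate in Node, assuming the response shape shown above (validations[0].score.grade). The function name and the set of auto-accepted grades are assumptions; pick the threshold that suits your own risk tolerance.

```javascript
// Hypothetical gate: decide what to do with a document from a /validate response.
// Grades are letters, so compare against an explicit allow-list rather than with
// < or >: "A" sorts before "B", so a string comparison would invert the order.
const AUTO_ACCEPT_GRADES = new Set(["A", "B"]) // assumption: your own threshold

function decide(validationResponse) {
  const first = validationResponse.validations && validationResponse.validations[0]
  if (!first || !first.score) return "review" // unexpected shape: never auto-accept
  return AUTO_ACCEPT_GRADES.has(first.score.grade) ? "accept" : "review"
}
```

Defaulting to "review" on any unexpected shape keeps a malformed response from being imported silently.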
Step 5 — Export to Excel or accounting software
When the destination is a spreadsheet or accounting system, hand the JSON to /api/v1/export. It returns a base64 file in the format you ask for: xlsx, csv, quickbooks (.QBO), qfx, ofx, xero and more. Bank-feed files use OFX 1.0.2 with FITID de-duplication, so re-imports never double-post.
curl -X POST https://flowparse.io/api/v1/export \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "format": "quickbooks", "type": "bank_statement", "data": { ... } }'
# → { "format":"qbo", "filename":"acme-oct.qbo", "encoding":"base64", "content":"T0ZYSER..." }
Add "preview": true to inspect the column mapping for free before you generate the billed file. For the full format list and import steps see PDF to QBO and bank statement to Xero.
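The export response carries the file as a base64 string, so the last step on your side is decoding it. A small sketch in Node, assuming the filename, encoding and content fields shown in the sample response; the helper name is hypothetical.

```javascript
// Hypothetical helper: turn an /export response into a filename plus raw bytes.
function decodeExport(exportResponse) {
  if (exportResponse.encoding !== "base64") {
    throw new Error("unexpected encoding: " + exportResponse.encoding)
  }
  return {
    filename: exportResponse.filename,
    bytes: Buffer.from(exportResponse.content, "base64"),
  }
}

// Usage (not run here): write the file for the customer to import.
// const { writeFileSync } = require("node:fs")
// const { filename, bytes } = decodeExport(json)
// writeFileSync(filename, bytes)
```

If you write the file to disk, consider sanitising the filename first rather than trusting it as a path.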
Step 6 — Reconcile against invoices
If you're matching incoming payments to invoices, /api/v1/reconcile takes your invoices and the statement's payments and returns matched and unmatched items with a reconciliation report — the same engine behind the reconciliation feature. Reconciliation is free.
curl -X POST https://flowparse.io/api/v1/reconcile \
  -H "Authorization: Bearer pf_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "invoices": [ ... ], "payments": [ ... ] }'
# → { "report": { "matched": [ ... ], "unmatched": [], "currency": "EUR" } }
Step 7 — Batch and merge many statements
For a year of statements or a whole portfolio, extract each document (you can parallelise this across workers), then call /api/v1/merge to consolidate up to 100 already-extracted documents into one reconciled Excel: unified columns across banks, duplicate rows removed, per-row source tracking. It's Smart Merge over the API. Pass preview: true to see the summary and sheet previews for free before spending pages on the file.
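// extract, validate and merge are your own thin wrappers around the endpoints
const ACCEPTED_GRADES = new Set(["A", "B"]) // letter grades don't order with >=, so list them
const docs = []
for (const path of statementPaths) {
  const r = await extract(path) // POST /api/v1/extract
  const v = await validate(r.data) // POST /api/v1/validate (free)
  const grade = v.validations[0].score.grade
  if (ACCEPTED_GRADES.has(grade)) docs.push(r.data) // auto-accept; else queue for review
}
const merged = await merge(docs) // POST /api/v1/merge → one Excel
Errors, status codes & rate limits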
The API uses standard HTTP status codes and never returns unbilled data. Handle them explicitly so your integration degrades gracefully:
| Code | Meaning | What to do |
|---|---|---|
| 200 | Success — structured data returned | Process the JSON |
| 400 | Bad request (missing/invalid file or base64) | Fix the request body |
| 401 | Invalid or missing API key | Check the Authorization header / rotate the key |
| 422 | Unreadable or nothing extractable (not billed) | Re-scan at higher quality or send the original PDF |
| 429 | Page budget exhausted | Top up or upgrade, then retry |
| 503 | Temporarily unavailable | Retry with exponential backoff |
Billing is per page and drawn from your page balance (monthly allowance first, then top-up pages); validation, reconciliation and previews are free. A 429 tells you exactly how many pages the request needed versus how many were available, so spend is always predictable — manage it on the pricing page.
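One way to encode the table above in a worker is a small policy function plus a capped backoff. This is a sketch: the action names, the base delay and the cap are assumptions, and the status codes are only those listed in the table.

```javascript
// Hypothetical policy: map a status code from the table above to an action.
function actionFor(status) {
  if (status === 200) return "process"
  if (status === 503) return "retry"       // transient: back off and try again
  if (status === 429) return "pause"       // page budget exhausted: wait for a top-up
  if (status === 400 || status === 422) return "fix-input" // same body will fail again
  if (status === 401) return "check-key"
  return "fail"                             // anything not listed in the table
}

// Exponential backoff delay in milliseconds, capped. Base and cap are assumptions.
function backoffMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt)
}
```

Keeping 422 and 400 out of the retry path matters most: re-sending an unreadable file only adds load without changing the outcome.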
Choosing the right output format
Once a statement is structured, the question is what to do with it — and that depends on where the data needs to land. If you're storing transactions in your own database, the raw JSON is all you need. If a human or another system expects a spreadsheet, ask /api/v1/export for xlsx or csv. If the destination is accounting software, choose a bank-feed format so the import is one click for the user rather than a fragile column-mapping exercise.
For QuickBooks and Quicken, quickbooks (.QBO) and qfx produce native bank-feed files; for tools that accept generic Open Financial Exchange, ofx works with GnuCash, Sage and others; and xero emits a Xero-friendly CSV. The three bank-feed formats (quickbooks, qfx and ofx) use OFX 1.0.2 with a stable FITID per transaction, which is what stops a re-import double-posting rows the user already has.
A good rule of thumb: default to a bank-feed file when the user's end goal is accounting software, and to XLSX/CSV when they want to analyse or share the data. You can always offer both. The import steps for each destination are covered on PDF to QBO, bank statement to Quicken and bank statement to Xero.
| format | Output | Best for |
|---|---|---|
| quickbooks | .QBO bank feed | QuickBooks Online / Desktop |
| qfx | .QFX bank feed | Quicken (Web Connect) |
| ofx | .OFX 1.0.2 | GnuCash, Sage, MoneyDance |
| xero | Xero-format CSV | Xero bank import |
| xlsx / csv | Spreadsheet | Analysis, sharing, custom imports |
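If your product lets users pick a destination, the table can be expressed as a lookup. The destination keys below are hypothetical labels for your own UI; only the format values come from the table.

```javascript
// Hypothetical mapping from a destination label to the `format` value in the table above.
const FORMAT_BY_DESTINATION = new Map([
  ["quickbooks", "quickbooks"],
  ["quicken", "qfx"],
  ["gnucash", "ofx"],
  ["sage", "ofx"],
  ["moneydance", "ofx"],
  ["xero", "xero"],
  ["spreadsheet", "xlsx"],
])

function formatFor(destination) {
  const format = FORMAT_BY_DESTINATION.get(destination.toLowerCase())
  if (!format) throw new Error("no format mapped for " + destination)
  return format
}
```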
A worked example, end to end
Let's walk a single statement through the whole loop. Suppose a customer uploads october.pdf, a four-page business current-account statement. Your worker base64-encodes it and POSTs it to /api/v1/extract. A few seconds later it gets back the structured JSON: an opening balance of 4,120.55, a closing balance of 6,134.80, and a transactions array where the Stripe payout is +2480.00 and the AWS charge is -312.40. The response also reports billedPages: 4, which your worker records for the audit log.
Next it calls /api/v1/validate with that data. The response scores it 100/A: opening plus the sum of transactions equals the closing balance, no duplicates, dates in order. Because the grade clears your threshold, the worker auto-accepts: it writes the transactions to your ledger table, stores the validation grade alongside them, and moves on without any human touch. Had the score come back amber — say a balance break from one misread row — the worker would instead queue the document for a quick review.
Finally, your product needs the data inside QuickBooks, so the worker calls /api/v1/export with format: "quickbooks" and gets a base64 .QBO file back, which it hands to the customer to import. One upload became validated transactions and an importable bank feed, with three API calls and no manual entry. Scale that pattern across thousands of statements and you have an unattended pipeline — exactly what the bank statement API is built for.
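The three calls in this walkthrough can be sketched as one worker function. Here `api` is a hypothetical object you supply, whose extract, validate and exportFile methods POST to the matching endpoints and resolve to the parsed JSON body. The "A only" threshold and the choice to pass the extracted data through unchanged are assumptions to adapt.

```javascript
// Sketch of the extract -> validate -> export loop with an injected client.
// `api` is hypothetical: { extract, validate, exportFile }, each returning parsed JSON.
async function processStatement(fileBase64, filename, api) {
  const extracted = await api.extract({ file: fileBase64, filename })
  const validation = await api.validate({ type: extracted.type, data: extracted.data })
  const score = validation.validations[0].score

  // Assumption: only grade A is auto-accepted; everything else goes to a reviewer.
  if (score.grade !== "A") {
    return { status: "review", billedPages: extracted.billedPages, score }
  }

  const exported = await api.exportFile({
    format: "quickbooks",
    type: extracted.type,
    data: extracted.data,
  })
  return { status: "accepted", billedPages: extracted.billedPages, score, exported }
}
```

Injecting the client keeps the orchestration testable with stubs, so the flow can be exercised in CI without spending pages.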
Scaling: concurrency, retries & idempotency
The extract call is synchronous and a large multi-page scan can take a little while, so don't call it inline on a user-facing web request at volume. Put extraction behind a queue and a pool of workers: the upload handler stores the file and enqueues a job; workers pull jobs, call extract and validate, and write results. This keeps your app responsive, lets you tune throughput by adding workers, and gives you a natural place to implement retries and backoff. It's effectively your own webhook flow, built from infrastructure you control.
Make jobs idempotent by keying each on a content hash of the file, so a retried or duplicated upload updates the same record instead of creating a second one. When you later export to a bank feed, the OFX FITID on each transaction is a stable identifier you can use to de-duplicate across re-imports — the same mechanism that stops QuickBooks double-posting. For transient 503s, retry with exponential backoff; for a 429, pause the affected worker until the page budget is topped up rather than spinning on the endpoint.
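A minimal sketch of content-hash idempotency in Node. SHA-256 and the in-memory Map are assumptions standing in for whatever hash and jobs table (with a unique constraint on the key) you actually use; the function names are hypothetical.

```javascript
// Hypothetical idempotency key: SHA-256 of the file bytes, hex-encoded.
// The dynamic import keeps the snippet usable from both CommonJS and ES modules.
async function contentKey(fileBytes) {
  const { createHash } = await import("node:crypto")
  return createHash("sha256").update(fileBytes).digest("hex")
}

// In-memory stand-in for a jobs table with a unique constraint on the key.
const jobs = new Map()

async function enqueueOnce(fileBytes, filename) {
  const key = await contentKey(fileBytes)
  if (jobs.has(key)) return { key, created: false } // duplicate upload: reuse the record
  jobs.set(key, { filename, status: "queued" })
  return { key, created: true }
}
```

Because the key depends only on the bytes, the same statement uploaded under a different filename still maps to the same job.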
Finally, cap concurrency. Billing is per page, so an uncapped batch of large statements can spend your whole balance in one burst. A modest worker pool with a concurrency limit gives you predictable spend and steady throughput, and it plays nicely with the per-page budget you set on the pricing page. For consolidating a finished batch into one workbook, hand the validated documents to Smart Merge via /api/v1/merge.
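The concurrency cap described above can be sketched as a small promise pool. This is a generic pattern rather than a FlowParse feature; `worker` is whatever function extracts and validates one statement, and the limit is yours to tune.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the input order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length)
  let next = 0

  async function lane() {
    while (next < items.length) {
      const index = next++ // claim an index before awaiting
      results[index] = await worker(items[index], index)
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane)
  await Promise.all(lanes)
  return results
}

// Usage (not run here): at most 4 extractions in flight.
// const results = await mapWithLimit(statementPaths, 4, (path) => extract(path))
```

If one worker call rejects, the returned promise rejects too, so wrap the worker in your own retry or error capture when a single bad file shouldn't stop the batch.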
Test the whole flow for free
You don't need to spend a single page to build and prove your integration. Validation is free, and both export and merge offer free previews, so you can wire up the entire pipeline — authenticate, validate sample data, preview an export, preview a merge — and confirm your code handles every response shape before any billed call runs. That makes it easy to develop against the real API in CI and in staging without burning budget.
When you're ready to test extraction itself, run a handful of real statements through /api/v1/extract and compare the JSON against the source — check that totals reconcile, dates parse, and signs are right. Use the API playground to fire ad-hoc requests from the browser while you're exploring, and the API docs for the exact request and response of every endpoint. Keep a small fixture set of representative statements (a clean digital PDF, a scan, a multi-currency account) so you can re-run them whenever you change your integration.
Watch your usage as you go: each key's request and page totals are visible in the dashboard, so you can see exactly what a test run cost and set a budget that matches your launch plan on the pricing page. Building free-first, then switching on billed extraction at go-live, is the cheapest and safest path to production. Keep that fixture set in version control alongside your integration tests, so every future change to your code is checked against the same known statements and you catch a regression — a dropped row, a mis-signed amount, a wrong total — long before it can reach a customer's books.
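A fixture test along these lines might check the two properties that matter most: totals reconcile and dates are well-formed. The sketch below assumes the statement fields from the schema (opening_balance, closing_balance, transactions[].amount, transactions[].date) and a two-decimal currency; the function names are hypothetical.

```javascript
// Hypothetical fixture check: does opening + sum(transactions) equal closing?
// Works in integer minor units to avoid floating-point drift (assumes 2 decimals).
function reconciles(statement) {
  const toMinor = (value) => Math.round(value * 100)
  const movement = statement.transactions.reduce((sum, tx) => sum + toMinor(tx.amount), 0)
  return toMinor(statement.opening_balance) + movement === toMinor(statement.closing_balance)
}

// Every date should already be ISO-8601 (YYYY-MM-DD).
function datesAreIso(statement) {
  return statement.transactions.every((tx) => /^\d{4}-\d{2}-\d{2}$/.test(tx.date))
}
```

Run both against each stored fixture after any change to your integration; a dropped row or a flipped sign shows up as a failed reconciliation.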
Security & compliance
Bank statements are among the most sensitive documents your system will ever touch, so treat the integration accordingly. Calls run over HTTPS and authenticate with a hashed key; keep that key strictly server-side and out of any browser, mobile app or client bundle. Use a separate key per environment and, where it helps, per customer — if one is ever exposed you revoke and replace it with zero downtime. Every request is logged with the document label and page cost, giving you a clean audit trail of what was processed and what it cost.
Because you control the request, you also control retention. The uploaded file is processed to produce the JSON response and isn't retained as a downloadable document on your behalf; on your side, store only the fields you actually need, drop raw_table if you don't use it, and keep account numbers and other PII out of your application logs. FlowParse never uses your documents to train models. For the platform's wider posture (encryption, data handling and compliance), see the security page.
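On your side, data minimisation can be as simple as an allow-list applied before anything is stored or logged. The sketch below assumes the statement fields from the schema; which fields you keep, and the helper names, are illustrative choices.

```javascript
// Hypothetical data-minimisation step: copy only the fields the ledger needs,
// so raw_table and the account header never reach your own storage.
function minimalStatement(statement) {
  return {
    currency: statement.currency,
    opening_balance: statement.opening_balance,
    closing_balance: statement.closing_balance,
    transactions: statement.transactions.map(({ date, description, amount, balance }) => ({
      date,
      description,
      amount,
      balance,
    })),
  }
}

// Log-safe summary: currency and a count only, no names, descriptions or balances.
function logSummary(statement) {
  return { currency: statement.currency, transactionCount: statement.transactions.length }
}
```

An allow-list is safer than deleting known-sensitive keys, because any field added to the response later is excluded by default.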
Common mistakes
- Skipping validation. Always score extraction with /api/v1/validate and gate on the grade — don't import unchecked rows.
- Sending a photo when a digital PDF exists. A digital PDF is read from its text layer, which avoids OCR misreads; send a scan or photo only when no text-based original is available.
- Re-summing debits and credits yourself. Amounts are already signed — sum the amount column directly.
- Treating 422 as a failure to retry blindly. It means the file was unreadable or had nothing extractable; fix the input.
- Putting the API key in client-side code. Keep keys server-side and rotate per environment.
- Ignoring raw_table. If you need reference codes or transaction types, they're preserved there 1:1.
Best practices
- Build the whole flow against free validation and previews first, then switch on billed extraction.
- Parallelise extraction across workers for batches, but cap concurrency so you don't blow your page budget in one burst.
- Persist the validation score with each document so you have an audit trail of what was auto-accepted.
- Store only the fields you need; drop raw_table if you don't, and keep PII out of your logs.
- Use a separate API key per environment and customer so you can revoke and rotate without downtime.
- Reconcile statements against invoices to catch missing or duplicate payments early.
That's the full loop: authenticate, extract, validate, export and reconcile — all over one key and one schema. For the complete reference see the API docs; to go deeper on each surface, read the bank statement API, the bank statement OCR API for scans, the PDF to JSON API, and the document extraction API for invoices and receipts.
Start parsing statements via API
Create a key, POST one statement to /api/v1/extract, and get clean transaction JSON back — then validate, export to QuickBooks or Xero, and reconcile over the same API.
