What is 'row loss' in PDF extraction?

Row loss is when a converter reads a bank statement or financial PDF but silently drops some of the transactions — so the output looks complete and plausible but is actually missing data. It's distinct from misreading a value: the row simply never appears. Because nothing flags it, you usually discover it weeks later when a total or a balance won't reconcile, which is what makes it the most dangerous kind of extraction error.

Why do bank statement converters drop rows?

Most commonly because they fragment a page into separate tables at every subtotal, blank line or marker, then require a clean header row for each fragment and abandon the fragments that don't have one. Continuation rows that span a page break, sections without a repeated header, and tables that change shape mid-document are exactly where rows get orphaned and discarded. Treating a data row that contains an amount or a date as a 'header' compounds it.

How did FlowParse fix row loss?

By rewriting extraction to a document-level model. Instead of splitting a page into header-bearing fragments, it infers the column layout from geometry, streams every data row into the right layout, and skips only repeated headers and footers — never data. A strict header score means a row containing any amount, date or identifier is never mistaken for a header. The result is one complete table per file, with continuation rows and headerless sections preserved.

How was the improvement measured?

With a regression harness that runs the real production code path — extract, consolidate, export — against a ten-document stress set with known ground truth, asserting per-file row counts, balance reconciliation and the absence of fabricated rows. The old approach lost 50–88% of rows on the hardest files; the document-level model brought every file to full fidelity, taking the merged output from 843 to 1,155 rows with no invented grand-total.

Does this apply to scanned statements too?

Yes. Scanned and photographed statements run through OCR first, then the same document-level structuring and the same balance check. So an OCR error that would otherwise drop or garble a row breaks the reconciliation and is surfaced, rather than passing unnoticed. Image quality still matters, but the completeness safety net is the same as for digital PDFs.

What is Merge Review?

When you combine many statements, FlowParse opens an editable grid — Merge Review — with a quality score and every questionable cell highlighted, plus an issues panel that jumps you straight to each problem. You correct anything in place and only export once you're satisfied. It's the human safety net that complements the automated completeness checks, applied to the whole consolidated set.

Why not just trust a high accuracy percentage?

Because a single percentage hides the failure mode that matters most. A tool can read every field it captures at 99% and still drop whole rows, which a field-accuracy figure never reflects. Row completeness and field correctness are different problems; FlowParse protects completeness separately with the document-level model and the balance check, so you don't have to take an aggregate number on faith.

Does eliminating row loss slow extraction down?

Not meaningfully. The document-level model is deterministic table logic, not an extra AI pass, so it adds negligible time. Extraction still runs in parallel as you upload, and the merge itself is near-instant because it works from already-extracted data.

Can I verify the fidelity on my own documents?

Yes, and it's the test we'd recommend over any benchmark. Convert your hardest real statement — the scanned, multi-column or mid-year-reformatted one — and check two things: did every transaction appear, and did the balance reconcile? That fifteen-second check on your own worst document is more meaningful than any headline figure.

Where can I read more about how accuracy works?

The bank statement accuracy page covers the three kinds of accuracy (field, row, structural), the balance validation, confidence scoring and review workflow in depth, and the bank statement validation page details the reconciliation checks. This article is the engineering story behind those guarantees.

Is this relevant beyond bank statements?

Yes. The same document-level extraction and validation discipline applies to invoices, financial statements and other multi-row financial PDFs — anywhere a table can span pages, change shape or carry subtotals that a naive splitter would mistake for boundaries. Bank statements are simply where row loss bites hardest because every transaction counts.

How We Eliminated Row Loss in PDF Table Extraction (2026)

What is the balance check and why does it matter?

On every statement, FlowParse verifies that the opening balance plus the sum of the extracted transactions equals the closing balance. If a row were missing or misread, that arithmetic would break and the discrepancy is flagged. It's the proof on top of the extraction: not 'the tool is usually accurate' but 'this statement provably reconciles', which is the only standard that's safe for financial data.

The worst bug a financial converter can have

There's a hierarchy of badness in data extraction. Misreading a value is bad, but visible — a date that's obviously wrong, an amount that looks off — and a human or a validation rule can catch it. Far worse is the error that leaves no trace: a transaction that the converter simply never emits. The output looks clean, the columns line up, the totals are plausible, and nothing tells you that one row in ten quietly went missing.

For a bank statement, that's catastrophic, because every transaction is load-bearing. A dropped row throws off the balance, the category totals, the tax figures and the reconciliation — and you don't find out at extraction time. You find out weeks later, when a closing balance won't match or an accountant asks why the numbers don't tie out, and by then you're hunting for one missing line among thousands. We decided that silent row loss was the one failure mode our converter was not allowed to have, and this is how we got there.

Why row loss stays invisible

The reason row loss is so pernicious is that none of the usual quality signals catch it. A confidence score reports how sure the model is about the values it did read — it says nothing about the rows it never attempted. Field-accuracy benchmarks measure correctness on captured cells, so a tool can score 99% and still be dropping whole transactions. Even a human reviewer skimming the output sees a coherent, well-formed table; there's no gap, no error marker, nothing that says “a row should be here.”

That invisibility is exactly why a high headline accuracy figure is the wrong thing to trust. The number you actually need isn't “how often is a value right” but “did every row survive” — a completely different question with a completely different answer. Recognising that completeness and field-accuracy are separate problems, with separate failure modes, was the first step; the second was building a way to prove completeness rather than assume it.
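To make the distinction concrete, here is a minimal Python sketch, assuming each ground-truth transaction carries a hypothetical `id` used for matching. `field_accuracy` and `row_completeness` are illustrative names, not FlowParse code. A single dropped row leaves field accuracy at a perfect 1.0 while completeness falls to 0.9.

```python
def field_accuracy(extracted_rows, truth_by_id):
    """Share of *captured* cells that match ground truth; missing rows never enter it."""
    correct = total = 0
    for row in extracted_rows:
        expected = truth_by_id[row["id"]]
        for key, value in row.items():
            total += 1
            if value == expected[key]:
                correct += 1
    return correct / total if total else 0.0


def row_completeness(extracted_rows, truth_by_id):
    """Share of ground-truth rows that appear in the output at all."""
    seen = {row["id"] for row in extracted_rows}
    return len(seen & set(truth_by_id)) / len(truth_by_id)


# Ten hypothetical transactions; the extractor reads nine perfectly and silently drops one.
truth = {i: {"id": i, "amount": f"{i * 10}.00"} for i in range(10)}
extracted = [truth[i] for i in range(10) if i != 4]

print(field_accuracy(extracted, truth))    # 1.0
print(row_completeness(extracted, truth))  # 0.9
```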

flowparse.io

The old approach, and where it broke

Our original digital-PDF extractor did something that sounds reasonable: it split each page's text into table regions by looking for structural breaks — a row with very few items (a subtotal, a section marker, a blank) was treated as a boundary between tables. Each resulting region was then expected to have its own header row, which the extractor used to name the columns. Downstream, the pipeline picked the single “best” table per page by a confidence heuristic and used that.

On a clean, simple statement — one tidy table, one header, no page breaks mid-table — this works fine. The trouble is that real financial PDFs are almost never that tidy. A subtotal line in the middle of a statement isn't a table boundary; it's part of the table. A section that continues onto the next page doesn't repeat its header. A document can carry several related blocks that together form one logical ledger. Every one of those normal features looked, to the old logic, like a reason to split, demand a fresh header, and — when there wasn't one — throw the region away.

How the rows actually vanished

Three compounding behaviours did the damage. First, splitting on any low-item row chopped a single continuous statement into many fragments at every subtotal and marker. Second, the header-per-region requirement meant any fragment without its own header row — every continuation block, every headerless section — was treated as unparseable and dropped. Third, taking only the single best table per page meant that even when multiple valid blocks were extracted, the others were discarded downstream.
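The three behaviours can be reproduced in a few lines. This is a hypothetical Python reconstruction of a fragment-and-drop design, not the original extractor: `min_items=3` and the digit-based `looks_like_header` test are assumptions chosen only to show the mechanism. On a page with four transactions and one subtotal, it returns two transactions, and nothing in its output says so.

```python
def looks_like_header(row):
    # Naive (assumed) rule: a row is a header if its first cell has no digits.
    return not any(ch.isdigit() for ch in row[0])


def old_fragment_extract(rows, min_items=3):
    # 1. Split at any "low-item" row (subtotal, section marker, blank line).
    fragments, current = [], []
    for row in rows:
        if sum(1 for cell in row if cell) < min_items:
            if current:
                fragments.append(current)
            current = []
        else:
            current.append(row)
    if current:
        fragments.append(current)

    # 2. Require each fragment to start with its own header; drop the rest.
    tables = [frag[1:] for frag in fragments if looks_like_header(frag[0])]

    # 3. Keep only the single "best" table, here simply the longest.
    return max(tables, key=len, default=[])


page = [
    ["Date", "Description", "Amount"],
    ["2026-01-02", "Coffee", "-3.50"],
    ["2026-01-03", "Salary", "2500.00"],
    ["", "Subtotal", "2496.50"],           # treated as a table boundary
    ["2026-01-05", "Rent", "-900.00"],      # continuation block with no header
    ["2026-01-06", "Groceries", "-45.20"],
]

print(len(old_fragment_extract(page)))  # 2 of 4 transactions survive
```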

There was a fourth, subtler failure: the header detector would sometimes mistake a data row for a header. A row whose first cell happened to look label-like could be promoted to a column header, which both lost that transaction and mislabelled the column for everything beneath it. So rows didn't just fall off the ends of fragments — some were consumed as structure. Put together, on a statement with subtotals, page breaks and a couple of sections, the extractor could confidently emit a clean-looking table that contained a fraction of the real transactions.

The stress test that exposed it

We built a deliberately hostile test set: ten documents chosen to break naive extraction. An accounts-payable ledger, a bank reconciliation report, a multi-currency business statement, a corporate expense report, a premium credit-card statement, a cross-border payments register, an international invoice register, a marketplace settlement report, a neobank statement and a travel-expense claim — each with the subtotals, page breaks, mixed sections and unusual layouts that real finance teams actually deal with.

The results were grim and, crucially, measured. On the hardest files — the accounts-payable, cross-border, travel and multi-currency documents — the old extractor lost between 50% and 88% of the rows. The merged output across all ten files came to a fraction of the true transaction count, and to add insult, it sometimes invented a bogus grand-total row by summing across currencies that should never have been added together. A converter that drops most of a statement and fabricates a total is worse than no converter, because it looks like it worked.

The fix: a document-level model

The rewrite inverted the logic. Instead of splitting a page into fragments and hoping each had a header, the new extractor builds a model of the document. It infers column layouts from the x-geometry of the text — where the columns physically sit on the page — and treats a layout as stable across the whole statement rather than re-deriving it per fragment. Every data row is then streamed into the layout it belongs to.
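A minimal sketch of geometry-based layout inference, assuming text has already been grouped into cell-level spans with a left edge `x0` (in PDF points) and a document-wide line index, and assuming left-aligned columns. `Span`, `infer_columns`, `stream_rows` and the 15-point tolerance are illustrative, not FlowParse's implementation. Because the anchors are computed once over every line in the document, a continuation line from the next page lands in the same columns without needing a header of its own.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    x0: float  # left edge in PDF points
    line: int  # line index across the whole document, not per page


def infer_columns(spans, tolerance=15.0):
    """Cluster left edges from the entire document into stable column anchors."""
    groups = []
    for x in sorted(s.x0 for s in spans):
        if groups and x - groups[-1][-1] <= tolerance:
            groups[-1].append(x)  # close to the previous edge: same column
        else:
            groups.append([x])    # a gap wider than the tolerance starts a new column
    return [sum(g) / len(g) for g in groups]


def stream_rows(spans, anchors):
    """Place every span in its nearest column; every line becomes a row."""
    rows = {}
    for s in spans:
        col = min(range(len(anchors)), key=lambda i: abs(anchors[i] - s.x0))
        row = rows.setdefault(s.line, [""] * len(anchors))
        row[col] = (row[col] + " " + s.text).strip()
    return [rows[line] for line in sorted(rows)]


spans = [
    Span("Date", 50.0, 0), Span("Description", 120.0, 0), Span("Amount", 300.0, 0),
    Span("2026-01-02", 51.0, 1), Span("Coffee", 121.0, 1), Span("-3.50", 302.0, 1),
    # Next page: no repeated header and slightly shifted x positions.
    Span("2026-02-03", 49.0, 2), Span("Rent", 119.0, 2), Span("-900.00", 298.0, 2),
]
anchors = infer_columns(spans)
print(anchors)                          # [50.0, 120.0, 300.0]
print(stream_rows(spans, anchors)[2])   # ['2026-02-03', 'Rent', '-900.00']
```

Clustering by gap is the simplest choice that works here; right-aligned amount columns would need their right edges clustered instead.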

Two rules do most of the work. A strict header score means a row containing any amount, date or identifier value can never be classified as a header — which directly fixes the “data row eaten as a header” bug. And the extractor skips only what it can prove is non-data: repeated headers and footers. Everything else — continuation rows after a page break, sections without their own header, subtotal-adjacent lines — flows into the table instead of triggering a split. The output is one lossless table per file.
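One way to express the strict header rule, sketched in Python under stated assumptions: the regular expressions below are hypothetical stand-ins for whatever amount, date and identifier detectors a real extractor uses, and the "share of label-like cells" score with a 0.8 threshold is an illustrative choice. The essential property is the hard veto: a row with any data-like cell scores 0.0, so a label-like row such as an opening-balance line can no longer be promoted to a header.

```python
import re

# Hypothetical detectors for data-like cells.
AMOUNT = re.compile(r"\(?[-+]?[$€£]?\d[\d,]*(\.\d+)?\)?")
DATE = re.compile(r"\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}|\d{1,2} [A-Za-z]{3,9}( \d{2,4})?")
IDENTIFIER = re.compile(r"[A-Za-z]{0,4}-?\d{4,}")
LABEL = re.compile(r"[A-Za-z][A-Za-z .()/]*")


def is_data_cell(cell):
    return any(p.fullmatch(cell) for p in (AMOUNT, DATE, IDENTIFIER))


def header_score(row):
    cells = [c.strip() for c in row if c.strip()]
    if not cells or any(is_data_cell(c) for c in cells):
        return 0.0  # hard veto: any amount, date or identifier means a data row
    return sum(1 for c in cells if LABEL.fullmatch(c)) / len(cells)


def is_header(row, threshold=0.8):
    return header_score(row) >= threshold


print(header_score(["Date", "Description", "Amount"]))    # 1.0
print(header_score(["Opening balance", "", "1,200.00"]))  # 0.0
print(header_score(["Ref", "TX-0042", "Refund"]))         # 0.0
```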

Where a document genuinely contains several related blocks, two passes reconcile them: aligned tables with matching column shapes are merged, and related tables are unioned by canonical key — the same column-matching logic that powers consolidation, so “Date”, “Datum” and “Transaction Date” collapse to one column and nothing is stranded in an orphan block. The result is structural: rows can't fall through the cracks because there are no cracks to fall through.
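The union step can be sketched as follows, assuming a small hypothetical synonym table; apart from "Date", "Datum" and "Transaction Date", its entries and the function names are illustrative. Each block's header is mapped to canonical keys, columns are collected in first-seen order, and every row is carried into the union, with an empty cell where a block lacks a column rather than the row being discarded.

```python
# Hypothetical synonym table; a real one would be far larger.
CANONICAL = {
    "date": "date", "datum": "date", "transaction date": "date",
    "description": "description", "details": "description",
    "amount": "amount", "betrag": "amount",
}


def canonical(name):
    key = name.strip().lower()
    return CANONICAL.get(key, key)  # unknown headers keep their own normalised name


def union_tables(tables):
    """tables: list of (header, rows) blocks. Returns (columns, rows) with no row dropped."""
    columns, records = [], []
    for header, rows in tables:
        keys = [canonical(h) for h in header]
        for key in keys:
            if key not in columns:
                columns.append(key)
        records.extend(dict(zip(keys, row)) for row in rows)
    return columns, [[rec.get(col, "") for col in columns] for rec in records]


blocks = [
    (["Date", "Description", "Amount"], [["2026-01-02", "Coffee", "-3.50"]]),
    (["Datum", "Betrag"], [["2026-01-05", "-900.00"]]),
    (["Transaction Date", "Details", "Amount"], [["2026-01-06", "Groceries", "-45.20"]]),
]
cols, rows = union_tables(blocks)
print(cols)     # ['date', 'description', 'amount']
print(rows[1])  # ['2026-01-05', '', '-900.00']
```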

Surviving the merge, not just the extraction

Fixing extraction is only half the battle when people consolidate dozens of statements at once. The consolidation engine itself was already lossless — it stacks rows verbatim and matches columns deterministically — but it could only be as complete as what extraction handed it. With the document-level model feeding it full tables, the whole pipeline became trustworthy end to end.

We also fixed a downstream insult to accuracy: a spurious grand-total row that appeared when a sheet mixed currencies. Totals are now only computed for genuinely numeric, same-currency columns, and suppressed where a sheet mixes currencies — so the merged workbook reflects the data, not an arithmetic artefact. The principle throughout: the combining step must never add, remove or invent a number, only carry through exactly what was extracted and approved.
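A minimal sketch of that rule, assuming each value in a column arrives with a currency code alongside it; `column_total` is a hypothetical name. It uses `decimal.Decimal` rather than floats so that money sums exactly, and it returns `None` (no total row) whenever the column mixes currencies or contains a non-numeric value.

```python
from decimal import Decimal, InvalidOperation


def column_total(values, currencies):
    """Sum a column only if it is genuinely numeric and single-currency; else None."""
    if len(set(currencies)) != 1:
        return None  # mixed (or missing) currencies: suppress the grand total
    total = Decimal("0")
    for value in values:
        try:
            total += Decimal(value.replace(",", ""))
        except InvalidOperation:
            return None  # a non-numeric cell means this is not an amount column
    return total


print(column_total(["-3.50", "2,500.00"], ["EUR", "EUR"]))  # 2496.50
print(column_total(["-3.50", "100.00"], ["EUR", "USD"]))    # None
```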

The result: 100% fidelity on the hard set

With the document-level model in place, the same ten-document stress set went from a lossy subset to every single row present: ten files, 100% row fidelity, no missing transactions. The merged transaction count rose from 843 to 1,155 rows, and the fabricated cross-currency total was gone. Every file's extracted row count matched its ground truth, verified automatically rather than by eye.
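The before-and-after counts also show why an aggregate hides the damage. Taking 1,155 as the ground-truth total (every file matched its ground truth after the fix), the old merged output retained about 73% of the rows:

```latex
\frac{843}{1155} \approx 0.730,
\qquad
1 - \frac{843}{1155} = \frac{312}{1155} \approx 27.0\%
```

An overall loss of roughly 27% sits alongside per-file losses of 50–88% on the hardest documents, which is exactly the spread an average conceals.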

The number that matters there isn't 1,155; it's the per-file checkmarks. “Average accuracy improved” would let a tool drop everything in one file and over-count another and still look fine. The harness asserts that each document survives intact, because that's what a user actually experiences — they don't convert an average, they convert their statement, and their statement has to be complete.

Old fragment approach vs document-level model

It's worth laying the two designs side by side, because the contrast explains why the new one is robust rather than just tuned. The old extractor made local decisions — split here, demand a header there — that each seemed reasonable but compounded into lost data. The new one makes a single global decision about the document's structure and then never has to throw anything away.

Aspect	Old: fragment & drop	New: document-level
Unit of work	A region of one page	The whole document
Column layout	Re-derived per fragment from a header	Inferred once from x-geometry, held stable
Subtotals & markers	Treated as table boundaries	Treated as part of the table
Continuation across pages	Orphaned without a header	Streamed into the same layout
Data row that looks label-like	Could be eaten as a header	Strict header score forbids it
Multiple blocks per page	Only the 'best' one kept	Unioned by canonical key
Failure mode	Silent row loss	Rare errors surfaced by the balance check

The throughline is that every row in the right-hand column removes a way for data to disappear. You don't fix silent row loss by being more careful inside a fragile design; you fix it by choosing a design where the loss can't happen, then adding a check to catch the rare exception. That's the move from “usually accurate” to “provably complete.”

Proving it, not just claiming it

Hitting 100% on a test set is reassuring, but a user can't see our test set — they need to know their statement is complete. That's what the balance check provides on every statement: opening balance plus the sum of the transactions must equal the closing balance. If extraction ever did drop or misread a row, the arithmetic would break and the discrepancy is flagged for inspection.
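The check itself is simple arithmetic. A minimal Python sketch, assuming amounts arrive as signed decimal strings (credits positive, debits negative); `balance_discrepancy` is a hypothetical name. Using `Decimal` keeps floating-point rounding from producing false alarms, and when exactly one row is missing, the discrepancy equals that row's amount, which tells a reviewer what to look for.

```python
from decimal import Decimal


def balance_discrepancy(opening, transactions, closing):
    """closing - (opening + sum of transactions); zero means the statement reconciles."""
    expected_closing = Decimal(opening) + sum((Decimal(t) for t in transactions), Decimal("0"))
    return Decimal(closing) - expected_closing


txns = ["-3.50", "2500.00", "-900.00", "-45.20"]
print(balance_discrepancy("1000.00", txns, "2551.30"))  # 0.00

# Drop one row, as a lossy extractor might: the statement no longer balances.
lossy = [t for t in txns if t != "-900.00"]
print(balance_discrepancy("1000.00", lossy, "2551.30"))  # -900.00
```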

This turns completeness from a promise into a property you can verify per document. It's the difference between “our extractor is accurate” and “this statement provably reconciles” — and for financial data only the latter is good enough. The document-level model makes row loss extremely unlikely; the balance check makes the rare remaining case visible rather than silent, which is the whole point. The deeper rationale is in the bank statement accuracy write-up.

Keeping it fixed: the regression harness

Fixing a bug once is easy; keeping it fixed across months of changes is the hard part, and it's where most quietly-broken extractors lose ground. A refactor to handle a new bank's layout, a tweak to OCR pre-processing, a dependency bump — any of these can reintroduce row loss, and because the failure is silent, a casual test wouldn't catch it. So the fix isn't just the document-level model; it's a harness that makes a regression impossible to merge unnoticed.

The harness runs the real production code path — the same extract, consolidate and export functions the app calls, not a mock — against the ten-document stress set, every one with known ground truth. For each file it asserts three things: the exact row count matches, the balance reconciles, and no fabricated rows (like the old cross-currency grand total) have crept in. It runs over the merged output too, so the end-to-end count has to land on the expected figure, not just each file in isolation.

Asserting on per-file counts rather than an average is deliberate, and it's the detail that makes the harness meaningful. An aggregate check would let a change drop every row in one document and over-count another and still report “100% on average.” By pinning each document to its own ground truth, a regression in any single file fails the build — which is exactly the granularity a user experiences, because they convert their statement, not an average of ten.
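The difference between per-file and aggregate assertions is easy to demonstrate. This Python sketch is hypothetical: `FileResult`, the file names and the counts are invented for illustration, and a real harness would fill them by running the production extract, consolidate and export path. One file drops five rows and another gains five fabricated ones; the aggregate check passes, while the per-file check reports every problem.

```python
from dataclasses import dataclass


@dataclass
class FileResult:
    name: str
    rows: int
    expected_rows: int
    reconciles: bool
    fabricated_rows: int


def per_file_failures(results):
    """Pin every file to its own ground truth; any single deviation is a failure."""
    failures = []
    for r in results:
        if r.rows != r.expected_rows:
            failures.append(f"{r.name}: {r.rows} rows, expected {r.expected_rows}")
        if not r.reconciles:
            failures.append(f"{r.name}: balance does not reconcile")
        if r.fabricated_rows:
            failures.append(f"{r.name}: {r.fabricated_rows} fabricated row(s)")
    return failures


def aggregate_passes(results):
    """The weak alternative: compare only the combined row count."""
    return sum(r.rows for r in results) == sum(r.expected_rows for r in results)


results = [
    FileResult("ap_ledger.pdf", rows=95, expected_rows=100, reconciles=False, fabricated_rows=0),
    FileResult("fx_register.pdf", rows=55, expected_rows=50, reconciles=True, fabricated_rows=5),
]
print(aggregate_passes(results))  # True: the totals hide both problems
for failure in per_file_failures(results):
    print(failure)
```

In a pytest-style suite, a single `assert per_file_failures(results) == []` would fail the build as soon as any one file drifts.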

The same philosophy extends to the merge path, with a separate end-to-end harness that takes structured documents through consolidation and export and checks the resulting workbook. Together they mean the “843 to 1,155 rows” result isn't a one-off measurement from the week we shipped the fix — it's a property the test suite re-proves on every change, which is the only way a completeness guarantee stays true over time rather than decaying quietly after launch.

There's a broader lesson in choosing what to assert. It would have been tempting to measure something easy — “does the extractor produce a table?” — and call it tested. But that question can't fail in the way that actually hurts users, so it gives false confidence. The assertions worth writing are the ones aimed squarely at your worst failure mode: for a financial converter that means row counts and balance reconciliation, not table-shaped output. A test suite is only as honest as the thing it refuses to let regress, and the thing we refuse to let regress is a single missing transaction.

The human safety net: Merge Review

Automated checks should reduce human effort, not replace human judgement. So consolidating now opens Merge Review: an editable grid of the combined data with a quality score and every questionable cell highlighted, plus an issues panel that jumps you straight to each flagged date or amount. You fix anything in place and only export once you're satisfied, with every row keeping its source-file reference.
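How a review grid might decide which cells to highlight, as a hypothetical Python sketch: the ISO date format, the 0.8 confidence threshold and the "share of checked cells with no issue" quality score are all assumptions, not FlowParse's actual scoring. Each issue carries its row index and column, which is what lets an issues panel jump straight to the cell.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

CHECKED = ("date", "amount")


def flag_cells(rows, min_confidence=0.8, date_format="%Y-%m-%d"):
    """Return (row_index, column, reason) for every questionable date or amount cell."""
    issues = []
    for i, row in enumerate(rows):
        try:
            datetime.strptime(row["date"], date_format)
        except ValueError:
            issues.append((i, "date", "unparseable date"))
        try:
            Decimal(row["amount"].replace(",", ""))
        except InvalidOperation:
            issues.append((i, "amount", "unparseable amount"))
        for col, score in row.get("confidence", {}).items():
            if col in CHECKED and score < min_confidence:
                issues.append((i, col, f"low confidence {score:.2f}"))
    return issues


def quality_score(rows, issues):
    """Share of checked cells with no issue (1.0 means nothing to review)."""
    flagged = {(i, col) for i, col, _ in issues}
    total = len(rows) * len(CHECKED)
    return 1 - len(flagged) / total if total else 1.0


rows = [
    {"date": "2026-01-02", "amount": "-3.50"},
    {"date": "2026-13-40", "amount": "2,500.00"},  # impossible date
    {"date": "2026-01-05", "amount": "-9OO.00"},   # OCR read the zeros as letter O
    {"date": "2026-01-06", "amount": "-45.20", "confidence": {"amount": 0.55}},
]
issues = flag_cells(rows)
print(quality_score(rows, issues))  # 0.625
```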

The design intent is a clean division of labour: the document-level model guarantees the rows are there, the balance check proves it, the confidence scoring points to anything uncertain, and the human spends thirty seconds on the genuine exceptions instead of re-reading a thousand rows. That's how accuracy scales — not by asking people to trust a black box, and not by asking them to check everything, but by surfacing exactly the few things worth a look.

Lessons we'd generalise

A few principles came out of this that apply well beyond our codebase. First, treat completeness as a first-class metric, separate from field accuracy — if you only measure value correctness, you're blind to your worst failure mode. Second, model the document, not the fragment: splitting eagerly and requiring local structure is what orphans data; inferring global structure and streaming into it is far more robust to the messiness of real PDFs.

Third, build the check that makes the failure visible. We can't promise extraction will never err, but a balance reconciliation converts a silent error into a flagged one, which is the difference that matters in practice. Fourth, test on hostile inputs and assert per-item, not on averages — the document that breaks your extractor is the one your user will upload, and an aggregate score will hide it. None of these are exotic; they're just the discipline financial data demands.

The one-line takeaway: don't ask whether your extractor is accurate on average — ask whether it can prove, document by document, that no row was left behind.

Three kinds of accuracy, and which one row loss breaks

Part of why this bug hid for as long as it could is that “accuracy” is really three different things, and the industry tends to quote only the first. Separating them is what let us see — and then close — the gap that mattered.

Kind of accuracy	The question it answers	How FlowParse protects it
Field accuracy	Is each captured value correct?	AI extraction + confidence scoring + editable review
Row completeness	Did every transaction survive?	Document-level model — the focus of this work
Structural integrity	Do the numbers reconcile?	Balance check: opening + transactions = closing

A headline “99% accurate” almost always refers to the first row of that table and quietly ignores the second and third — which is exactly where money goes missing. Treating all three as separate, measurable properties, each with its own safeguard, is the substance behind the accuracy claims; the document-level model is simply the piece that closed the middle row.

Why we publish the failure, not just the fix

Writing up a bug this serious is an unusual thing for a product to do — the comfortable path is to ship the fix quietly and only ever talk about the happy result. We're publishing the failure because, in this category, the failure is the most useful thing we can tell you. Almost every PDF-to-Excel and bank statement converter shares some version of the fragment-and-drop design, and almost none of them surface row loss to the user. If reading this makes you check your current tool on a hard statement, that's a good outcome whatever tool you land on.

It also reflects how we think trust should be earned with financial data: not with a marketing number, but with a falsifiable claim and a way for you to test it. We told you the failure mode, the measurement, the before-and-after (843 to 1,155 rows on the stress set), and the exact check — balance reconciliation — that would catch a regression. That's a claim you can hold us to, document by document, which is the only kind worth making about money.

If there's one thing to take from all of this, it's a question to ask any extraction tool — ours included — before you trust it with your books: not “how accurate are you?” but “how would I know if you dropped a row?” A tool that can't answer the second question is asking for blind faith. The honest answer is a per-document completeness check you can see, and that is precisely what we built this work to be able to give.

What it means if you convert statements

For anyone using FlowParse, the practical upshot is simple: a statement you convert today carries every transaction, the balance check confirms it reconciles, and the consolidation of a year across accounts is complete rather than approximately complete. The places this matters most — a practice closing client books, a lender reading applicant statements, a finance team consolidating across subsidiaries — are exactly the places where a single dropped row is most expensive.

And you don't have to take our word for it. Convert your own hardest statement — the scanned one, the multi-column one, the one that changed format mid-year — straight to Excel and check two things: did every transaction appear, and did the balance reconcile? That fifteen-second test on your worst document is the honest measure of a converter, and it's the one we built this work to pass.

Test it on your hardest statement

Convert a real statement free — no signup — and check it yourself: every row present, and the balance reconciled.
