Financial institutions process thousands of documents every day, including tax forms, loan statements, and purchase orders. Each has a unique format, structure, and set of field names, which makes it difficult to build automation workflows using optical character recognition (OCR) software. Amazon Bedrock Data Automation (BDA) helps solve these challenges by automating the extraction, validation, and analysis of data from financial documents. BDA goes beyond simple OCR by using foundation models that can:
- Understand document context
- Recognize relationships between different sections
- Extract structured, actionable data
- Validate information across multiple sources
While foundation models like Anthropic Claude can extract content from PDFs, Amazon Bedrock Data Automation offers custom extractions with industry-leading accuracy at a lower cost, along with features such as visual grounding with confidence scores for explainability and built-in hallucination mitigation.
In this post, we explore how Amazon Bedrock Data Automation can accurately extract information from four common types of financial documents: bank statements, W-2 forms, 1099-B tax forms, and vendor contracts. We highlight the complexity in each document, detail the custom extraction created in Amazon Bedrock Data Automation, and describe the results of the extraction process.
Solution overview
Amazon Bedrock Data Automation lets you configure output based on your processing needs using blueprints. A blueprint in Amazon Bedrock Data Automation is a configuration template that defines how data should be extracted from documents. It specifies:
- The document type being processed
- The data fields to be extracted
- The validation rules for the extracted data
- The structure and format of the output
Think of it as a map that tells Amazon Bedrock Data Automation exactly what information to look for and how to process it. For extraction, you can use either a catalog blueprint or a custom blueprint. A custom blueprint lets organizations create extraction patterns for their specific needs. In this post, we created custom blueprints and used the BDA console to generate and validate the output.
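To make the four elements above concrete, the following is a minimal sketch of what a custom blueprint for a bank statement might look like, built as a Python dictionary and serialized to JSON. The field names (`account_number`, `transactions`, and so on) and the instruction text are hypothetical, and the key layout (`class`, `inferenceType`, `instruction`) reflects our understanding of the blueprint schema format rather than the blueprint used in this post. Check the Amazon Bedrock documentation for the authoritative schema.

```python
import json


def build_bank_statement_blueprint() -> dict:
    """Return a hypothetical blueprint schema for bank statements.

    All field names and instructions are illustrative assumptions.
    """
    # Reusable definition for one transaction row.
    transaction = {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "inferenceType": "explicit",
                "instruction": "Transaction date in YYYY-MM-DD format",
            },
            "description": {
                "type": "string",
                "inferenceType": "explicit",
                "instruction": "Transaction description as printed",
            },
            "amount": {
                "type": "number",
                "inferenceType": "explicit",
                "instruction": "Signed amount; debits are negative",
            },
        },
    }
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "description": "Extract individual transactions from a bank statement",
        "class": "bank-statement",  # the document type being processed
        "type": "object",
        "definitions": {"Transaction": transaction},
        "properties": {  # the data fields to be extracted
            "account_number": {
                "type": "string",
                "inferenceType": "explicit",
                "instruction": "Account number shown in the statement header",
            },
            "transactions": {
                "type": "array",
                "instruction": "Every debit and credit row, excluding totals",
                "items": {"$ref": "#/definitions/Transaction"},
            },
        },
    }


blueprint_json = json.dumps(build_bank_statement_blueprint(), indent=2)
print(blueprint_json)
```

Keeping the schema in code like this makes it easier to version and review alongside the downstream logic that consumes the extracted fields.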
How to develop blueprints for four types of financial documents
The following sections walk you through creating custom blueprints for bank statements, W-2 forms, 1099-B forms, and vendor contracts.
Prerequisites
If you are not familiar with how custom blueprints are created, follow the instructions in the Amazon Bedrock documentation. For our evaluation, we uploaded the documents on the BDA console, refined the AI-generated prompts, and downloaded the results. Typically, a single custom blueprint suffices for a specific document type when extracting consistent fields. However, if workflow requirements vary or document formats change significantly, you might need to create multiple custom blueprints to accommodate these variations. After a blueprint is created, you can use it as part of the workflow for consistent downstream processing. For the same blueprint, if the input documents contain different data, BDA might return slightly different output (for example, some bank statements might include total debits and credits). However, because BDA output is structured JSON, it’s straightforward to create appropriate rules based on downstream processing workflows (for example, discard totals if the workflow is to categorize individual debit and credit transactions for accounting).
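As a minimal sketch of such a downstream rule, the following Python snippet drops summary rows (totals) and keeps only individual transactions. The row layout and the list of summary prefixes are assumptions made for illustration; the actual keys depend on the blueprint you define.

```python
from decimal import Decimal

# Hypothetical rows, shaped like transaction records parsed from structured output.
rows = [
    {"date": "2024-01-03", "description": "Payroll deposit", "amount": "2500.00"},
    {"date": "2024-01-05", "description": "Grocery store", "amount": "-82.15"},
    {"date": "", "description": "Total debits", "amount": "-82.15"},
    {"date": "", "description": "TOTAL CREDITS", "amount": "2500.00"},
]

# Assumed markers for summary lines; adjust to the statements you process.
SUMMARY_PREFIXES = ("total", "subtotal")


def is_summary_row(row: dict) -> bool:
    """Return True when the description marks a total rather than a transaction."""
    description = row.get("description", "").strip().lower()
    return description.startswith(SUMMARY_PREFIXES)


def keep_transactions(rows: list[dict]) -> list[dict]:
    """Discard summary rows and convert amounts to Decimal for accounting use."""
    kept = []
    for row in rows:
        if is_summary_row(row):
            continue
        kept.append({**row, "amount": Decimal(row["amount"])})
    return kept


transactions = keep_transactions(rows)
for item in transactions:
    print(item["date"], item["description"], item["amount"])
```

Using `Decimal` rather than `float` avoids rounding drift when the amounts are later summed or reconciled.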
The following screenshot illustrates the blueprint prompt for one of the document types.

The next section describes the four documents tested as part of this project and the extraction achieved using custom blueprints tailored to each need. Output is available in JSON, CSV, and raw data formats, which supports a range of integration and reporting needs.
Financial document types and custom blueprints
Amazon Bedrock Data Automation provides built-in blueprints for common document types, including bank statements and W-2 forms. These built-in blueprints offer comprehensive extraction out of the box. In this post, we use custom blueprints to demonstrate how organizations can tailor extraction to their specific workflow requirements. For example, you can extract only transaction data from bank statements for automated accounting, or group W-2 fields into logical structures (federal tax, state tax, code-amount pairs) that align with downstream tax processing systems. Custom blueprints are also the approach for document types that don’t have built-in blueprints, such as the 1099-B forms and vendor contracts shown later in this post.
1. Bank Statements – Documents from banks detailing an account’s financial activity, including deposits, withdrawals, and fees, over a specific period, usually a month.
Bank statements present a complex challenge: they contain numerous monthly transactions, often spanning multiple pages, with varying formats and details. In many workflows, the critical task is to precisely capture transaction data, including dates, amounts, descriptions, and reference numbers, which can then feed directly into automated accounting workflows such as categorizing transactions in an accounting ledger. This automated extraction minimizes manual data entry errors and streamlines the reconciliation process. As part of our evaluation, we selected the following bank statement for a trial of the extraction process:

Account statement generated using the Amazon Nova Pro foundation model
Custom blueprint instructions for Amazon Bedrock Data Automation:
Extraction results from table.csv:

Upon review, we can confirm that the system extracted the transactions accurately.
2. Form W-2 – Reports an employee’s wages and the taxes withheld by the employer.
W-2 tax forms present unique extraction challenges because of their standardized yet complex structure. As part of our evaluation, we used the following W-2 for a trial of the extraction process:

W-2 generated using the Amazon Nova Pro foundation model
Custom blueprint instructions for Amazon Bedrock Data Automation:
Extraction results from result.json:


Upon review, we can confirm that the system extracted the W-2 fields accurately. Several extraction complexities were specifically verified in the project:
- The form has no specific grouping for federal tax and state tax information, but these fields need to be processed together, so the extraction results should bring them together.
- Box 12 of the W-2 can report up to 26 codes for certain compensation and benefit amounts. It is important to extract each code and its value as a pair.
- Employers can put almost anything in Box 14. It catches items that don’t have their own dedicated box on the W-2, so these should be grouped separately.
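The three grouping requirements above can be sketched as a small post-processing step. The flat input keys (`box1_wages`, `box12`, and so on) are hypothetical names chosen for this example, not the field names from our blueprint; the sketch only shows one way to enforce that Box 12 entries stay paired and Box 14 items stay separate.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal

# Box 12 codes are one or two capital letters (for example, D or DD).
BOX12_CODE = re.compile(r"^[A-Z]{1,2}$")


@dataclass
class W2Groups:
    federal_tax: dict
    state_tax: dict
    box12: list = field(default_factory=list)        # (code, amount) pairs
    box14_other: list = field(default_factory=list)  # (description, amount) pairs


def group_w2(flat: dict) -> W2Groups:
    """Group a flat, hypothetical W-2 extraction into logical structures."""
    box12 = []
    for entry in flat.get("box12", []):
        code, amount = entry.get("code"), entry.get("amount")
        # A code without an amount (or the reverse) is not usable downstream.
        if not code or amount is None or not BOX12_CODE.match(code):
            raise ValueError(f"Box 12 entry is not a valid code/amount pair: {entry}")
        box12.append((code, Decimal(amount)))
    box14 = [(e["description"], Decimal(e["amount"])) for e in flat.get("box14", [])]
    return W2Groups(
        federal_tax={"wages": Decimal(flat["box1_wages"]),
                     "withheld": Decimal(flat["box2_federal_tax_withheld"])},
        state_tax={"wages": Decimal(flat["box16_state_wages"]),
                   "withheld": Decimal(flat["box17_state_income_tax"])},
        box12=box12,
        box14_other=box14,
    )


# Hypothetical sample values for illustration only.
sample = {
    "box1_wages": "85000.00",
    "box2_federal_tax_withheld": "12000.00",
    "box16_state_wages": "85000.00",
    "box17_state_income_tax": "4200.00",
    "box12": [{"code": "D", "amount": "6000.00"}, {"code": "DD", "amount": "7400.00"}],
    "box14": [{"description": "CA SDI", "amount": "1100.00"}],
}
groups = group_w2(sample)
print(groups)
```

Raising an error on an unpaired Box 12 entry is a design choice for this sketch; a production workflow might route such documents to manual review instead.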
3. IRS Form 1099-B: Proceeds from Broker and Barter Exchange Transactions – This tax document tracks:
- Securities trading activity
- Broker-facilitated transactions
- Barter exchange participation
As part of our evaluation, we used the following 1099-B for a trial of the extraction process:

1099-B statement generated using the Amazon Nova Pro foundation model
Custom blueprint instructions for Amazon Bedrock Data Automation:
Extraction results from table.csv:

A significant validation of BDA’s contextual understanding is that the system accurately identified and extracted ‘TSLA’ as the security descriptor across the stock transactions, even though it appeared as a common descriptor shared by the transactions. This consistent extraction demonstrates BDA’s ability to maintain contextual accuracy throughout document processing.
4. Vendor contract – This extraction process applies to a wide range of vendor contracts. The specific details to capture must be tailored to each company’s operational workflows and requirements.
As part of our evaluation, we selected the following vendor contract for a trial of the extraction process:




Custom blueprint instructions for Amazon Bedrock Data Automation:
Extraction results from result.json:

The system successfully identified and extracted the blueprint-specified elements present in the contract.
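Because a contract might not contain every element a blueprint asks for, a downstream check can flag fields that came back empty or with a low confidence score. The sketch below assumes a simplified JSON layout with `inference_result` and `explainability_info` keys, hypothetical field names, and an arbitrary threshold of 0.80; verify the actual output structure against your own results before relying on it.

```python
# Hypothetical required fields and threshold for this illustration.
REQUIRED_FIELDS = ["vendor_name", "effective_date", "payment_terms", "termination_clause"]
CONFIDENCE_THRESHOLD = 0.80

# Simplified, assumed layout of an extraction result; values are invented.
result = {
    "inference_result": {
        "vendor_name": "Example Supplies LLC",
        "effective_date": "2024-03-01",
        "payment_terms": "Net 30",
        "termination_clause": "",
    },
    "explainability_info": [{
        "vendor_name": {"confidence": 0.97},
        "effective_date": {"confidence": 0.91},
        "payment_terms": {"confidence": 0.62},
        "termination_clause": {"confidence": 0.10},
    }],
}


def fields_needing_review(result: dict, required: list, threshold: float) -> dict:
    """Return {field: reason} for fields that are missing or low confidence."""
    values = result.get("inference_result", {})
    info = {}
    for entry in result.get("explainability_info", []):
        info.update(entry)  # merge per-field confidence records
    review = {}
    for name in required:
        value = values.get(name)
        confidence = info.get(name, {}).get("confidence", 0.0)
        if value in (None, ""):
            review[name] = "missing"
        elif confidence < threshold:
            review[name] = f"low confidence ({confidence:.2f})"
    return review


review_queue = fields_needing_review(result, REQUIRED_FIELDS, CONFIDENCE_THRESHOLD)
print(review_queue)
```

Fields that land in the review queue could be sent to a human reviewer, while the remaining fields continue through automated processing.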
Conclusion
In this post, we demonstrated how you can use Amazon Bedrock Data Automation to accurately extract key information from financial documents, including bank statements, W-2 forms, 1099-B forms, and vendor contracts, to automate downstream processing. You learned how to:
- Create custom blueprints for different document types
- Extract structured data from complex financial documents
- Validate Amazon Bedrock Data Automation outputs for downstream processing
To learn more about implementing document processing with Amazon Bedrock, review the Amazon Bedrock Data Automation documentation. For production workflows involving sensitive information, follow your organization’s cybersecurity and legal guidelines to verify compliance with all applicable regulations, including but not limited to GDPR in Europe and any other regional or industry-specific requirements.
