AWS document processing service that extracts text, tables, forms, and structured data from scanned documents and images using machine learning. Pay-per-page pricing starting at $0.0015/page for OCR.
Amazon's document reading service that extracts text, tables, and form data from scanned documents automatically.
Amazon Textract is AWS's document intelligence service that goes beyond basic OCR to extract structured data from documents using machine learning. It reads printed text, handwriting, tables, and form key-value pairs without requiring templates or custom configuration.
The service includes specialized models for invoices (AnalyzeExpense), identity documents (AnalyzeID), and mortgage documents (AnalyzeLending) that understand domain-specific formats and fields. Each model extracts the specific data types relevant to that document category.
Textract processes documents stored in S3 and returns structured JSON with bounding box coordinates and confidence scores for every extracted element. The asynchronous API handles documents of up to 3,000 pages as background jobs and sends an SNS notification on completion. The synchronous API processes single pages in real time for interactive applications.
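As a rough illustration of the asynchronous flow, the sketch below starts a text-detection job for a document already in S3 and requests an SNS notification on completion. It assumes a client created with boto3.client("textract"); the helper name start_text_job is hypothetical, and the bucket, key, topic ARN, and role ARN are placeholders you would supply.

```python
def start_text_job(client, bucket, key, sns_topic_arn, role_arn):
    """Start an asynchronous Textract text-detection job and return its JobId.

    `client` is assumed to be a boto3 Textract client, for example
    boto3.client("textract"). All argument values are placeholders.
    """
    response = client.start_document_text_detection(
        # The document must already be uploaded to S3.
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        # Textract publishes a completion message to this SNS topic,
        # using the IAM role it is allowed to assume.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real AWS account.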
Handwriting recognition sets Textract apart from many competitors. It accurately extracts handwritten notes, filled forms, and annotations that appear alongside printed text. Healthcare organizations, financial services firms, and government agencies use this for digitizing paper records where handwriting is common.
The JSON output format includes bounding boxes for every detected element, which is useful for document visualization but requires significant post-processing to feed into LLM or RAG applications that expect plain text. Teams building document AI pipelines often need a transformation layer between Textract output and their downstream systems.
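A transformation layer of this kind can be quite small for simple documents. The following is a minimal sketch, assuming a response dictionary shaped like Textract's output (a "Blocks" list whose LINE entries carry "Text", "Page", and a "Geometry" bounding box). The function name blocks_to_text is hypothetical, and the reading-order heuristic is a simplification that will not cope with multi-column layouts.

```python
def blocks_to_text(response):
    """Flatten a Textract-style response into plain text, one LINE per row."""
    lines = [b for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]

    def reading_order(block):
        box = block["Geometry"]["BoundingBox"]
        # Sort by page, then top-to-bottom, then left-to-right.
        # Rounding "Top" is a crude way to group lines on the same row.
        return (block.get("Page", 1), round(box["Top"], 2), box["Left"])

    lines.sort(key=reading_order)
    return "\n".join(b["Text"] for b in lines)
```

The resulting string drops bounding boxes and confidence scores entirely, which is usually what an LLM or RAG ingestion step expects.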
Multiple extraction APIs tuned for different document types: basic OCR (DetectDocumentText), structured analysis (AnalyzeDocument for tables and forms), and domain-specific models (AnalyzeExpense, AnalyzeID, AnalyzeLending). Each mode is priced separately so you pay only for the extraction depth you need.
Use Case:
An accounts payable team uses AnalyzeExpense at $0.01/page for invoices requiring vendor and line-item extraction, while using basic OCR at $0.0015/page for general correspondence that only needs text content.
Identifies table boundaries, rows, columns, merged cells, and cell relationships. Preserves the spatial structure of tables as structured data rather than flattening them into unstructured text.
Use Case:
A financial analyst extracts quarterly earnings tables from PDF reports. Textract preserves row-column relationships, merged header cells, and numeric formatting so the data imports directly into spreadsheets without manual cleanup.
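To show how row-column relationships surface in the output, here is a hedged sketch that rebuilds one table as a list of rows. It assumes the block layout Textract documents for tables: a TABLE block whose CHILD relationships point to CELL blocks, each with RowIndex, ColumnIndex, and CHILD links to WORD blocks. The function name table_to_rows is hypothetical, and merged cells are deliberately not handled here.

```python
def table_to_rows(blocks, table_block):
    """Rebuild one TABLE block as a list of rows (lists of cell strings)."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Collect the Ids from every CHILD relationship on the block.
        return [i for rel in block.get("Relationships", [])
                if rel["Type"] == "CHILD" for i in rel["Ids"]]

    cells = {}
    for cell_id in child_ids(table_block):
        cell = by_id[cell_id]
        if cell["BlockType"] != "CELL":
            continue
        words = [by_id[w]["Text"] for w in child_ids(cell)
                 if by_id[w]["BlockType"] == "WORD"]
        # RowIndex and ColumnIndex are 1-based.
        cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)

    n_rows = max((r for r, _ in cells), default=0)
    n_cols = max((c for _, c in cells), default=0)
    return [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
            for r in range(1, n_rows + 1)]
```

Each returned row can be passed straight to csv.writer, which is roughly what importing into a spreadsheet amounts to.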
Extracts handwritten text alongside printed content with high accuracy. Works on forms, notes, annotations, and signatures common in healthcare, legal, and government documents.
Use Case:
A healthcare system digitizes patient intake forms where doctors write notes in the margins. Textract extracts both the printed form fields and handwritten annotations into structured data.
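Handwritten and printed words can be told apart in the output because Textract tags WORD blocks with a TextType of HANDWRITING or PRINTED. The sketch below pulls out the handwritten words above a confidence threshold. The function name handwritten_words and the default threshold of 80 are assumptions for illustration, not recommended values.

```python
def handwritten_words(response, min_confidence=80.0):
    """Return (text, confidence) pairs for WORD blocks labeled as handwriting.

    Confidence is on Textract's 0-100 scale. The default threshold is an
    arbitrary starting point and should be tuned on real documents.
    """
    return [
        (b["Text"], b["Confidence"])
        for b in response.get("Blocks", [])
        if b.get("BlockType") == "WORD"
        and b.get("TextType") == "HANDWRITING"
        and b.get("Confidence", 0.0) >= min_confidence
    ]
```

Words that fall below the threshold are good candidates for human review, since accuracy depends heavily on legibility.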
Processes multi-page documents up to 3,000 pages as background jobs. Documents are uploaded to S3, processing runs asynchronously, and completion notifications arrive via SNS. Handles variable workloads without provisioning infrastructure.
Use Case:
A law firm uploads 500-page contracts to S3. Textract processes them in the background and triggers a Lambda function via SNS when extraction completes, adding results to a searchable DynamoDB index.
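The Lambda side of a pipeline like this might look roughly as follows. This is a sketch under stated assumptions: the SNS message is taken to be a JSON string containing "JobId" and "Status", the job is assumed to be a text-detection job, and `client` is assumed to be a boto3 Textract client. The function names parse_completion and collect_blocks are hypothetical, and the DynamoDB indexing step is left out.

```python
import json


def parse_completion(event):
    """Extract (job_id, status) from an SNS-triggered Lambda event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


def collect_blocks(client, job_id):
    """Page through the results of a finished text-detection job."""
    blocks, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_document_text_detection(**kwargs)
        blocks.extend(page.get("Blocks", []))
        # Large documents return results in pages linked by NextToken.
        token = page.get("NextToken")
        if not token:
            return blocks
```

A handler would call parse_completion first and fetch results only when the status is SUCCEEDED, since a 500-page contract will span many result pages.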
Free
$0.0015/page
$0.015/page
$0.05/page
$0.01/page
$0.025/page
$0.07/page
We believe in transparent reviews. Here's what Amazon Textract doesn't handle well:
Compared with Azure, Textract offers tighter AWS integration and comparable pricing for basic OCR ($0.0015/page versus $0.001/page for Azure's read model). Azure wins on custom model training (Textract has none) and on complex table extraction accuracy. Choose based on your cloud provider: if you're on AWS, Textract integrates natively. If you need custom models for unusual document formats, Azure is the better choice.
New AWS customers get 3 months of free usage: 1,000 pages/month for basic OCR (DetectDocumentText), and 100 pages/month each for the AnalyzeDocument, AnalyzeExpense, and AnalyzeID APIs. After the free tier expires, you pay per page at standard rates.
Yes, Textract recognizes handwritten text alongside printed content. It works on filled forms, margin notes, and annotations. Accuracy varies with handwriting legibility, but it handles typical business documents well. This is a significant advantage over the many competitors that only handle printed text.
Costs drop significantly at scale. Basic OCR falls from $0.0015 to $0.0006/page above 1M pages/month. Table extraction drops from $0.015 to $0.01/page. For a company processing 500,000 invoice pages monthly using AnalyzeExpense ($0.01/page), the monthly cost would be approximately $5,000.
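The arithmetic is easy to check with a few lines of code. The sketch below assumes the discounted rate applies only to pages beyond the first million in a month, which is how tiered pricing of this kind usually works; confirm against the current AWS price list before budgeting. The function name monthly_cost is hypothetical, and the rates are the ones quoted above.

```python
def monthly_cost(pages, base_rate, discounted_rate=None, tier_size=1_000_000):
    """Estimate monthly cost in dollars for a per-page service.

    Pages up to `tier_size` are billed at `base_rate`; pages beyond it
    are billed at `discounted_rate` (assumed tiering, see the note above).
    """
    if discounted_rate is None:
        discounted_rate = base_rate
    first_tier_pages = min(pages, tier_size)
    extra_pages = max(pages - tier_size, 0)
    return first_tier_pages * base_rate + extra_pages * discounted_rate


# 500,000 invoice pages with AnalyzeExpense at $0.01/page.
invoice_cost = round(monthly_cost(500_000, 0.01), 2)

# 2,000,000 pages of basic OCR: first million at $0.0015, the rest at $0.0006.
ocr_cost = round(monthly_cost(2_000_000, 0.0015, 0.0006), 2)
```

The first call reproduces the $5,000 figure. The second comes to $2,100, compared with $3,000 if every page were billed at the base rate.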
Document Processing
Extract structured data from documents using AI models trained on your specific formats. Automates form processing, invoice extraction, and contract analysis with 95%+ accuracy through custom model training and 16+ prebuilt models.
Document AI
Cloud document processing platform that automates data extraction and classification with industry-leading OCR accuracy. Processes invoices, receipts, forms, and custom document types to optimize document workflows and improve processing efficiency.