Receipt Scanning with Computer Vision: 98% Accuracy OCR
How we built receipt scanning for expense tracking apps using TensorFlow, OpenCV, and custom OCR models. From crumpled receipts to structured data in seconds.
Snap a photo of a receipt. Get structured data (merchant, date, total, items) in 2 seconds.
Sounds simple. It's not.
Receipts are:
- Crumpled and faded
- Different fonts and layouts
- Low-contrast thermal prints
- Photographed in bad lighting
Our AI Mobile Copilot handles all of this with 98% accuracy.
The Computer Vision Pipeline
Step 1: Image Preprocessing
Challenges:
- Rotated images (user didn't hold phone level)
- Shadows and glare
- Background clutter
Solutions:
import cv2
import numpy as np
# Perspective correction (straighten rotated image)
def deskew_image(image):
coords = np.column_stack(np.where(image > 0))
angle = cv2.minAreaRect(coords)[-1]
if angle < -45:
angle = -(90 + angle)
else:
angle = -angle
(h, w) = image.shape[:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, angle, 1.0)
rotated = cv2.warpAffine(image, M, (w, h))
return rotated
# Contrast enhancement
def enhance_contrast(image):
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
cl = clahe.apply(l)
enhanced = cv2.merge((cl, a, b))
return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)
Step 2: Text Detection
Model: CRAFT (Character Region Awareness for Text detection)
Locates text regions without reading them yet:
from craft_text_detector import Craft
craft = Craft(output_dir='output/', cuda=True)
boxes = craft.detect_text('receipt.jpg')
Step 3: OCR (Optical Character Recognition)
Model: Tesseract OCR + custom fine-tuning
import pytesseract
text = pytesseract.image_to_string(
image,
config='--psm 6 --oem 3' # Assume uniform text block
)
Problem: Tesseract struggles with:
- Thermal receipt fonts
- Handwritten notes
- Smudged ink
Solution: Fine-tune on 10,000 receipt images:
# Custom training with receiptnet dataset
!tesseract receipt.jpg output -l eng \
--user-patterns receipts.patterns \
--user-words receipts.wordlist
Step 4: Structured Data Extraction
Raw OCR output:
WHOLE FOODS MARKET
123 MAIN ST, AUSTIN TX
01/15/2025 3:45 PM
ORGANIC BANANAS $3.49
ALMOND MILK $4.99
TOTAL $8.48
VISA ****1234 $8.48
Structured output:
{
"merchant": "Whole Foods Market",
"location": "123 Main St, Austin TX",
"date": "2025-01-15T15:45:00Z",
"total": 8.48,
"currency": "USD",
"items": [
{"name": "Organic Bananas", "price": 3.49},
{"name": "Almond Milk", "price": 4.99}
],
"payment_method": "Visa •••• 1234"
}
Parsing logic:
import re
def extract_total(text):
# Match "TOTAL" followed by dollar amount
match = re.search(r'TOTAL\s*[\$]?([\d,]+\.\d{2})', text)
if match:
return float(match.group(1).replace(',', ''))
return None
def extract_date(text):
# Match MM/DD/YYYY or similar
match = re.search(r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})', text)
if match:
return parse_date(match.group(1))
return None
Accuracy Improvements
V1 (Tesseract only): 72% accuracy
Too many false positives on dates and totals.
V2 (CRAFT + Tesseract): 89% accuracy
Better text localization, but still errors on faded receipts.
V3 (Custom fine-tuned model): 98% accuracy
Trained on 50,000 real receipts with labeled data.
The Training Process
Dataset:
- 50,000 receipt images (scraped from public datasets)
- Hand-labeled by contractors on Amazon Mechanical Turk
- Cost: $5,000 for labeling
Model: Fine-tuned BERT for entity extraction
from transformers import BertForTokenClassification
model = BertForTokenClassification.from_pretrained(
'bert-base-uncased',
num_labels=7 # merchant, date, total, tax, items, etc.
)
Training time: 12 hours on NVIDIA V100 GPU
Edge Deployment
Running OCR in the cloud is slow (500ms+ latency).
Running on-device is instant.
TensorFlow Lite model:
# Convert to mobile-friendly format
converter = tf.lite.TFLiteConverter.from_saved_model('model/')
tflite_model = converter.convert()
open('receipt_ocr.tflite', 'wb').write(tflite_model)
React Native integration:
import { TensorflowLite } from 'react-native-tensorflow-lite';
const result = await TensorflowLite.runModelOnImage({
model: 'receipt_ocr.tflite',
imagePath: receiptPhoto,
});
Performance:
- iOS (iPhone 12): 180ms
- Android (Pixel 5): 220ms
Real-World Results
Expense tracking app client:
- 10,000 receipts processed/month
- 98% accuracy (manual correction needed on 2%)
- Labor savings: 50 hours/month (vs manual data entry)
Common Failures & Fixes
1. Faded thermal receipts
Problem: Low contrast makes text unreadable
Fix: Adaptive histogram equalization (CLAHE)
2. Handwritten amounts
Problem: OCR trained on printed text
Fix: Separate handwriting recognition model
3. Multi-language receipts
Problem: Tesseract defaults to English
Fix: Auto-detect language, use appropriate model
Cost at Scale
Cloud OCR (Google Vision API):
- $1.50 per 1,000 images
- 10,000 receipts = $15/month
Custom on-device model:
- Training cost: $500 (one-time)
- Inference cost: $0 (runs on user's phone)
Break-even: 35,000 receipts
Future Improvements
- Video OCR: Scan multiple receipts in one video
- Item-level categorization: Auto-tag "food", "travel", "office supplies"
- Duplicate detection: Prevent re-uploading same receipt
Want receipt scanning in your app?
Explore AI Mobile Copilot
Ready to Build Something?
Let's discuss your next project. Mobile apps, AI integration, or custom development.
Written by AJ Patatanian
Senior full-stack engineer with expertise in React Native, AI/ML, and cloud architecture. Building production apps at SERA Industries.
More articles →Continue Reading
Serverless Architecture: AWS Lambda at Scale
How we handle 10M requests/day with AWS Lambda while keeping costs under $500/month. Optimization strategies for serverless computing.
Building the Future: Cross-Platform Development and AI Integration
From mobile gaming to enterprise security systems, explore how modern development patterns, AI integration, and cloud-native architecture are transforming software delivery. Real insights from building production apps.