Receipt Scanning with Computer Vision: 98% Accuracy OCR

Snap a photo of a receipt. Get structured data (merchant, date, total, items) in 2 seconds.

Sounds simple. It's not.

Receipts are:

Crumpled and faded
Different fonts and layouts
Low-contrast thermal prints
Photographed in bad lighting

Our AI Mobile Copilot handles all of this with 98% accuracy.

The Computer Vision Pipeline

Step 1: Image Preprocessing

Challenges:

Rotated images (user didn't hold phone level)
Shadows and glare
Background clutter

Solutions:

import cv2
import numpy as np

# Perspective correction (straighten rotated image)
def deskew_image(image):
    coords = np.column_stack(np.where(image > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h))
    return rotated

# Contrast enhancement
def enhance_contrast(image):
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8,8))
    cl = clahe.apply(l)
    enhanced = cv2.merge((cl, a, b))
    return cv2.cvtColor(enhanced, cv2.COLOR_LAB2BGR)

Step 2: Text Detection

Model: CRAFT (Character Region Awareness for Text detection)

Locates text regions without reading them yet:

from craft_text_detector import Craft

craft = Craft(output_dir='output/', cuda=True)
boxes = craft.detect_text('receipt.jpg')

Step 3: OCR (Optical Character Recognition)

Model: Tesseract OCR + custom fine-tuning

import pytesseract

text = pytesseract.image_to_string(
    image,
    config='--psm 6 --oem 3'  # Assume uniform text block
)

Problem: Tesseract struggles with:

Thermal receipt fonts
Handwritten notes
Smudged ink

Solution: Fine-tune on 10,000 receipt images:

# Custom training with receiptnet dataset
!tesseract receipt.jpg output -l eng \
  --user-patterns receipts.patterns \
  --user-words receipts.wordlist

Step 4: Structured Data Extraction

Raw OCR output:

WHOLE FOODS MARKET
123 MAIN ST, AUSTIN TX
01/15/2025  3:45 PM

ORGANIC BANANAS    $3.49
ALMOND MILK        $4.99
TOTAL              $8.48
VISA ****1234      $8.48

Structured output:

{
  "merchant": "Whole Foods Market",
  "location": "123 Main St, Austin TX",
  "date": "2025-01-15T15:45:00Z",
  "total": 8.48,
  "currency": "USD",
  "items": [
    {"name": "Organic Bananas", "price": 3.49},
    {"name": "Almond Milk", "price": 4.99}
  ],
  "payment_method": "Visa •••• 1234"
}

Parsing logic:

import re

def extract_total(text):
    # Match "TOTAL" followed by dollar amount
    match = re.search(r'TOTAL\s*[\$]?([\d,]+\.\d{2})', text)
    if match:
        return float(match.group(1).replace(',', ''))
    return None

def extract_date(text):
    # Match MM/DD/YYYY or similar
    match = re.search(r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})', text)
    if match:
        return parse_date(match.group(1))
    return None

Accuracy Improvements

V1 (Tesseract only): 72% accuracy

Too many false positives on dates and totals.

V2 (CRAFT + Tesseract): 89% accuracy

Better text localization, but still errors on faded receipts.

V3 (Custom fine-tuned model): 98% accuracy

Trained on 50,000 real receipts with labeled data.

The Training Process

Dataset:

50,000 receipt images (scraped from public datasets)
Hand-labeled by contractors on Amazon Mechanical Turk
Cost: $5,000 for labeling

Model: Fine-tuned BERT for entity extraction

from transformers import BertForTokenClassification

model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=7  # merchant, date, total, tax, items, etc.
)

Training time: 12 hours on NVIDIA V100 GPU

Edge Deployment

Running OCR in the cloud is slow (500ms+ latency).
Running on-device is instant.

TensorFlow Lite model:

# Convert to mobile-friendly format
converter = tf.lite.TFLiteConverter.from_saved_model('model/')
tflite_model = converter.convert()
open('receipt_ocr.tflite', 'wb').write(tflite_model)

React Native integration:

import { TensorflowLite } from 'react-native-tensorflow-lite';

const result = await TensorflowLite.runModelOnImage({
  model: 'receipt_ocr.tflite',
  imagePath: receiptPhoto,
});

Performance:

iOS (iPhone 12): 180ms
Android (Pixel 5): 220ms

Real-World Results

Expense tracking app client:

10,000 receipts processed/month
98% accuracy (manual correction needed on 2%)
Labor savings: 50 hours/month (vs manual data entry)

Common Failures & Fixes

1. Faded thermal receipts

Problem: Low contrast makes text unreadable
Fix: Adaptive histogram equalization (CLAHE)

2. Handwritten amounts

Problem: OCR trained on printed text
Fix: Separate handwriting recognition model

3. Multi-language receipts

Problem: Tesseract defaults to English
Fix: Auto-detect language, use appropriate model

Cost at Scale

Cloud OCR (Google Vision API):

$1.50 per 1,000 images
10,000 receipts = $15/month

Custom on-device model:

Training cost: $500 (one-time)
Inference cost: $0 (runs on user's phone)

Break-even: 35,000 receipts

Future Improvements

Video OCR: Scan multiple receipts in one video
Item-level categorization: Auto-tag "food", "travel", "office supplies"
Duplicate detection: Prevent re-uploading same receipt

Want receipt scanning in your app?
Explore AI Mobile Copilot

SERA Industries