DCDC PROJECT HUB

AI Document Scanner with Perspective Correction & OCR

4TH YEAR · AI/ML · HARD

Problem statement

Photographing a document with a phone camera often produces a skewed, hard-to-read image that contains no searchable text.

Abstract

The AI document scanner captures document images from a mobile camera or webcam, detects the document's edges, applies a perspective transformation to produce a flat view, enhances contrast, and runs OCR to extract editable text. A simple UI lets users capture, crop, and enhance pages, then export them as PDF or text.

Components required

  • Laptop or PC with a webcam, or a smartphone camera
  • Python with OpenCV and Tesseract OCR
  • GUI framework (Tkinter / PyQt / web UI)
  • Dataset of document images
  • Storage for saving PDFs and text

Block diagram

Image Capture → Edge Detection & Contour Finding → Perspective Correction → Image Enhancement → OCR Engine → PDF/Text Export

Working

The system captures an input frame and uses edge detection and contour analysis to find the largest quadrilateral region, which it treats as the document boundary. It then applies a perspective transform to warp that region into a rectangular top-down view, enhances brightness and contrast, and runs Tesseract OCR to extract the text. The extracted text and the cleaned images are then saved, or exported as a multi-page PDF.
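
To make the perspective transform step concrete: four corner correspondences determine a 3x3 homography matrix, which is the same kind of matrix that OpenCV's `cv2.getPerspectiveTransform` returns. The NumPy sketch below solves for it directly as an 8x8 linear system. It is meant only to illustrate the geometry; the helper names `homography_from_corners` and `apply_homography` are hypothetical, and a real scanner would more likely rely on the OpenCV call.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve for the 3x3 matrix H that maps each src corner onto its dst corner.

    With H[2][2] fixed to 1, each correspondence (x, y) -> (u, v) gives two
    linear equations in the remaining eight unknowns.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, point):
    """Map one (x, y) point through H, dividing out the projective scale."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w
```

Here `src` would hold the detected document corners in the photo and `dst` the corners of the output rectangle, both listed in the same order (for example top-left, top-right, bottom-right, bottom-left). Warping the image then amounts to applying this mapping to every pixel with interpolation.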

Applications

  • Digitizing notes and question papers
  • Building simple scanning apps
  • Understanding document image processing
  • Assistive tech for visually impaired users