DCDC PROJECT HUB

AI Document Scanner with Perspective Correction & OCR

4TH YEAR • AI/ML • HARD

Problem statement

Documents photographed with a phone camera are often skewed and hard to read, and the resulting image contains no searchable text.

Abstract

The AI document scanner captures document images from a phone camera or webcam, detects the document's edges, applies a perspective transformation to produce a flat view, enhances contrast, and runs OCR to extract editable text. A simple UI lets users capture, crop, and enhance a scan, then export it as PDF or text.

Components required

Laptop or PC with a webcam, or a smartphone camera
Python with OpenCV and Tesseract OCR
GUI framework (Tkinter / PyQt / web UI)
Dataset of document images
Storage for saving PDFs and text

Block diagram

Image Capture ➜ Edge Detection & Contour Finding ➜ Perspective Correction ➜ Image Enhancement ➜ OCR Engine ➜ PDF/Text Export
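
The stages in the diagram run strictly in sequence, each consuming the output of the previous one. As a minimal sketch of that structure, the Python below chains six placeholder stages. The names `Stage`, `STAGES`, `passthrough` and `run_pipeline` are hypothetical, and the stage bodies only pass data through rather than processing a real image.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def passthrough(ctx: Context) -> Context:
    """Placeholder stage body: a real stage would transform the image here."""
    return ctx


# Stage names mirror the boxes of the block diagram, in order.
STAGES: List[Stage] = [
    ("Image Capture", passthrough),
    ("Edge Detection & Contour Finding", passthrough),
    ("Perspective Correction", passthrough),
    ("Image Enhancement", passthrough),
    ("OCR Engine", passthrough),
    ("PDF/Text Export", passthrough),
]


def run_pipeline(stages: List[Stage], ctx: Context) -> Context:
    """Run each stage in order, recording the stage names in ctx['trace']."""
    ctx = dict(ctx, trace=[])
    for name, stage in stages:
        ctx = stage(ctx)
        ctx["trace"].append(name)
    return ctx


result = run_pipeline(STAGES, {"source": "webcam"})
print(" -> ".join(result["trace"]))
```

Giving every stage the same signature is one possible design choice: it would let a single stage (for example the OCR engine) be replaced or tested in isolation without touching the others.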

Working

The system captures an input frame and uses edge detection and contour analysis to find the largest quadrilateral region, which it treats as the document page. It then applies a perspective transform to warp that region into a rectangular top-down view, enhances brightness and contrast, and runs Tesseract OCR to extract the text. The extracted text and the cleaned images are then saved, or exported as a multi-page PDF.
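
The perspective-correction step can be illustrated with a small dependency-free sketch. In an OpenCV-based build, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would normally do this work. The pure-Python version below only shows the underlying computation: order the four detected corners, then solve for the 3x3 homography that maps them onto an upright rectangle. The function names, the sample corner coordinates and the 300x400 output size are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(pts: Sequence[Point]) -> List[Point]:
    """Return corners as top-left, top-right, bottom-right, bottom-left.

    Heuristic for moderately skewed pages in image coordinates (y grows
    downward): top-left has the smallest x + y, bottom-right the largest;
    top-right has the largest x - y, bottom-left the smallest.
    """
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = max(pts, key=lambda p: p[0] - p[1])
    bl = min(pts, key=lambda p: p[0] - p[1])
    return [tl, tr, br, bl]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def solve_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Find H (with H[2][2] fixed to 1) mapping each src corner to its dst corner."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map one point through H, including the projective division."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corners of a detected page, in arbitrary order.
detected = [(320, 60), (40, 80), (20, 400), (360, 380)]
src = order_corners(detected)
# Assumed output size: a 300 x 400 pixel upright page.
dst = [(0, 0), (300, 0), (300, 400), (0, 400)]
H = solve_homography(src, dst)
print([apply_homography(H, p) for p in src])
```

A full warp would apply the inverse of this mapping to every output pixel and sample the input image there, which is the part a library routine handles efficiently.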

Applications

Digitizing notes and question papers
Building simple scanning apps
Understanding document image processing
Assistive tech for visually impaired users