DCDC PROJECT HUB
AI Document Scanner with Perspective Correction & OCR
Problem statement
Capturing documents with a phone camera often results in skewed images with poor readability and no searchable text.
Abstract
The AI document scanner captures images of documents from a mobile phone camera or webcam, detects the document edges, applies a perspective transform to produce a flat, top-down view, enhances contrast, and runs OCR to extract editable text. A simple UI lets users capture, crop, enhance, and export the result as a PDF or as text.
Components required
- Laptop or PC with a webcam, or a smartphone camera
- Python with OpenCV and Tesseract OCR
- GUI framework (Tkinter / PyQt / web UI)
- Dataset of document images
- Storage for saving PDFs and text
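Block diagram
Camera capture -> Edge detection and contour analysis -> Perspective transform -> Brightness and contrast enhancement -> Tesseract OCR -> Save or export (PDF, text)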
Working
The system captures an input frame and uses edge detection and contour analysis to find the largest quadrilateral region, which it treats as the document boundary. It then applies a perspective transform to warp that region into a rectangular, top-down view, enhances brightness and contrast, and runs Tesseract OCR to extract the text. The extracted text and cleaned images are saved, or exported together as a multi-page PDF.
Applications
- Digitizing notes and question papers
- Building simple scanning apps
- Understanding document image processing
- Assistive tech for visually impaired users