Swift – Native OCR reader

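Starting with iOS 11, Apple introduced a new framework called Vision. The text recognition request and the document camera used in this tutorial arrived later and require iOS 13.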

The Vision framework performs face and face landmark detection, text detection, barcode recognition, image registration, and general feature tracking. Vision also allows the use of custom Core ML models for tasks like classification or object detection.

https://developer.apple.com/documentation/vision

Today we will implement, in a few lines of code, one of the simplest features of this beautiful framework: an OCR reader.

OCR stands for Optical Character Recognition. If you want to learn more, Wikipedia has a good overview: https://en.wikipedia.org/wiki/Optical_character_recognition.


Let’s start!

Create a new empty Xcode project and a simple interface like this beautiful one:

It is composed of a UIImageView, a UITextView, and a UIButton.

Connect the outlets and actions, and get ready to code!


Import the Vision framework

The first and simplest step is to add the new frameworks. At the top of your view controller, add:

import UIKit
import Vision
import VisionKit

Show the DocumentCameraViewController

Next, as a quick test, present the new VNDocumentCameraViewController, which helps you capture a document from any angle! Since it uses the camera, your Info.plist needs an NSCameraUsageDescription entry.

Attach the code to a button action:

@IBAction func didScanPressed(_ sender: Any) {
    let scanVC = VNDocumentCameraViewController()
    scanVC.delegate = self
    present(scanVC, animated: true)
}

Remember to implement the delegate methods:

extension ViewController: VNDocumentCameraViewControllerDelegate {

    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan)
    {
        guard scan.pageCount >= 1 else {
            controller.dismiss(animated: true)
            return
        }

        // Only the first page is used in this tutorial
        let firstPage = scan.imageOfPage(at: 0)
        imgDocument.image = firstPage
        processImage(firstPage)
        controller.dismiss(animated: true)
    }
    
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFailWithError error: Error)
    {
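        // Log the failure instead of dropping it silently
        print("Document scan failed: \(error.localizedDescription)")
        controller.dismiss(animated: true)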
    }
    
    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        controller.dismiss(animated: true)
    }

}

The VNDocumentCameraViewController is your new photo view controller.

It automatically detects documents and lets you adjust colors, crop, resize, align and much more… and the picture is taken automatically!

Cool… but now we want to read the text of our documents!


Implement the Text Request

Start by creating the request as a property of your view controller:

private var ocrRequest = VNRecognizeTextRequest(completionHandler: nil)

Now we need to configure our request.

VNRecognizeTextRequest has several options you can set:

  • recognitionLanguages
  • customWords
  • recognitionLevel
  • usesLanguageCorrection
  • minimumTextHeight
  • etc..
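
To make these options a bit more concrete, here is a minimal sketch that groups them in a small value type and copies them onto a request. The OCRSettings type, its default values, and the apply(to:) helper are hypothetical names made up for this illustration; only the VNRecognizeTextRequest properties themselves come from Vision.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical container for the options listed above (not part of Vision).
struct OCRSettings {
    /// Languages in priority order.
    var languages: [String] = ["en-US"]
    /// Extra vocabulary such as product names. Vision only consults it
    /// when language correction is enabled.
    var customWords: [String] = []
    /// true maps to .accurate, false to the quicker .fast level.
    var accurate: Bool = true
    var usesLanguageCorrection: Bool = true
    /// Smallest text height to look for, as a fraction of the image height.
    /// The value below is an arbitrary example, not a recommendation.
    var minimumTextHeight: Float = 0.02
}

#if canImport(Vision)
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
extension OCRSettings {
    /// Copies every setting onto an existing request.
    func apply(to request: VNRecognizeTextRequest) {
        request.recognitionLanguages = languages
        request.customWords = customWords
        request.recognitionLevel = accurate ? .accurate : .fast
        request.usesLanguageCorrection = usesLanguageCorrection
        request.minimumTextHeight = minimumTextHeight
    }
}
#endif
```

Under these assumptions you would create an OCRSettings value, tweak the fields you care about, and call apply(to:) on your ocrRequest before performing it.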

Set up the OCR

private func setupOCR() {
    // Initialize the OCR request; the handler is called once recognition finishes
    ocrRequest = VNRecognizeTextRequest { [weak self] (request, error) in
        guard let observations = request.results as? [VNRecognizedTextObservation] else {
            return
        }

        var ocrText = ""
        for observation in observations {
            // Take the best candidate; skip observations that have none
            guard let topCandidate = observation.topCandidates(1).first else {
                continue
            }
            ocrText += topCandidate.string + "\n"
        }
        
        DispatchQueue.main.async {
            // Show the recognized text (UI updates belong on the main queue)
            self?.txtRecognizedText.text = ocrText
        }
    }

    // we want an accurate recognition
    ocrRequest.recognitionLevel = .accurate

    // correct possible misspellings
    ocrRequest.usesLanguageCorrection = true

    // our languages in priority order (which ones are supported depends on the OS version):
    ocrRequest.recognitionLanguages = ["it-IT", "es-ES", "en-US", "en-GB", "fr-FR", "de-DE"]
}
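
Which recognition languages are actually available depends on the OS version and the recognition level, so it can be worth filtering the wish list before assigning it. The following is a rough sketch of that idea, not part of the original code: usableLanguages and configureLanguages are hypothetical helper names, and falling back to "en-US" when nothing matches is an assumption of this example. The class method supportedRecognitionLanguages(for:revision:) is the Vision call available since iOS 13; from iOS 15 it is deprecated in favor of an instance method.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical helper: keeps only the preferred languages that appear in
/// `supported`, preserving the priority order of `preferred`.
func usableLanguages(preferred: [String], supported: [String]) -> [String] {
    let supportedSet = Set(supported)
    let filtered = preferred.filter { supportedSet.contains($0) }
    // Assumption of this sketch: fall back to en-US when nothing matches.
    return filtered.isEmpty ? ["en-US"] : filtered
}

#if canImport(Vision)
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
func configureLanguages(on request: VNRecognizeTextRequest, preferred: [String]) {
    // Ask Vision which languages this level and revision can recognize.
    // If the query throws, we treat the list as empty and use the fallback.
    let supported = (try? VNRecognizeTextRequest.supportedRecognitionLanguages(
        for: request.recognitionLevel,
        revision: request.revision)) ?? []
    request.recognitionLanguages = usableLanguages(preferred: preferred,
                                                   supported: supported)
}
#endif
```

Call configureLanguages after setting recognitionLevel, because the supported list can differ between the fast and accurate levels.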

Process the image

Last but not least, we need to analyze the captured image.

The delegate method above called processImage(); here is its code:

private func processImage(_ image: UIImage) {
    guard let cgImage = image.cgImage else {
        return
    }
    
    DispatchQueue.main.async {
        self.txtRecognizedText.text = ""
    }

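    let requestHandler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    do {
        // perform(_:) runs synchronously and calls the request's completion handler
        try requestHandler.perform([self.ocrRequest])
    } catch {
        // Report the failure instead of silently ignoring it
        print("Text recognition failed: \(error)")
    }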
}

We clear the text view and then perform the OCR request on the captured image.
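
Because perform(_:) is synchronous, calling it from the delegate callback keeps the main thread busy until recognition ends, and accurate recognition can take a while. One possible way around this, sketched here under assumptions, is to move the work to a background queue. The performInBackground helper and the recognizeText(in:with:) function are hypothetical names for this example; the request's completion handler still has to hop back to the main queue for UI updates, as setupOCR() already does.

```swift
import Foundation
#if canImport(Vision)
import Vision
import CoreGraphics
#endif

/// Hypothetical helper: runs `work` on a background queue and reports the
/// outcome on `queue` (the main queue by default, which suits UI updates).
func performInBackground<T>(_ work: @escaping () throws -> T,
                            deliverOn queue: DispatchQueue = .main,
                            completion: @escaping (Result<T, Error>) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // Capture either the returned value or the thrown error.
        let result: Result<T, Error> = Result(catching: work)
        queue.async { completion(result) }
    }
}

#if canImport(Vision)
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
func recognizeText(in cgImage: CGImage, with request: VNRecognizeTextRequest) {
    performInBackground({ () throws -> Void in
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        // Still synchronous, but now off the main thread.
        try handler.perform([request])
    }) { result in
        if case .failure(let error) = result {
            print("Text recognition failed: \(error)")
        }
    }
}
#endif
```

Inside processImage() you could then call recognizeText(in: cgImage, with: ocrRequest) in place of the do/catch block.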

For this tutorial, we skip multi-page handling and consider only the first page scanned.
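
If you do want every page, a rough sketch could loop over pageCount and recognize each page with its own request. This is an illustration under assumptions rather than the tutorial's code: mergePages and recognizeAllPages are hypothetical names, the page separator format is arbitrary, and the function blocks while it works, so it belongs on a background queue.

```swift
import Foundation

/// Hypothetical helper: joins per-page text under simple page headers.
func mergePages(_ pageTexts: [String]) -> String {
    return pageTexts.enumerated()
        .map { index, text in "--- Page \(index + 1) ---\n\(text)" }
        .joined(separator: "\n\n")
}

#if os(iOS) && canImport(VisionKit)
import UIKit
import Vision
import VisionKit

@available(iOS 13.0, *)
func recognizeAllPages(of scan: VNDocumentCameraScan) -> String {
    var pageTexts: [String] = []
    for index in 0..<scan.pageCount {
        guard let cgImage = scan.imageOfPage(at: index).cgImage else {
            pageTexts.append("")   // keep page numbering aligned
            continue
        }
        // One request per page, so results are never mixed up.
        let request = VNRecognizeTextRequest(completionHandler: nil)
        request.recognitionLevel = .accurate
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        // perform(_:) is synchronous, so results are ready right after it.
        // A failed page simply yields no observations in this sketch.
        try? handler.perform([request])
        let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        pageTexts.append(lines.joined(separator: "\n"))
    }
    return mergePages(pageTexts)
}
#endif
```

The returned string can then be assigned to txtRecognizedText.text on the main queue.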

Now everything is in place. Remember to call setupOCR() during initialization (for example, in viewDidLoad).
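
One optional refinement of the completion handler in setupOCR(): each candidate returned by topCandidates(_:) carries a confidence value next to its string, so weak results can be dropped before they reach the text view. The sketch below assumes a threshold of 0.5, which is an arbitrary choice for illustration; RecognizedLine, confidentText, and recognizedLines are hypothetical names, not Vision API.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// Hypothetical pair mirroring the `string` and `confidence`
/// properties of a VNRecognizedText candidate.
struct RecognizedLine {
    let text: String
    let confidence: Float
}

/// Joins the lines whose confidence reaches the threshold, one per row.
func confidentText(from lines: [RecognizedLine],
                   minimumConfidence: Float = 0.5) -> String {
    return lines
        .filter { $0.confidence >= minimumConfidence }
        .map { $0.text }
        .joined(separator: "\n")
}

#if canImport(Vision)
@available(iOS 13.0, macOS 10.15, tvOS 13.0, *)
func recognizedLines(from observations: [VNRecognizedTextObservation]) -> [RecognizedLine] {
    return observations.compactMap { observation -> RecognizedLine? in
        // Skip observations that have no candidate at all.
        guard let best = observation.topCandidates(1).first else { return nil }
        return RecognizedLine(text: best.string, confidence: best.confidence)
    }
}
#endif
```

In the handler you would build the text with confidentText(from: recognizedLines(from: observations)) and tune the threshold against your own documents.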


Final result

The full text is recognized automatically, and you can read it, edit it, share it, or do whatever you want with it:

I have no idea what this document is. I just searched for “document” on Google…

Now add some UI, storage, and pay-per-use pricing, and you have created one of the 1,000,000 apps on the App Store that scan and read documents 😀.

Enjoy scanning!

 

Alberto Pasca

Software engineer @ Pirelli & C. S.p.A. with a strong passion for mobile development, security, and connected things.