Episode #303

Capturing High Resolution Photos

Series: Camera Capture and Detection

17 minutes
Published on September 25, 2017


I got a tip from a subscriber about the method I used to capture the photo in the last episode. Since we're capturing the data using the preset we chose for processing the rectangles, we are bound to that preset when we export to an actual photo. By using AVCapturePhotoOutput, we can have much more control over the format, size, and quality of the resulting image. In addition, since we are leveraging the SDK for capturing the photo, we benefit from things like auto-flash, HDR, auto focus, and the built-in camera shutter sound. (Yay for deleting code!) The end result might not look very different on device, but if you are taking that image somewhere else to do OCR or other processing on it, a higher quality image is important.

Setting Up

First, we'll add a property to store our photo output instance:

var photoOutput: AVCapturePhotoOutput?

Then, when we are setting up our capture session, we'll also add this output to it:

            let photoOutput = AVCapturePhotoOutput()
            self.photoOutput = photoOutput
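
The snippet above creates and stores the output; attaching it to the session could look roughly like the sketch below. The function name attachPhotoOutput(to:) is hypothetical, and the session parameter is assumed to be the AVCaptureSession configured earlier in the series.

```swift
import AVFoundation

// Sketch only: `attachPhotoOutput(to:)` is a hypothetical helper, and
// `session` is assumed to be the capture session built earlier in the series.
func attachPhotoOutput(to session: AVCaptureSession) -> AVCapturePhotoOutput? {
    let photoOutput = AVCapturePhotoOutput()

    // Check with the session before adding the output.
    guard session.canAddOutput(photoOutput) else { return nil }

    // Group the change so the session applies it in one step.
    session.beginConfiguration()
    session.addOutput(photoOutput)
    session.commitConfiguration()

    return photoOutput
}
```

Returning nil when the session refuses the output lets the caller keep photoOutput as an optional property, which matches how it is declared above.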

Taking the Photo

Later on, when we want to take a photo, we no longer need the wantsPhoto flag. Instead, we can just tell the output to capture a photo. This requires us to tell the AVCapturePhotoOutput what pixel format we want to use (among other things).

If we don't do this, we'll get a processed JPEG image, which would need to be converted back to an uncompressed format before we could use Core Image on it.

func takePhoto() {
    // Ask for the uncompressed pixel formats the output can deliver.
    guard let formats = photoOutput?.supportedPhotoPixelFormatTypes(for: .tif) else { return }
    print("available pixel formats: \(formats)")
    guard let uncompressedPixelType = formats.first else {
        print("No pixel format types available")
        return
    }

    // Request raw pixels rather than a processed JPEG.
    let settings = AVCapturePhotoSettings(format: [
        kCVPixelBufferPixelFormatTypeKey as String : uncompressedPixelType
    ])
    settings.flashMode = .auto

    photoOutput?.capturePhoto(with: settings, delegate: self)
}
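
Taking formats.first works, but the order of that list is up to the device. If a particular format is preferred (32-bit BGRA is a common choice for Core Image work), a small selection step is one option. This is a minimal sketch, not code from the episode: choosePixelFormat(from:preferring:) and the PixelFormat alias are hypothetical names, and UInt32 stands in for OSType so the snippet compiles without AVFoundation.

```swift
// Hypothetical helper, not part of the episode's code.
// UInt32 stands in for OSType (a typealias for UInt32 on Apple platforms).
typealias PixelFormat = UInt32

// 'BGRA', the numeric value of kCVPixelFormatType_32BGRA.
let bgra32: PixelFormat = 0x42475241

/// Returns the first entry of `preferred` that appears in `supported`,
/// falling back to the first supported format (the episode's behavior).
func choosePixelFormat(from supported: [PixelFormat],
                       preferring preferred: [PixelFormat]) -> PixelFormat? {
    for candidate in preferred where supported.contains(candidate) {
        return candidate
    }
    return supported.first
}
```

In takePhoto(), the result of choosePixelFormat(from: formats, preferring: [bgra32]) could replace formats.first; the guard that handles an empty list stays the same.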

The formats returned by supportedPhotoPixelFormatTypes(for:) are of type OSType. These are special integers that have a nifty attribute: they can be printed as text!

If you want to print out the actual 4-letter code for a given OSType, you can use this function (it takes an Int, so convert the OSType with Int(...) first):

    // Builds the 4-letter code from the four bytes of n, high byte first.
    func str4(_ n: Int) -> String {
        var s = String(describing: UnicodeScalar((n >> 24) & 255)!)
        s.append(String(describing: UnicodeScalar((n >> 16) & 255)!))
        s.append(String(describing: UnicodeScalar((n >> 8) & 255)!))
        s.append(String(describing: UnicodeScalar(n & 255)!))
        return s
    }
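
Not every pixel format constant is made of printable characters (some are small integers such as 32), so a variant that falls back to the plain number can be handy when logging. The following is a sketch under that assumption; fourCharCode(_:) is a hypothetical name, and it takes a UInt32 because OSType is a typealias for UInt32.

```swift
// Hypothetical variant of the helper above; takes the OSType value directly.
func fourCharCode(_ value: UInt32) -> String {
    // Split the value into its four bytes, high byte first.
    let shifts: [UInt32] = [24, 16, 8, 0]
    let bytes = shifts.map { UInt8(truncatingIfNeeded: value >> $0) }

    // If any byte is outside printable ASCII, return the decimal value instead.
    if bytes.contains(where: { $0 < 0x20 || $0 > 0x7E }) {
        return String(value)
    }
    return String(bytes.map { Character(UnicodeScalar($0)) })
}

print(fourCharCode(875704422))   // prints "420f"
print(fourCharCode(1111970369))  // prints "BGRA"
print(fourCharCode(32))          // prints "32"
```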

Capturing the Raw Pixels

This delegate method will be called once the photo has been taken and pixel data is available. We want to hand this data off to Core Image.

extension CaptureViewController : AVCapturePhotoCaptureDelegate {
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {

        guard let rect = lastDetectedRectangle else { return }
        imageProcessingQueue.async {
            // The pixel buffer is only present when the settings requested a pixel format.
            guard let pixelBuffer = photo.pixelBuffer else {
                print("No pixel buffer provided. Settings may be missing pixel format")
                return
            }

            let ciimage = CIImage(cvPixelBuffer: pixelBuffer)
            if let correctedImage = self.perspectiveCorrect(ciimage, rectFeature: rect) {
                // Hop back to the main queue for UI work.
                DispatchQueue.main.async {
                    self.displayImage(ciImage: correctedImage)
                }
            }
        }
    }
}

Cleaning Up

Since we're using the built-in camera features, we can delete the code that flashes the screen and plays the camera shutter sound. It's not needed anymore!

This episode uses Xcode 9.0-beta6, Swift 4.