Quick and useful: How we created an AR app in only three days

18 July 2018
Reading time: 4 minutes

One of the biggest advantages of Augmented Reality is that most users can use it quite intuitively. That’s why many realized or planned applications of this technology are in the field of learning or know-how transfer. However, when it comes to creating apps for Augmented Reality, it seems to be the other way around: even experienced programmers consider this a difficult and complicated task. And yet Apple and Google are making things much easier with ARKit and ARCore. So we challenged ourselves: can we program a really useful AR app within only three days?

Our challenge took place at our annual Zühlke Camp from May 2 to 4, 2018. The staff from all our German locations gathered in Leimen, near the town of Heidelberg, for three days of learning, socializing and partying. The AR track hosted several projects, ranging from investigating shared experiences with the HoloLens and Windows immersive headsets to playing around with the Holo-Stylus and the Myo wristband.

1. Recognize doorplate

Our choice, however, was to dive into mobile AR, mainly ARKit (iOS) and ARCore (Android). We decided to give ARKit a try first and challenge ourselves to create a useful app within only three days.

As a use case, we came up with augmenting the doorplates attached to the doors of the conference rooms in the hotel. A different interest group gathered in each conference room to work on a specific topic. Our app extends the doorplate so that the user can see which interest group is meeting in the room behind it. To recognize the doorplates, we use the new image recognition APIs of ARKit 1.5.

To make ARKit recognize images, you just have to include them in the assets of your app and add them to the ARWorldTrackingConfiguration of your ARSession:

var configuration: ARWorldTrackingConfiguration {
        // Load the reference images from the "AR Resources" group of the asset catalog
        guard let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "AR Resources", bundle: nil) else {
            fatalError("Missing expected asset catalog resources.")
        }
        let config = ARWorldTrackingConfiguration()
        // Tell ARKit which images to look for in the camera stream
        config.detectionImages = referenceImages
        config.planeDetection = [.horizontal, .vertical]
        return config
}

When ARKit recognizes one of the images in the camera stream, you get a callback to the renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) delegate method. The SCNNode passed in carries the position of the recognized image in the real world. These coordinates can be used to add 3D objects to the AR view using the SceneKit framework.

func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        // Only react to anchors that belong to a recognized reference image
        if let imageAnchor = anchor as? ARImageAnchor {
            let height = imageAnchor.referenceImage.physicalSize.height
            // Flat square plate, sized to the physical height of the image
            let box = SCNBox(width: height, height: 0, length: height, chamferRadius: 0)
            let boxNode = SCNNode(geometry: box)
            boxNode.position = SCNVector3(0, 0, 0)
            // ...
        }
}

2. Read doorplate via OCR

To find out which room was scanned, we planned to run OCR (optical character recognition) on the camera image. Our first idea was to use the iOS frameworks Vision and CoreML for this. The Vision framework supports text detection, but not recognition. So we added a pretrained CoreML model for text recognition and fed the cropped images of the letters into CoreML for classification. Unfortunately, the model did not work well enough to give reliable results. If you are interested in solving OCR tasks with CoreML, Martin Mitrevski’s article is a very good starting point.

We used Microsoft’s Computer Vision API as a fallback. You can upload any image to the cloud-based API, and the service analyzes the visual content in different ways depending on what you request (e.g. OCR, detecting human faces or flagging adult content). After processing the image, the service returns the result as a JSON object. To get your images analyzed, you can register for a free trial.

So, whenever ARKit recognizes the image target, we send a frame of the video to the cognitive services. In return, we get all strings found in the image.

    func detectTextWithCognitiveServices(image: UIImage, completion: @escaping (_ result: Array<String>) -> Void) {
        var recognizedTexts = [String]()
        let parameters = [
            "Content-Type": "application/octet-stream"
        ]
        Alamofire.upload(multipartFormData: { multipartFormData in
            // The frame is JPEG-encoded, so file name and MIME type say JPEG as well
            if let imageData = UIImageJPEGRepresentation(image, 1) {
                multipartFormData.append(imageData, withName: "file", fileName: "file.jpg", mimeType: "image/jpeg")
            }
            for (key, value) in parameters {
                multipartFormData.append((value.data(using: .utf8))!, withName: key)
            }}, to: "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/ocr?language=de&detectOrientation=false", method: .post, headers: ["Ocp-Apim-Subscription-Key": apiKey],
                encodingCompletion: { encodingResult in
                    switch encodingResult {
                    // ... process data
                    }
                })
    }
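
The response handling is elided above. As a rough sketch of what turning the returned JSON into strings could look like, the following uses Codable. It assumes the OCR response nests `regions`, `lines` and `words`, each word carrying a `text` field; the type and function names are our own, and the sample payload (including the room name) is made up for illustration.

```swift
import Foundation

// Assumed shape of the OCR response: regions contain lines, lines contain words.
// Fields we do not need (such as bounding boxes) are simply ignored by the decoder.
struct OCRResponse: Decodable {
    struct Word: Decodable { let text: String }
    struct Line: Decodable { let words: [Word] }
    struct Region: Decodable { let lines: [Line] }
    let regions: [Region]
}

/// Returns one string per recognized line, with the words joined by spaces.
func recognizedLines(from data: Data) throws -> [String] {
    let response = try JSONDecoder().decode(OCRResponse.self, from: data)
    return response.regions
        .flatMap { $0.lines }
        .map { line in line.words.map { $0.text }.joined(separator: " ") }
}

// Made-up sample payload, for illustration only.
let sampleJSON = """
{
  "language": "de",
  "orientation": "Up",
  "regions": [
    { "boundingBox": "10,10,200,80",
      "lines": [
        { "boundingBox": "10,10,200,30",
          "words": [ { "boundingBox": "10,10,90,30", "text": "Raum" },
                     { "boundingBox": "110,10,100,30", "text": "Heidelberg" } ] }
      ] }
  ]
}
"""
let sampleLines = try! recognizedLines(from: Data(sampleJSON.utf8))
print(sampleLines)
```

Joining the words per line keeps multi-word room names together, which makes the later lookup simpler than working with single words.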

3. Augment additional information into the scene

Now that we have read the name of the room, we can show additional information to the user. We implemented a simple mapping between room name and interest group shortcut. We want to show this information right below the Zühlke logo as 3D text.
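
The mapping itself can be as small as a dictionary lookup. The sketch below is only an illustration: the room names, the shortcuts and the function name are hypothetical, and matching on containment is our assumption about how to cope with extra words in the OCR output.

```swift
import Foundation

// Hypothetical mapping from (lowercased) room name to interest-group shortcut.
let interestGroupsByRoom: [String: String] = [
    "heidelberg": "AR",
    "mannheim": "IoT",
]

/// Returns the interest group for the first OCR string that names a known room.
func interestGroup(forRecognizedTexts texts: [String]) -> String? {
    for text in texts {
        let normalized = text
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .lowercased()
        // Match on containment, since an OCR line may include extra words such as "Raum".
        for (room, group) in interestGroupsByRoom where normalized.contains(room) {
            return group
        }
    }
    return nil
}

print(interestGroup(forRecognizedTexts: ["Zühlke", "Raum Heidelberg "]) ?? "unknown room")
```

The string returned by such a lookup is what then gets rendered as 3D text: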

    func add3dtext(text: String) {
        guard let anchor = zuehlkeAnchor else { return }
        let label = SCNText(string: text, extrusionDepth: 1)
        label.firstMaterial?.diffuse.contents = UIColor.purple
        label.firstMaterial?.specular.contents = UIColor.white
        label.chamferRadius = 0.1
        label.flatness = 0.1
        // Width of the text after scaling, used to center it horizontally
        let width = (label.boundingBox.max.x - label.boundingBox.min.x) * 0.005
        let labelNode = SCNNode(geometry: label)
        labelNode.scale = SCNVector3(0.005, 0.005, 0.005)
        labelNode.position = SCNVector3(-width/2, 0, 0.12)
        // Rotate the text so that it lies flat in the plane of the recognized image
        labelNode.rotation = SCNVector4(1, 0, 0, -Double.pi/2)
        let node = sceneView.node(for: anchor)
        // Attach the label to the node of the recognized image
        node?.addChildNode(labelNode)
    }

4. Conclusion

All in all, the most difficult part of our project was reading the texts automatically. Apple makes it possible to detect whether an image contains text. However, recognizing that text is not yet so easy. For the time being, cloud services are a good alternative for this task. This also shows how closely the further spread of Augmented Reality applications is linked to the advancement of technologies like Data Analytics or Artificial Intelligence.

ARKit has low entry barriers, especially when it comes to recognizing and augmenting objects. As a developer, you can make rapid progress with the framework. It is pretty easy to use once you are familiar with the coordinate system. The web is full of useful tutorials, which makes it even easier. That is why we managed to finish this AR app within only three days, and we had lots of fun doing it.

The project is open-sourced on GitHub.
