The HoloLens experience is awesome. I still love trying out new apps and immersing myself in Augmented and Mixed Reality. Existing UX concepts are still being adapted to the HoloLens, and new ideas abound.
Two aspects regularly dispel the illusion, however. As has been written countless times, the field of view is just too small. Several companies are working on this problem from different angles, and I am certain it will be solved soon. The second aspect is the tediousness of the Air Tap. It is hard to target a specific point in space with my nose and then execute a non-trivial gesture in front of myself. I keep wanting to nod affirmatively, thus clicking on some random widget border. The Air Tap is also the single most difficult thing to teach a customer trying out the HoloLens for the first time.
The clicker is a highly underrated device that helps a lot with this. But you still need to fix your gaze (actually your nose) and hold it steady.
Voice input is great and fits AR well, but selecting a specific point or area will always be needed, and voice is not well suited to that. Voice input is also not always desirable (e.g. on the train), and in an office context, separating actual voice commands from background chatter will prove challenging. BCI (brain-computer interface) is the next big thing, but it will probably be a while before it becomes genuinely useful.
Therefore, this blog post is an appeal to collaborate on finding new concepts to complement voice input for AR! Here’s what I have heard of or thought up myself:
Gesture Tracking with the HoloLens
Whether you are using the HoloLens's built-in tracking, Leap Motion, Kinect, or one of the newer devices around, this is what most users expect. At least half of all first-time users attempt the Minority-Report hand wave at some point, a clear indication that this gesture is intuitive and should be supported somehow. On the other hand, just as with voice input, it is hard to discern gestures directed at the device from a wave at a co-worker walking past the friendly Holo-Zombie.
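One way to soften this, purely as a sketch of my own (none of these names are a real HoloLens API): only accept a hand gesture if the user has recently been gazing at something that is actually interactive. The engagement window below is an arbitrary guess.

```typescript
// Hedged sketch: gate gesture recognition on recent gaze engagement, so a
// wave at a passing co-worker is not mistaken for a command.
// All names and the 1.5 s window are illustrative assumptions.

interface GazeTarget {
  acceptsGestures: boolean;
}

class GestureGate {
  private lastEngagementMs = Number.NEGATIVE_INFINITY;

  constructor(private readonly engagementWindowMs: number = 1500) {}

  // Call once per frame with whatever the gaze ray currently hits (or null).
  onGazeUpdate(target: GazeTarget | null, nowMs: number): void {
    if (target?.acceptsGestures) {
      this.lastEngagementMs = nowMs;
    }
  }

  // A detected hand gesture only counts if the user looked at an
  // interactive element shortly before performing it.
  shouldAcceptGesture(nowMs: number): boolean {
    return nowMs - this.lastEngagementMs <= this.engagementWindowMs;
  }
}
```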
Gaze tracking solo
How about letting us just shake or nod our heads to say no or yes to a dialogue question? Again, it might be hard to prevent false positives, and it is not culture-invariant. But the approach feels very intuitive to me.
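For what it is worth, such a nod/shake detector would not even need eye tracking, only the head pose the device already provides. A minimal sketch, assuming head pitch and yaw are sampled over a short window; the swing and reversal thresholds are guesses, not tuned values.

```typescript
// Hedged sketch: classify head motion as a nod (pitch oscillation) or a
// shake (yaw oscillation) from a short window of head-pose samples.
// Thresholds are illustrative assumptions.

interface HeadSample {
  pitchDeg: number; // rotation around the ear-to-ear axis (nodding)
  yawDeg: number;   // rotation around the vertical axis (shaking)
}

type HeadGesture = "nod" | "shake" | "none";

// Count how often the motion along one axis reverses direction by at
// least minSwingDeg, ignoring smaller jitter.
function countReversals(values: number[], minSwingDeg: number): number {
  let reversals = 0;
  let anchor = values[0];
  let direction = 0; // +1 rising, -1 falling, 0 unknown
  for (const v of values.slice(1)) {
    const delta = v - anchor;
    if (Math.abs(delta) < minSwingDeg) continue;
    const newDirection = delta > 0 ? 1 : -1;
    if (direction !== 0 && newDirection !== direction) reversals++;
    direction = newDirection;
    anchor = v;
  }
  return reversals;
}

function classifyHeadGesture(window: HeadSample[]): HeadGesture {
  if (window.length < 4) return "none";
  const nods = countReversals(window.map(s => s.pitchDeg), 5);
  const shakes = countReversals(window.map(s => s.yawDeg), 5);
  // Require at least two reversals (down-up-down / left-right-left) and a
  // clear winner between the axes to keep false positives down.
  if (nods >= 2 && nods > shakes) return "nod";
  if (shakes >= 2 && shakes > nods) return "shake";
  return "none";
}
```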
Modern game consoles sport advanced input devices with a variety of concepts that are well suited to interact with a 3D environment. They are relatively inexpensive and easy to integrate. They are very specific to games, however, and not equally well suited for the different tasks in a business application. Also, they are often bulky, and it may not be desirable to carry them at all times.
Specific UIs on general-purpose devices
But most of us carry a highly integrated package of several useful sensors with us at all times. I am talking about smartphones, of course! A smartphone can give us basic hand tracking via its accelerometers, may be able to localise itself independently via its camera, and, best of all, it has a highly sensitive touch display, which can be used as a slider, to steer a gaze-independent spatial cursor, or to display pop-up dialogue buttons. It can even vibrate to let you know it is offering input choices.
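To make the trackpad idea concrete, here is a minimal sketch. It assumes the phone streams normalised touch deltas to the headset over some channel; the message format and the cursor gain are made up for illustration and are not an existing smartphone or HoloLens API.

```typescript
// Hedged sketch: treat the phone's touch display as a gaze-independent
// trackpad that moves a spatial cursor on a virtual plane in front of the
// user. Message format and gain are illustrative assumptions.

interface TouchDeltaMessage {
  kind: "touchDelta";
  dx: number; // normalised horizontal swipe, -1..1
  dy: number; // normalised vertical swipe, -1..1
}

interface CursorState {
  x: number; // metres on the virtual plane
  y: number;
}

const CURSOR_GAIN = 0.25; // metres of cursor travel per full-length swipe (assumed)

function applyTouchDelta(cursor: CursorState, msg: TouchDeltaMessage): CursorState {
  return {
    x: cursor.x + msg.dx * CURSOR_GAIN,
    y: cursor.y - msg.dy * CURSOR_GAIN, // touch y grows downwards, world y upwards
  };
}

// Example: two swipes towards the upper right nudge the cursor from the centre.
let cursor: CursorState = { x: 0, y: 0 };
cursor = applyTouchDelta(cursor, { kind: "touchDelta", dx: 0.4, dy: -0.2 });
cursor = applyTouchDelta(cursor, { kind: "touchDelta", dx: 0.4, dy: -0.2 });
```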
Then again, 2D is dead. Therefore, most of our time-honoured (or time-worn?) 2D widgets might be decrepit as well. We should probably think about new ways to interact with a scene that do not involve fixing our gaze on a few microradians of visual angle to make our will known.
Whenever a decision needs to be made, I could take a decisive action representing my choice, such as walking to a specific place, picking up a specific tool, or just looking at a specific object.
Where screen real estate is the limit on desktop and mobile, we have all the world as a stage in 3D!
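To sketch what such a "decide by acting" flow could look like in code (all of this is made up for illustration, not part of any toolkit): each possible answer gets a physical anchor in the room, and the decision is whichever anchor the user actually walks into.

```typescript
// Hedged sketch: resolve a decision from where the user walks instead of
// which 2D button they click. Types and distances are illustrative.

interface Vec3 { x: number; y: number; z: number }

interface ChoiceAnchor<T> {
  value: T;        // e.g. "confirm" or "cancel"
  position: Vec3;  // where the anchor sits in the room
  radiusM: number; // how close the user must come to select it
}

function distance(a: Vec3, b: Vec3): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Returns the choice whose anchor the user has entered, or null while they
// are still standing between the options.
function resolveChoice<T>(userPosition: Vec3, anchors: ChoiceAnchor<T>[]): T | null {
  for (const anchor of anchors) {
    if (distance(userPosition, anchor.position) <= anchor.radiusM) {
      return anchor.value;
    }
  }
  return null;
}
```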
Eliminate binary workflows
Guessing the user's choice based on position and gaze direction is prone to errors. Therefore, recovery from incorrect guesses must be graceful and unobtrusive for the user: walking away or gazing away must undo everything that was triggered before.
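A minimal sketch of what such a gracefully reversible guess could look like (the names and the proximity threshold are assumptions of mine, not an existing API): every guessed action carries its own undo, and walking or gazing away simply plays it back.

```typescript
// Hedged sketch: a trigger that guesses the user's intent from proximity
// and gaze, but reverts itself unobtrusively when the user walks or looks
// away. Names and thresholds are illustrative assumptions.

interface ReversibleAction {
  apply(): void;  // e.g. highlight a machine, open a preview panel
  revert(): void; // must restore the previous state exactly
}

class GuessedTrigger {
  private active = false;

  constructor(
    private readonly action: ReversibleAction,
    private readonly maxDistanceM: number = 1.5, // assumed proximity threshold
  ) {}

  // Call every frame with the user's distance to the target and whether
  // the target currently lies under their gaze.
  update(distanceM: number, isGazedAt: boolean): void {
    const shouldBeActive = isGazedAt && distanceM <= this.maxDistanceM;
    if (shouldBeActive && !this.active) {
      this.action.apply();
      this.active = true;
    } else if (!shouldBeActive && this.active) {
      // Walking away or gazing away undoes the guess.
      this.action.revert();
      this.active = false;
    }
  }
}
```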
We have already expanded on some of these concepts at our AR-Camp, and my colleagues Michael and Ines will blog about this soon. Please share your thoughts in the comments or just let me know on Twitter!