It began rather innocuously: develop interface concepts for displaying speech commands when the voice system is active (teleprompters). In current and prior car models, the speech interfaces consisted primarily of teleprompter interfaces, just speech commands on the screen.
I sent over various wireframe concepts (like the one seen above) and discussed the complexity of the information architecture, which needed improvement. With the addition of new speech commands and more natural language recognition, the number of potential voice commands grew significantly, and the request from management was to display them all.
As the project progressed, I began to have other ideas about what we might be able to do with a speech interface. For example, if the teleprompter is meant to assist the user, then why not provide a better interface that includes user assistance? Obviously just developing a new teleprompter UI wasn't going to cut it. But what could it become? This is where the real project began in earnest. As usual, it began with a lot of sketching and brainstorming.
These are just a fraction of the whiteboard sketches we generated at this stage. I was paired with a visual designer from another team, and we had lots of detailed discussions about various use cases and scenarios. There's nothing quite like locking yourself in a room with a bunch of whiteboards and hashing out concept after concept to refine an idea.
What we initially settled on was an interface based on modular tiles. Why? The basic idea was that we could improve the voice experience by presenting information visually and displaying only the information relevant to helping the user continue to the next dialogue step.
Keep in mind, the general UI convention at the time was to use text lists everywhere, and I do mean everywhere. In earlier systems, if you did a "free POI search" for a restaurant (aside from the clunky dialogue), you would be presented with an ordered list of names, addresses, and relative distances filling up your screen.
While it certainly provides a lot of detailed information, it resembles a set of database records more than an actual user interface. So our overall mission was to change that, which proved to be much harder in practice than it seemed.
Collaboration and compromise
Collaborating with international teams is not easy. Technology has not solved the problem of remote interaction; video conferences just don't cut it. Face-to-face meetings are the best way to build collaboration, but you can't always fly your whole team to a remote location (Germany, in this case). So we had to have video conferences, lots and lots of video conferences.
I pitched the tile UI concept over and over, to as many people as I could, but there was simply too much resistance to this kind of idea. In retrospect, I might have changed my tactics a bit, since we were showing UI concepts that were really different from the generally expected text lists. As a result, we had to figure out an effective compromise: we couldn't have just text lists, but the modular tile concept wasn't winning people over either. Meeting in the middle isn't always the best thing for a product, but it is very important to keep the project moving forward!
So these were the results of the compromises. We decided to develop a new set of UI "archetypes" as an iteration of the tile concept and keep things moving forward. The idea of modularity was still important (given how dialogues work), and even though I couldn't get agreement on the tile concepts, the overall mission to simplify and improve the voice interface remained. After reading through all of the different dialogue specifications (there were a lot!), I realized we could abstract the system into a set of key moments (maps, contacts, music, messages, etc.), and these moments, when combined, would form the basis for the whole speech UI system.
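To make that idea a bit more concrete, here is a minimal sketch of how dialogue "moments" and reusable UI archetypes might compose into a flow. The names and structure are purely illustrative assumptions on my part, not the production system's code:

```typescript
// Hypothetical sketch: each dialogue step maps to a reusable UI "archetype",
// and a complete voice flow is just a sequence of these moments.
type Archetype = "map" | "list" | "contacts" | "music" | "messages" | "confirmation";

interface DialogueMoment {
  archetype: Archetype;     // which modular UI piece to show
  prompt: string;           // what the system says or displays at this step
  voiceCommands: string[];  // the commands surfaced to the user (the teleprompter's old job)
}

// A restaurant search expressed as a sequence of moments.
const findRestaurant: DialogueMoment[] = [
  { archetype: "list", prompt: "Which restaurant would you like?", voiceCommands: ["The second one", "Show more"] },
  { archetype: "map", prompt: "Start navigation?", voiceCommands: ["Yes", "No, cancel"] },
];
```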
Continuing our learnings
Usability testing was critical to help us refine the design. To test the UI, however, we needed something interactive. Voice interactions are tricky in that they're not as easy to put into a usability test as a simple clickthrough prototype; it's difficult to measure the quality of a voice interaction without a speech dialogue system. So we had to get far enough with prototyping to be able to evaluate some of our advanced dialogue concepts.
One example was where we extended the dialogue for "Hey Mercedes, take me home." If the user had not set a home address, the original requirement was to simply end the dialogue with "Sorry, no home address available." We proposed extending it to let the user set their home address by voice (and then start the navigation), because an intelligent system would know what data it has and assist the user accordingly. Testing showed that these were positive improvements, and through many different discussions we got approval for a lot of the dialogue enhancements.
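As a rough illustration of that extended dialogue, the branching might look something like the sketch below. The function and field names are assumptions for the sake of example, not the actual system's API:

```typescript
// Hypothetical sketch of the extended "take me home" dialogue.
interface UserProfile {
  homeAddress?: string;
}

// ask() stands in for a single voice prompt/response turn.
async function takeMeHome(profile: UserProfile, ask: (prompt: string) => Promise<string>): Promise<void> {
  if (!profile.homeAddress) {
    // Original requirement: end here with "Sorry, no home address available."
    // Extended dialogue: let the user set the address by voice, then continue.
    profile.homeAddress = await ask("You haven't set a home address yet. What is your home address?");
  }
  startNavigation(profile.homeAddress);
}

function startNavigation(destination: string): void {
  console.log(`Starting navigation to ${destination}`);
}
```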
Usability tests also helped us evolve the UI concepts. Even though the idea was to be "voice first", we had to allow for multimodal interaction as well, because we knew people wouldn't interact with the UI by voice alone. The challenge was to design a voice interface that understood you might also want to interact with it via touch. We had to design all the touch affordances differently from the rest of the system: text labels had to be written to indicate voice commands, and focus cursors (highlighting where the remote touch interaction would take place) only appeared after a delay. We even tested what an appropriate timing would be to automatically change the UI to display more touch affordances.
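A minimal sketch of that timed reveal might look like this. The delay value and names are illustrative assumptions; the real timing was whatever came out of the usability tests:

```typescript
// Hypothetical sketch: start voice-first, then reveal touch affordances
// (focus cursor, extra labels) only after a delay.
const TOUCH_AFFORDANCE_DELAY_MS = 4000; // illustrative value, not the tested one

function onVoiceSessionStart(showTouchAffordances: () => void): () => void {
  const timer = setTimeout(showTouchAffordances, TOUCH_AFFORDANCE_DELAY_MS);
  // Cancel the reveal if the user advances the dialogue by voice before the delay elapses.
  return () => clearTimeout(timer);
}
```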
All of this test data helped us develop the proper visual designs that would end up in the final product.