For 25 years, technology companies have designed user interfaces for our eyes, largely neglecting our other senses.
After heavy investments in screens and visual media, the industry has shifted towards the auditory realm. Advancements in natural language processing (NLP) and artificial intelligence (AI) have made Siri, Alexa, and Cortana possible.
These assistants sound human-like, but that’s not why we appreciate them. Technologists who program AI to be more ‘human’ misunderstand how the human mind works.
Setting aside the creepy factor, people don’t need or want AI created in their image. By examining how our attitudes towards visual and auditory media have changed in the last decade, we can identify a less anthropomorphized but more rewarding future for intelligent technology.
The rise of visual communication and asynchronicity
The dilemmas of present-day AI begin with smartphones. The decline of phone calls and the rise of text messaging were somewhat counterintuitive. Speaking with a person is more ‘on-demand’ and ‘instant’ than SMS exchanges that can last days without resolution. Nonetheless, in 2007, monthly text messages surpassed monthly phone calls among Americans for the first time.
Phone calling declined because it is synchronous communication – you must be present and engaged throughout the dialogue. After phones became mobile, anyone could call and hijack your attention. Sometimes, I’m sure you check the caller ID and think, “No, not now.”
In contrast, text messaging is a form of asynchronous communication, meaning the parties engage and respond as their schedules permit. It’s not necessarily more or less social than phone calls, but it promised freedom. Rather than pressuring us to answer immediately, asynchronicity gave us time to think about the message, delay commitments, and plan our responses.
Cheryl Casey, an assistant professor of media communication at Champlain College, discusses two sides of asynchronous communication.
On the one hand, it can make exchanges friendlier and less stressful than synchronous communication by facilitating self-censorship (often a good thing), careful message construction, and feelings of safety. On the other hand, it can shield jerks from the reactions to and consequences of their behavior, and it can distract people from face-to-face conversations. We’ve all seen families at restaurants buried in their phones, ignoring each other.
Over the last decade, smartphones funneled countless forms of asynchronous communication – text, email, social networking, advertising, commerce, news, research, video, and more – into mobile screens.
The research platform dscout recently found that, on average, people now tap, swipe, or click their phones 2,617 times per day and spend 2.42 hours on the phone split across 76 sessions.
The screen demands full attention. Imagine doing creative work, loading a dishwasher, talking with family, or driving a car while watching a movie on a smartphone. People try. But as neuroscientist Daniel J. Levitin explains in The Guardian, our focus, cognitive health, and decision-making ability suffer if we condition ourselves to multitask. Visual asynchronous communication overwhelms our brains.
Sensory design and AI
Screens won’t go away, but they have some limitations. The discipline of sensory design is helping us overcome them. Sensory designers use multiple inputs and outputs to create immersive experiences.
Creative use of sight, hearing, touch, taste, and smell can preserve our focus, reduce emotional stress, and simplify tasks. Sight is our dominant sense, but sound in combination with AI offers some advantages over screens.
People walk and talk, play sports to music, drive to radio and podcasts, or sing while showering. Siri, Alexa, and Cortana free up vision yet interact asynchronously. While I’m loading the dishwasher, I can ask Alexa about the weather or news. I can’t load a dishwasher and text message with friends simultaneously.
Unlike websites, social networks, and other visual advertising channels, an audio AI assistant responds strictly on our terms.
Anecdotally, people tell me that they love the experience of giving AI orders. There are no complications or arguments. We feel in command because it’s a one-way exchange. Tell Alexa to order more coffee beans, and she does it. There’s no emotional stress weighing down the ‘conversation,’ if it’s even fair to use that term. We can talk in front of Alexa yet not talk to Alexa.
A better vision for AI
If we want frictionless tech experiences and useful sensory designs, why are so many innovators trying to make AI more human-like? Human beings are not easy to deal with. An AI assistant that is emotionally static and says “yes” to every request is unlike any human I know!
It’s not shocking that human beings would try to create AI in their own image (and miss). Our movies, novels, and religions reinforce that approach to AI.
However, as we expand sensory design to combine AI with vision, hearing, touch, taste, and smell, we need a better target than humanness. A few suggestions:
- Predictability. While we often admire human spontaneity, AI shouldn’t be a robotic improv actor. AI should be optimized to understand what we want and act upon it. Companies like x.ai have created AI assistants that can schedule and reschedule meetings with human email contacts. If we want to trust AI with more complex tasks – like scheduling a business trip – the intelligence must be predictable. When we speak to auditory AI, we want the certainty of tapping without actually tapping.
- Initiative. AI needs some leeway to interrupt us and make exchanges less asynchronous. If I have a meeting across town and there’s bad traffic, I’d like AI to warn me, then ask if it should book an Uber at an earlier time than planned. Consider how different that is from the AI of visual media, which is optimized to hook our attention and maximize advertising-based revenue models. We don’t want Alexa to bombard us with marketing offers. Auditory AI should be proactive but not distracting.
- Trainability. Although people talk about their dogs like children, kids are a lot harder to ‘train.’ With some Pavlovian conditioning, dogs will respond diligently to our signals and commands. Unless you’re a modern Captain von Trapp from The Sound of Music, you’re not blowing whistles to condition your kids. AI should be more like a dog than a person. It needs to be highly trainable rather than annoyingly independent. Maybe AI should even act excited to greet us when we come home.
Technologists made screens the nexus of communication but created a boundless source of distraction and temptation. Sensory design and AI promise to free up our vision and redefine the concept of a ‘frictionless’ experience. It’s not just about instant, on-demand gratification – it’s that, plus preserving our presence of mind and freedom to focus beyond a screen.
Let’s make AI predictable, proactive, and trainable. As technologists embrace the full spectrum of sensory design, expect less humanness but more utility from artificial intelligence.