#10 ChatGPT understands, except it doesn't
We're in the beep boops era of chatbot writing... again.
Hey Voice This! fam 👋🏼
Full disclosure — this is a longer than usual issue! We have a ton of sources, insights, and text to read through this time.
Picture this: it’s dinnertime and you’re at a crowded restaurant with one of your closest friends and someone new, your friend’s friend. No big deal, you’ll get to know them over the course of the evening, except it’s hard to hear anything they’re saying over the noise of the room. Overlapping voices and clanging dishes have made it hard to tell whether they said they’re from “New York” or “Newark”. The topic changes and they turn to ask you something. “Sorry, what?” you say, prompting them to repeat the question. “Sorry, I didn’t hear you,” so they try the question again. You begin to sweat. If you admit you still couldn’t hear them, is it on them to try again, or is it your failure to be a good listener? In a split second you panic, chuckle, say “Ha, yep,” and then search their face for confirmation. It takes a moment, but relief soon washes over them as they accept your answer and smile back. Amazing. You have no idea what you just agreed to, but it was a good enough answer to keep the night going smoothly.
We’ve all given “good enough” answers at some point in our lives, whether due to a lack of context, to follow social decorum when full honesty is too much, or for any of the other reasons that make human communication so difficult. The funny thing is: a lot of the general hype around Large Language Model (LLM)-powered “chatbots” is that they talk back as if they understand. Many kinds of requests are now possible with ChatGPT that weren’t accessible before. You can ask ChatGPT to recommend the best restaurant in the city or to write the 3 main points of an essay on a historical figure, and you’ll get back exactly what you asked for. It’s a magical experience, especially if we think of how easy it was, or still is, to be disappointed by chatbots. OpenAI “opening” this technology to the world is a huge milestone. We at Voice This! fully embrace this change, although admittedly we did have our initial hesitation amidst all the hype (see issue #6).
ChatGPT doesn’t understand anything it writes, but does it need to? For the majority of conversation designers in the field, Conversational AI follows the knowledge-based or “understand” approach, which is, more or less: in order for a machine to respond to natural language, you must first train it on what natural language is, linguistic rules and all.
This is where we get the listen, understand, and respond model for human-machine conversations. With voice tech, you also have the added layer of automatic speech recognition (one more probabilistic model!) in the “listen” part of the process. LLMs do not process language in the same way. In fact, LLMs do not perform any form of natural language understanding (NLU) at all. As stated in the paper “On the Dangers of Stochastic Parrots” by Bender, Gebru, et al., “[LMs] only have success in tasks that can be approached by manipulating linguistic form. […] the training data for LMs is only form; they do not have access to meaning.”
Put another way, to quote Kate Crawford: “LLMs are […] looking for patterns and relationships between the texts that they’ve been trained on— and they use this to predict the next word in a sentence.” ChatGPT and other LLMs are glorified word guessers. They don’t understand what they generate. They don’t hold opinions. They can’t distinguish between a truth and a lie. And they’re really bad at math. Why does it matter whether they understand us or not? Well, they don’t have to understand anything to carry out their tasks, but how we, as creators and participants in the technology, talk about it matters. If LLMs are inherently not built to understand things, only to respond to things, then it’s not fair to say they “hallucinate”. As blogger Charley Johnson writes quite eloquently in their own newsletter:
“[Hallucination] implies an aberration; a mistake of some kind, as if it isn’t supposed to make things up. But that’s actually exactly what generative models do — given a bunch of words, the model probabilistically makes up the next word in that sequence. Presuming that AI models are making a mistake when they’re actually doing what they’re supposed to do has profound implications for how we think about accountability for harm in this context.”
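To make the “word guesser” idea concrete, here’s a deliberately tiny Python sketch of next-word prediction. It’s a toy bigram model over a made-up mini corpus, nowhere near how a real LLM is built (those are neural networks trained on trillions of tokens), but the loop is the same in spirit: given the words so far, pick a statistically likely next word, append it, repeat.

```python
import random
from collections import Counter, defaultdict

# Toy "training data": the only text this model will ever know.
corpus = (
    "the bot books a trip . the bot answers a question . "
    "the user asks a question . the user books a trip ."
).split()

# Count which word tends to follow which. Pure form, no meaning.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def generate(start: str, length: int = 8) -> str:
    """Repeatedly sample a plausible next word given only the previous one."""
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        choices, weights = zip(*candidates.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the user books a trip . the bot answers"
```

The sketch never “knows” what a trip or a question is; it only knows which word tends to follow which in its training text. Scale that same idea up by a few hundred billion parameters and the output starts to read a lot like understanding, which is exactly the trap.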
ChatGPT and other LLMs are simply doing what they were trained to do. Likewise, when human users of these products write prompts that accidentally or intentionally produce blatantly false outputs, they aren’t necessarily to blame for those outputs either. As a society, we sorely need accountability from the creators and designers of this next generation of AI tools. We are what we make. It is not enough to place blame on probabilistic models for absurd responses when those public-facing responses could have been prevented by someone on the product team raising their hand and asking, “What things should our bot never say, and why?”
Conversation designers, generative AI won’t take your job, because conversational AI teams still need someone to ask the “why” and “how”. If the ChatGPT-powered Expedia bot has taught us anything, it’s that it’s not enough to put a chatbot on your website (or app) and call it a day. If we do that, then we might as well not have gone through the entire blip of personal and home voice assistants, or the development of conversation design as a field. The persona of today’s bots is one boop beep away from sounding like a 1970s tech caricature.
What your bot says, how it says it, and how users can interact with the experience still need to be designed. The world doesn’t need more chatty bots; it needs tools that can “understand” and “do.”
P.S. What’s the point of a travel bot that can’t book a trip for you? (Expedia) Or an e-commerce bot that doesn’t reveal price information? (Sorry Azaria, you’re guilty too).
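To put a finer point on “understand” and “do”: below is a minimal, entirely hypothetical Python sketch (the intent names, slots, and booking function are made up for illustration) of the difference between a bot that only chats and one whose recognized intent is actually wired to a fulfillment action.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    name: str
    slots: dict

def understand(utterance: str) -> Intent:
    """Stand-in for whatever does the NLU, a classic intent model or an LLM."""
    if "book" in utterance.lower():
        return Intent("book_trip", {"destination": "Newark"})
    return Intent("fallback", {})

def book_trip(destination: str) -> str:
    """Stand-in for the fulfillment layer: the part that actually does something."""
    return f"Done! Your trip to {destination} is booked."

# The designed part: which intents the bot handles, and what it does with them.
HANDLERS = {"book_trip": lambda slots: book_trip(**slots)}

def respond(utterance: str) -> str:
    intent = understand(utterance)
    handler = HANDLERS.get(intent.name)
    if handler is None:
        # A designed fallback tells the user what the bot can do instead of rambling.
        return "I can book trips for you. Want me to set one up?"
    return handler(intent.slots)

print(respond("Can you book me a trip?"))
```

Swap the toy understand() for an LLM and the sketch still holds: without the handler table and the fulfillment call, all you’ve shipped is a very confident chat window.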
Podcast Corner 🎙️
We recently hit 2,000 listens! We’re constantly overwhelmed by the amount of support we get on this (low-key) side project, but we’re so glad so many of you have tuned in and shared how much you enjoy the podcast. Thank you for getting us to 2k 💜💙💜
We really can’t wait for season 2 now 👀 (coming soon!)
AI Reading Corner 📚 (+ works cited)
AI Timeline (very cool infographic! 🖼️) - https://digitalwellbeing.org/wp-content/uploads/2017/08/Artificial-Intelligence-AI-Timeline-Infographic.jpeg
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
A Very Gentle Introduction to Large Language Models without the Hype
Kate Crawford: ‘We need to have a much more comprehensive form of AI governance’
Unpacking AI: "an exponential disruption" with Kate Crawford
You Can Probably Beat ChatGPT at These Math Brainteasers. Here’s Why
ChatGPT Is Powered by Human Contractors Getting Paid $15 Per Hour
🤖 AI isn’t ‘hallucinating.’ We are.
Expedia Debuts a ChatGPT-Powered Assistant ... but It's Totally Disappointing
Audience Q&A 💌 with Nathan Bishop!
At the time of this interview, Nathan was working at Discover Financial Services as a Senior Conversational AI Specialist. He has since moved to Fidelity Investments in the role of Conversation Designer!
Meet Nathan (Nate!) in his own words:
😄 Nice to meet you, I’m Nate.
I love building at the intersection of content and AI. As a consultant at Drift, I built, launched, and optimized enterprise conversational experiences for companies like HPE and Adobe. Then at Discover, I implemented conversational experiences that ensured 57 million customers gained the most value from self-service support solutions.
Now at Fidelity, I'm an embedded conversation designer in product squads across the Personal Investing and Workplace Investing business units focused on guided agent experiences, a net-new consumer app, and R&D. My contract term ends March 31, 2024.
I have a keen interest in using AI to deliver or analyze user content, whether in conversational UX or social media trust & safety. Examples of my work, and resources for success in conversation design, can be found on my website Toodle.ai. I can be found on Twitter and Post.news @NathanABishop.
I'm always open to coffee chats, let me know if you're interested!
💌 NOW FOR THE QUESTIONS! 💌
What is a typical day in the life of a conversation designer?
Nate: Every day is a little different. I focus on the week as a whole and how that looks for me. On Mondays, I like to start by understanding how the bot is performing. I spend a lot of time manually reading conversations. There are some hot takes around that (that people shouldn’t be doing it by hand), but I personally benefit from getting that frontline perspective, if you will, on what the typical user journey looks like. There’s [no tool] that can quite augment that, at least yet. The other part of my time as a designer is meetings. As someone working for a large enterprise, and especially in financial services, a lot of the work I do is highly collaborative. There are a lot of stakeholders and dependencies.
When I was at Drift, it was relatively easy to push out updates [on designs]. Sometimes my customers trusted me enough that I could put out copy and then ask for approval on it later. In financial services, getting content out is an incredibly lengthy but necessary process to make sure the content is consistent across all of our different channels, whether it’s in a help doc or being said by the IVR, the chatbot, or a (human) agent. Does it comply with all legal and regulatory standards? Can it be understood by as many users as possible? With all those dependencies, there are a lot of meetings that stem from that. Also, at a company like Discover, there are a lot of stakeholders in specific areas. You might have (what we call) a business or process owner who is responsible for a particular experience when it comes to credit cards or the dispute process. Typically, I’ll meet and consult with all of these different teams as well.
I’m lucky at Discover that while I do have a lot of meetings, I also have a lot of heads-down time to focus on projects. I’d say 3/4 of my work is NLP training and tuning, like standing up a new intent or making training improvements. Say there’s a particular intent that we’re focusing on for the month that we want to improve metrics on: how can NLP training and tuning help with moving the needle on that goal?
In the second half of a typical week, I might work on some of the longer term initiatives. So I’ll meet with people on things like personality design or which business units or processes we might be looking to incorporate into Conversational AI within the next few quarters or years.
What’s 1 thing you enjoy and 1 you dislike about Conversation Design as a career?
Nate: I find Conversational AI so exciting. Generally, when I talk to people in tech, my neighbor, whoever, and I tell them what I do, there seems to be so much curiosity and initial passion. It’s such a cutting-edge and exciting space to be in. Even though it can sometimes sound boring, and at times I have to rack my brain on how to solve a problem that seems so simple, I often have to remind myself that yes, it’s simple, but what I’m doing for this financial services chatbot is probably defining best practices for the industry right now. And when I ask the vendor how to do it and they’re like, “I don’t know,” it’s a pretty good indication that the work I’m doing is still exciting. You’re doing something that no one else has done before. You’re a pioneer, if you will, in so many ways.
I kind of have one major point of friction [with the field]. As a pioneer, one of the hardest things, and I think a lot of us have related and aligned on this before, is that it’s relatively easy to find resources on building a resume for conversation design, maybe even on what a portfolio looks like, but when it comes to leveling up your career, finding those resources is incredibly tricky, if not impossible.
What do productive career development conversations look like with your manager after you’ve been in your role for a couple of years? What does management look like? Or does management for this even exist when we only have one or two people doing this role [CxD] at our company? What does it look like to advocate for more resources? Say we’ve dramatically scaled up our efforts for Conversational AI across the enterprise, but we still only have two NLU-focused people in the company. What can we do to increase headcount? Or even some of the technical things. The work is so niche and there are so few people working on it, predominantly in the enterprises that can afford it, so a lot of the cutting-edge stuff and best practices are highly confidential.
Say I’m working on a challenge around disambiguation or establishing better context understanding for my bot. Someone at another company could very well have found a really good solution to that problem, but I’m not going to find anything helpful about it online. One of the big frustrations is that a lot of the thought leaders who talk about conversation design only cover the general concepts of it. There’s not much out there for the people who are already in the field and need more advanced help or resources. And a lot of designers don’t, or can’t, really show their work. Like, “I saw you did this really cool thing for a webinar, but what does the actual design look like? What tools did you use?” So, at times, you start to wonder: when’s the last time this person actually shipped a chatbot, or anything, into the conversational ecosystem? Are these even the right people we should be expecting resources from?
To bring it all together, I’m still excited in this space. We are all pioneers and there is still tremendous opportunity left.
Thanks for Reading This! 🥰
Voice This! Newsletter is a joint effort of Millani Jayasingkam and Elaine Anzaldo. Opinions expressed are solely our own and do not express the views or opinions of our employers.