AI was my Worst Agent #2 - Labeling Customer Sentiment
Mar 11, 2024
James McHenry
MelodyArc is an operations platform that uses both AI and human agents to service users. AI agents are the primary workers, with human agents available for support. You can learn more about our platform here.
Using AI as the primary worker in a production environment is hard. This series shares our journey in solving specific challenges.
The Task
Determine the perceived customer sentiment based on their messages.
🔔In customer service, sentiment can impact how a conversation is serviced. In most businesses, a “Neutral” sentiment should characterize the vast majority of conversations. Other sentiments, such as “Happy” or “Upset,” are used to flag conversations for unique service flows. Generally, sentiment ratings should lean towards neutrality.
Requirements
Agent must determine a customer’s sentiment from a list of available options, based on their messages.
Agent must respond in a pre-defined format.
Round 1
We are using ChatGPT 4 as the AI to power our agent. We will start by generating a small set of test emails. Emojis will indicate the sentiment I believe to be accurate.
Message 1: Frustrated subscriber - 😠
Subject: Order status I put my monthly subscription on pause 2 months ago. I have NOT TAKEN IT OFF PAUSE, as I still have alot of product. PLEASE cancel this. When I'm low on stock will do the automatic monthly schedule again. PLEASE confirm this correction by email.
Message 2: Exchange please - 😐
Subject: Order issue Hello! I ordered 4 boxes of candles and I don’t care for the blue or pink. Can I exchange these for a different color? The order number is 568792. Thank you!
Message 3: Stuck order - 😐
Subject: Request for help My order has been stuck in transit.
Message 4: Where is my order - 😐
Subject: Your order Shipped! Placed my order 8 days ago and still don't have it. Any updates?
Message 5: Great order - 😀
Subject: Order received Just got my order. Everything in it is so GREAT! Thank you!
Message 6: Answer me - 😠
Subject: Order received This is my third message with no response. Why can I not get help?
Prompt
For our first prompt, we will start with a very simple and direct approach.
What is the customers sentiment based on their message?
options:
"Happy"
"Neutral"
"Upset"
---Start of Message---
Subject: Order status I put my monthly subscription on pause 2 months ago. I have NOT TAKEN IT OFF PAUSE, as I still have alot of product. PLEASE cancel this. When I'm low on stock will do the automatic monthly schedule again. PLEASE confirm this correction by email.
---End of Message---
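For readers who want to reproduce this step, here is a minimal sketch of how the Round 1 prompt could be assembled for each test message. The function name `build_prompt` and the line breaks between the parts of the prompt are my assumptions for illustration; the prompt wording itself is copied from above, and no model call is shown.

```python
# Hypothetical helper: wraps one customer message in the Round 1 prompt text.
OPTIONS = ["Happy", "Neutral", "Upset"]


def build_prompt(message: str, options=OPTIONS) -> str:
    """Return the Round 1 prompt for a single customer message."""
    # Each option is listed in double quotes, as in the original prompt.
    option_lines = "\n".join(f'"{option}"' for option in options)
    return (
        "What is the customers sentiment based on their message?\n"
        "options:\n"
        f"{option_lines}\n"
        "---Start of Message---\n"
        f"{message.strip()}\n"
        "---End of Message---"
    )


# Message 3 from the test set.
prompt = build_prompt("Subject: Request for help My order has been stuck in transit.")
print(prompt)
```

Keeping the message between explicit start and end markers makes it easy to swap in each test message without touching the instruction text.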
Response
I executed the above prompt for each test message. Here are the responses.
Messages
Frustrated subscriber - "Upset" 👍
Exchange please - "Neutral" 👍
Stuck order - "Upset" 👎
Where is my order - "Upset" 👎
Great order - "Happy" 👍
Answer me - "Upset" 👍
Results
👎Agent must determine a customer’s sentiment from a list of available options, based on their messages.
👍Agent must respond in a pre-defined format.
The agent responded using the correct format each time. Its sentiment labels matched our expectations in four out of six instances. The two discrepancies were messages 3 and 4, where we expected “Neutral” and the agent answered “Upset”.
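The tally above can be sketched as a small scoring step. The labels are copied from the test set and the Round 1 responses; the function name `score` and the dictionary layout are assumptions for illustration.

```python
# Expected labels (the emojis in the test set) and the Round 1 responses.
EXPECTED = {
    "Frustrated subscriber": "Upset",
    "Exchange please": "Neutral",
    "Stuck order": "Neutral",
    "Where is my order": "Neutral",
    "Great order": "Happy",
    "Answer me": "Upset",
}
ROUND_1 = {
    "Frustrated subscriber": "Upset",
    "Exchange please": "Neutral",
    "Stuck order": "Upset",
    "Where is my order": "Upset",
    "Great order": "Happy",
    "Answer me": "Upset",
}


def score(expected: dict, actual: dict):
    """Return (number of matches, names of messages whose label differs)."""
    misses = [name for name, label in expected.items() if actual.get(name) != label]
    return len(expected) - len(misses), misses


matched, misses = score(EXPECTED, ROUND_1)
print(f"{matched}/{len(EXPECTED)} matched; mismatches: {misses}")
# prints: 4/6 matched; mismatches: ['Stuck order', 'Where is my order']
```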
Below are its justifications for the differing sentiments.
Message 3
The customer's message indicates a problem with their order not being delivered as expected, which suggests that they are not fully satisfied or happy. This typically characterizes an upset sentiment.
Message 4
The customer's message indicates dissatisfaction with the fact that they placed an order 8 days ago and still haven't received it, hence their sentiment is upset.
🔔In a typical context, these justifications are sensible. However, in customer service, nearly every message relates to a problem. Therefore, to be effective, we need to tune sentiment analysis to identify customers who have explicitly expressed their upset feelings, rather than inferring that their circumstances justify an upset state.
Round 2
Let's take an incremental step and define the sentiment options. The prompt will attempt to categorize messages as “Happy” or “Upset”, defaulting to “Neutral” for all others.
What is the customers sentiment based on their message?
options:
"Happy" - Message expresses exuberant joy or immense appreciation.
"Upset" - Message directly expresses anger or extreme frustration.
"Neutral" - Message does not qualify as Happy or Upset.
---Start of Message---
Subject: Order status I put my monthly subscription on pause 2 months ago. I have NOT TAKEN IT OFF PAUSE, as I still have alot of product. PLEASE cancel this. When I'm low on stock will do the automatic monthly schedule again. PLEASE confirm this correction by email.
---End of Message---
Response
Messages
Frustrated subscriber - "Upset" 👍
Exchange please - "Neutral" 👍
Stuck order - "Neutral" 👍
Where is my order - "Upset" 👍/👎
Great order - Happy 👍/👎
Answer me - "Upset" 👍
Results
👍Agent must determine a customer’s sentiment from a list of available options, based on their messages.
👍Agent must respond in a pre-defined format.
The results are not exactly what I was looking for, but they could work. For requirement 2, the response to message 5 omitted the quotation marks around the label, so the format was applied inconsistently. I will overlook this issue, as our platform is capable of correcting it.
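As an illustration of the kind of correction mentioned above, here is a minimal sketch that normalizes a raw response to one of the allowed labels. This is not MelodyArc's implementation; the function name `normalize_label` and the choice to raise on unknown labels are assumptions.

```python
# Hypothetical post-processing step: accept a label with or without quotes.
ALLOWED = ("Happy", "Neutral", "Upset")


def normalize_label(raw: str) -> str:
    """Return the label in straight double quotes, or raise ValueError."""
    # Drop surrounding whitespace plus straight, curly, or single quotes.
    cleaned = raw.strip().strip('"\u201c\u201d\'').strip()
    for label in ALLOWED:
        if cleaned.lower() == label.lower():
            return f'"{label}"'
    raise ValueError(f"unrecognized sentiment label: {raw!r}")


print(normalize_label("Happy"))    # unquoted response, as seen for message 5
print(normalize_label('"Upset"'))  # already in the expected format
```

Rejecting anything outside the option list, rather than guessing, keeps a malformed response from being silently treated as a valid sentiment.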
This time, the sentiment aligned with expectations in five out of six instances. The exception was message 4, which was still categorized as “Upset” instead of “Neutral”. However, considering that the wording of message 4 does express frustration, this categorization might actually be correct. Score one for the AI sticking to its point.🫡
Validation
I tested this prompt across a series of actual customer messages. Below are a few examples, with personally identifiable information removed for privacy.
1. Re: Thank you for your order!
Hi! I edited the order earlier this week to blue, red, green yellow, and purple. The website showed it as "saved" at the time, and I'm surprised that it has now reverted to the old set. Can you please change that for the current delivery?
Result 👍
"Neutral"
2. Request from Bob
My order has been stuck in transit
Result 👎
"Upset"
3. Re: Your Order Has Shipped!
Hello Team Order: 154687 The tracking mentions that the product has been delivered. Unfortunately I did not receive any package. I have waited for a few days as well however no luck. I would like to request a refund. I shall order again soon :) Thank you
Result 👎
"Upset"
Analysis
The prompt is still biased towards classifying messages as “Upset” when “Neutral” is desired. The AI continues to interpret the underlying justification for being upset, rather than the actual expression of sentiment.
Round 3
I will try to refine the criteria for “Upset” by adjusting its definition and redefining “Neutral” to include expressions of a problem.
What is the customers sentiment based on their message?
options:
"Happy" - Message expresses exuberant joy or immense appreciation.
"Upset" - Message directly and strongly expresses anger or extreme frustration.
"Neutral" - Message relays a problem but does not directly express negativity or does not qualify as Happy or Upset.
---Start of Message---
Subject: Order status I put my monthly subscription on pause 2 months ago. I have NOT TAKEN IT OFF PAUSE, as I still have alot of product. PLEASE cancel this. When I'm low on stock will do the automatic monthly schedule again. PLEASE confirm this correction by email.
---End of Message---
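Since each round changes only the option definitions, one way to iterate is to keep the definitions as data and generate the prompt from them. The sketch below does this for the Round 3 wording; the names `ROUND_3_DEFINITIONS` and `build_defined_prompt` and the line breaks in the output are assumptions for illustration.

```python
# Round 3 definitions, copied from the prompt above.
ROUND_3_DEFINITIONS = {
    "Happy": "Message expresses exuberant joy or immense appreciation.",
    "Upset": "Message directly and strongly expresses anger or extreme frustration.",
    "Neutral": (
        "Message relays a problem but does not directly express negativity "
        "or does not qualify as Happy or Upset."
    ),
}


def build_defined_prompt(message: str, definitions: dict) -> str:
    """Return a prompt that lists each option with its definition."""
    option_lines = "\n".join(
        f'"{label}" - {text}' for label, text in definitions.items()
    )
    return (
        "What is the customers sentiment based on their message?\n"
        "options:\n"
        f"{option_lines}\n"
        "---Start of Message---\n"
        f"{message.strip()}\n"
        "---End of Message---"
    )


prompt = build_defined_prompt(
    "Subject: Your order Shipped! Placed my order 8 days ago and still don't have it. Any updates?",
    ROUND_3_DEFINITIONS,
)
print(prompt)
```

With this layout, trying a new definition of “Upset” means editing one dictionary entry and rerunning the same test messages.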
Response
Messages
Frustrated subscriber - "Upset" 👍
Exchange please - "Neutral" 👍
Stuck order - "Neutral" 👍
Where is my order - "Neutral" 👍
Great order - "Happy" 👍
Answer me - "Upset" 👍
Results
👍Agent must determine a customer’s sentiment from a list of available options, based on their messages.
👍Agent must respond in a pre-defined format.
The results meet the originally expected criteria. The AI also now aligns with my original expectation for message 4. So much for sticking to its point… 😞
Validation
I again tested the prompt with a series of actual customer messages. Below are the results, compared against the previous validation examples.
1. Re: Thank you for your order!
Hi! I edited the order earlier this week to blue, red, green yellow, and purple. The website showed it as "saved" at the time, and I'm surprised that it has now reverted to the old set. Can you please change that for the current delivery?
Result 👍
"Neutral"
2. Request from Bob
My order has been stuck in transit
Result 👍
"Neutral"
3. Re: Your Order Has Shipped!
Hello Team Order: 154687 The tracking mentions that the product has been delivered. Unfortunately I did not receive any package. I have waited for a few days as well however no luck. I would like to request a refund. I shall order again soon :) Thank you
Result 👍
"Neutral"
Analysis
The prompt is performing as expected, differentiating between expressions of problems and inferred anger.
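A comparison like this can be kept as a small regression check, so that a prompt change which fixes one message is not quietly breaking another. The sketch below uses the three validation examples shown in Rounds 2 and 3, with the expected label read from the 👍/👎 marks; the function name `compare_rounds` and the tuple layout are assumptions.

```python
# (subject, expected, Round 2 label, Round 3 label) from the Validation sections.
VALIDATION = [
    ("Re: Thank you for your order!", "Neutral", "Neutral", "Neutral"),
    ("Request from Bob", "Neutral", "Upset", "Neutral"),
    ("Re: Your Order Has Shipped!", "Neutral", "Upset", "Neutral"),
]


def compare_rounds(rows):
    """Return (subjects fixed by the new prompt, subjects it regressed)."""
    fixed, regressed = [], []
    for subject, expected, before, after in rows:
        if before != expected and after == expected:
            fixed.append(subject)
        elif before == expected and after != expected:
            regressed.append(subject)
    return fixed, regressed


fixed, regressed = compare_rounds(VALIDATION)
print("fixed:", fixed)
print("regressed:", regressed)
```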
Conclusion
Our AI agent is now ready to attempt sentiment inference, a challenging task. A key insight was refining the AI's empathy so that it distinguishes between a customer directly expressing that they are upset and a customer whose circumstances would merely justify being upset.
AI Powered Operations with MelodyArc CX
MelodyArc CX is the world's first all-in-one customer operations platform, reducing operating costs by >50%. It's a single end-to-end solution that handles daily customer operations by providing both agents and AI together in one powerful platform.
MelodyArc CX layers on top of your existing CRM and other customer service channels to manage all the moving parts. From responding to and resolving customer requests, to implementing and cascading policies, to forecasting and analytics, MelodyArc CX takes on daily operations. It's more cost effective than agents alone and ensures higher resolution rates than automation.
Pay-as-you-go pricing with no minimum commitment means MelodyArc flexes with your business.
Book a demo today to learn more.