AI
February 19, 2025

Can I use Copilot or ChatGPT to analyse my Voice of Customer feedback?

What's better at analysing customer feedback? An LLM or a Specialist Tool? We analyse the same data using both tools and show the outcomes...

Your customers are telling you exactly what they need from you. Within thousands of pieces of customer feedback is the blueprint for improving your services. But finding those meaningful insights can feel like searching for a needle in a haystack.

When CX teams ask us “Can I use Copilot or ChatGPT to analyse my Voice of Customer feedback?”, they’re really asking a deeper question: “What’s the best way to turn customer feedback into action?”

The one thing you need to be doing with your customer feedback data is analysing it effectively, so you can confidently prioritise the actions to focus on. The good news is that there are plenty of tools to assist you with analysing your Voice of Customer feedback, specifically AI tools.

So the real question is not ‘Where do I start?’ but ‘Which tool is the best at helping me analyse my VoC feedback?’ 

Why are AI tools the best for customer feedback analysis? 

Without AI assistance, your team faces four critical challenges:

  • Scale: As feedback volumes grow, manual analysis becomes impossible. Trying to read and categorise thousands of comments means urgent issues will slip through the cracks.
  • Accuracy: Human bias creeps in. Different team members tag the same feedback differently, creating inconsistent data that leads to misguided decision making. 
  • Speed: Days or weeks to understand the scope of an issue means you're always playing catch-up with customer needs.
  • Focus: Manual tagging drains mental energy. Your talented team members waste hours on repetitive tasks instead of actually making a difference to customers' experiences.

Understanding AI customer feedback analysis options: 

When we think of AI, we automatically think of tools like ChatGPT, which set the record for the fastest user growth when it launched, reaching 100 million users in just two months, and has since grown to 250 million active users. These tools often seem like the obvious choice, but there is a whole suite of other tools available for AI feedback analysis:

  • LLMs: Copilot, ChatGPT, Claude, etc.
  • Afterthought analytics: Analytics tools built into your survey provider, e.g. Qualtrics’ XM Discover and Text iQ
  • DIY solutions: Software built by your own internal team for your business
  • Specialist tools: Tools specifically designed for unstructured text analysis and customer feedback

So back to the question: “Can I put our customer verbatim through Copilot/ChatGPT and see what it does?" You can definitely do that. In fact, we do exactly that in this blog for you. But is it the best option for truly understanding your customer feedback? Definitely not. OpenAI CEO Sam Altman has even said that ChatGPT is incredibly limited and that it’s a mistake to rely on it for anything important. And if there’s one thing that’s important, it’s your customer data.

The AI Showdown: LLMs vs Specialist Tools

We’ll show you the exact outputs you get from an LLM vs a specialist tool when analysing the same data set for key themes, sentiment and priorities to action. For the purpose of this experiment, we created a synthetic set of housing data and gave the same data set to both Copilot and Wordnerds. Lo and behold, the results...

Copilot Data Analysis

A brief breakdown: Confidently oblivious to your customers' needs. ‘Fake it till you make it’ is most definitely at play here.

A detailed breakdown: At first glance, Copilot appears to handle customer feedback well, quickly identifying key themes. But as we dig deeper, it becomes clear that its confidence doesn’t always match reality.

When we send our first prompt, we get an instant overview of the key themes it has identified within the data: repairs and maintenance, security, customer service, accessibility, and community environment. Amazing!

As a CX professional, what you really need is the numbers behind these themes, so you can understand what people are talking about the most and where to begin investigating. Copilot initially gives us percentages for a few of these themes, as well as whether comments on each are increasing or decreasing over time, a brief sentiment analysis and some recommendations on how to improve CX.

We ask Copilot again for all the figures, as it missed some out on the first try. It then gives us percentages for each theme, with a very top-level summary of trends increasing or decreasing over time.

With Copilot telling us that 20% of comments mention accessibility, we ask it to list the rows in the spreadsheet where accessibility is mentioned. It gives a full list of rows, so we check the verbatim in row 9. Row 9 reads “I’m impressed with the energy efficiency of my flat. The new windows and insulation have made a big difference to my heating bills.” This isn’t verbatim related to accessibility. We point this out to Copilot and ask it to check. It rechecks the data set and lists a corrected set of rows for us.

We get a new set of rows where accessibility is supposedly mentioned. We check line 33, which reads “I’m impressed with the energy-efficient features in my flat. The new windows and insulation have made a noticeable difference in reducing drafts and keeping the heat in. This has helped to lower my energy bills and make my home more comfortable.” We ask Copilot to double-check this, and it recognises that the comment doesn’t mention accessibility. Copilot rechecks the data set again. It gives us the same data back.

We then notice that the figures might not be adding up. Copilot told us that 20% of the data mentions accessibility; with 12,288 lines of data, that would mean around 2,458 lines of verbatim talking about accessibility. We ask it to double-check that this is correct.

Copilot rechecks the data set and finds that, rather than roughly 2,458 lines of verbatim, there are only 320, which actually equates to 2.6%.
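
If you want to run that kind of sanity check yourself, a few lines of code will do it. This is a minimal sketch, assuming the feedback export is a CSV with a "comment" column; the file name, column name and keyword list are all hypothetical, and a crude keyword match is not how either tool works internally:

```python
# Minimal sanity check of a claimed theme percentage against the raw rows.
# File name, column name and keyword list are hypothetical assumptions.
import pandas as pd

df = pd.read_csv("housing_feedback.csv")   # ~12,288 rows in our synthetic set
total_rows = len(df)

# Crude keyword match for accessibility-related verbatim.
keywords = ["accessib", "wheelchair", "mobility", "ramp", "stairlift"]
mask = df["comment"].str.contains("|".join(keywords), case=False, na=False)

matched = int(mask.sum())
print(f"{matched} of {total_rows} rows ({matched / total_rows:.1%}) mention accessibility")
# A claimed 20% would imply ~2,458 rows; a result near 320 (2.6%) tells a
# very different story about how much of the data is really about accessibility.
```

Even a rough check like this is enough to show when a confidently stated percentage is an order of magnitude off.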

One of the biggest challenges with LLMs like Copilot is that they don’t just make mistakes: they make them confidently. This can be dangerous in a CX setting, where acting on incorrect insights can lead to wasted resources or make customer issues even worse.

It’s not an isolated case. Research suggests that, despite advancements, LLMs have become less reliable over time, with larger models more likely to produce inaccurate answers while projecting confidence, even on straightforward problems. Instead of acknowledging uncertainty, they provide incorrect answers with conviction, making it harder for users to detect errors.

A word from Steve Erdal, our Chief Scientific Officer

AI is, for better or worse, a totally different type of computing process. This means it's completely amazing at some things we hadn't imagined a computer could do before, but it's also genuinely terrible at things we're used to computers doing well.

Computers are usually great with facts and numbers. This was our first use for them, and the most common computing tasks (calculations, file sharing, data storage and analysis) all have this in common. There's a specific answer to a question, and the computer gets it for you.

But Generative AI isn't trying to give you the right answer. It's trying to approximate the most likely answer, based on the prompt you've given it. In many cases, an approximate answer is great. But if you need a specific answer, and especially if the numbers have to be right, Gen AI is not built to help you.

The big problem, though, is that Gen AI won't tell you that.

When I was trying to get the data insight from Copilot, the scariest bit was its confidence. Sure, there's a catch-all "this could be wrong" disclaimer at the bottom, but when the figures were wrong (in some cases, an order of magnitude wrong), it's presented with the same authority as when it's bang on. I feel a bit weird banging on about numbers as a dyed-in-the-wool Wordnerd. But when you're talking about how to make your customers' lives better, and how their experience is changing over time, getting the numbers right is pivotal. And GenAI just isn't built to do it. - Steve Erdal, Chief Scientific Officer @ Wordnerds

So if LLMs struggle with accuracy, what’s the alternative? This is where specialist tools make a difference. Rather than guessing, specialist tools systematically categorise feedback into precise, quantifiable themes—allowing you to identify issues with confidence. Let’s take a look… 

Wordnerds Data Analysis

A brief breakdown: Want an all-seeing view of your data set? Here it is…

A detailed breakdown: When we upload the data into the Wordnerds platform, we can view it directly in a business intelligence tool, in this case Power BI. The data is automatically analysed and categorised against a specific TSM (Tenant Satisfaction Measures) framework, and each of the key areas - maintenance and repairs, communication and engagement, safety, complaint handling, estate and neighbourhood, and financial - is listed by the volume of comments on that topic in a given time period. Already we have a clearer indication of the scale of the issues in our data.

You can click into any of these topics and understand what people are talking about the most and what’s driving positive or negative sentiment in that particular topic. In this example we look at safety - a big priority for residents and an area that is seeing a big rise in comments. 

We can see that damp and mould, and security, are huge drivers of this topic. When we click into security, we can see the key themes that make up this topic, and that 87.8% of the comments are about doors. This is a huge issue compared to the other themes.

In the Wordnerds platform you can go into your dataset and use the filters to select security and doors to dive into all your tenants’ verbatim. Here we can see that “door isn’t closing” is coming up frequently, and if we click on that we can see all the customer verbatim around that particular issue…

“Living with severe anxiety has made dealing with these property issues unbearable … The bedroom door won't close properly either, which makes me feel unsafe. I've spent countless hours trying to get this sorted - it's emotionally draining.”

By doing month-on-month theme analysis, you can track significant increases and decreases, identifying potential issues and emerging trends in your data sets.
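
For anyone curious what that looks like under the hood, here is a minimal sketch of month-on-month theme tracking, assuming the feedback has already been classified and exported with hypothetical "date" and "theme" columns. A specialist platform does this for you; the sketch just shows the idea:

```python
# Minimal month-on-month theme tracking over already-classified feedback.
# File and column names ("date", "theme") are hypothetical assumptions.
import pandas as pd

df = pd.read_csv("classified_feedback.csv", parse_dates=["date"])

# Count comments per theme per month.
monthly = (
    df.groupby([df["date"].dt.to_period("M"), "theme"])
      .size()
      .unstack(fill_value=0)
)

change = monthly.diff()            # absolute month-on-month change per theme
pct_change = monthly.pct_change()  # relative change, useful for spotting surges

# Which themes rose or fell most in the latest month of data?
print(change.iloc[-1].sort_values(ascending=False))
```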

Key drivers of sentiment within the data set

Notably, mentions of security showed the fifth-largest increase in the most recent month of data. A surge in security mentions is also evident in July, and this significant rise warrants further investigation to pinpoint the cause.

Volume over time of key themes

While the increase in this dataset likely correlates with a general rise in July comments, a real-world analysis would involve deeper exploration to uncover any underlying factors.

In the Copilot data analysis we discussed accessibility. Disabilities and accessibility is a smaller theme within this data set, with only 376 mentions. However, it is a negative and sensitive issue, with a sentiment score of 39. A concerning 21% of these comments report that accessibility problems cause or worsen health issues. Key problem areas include damp and mould (46%), kitchens (34%), and doors (20%). These areas represent significant challenges for tenants who already struggle with daily tasks…

“My health conditions make it difficult to manage with this faulty front door. I can barely push it open as it's so stiff, and my arthritis is getting worse trying to handle it. I've been waiting for nearly two months for someone to fix this. At 76, I shouldn't have to struggle like this with basic home maintenance.”

While Copilot provided surface-level insights, specialist tools like Wordnerds integrate directly into Power BI, enabling you and your teams to drill down into the data, see exact verbatim comments, and track trends over time. This transforms feedback from a vague summary into a powerful decision-making tool.

What can we learn from this AI feedback analysis comparison?

Summarisation vs classification of customer feedback data:

There is a clear difference between the two approaches: LLMs merely summarise the data, while specialist tools have the power to classify it correctly and quantifiably.

- Summarisation (LLMs): Here's roughly what people are saying.

- Classification (Specialist Tools): Here's exactly how many people have each issue, how sentiment is trending, and the verbatim evidence to support the numbers.
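
To make that distinction concrete, here is a small illustrative sketch. The classify() function and the comments are made up purely for the example (a real specialist tool uses trained language models, not keyword rules); the point is that classification produces exact counts with the verbatim behind each number, whereas a summary gives you neither:

```python
# Illustrative sketch only: classify() and the comments below are made up.
from collections import defaultdict

def classify(comment: str) -> list[str]:
    themes = []
    text = comment.lower()
    if "door" in text:
        themes.append("security: doors")
    if "damp" in text or "mould" in text:
        themes.append("safety: damp and mould")
    return themes or ["other"]

comments = [
    "The front door won't close properly and I feel unsafe.",
    "Damp and mould in the bathroom is getting worse.",
    "Door lock is broken again.",
]

evidence = defaultdict(list)
for c in comments:
    for theme in classify(c):
        evidence[theme].append(c)   # keep the verbatim behind every count

for theme, quotes in evidence.items():
    share = len(quotes) / len(comments)
    print(f"{theme}: {len(quotes)} comments ({share:.0%})")
    # Each figure is traceable back to the exact comments in `quotes`,
    # which is what lets a CX team evidence the numbers they report.
```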

How to confidently prioritise customer insights: 

Your customers’ feedback isn’t subjective. It’s data that gives you a clear snapshot of exactly what your customers are thinking and feeling at any point in time, and because it can be measured, your analysis of it can be right or wrong. If your analysis is wrong, that can have a huge impact on your team and, ultimately, your customers.

We saw above how confidently LLMs can give us answers that aren’t correct, and even Copilot’s first response carries a warning label that ‘AI-generated content may be incorrect’.

We think Benedict Evans summed it up perfectly in his blog ‘Are better models better?’.

“There is a broad class of tasks that we would like to be able to automate, that’s boring and time consuming and can’t be done by traditional software, where the quality of the result is not a percentage, but a binary. But for some tasks, the answer is not better or worse: it's right or not right. If I need something that does have answers that can be definitely wrong in important ways, and where I’m not an expert in the subject, or don’t have all the underlying data memorised and would have to repeat all the work myself to check it, then today, I can’t use an LLM for that at all.” - Benedict Evans, “Are better models better?” @ ben-evans.com


When we’re dealing with something as important as customer feedback, it’s therefore crucial that CX teams use tools that let them understand and explore why specific numbers are the way they are, with evidence to support them.

That enables CX professionals to confidently communicate their findings, build the case for change and really have an impact on the customers who need it most.

Democratising your data across the business is important:

One of the biggest challenges for CX teams isn’t just analysing customer feedback—it’s making sure those insights reach the right people across the business. Too often, customer insights get siloed, leaving other teams in the dark about the real issues customers are facing.

Having customer feedback data readily available in business intelligence platforms like Power BI means everyone in the business can connect and visualise data from multiple sources. Insights become accessible to decision-makers in every department.

When customer insights are democratised in this way, it means you can anticipate customer needs, align teams around data-driven priorities, and drive meaningful changes in real time.

The magic of AI in customer feedback analysis really happens when the right people have access to the insights they need to take action.

Human-AI collaboration is the way forward: 

CX is first and foremost a human field of work, with humans working to make other humans happier. With this in mind, AI will never replace the work humans do in CX, but used in the right way it can be the best tool you have, helping you find the answers you need so you can prioritise action rather than relying on gut instinct.

Where does that leave CX teams with feedback analysis?

Ultimately, the difference between LLMs and specialist tools comes down to reliability. While tools like Copilot can provide quick summaries, they lack accuracy and depth. Specialist tools ensure that customer feedback is analysed correctly, giving CX teams the confidence to act on the data. When customer insights drive real business decisions, that confidence is invaluable.

If you and your CX team are relying on generic AI tools for feedback analysis, it might be time to rethink your approach. Having the right tools and tech stack means you can go from guessing what your customers need to actually taking confident steps to improve your customer experience. 
