
How do you make sense of thousands of open-ended survey responses?

Thematic analysis is how you turn a thousand open-ended comments into themes you can trust. Doing it the same careful way from the first comment to the thousandth is where it gets hard, and where most analyses quietly come unstuck.


Written by Stella Dooris

Created 3 July 2026 · Last updated 10 July 2026

Every big survey ends the same way. The responses close, the export lands, and you're looking at a column of free text thousands of rows deep, every line a customer who took the time to tell you something. Somewhere in that pile is the story your board needs. The job is to get it out without spending the next month reading.

How do you make sense of thousands of open-ended survey responses?

You group them into themes with thematic analysis: read each comment, tag what it's about, then gather those tags into a handful of named, defined themes. It's a well-worn method (Braun and Clarke wrote it up in 2006) and it hasn't changed much since. What a thousand-plus responses change is how hard it becomes to apply that method evenly, comment after comment after comment, right to the end.

Wordnerds turns what customers say into what organisations do, and most of the analysts we work with could code feedback in their sleep. They're not stuck on method. They're stuck on volume: a thousand comments is a fortnight of careful reading, ten thousand is most of a quarter, and the survey lands again next month. Nobody's really asking "which technique?" They're asking "how do I do this properly when there's this much of it?"

What actually breaks when the responses pile up?

In the analyses we run, the problem is rarely running out of time. It's staying consistent: judging the thousandth comment by the same rules you used on the first. Thematic analysis is only as good as the coding frame behind it, and a frame applied by a tired human at comment 900 rarely matches the one they used at comment 20. Researchers even have a name for it: coder drift. It's tracked with a reliability score called Cohen's kappa, which measures how often two coding passes agree once chance agreement is taken out. When that score slips, it's telling you the definitions have moved without anyone deciding they should.
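To make the reliability score concrete, here is a minimal sketch of Cohen's kappa in plain Python. It assumes each comment carries exactly one theme label and that you have two passes over the same comments (two coders, or one coder early and late in a project). The function name and the sample labels are invented for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two equal-length sequences of theme labels."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(codes_a)
    # Observed agreement: share of comments given the same label in both passes.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: for each label, multiply how often each pass used it.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels: the same ten comments coded early and late in a project.
early = ["repairs", "repairs", "comms", "comms", "repairs",
         "comms", "repairs", "comms", "repairs", "comms"]
late = ["repairs", "repairs", "comms", "repairs", "repairs",
        "comms", "comms", "comms", "repairs", "comms"]

kappa = cohens_kappa(early, late)
print(round(kappa, 2))  # 0.6: the passes agree on 8 of 10, but half of that is chance
```

Raw agreement here is 80%, which sounds comfortable. Kappa is lower because two evenly used labels would match half the time by luck alone, which is why the chance-corrected figure is the one worth watching.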

And the real stakes here have nothing to do with deadlines. We've sat in that room when the analysis lands in front of the leadership team. First question: "so what?" Then the one that empties the room: "how do you know?" If your honest answer is "trust me, I read them all," it won't matter how many late nights went into it. Reading every comment feels rigorous, but past a certain volume nobody, you included, can retrace exactly how a given theme was decided.

Can't you just paste them into ChatGPT?

You can, and plenty of teams do. The catch is reliability: ask a general-purpose LLM to code the same text twice and you'll often get two different answers. One study (Tai and colleagues, 2024) found the output only settled down after around 40 controlled repetitions of the same task. Paste your thousand responses in once and you get themes that look convincing but can't be reproduced, audited, or explained to a sceptical board.
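If you want to see that spread for yourself, one simple check is to run the same prompt several times and count how often each comment keeps its label. The sketch below assumes you have already collected the runs as lists of labels in the same comment order. The function name and the sample runs are invented, and it is not the procedure used in the study cited above.

```python
from collections import Counter


def run_stability(runs):
    """Per-comment agreement across repeated coding runs.

    `runs` is a list of runs; each run is a list of labels, one per comment,
    in the same comment order.
    """
    if not runs or any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("every run must label the same comments")
    report = []
    for labels in zip(*runs):  # every label one comment received
        label, count = Counter(labels).most_common(1)[0]
        report.append({"modal_label": label, "share": count / len(labels)})
    return report


# Hypothetical output of three runs of one prompt over four comments.
runs = [
    ["repairs", "comms", "billing", "comms"],
    ["repairs", "comms", "repairs", "comms"],
    ["repairs", "billing", "billing", "comms"],
]
report = run_stability(runs)

# Comments whose label changed between runs: the ones to read by hand.
unstable = [i for i, row in enumerate(report) if row["share"] < 1.0]
print(unstable)  # [1, 2]
```

A comment that changes label between runs is usually one your definitions don't yet pin down, so the unstable list doubles as a to-do list for tightening the frame.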

We've all done the 11pm paste-and-pray, and there's no shame in it. But a one-shot prompt quietly throws away the one thing even a spreadsheet gave you: a trail you can follow back. For idle curiosity, fine. For a regulated report, or anything a colleague might challenge, it's a bad swap. The version of AI worth having here works inside a frame you've set: it shows its working, automates the grind, and keeps the trail intact from start to finish.

Can you still do it in Excel or Google Sheets?

Yes, and at low volume it's the right tool. For a few hundred responses coded by one person in one sitting, a spreadsheet is honest, cheap and easy to check over, and we'd point you straight at one.

The moment to move on is a practical one: a second coder joins and, entirely reasonably, reads a theme a little differently from you, or the volume simply outgrows one person. That's the consistency problem from earlier turning up in a spreadsheet, and no clever formula touches it, because what's drifting is human judgement, not the software.

The three routes people take, and where each one gives out:

Approach	By hand / Excel	One ChatGPT pass	Frame-first + transparent AI
Handles the volume	A few hundred	Any, in seconds	Any, at scale
Stays consistent	Drifts as you tire	Different answer each run	Same frame, first comment to last
Reproducible & auditable	Partly	No	Yes
Defensible to a board or regulator	Hard	No	Yes
Best for	Small one-off analyses	Idle curiosity	Regulated or recurring reporting

What does a workflow that survives the volume look like?

The workflow we run on every dataset that lands with us puts the frame first. Code on the fly and your frame quietly forms and re-forms as you tire; decide it up front and you can hold the same rules across a thousand responses or a hundred thousand. Four steps:

1. Build the coding frame first. Draft your themes and their definitions before you touch a single comment, in your own words, and fold in the vocabulary you already answer to (Awaab's Law, complaint categories, journey stages). This is your quality-control layer, and it's where the real thinking lives.
2. Pressure-test it on a sample. Code a couple of hundred responses against the frame and hunt for the places it goes fuzzy. Tighten the definitions until two people would tag the same comment the same way.
3. Scale the frame, don't reinvent it. Apply the settled frame to the whole dataset. This is where transparent AI earns its keep: a structured pipeline (raw comments → structured codes → a semantic model) applies your definitions to comment 100,000 exactly as it did to comment one, with none of the 3pm drift.
4. Keep the trail. Every theme should lead straight back to the comments underneath it and the definition that caught them. That's what turns "trust me" into "here, look."
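As a rough illustration of how the frame and the trail fit together, the sketch below stores a frame as data, applies it to every comment the same way, and records which definition caught which comment. Everything in it is hypothetical: the theme names, definitions, and comments are invented, and the keyword match is a deliberately crude stand-in for whatever model does the real coding. It is not how our pipeline works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    name: str
    definition: str
    keywords: tuple  # stand-in matching rule, purely for illustration


# Hypothetical frame, written down before any comment is coded.
FRAME = (
    Theme("Repairs", "Comment concerns the speed or quality of a repair.",
          ("repair", "leak", "boiler")),
    Theme("Communication", "Comment concerns being kept informed.",
          ("call", "email", "update")),
)


def code_comment(comment_id, text, frame=FRAME):
    """Return one trail record per theme whose rule matches the comment."""
    lowered = text.lower()
    records = []
    for theme in frame:
        # Substring matching is crude (it would also hit "recall"); a real
        # pipeline would swap in a proper classifier at this step.
        hits = [kw for kw in theme.keywords if kw in lowered]
        if hits:
            records.append({
                "comment_id": comment_id,
                "theme": theme.name,
                "definition": theme.definition,
                "evidence": hits,
            })
    return records


def trail_for(theme_name, records):
    """List the ids of the comments sitting underneath a theme."""
    return [r["comment_id"] for r in records if r["theme"] == theme_name]


# Hypothetical comments keyed by id.
comments = {
    1: "The boiler repair took three weeks.",
    2: "Nobody returned my call about the leak.",
    3: "Lovely staff, thank you.",
}
records = [rec for cid, text in comments.items() for rec in code_comment(cid, text)]

print(trail_for("Repairs", records))        # [1, 2]
print(trail_for("Communication", records))  # [2]
```

The point of the shape is that each record carries its own definition and evidence. Swap the matching rule for something smarter and the trail still answers "how do you know?" without anyone rereading the dataset. Comment 3 matches nothing, which is worth surfacing too: uncoded comments are where a frame shows its gaps.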

How do teams in regulated sectors prove their feedback findings to a regulator?

They lead with the trail, because the regulator asks to see it. Whether it's a housing association reporting its Tenant Satisfaction Measures (TSMs), a water company answering to Ofwat, or a bank under the FCA's Consumer Duty, the bar is the same: you can't simply report a satisfaction score, you have to show the customer voice underneath the number, and show it traceably. A coding frame anyone can open and inspect is the line between evidence and assertion.

This is the sharp end of the volume problem, and where we spend most of our days. For our Wordnerds × Housemark Social Housing Benchmarking Report 2026, we read 135,000-plus tenant comments across 18 housing associations against a single frame, the sort of scale where a spreadsheet hasn't a hope of holding its definitions still. We'd say transparent AI is the answer, wouldn't we? But the reason it actually holds up in a regulated setting is that the frame is built with your analysts and stays out in the open, so what comes out is something a board or a regulator can inspect, not a black box. And the payoff isn't only a clean regulatory return: once your themes are consistent enough to trust, you can act on the right things and watch the numbers move. One association we work with, bpha, cut repeat complaints by 32% once it could act on what its feedback was really telling it.

So what should you actually do first?

Write the frame before you write a single code. It's the first thing we set up on any project we take on. Whatever your volume or your tool, the one move that improves a qualitative analysis most is deciding your themes and their edges up front, then holding to them as you go.

Do that, and the next time someone asks how you landed on those themes, you can open a frame anyone can read rather than defend a spreadsheet only you understand. Three responses or three thousand, the method doesn't change. The only thing that changes at volume is whether you can hold it steady, and now you can. And once you can trust your themes, the next question is what you actually do with them.

Frequently asked questions

What's the best method for analysing open-ended survey responses?

Thematic analysis is the standard: code each response, then group the codes into named, defined themes (Braun & Clarke, 2006). Coding can be inductive (themes emerge from the data) or deductive (you start from a predefined frame). Most real analysis blends the two, and applying it consistently matters more than the method itself.

How many open-ended responses can you analyse by hand?

Realistically a few hundred, coded by one person in one focused sitting. Beyond that, consistency (not time) becomes the limit: fatigue and definition-drift mean later responses get judged by a subtly different frame than earlier ones, which undermines how defensible the findings are.

Can ChatGPT analyse open-ended survey responses?

It can code text, but a single pass isn't reproducible: the same prompt gives different results on different runs, and output only stabilises across many controlled repetitions (Tai et al., 2024). For anything you'll need to defend, use AI inside a transparent, auditable frame rather than as a one-shot black box.

What is a coding frame?

A coding frame is the set of themes and their definitions you apply to every response: the rulebook that decides what counts as what. Building it before you code (definition-led), rather than letting it form as you go, is what keeps analysis consistent and lets a second person reach the same result.

How do you make qualitative analysis defensible to leadership or a regulator?

Keep an audit trail. Every theme should trace back to the comments behind it and the definition that captured them, so the finding can be inspected rather than taken on trust. For UK regulated sectors like housing, this traceability is what the TSMs and the Consumer Standards effectively require.

Do you need specialist software to analyse open-ended responses?

Not at low volume: a spreadsheet is fine for a few hundred responses coded by one person. You need a dedicated approach once volume outlasts one person's consistency or a second coder joins, because the challenge then is applying one frame at scale, transparently, which is what a Voice of Customer platform integrated into Power BI is built for.

