
How to use AI in innovation evaluation – without losing the human craft
7th January 2026

AI is moving fast. In evaluation, that’s exciting – because so much of our work involves making sense of complex information, spotting patterns and translating evidence into practical decisions. But it’s also a moment where it’s easy to over-promise.

At Future Care Capital, we’re enthusiastic about how AI can support evaluation. We’re also clear-eyed about what it currently can’t do well (and what it shouldn’t do at all).

What AI can do well (when it’s used properly)

Used thoughtfully, AI can reduce friction and free up evaluator time for the parts of the job that genuinely require human judgement, relationship-building and contextual understanding.

1. Speeding up ‘first draft’ work

Evaluation starts with a lot of drafting: candidate theories of change, topic guides, early outcome frameworks, indicator menus, plain-English summaries and options for survey items. AI can help generate a starting point – a draft we then refine with stakeholders and align with context, feasibility and ethics.

2. Supporting sensemaking across large volumes of text

AI can help with early-stage organisation of qualitative material: surfacing candidate themes, grouping similar responses, or identifying recurring concepts that a human analyst then tests, refines and challenges. This can be particularly helpful in developmental evaluations where learning cycles need to be rapid.

3. Making outputs more usable

AI can support the production of tailored outputs for different audiences (commissioners, frontline teams, boards, communities), while keeping the underlying evidence base consistent. This might include creating multiple versions of findings (e.g., a two-page brief, a slide summary, a narrative case study) or improving accessibility through clearer language and structure.

4. Improving the ‘operations’ of evaluation

Some of the most reliable wins are practical: turning meeting notes into action logs, structuring coding frameworks that can be iterated, drafting template materials, and supporting project management. These create headroom for deeper engagement and better interpretation of findings.

What AI shouldn’t do (and why that matters)

The risk with AI in evaluation is not that it’s useless. The risk is that it can look convincing while being wrong, biased or inappropriate – especially if outputs are treated as ‘answers’ rather than drafts.

1. AI can hallucinate

AI tools can confidently produce statements, references, themes or ‘insights’ that do not exist in the data. In evaluation – where credibility is everything – this is a serious issue. Our stance is simple: AI output is never treated as evidence. Evidence comes from agreed data sources, transparent methods and human verification.

2. AI can reproduce bias and flatten context

Evaluation is rarely just technical; it is social and contextual. AI can amplify dominant narratives, underplay marginalised experiences and smooth out nuance. This is where human evaluators are needed: noticing what isn’t being said, testing assumptions and interpreting findings in context.

3. AI struggles with causality and mechanism

Evaluation often asks ‘what changed, for whom, in what contexts, and why?’. AI can help organise information related to those questions, but it cannot reliably infer causal mechanisms. That’s why theory-driven approaches, triangulation and stakeholder sensemaking remain central.

4. Data governance and confidentiality aren’t optional

Many evaluations involve sensitive information: patient or service-user narratives, safeguarding concerns, commercially confidential material or internal organisational dynamics. Dropping raw data into generic AI tools is often unacceptable. Sometimes the right answer is simply: we don’t use AI on that dataset.

5. Over-reliance can damage capability

If AI becomes a crutch, teams can lose confidence in their own analytic reasoning and writing. Good evaluation is a craft. We want AI to augment evaluator capability – not erode it.

How FCC uses AI

If you approach us for evaluation support, here’s what you should expect:

• We use AI to augment – never to replace – robust evaluation methods.

• We are explicit about where AI is used and what checks are applied.

• We keep the ‘human work’ front and centre: trust-building, facilitation, interpretation, ethics and accountability.

The highest-value parts of evaluation are often the least automatable: building trust with stakeholders and communities, eliciting insight through skilled interviewing and facilitation, navigating implementation realities, and translating findings into decisions – not just documents.

We’re optimistic, but not starry-eyed. If you’re considering an evaluation and you’re curious about where AI can genuinely add value (and where it shouldn’t be used), we’re happy to talk it through and build an approach that is efficient, proportionate and trustworthy. Get in touch with Andy Jones at andy@futurecarecapital.org.uk.