Differential Diagnosis · AI · Clinical Reasoning

AI Differential Diagnosis in Physiotherapy: How It Actually Works

A clear explanation of how AI differential diagnosis works for physiotherapists — what the model is actually doing, where it helps, where it fails, and how to use it without losing clinical rigour.

20 April 2026 · 5 min read · By The Oris Team

Differential diagnosis is one of the hardest things physiotherapists do. It's also one of the most cognitively expensive — which is why, on a tired Thursday afternoon, even experienced clinicians tend to anchor on the first plausible diagnosis and move on.

AI differential diagnosis tools promise to fix that. Some do. Some make it worse. This is a short, honest explanation of how they work and how to use them well.

What "AI differential diagnosis" actually is

At its core, an AI differential is a ranked list of likely diagnoses produced by a large language model (LLM) from your subjective and objective findings. For each candidate, a good tool will give you:

  • The diagnosis
  • A confidence or likelihood ranking
  • The findings that support it
  • The findings that argue against it
  • Any red flags to rule out

It's the same structure you'd use yourself on a whiteboard — just faster, and with fewer omissions.
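As a rough sketch of that structure, one differential entry could be modelled like this. The field names, the example findings, and the likelihood numbers are all made up for illustration; they are not the output format of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One entry in a differential. Field names are illustrative assumptions."""
    diagnosis: str
    likelihood: float  # relative ranking score, not a calibrated probability
    supporting: list[str] = field(default_factory=list)
    against: list[str] = field(default_factory=list)
    red_flags_to_rule_out: list[str] = field(default_factory=list)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return the candidates ordered from most to least likely."""
    return sorted(candidates, key=lambda c: c.likelihood, reverse=True)


# Invented example values, purely to show the shape of the data.
differential = rank([
    Candidate("Patellofemoral pain", 0.3,
              supporting=["anterior knee pain on stairs"]),
    Candidate("Meniscal tear", 0.5,
              supporting=["joint-line tenderness"],
              against=["no locking reported"]),
])

for position, c in enumerate(differential, start=1):
    print(position, c.diagnosis, "| for:", c.supporting, "| against:", c.against)
```

Keeping the supporting and opposing findings on every entry is what lets you interrogate the ranking rather than just read it.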

What the model is actually doing

Under the hood, a current-generation LLM (Gemini 2.5 Flash, Claude, GPT-4.x, and similar) reads your input as text and generates a structured, ranked output based on the clinical reasoning patterns it learned from its training data.

Three properties matter for a clinician:

  1. It's pattern-matching, not diagnosing. It's surfacing likely patterns — not running a physical exam.
  2. It doesn't suffer from decision fatigue. Given the same quality of input, the 4pm patient gets the same quality of reasoning as the 9am one.
  3. It has no stake in being right. It will happily list an uncomfortable differential you might skip over.

That last point is the one most clinicians underestimate.

Where AI differential genuinely helps

1. Surfacing differentials you'd rank lower

On a typical musculoskeletal case, a clinician will often form a working hypothesis within the first 60–90 seconds, and everything after that tends to be read as confirming it. A good AI differential will show you the second, third, and fourth plausible diagnoses you might have deprioritised, and that's often where missed diagnoses hide.

2. Red-flag screening without fatigue

Red flags are exactly the kind of task where "I've asked this 400 times this year" turns into "I didn't ask this time." AI tools don't get bored of the screen. A well-built tool will flag missing red-flag data before producing a differential at all.

3. Atypical presentations

The tricky cases (a young athlete with an "innocent" knee, an older patient with cervical symptoms that don't fit) are where an individual clinician's pattern memory is thinnest. An LLM's training data is likely to cover far more atypical presentations than any one clinician sees in a career, which makes it particularly useful as a second opinion here.

4. Early-career physiotherapists

Junior clinicians do not have the pattern library senior clinicians have. AI differential gives them a structured comparison to reason against — without asking a senior for every case.

Where AI differential fails

1. When the input is thin

Garbage in, garbage out. An AI differential built from vague subjective reporting and no objective findings is a guess dressed up as a list. The input quality ceiling is the output quality ceiling.

2. When the clinician defers entirely

The moment you stop interrogating the output and just take the top item, the tool has become worse than useless — it's made you slower and more anchored. The only safe way to use AI differential is as a prompt for your own reasoning.

3. When red flags are absent from input

An AI can only act on red-flag findings it's been told about. If you didn't ask about night pain, neurological symptoms, or systemic features, the tool can't know. Many tools now require red-flag data before they'll produce a differential. That is a safety feature, not friction.
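A minimal sketch of that kind of gate, assuming a hypothetical intake record with one key per red-flag question. The three question names are taken from the examples above; a real screen would be longer and clinically validated.

```python
# Hypothetical red-flag questions, named after the examples in the text.
REQUIRED_RED_FLAGS = ("night_pain", "neurological_symptoms", "systemic_features")


def missing_red_flags(answers: dict) -> list:
    """Return the red-flag questions that have no recorded answer.

    None (or a missing key) means "not asked", which is different from
    False ("asked, and the patient said no").
    """
    return [q for q in REQUIRED_RED_FLAGS if answers.get(q) is None]


def ready_for_differential(answers: dict) -> bool:
    """Only allow a differential once every red-flag question is answered."""
    return not missing_red_flags(answers)


intake = {"night_pain": False, "neurological_symptoms": None}
print(missing_red_flags(intake))       # ['neurological_symptoms', 'systemic_features']
print(ready_for_differential(intake))  # False
```

The important design choice is treating "not asked" and "asked, negative" as different states. Collapsing them into a single False is exactly how an unasked question ends up looking like a cleared one.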

4. When the clinical context is niche

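Very specialised populations (elite athletes, paediatrics, complex neuro) are likely to be thinner in the model's training data, so its pattern library is narrower. AI tools can still contribute here, but the specialist clinician's judgement should carry more weight.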

A good clinical workflow with AI differential

Here's the workflow we see work for most physiotherapists:

  1. Capture a full subjective. Don't short-cut this to "save time." The best output comes from the best input.
  2. Complete your objective assessment. Include what you ruled out — that information is as valuable to the AI as what you ruled in.
  3. Form your own working hypothesis first. Write it down (even just a word). Then run the AI.
  4. Compare the AI differential to yours. Where do they agree? Where do they disagree? The disagreements are where the learning happens.
  5. Take clinical responsibility for the final answer. Your name goes on the note. Your reasoning is what matters.
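Steps 3 and 4 can be sketched as a small comparison between the hypothesis you wrote down and the AI's ranked list. The function name and the exact-text matching are simplifying assumptions; real diagnosis labels would need proper normalisation or coding.

```python
def compare(own_hypothesis: str, ai_ranked: list[str]) -> str:
    """Describe how the clinician's hypothesis relates to the AI's ranking.

    Matching is by exact (case-insensitive) text, a deliberate simplification.
    """
    own = own_hypothesis.strip().lower()
    ranked = [d.strip().lower() for d in ai_ranked]
    if not ranked:
        return "no AI output"
    if own == ranked[0]:
        return "agree on top-ranked diagnosis"
    if own in ranked:
        position = ranked.index(own) + 1
        return f"AI ranks it #{position}; review what it placed higher"
    return "not in AI list; review both lines of reasoning"


# Invented example: the clinician's hypothesis appears second in the AI list.
result = compare(
    "Meniscal tear",
    ["Patellofemoral pain", "Meniscal tear", "ACL sprain"],
)
print(result)  # AI ranks it #2; review what it placed higher
```

Recording your own hypothesis before running the tool is what makes this comparison meaningful. Without it, there is nothing independent to compare against.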

A common failure mode to avoid

There's a subtle trap in how these tools are often used: clinicians treat the AI differential as a checklist to confirm, not a second opinion to reason against. Same output, completely different clinical value.

If you find yourself agreeing with the AI on every case, you've stopped reasoning. If you find yourself disagreeing on every case, you've stopped trusting. The right state is somewhere in between — most of the time you'll agree, and the times you don't will be the ones that teach you something.

What to look for in a tool

If you're evaluating AI differential tools specifically for physiotherapy:

  • Ranked, not single. Single-answer tools are dangerous.
  • Reasoning exposed. You should see why each diagnosis is ranked where it is.
  • Red-flag first. A tool that runs a red-flag screen before producing a differential is safer.
  • Physiotherapy-literate. A general medical AI is not a physiotherapy AI — the MSK reasoning patterns are different.
  • Editable. You need to be able to override, re-rank, or reject — and have that saved alongside the AI output.
  • Fast. If it takes 30 seconds per run, you'll stop using it by Thursday. Under 3 seconds is the working threshold.
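To make the "Editable" criterion concrete, here is one way an override could be stored next to the untouched AI output. The record layout and names are assumptions for illustration, not a description of how any specific product stores its data.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReviewedDifferential:
    """Hypothetical record: the AI output and the clinician's decision, side by side."""
    ai_ranking: tuple      # what the tool produced, kept unchanged
    final_ranking: tuple   # what the clinician signed off
    rejected: tuple        # AI suggestions the clinician removed
    reviewed_at: str       # UTC timestamp in ISO 8601 format


def review(ai_ranking, final_ranking) -> ReviewedDifferential:
    """Store the clinician's re-ranking without overwriting the AI output."""
    rejected = tuple(d for d in ai_ranking if d not in final_ranking)
    return ReviewedDifferential(
        ai_ranking=tuple(ai_ranking),
        final_ranking=tuple(final_ranking),
        rejected=rejected,
        reviewed_at=datetime.now(timezone.utc).isoformat(),
    )


# Placeholder labels; the clinician re-ranks A and B and rejects C.
record = review(["A", "B", "C"], ["B", "A"])
print(record.final_ranking, record.rejected)
```

Keeping both rankings means the note shows what the tool suggested and what you decided, which matters when your name is the one on it.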

If you want to see it in action

Oris produces a full ranked differential with reasoning, a red-flag screen, and a treatment plan in under 3 seconds. Free Starter plan, no credit card.

Run it on a case where you've already formed a hypothesis. See whether you agree. That's the most useful way to evaluate any AI differential tool.

