MI Fidelity and MITI Coding: How MI Practice Is Measured

MI fidelity refers to the degree to which a practitioner’s actual sessions adhere to MI’s defined skills and Spirit. It matters because the research evidence for MI’s effectiveness is conditional: high-fidelity MI shows strong outcomes; low-fidelity MI — sessions that use the label but not the method — performs little better than no intervention at all.

This guide explains MI fidelity, the MITI (Motivational Interviewing Treatment Integrity) coding system that’s the field’s standard measure, the thresholds that distinguish beginner from competent from proficient, and how to use coding feedback as a self-improvement tool rather than a credentialing hurdle.

Why Fidelity Matters

MI is one of the most-studied behaviour change methods of the past four decades, but the effect-size estimates vary widely across studies — and a substantial portion of that variation is fidelity-related. Studies that include independent fidelity ratings consistently show that sessions coded as high-fidelity produce better client outcomes than sessions coded as low-fidelity, even when both were delivered by clinicians who self-reported using MI (Magill et al., 2018, JCCP; Forsberg et al., 2010, Cognitive Behaviour Therapy).

The honest version: clinicians overestimate their own MI fidelity. Self-rated MI use correlates poorly with externally coded MI fidelity. This is the central reason MI training programmes invest so heavily in audio review and structured coding — the practitioner’s internal sense of “I did MI today” is not reliable on its own.

MITI 4.2.1 — The Standard System

MITI (the Motivational Interviewing Treatment Integrity coding instrument), currently at version 4.2.1 (Moyers et al., 2016), is the most widely used MI fidelity measure. A coder rates a 20-minute audio segment on two components, global scores and behaviour counts:

Global scores (1–5 scale)

Four global ratings capture the overall character of the session:

Cultivating Change Talk — extent to which the clinician evoked and reinforced change talk.
Softening Sustain Talk — extent to which the clinician avoided amplifying sustain talk and reduced its frequency over the session.
Partnership — extent to which the clinician treated the client as the expert on their own situation, conveying collaboration rather than expert/patient asymmetry.
Empathy — extent to which the clinician demonstrated accurate understanding of the client’s perspective.

Global scores are subjective by design; coder calibration is essential.

Behaviour counts

The MITI also counts specific OARS and adherence behaviours over the 20-minute segment:

Giving Information
Persuade — attempting to convince the client.
Persuade with Permission — sharing a perspective after the client invited it.
Question (MITI 4.2.1 records a single Question count; the Open/Closed split behind the % open questions metric below comes from MITI 3.x)
Simple Reflection vs Complex Reflection
Affirm
Seeking Collaboration
Emphasizing Autonomy
Confront — actively disagreeing, criticising, or labelling.

These raw counts feed several derived ratios.

Derived metrics

Reflection-to-question ratio (R:Q): total reflections divided by total questions.
Percentage open questions of all questions.
Percentage complex reflections of all reflections.
Total MI-Adherent (Affirm + Seeking Collaboration + Emphasizing Autonomy) vs MI Non-Adherent (Persuade without permission + Confront) — sometimes summarised as a single ratio.
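
The derived metrics above are simple arithmetic on the raw tallies. A minimal sketch in Python, assuming the counts from one coded segment are already available; the BehaviourCounts class, its field names, and the example numbers are illustrative and not part of MITI itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BehaviourCounts:
    """Raw tallies from one coded 20-minute segment (field names are illustrative)."""
    open_questions: int
    closed_questions: int
    simple_reflections: int
    complex_reflections: int
    affirm: int
    seeking_collaboration: int
    emphasizing_autonomy: int
    persuade: int  # Persuade without permission
    confront: int


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(c: BehaviourCounts) -> Dict[str, Optional[float]]:
    """Compute the four derived metrics listed above from raw counts."""
    reflections = c.simple_reflections + c.complex_reflections
    questions = c.open_questions + c.closed_questions
    adherent = c.affirm + c.seeking_collaboration + c.emphasizing_autonomy
    non_adherent = c.persuade + c.confront
    return {
        "rq": safe_ratio(reflections, questions),
        "pct_open_questions": safe_ratio(c.open_questions, questions),
        "pct_complex_reflections": safe_ratio(c.complex_reflections, reflections),
        "pct_mi_adherent": safe_ratio(adherent, adherent + non_adherent),
    }


# Made-up counts for illustration only.
example = BehaviourCounts(
    open_questions=12, closed_questions=8,
    simple_reflections=9, complex_reflections=5,
    affirm=3, seeking_collaboration=2, emphasizing_autonomy=1,
    persuade=1, confront=0,
)
for name, value in derived_metrics(example).items():
    print(name, value)
```

Percentages are returned as fractions between 0 and 1, and a metric comes back as None when its denominator is zero (for example, a segment with no questions has no defined R:Q).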

Competence vs Proficiency Thresholds

MITI 4 publishes two tiers of thresholds, which its manual labels “Fair” and “Good”; this guide calls them competence and proficiency. Clearing competence indicates basic MI skill, and clearing proficiency is closer to expert-level practice.

Metric	Competence	Proficiency
R:Q (reflections-to-questions ratio)	≥ 1	≥ 2
% Complex reflections	≥ 40%	≥ 50%
% Open questions	≥ 50%	≥ 70%
Cultivating Change Talk (global)	≥ 3	≥ 4
Softening Sustain Talk (global)	≥ 3	≥ 4
Partnership (global)	≥ 3	≥ 4
Empathy (global)	≥ 3	≥ 4
% MI-Adherent (MI-Adherent ÷ (MI-Adherent + MI Non-Adherent))	n/a	≥ 90%

Two notes on interpreting thresholds:

Competence is a floor, not a destination. The thresholds were set to reflect what a trained MI practitioner can do reliably. Most workshop graduates do not clear all of them on their first independently-rated session.
Improvement is uneven. Practitioners commonly clear behaviour-count thresholds (R:Q, % open questions) faster than global Spirit ratings (Partnership, Empathy). The Spirit takes longer because it depends on the embodied stance, not just behavioural frequencies.
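
The threshold table can be read as a simple lookup. A minimal sketch in Python, using the values from the table above; the metric keys, the tier() helper, and the example session are illustrative names and numbers, not part of MITI.

```python
from typing import Dict, Optional, Tuple

# (competence, proficiency) per metric, copied from the table above.
# None means the table gives no value for that tier.
THRESHOLDS: Dict[str, Tuple[Optional[float], float]] = {
    "rq": (1.0, 2.0),
    "pct_complex_reflections": (0.40, 0.50),
    "pct_open_questions": (0.50, 0.70),
    "cultivating_change_talk": (3, 4),
    "softening_sustain_talk": (3, 4),
    "partnership": (3, 4),
    "empathy": (3, 4),
    "pct_mi_adherent": (None, 0.90),
}


def tier(metric: str, value: float) -> str:
    """Return the highest tier that a score clears for the given metric."""
    competence, proficiency = THRESHOLDS[metric]
    if value >= proficiency:
        return "proficiency"
    if competence is None:
        return "below proficiency"
    if value >= competence:
        return "competence"
    return "below competence"


# Made-up scores for one coded session.
session = {
    "rq": 0.7,
    "pct_complex_reflections": 0.45,
    "pct_open_questions": 0.72,
    "partnership": 3,
    "empathy": 4,
}
for name, value in session.items():
    print(f"{name}: {value} -> {tier(name, value)}")
```

A profile like this one, mixed across metrics, is the uneven improvement described above in practice.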

Other Coding Systems (Briefly)

MITI is the dominant system but not the only one:

MISC (Motivational Interviewing Skill Code): predates MITI and was developed by the same Miller group. More granular (it codes every utterance) and more time-intensive. Still used in some research settings.
MIPS (Motivational Interviewing Process Code): used in some MI process research; codes both clinician and client utterances.
GROMIT (Global Rating of Motivational Interviewing Therapist): an abbreviated system built on global ratings only, used when full MITI coding is infeasible.

For most training and self-improvement purposes, MITI 4 is the system to learn.

How to Use Coding Feedback for Improvement

A common failure pattern: a clinician gets a MITI-coded session back, sees their R:Q is 0.7 (below the competence floor of 1.0), and resolves to “do more reflections.” Three months later their R:Q is 0.8. The intervention didn’t stick.

What works better:

Pick one metric per practice cycle

Trying to improve all eight metrics simultaneously is how you improve none of them. Pick the metric that’s both furthest below threshold and most actionable, and target it for two to four weeks of focused practice.
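
One way to make that choice concrete is to rank metrics by how far they fall short of the competence floor, weighted by how actionable each feels. The sketch below is an assumption-laden illustration: the relative-shortfall formula and the actionability weights are hypothetical choices made here, not something MITI prescribes, and the scores are invented.

```python
from typing import Dict, Optional

# Competence floors for a subset of metrics, taken from the threshold table.
COMPETENCE = {
    "rq": 1.0,
    "pct_complex_reflections": 0.40,
    "pct_open_questions": 0.50,
    "partnership": 3,
    "empathy": 3,
}


def shortfall(value: float, threshold: float) -> float:
    """Relative distance below the threshold (0.0 when the threshold is met)."""
    return max(0.0, (threshold - value) / threshold)


def pick_focus(scores: Dict[str, float],
               actionability: Dict[str, float]) -> Optional[str]:
    """Return the below-threshold metric with the largest weighted shortfall."""
    candidates = {
        metric: shortfall(value, COMPETENCE[metric]) * actionability.get(metric, 1.0)
        for metric, value in scores.items()
        if value < COMPETENCE[metric]
    }
    if not candidates:
        return None  # every metric already clears competence
    return max(candidates, key=candidates.get)


scores = {
    "rq": 0.7,
    "pct_complex_reflections": 0.30,
    "pct_open_questions": 0.55,
    "partnership": 3,
    "empathy": 2,
}
# Hypothetical self-assessed weights: 1.0 = easy to practise directly.
actionability = {"rq": 1.0, "pct_complex_reflections": 0.8, "empathy": 0.5}

print(pick_focus(scores, actionability))  # prints: rq
```

Here Empathy has the largest raw shortfall, but R:Q wins once actionability is factored in, which matches the advice to target something you can practise directly for a few weeks.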

Use the metric as a question, not a target

“My R:Q is 0.7” → ask “what’s happening in my sessions that’s producing more questions than reflections?” The answer is rarely “I forget to reflect.” It’s usually something like: “I get nervous in long pauses and ask another question to fill the space.” That’s a different fix than just “do more reflections.”

Practise on volume, not just on the difficult cases

Practitioners sometimes save MI for high-stakes sessions and use a more directive style for routine ones. The result is that MI never becomes the default — the muscle memory doesn’t form. Run MI in lower-stakes practice contexts, repeatedly, to build automaticity.

Get sessions coded that you didn’t expect to be coded

Self-selection bias is brutal in MI fidelity. A practitioner choosing which session to send for coding often picks ones they thought went well. The picture is healthier if some of the coded sessions were chosen randomly — the practitioner’s ceiling and floor both come into view.
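
Mixing self-chosen and randomly drawn sessions is easy to automate. A minimal sketch, assuming each recorded session has an identifier; the function name, the ID format, and the seed parameter are illustrative.

```python
import random
from typing import List, Optional, Sequence


def sessions_to_code(session_ids: Sequence[str],
                     self_selected: Sequence[str],
                     n_random: int,
                     seed: Optional[int] = None) -> List[str]:
    """Combine the practitioner's own picks with a random draw from the rest."""
    rng = random.Random(seed)  # a fixed seed makes the draw auditable
    pool = [s for s in session_ids if s not in self_selected]
    drawn = rng.sample(pool, k=min(n_random, len(pool)))
    return list(self_selected) + drawn


all_sessions = [f"s{i:02d}" for i in range(1, 11)]  # s01 .. s10
batch = sessions_to_code(all_sessions, self_selected=["s03"], n_random=2, seed=1)
print(batch)
```

Having someone other than the practitioner run the draw, or fixing the seed in advance, keeps the random portion genuinely unselected.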

Limitations of Coding

MITI is the best-validated system in the field, but it’s not without trade-offs:

20-minute window: A coded segment is one slice of a longer session. Performance can vary across the session.
Audio-only: MITI codes audio, not video. Non-verbal Partnership and Empathy cues are not captured.
Coder reliability is variable: Reaching adequate inter-rater reliability requires training and ongoing calibration. A single coder’s score on a single session is noisier than people often assume.
Cost and latency: Human MITI coding can take 1–2 hours per session and cost £50–£200 per session at typical research rates. This is why most practitioners get coded once or twice in a workshop and then never again.
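
To make the coder-reliability point concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa, applied to two coders' labels for the same utterances. This is an illustrative stand-in: MITI reliability work more often reports intraclass correlations on summary scores, and the label abbreviations and data below are invented.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two coders on the same utterances."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of utterances.")
    n = len(coder_a)
    # Proportion of utterances where the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# SR = simple reflection, CR = complex reflection, Q = question, AF = affirm.
coder_a = ["SR", "CR", "Q", "Q", "SR", "AF", "CR", "Q"]
coder_b = ["SR", "SR", "Q", "Q", "SR", "AF", "CR", "Q"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # prints: 0.826
```

The single disagreement here is a simple versus complex reflection, and on a short segment it is enough to pull kappa well below 1.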

Where AI-Based Scoring Fits

Several projects (academic and commercial) have produced AI-based MI scoring systems in the last few years, some of them benchmarked against human MITI coding. Performance varies: early systems were poor at distinguishing simple from complex reflections, for example, and global Spirit scores remain harder for AI than behaviour counts.

The honest framing: AI-based scoring is currently best understood as formative practice feedback, not a credentialing assessment. It gives you near-instant feedback, lets you re-run a scenario and see whether your behaviour-count ratios improved, and surfaces specific moments in the transcript for review. It doesn’t (yet) replace human MITI coding for credentialing purposes — and credible AI MI tools should be transparent about that distinction.

The MI Practice Lab treats AI scoring as exactly this — a fast feedback loop on practice sessions, with the client’s evidence quoted in the feedback so the practitioner can verify the reasoning rather than trusting the score blind.

Frequently Asked Questions

Do I need to be MITI-coded to call myself MI-trained?

No fixed credentialing hurdle exists in most jurisdictions. The Motivational Interviewing Network of Trainers (MINT) doesn’t require MITI scores for membership; it requires attendance at the Training of New Trainers programme. Most MI training programmes use MITI for formative assessment rather than as a pass/fail credential.

Is the MITI threshold set scientifically?

The MITI 4.2.1 manual presents its thresholds as expert-opinion benchmarks that still lack normative and validity data, not as theoretically derived absolutes. They’re useful reference points but not magic numbers: a clinician at R:Q 1.9 isn’t materially different from one at 2.1.

Can I self-code my own sessions?

Yes, but with a large grain of salt. Self-coded MITI scores correlate poorly with externally-coded scores; clinicians tend to inflate their own ratings, particularly on global Spirit dimensions. Self-coding works better for behaviour counts than for globals, and works better when you’ve been coder-trained on the system.

How long does it take to reach proficiency?

Highly variable. Studies of MI training programmes typically show clinicians reaching competence on most metrics within 6–12 months of post-workshop deliberate practice, with proficiency taking 1–2 years. The well-documented MI “skill fade” (declining metrics in the months after a workshop without ongoing practice) means many clinicians never reach competence at all without structured between-workshop practice.
