The Art and Science of Translating Clinical Guidelines for AI: Hard Lessons from the Trenches

It’s never a good sign when sticky notes are a vital part of your policy enforcement process. 

From utilization management to payment integrity, the healthcare industry has long grappled with applying an absurd volume of policies and guidelines. It’s so overwhelming, in fact, that sticky notes with handwritten reminders and special instructions are still in use at billion-dollar companies today. “But AI is going to fix all that!” Well, kinda.

Data Informs AI 

AI-driven solutions need guidance in the form of instructions. You’d think a 3,000-page contract would do the trick, but it doesn’t. All of the paper clinical guidelines and policies healthcare has generated over the past 2,000 years work well for manual processes, but they don’t translate cleanly to the latest AI solutions. Why? Because most documents contain an enormous amount of excess or vague information that overcomplicates the task at hand.

Having spent years at the intersection of medicine, policy, and AI development, I’ve witnessed both successes and setbacks. Cutting-edge software only works if the content it’s ingesting is well designed. That design requires a little bit of art and science, and maybe a touch of magic. Let’s review some lessons to live by when it comes to teaching AI policy.

Lesson 1: Clinical Guideline Language Isn’t Straightforward

Here’s an example of “lost in translation.” During the COVID-19 pandemic, in an attempt to provide multilingual information rapidly, the Virginia Department of Health employed AI translation tools to convert English content into Spanish. 

Unfortunately, the AI translated the phrase “the COVID-19 vaccine is not mandatory” to “the COVID-19 vaccine is not necessary,” leading to widespread confusion among Spanish-speaking residents. 

For another example, let’s say you’re reviewing whether a patient had an acute kidney injury (AKI). The guideline may require that a nephrology consult, “if present,” agrees with the diagnosis. “If present” is the key here: it means a nephrology consult is helpful but not required by the guideline. These “what if” scenarios become a hindrance rather than a help when it comes to AI. An experienced human reviewer can easily interpret that phrasing, but an AI system can easily miss the nuance. This is why translating clinical guidelines into AI-compatible formats requires a high skill level: understanding not just the guideline’s intent but also how AI interprets and applies rules at scale.
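To make that concrete, here’s a minimal sketch of how an engineer might encode that “if present” clause explicitly. (The field and function names are hypothetical illustrations, not an actual Machinify implementation.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AKICase:
    creatinine_rise_meets_criteria: bool       # core diagnostic criterion from the guideline
    nephrology_consult_agrees: Optional[bool]  # None means no consult exists on the chart

def aki_guideline_met(case: AKICase) -> bool:
    """Apply the AKI guideline with the "if present" clause made explicit."""
    if not case.creatinine_rise_meets_criteria:
        return False
    # "If present" means the consult corroborates but is not required:
    # when no consult exists, the criterion simply does not apply.
    if case.nephrology_consult_agrees is None:
        return True
    return case.nephrology_consult_agrees
```

The three-way distinction (agrees, disagrees, or no consult at all) is exactly the nuance a human reviewer resolves without thinking and a machine needs spelled out.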

These instances underscore that AI systems can misinterpret policy language, resulting in misinformation and public distrust. 

Lesson 2: AI Can’t Take Guidelines at Face-Value—It Needs Context

Interpretation that ignores context can be as dangerous as mistranslating words. Think about pediatrics, for example. In medical training we learn that children are not just small adults. Their circumstances and scenarios may be far more nuanced than adult cases, and AI hasn’t quite mastered this context. 

As Live Science explains, in a study published in JAMA, “researchers ran 100 patient case challenges sourced from JAMA Pediatrics and The New England Journal of Medicine (NEJM) through ChatGPT, asking the chatbot to ‘list a differential diagnosis and a final diagnosis.’” The results? “ChatGPT provided incorrect diagnoses for 72 of the 100 cases, with 11 of the 100 results categorized as ‘clinically related but too broad to be considered a correct diagnosis.’”

In one of the cases incorrectly diagnosed, a teenager with autism presented with a rash and joint stiffness. While the initial physician diagnosed the teen with scurvy, a condition caused by a severe lack of vitamin C, ChatGPT’s diagnosis was immune thrombocytopenic purpura—an autoimmune disorder.

The context ChatGPT didn’t have? People with autism can have very restrictive diets because of sensitivities to food textures or flavors, making them more prone to vitamin deficiencies. ChatGPT assumed the hoofbeats were from a horse when, in this case, with the necessary context, they were actually from a zebra.
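As a rough illustration of why context changes the answer, consider how the same symptoms read with and without the background facts. (The function and field names below are invented for illustration; real clinical NLP pipelines are far richer.)

```python
def build_case_summary(symptoms: list[str], context: dict[str, str]) -> str:
    """Assemble the case text a model would see; context is optional."""
    lines = ["Presenting symptoms: " + ", ".join(symptoms)]
    for label, detail in context.items():
        lines.append(f"Relevant context ({label}): {detail}")
    return "\n".join(lines)

# The same symptoms, with and without the background facts.
bare = build_case_summary(["rash", "joint stiffness"], {})
enriched = build_case_summary(
    ["rash", "joint stiffness"],
    {
        "history": "autism spectrum disorder",
        "diet": "highly restrictive due to texture and flavor sensitivities",
    },
)
# Only the enriched summary gives a model (or a human) a reason to rank
# a nutritional deficiency such as scurvy in the differential.
```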

Lesson 3: AI’s Interpretation of Policy Must Be Continuously Monitored and Tuned

Even when AI systems are correctly implemented initially, policies evolve, and clinical practices change. AI models must be updated to reflect these shifts to remain effective.

A pertinent example involves a top payer’s use of an AI algorithm to manage prior authorizations for post-acute care services. Between 2019 and 2022, the company experienced a significant increase in denial rates for services like skilled nursing facilities, with the denial rate rising from 8.7% to 22.7%. 

This surge coincided with the deployment of the algorithm, which was intended to streamline decision-making but instead led to inappropriate denials of necessary care. The algorithm’s rigid application, without continuous monitoring and adaptation to evolving clinical guidelines, resulted in patient care delays and increased administrative burdens due to the necessity of appeals.

This case underscores the importance of continuous policy monitoring and real-time updates in AI-driven systems. Without a governance framework to ensure AI adapts to evolving clinical standards, even well-designed automation can become obsolete and harmful.
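A governance loop doesn’t have to be elaborate to be useful. Here’s a deliberately simple sketch of the kind of drift check that would have flagged a denial rate climbing from 8.7% toward 22.7%. (The threshold and data shapes are illustrative assumptions, not a description of any payer’s actual system.)

```python
def denial_rate(decisions: list[bool]) -> float:
    """decisions holds True wherever a prior-authorization request was denied."""
    return sum(decisions) / len(decisions) if decisions else 0.0

def needs_review(recent: list[bool], baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a service category for human audit when its denial rate drifts."""
    return abs(denial_rate(recent) - baseline) > tolerance

# Illustrative numbers: an 8.7% historical baseline versus a recent batch
# denying roughly 23 of every 100 skilled-nursing requests.
recent_decisions = [True] * 23 + [False] * 77
if needs_review(recent_decisions, baseline=0.087):
    print("Denial rate has drifted from baseline; route to clinical audit.")
```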

The Future: Blending Human Oversight with AI Scalability

The answer to these challenges is to approach modern solutions with a deep understanding of clinical guideline translation. Successful AI deployment requires:

  1. Expert-Driven Policy Engineering: AI developers must collaborate with clinicians and policy experts to map decision logic that mirrors real-world medical judgment.
  2. Contextual AI Models: Instead of rigid rule-based systems, AI should employ machine learning and natural language processing to interpret clinical documentation contextually.
  3. Continuous Validation & Governance: AI models must undergo regular audits, updates, and tuning to align with changes in regulatory and legal policies and clinical guidelines.

The recent acquisition forming the new Machinify exemplifies how important it is to acknowledge the complexities of clinical guideline translation. Rather than copying and pasting existing clinical guidelines into algorithms, Machinify programs its software to understand critical context and builds in feedback loops. By integrating advanced AI capabilities with deep industry expertise, Machinify is poised to revolutionize payer operations, leading to more accurate claims processing and reduced administrative burdens.

While AI holds the potential to transform healthcare administration, realizing this potential necessitates respecting the intricacies of clinical guideline translation and the subtleties of clinical decision-making. Through thoughtful integration of human expertise and AI technology, the healthcare industry can achieve greater efficiency and improved patient outcomes.

To learn more about how Machinify interprets clinical guidelines and how it can help payers streamline processes, contact us.