At DeepScribe, we understand the critical importance of accurate and reliable medical documentation for healthcare practitioners. Our fully-automated medical documentation services are designed to alleviate the administrative burden on healthcare professionals, allowing them to focus more on patient care. To ensure the highest quality of our AI-generated documentation, we employ several comprehensive quality management systems throughout our organization, including Golden Notes, Development Rates, and Qualitative Evaluations. 

The Challenge of Quality Management

Quality management in medical documentation is a challenging process. Medical documentation, like any form of writing, is inherently subjective. Each healthcare practitioner has their own preferences for how they want their documentation to look and sound. Furthermore, there is often debate over whether certain pieces of information are relevant enough to be included in a note, and certain pieces of information are often deemed “more important to include” than others. Given these intricacies, it is nearly impossible to assign a quantitative score to a medical note that accurately reflects its quality. This complexity is compounded when building a quality management system that must cater to the diverse needs of healthcare practitioners and organizations across the nation.

Modern quality control is further complicated by the mix of human and AI support. To execute a comprehensive quality management strategy effectively, we take a multi-faceted approach that addresses quality control at three levels. Frontline quality control manages the quality of those "on the frontlines": the expert human reviewers responsible for overseeing our quality control workflows. Development quality control proactively manages the quality of new releases, ensuring that any updates to our AI algorithms meet our stringent standards before they are deployed. Production quality control assesses and reactively manages the quality of notes that are currently being processed through the DeepScribe system. By addressing quality management from these different angles, we can ensure a holistic and effective approach to maintaining the highest standards of medical documentation.

Frontline Quality Control: Golden Notes

Golden Notes serve as the foundation of DeepScribe’s quality management system, acting as a form of “quality control for our quality control processes” by evaluating the quality and effectiveness of expert human reviewers. This approach involves presenting our expert human reviewers with a medical note that contains a set of known defects, such as mis-transcriptions, missing information, or incorrect details. These notes are carefully crafted and calibrated to test the reviewers' ability to identify and correct these errors, ensuring they maintain the highest standards of accuracy and attention to detail.

Our expert human reviewers, responsible for overseeing various quality management activities such as Development Rates and Qualitative Evaluations, are evaluated through their performance on Golden Notes. Each reviewer's work on these notes is meticulously assessed by a senior reviewer against an established answer key. This rigorous process not only ensures the consistency and precision of our human-in-the-loop quality management system but also plays a crucial role in the continuous improvement of our overall quality management practices.
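As an illustration only, grading a reviewer's work on a Golden Note against its answer key could be sketched as follows. The `Defect` schema, the section names, and the choice of recall and precision as summary numbers are assumptions made for this example; they are not a description of DeepScribe's internal tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    """One defect in a note (hypothetical schema for this sketch)."""
    location: str  # e.g. a section name
    kind: str      # e.g. "mis-transcription", "missing information"


def score_golden_note(answer_key: set[Defect], flagged: set[Defect]) -> dict[str, float]:
    """Compare the defects a reviewer flagged with the seeded answer key."""
    caught = answer_key & flagged
    # Share of seeded defects the reviewer found (1.0 if nothing was seeded).
    recall = len(caught) / len(answer_key) if answer_key else 1.0
    # Share of the reviewer's flags that were real (1.0 if nothing was flagged).
    precision = len(caught) / len(flagged) if flagged else 1.0
    return {
        "recall": recall,
        "precision": precision,
        "missed": len(answer_key - flagged),
        "false_alarms": len(flagged - answer_key),
    }


# Three seeded defects; the reviewer finds two and raises one false alarm.
answer_key = {
    Defect("HPI", "mis-transcription"),
    Defect("Medications", "incorrect detail"),
    Defect("Plan", "missing information"),
}
flagged = {
    Defect("HPI", "mis-transcription"),
    Defect("Plan", "missing information"),
    Defect("Assessment", "incorrect detail"),
}
result = score_golden_note(answer_key, flagged)
print(result)
```

In practice a senior reviewer makes this comparison, so a score like this would at most support that judgment, not replace it.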

Development Quality Control: Rates

Our multi-tiered rates system is a methodical and granular evaluation process that assesses the quality of AI-generated medical documentation across various dimensions. Rates is primarily used as development quality control: it tells DeepScribe whether changes to our algorithm are effective enough to be released, or whether we should hold a release because of an increased risk of defects. There are currently three distinct rates processes at DeepScribe: (1) pulse rates; (2) panel rates; and (3) statistical rates.

Every note undergoing any form of rates evaluation goes through a comprehensive preparation process, essentially creating an "answer key" for evaluation. First, our AI models generate a detailed note. Next, at least two expert human reviewers assess and correct any inaccuracies in the note. We then break down each note into fundamental components called "entities." These entities are granular pieces of information that our models are expected to include accurately. For example, the sentence "The patient is a 42-year-old male presenting for an annual wellness visit" contains three entities: (1) "42-year-old," (2) "male," and (3) "annual wellness visit." We then review the transcription associated with the note and identify any errors. Errors in the transcript can lead to downstream errors that show up in the final medical note. This process is repeated for numerous notes and compiled into a test set of diverse patient encounters that broadly represents the range of possibilities that the AI will encounter in real-world applications. After this process, the test set is ready to be evaluated across our three types of evaluative rates:

  1. Pulse Rates are our most efficient and holistic evaluation process. They involve a small set of test notes reviewed by expert human reviewers who highlight areas needing attention and leave detailed comments on observed issues. This rapid feedback mechanism is crucial for iterating and improving our AI models in the early development stages.
  2. Panel Rates offer a broader perspective on note quality by aggregating the opinions of multiple human reviewers. Each reviewer grades each note in the test set on several quality categories, including Clinical Accuracy, Writing Quality, Organization, and Completeness. By gathering a consensus view of each note, we can quickly collect meaningful and actionable qualitative information that helps us refine and improve our AI models for better accuracy and reliability.
  3. Statistical Rates provide specific, quantitative feedback on the quality of a note by highlighting individual defects and assigning them a severity level. This detailed analysis helps guide development decisions by pinpointing areas for improvement. Each defect is evaluated based on its impact on the note’s integrity and potential safety or legal risks. Statistical Rates allow us to establish quality metrics and thresholds that must be achieved to enable a new release. For example, we can track medication mismatches and require there to be no more than a specified number of such defects per thousand notes.
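All three rates rely on the entity-level "answer key" described before the list. A minimal sketch of checking a generated note against its expected entities, using the example sentence from above, might look like this. The function name and the simple case-insensitive substring match are assumptions for illustration; a real matcher would need to handle synonyms, negation, and word boundaries.

```python
def entity_coverage(expected: list[str], generated_note: str) -> tuple[float, list[str]]:
    """Return the share of expected entities found in the note, plus the misses.

    Matching is a naive case-insensitive substring test (an assumption of
    this sketch). It would, for example, wrongly find "male" inside "female".
    """
    text = generated_note.lower()
    missing = [entity for entity in expected if entity.lower() not in text]
    covered = len(expected) - len(missing)
    coverage = covered / len(expected) if expected else 1.0
    return coverage, missing


# Entities taken from the example sentence in the preparation process.
expected = ["42-year-old", "male", "annual wellness visit"]

# A hypothetical generated note that gets the visit type wrong.
note = "The patient is a 42-year-old male presenting for a follow-up visit."

coverage, missing = entity_coverage(expected, note)
print(f"coverage={coverage:.2f}, missing={missing}")
```

Here two of the three entities are present, and the missing one points directly at the part of the note a reviewer would need to inspect.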

To classify defects in the notes, we employ DeepScribe's proprietary Medical Edit Data Insights & Classification (MEDIC) Taxonomy, which assigns specific defect types to issues identified during the evaluation. This taxonomy is an integral part of our quality management system, providing a standardized way to categorize and address defects. Stay tuned for a separate article detailing the defect types in the MEDIC Taxonomy.

Rates act as proactive quality control, akin to clinical trials for medications. They allow us to test AI models and ensure updates meet high standards before release, maintaining accuracy and reliability in medical documentation. Our rates system identifies regressions in AI-generated documentation quality, and if significant regressions are detected, we may halt a release. The choice of rates—Pulse, Panel, or Statistical—depends on factors like the change's scope, development stage, and assessment goals, ensuring the right balance between detail and speed in feedback.
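The idea of halting a release when a defect threshold is exceeded can be sketched as a simple gate over Statistical Rates results. Everything specific here is hypothetical: the defect type names, the limits, and the counts are invented for the example, and the actual thresholds and MEDIC defect types are not given in this article.

```python
def defects_per_thousand(defect_count: int, note_count: int) -> float:
    """Normalize a raw defect count to a rate per 1,000 notes."""
    if note_count <= 0:
        raise ValueError("note_count must be positive")
    return 1000 * defect_count / note_count


def release_allowed(
    counts: dict[str, int],
    note_count: int,
    limits: dict[str, float],
) -> tuple[bool, dict[str, float]]:
    """Check every limited defect type; return (ok, violations)."""
    rates = {
        kind: defects_per_thousand(counts.get(kind, 0), note_count)
        for kind in limits
    }
    violations = {kind: rate for kind, rate in rates.items() if rate > limits[kind]}
    return (not violations), violations


# Hypothetical test-set results for a candidate model: 500 notes evaluated.
counts = {"medication_mismatch": 3, "missing_information": 12}
# Hypothetical limits, expressed as maximum defects per 1,000 notes.
limits = {"medication_mismatch": 5.0, "missing_information": 30.0}

ok, violations = release_allowed(counts, note_count=500, limits=limits)
print(ok, violations)
```

In this made-up run, three medication mismatches in 500 notes works out to 6 per thousand, which exceeds the limit of 5, so the gate blocks the release even though the other defect type is within bounds.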

In summary, our multi-tiered rates system is an essential tool in our quest to continuously improve the quality and effectiveness of our AI-generated medical documentation. By carefully selecting the appropriate rates process and conducting thorough evaluations, we can confidently advance our technology and uphold our commitment to excellence in healthcare.

Production Quality Control: Qualitative Evaluations

DeepScribe’s qualitative evaluations are essential in refining our AI models after release. While rates focus on development decisions, such as whether or not a particular update to our algorithms should be released, qualitative evaluations serve as broad, continuous quality control. Development rates simply can’t catch everything: given the nature of large language models, no matter how much testing is done, some anomalies will always appear when the model is applied more broadly. Qualitative evaluations allow us to catch production anomalies, report against them, and identify problems to solve in the next iteration of the model.

To conduct Qualitative Evaluations, expert human reviewers assess medical notes generated by our AI model, which are either randomly selected or added to the review pool by our Customer Success team based on feedback from practitioners using DeepScribe's services. The reviewers examine each note in detail, again using the MEDIC Taxonomy and following a structured auditing methodology, and capture any defects in the notes or transcripts. All of these defects are stored in a secure database and are analyzed weekly by our Quality, AI, and Product teams to align on emerging issues and determine what actions should be taken to address them.
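As a rough sketch of the weekly analysis step, the defects logged by reviewers could be grouped by ISO week and defect type to show which issues are emerging. The record layout and the defect type names below are assumptions for illustration only; they are not taken from the MEDIC Taxonomy or from DeepScribe's database.

```python
from collections import Counter
from datetime import date

# Hypothetical defect records: (date the defect was logged, defect type).
records = [
    (date(2024, 6, 3), "medication_mismatch"),
    (date(2024, 6, 5), "medication_mismatch"),
    (date(2024, 6, 5), "missing_information"),
    (date(2024, 6, 10), "medication_mismatch"),
]


def weekly_defect_counts(
    records: list[tuple[date, str]],
) -> dict[tuple[int, int], Counter]:
    """Group defect counts by (ISO year, ISO week) and defect type."""
    weekly: dict[tuple[int, int], Counter] = {}
    for found_on, defect_type in records:
        iso = found_on.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        weekly.setdefault(key, Counter())[defect_type] += 1
    return weekly


weekly = weekly_defect_counts(records)
for week, counter in sorted(weekly.items()):
    print(week, counter.most_common())
```

A tally like this only surfaces candidates for discussion. Deciding which issues matter and what to do about them remains the job of the weekly review.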

At DeepScribe, we are committed to revolutionizing medical documentation through our advanced AI technology. Our comprehensive quality management systems, including Golden Notes, Development Rates, and Qualitative Evaluations, exemplify our dedication to accuracy, reliability, and continuous improvement. By meticulously evaluating and refining our AI models, we ensure that healthcare practitioners and organizations can rely on our documentation services to enhance patient care and streamline administrative processes.

As we move forward, we will continue to innovate and adapt our quality management strategies to meet the evolving needs of the healthcare industry. In addition to these “traditional” methodologies, DeepScribe is actively working on formalizing quality management programs for streams specific to AI applications such as AI Ethics and Bias Reduction & Mitigation.

Through our unwavering commitment to excellence, DeepScribe aims to set new standards in medical documentation, empowering healthcare professionals to focus on what truly matters: their patients' health and well-being.
