CCP Product Release Notes

About

These release notes highlight the changes, improvements, and updates in each new version of the Clinical Comprehensive Pipeline (CCP).

The Clinical Comprehensive Pipeline (CCP) is an NLP pipeline that recognizes and extracts clinical entities and modifiers from unstructured text across IMO’s problem/diagnosis, procedure, medication, laboratory, and clinical observation domains. Entities extracted from unstructured text are also mapped to the following standard terminologies and code systems:

Problem/Diagnosis domain: ICD-10-CM, ICD-9-CM, SNOMED CT, and IMO lexicals.
Procedure domain: CPT (Current Procedural Terminology), HCPCS (Healthcare Common Procedure Coding System), ICD-10-PCS, SNOMED CT, LOINC (Logical Observation Identifiers Names and Codes), and IMO lexicals.
Medication domain: RxNorm, NDC (National Drug Code), CVX, and IMO lexicals.
Lab domain: LOINC, CPT, HCPCS, ICD-10-PCS, SNOMED CT, and IMO lexicals.

CCP Version 3.0

What’s New in Version 3.0

Release Date: 08/20/2025

Version 3.0 introduces significant improvements in entity extraction accuracy, expanded coverage of new entity types, and alignment with external code systems.

Stronger alignment with IMO’s clinical domains and external code systems

Procedures (previously classified as treatments), labs (previously classified as tests), and medications (previously classified as drugs) are reclassified to align with IMO's existing clinical domains (procedures, labs, and medications).
This version incorporates the latest IMO terminology updates, ensuring extracted entities and modifiers align with our supported medical code systems.
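
For consumers that stored entity types from earlier versions, the renaming described above can be handled with a small lookup. The following is a minimal sketch, not part of the CCP API: the label strings ("test", "drug", "treatment", and their replacements) and the function name are assumptions for illustration, and the actual values in the CCP response schema may differ.

```python
# Hypothetical legacy -> v3.0 entity type labels. The real strings in the
# CCP response schema may differ; verify them against actual output.
LEGACY_TO_V3_ENTITY_TYPE = {
    "test": "lab",
    "drug": "medication",
    "treatment": "procedure",
}


def to_v3_entity_type(label: str) -> str:
    """Return the v3.0 name for a legacy entity type label.

    Labels that were not renamed (for example "problem") pass through
    unchanged, normalized to lowercase.
    """
    key = label.strip().lower()
    return LEGACY_TO_V3_ENTITY_TYPE.get(key, key)


for old in ["treatment", "test", "drug", "problem"]:
    print(f"{old} -> {to_v3_entity_type(old)}")
```

A lookup with a pass-through default keeps labels that did not change working without extra branches.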

Improved procedure and laboratory entity identification

Enhanced training of our named entity recognition (NER) and relation extraction models enables more precise detection of procedure and lab entities and their modifiers, leading to higher coverage and accuracy for both.
Pipeline components were refined to improve parity and consistency.

New entity types introduced: social factors, clinical observations

Social Factors are defined as: the conditions in which people are born, grow, live, and work, encompassing the economic policies, social norms, and political systems that influence health outcomes. They include behaviors such as alcohol use and smoking, and circumstances such as homelessness.
Clinical Observations are defined as: measurements, assessments, or evaluations that provide insight into a patient’s health status, distinct from laboratory tests. These can include physiological measurements, imaging-derived metrics, or structured assessments (e.g., left ventricular ejection fraction, heart rate).

New and enhanced modifiers for higher specificity in resolution to standard codes

New contextual modifiers improve resolution of extracted entities to the most specific standardized medical code available.
New modifiers include object, approach, technique, reference range, unit, and value.
- Object is a new modifier for procedures, defined as a non-body structure being inserted, implanted, removed, or manipulated during a procedure.
- Approach is a new modifier for procedures, defined as a specific method used to access and manipulate tissues or organs during a procedure.
- Technique is a new modifier for procedures, defined as the specific methods used to perform a particular procedure.
- Reference Range is a new modifier for labs, defined as the range or interval of values that are normal for a lab result.
- Unit is a new modifier for labs, defined as the standardized measurement associated with a lab result (e.g., mg/dL, mmol/L, %).
- Value is a new modifier for labs, defined as a lab result expressed as a numeric measurement, categorical label (e.g., positive, elevated), or nominal descriptor (e.g., cloudy, abnormal).
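
To make the grouping of the new modifiers concrete, here is a minimal sketch of how a consumer might represent them. It is not the CCP output schema: the `ExtractedEntity` class, the snake_case modifier keys, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Modifiers introduced in v3.0, keyed by the entity type they apply to.
# The snake_case key names are illustrative, not the CCP field names.
NEW_MODIFIERS_BY_TYPE = {
    "procedure": ("object", "approach", "technique"),
    "lab": ("reference_range", "unit", "value"),
}


@dataclass
class ExtractedEntity:
    """Hypothetical record for one extracted entity (not the CCP schema)."""
    text: str
    entity_type: str
    modifiers: Dict[str, str] = field(default_factory=dict)


def missing_new_modifiers(entity: ExtractedEntity) -> List[str]:
    """List the v3.0 modifiers for this entity type that were not captured."""
    expected = NEW_MODIFIERS_BY_TYPE.get(entity.entity_type, ())
    return [name for name in expected if name not in entity.modifiers]


# Made-up example values, for illustration only.
lab = ExtractedEntity(
    text="hemoglobin A1c",
    entity_type="lab",
    modifiers={"value": "7.2", "unit": "%"},
)
procedure = ExtractedEntity(
    text="appendectomy",
    entity_type="procedure",
    modifiers={"approach": "laparoscopic"},
)

print(missing_new_modifiers(lab))
print(missing_new_modifiers(procedure))
```

A check like this can flag mentions where the note itself did not state a unit or reference range, which may limit how specific the resolved code can be.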

Improved handling of historical clinical information

The pipeline can now better distinguish between active and historical problems, procedures, and medications. This reduces false positives caused by past events being incorrectly asserted as current.

Improved handling of allergies to medications

Allergy-related mentions are more consistently recognized and extracted as problems instead of medications, reducing false positives caused by incorrect entity recognition.

Improved section identification for better structuring of extracted notes

Section header detection has been enhanced, allowing the pipeline to better identify and segment structured portions of clinical notes (such as “Past Medical History” or “Family History”). This improves accuracy and context in downstream entity and modifier extraction.

Accuracy Improvements in Version 3.0

CCP v3.0 shows measurable accuracy gains over the prior version in both lexical accuracy (classification of IMO default lexicals) and entity extraction accuracy (detecting and classifying named entities and modifiers in clinical text).

At the entity extraction level, accuracy for recognizing and extracting our core entities improved by the following increments. Note that accuracy is measured as F1 score, and each improvement is reported as a relative (percentage) increase over the prior version's score, not an absolute increase.

Procedure entities: accuracy improved by +57.5% compared to the prior version
Lab entities: accuracy improved by +37.7% compared to the prior version
Problem entities: accuracy improved by +12.3% compared to the prior version
Medication entities: accuracy improved by +7.7% compared to the prior version
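
Because the figures above are percentage increases rather than absolute increases, the same gain reads very differently depending on the baseline. The sketch below shows the arithmetic. The F1 values in it are hypothetical and chosen only for illustration; they are not CCP results, and the function names are not part of any CCP tooling.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement_pct(old_f1: float, new_f1: float) -> float:
    """Percentage increase of new_f1 over old_f1."""
    if old_f1 <= 0:
        raise ValueError("old_f1 must be positive")
    return (new_f1 - old_f1) / old_f1 * 100


def absolute_improvement_points(old_f1: float, new_f1: float) -> float:
    """Difference between the two scores, in percentage points."""
    return (new_f1 - old_f1) * 100


# Hypothetical scores, not taken from the CCP evaluation.
old_f1, new_f1 = 0.50, 0.60
print(f"relative: +{relative_improvement_pct(old_f1, new_f1):.1f}%")
print(f"absolute: +{absolute_improvement_points(old_f1, new_f1):.1f} points")
```

In this hypothetical case, moving from an F1 of 0.50 to 0.60 is a 20% relative improvement but only a 10-point absolute improvement.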

At the lexical level, accuracy for resolving our core entities to IMO lexicals changed by the following increments. Note that accuracy is measured as F1 score, and each change is reported as a relative (percentage) change from the prior version's score, not an absolute change.

Procedure entities: accuracy improved by +13.4% compared to the prior version
Lab entities: accuracy decreased by 3% compared to the prior version. This decrease is more reflective of the expanded scope of entity identification: the pipeline now extracts a broader and more precise set of lab entities that align with IMO's clinical domains.
Problem entities: accuracy improved by +4.8% compared to the prior version
Medication entities: accuracy improved by +7.2% compared to the prior version

How Was Accuracy Measured?

Accuracy was evaluated against a Gold Standard dataset, which was created and validated by a team of IMO Clinical Subject Matter Experts. The Gold Standard was designed to reflect a diverse range of real-world clinical documentation, and included the characteristics below:

Note types: History & Physical (H&P) notes, Progress Notes, Discharge Summaries, Procedure Reports, SOAP notes, and Operative Notes.
Formats: Typed, dictated, and transcribed electronic text notes.
Clinical settings: Data was sampled from both inpatient and outpatient emergency department settings.
Time frame: Data was sampled from years 2022–2024.
Geographic diversity: Data captured across both rural and suburban care settings.

Our clinical team of physicians, informaticists, and medical coders:

Reviewed and annotated the Gold Standard dataset to ensure consistent application of entity and modifier definitions.
Validated extraction results against established IMO lexicals, which resolve to terminologies and code systems such as ICD-10-CM, CPT, SNOMED CT, and RxNorm.
Conducted inter-annotator agreement checks to ensure consistency in clinical interpretation.
Iteratively refined annotation guidelines for training, to ensure coverage of edge cases.
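
The release notes do not state which statistic was used for the inter-annotator agreement checks. As one common option, the sketch below computes Cohen's kappa for two annotators labeling the same items. The function name and the sample labels are illustrative assumptions, not a description of IMO's actual process.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up annotations for four text spans.
annotator_1 = ["problem", "lab", "lab", "procedure"]
annotator_2 = ["problem", "lab", "procedure", "procedure"]
print(f"kappa = {cohens_kappa(annotator_1, annotator_2):.3f}")
```

Kappa corrects raw agreement for agreement expected by chance, which matters when one entity type dominates the dataset.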

Release Notes

Date	Version	Release Summary
08/18/2025	3.0.0	Significant accuracy improvements in entity extraction and modifier detection across the pipeline. FEATURE: Alignment with IMO’s clinical domains and external code systems. In the entity type schema, tests are changed to labs, drugs are changed to medications, and treatments are changed to procedures. New entity types: social factors and clinical observations. New modifiers for procedures and labs that provide higher specificity when resolving to standard codes. CHANGE: Improved precision in procedure and lab entity identification. Improved handling of historical clinical information to distinguish between active and past events. Improved handling of allergies to medications. Enhanced section identification for more accurate segmentation of clinical note text.
09/13/2023	2.0.0	Platform-level enhancements for security, performance and reliability.
	1.0.0	Initial release

Notices

SNOMED® and SNOMED CT® are registered trademarks of IHTSDO.

LOINC® is a registered United States trademark of Regenstrief Institute, Inc.

RxNorm is publicly available data courtesy of the U.S. National Library of Medicine (NLM), National Institutes of Health, Department of Health and Human Services.