NLP Tackles Unstructured Data

Reaching patient safety and quality goals requires a firm handle on data. Yet data reside in an array of systems, software and resources, and often are maintained in an unstructured format that is difficult to access and analyze. Natural language processing (NLP) comprises a robust toolset that can address these challenges.

NLP is a set of technologies that translate words (or other unstructured inputs) into data, using statistical algorithms to match components of language. Several pioneers are building on NLP's coding roots to develop a roadmap for future applications in healthcare.

NLP-driven documentation

In August 2011, the University of Pittsburgh Medical Center (UPMC) signed a 10-year joint development agreement with an NLP vendor with a goal of organizing and mining mountains of internal and external data. "When we looked at where we aspired to be—an accountable care organization that leverages the power of information—we realized we needed a good handle on the buckets of information at our disposal," says Rasu B. Shrestha, MD, MBA, vice president of medical IT and medical director of interoperability and imaging informatics at UPMC.

The initiative focuses on documentation from three perspectives (creation, storage and mining) and seeks to address key drivers in healthcare, specifically patient safety and cost. To date, other healthcare organizations have applied NLP in back-end processes, such as dictation and coding.

"Our vision is that medical intelligence will guide the physician to include the most thorough and accurate information in the notes," Shrestha says. He and his team are working against the typical workflow. "Incorporating NLP at the front end of the process can have a tremendous impact," he says. Shifting the emphasis to the front end lets the capabilities of NLP flow through the rest of the clinical process.

Other organizations, such as Adventist Health System (AHS) of Orlando, Fla., have tapped into NLP to optimize coding. Since AHS began integrating NLP at 24 sites in February 2010, its coders have improved their average processing times, which are now eight minutes better than the national average.

The NLP-driven process is more efficient, more complete and more accurate, says Migdalia Hernandez, RHIT, corporate health information management director. The NLP system is interfaced to the EMR and pulls relevant data, such as ulcers or arrhythmias, for presentation to coders.

AHS also worked with its vendor to develop rules to teach the system to review each of the more than 20 terms that physicians use as a heading for the diagnostic impression. Regardless of what term is used, the impression represents a wealth of coding data and must be reviewed.
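A rule of this kind can be pictured as a lookup that treats many heading variants as the same section. The Python sketch below is illustrative only and is not AHS's or its vendor's implementation: the synonym list, the `extract_impression` function and the assumption that headings appear as "Heading:" at the start of a line are all hypothetical, since the article does not list the more than 20 terms or describe the note layout.

```python
import re

# Hypothetical synonyms for the diagnostic impression heading. The real
# list has more than 20 terms and is not given in the article.
IMPRESSION_HEADINGS = {
    "impression",
    "diagnostic impression",
    "assessment",
    "diagnosis",
    "diagnoses",
    "final diagnosis",
    "conclusion",
}

# Assumed layout: a heading is a short run of letters followed by a colon.
HEADING_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z /&]*?)\s*:\s*(.*)$")


def extract_impression(note: str) -> str:
    """Return the text found under any heading that means 'impression'."""
    collected = []
    in_impression = False
    for line in note.splitlines():
        match = HEADING_LINE.match(line)
        if match:
            # Normalize case and spacing before looking up the heading.
            heading = " ".join(match.group(1).lower().split())
            in_impression = heading in IMPRESSION_HEADINGS
            rest = match.group(2).strip()
            if in_impression and rest:
                collected.append(rest)
        elif in_impression and line.strip():
            collected.append(line.strip())
    return " ".join(collected)


sample_note = (
    "HISTORY: Chest pain for two days.\n"
    "Assessment:\n"
    "Atrial fibrillation.\n"
    "Stage 2 pressure ulcer.\n"
    "PLAN: Start anticoagulation."
)
print(extract_impression(sample_note))
```

Because any line of the form "words:" is treated as a new heading, a sketch like this would need tuning against real notes before it could be trusted to catch every impression section.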

The value proposition

The most potent NLP applications may help physicians apply research in practice. UPMC aims to use NLP to mine its systems and records for information and evidence tied to a disease process or metrics. NLP, says Shrestha, organizes those data in a more user-friendly manner. "We realized a single technology stack could help us meet our goals. With a better starting point for information, the end results have improved."

Seton Healthcare Family, an Austin, Texas-based hospital system, is tackling a related challenge as it shifts from a hospital-based care model to a role as a healthcare delivery provider. "We are trying to increase the value of the care we deliver. That means connecting the delivery system and pulling together data from entities that aren't normally connected," says Ryan Leslie, vice president of analytics and health economics. That requires providers to examine data in the aggregate, as well as on a case-by-case basis.

Like UPMC, Seton is challenged by unstructured data. "Structured data only get us so far in trying to identify high-risk patient populations or gaps in care," says Leslie. The Seton team recognizes that data that could lead to better outcomes exist in physician notes and other unstructured sources. "An individual clinician might find the relevant information, but there is no easy way to analyze it in the aggregate." A physician might recognize that a single patient lacks a home-based caregiver, but identifying the entire group of at-risk patients is beyond the capacity of individual providers or traditional IT systems.

In general, 20 percent of EMR data are structured and 80 percent are unstructured. While it's easier to mine structured data, such as medications, the "golden nuggets" of information, such as ejection fraction, are often hidden away in an unstructured format in clinical notes.
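To make the ejection fraction example concrete, the Python sketch below pulls percentage values out of free-text notes with a regular expression. It is a minimal illustration under stated assumptions, not a description of any system named in this article: the pattern, the `extract_ejection_fractions` function and the sample note are hypothetical, and a production NLP engine would also have to handle negation, history versus current findings, and many more phrasings.

```python
import re
from typing import List

# Hypothetical pattern covering phrasings such as "EF 35%",
# "ejection fraction was 55 %" and "LVEF: 40-45%".
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b"     # the term itself
    r"[^\d%\n]{0,30}?"                         # short filler, e.g. " was "
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?"   # one value or a range
    r"\s*%",
    re.IGNORECASE,
)


def extract_ejection_fractions(note: str) -> List[float]:
    """Return each ejection fraction in the note, using range midpoints."""
    values = []
    for match in EF_PATTERN.finditer(note):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        values.append((low + high) / 2)
    return values


sample_note = "Echo today. LVEF: 40-45%. Prior ejection fraction was 55 %."
print(extract_ejection_fractions(sample_note))
```

Once values like these sit in a column next to structured fields, they can be aggregated and trended in the same way as medications or lab results.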

The problem is that traditional data analytics tools (aggregate views and trend reporting) don't work with unstructured data. Researchers need a system that can mine both unstructured and structured data. "When you have both data aggregation and data mining together, you can start making sense of structured and unstructured data to determine predictors of problems such as 30-day readmissions," says Leslie.

Seton plans to use NLP to expedite the research and learning process, stratify patients at the highest risk for readmissions and test published models in the local patient population.

Meanwhile, UPMC aims to link NLP with decision support and use the technology to wade through medical literature and feed evidence to clinicians at the point of care. High-quality, contextually relevant evidence, if presented intelligently at the point of care, can lead to more informed and appropriate care across a number of parameters, possibly including diagnoses, treatment options, tests to order and guidelines, says Shrestha.

Whether it's streamlining coding, capturing data or re-engineering workflow, NLP is poised to translate into better, more efficient, more effective healthcare.