Four data business archetypes in health care
About the opportunities and weaknesses of data source systems, routers, refiners, and aggregators
A key to unlocking health care savings is data. The data use cases range from detecting undiagnosed conditions in patients to evaluating new cancer drugs to optimizing hospital facility utilization.
But realizing this potential is no easy feat. Data is spread across different source systems and organizations, and raw data is usually not actionable. Thus, a whole ecosystem of companies has emerged to make health care data usable, and today I will look at the business archetypes that exist in this space. I will discuss their general business models, how they build economic moats, the weaknesses of their business models, and which growth strategies might work for them.
Data Source Systems: Owning the Inputs
In the beginning, there was a data point: every entry in a database has to be produced somewhere, whether by a sensor, by manual entry, or by one system writing into another. In most organizations, data is entered and produced by operational systems such as an ERP system or, in the health care context, the EHR.
The best examples of health care data source systems are EHR vendors and practice management systems. Device companies also generate raw data: WHOOP and Levels on the lifestyle product side, and manufacturers of medical equipment such as oximeters and other diagnostic devices. Claims and financial systems are the most prevalent data sources on the payer side.
Business Model & Challenges
Data source systems usually have a straightforward business model: they pursue either a SaaS strategy or an enterprise license model. They strive to become the system of record, which makes them very sticky; migrating away from the solution becomes almost impossible (it will most likely take five years and burn through at least two CIOs).
Two main challenges exist for data source system vendors. First, their solutions depend significantly on their sales force. Because migration cycles are long, there is never perfect competition between all solutions at any given time. Knowing when a customer is looking for a specific solution, and being present with the right relationships and arguments, is critical for the success of these enterprise deals. Second, many data source systems are not very differentiated. At their core, many operational systems are just databases with forms attached. Of course, they differentiate on how well their user interface matches workflow needs, but in the end, many of these systems are (in theory) quite interchangeable.
Once you are the system of record, a logical expansion opportunity is to upsell new modules on top of your data model. As the data source system provider knows the software best, it is not too hard for them to build modules that integrate seamlessly with the existing system of record. They can also open up their source system and allow third-party providers to build applications on top of their platform. This offers two advantages: it makes them stickier with their customers and provides financial upside via revenue shares.
Data source systems usually have some level of control over access to the data. This control can be direct (they control access to the database or hold IP rights to the data schema and decide who can use the data) or indirect (they own the knowledge of how to make the data usable). Depending on their control over the data, they can sell access to it and make it available to other organizations. This control of the data generation process is so essential that other data business archetypes build out their own data capture systems. The best example is Flatiron acquiring an EMR vendor - but more on that later.
Data Routers: Make Data Available for Use Cases
No organization exists in a vacuum, and organizations need to interact and transact with each other. Sharing data here is critical, and a whole group of companies allows organizations to collaborate by enabling data exchange. Some of these "data routers" just help with secure data transmission; others allow more complex workflows that include authorization or settlement mechanisms.
In general, data routers are agnostic about the data content - they apply some normalization, but usually they don't apply any business logic to the data. Great non-health-care examples include (the always mentioned) Plaid, which helps consumer fintech apps link to financial institutions, or Sabre and Amadeus, which have access to hotel and airline inventory and connect it to travel agents and booking platforms. In health care, various data routing businesses exist: clearinghouses such as Change Healthcare or Komodo, interoperability providers such as Particle and Health Gorilla, and payer-to-provider data exchanges like Flexpa and 1UpHealth.
Business Model & Challenges
Data routers can exhibit considerable network effects: if they are connected to a majority of players in a market, they can become the de facto standard for how data is exchanged and how organizations work together on a specific use case. A great example here is Surescripts, which owns the prescription workflow: 95% of all pharmacies and 70% of physicians use Surescripts to send and receive prescriptions. However, becoming this standard can also have adverse side effects. Once established, the data router might have little incentive to innovate; it might even be disincentivized to innovate because changes could break existing connections. This dynamic creates an opportunity for "on-ramp" businesses that allow easier access to these older standards and basically "innovate the user experience". If you want to learn more about these businesses in health care, definitely read this post from Brendan Keeler.
Data routers also have several weaknesses. First, their standard is subject to disruption. If the industry comes together and establishes a standard for peer-to-peer data exchange, data routers become obsolete (or much less powerful). So far, these standards have not prevailed in US health care, or they are driven by private/semi-private organizations with their own interests (like Epic, DirectTrust, and The Sequoia Project). Regulators could probably play a huge role in forcing the industry to adopt a single standard. Another dynamic could be innovations from Web3: a significant promise of Web3 is that crypto will replace many data transmission networks by using trusted protocols and cutting out the intermediaries.
The first goal for data routers should be achieving network dominance, which means sufficiently broad coverage among all organizations. This is easier said than done in health care: the first 50-60% of coverage is often easy because standards exist in the major source systems, but it gets incredibly tough in the long tail of home-grown data sources.
A second strategy is to enable more complex collaboration workflows. Currently, many interoperability companies serve a pretty simple use case: send one file from one party to another. However, similar to payment networks, they have the opportunity to build more complex workflows in real time. Examples are claims pre-adjudication, automated prior authorizations, and referral workflows.
Another growth route is adding an analytics layer to the routed data. By definition, data routers are agnostic about their data content. Still, because so much data passes through their network, they are in a great position to add analytics to their services: flagging outliers, transforming and normalizing certain clinical concepts, and calculating benchmarks. Adding such a layer would push them into the next bucket of data businesses.
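To make this concrete, here is a toy sketch of the kind of outlier flagging a data router could layer on top of claims it already routes. The billed amounts, the procedure-code framing, and the "2x the median" rule are all made up for illustration:

```python
import statistics

# Hypothetical billed amounts (USD) for the same procedure code,
# as they pass through the router's network.
claim_amounts = [120.0, 130.0, 125.0, 118.0, 122.0, 640.0]

def flag_outliers(amounts, multiple=2.0):
    """Flag amounts that exceed a multiple of the median.

    The median is robust to the outliers themselves, unlike the mean.
    """
    med = statistics.median(amounts)
    return [a for a in amounts if a > multiple * med]

print(flag_outliers(claim_amounts))  # [640.0]
```

A real implementation would segment by procedure code, region, and provider type before computing benchmarks, but the point is that the router sees the volume needed to compute them at all.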
Data Refiners: Turn Data into Insights and Actions!
Raw data is often not actionable, so organizations need to turn it into valuable insights and build operational workflows around it. This usually involves several steps:
Integrating data from different sources
Building out a data model
Calculating metrics and predictive measures
Serving it to frontend applications
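As a toy illustration, the four steps above might look like the following minimal Python sketch. All field names, codes, and the care-gap rule (diabetic patients overdue for an A1c test) are hypothetical, chosen only to show the shape of a refiner pipeline:

```python
from datetime import date

# 1. Integrate data from different sources (here: EHR + claims extracts).
ehr_records = [{"patient_id": "p1", "last_a1c_test": date(2021, 1, 15)}]
claims_records = [{"patient_id": "p1", "diagnosis_codes": ["E11.9"]}]  # type 2 diabetes

def build_patient_model(ehr, claims):
    """2. Build a unified data model keyed by patient."""
    model = {}
    for rec in ehr:
        model.setdefault(rec["patient_id"], {}).update(rec)
    for rec in claims:
        model.setdefault(rec["patient_id"], {}).update(rec)
    return model

def care_gaps(model, today):
    """3. Calculate a metric: diabetic patients overdue for an A1c test."""
    gaps = []
    for pid, rec in model.items():
        is_diabetic = any(c.startswith("E11") for c in rec.get("diagnosis_codes", []))
        overdue = (today - rec["last_a1c_test"]).days > 180
        if is_diabetic and overdue:
            gaps.append(pid)
    return gaps

# 4. Serve the result to a frontend (here: the dict an API would return).
model = build_patient_model(ehr_records, claims_records)
payload = {"care_gaps": care_gaps(model, today=date(2022, 1, 1))}
print(payload)  # {'care_gaps': ['p1']}
```

The encoded domain knowledge discussed below lives in steps 2 and 3: knowing which codes and time windows matter is exactly what data refiners sell.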
I am not going into detail here, as there are whole books about the proper data architecture organizations should employ and which software vendors they should use for each function. Well-known players in this space in health care are Innovaccer, Clarify, Arcadia, and Health Catalyst.
Business Model & Weaknesses
Most data refiners have a license model, i.e., they provide access to their software for a license fee (per user seat or as an enterprise license). These vendors become stickier the more central they are to an organization's core operational workflows or the more deeply they are embedded in its IT architecture. This is very similar to the data source system vendors. But there are other strategies they can employ to get an edge: data refiners often deliver encoded domain knowledge. For example, some vendors provide rules engines that can detect uncoded risk scores or care gaps. Other examples include pre-trained AI models or specific workflow frontends that codify industry best practices. This is why there are so many health-care-specific data platforms. Data refiners usually compete with two groups: point-solution providers that solve one specific problem really well (for example, Arcadia's risk adjustment module competes with Apixio's risk adjustment AI) and in-house IT departments that want to build workflows themselves using generic cloud tooling such as AWS and Google Cloud.
Similar to the data source systems, the foremost expansion opportunity for these companies is to increase the number of data use cases their platform supports. Ideally, they would extend from data consumption workflows, i.e., where users look at a dashboard or get a particular metric, to operational workflows, i.e., where users not only read data but also write data back into the platform. Achieving this turns the data refiner into a data source system, which makes it far stickier than a pure analytics solution.
Another strategy for data refiners is to become a data aggregator and not only reuse insights and best practices, but build an aggregated data asset. Let's look at this next.
Data Aggregators: Bring Data Together across Organizations
Data aggregators are a special case of data refiners. While most data refiners work within the boundaries of one organization, data aggregators bring data together from different sources across organizations. Usually, data aggregators don't own the data sources - but they can be super powerful if they do.
Once data aggregators obtain the data from the different parties, they prepare it, integrate it, and sell it to other organizations. They can sell access to the aggregated dataset on a record-by-record basis (identified or de-identified), or they can sell only the analytics, like benchmarks or trained models. There are many prominent examples from outside health care, including credit agencies, compensation benchmark providers like Pave or Levels.fyi, and consumer research companies like Nielsen (here is a great article describing data aggregator businesses in more detail). Data aggregators are also prevalent in health care, and several highly valued companies are built on this concept. Examples are Ribbon Health (provider data), Flatiron (cancer patient data), UpToDate (medical guidelines), and Apple Health (personal health and wellness data).
Business Model & Weaknesses
Data aggregators don't work in all markets. They usually work well if there are high data acquisition barriers:
Distributed data: Data needs to be distributed among different organizations for data aggregators to work. The more difficult it is for a single organization to collect and integrate the data, the more defensible the business will be.
Frequently changing data: Data can be distributed not only across organizations but also over time. Data aggregators work well if the information needs to be constantly refreshed and recollected.
Data value decays over time: This recollection dynamic is further reinforced if data loses value over time. For example, my credit score from 10 years ago might still have some meaning, but a recent score is much more useful to a lender.
Incentives for data owners to share the data: Data aggregators can only exist if they can access the source systems. Sometimes the source systems are "open," meaning data aggregators can scrape or download the data without any agreement with the data owners. They can also be "proprietary," meaning the data aggregator needs to establish an agreement with the data owner. Crafting the right incentives for an organization to share its data is critical here. Incentives range from monetary compensation and access to the combined data asset or benchmark to access to specific software tools and avoiding regulatory fines.
Contracting friction: For proprietary data sources, data aggregators usually need to set up a data-sharing agreement with the data owner. Negotiating the terms around what and how data is shared takes time and effort. The larger the organization and the more sensitive the data, the longer the contracting cycles.
If data aggregators reach enough scale, they can become quite defensible. The number of agreements with data source organizations becomes challenging to replicate, especially if the barriers to recreating the aggregated data asset are high. Replicating the data connections and sharing agreements might become too burdensome for a single data user or a potential competitor.
The pinnacle of defensibility is reached when a company becomes the source of truth for specific data points. If it becomes the de facto registry, data owners will themselves want to provide accurate data and keep it updated. Google Maps is an excellent non-health-care example: business owners want to make sure their address and business information are correct and current so that people can find them. Google does not need to lift a finger for this or provide any compensation (it can even charge for it).
Another interesting approach to data aggregation is open-source repositories, where organizations or individuals voluntarily share their knowledge or data. Nikhil from Out-of-Pocket started an interesting thread on this topic.
Examples of open-source data aggregators could include a list of digital health builder tools or of customers for value-based care enablement services (it is not so easy to find the right contact person at an ACO). If everyone benefits from the combined data, organizing some of these resources as open source makes sense. However, there are also free-rider dynamics at play: why should I do the work and contribute when I could just wait and let others do the job?
A significant weakness of data aggregator businesses is losing access to their data sources. As they often do not own the data, they rely on data source partners to build their services - or, if they use web scrapers, on a UI that can change at any time. A concentration of data sources among just a few organizations amplifies this risk. Another challenge is that without owning the data source, aggregators control neither data quality nor how data is captured. That is one of the main reasons Flatiron went into the EHR business and acquired Altos Solutions. The last weakness is that data acquisition barriers can fall. For example, the ONC rules about FHIR interoperability and payer-to-patient data access will require payers to make patient data available via a modern API. This will make it easier to pull data from certain organizations, which can undermine the business of companies that rely on exclusive relationships and data acquisition barriers.
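To illustrate why a standardized FHIR API lowers these barriers: once a payer exposes a standard Patient endpoint, extracting the data is little more than parsing a FHIR Bundle, with no bespoke sharing agreement or scraper needed. The sketch below parses a minimal, hand-written sample Bundle (no real server or real patients involved; the structure follows the FHIR R4 Bundle and Patient schemas):

```python
import json

# A minimal FHIR R4 searchset Bundle, as a payer API might return it
# from GET [base]/Patient. Hand-written sample data, not a real response.
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "pat-1",
                  "name": [{"family": "Doe", "given": ["Jane"]}]}},
    {"resource": {"resourceType": "Patient", "id": "pat-2",
                  "name": [{"family": "Roe", "given": ["John"]}]}}
  ]
}
"""

def patient_names(bundle):
    """Pull (id, display name) pairs out of a FHIR searchset Bundle."""
    names = []
    for entry in bundle.get("entry", []):
        res = entry["resource"]
        if res["resourceType"] != "Patient":
            continue  # a searchset may include other resource types
        name = res["name"][0]
        display = " ".join(name["given"]) + " " + name["family"]
        names.append((res["id"], display))
    return names

bundle = json.loads(bundle_json)
print(patient_names(bundle))  # [('pat-1', 'Jane Doe'), ('pat-2', 'John Roe')]
```

When every payer speaks this same schema, the integration work that used to justify an aggregator's existence shrinks to a few lines of parsing.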
Data aggregators can create a precious data asset that may be uniquely positioned to solve particular business problems. Instead of just selling access to the data, they can offer a solution powered by their data asset. For example, Google never sells consumer profiles, but it lets advertisers run targeted ads using its data.
Thoughts on Data Businesses
Here are some of my thoughts on these business models:
The power of data ownership: In all of the business models, the data source owner is probably the most powerful stakeholder in the value chain. Compare this to the oil market: if oil prices go up, the production companies benefit the most, not the pipeline or refinery owners. Because the data source owner can limit the data flow, they have the negotiating power to squeeze the benefits out of the intermediaries (data routers & data aggregators). Data routers and aggregators can avoid this squeeze by diversifying their data sources.
Privacy: No article about data businesses would be complete without mentioning privacy. Especially when handling patient data, privacy and data security need to be taken very seriously. Luckily we have the correct legal frameworks to share data in a secure and compliant way (the P in HIPAA stands for "Portability"). Unfortunately, many organizations are too afraid and use HIPAA more as an information-blocking tool than as an enabler of collaboration.
Data business vs. workflow business: A common theme among all these businesses is whether they should "just do the data work" or expand into higher-level workflows. Both approaches have advantages and disadvantages. Staying close to the data and spending time improving data capture, routing, refinement, and aggregation gives a business a lot of focus, and it can be a winning strategy. However, workflow businesses can capture more value and thus command higher prices.
Blending the models: The archetypes are often not as clear-cut as I present them here. Data refiners are trying to become data aggregators; data routers are trying to become refiners; data source owners are becoming refiners. Whoever manages to combine the different business models into one will build the next "Google of health care". However, I don't see this happening soon, as there is too much competition in each field - and that is probably a good thing.
Overall, I believe data businesses have a bright future in health care. Collaboration within and across organizations is rising, and data is crucial to enabling better outcomes and more efficient care. In particular, I see opportunities in payer-provider data exchange for value-based care organizations and in more opinionated workflows for care coordination & specialty referrals. Let me know if you're building something interesting in this space (my Twitter DMs are open!).
"The power of data ownership: In all of the business models, the data source owner is probably the most powerful stakeholder in the value chain."
Is that so? Why?
I don't think the source owner is always more powerful. Example: a highly competitive, commoditized market, e.g., coffee beans - and, to my knowledge, oil too. If there are lots of sellers (crude oil) and few buyers (refiners), you're not in a powerful position.
It seems to me it depends very much on the value of the data, and the available alternatives.
I don't think data ownership is a great advantage anymore - but I might be biased by my particular industry background (analytics).
"Luckily we have the correct legal frameworks to share data in a secure and compliant way."
Is that so? Why? Which of the criticisms are (not) valid?