Fluree Blog, Brian Platz, 10.26.23

Data Ecosystems: What They Are, Use Cases, and Building Your Own

A comprehensive exploration of data ecosystems.

Every person and thing moves, communicates, and connects within an ecosystem. Your home, for example, could be considered an ecosystem of appliances, family members, plumbing fixtures, waste disposal, and so on. A bank is an ecosystem of ATMs, credit cards, checking accounts, wealth management, loan servicing, and other services. Because so much of our lives is digitized, it is possible to monitor these movements in the form of data. This is called a data ecosystem.

A data ecosystem is a way of gaining insight into what’s already happening in order to create new opportunities. It’s not about collecting every imaginable piece of data and trying to understand everything at once. Rather, it’s about linking together key pieces of data from across the ecosystem to fit a new use case. The speed and volume of data available today, combined with regulatory and economic considerations, create urgency around building a data ecosystem. But where to begin?

In this two-part series, we’ll cover what a data ecosystem is, how organizations use data ecosystems in real life, how to think about your own, and the risks involved. This article covers the what and how of data ecosystems. The next article will show you how to mitigate security, privacy, and sharing risks.

Ecosystems Old and New

Information ecosystems are nothing new. What’s new is the ability to take in vast amounts of digital data, some of it unstructured, and run it through smart software to fuel a data ecosystem.

In the early 1900s, for example, companies like Ford and General Motors found success within manufacturing and services ecosystems. Steel mills, tire manufacturers, auto repair shops, road maintenance crews, and a slew of other industries communicated within the auto ecosystem to ensure that cars were built, maintained, and driven on well-paved roads. Information moved through phones, ledgers, letters, and so on.

The auto ecosystem catalyzed other new ecosystems. Real-estate developers masterminded suburbs as a desirable place for newly car-endowed families to live. Highways, bridges, and tunnels led to a new form of tourism (the road trip) and roadside attractions. A cultural ecosystem also emerged around cars. They became not only a way to get from point to point, but a statement about one’s personal preferences and social status.

When Bytes Bite

In the time of the first automobiles, information moved through in-person conversations, phone calls, telegrams, and letters. In the modern ecosystem, data moves in packets of bytes, emitted by online interactions and sensors and traveling by way of text messages, phone calls, emails, and so on.

These bytes are not, by themselves, complicated. The complication comes from the various architectures we’ve built to process bytes. We don’t use different forms of the US dollar for banking, credit cards, retail, groceries, and housing. A dollar is a dollar is a dollar; its value is collectively understood. Yet data is stored, labeled, transported, and applied in unique and application-specific ways. This has to do with the rush to digitize, and the various custom solutions built to facilitate digitalization. Our complex ways of dealing with data get in the way of turning it into a commodity for everyone’s benefit. 
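To make the labeling problem concrete, here is a minimal Python sketch in which two hypothetical systems (a CRM and a billing tool) store the same customer under different field names and formats. Mapping both onto one shared set of field names is one simple way to make the records comparable. The system names, fields, and mappings are illustrative assumptions, not part of any particular product.

```python
# Hypothetical records: two systems describe the same customer with different labels.
crm_record = {"CustName": "Ada Lovelace", "EMail": "ADA@Example.com ", "Tel": "555-0100"}
billing_record = {"customer_full_name": "Ada Lovelace", "email_address": "ada@example.com"}

# Hypothetical per-source mappings from application-specific labels to shared field names.
FIELD_MAPS = {
    "crm": {"CustName": "name", "EMail": "email", "Tel": "phone"},
    "billing": {"customer_full_name": "name", "email_address": "email"},
}


def to_shared(source: str, record: dict) -> dict:
    """Rename a record's fields to the shared vocabulary, dropping unmapped fields."""
    mapping = FIELD_MAPS[source]
    shared = {mapping[key]: value for key, value in record.items() if key in mapping}
    if "email" in shared:
        # Normalize case and surrounding whitespace so equal addresses compare equal.
        shared["email"] = shared["email"].strip().lower()
    return shared


crm = to_shared("crm", crm_record)
billing = to_shared("billing", billing_record)
print(crm["email"] == billing["email"])  # True
```

Once both records use the same labels and normalized values, they can be joined or compared directly. That is the kind of shared understanding the dollar analogy points to.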

Different data architectures have evolved based on use case and budget. Marketing collects, processes, stores, and integrates data very differently from DevOps, finance, or customer support. Instead of sharing data, these ecosystems rest beneath the organizational umbrella like separate islands.

Even if you’re a sole proprietorship with cloud storage and website analytics, you are plugged into a rudimentary data ecosystem. The point is to think about how you can make it yours, employing it for new areas of innovation, service, and cost savings. When you find a use case that links data together in new ways across the organization, you begin to build a bigger data ecosystem.

The New Urgency Around Data Ecosystems

Recently, several factors have increased the pressure to leverage a data ecosystem. One comes from regulation. GDPR, healthcare interoperability requirements, and other regulations are pushing organizations to pay closer attention to their organization-wide data ecosystem. Supply-chain problems lingering beyond the Covid-19 pandemic have compelled leaders to gain a view into data ecosystems to pinpoint shortages and revenue leaks. Inflation is another factor. Industries like utilities and aviation use data ecosystems to optimize pricing and usage models. 

The resurgence of AI adds urgency. Many organizations are wondering how best to use foundation models like the ones behind ChatGPT. Building a comparable model of your own costs hundreds of millions of dollars, which limits development to the best-funded corporations, but anyone can fine-tune an existing model to fit their own needs. For example, a salon chain might fine-tune a model to create a salon-specific scheduling bot that reflects the company’s tone, brand, and policies rather than sounding generic. A tech company could fine-tune a model to detect cybersecurity threats specific to its infrastructure.

If you lack visibility into your data sources, quality, and pipeline, you can’t tune the AI model until you clean everything up, nor can the model continue to learn. If your data security, privacy, and compliance are questionable, your model could be fed compromised data. AI is thus another catalyst for building an organization-wide data ecosystem. But where do you begin?

Data Ecosystems in Real Life

It would be impractical to wire together an entire organization-wide data ecosystem at once. Of the many combinations of storage, data processing, and other components that make up your broader ecosystem, you need to prioritize in order to stay on budget and actually make use of that glut of data. Think of a use case that aligns with your business goals. Audit what’s possible, then build a proof of concept from there. 

Fluree CEO Brian Platz recently covered the most common use cases for data ecosystems in Forbes: 

  • Save money. Optimize supply chains, staunch revenue leaks, predict resource usage, and reduce waste.
  • Create a new line of business. Build a data ecosystem and sell access to it, or mine a data ecosystem to make more accurate predictions and sell those predictions as a service.
  • Predictive modeling. Harness accurate predictive models to accomplish a goal. For example, HiLo, a joint initiative in the maritime industry, has made significant strides in reducing accidents. Alternatively, take the business intelligence approach: digitize services on a massive scale and then present the data in a way that aids decision-making.
  • Accelerate R&D. Use a data ecosystem to speed up research and development, for example in the search for new pharmaceutical compounds.
  • Tune or build AI models. If you have exclusive access to specific data types or combinations and deploy an AI model on top of that data, you can unlock insights and opportunities for new business ventures. For instance, if you alone have data on hair-brushing routines, you are uniquely positioned to engineer a more efficient hairbrush or to offer a predictive solution to hair care companies.

Does one of these use cases fit your business goals? If so—or even if you haven’t yet found a use case—know that there are discrete steps you can take to begin to build your own data ecosystem, one use case at a time.

How to Build Your Own Data Ecosystem

There is a systematic approach to identify, evaluate, and implement a data ecosystem. Here’s a step-by-step guide. 

1.    Determine Core Business Goals and Challenges

In a previous blog post on data hoarding, I described how, in the mid-2000s, “IT suggested copying all relevant data into a single data warehouse, where it would be easier to pull and analyze.” This led to an excess of data sitting in data warehouses. The lesson: don’t store everything you can; it gets messy and expensive.

Data collection is not one-size-fits-all. Instead of setting up systems just to collect data, be strategic. Which data should you be collecting, and why? The data you collect should align with your company’s business goals and challenges. For example, if you’re in retail, you might prioritize data that supports inventory optimization or customer segmentation. If you’re running a mobile app, on the other hand, you might focus on device information, payment information, and usage data instead.

Think about the established use cases above. Do you most urgently want to save money, create a new line of business, build predictive models, accelerate R&D, or tune or build AI models? What are other companies in your industry doing? What do the domain experts in your own organization need to address their pain points? Your answers should yield a short list of best fits. The next step is to validate these best fits against the volume, quality, and type of data you are already collecting.

2.    Assess & Prioritize Data

What data do you currently collect and store? Compile a comprehensive list of all the data sources your organization uses, both internal and external. This includes databases, spreadsheets, cloud storage, third-party data providers, IoT devices, and any other repositories where you collect and store data. 
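A plain spreadsheet works for this list, but keeping it as structured records makes it easy to query later in the audit. The Python sketch below shows one hypothetical shape for such a register. The field names, example sources, and volumes are made up for illustration, and the format categories follow the structured, semi-structured, and unstructured split used in the audit.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Format(Enum):
    STRUCTURED = "structured"            # e.g., relational database tables
    SEMI_STRUCTURED = "semi-structured"  # e.g., JSON or XML
    UNSTRUCTURED = "unstructured"        # e.g., text documents


@dataclass
class DataSource:
    name: str
    kind: str             # database, spreadsheet, cloud storage, third-party, IoT, ...
    internal: bool        # False for sources supplied by outside providers
    fmt: Format
    records_per_day: int  # rough volume estimate


# Hypothetical entries; replace with your organization's actual sources.
inventory = [
    DataSource("orders_db", "database", True, Format.STRUCTURED, 50_000),
    DataSource("sensor_feed", "IoT", True, Format.SEMI_STRUCTURED, 2_000_000),
    DataSource("support_tickets", "cloud storage", True, Format.UNSTRUCTURED, 1_200),
    DataSource("market_prices", "third-party", False, Format.SEMI_STRUCTURED, 300),
]

# Simple questions the register can answer.
sources_by_format = Counter(src.fmt.value for src in inventory)
external_sources = [src.name for src in inventory if not src.internal]
print(dict(sources_by_format))
print(external_sources)  # ['market_prices']
```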

Once you’ve answered that question, create a data inventory that catalogs the types of data each source contains. Categorize the data into structured (e.g., databases), semi-structured (e.g., JSON or XML), and unstructured (e.g., text documents) formats. Note the frequency and volume of data generated or collected from each source. Assess the quality of your data. Evaluate data accuracy, completeness, consistency, and timeliness. Identify any data anomalies, duplicates, or errors that need to be addressed.
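Some of these quality checks are easy to automate. The hypothetical `profile` helper below computes three rough signals over a list of records: completeness (the share of records with every required field filled), duplicate keys, and stale records older than a chosen cutoff. Accuracy and consistency usually require reference data or cross-source comparison, so they are not covered here. The field names and the one-year cutoff are assumptions.

```python
from datetime import datetime, timedelta


def profile(records, required, key, updated_field, max_age, now):
    """Rough quality signals for a list of dict records (hypothetical helper)."""
    total = len(records)
    # A record is complete if every required field is present and non-empty.
    complete = sum(all(rec.get(f) not in (None, "") for f in required) for rec in records)
    keys = [rec.get(key) for rec in records]
    # Each repeated key beyond its first occurrence counts as one duplicate.
    duplicate_keys = len(keys) - len(set(keys))
    # Records whose last update is older than max_age are considered stale.
    stale = sum(now - rec[updated_field] > max_age for rec in records)
    return {
        "completeness": complete / total if total else 1.0,
        "duplicate_keys": duplicate_keys,
        "stale_records": stale,
    }


now = datetime(2023, 10, 26)
records = [
    {"id": 1, "email": "a@example.com", "updated": now - timedelta(days=1)},
    {"id": 2, "email": "", "updated": now - timedelta(days=3)},
    {"id": 2, "email": "b@example.com", "updated": now - timedelta(days=400)},
]
report = profile(records, ["id", "email"], "id", "updated", timedelta(days=365), now)
print(report["duplicate_keys"], report["stale_records"])  # 1 1
```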

Review the security measures in place to protect your data. Is sensitive data encrypted, and are your access controls robust? Is data regularly backed up to guard against loss? Who owns and has access to each data source? Do you have defined rights and permissions that ensure data security and compliance with privacy regulations like GDPR or HIPAA? Your data audit should clarify which data sources you should leverage first. If your sensitive data is not secured, it is not ready to be part of a data ecosystem.
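The last point can be turned into an explicit gate. This is a minimal Python sketch that assumes you record a few security attributes for each source during the audit. The attribute names and rules are illustrative, and an actual review against GDPR or HIPAA involves far more than these checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceSecurity:
    name: str
    owner: Optional[str]      # accountable person or team, if known
    contains_sensitive: bool  # e.g., personal or health data
    encrypted: bool
    access_controlled: bool
    backed_up: bool


def readiness_issues(src: SourceSecurity) -> List[str]:
    """List the reasons a source is not yet ready to join the data ecosystem."""
    issues = []
    if not src.owner:
        issues.append("no owner")
    if not src.backed_up:
        issues.append("no backups")
    if src.contains_sensitive and not src.encrypted:
        issues.append("sensitive data not encrypted")
    if src.contains_sensitive and not src.access_controlled:
        issues.append("sensitive data lacks access controls")
    return issues


# Hypothetical audit results.
audit = [
    SourceSecurity("orders_db", "sales-eng", True, True, True, True),
    SourceSecurity("patient_notes", None, True, False, True, True),
    SourceSecurity("market_prices", "analytics", False, False, False, True),
]

ready = [src.name for src in audit if not readiness_issues(src)]
print(ready)  # ['orders_db', 'market_prices']
```

Non-sensitive sources only need an owner and backups in this sketch. Sensitive ones must also be encrypted and access-controlled before they count as ready.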

3.    Select a Use Case

Now that you have an idea of the state of your data, think about how to match the data to your most urgent and important use cases. Data will to some extent inform what you can do. What additional data sources and technologies will you need to achieve your use case? Is it within reach? Cross off use cases that are too far out of reach, and focus on the one you can build now. 
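One way to make this matching step explicit is a simple gap analysis. List the data each candidate use case would need, subtract what you already have, and rank the candidates by how much is missing. The Python sketch below uses hypothetical use cases and source names. It ranks only by the number of missing sources, which is a crude proxy, because cost, business priority, and technology gaps also belong in the decision.

```python
# Sources the audit found to be available and ready (hypothetical).
available = {"orders_db", "sensor_feed", "support_tickets"}

# Data each candidate use case would need (hypothetical).
candidates = {
    "inventory optimization": {"orders_db", "sensor_feed", "supplier_lead_times"},
    "churn prediction": {"orders_db", "support_tickets"},
    "new data product": {"orders_db", "market_prices", "partner_data"},
}


def missing(use_case: str) -> set:
    """Data sources the use case needs but the organization does not yet have."""
    return candidates[use_case] - available


# Fewest gaps first; ties broken alphabetically for a stable order.
ranked = sorted(candidates, key=lambda uc: (len(missing(uc)), uc))
for uc in ranked:
    print(f"{uc}: missing {sorted(missing(uc))}")

buildable_now = [uc for uc in ranked if not missing(uc)]
print(buildable_now)  # ['churn prediction']
```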

Write down KPIs that your proof of concept will measure. Identify stakeholders that should be part of the proof of concept, such as data engineers, data scientists, analysts, and business stakeholders. Ensure everyone understands their roles and responsibilities. Now you’re ready to put together a proof of concept of your use case.
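KPIs are easier to review later if each one is written down with a baseline, a target, and a direction. The Python sketch below shows one hypothetical way to record KPIs and measure progress against them. The KPI names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class KPI:
    name: str
    baseline: float
    target: float
    higher_is_better: bool = True

    def met(self, observed: float) -> bool:
        """True once the observed value reaches the target in the right direction."""
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target

    def progress(self, observed: float) -> float:
        """Fraction of the baseline-to-target gap closed so far (may exceed 1 or be negative)."""
        gap = self.target - self.baseline
        return (observed - self.baseline) / gap if gap else 1.0


# Hypothetical KPIs for a proof of concept.
stockouts = KPI("stockout rate (%)", baseline=8.0, target=5.0, higher_is_better=False)
match_rate = KPI("records matched across sources (%)", baseline=60.0, target=90.0)

print(stockouts.progress(6.5), stockouts.met(6.5))  # 0.5 False
print(match_rate.met(92.0))  # True
```

Because `progress` divides by the signed gap, it works for both directions. For a "lower is better" KPI, the numerator and denominator are both negative while the value improves.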

4.    Build a Proof of Concept 

To test the waters, develop a small-scale project to demonstrate the feasibility and benefits of your data ecosystem. Pick the data sources, storage, data integration platforms, and visualization platforms that you will use (for the least risk, use ones that are already within your organization instead of buying new software). 

Next, extract, transform, and load (ETL) data into your chosen storage. You can use AI at this stage. Fluree Sense, for example, uses supervised machine learning, trained on labeled data, to uncover patterns in existing data and predict them in new, unseen data. This lets you quickly identify and match records of the same entity type from diverse sources, even when the records contain minor discrepancies.
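Fluree Sense’s own models are not shown here. As a much simpler stand-in for the same idea, the Python sketch below learns a single similarity cutoff from a handful of labeled record pairs and then applies that cutoff to new pairs. It uses the standard library’s `difflib.SequenceMatcher` as the similarity measure. The names are made up, and a production matcher would use richer features and far more training data.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    return " ".join(name.lower().replace(".", "").replace(",", "").split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fit_threshold(labeled_pairs):
    """Choose the cutoff that classifies the labeled pairs most accurately."""
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]

    def correct(cutoff):
        return sum((score >= cutoff) == is_match for score, is_match in scored)

    # Only the observed scores need to be tried as candidate cutoffs.
    return max(sorted({score for score, _ in scored}), key=correct)


# Hypothetical training data: (record from source A, record from source B, same entity?)
training = [
    ("Acme Corp.", "ACME Corp", True),
    ("Globex Inc", "Globex, Inc.", True),
    ("Jon Smith", "John Smith", True),
    ("Jon Smith", "Jon Smithson", False),
    ("Acme Corp", "Globex Inc", False),
]

cutoff = fit_threshold(training)


def is_match(a: str, b: str) -> bool:
    return similarity(a, b) >= cutoff


print(is_match("Jane Doe", "Jane  Doe."))                  # True
print(is_match("Globex Corporation", "Acme Corporation"))  # False
```

The labeled pairs are what make this "supervised": the cutoff is chosen to reproduce the human labels rather than hand-picked. In this example, the hard negative ("Jon Smithson") pushes the cutoff above its similarity score.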

Process and analyze your data to serve your use case. If there’s sensitive data, implement governance policies, access controls, and security measures. Test the performance and scalability of your data ecosystem under realistic workloads, and identify and address any bottlenecks or issues. Set up monitoring and logging to track the health and performance of your data ecosystem in real time.
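As one small example of an access control, the Python sketch below masks sensitive fields unless the requesting role is allowed to see them, denies unknown roles outright, and logs each access so it can be monitored. The role names and field list are hypothetical. In practice, policies like these are often enforced by the database or data platform rather than in application code.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data_access")

# Hypothetical policy: which fields are sensitive and which roles may see them.
SENSITIVE_FIELDS = {"email", "phone", "diagnosis"}
ROLE_SEES_SENSITIVE = {"data_steward": True, "analyst": False}


def read_record(record: dict, role: str) -> dict:
    """Return a copy of the record with sensitive fields masked for restricted roles."""
    if role not in ROLE_SEES_SENSITIVE:
        # Deny by default: roles not named in the policy get nothing.
        log.warning("denied access for unknown role %r", role)
        raise PermissionError(f"unknown role: {role}")
    log.info("role %r read record %r", role, record.get("id"))
    if ROLE_SEES_SENSITIVE[role]:
        return dict(record)
    return {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in record.items()}


patient = {"id": 7, "region": "north", "diagnosis": "J45", "email": "p@example.com"}
print(read_record(patient, "analyst"))
# {'id': 7, 'region': 'north', 'diagnosis': '***', 'email': '***'}
```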

Regularly review and evaluate progress against the defined objectives and KPIs. Adjust as necessary to meet your goals. Collect feedback from all stakeholders involved. Use this feedback to fine-tune your data ecosystem and address any issues. Create a final report summarizing the results of the proof of concept, including insights gained, challenges faced, and recommendations for future steps. Based on the proof of concept’s results and feedback, make an informed decision on whether to proceed with the full-scale implementation of your data ecosystem or make necessary adjustments and conduct further testing.

An Ongoing Evolution

Running a proof of concept for a data ecosystem can be a complex process. Once you validate your ideas and technologies, your organization will be able to move to a production-level deployment of a data ecosystem. Key pieces of data across your organization will be linked together to serve a use case that matches your business goals and capabilities. You’ll be one step closer to data-driven transformation. 

In the next post in this two-part series, we’ll return to the proof of concept to cover the risks you need to consider when building a data ecosystem: data security, privacy, and sharing, including sharing outside your own ecosystem.