Fluree Blog | Kevin Doubleday | 09.19.23

Making Data FAIR(EST)

Extending the FAIR principles to enable trusted collaboration and instant composability in data.

In 2020, we published a blog post on the FAIR principles for data management.

As a quick recap, the FAIR principles of data management emphasize Findability, Accessibility, Interoperability, and Reusability of data. These principles were outlined in a 2016 article published in Scientific Data as a way to curb silos and promote broader collaboration within the academic community. Here’s a quick summary of the FAIR data principles:

  • Findable: data and metadata are richly described and assigned persistent, unique identifiers so that humans and machines can locate them.
  • Accessible: data can be retrieved through standardized, open protocols, with metadata remaining available even when the data itself is restricted.
  • Interoperable: data uses shared vocabularies and formal, broadly applicable representations so it can be combined with other data and tools.
  • Reusable: data carries clear usage licenses and detailed provenance so it can be confidently reused in new contexts.

Read more here: https://flur.ee/fluree-blog/making-data-fair/ 

What does FAIR have to do with enterprise data? 

Like scholarly data, enterprise information today is locked in silos and rarely reused, integrated, or leveraged in any meaningful way beyond its original purpose.

We call this forever-lost information “dark data.” According to CIO magazine, 40% to 90% of enterprise data is considered “dark”, depending on the industry.

Even the information that is potentially available for reuse through extraction methods such as ETL, APIs, or data clouds is often of such poor quality that it is unusable without expensive data engineering to normalize it. We call this information “dirty data.” Experian recently reported that, on average, U.S. organizations believe 32 percent of their data is inaccurate. The correlated impact is equally staggering: in the U.S. alone, bad data costs businesses $3 trillion per year.


Dirty and lost data are commonplace at organizations of all sizes, resulting in lost time and money. 

Behind these problems are the broken promises of “Big Data”: the illusion, sold and perpetuated for years, that an abundance of data and some fancy analytics tools would unlock a new era of knowledge discovery and decision-making. In practice, big data initiatives required substantial investments in technology, infrastructure, and personnel, and the time and effort needed to integrate disparate data sources and ensure data quality often outweighed the potential gains. In many cases, the costs exceeded the benefits, leaving organizations disillusioned.

This is why Gartner recently predicted that 70% of organizations will shift their focus from Big Data to “Small and Wide” data, emphasizing the importance of high-quality, linked information over high quantities of low-quality data. Brian Platz, CEO of Fluree, covered this idea in 2022 with a Forbes opinion piece entitled How Small Data Is Fulfilling The Big Data Promise.

What does this have to do with FAIR? The FAIR principles provide an agnostic but prescriptive framework for making data high quality, accessible, and reusable so that all stakeholders along the data value chain can glean insights with minimal friction. Applying the FAIR principles as a governance framework can help organizations reduce the risk of dark or dirty data. 


Adding “EST” 

Today, we are making the case to extend the FAIR principles to include notions of extensibility, security, and trust. While FAIR provides an excellent framework for open data reuse, these three additions contribute guidance for organizations looking to treat data as a strategic, collaborative asset across boundaries. 

Data-centricity is the ethos driving these additional principles. In a data-centric architecture, many of the capabilities required to share information across applications, business units, or organizational boundaries are pushed from the application layer (software) down to the data layer (database). Organizations moving in this direction strip away the layers of middleware and single-purpose software built to handle interoperability, trust, security, and sharing, and instead adopt model-driven architectures in which composability and collaboration come out of the box.

Let’s dive into the proposed appendix to FAIR(EST): 

E – Extensibility

Data extensibility involves the capacity to expand a data model dynamically for additional capabilities while preserving its original structure and meaning. 

In a data-centric architecture, data is the central product, while “agents” such as applications, data science workflows, or machine learning systems interact with and revolve around this core of interconnected data. This requires the data to be useful in a variety of contexts, and, importantly, freed from proprietary formatting. 

By leveraging standardized vocabularies, expressed as semantic ontologies, data producers and consumers can extend data’s potential value and applicability across operational and analytical contexts. Relational systems are rigid and often proprietary; the opposite is true for semantic graph databases, which are flexible, extensible, and built on open standards.

Extensibility through open semantics standards allows data models to grow and adapt as new analytical, operational, or regulatory data requirements emerge. This saves time and resources, as organizations can extend data models as needed instead of creating entirely new ones or investing in a mess of ETLs and disconnected data lakes. 
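To make this concrete, here is a minimal sketch of extensibility in practice, using plain Python dictionaries shaped like JSON-LD records. The schema.org vocabulary terms are real; the record identifier, field choices, and the `ex:` namespace are illustrative assumptions, not a prescribed Fluree schema.

```python
# A record described with an open, shared vocabulary (schema.org).
original_record = {
    "@context": {
        "schema": "https://schema.org/",
        "ex": "https://example.org/",   # illustrative namespace
    },
    "@id": "ex:employee-42",
    "@type": "schema:Person",
    "schema:name": "Ada Lovelace",
}

# Later, a new analytical requirement appears: job titles and department
# links. Because the model is built on an open vocabulary, we extend the
# record in place instead of migrating to an entirely new schema.
extended_record = {
    **original_record,
    "schema:jobTitle": "Data Engineer",
    "schema:worksFor": {"@id": "ex:department-analytics"},
}

# Consumers that only understand the original fields keep working;
# they simply ignore the terms they do not recognize.
print(extended_record["schema:name"])      # original meaning preserved
print(extended_record["schema:jobTitle"])  # new capability added
```

The design point is that the original structure and meaning survive the extension: existing queries and integrations continue to resolve the same terms, while new consumers can pick up the added ones.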



S – Security 

While FAIR alone provides an excellent framework for interoperability and usability in spaces where data is an open resource, enterprise ecosystems often operate within closed or hybrid boundaries where privacy, compliance, and competitive advantage are key factors. In 2020, we presented on data-centric security at the DCAF (Data-Centric Architecture Forum), making the case for functions related to identity and access management to be accomplished at the data layer, as data.  In a data-centric security framework, security policies and protocols are defined and enforced at the data layer, rather than deferred to a server, application, or network. 

We call this “data defending itself.” Security is baked in, and thus inseparable from the data it protects. Using these powerful embedded operations, we can turn business logic into enforced policies that travel alongside the data, forever. 

Enabling data-centric security makes data more immediately and freely available. Within this framework, data sets can be queried directly by virtually anyone, without moving them or building purpose-specific APIs that abstract certain elements; results are filtered according to the established rules associated with the requesting user’s identity.
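As an illustration, the sketch below shows “policy as data” in miniature, in Python. The record shape, role names, and policy structure are hypothetical simplifications standing in for the richer policy languages a data-centric platform would provide; the point is that the rule lives next to the data and filters every query by identity, not that this is any particular product’s syntax.

```python
# Records and the policy that protects them live together as data.
records = [
    {"id": 1, "owner": "alice", "salary": 120_000, "title": "Engineer"},
    {"id": 2, "owner": "bob",   "salary": 95_000,  "title": "Analyst"},
]

policy = {
    "restricted_fields": ["salary"],
    "allow_restricted_roles": {"hr"},   # roles that may see restricted fields
}

def query(records, user, roles):
    """Return records with restricted fields removed unless the
    requesting identity (role or record owner) is allowed to see them."""
    results = []
    for record in records:
        allowed = bool(roles & policy["allow_restricted_roles"]) or record["owner"] == user
        if allowed:
            results.append(dict(record))
        else:
            results.append({k: v for k, v in record.items()
                            if k not in policy["restricted_fields"]})
    return results

# Anyone can issue the same query; what comes back is filtered by the
# rules tied to their identity, not by a bespoke API per consumer.
print(query(records, user="alice", roles={"engineering"}))  # sees own salary only
print(query(records, user="carol", roles={"hr"}))           # sees all salaries
```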

Read more on data-centric security here



T – Trust

We make the case in our data-centric series that in order for data to be effectively shared across trust boundaries, it must have digital authenticity inherently built in. If we are to share data dynamically across departments, partners, and other stakeholders, that information should come pre-packaged with proof of where it came from and that it did in fact originate from a trusted source. 

A few elements make up what we call “trust” when it comes to information: 

  • Data Provenance: We can prove who (or what) originated data, ensuring it came from an authoritative source.
  • Data Lineage & Traceability: We have comprehensive visibility into the complete path of changes to data: who has accessed or updated a piece of data in a system, when, and under what circumstances.
  • Data Integrity: We can detect data tampering at any level of granularity (down to the pixel of an image or a single letter in a text document).
  • Identity Management: We can control and prove the identity (user or machine) associated with any of the above events (data origination, changes, access).
  • Proof: We (humans or machines) can prove all of the above using standard mathematics and cryptography, taking “Triple A” (Authentication, Authorization, and Audit) to the next level.
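
To make the proof element concrete, here is a minimal Python sketch of a tamper-evident hash chain, assuming SHA-256 and a simplified commit structure of our own invention. Production systems (including ledger-backed databases) add digital signatures to bind each commit to an identity; the record shapes here are illustrative only.

```python
import hashlib
import json

def commit(prev_hash: str, author: str, change: dict) -> dict:
    """Record a change along with who made it and a hash that
    binds it to the entire history before it."""
    body = {"prev": prev_hash, "author": author, "change": change}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify(chain: list) -> bool:
    """Detect tampering anywhere in the chain: any edited change or
    author invalidates that entry's hash and every link after it."""
    prev = "genesis"
    for entry in chain:
        body = {"prev": entry["prev"], "author": entry["author"], "change": entry["change"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True

chain = [commit("genesis", "alice", {"set": {"status": "draft"}})]
chain.append(commit(chain[-1]["hash"], "bob", {"set": {"status": "approved"}}))

print(verify(chain))            # True: provenance, lineage, and integrity hold
chain[0]["change"]["set"]["status"] = "rejected"
print(verify(chain))            # False: a single altered value is detected
```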

As more and more data consumers are unleashed onto enterprise data (*clears throat* AI), the imperative to ensure the digital authenticity of information becomes critical. This is a data quality, risk, and operational issue that cannot be put off until tomorrow.

Read more about data-centric trust here

Conclusion 

FAIR(EST) can be taken as a high-level framework to move your organization away from legacy data infrastructure. Modeling FAIR data future-proofs that information for re-use. Adding extensibility, security, and trust to your data enables true composability across boundaries without expensive data engineering or data security issues. 

The result might closely align with what our industry calls “data mesh”, a lightweight infrastructure for decentralized sources to collaborate on data across technological and organizational boundaries. However you might define it, the path to FAIR(EST) is a technological and cultural journey worth hiking.

We will cover how organizations can take their first (second and third) steps to achieve data-centricity in our upcoming webinar with Dave McComb from Semantic Arts. Check it out here: Data-Centric Strategies to Break the Cycle of Legacy Addiction

What other additions would you make to the FAIR principles? Email us at [email protected].