Welcome to part 1 of Fluree’s Data-Centric Series, where we peel back the layers of the data-centric architecture stack to explain its relevance to today’s enterprise technologies and shifting digital industry needs. At its core, Fluree is built around the concept that for data to be increasingly available, interoperable, and fundamentally usable, it must be backed with total provenance, security, and trust.
This first installment focuses on data “trust” and its impact across the broad spectrum of enterprise data management. Organizations are increasingly describing themselves as “data-driven,” which raises a few questions: what data is being used to “drive” the organization forward? Where did it come from? Can it be trusted? Could it have been manipulated by a third party, an admin, a SaaS middleware, or a central authority? Who can access it? Who has accessed, copied, or distributed it? What is the entire CRUD history?
What does “Trust” mean for data?
As elusive as the term can be in a technical setting, we can agree that information being deployed in an operational or analytical setting should have a high degree of “trust.”
Here’s a checklist for what “trust” means to us at Fluree:
- Data Provenance – We can prove not only who (or what) originated data, but also the complete lineage of who (or what) read or modified the information and under what circumstances.
- Data Traceability – We have comprehensive visibility into the complete path of changes to data: who has accessed or updated a piece of data in a system, when, and under what circumstances.
- Data Integrity – We can detect data tampering at any level of granularity (down to the pixel of an image or a single letter in a text document).
- Identity Management – We can control and prove the identity (user or machine) associated with any of the above events (data origination, changes, access).
- Proof – We (humans or machines) can prove the above criteria using standard math and cryptography, taking “Triple A” (Authentication, Authorization, and Audit) to the next level.
Summed up, trusting your data means you have complete control over and visibility into how your organization’s data is created, accessed and used across its entire lifecycle. Data-Centric trust involves securing CRUD operations across all data assets, managing attack surface visibility, and laying the foundation for complete “data audit trails.” In turn, data-centric trust allows the organization to scale security and compliance alongside new digital innovations.
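The Data Integrity item in the checklist above can be illustrated with a cryptographic fingerprint. The following is a minimal sketch, not Fluree's implementation: it assumes a digest is recorded when the data is created and kept somewhere the data's editor cannot change. The helper names `fingerprint` and `is_intact` and the sample invoice are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA3-256 digest of the data as a hex string."""
    return hashlib.sha3_256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Check the data against a digest recorded earlier."""
    return fingerprint(data) == expected


# Hypothetical record; the digest is stored when the data is created.
original = b"Invoice 1001: pay 250.00 USD to ACME"
recorded = fingerprint(original)

# A single character has been changed.
tampered = b"Invoice 1001: pay 950.00 USD to ACME"

print(is_intact(original, recorded))
print(is_intact(tampered, recorded))
```

Changing even one byte produces a different digest, which is what makes tampering detectable at fine granularity. The check is only as trustworthy as the stored digest, which is why the digest itself needs to be protected.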
Why Trust is Needed Today
We can think of trust as a key factor for conducting successful transactions between stakeholders in any given group (individual, enterprise, government), as well as in any permutation of interaction (individual<>enterprise, government<>enterprise, enterprise<>enterprise, and so on).
Today, we’ll focus on a few areas that affect enterprises:
Building Trust into Enterprise Data
Executives have dozens of high-powered analytics tools at their disposal – so why do they continue to rely on “gut instinct” to make decisions? Mostly because they don’t trust the sandbox: If you can’t fully trust your data, you most likely won’t let data lead decisions for your company.
There are three dimensions to data trust that must be satisfied:
- Validity – is the data correct?
- Reliability – is the data accurate and timely?
- Governance – does the data satisfy business or compliance requirements?
Erosion of any of these three dimensions introduces risk to an organization.
Building Trust into Digital Compliance
With regulation around data ownership and privacy rapidly approaching the global market, organizations will need a solid plan on how to address these new pressures.
We spend hundreds of billions of dollars a year on auditing costs. A large amount of that cost is spent simply verifying the validity of records to satisfy various categories of compliance (vertical and horizontal).
Horizontally, PII compliance will need to scale alongside digital initiatives. Companies are already scrambling to satisfy GDPR and CCPA, and it will only get worse with “data-driven” innovation.
“The new European data privacy legislation is so stringent that it could kill off data-driven online services and chill innovations like driverless cars, tech industry groups warn.” (Natasha Singer, NYT)
Vertically, there is an array of industry regulations spread across the spectrum of digital maturity, such as HIPAA in healthcare or airworthiness compliance mandated by the FAA. In any of these cases, as technologies replace existing manual processes with data-centric processes, these compliance regulations must be accounted for digitally.
Instead of a never-ending threat, these data regulations can be seen as an opportunity to re-imagine enterprise-wide data strategy with data quality, trust, and integrity in mind.
Integrating the tenets of data-centric trust into an enterprise GRC (Governance, Risk, and Compliance) strategy will be core to survival and success. Specifically, adding a temporal dimension to data management (see time travel with Fluree) allows companies to prove they were in compliance at all times rather than at a single snapshot in time. Ledger-based data management systems like blockchain technology have the potential to automate many of these emerging data governance and compliance needs.
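To make the temporal idea concrete, here is a minimal sketch of an "as of" query over an append-only log. It is an illustration under stated assumptions, not Fluree's API: the `Fact` record, the integer transaction time `t`, the `as_of` helper, and the sample consent data are all hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    t: int          # logical transaction time (hypothetical)
    subject: str
    predicate: str
    value: str
    added: bool     # True = assertion, False = retraction


# Append-only log: nothing is overwritten, changes are recorded as new facts.
LOG = [
    Fact(1, "customer/42", "consent", "marketing", True),
    Fact(3, "customer/42", "consent", "marketing", False),
    Fact(3, "customer/42", "consent", "none", True),
]


def as_of(log, t):
    """Replay every fact with time <= t and return the resulting state."""
    state = set()
    # sorted() is stable, so facts with the same t keep their log order.
    for fact in sorted(log, key=lambda f: f.t):
        if fact.t > t:
            break
        key = (fact.subject, fact.predicate, fact.value)
        if fact.added:
            state.add(key)
        else:
            state.discard(key)
    return state


print(as_of(LOG, 2))  # state before the consent was withdrawn
print(as_of(LOG, 3))  # state after the change
```

Because history is never overwritten, the state at any past moment can be reconstructed on demand, which is what lets an auditor ask what the data looked like at a given time rather than only what it looks like now.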
Building Trust into Data Ecosystems
Data is becoming more collaborative as organizations realize the value of sharing information internally and externally to collectively solve problems. At the same time, industries are increasingly becoming “data ecosystems” with various levels of permissioned data sharing: supply chains, insurance companies, banks, and other industries that run on data are breaking down their four walls in the name of greater transparency, collaboration, and efficiency.
With this increase in collaboration comes a greater mandate to build systems for trusted data sharing. Here are a few examples of applied trust in data sharing:
- A clothing supply chain that digitally fights against counterfeits
- A digital auto-parts market that can guarantee the authenticity and value of parts across suppliers and manufacturers
- A shared KYC or AML database
- “Verifiable Credentials” – a W3C standard for proving identity on a permissioned basis to third parties (for example: providing provable credentials of employment in a mortgage application process)
Enterprises will increasingly need to share trusted data in real-time while managing associated risks such as inaccurate or manipulated data. Practicing data-centric trust in business ecosystems will greatly reduce the friction, time, and cost associated with validating, operationalizing or analyzing third party data. In turn, we can realize the benefits of more seamless data exchange (better customer outcomes, more reliable predictive models, more accurate shared databases, etc.)
Building Trust into Machines
Machines are making more and more autonomous decisions, drawing from more sources of data. Shouldn’t those decisions have a foundation of trusted and secure data? Machines will increasingly need to verify the validity, origin and integrity of the data they use to make operational decisions.
In terms of AI explainability, data-centric trust allows for a complete audit trail into why a decision was made.
In terms of AI security, we must build tools to fight against “data poisoning,” an emerging attack pattern that involves maliciously manipulating source data to affect the outcome of an algorithm.
Fluree Automates Trust
As an operational data store, Fluree embeds trust directly into data. Fluree transactions include the following elements (in RDF-style):
- (+ extensibility with additional meta-data)
Fluree then signs each transaction with the submitter’s unique key. Before committing the transaction, Fluree checks the transaction against various categories of rules (schema, validity, permission, and duplicate checks).
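The pre-commit checks described above can be sketched as a small validation pipeline. This is a hedged illustration only: the `SCHEMA` and `PERMISSIONS` tables, the transaction shape, and the `check` and `commit` functions are hypothetical and are not Fluree's rule engine. Signature verification is left out because it needs an asymmetric-cryptography library outside Python's standard library.

```python
import hashlib
import json

# Hypothetical schema: predicate name -> required Python type.
SCHEMA = {"person/name": str, "person/age": int}

# Hypothetical permissions: author -> predicates the author may write.
PERMISSIONS = {
    "alice": {"person/name", "person/age"},
    "bob": {"person/name"},
}

seen_ids = set()  # identifiers of transactions already committed


def tx_id(tx: dict) -> str:
    """Derive a stable identifier from the transaction's content."""
    payload = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(payload).hexdigest()


def check(tx: dict) -> list:
    """Return a list of rule violations; an empty list means the tx passes."""
    errors = []
    allowed = PERMISSIONS.get(tx["author"], set())
    for predicate, value in tx["data"].items():
        expected = SCHEMA.get(predicate)
        if expected is None:
            errors.append(f"unknown predicate: {predicate}")
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {predicate}")
        if predicate not in allowed:
            errors.append(f"{tx['author']} may not write {predicate}")
    if tx_id(tx) in seen_ids:
        errors.append("duplicate transaction")
    return errors


def commit(tx: dict) -> list:
    """Record the transaction only if every check passes."""
    errors = check(tx)
    if not errors:
        seen_ids.add(tx_id(tx))
    return errors


tx = {"author": "alice", "data": {"person/name": "Ada", "person/age": 36}}
print(commit(tx))  # first submission
print(commit(tx))  # the same transaction submitted again
```

Running all checks before anything is written means a rejected transaction never reaches the ledger, so the recorded history contains only data that satisfied the rules at commit time.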
Fluree then combines transactions into immutable time-stamped ‘blocks’ and locks each block in via cryptographic hashing (SHA3-256) for complete data integrity. This immutable ledger of data allows us to categorically prove time and source for every datum in the system (read more about Fluree’s unique architecture on our documentation site).
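The block structure described above can be sketched as a hash chain. This sketch assumes that each block stores the SHA3-256 digest of its predecessor, which the text does not spell out. The block fields and the `append_block` and `verify` helpers are hypothetical and do not reflect Fluree's actual block format.

```python
import hashlib
import json


def block_hash(body: dict) -> str:
    """SHA3-256 digest over a canonical JSON encoding of the block body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(payload).hexdigest()


def append_block(chain: list, transactions: list, timestamp: int) -> None:
    """Add a block that commits to its content and to the previous block."""
    prev = chain[-1]["hash"] if chain else ""
    body = {
        "index": len(chain),
        "timestamp": timestamp,
        "prev": prev,
        "transactions": transactions,
    }
    chain.append({**body, "hash": block_hash(body)})


def verify(chain: list) -> bool:
    """Recompute every hash and confirm each block points at its predecessor."""
    prev = ""
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev"] != prev or block["hash"] != block_hash(body):
            return False
        prev = block["hash"]
    return True


chain = []
append_block(chain, ["alice sets person/name to Ada"], timestamp=1)
append_block(chain, ["bob sets person/name to Bo"], timestamp=2)
print(verify(chain))  # chain as written

chain[0]["transactions"][0] = "alice sets person/name to Eve"
print(verify(chain))  # chain after editing an earlier block
```

Because each block's hash covers the previous block's hash, changing any historical block invalidates that block and breaks the link from every block after it. That is why an edit to old data is detectable without trusting whoever holds the database.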
This process provides a few key outcomes for data:
- Provable Data Provenance: Mathematically prove the complete provenance of every piece of data — including who originated it, and under what circumstances.
- Provable Data Lineage: Traceability of changes is tied to digital signatures, including changes made by third-party applications that sit between clients and data. Access every version of the data as of any historical point in time (see Time Travel with Fluree).
- Provable Data Integrity: Fluree can prove that data has not been manipulated, and provide systems and organizations with the tools to verify these facts.