Making Data VAULTIS-Compliant with Fluree

In 2020, the Department of Defense (DoD) published a data strategy that recognized data as a critical asset for operational success and called for digital transformation in the name of “data-centricity.” The strategy identified a need to leverage data across the DoD “at speed and scale for operational advantage and increased efficiency.”

To accomplish this data-centric vision, the strategy set out seven data goals, known by the acronym VAULTIS, with each letter representing a requirement for a successful data-centric strategy within the organization.

What is VAULTIS?

VAULTIS provides guiding principles for a data-centric approach to data management. VAULTIS stands for: Visible, Accessible, Understandable, Linked, Trusted, Interoperable, and Secure. By building these characteristics into their data strategy, organizations can move toward a data-centric way of conducting business: merging operational and analytical data domains and treating data as a valuable, versatile asset that serves many stakeholders. Data-centricity increases an organization’s ability to share and collaborate securely around data, unlocks deeper analytical insight, and lets organizations iterate on business operations with more agility and precision.

Why pay attention to VAULTIS?

VAULTIS may have originated in the defense sector, but private-sector organizations would do well to emulate these guiding principles and apply them to their own data-centric strategies. The VAULTIS principles ultimately prepare data for better outcomes across the data value chain, bringing together data producers and consumers for greater efficiency while preserving data integrity, trust, and security. Let’s dive into each of the VAULTIS principles:

Visible: Data must be made visible for downstream consumption, and data owners should have the tools necessary to model and tag data for increased visibility.

Data visibility remains an issue for many organizations: data consumers are often unaware of the data assets available to them, which leads to duplicated work (and duplicated data).

Fluree gives data publishers, stewards, custodians, and managers in any data domain a means to consolidate sources of truth for data stakeholders and drive visibility for their registered data and datasets. The platform lets organizations make rich use of semantic, ontology-driven data modeling to index data assets under connected vocabularies, producing a semantically linked catalog of data assets that data consumers can readily discover.
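As a rough illustration of how ontology-driven tagging aids discovery, the sketch below uses plain Python rather than any Fluree interface. It assumes each dataset is tagged with concept IRIs from a shared vocabulary arranged in a SKOS-style broader/narrower hierarchy, so a search for a broad concept also finds datasets tagged with narrower ones. The `ex:` concepts, dataset names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical vocabulary: narrower concept -> broader concept (like skos:broader).
BROADER = {
    "ex:FuelLogistics": "ex:Logistics",
    "ex:AmmoLogistics": "ex:Logistics",
    "ex:Logistics": "ex:Operations",
}

# Hypothetical catalog: dataset name -> concept IRIs it is tagged with.
CATALOG = {
    "fuel-deliveries-2023": {"ex:FuelLogistics"},
    "ammo-inventory": {"ex:AmmoLogistics"},
    "personnel-roster": {"ex:Personnel"},
}


def narrower_closure(concept: str) -> set[str]:
    """Return the concept plus every concept that is (transitively) narrower."""
    children = defaultdict(set)
    for child, parent in BROADER.items():
        children[parent].add(child)
    found, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def discover(concept: str) -> list[str]:
    """List datasets tagged with the concept or any narrower concept."""
    wanted = narrower_closure(concept)
    return sorted(name for name, tags in CATALOG.items() if tags & wanted)


print(discover("ex:Logistics"))  # ['ammo-inventory', 'fuel-deliveries-2023']
```

In a real deployment the hierarchy would itself be stored as linked data and queried alongside the catalog, rather than held in a Python dict.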

Our upcoming Fluree Nexus cloud product interface will allow permissioned data consumers to browse and search data sets as well as discover related data assets via linked metadata. URL-style saving and sharing of data sets and queries will make the data sharing process intuitive.

Accessible: Data must be accessible on demand to a wide range of consumers, but also easily and quickly hidden from people, apps, or organizations through enforceable, wide-ranging permissions based on data privacy and policy.

Enterprise data platforms should make data readily available to those who need it to make better and quicker decisions, but they should also recognize the security and privacy challenges that emerge alongside a higher level of data sharing.

With Fluree’s data-centric approach to data security, data owners can manage data policies, including access control, at the data layer in order to restrict access at the dataset, row, column, or cell level. Data owners may define highly granular, arbitrary rules to determine access privileges. Because governance is data-centric, Fluree databases support relationship-based access control, in which conditions on the data itself (e.g., mission relationships, metadata characteristics, or affiliation to storage environments) provide richer and more precise context for security decisions. As a result, credentialing can be flexible and can adapt dynamically as environments change. When multiple stakeholders take part in policy enforcement, data stewards, custodians, and managers can each define their own access policies over the same data, according to their own data governance contexts or dataset-specific sharing agreements.
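To make row-, column-, and cell-level enforcement concrete, here is a minimal Python sketch of policy evaluation. It is not Fluree’s policy language; the `Requester` type, the rule functions, and the mission-based conditions are hypothetical, chosen to show how a rule can depend on both the requester and the data being requested.

```python
from dataclasses import dataclass, field


@dataclass
class Requester:
    """Hypothetical identity: roles plus the missions the requester is related to."""
    id: str
    roles: set[str]
    missions: set[str] = field(default_factory=set)


# Hypothetical dataset: each dict is a row, each key a column.
RECORDS = [
    {"id": "r1", "mission": "m-alpha", "location": "grid-17", "notes": "resupply due"},
    {"id": "r2", "mission": "m-bravo", "location": "grid-42", "notes": "hold position"},
]


def row_visible(req: Requester, rec: dict) -> bool:
    # Row-level rule: analysts see every row; everyone else sees only rows
    # for missions they are related to.
    return "analyst" in req.roles or rec["mission"] in req.missions


def allow_all(req: Requester, rec: dict) -> bool:
    return True


# Cell-level rules: column -> predicate over (requester, row). Columns without
# a rule are visible whenever their row is visible.
CELL_RULES = {
    "location": lambda req, rec: rec["mission"] in req.missions,
}


def apply_policy(req: Requester, records: list[dict]) -> list[dict]:
    """Return only the rows and cells the requester may see (others are omitted)."""
    visible = []
    for rec in records:
        if not row_visible(req, rec):
            continue
        visible.append({col: val for col, val in rec.items()
                        if CELL_RULES.get(col, allow_all)(req, rec)})
    return visible


analyst = Requester("u1", roles={"analyst"})
operator = Requester("u2", roles={"operator"}, missions={"m-alpha"})
print(apply_policy(analyst, RECORDS))   # both rows, "location" removed from each
print(apply_policy(operator, RECORDS))  # only r1, including its location
```

Because the rules are evaluated against each row at query time, one dataset serves both requesters. Omitting disallowed cells, rather than masking them, is just one design option.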

This data-centric framework, which embeds security policies within the data itself, allows a single dataset to serve multiple consumers, even those with different levels of credentialed access.

Understandable: Data consumers should be able to recognize the content, context, and applicability of data.

Fluree’s RDF core allows organizations to use semantic graph ontologies to formally define common concepts and relationships, placing shared meaning on registered data assets even across disparate data domains and vocabularies. An ontological schema lets data consumers, both human and machine, derive globally defined meaning from a data asset beyond the scope of its original domain.

Data aggregation and insight generation can then operate across multiple data sources, creating a highly dynamic, distributed knowledge graph through which data consumers can understand and explore relationships between data assets, both within their own domain and across the wider data fabric. Users can then take analyses and insights that originate in one data domain and apply them directly to another.
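The following sketch, under stated assumptions, shows how shared ontology terms let one question span two domains. It assumes each domain uses its own property names and that an alignment table maps them onto a shared vocabulary, standing in for `owl:equivalentProperty` statements. All prefixes and IRIs are hypothetical, and this is plain Python rather than Fluree’s mapping mechanism.

```python
# Hypothetical alignment: local property IRI -> shared ontology IRI, playing the
# role that owl:equivalentProperty statements would play in a real ontology.
ALIGNMENT = {
    "a:callSign": "shared:callSign",
    "b:radioId": "shared:callSign",
    "a:homeBase": "shared:basedAt",
    "b:station": "shared:basedAt",
}


def to_shared(triples):
    """Rewrite predicates into the shared vocabulary; unmapped ones pass through."""
    return {(s, ALIGNMENT.get(p, p), o) for s, p, o in triples}


# Two domains describing similar facts with different local vocabularies.
domain_a = {("a:unit1", "a:callSign", "HAWK"), ("a:unit1", "a:homeBase", "base:north")}
domain_b = {("b:unit9", "b:radioId", "EAGLE"), ("b:unit9", "b:station", "base:north")}

merged = to_shared(domain_a) | to_shared(domain_b)

# One question, asked once in shared terms, now spans both domains.
at_north = sorted(s for s, p, o in merged
                  if p == "shared:basedAt" and o == "base:north")
print(at_north)  # ['a:unit1', 'b:unit9']
```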

Linked: Data should be linked and cross-referenced in order to increase the quality of enterprise information and derived insights.

Fluree is a linked graph database built on W3C linked data standards, so relationships between data are first-class. It natively supports RDF (Resource Description Framework) as its data model and interchange format, SPARQL for semantic and federated queries, and JSON-LD for mapping payloads to ontologies and assigning unique IRIs to data assets. Fluree is designed for many sources and many consumers interacting with the same data layer, creating a collaborative environment for insight generation in which data can be extended and linked to other data in ways that enrich the data experience across the ecosystem.
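To show what querying linked data means at its simplest, the sketch below evaluates a SPARQL-style basic graph pattern (a conjunction of triple patterns that share variables) over an in-memory set of triples. It is a teaching sketch in plain Python, not Fluree’s or any SPARQL engine’s implementation. The `ex:` IRIs are hypothetical, and the naive nested-loop join would not scale.

```python
# Hypothetical triples; the ex: prefixes stand in for full IRIs.
TRIPLES = {
    ("ex:alice", "ex:memberOf", "ex:team1"),
    ("ex:bob", "ex:memberOf", "ex:team2"),
    ("ex:team1", "ex:assignedTo", "ex:missionX"),
    ("ex:team2", "ex:assignedTo", "ex:missionY"),
}


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                  # variable
            if extended.get(term, value) != value:
                return None                       # already bound to something else
            extended[term] = value
        elif term != value:                       # constant that does not match
            return None
    return extended


def query(patterns, triples):
    """Evaluate a basic graph pattern: every pattern must match, sharing variables."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT * WHERE { ?person ex:memberOf ?team . ?team ex:assignedTo ex:missionX }
rows = query([("?person", "ex:memberOf", "?team"),
              ("?team", "ex:assignedTo", "ex:missionX")], TRIPLES)
print(rows)  # [{'?person': 'ex:alice', '?team': 'ex:team1'}]
```

The shared `?team` variable is what links the two patterns: only people whose team is assigned to the requested mission survive the join.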

By using a semantic ontology, Fluree-backed solutions can apply inferencing to surface hidden insights and patterns for data consumers. Importantly, this linked data and its associated inferences can also power operational domains: the same data graph serves both as an analytical tool for downstream consumers and as the operational source for deployed applications. This model lets linked data grow in analytical and operational value without splitting into duplicative data silos.
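Inference can be illustrated with two standard RDFS entailment rules: subclass transitivity (rdfs11) and type propagation along subclasses (rdfs9). The sketch below forward-chains these rules to a fixpoint over a set of triples. It assumes a simple in-memory triple set and hypothetical `ex:` classes, and it is not meant to describe which reasoning rules Fluree implements or how.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Forward-chain two RDFS rules until nothing new is derived:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    graph = set(triples)
    while True:
        sub = {(s, o) for s, p, o in graph if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, RDF_TYPE, b) for x, p, a in graph if p == RDF_TYPE
                for a2, b in sub if a == a2}
        if new <= graph:          # fixpoint: no rule produced anything new
            return graph
        graph |= new


facts = {
    ("ex:Tanker", SUBCLASS, "ex:Truck"),
    ("ex:Truck", SUBCLASS, "ex:Vehicle"),
    ("ex:v42", RDF_TYPE, "ex:Tanker"),
}
inferred = rdfs_closure(facts)
print(("ex:v42", RDF_TYPE, "ex:Vehicle") in inferred)  # True
```

Nobody asserted that `ex:v42` is a vehicle; a query for all vehicles finds it anyway because the fact follows from the ontology.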

Trusted: Data consumers should be able to trust all aspects of data, including provenance, lineage, and integrity.

With Fluree, data publishers can have high confidence that their data will reach the intended consumer unaltered by any process or middleware, and data consumers can have high confidence in the accuracy, reliability, and trustworthiness of data assets on the platform.

Fluree’s data-centric approach to digital trust provides a zero-trust framework for verifying the legitimacy of data, metadata, and data sources using public/private key cryptography and cryptographic hashing. Through Fluree’s distributed ledger backplane, facts about data and metadata registered within datasets can be proven authentic and untampered, including data provenance (who or what originated the data), data lineage (the complete path of the data across systems, users, and organizations over its lifecycle), and the identity associated with each action (machine or human interaction with the data).
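Tamper evidence rests on cryptographic hashing: if each entry commits to the hash of the one before it, altering any earlier fact changes every later hash. The sketch below shows only that hash-chaining half, using Python’s standard `hashlib`. The approach described above also relies on public/private key signatures, which the standard library does not provide, so signing is left out here. The block layout and function names are hypothetical and are not Fluree’s ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": block_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every link; an edited payload or reordered block breaks the chain."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["payload"]):
            return False
        prev = block["hash"]
    return True


chain = []
append(chain, {"author": "sensor-7", "fact": ["ex:v42", "ex:fuelLevel", 0.8]})
append(chain, {"author": "analyst-2", "fact": ["ex:v42", "ex:status", "ready"]})
print(verify(chain))                   # True
chain[0]["payload"]["fact"][2] = 0.1   # tamper with an earlier fact
print(verify(chain))                   # False
```

In practice each entry would also carry a signature over its hash from the author’s private key (for example Ed25519 via a third-party library), which is what ties provenance to an identity.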

The underlying Fluree ledger supports “time travel” queries, which let users track and review the history of changes to data assets over time and reproduce any versioned state of the data. Because Fluree records a timestamp as metadata on every change to the data, a query can specify any moment in time and retrieve the state of the data as it existed at that moment. This temporal versioning of immutable data supports highly explainable, data-driven decision-making and provable audit review.
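The time-travel idea can be sketched as an append-only log of assertions and retractions, each stamped with a transaction time; the state “as of” any time is whatever the log implies up to that point. This is a minimal Python illustration, assuming integer transaction times and a log kept in time order. The `Fact` type and helper names are hypothetical, and a production system would typically use time-aware indexes instead of replaying the whole log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    t: int            # transaction time (assumed monotonically non-decreasing)
    subject: str
    predicate: str
    value: object
    asserted: bool    # True = assertion, False = retraction


LOG: list[Fact] = []  # append-only: entries are never edited or deleted


def transact(t, subject, predicate, value, asserted=True):
    LOG.append(Fact(t, subject, predicate, value, asserted))


def as_of(t):
    """Replay the log up to and including time t (relies on LOG being in time order)."""
    state = set()
    for f in LOG:
        if f.t > t:
            break
        triple = (f.subject, f.predicate, f.value)
        if f.asserted:
            state.add(triple)
        else:
            state.discard(triple)
    return state


transact(1, "ex:v42", "ex:status", "ready")
transact(2, "ex:v42", "ex:status", "ready", asserted=False)
transact(2, "ex:v42", "ex:status", "deployed")

print(as_of(1))  # {('ex:v42', 'ex:status', 'ready')}
print(as_of(2))  # {('ex:v42', 'ex:status', 'deployed')}
```

Because a change is recorded as a retraction plus a new assertion rather than an overwrite, the earlier state remains reproducible for audit.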

Interoperable: Data must be easily exchanged across domains, systems, apps, and users with shared and common representation and meaning.

By leveraging open data standards, Fluree’s graph database allows data to be exchanged across systems using lightweight RDF interchange formats and shared semantics, with minimal integration overhead. RDF is an open, non-proprietary W3C standard supported by a wide range of databases and applications. Because Fluree is built entirely on these W3C standards, it avoids proprietary lock-in and natively integrates and interoperates with systems and data designed around the same open standards.
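RDF’s interchange formats are plain text. The sketch below writes triples as N-Triples, the simplest W3C RDF serialization, in which each line is one subject, predicate, object statement ending in a period. It is simplified under stated assumptions: IRIs are not validated, literals are written untyped, and blank nodes and language tags are not handled. The `IRI` wrapper and example IRIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Marks a term as an IRI; any other object is written as a plain literal."""
    value: str


def term(t) -> str:
    if isinstance(t, IRI):
        return f"<{t.value}>"
    # Plain (untyped) literal: escape backslash first, then quote and line breaks.
    s = (str(t).replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{s}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)


data = [
    (IRI("http://example.org/v42"), IRI("http://example.org/status"), "ready"),
    (IRI("http://example.org/v42"), IRI("http://example.org/memberOf"),
     IRI("http://example.org/team1")),
]
print(to_ntriples(data), end="")
# <http://example.org/v42> <http://example.org/status> "ready" .
# <http://example.org/v42> <http://example.org/memberOf> <http://example.org/team1> .
```

Since every term is a full IRI or a literal, no shared schema or mapping code is needed for another RDF-aware tool to read the statements back.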

Disparate data assets, including data, metadata, relationships, and even ontologies, can maintain their semantic meaning within and across domains. Because it is built on W3C RDF standards, Fluree natively supports SPARQL, the W3C standard query language for RDF, for semantic queries that can span multiple systems.

Secure: Data must be protected from unauthorized access, manipulation, and use.

Fluree’s platform supports encryption at rest and in transit, and it applies data-centric access restrictions to secure data in use, as described above under Accessible. This data-centric approach to security delivers a zero-trust security framework for all data assets.

Data-embedded access policies require signatures on all queries and transactions, so even a bad actor who breaches the firewall or network does not gain unfettered access to the data. Data consumers with direct subscriptions to the data layer are always permissioned by their cryptographic private keys, which guards against data leakage during data-event-driven messaging.
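The idea of signing every request and verifying it before evaluation can be sketched with Python’s standard `hmac` module. This is a stand-in: HMAC uses a shared secret, whereas the approach described here relies on private-key signatures verified with public keys, so treat the sketch as showing only the verify-before-execute flow. The key registry, identities, and query shape are hypothetical and are not Fluree’s request format.

```python
import hashlib
import hmac
import json

# Hypothetical key registry. A real deployment would hold *public* keys and
# verify asymmetric signatures; a shared HMAC secret is only a stand-in here.
KEYS = {"analyst-2": b"demo-secret-do-not-use"}


def canonical(request: dict) -> bytes:
    """Stable byte encoding so signer and verifier hash exactly the same thing."""
    return json.dumps(request, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(identity: str, request: dict) -> str:
    return hmac.new(KEYS[identity], canonical(request), hashlib.sha256).hexdigest()


def verify(identity: str, request: dict, signature: str) -> bool:
    """Reject unknown identities and any request whose bytes changed after signing."""
    key = KEYS.get(identity)
    if key is None:
        return False
    expected = hmac.new(key, canonical(request), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)  # constant-time comparison


query = {"select": ["?status"], "where": [["ex:v42", "ex:status", "?status"]]}
sig = sign("analyst-2", query)
print(verify("analyst-2", query, sig))     # True
tampered = dict(query, where=[["ex:v42", "ex:location", "?status"]])
print(verify("analyst-2", tampered, sig))  # False
print(verify("intruder", query, sig))      # False
```

Only a request that verifies would go on to policy evaluation, so network access alone is not enough to read data.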

Fluree natively supports ABAC and RBAC (attribute- and role-based access control) and also offers RelBAC (relationship-based access control), in which the state of the data itself provides context for dynamic policy enforcement. Combining RelBAC with cryptographic signatures yields precise, relationship-based policy enforcement: because access is evaluated against a user’s current relationships in the data, arbitrary changes to that user’s identity attributes won’t leave their private key granting too much or too little access.
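A minimal sketch of relationship-based access control: rather than checking a static attribute, the rule walks relationships in the data (user to team, team to mission, document to mission) each time access is evaluated. The edge names, identifiers, and rule are hypothetical, and this is plain Python, not Fluree’s policy syntax.

```python
# Hypothetical relationship graph: (subject, relation, object) edges.
EDGES = {
    ("user:alice", "memberOf", "team:1"),
    ("team:1", "assignedTo", "mission:x"),
    ("doc:plan-7", "partOf", "mission:x"),
}


def objects(subject, relation):
    """All objects reachable from `subject` over one `relation` edge."""
    return {o for s, r, o in EDGES if s == subject and r == relation}


def can_read(user, doc):
    """Hypothetical rule: a user may read a document if they belong to a team
    assigned to a mission the document is part of."""
    user_missions = {m for team in objects(user, "memberOf")
                     for m in objects(team, "assignedTo")}
    return bool(user_missions & objects(doc, "partOf"))


print(can_read("user:alice", "doc:plan-7"))  # True
# Reassign the team: access changes immediately, with no change to Alice's key.
EDGES.discard(("team:1", "assignedTo", "mission:x"))
print(can_read("user:alice", "doc:plan-7"))  # False
```

Because the decision is recomputed from current relationships, revoking a team’s assignment changes access without reissuing or revoking the user’s key.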

Because Fluree supports decentralized semantics and distributed ledger technology, a Fluree-powered solution can use decentralized identity, including identities issued within the enterprise or by external systems, so organizations can extend their data capabilities beyond their own borders without introducing new security risks.

Conclusion

At the end of the day, your data strategy must benefit end users along the data value chain: developers, programmers, data governance stewards, information security teams, data architects, analysts, scientists, and general business users. The VAULTIS principles are excellent guidelines for organizations building a data-centric strategy that addresses the diverse needs of these stakeholders. Organizations that emulate the VAULTIS principles will move to a more fluid and functional data architecture in which analytical and operational domains can realize data’s full potential.

Interested in learning more? Get started with Fluree here!