Data-Centric Time

Welcome to part four of Fluree’s Data-Centric Architecture series, where we peel back each layer of Fluree’s data-centric architecture stack.

Our first installment, “Data-Centric Trust,” describes the ways in which data provenance, lineage, and integrity are central to a healthy data ecosystem. Our second installment, “Semantic Data Interoperability,” covers how formatting data under a common vocabulary can help data ecosystems exchange information with meaning. Our third installment, “Data-Centric Security,” covers how managing security and compliance at the data layer improves data delivery velocity to end users while reducing attack surfaces.

Combining these concepts of trust, security, and interoperability into a single data layer makes for a powerful architecture that paves the way for “data ecosystems,” where many data consumers and many data sources dynamically collaborate. Today’s post will focus on enabling these data ecosystems to efficiently and compliantly collaborate by adding time as a key element to the data architecture.

Why data needs time

Data ecosystems rely on sources of data truth – but there is no “truth” without time. If we are to dynamically collaborate around data, we must have access to a shared understanding of time in order to make accurate decisions. We must also be able to “roll back” data version history to view data sets “as of” any moment in time. And finally, we must be able to review every change made to a datum over the entirety of its existence. In addition to data collaboration, there are three fundamental drivers of temporal data management:

Auditing and Compliance – Roll back data history to prove data compliance under a variety of categories, including GDPR, SOX, and CCPA.
Multi-Version Command and Control – Apply “Git-style” multi-version command and control, such as branching and merging, to databases for data debugging or advanced analytics.
Temporal Query and Analytics – Query across versions of data and inspect historical patterns in data.

Adding the Temporal Dimension

Traditional databases have no notion of time – data lives in tabular or graph format without historical context, leaving us guessing as to the details of the information’s history. Some RDBMS providers offer “point-in-time” recovery that restores a server to a previous state, but these snapshot capabilities are inflexible and generally require double-writes.

Data collaboration suffers under this “black-box” approach to records management, especially when relying on third-party data. How did this data get into the system? When was it changed, and how long ago?

Immutability is the answer

Immutable databases, such as Fluree’s data ledger system, treat data as facts that can never be destroyed. Instead of an update-and-replace system where data is overwritten, immutable databases create a historical account of data values over time in a ledger. Thus, every version of every piece of data is preserved and reproducible, even if it has technically been “updated” in a future version.
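To make the contrast with update-and-replace concrete, here is a minimal sketch of an append-only ledger in Python. It is an illustration only, not Fluree's implementation or API: the `Fact` and `Ledger` names, the integer transaction counter `t`, and the assert/retract flag are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One immutable entry in the ledger (hypothetical shape)."""
    subject: str
    predicate: str
    value: object
    t: int          # transaction number at which the fact was recorded
    asserted: bool  # True = value became true, False = value was retracted


class Ledger:
    """Append-only store: an 'update' adds facts, it never overwrites them."""

    def __init__(self):
        self._facts = []
        self._t = 0

    def transact(self, subject, predicate, value):
        self._t += 1
        current = self.value_as_of(subject, predicate, self._t)
        if current is not None:
            # Retract the old value instead of destroying it.
            self._facts.append(Fact(subject, predicate, current, self._t, False))
        self._facts.append(Fact(subject, predicate, value, self._t, True))
        return self._t

    def value_as_of(self, subject, predicate, t):
        """Replay facts up to and including transaction t."""
        value = None
        for fact in self._facts:  # facts are stored in transaction order
            if fact.t > t:
                break
            if fact.subject == subject and fact.predicate == predicate:
                value = fact.value if fact.asserted else None
        return value


ledger = Ledger()
t1 = ledger.transact("alice", "email", "alice@old.example")
t2 = ledger.transact("alice", "email", "alice@new.example")
old_email = ledger.value_as_of("alice", "email", t1)
new_email = ledger.value_as_of("alice", "email", t2)
```

After the second transaction the old address is still reproducible by asking for the value as of `t1`, which is the property the paragraph above describes.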

Fluree takes these concepts a few steps further by adding cryptographic trust to the immutable ledger, effectively allowing users to not only review and report on data history but also prove data integrity.
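One common way to make a ledger's history provable is to chain each block to the hash of the one before it. The sketch below shows that general technique with Python's standard `hashlib`. It is an assumption for illustration, not a description of Fluree's block format: the field names, the all-zero genesis hash, and the JSON encoding are hypothetical.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder for "no previous block"


def block_hash(prev_hash, payload):
    """Hash a block's payload together with the previous block's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain):
    """Recompute every hash; any edit to past data breaks the chain."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["payload"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain([{"balance": 100}, {"balance": 80}, {"balance": 95}])
tampered = copy.deepcopy(chain)
tampered[1]["payload"]["balance"] = 8000  # rewrite history after the fact

chain_ok = verify_chain(chain)
tampered_ok = verify_chain(tampered)
```

Because each hash depends on everything before it, changing one historical value invalidates that block and every later one, so a reader can check integrity without trusting whoever stores the data.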

Opportunities in Temporal Data Management

“Lock-in” Time in Collaborative Environments: Data ecosystems need to lock in a shared understanding of time in order to collaborate and exchange information efficiently. For example, multiple systems (or humans) leveraging a mutual data source need a consistent view of information, with no possibility that concurrent queries are served older or newer versions depending on which consumer asks. Using timestamps as metadata allows microservices or data scientists that rely on a source of truth to “lock in” a specific time for reliable query results. This capability becomes increasingly valuable in data applications where real-time data drives decisions or operations across an ecosystem.
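The idea of locking in a time can be sketched as follows. This is a toy model under stated assumptions, not Fluree's query interface: `VersionedStore`, `commit`, `read`, and the integer version `t` are hypothetical names invented for the example.

```python
class VersionedStore:
    """Toy store where every read names the version it wants to see."""

    def __init__(self):
        self._log = []     # append-only list of (t, key, value)
        self.latest_t = 0

    def commit(self, changes):
        self.latest_t += 1
        for key, value in changes.items():
            self._log.append((self.latest_t, key, value))
        return self.latest_t

    def read(self, key, t):
        """Return the newest value of key written at or before version t."""
        value = None
        for entry_t, entry_key, entry_value in self._log:
            if entry_t <= t and entry_key == key:
                value = entry_value
        return value


def report(store, t):
    """Every read in the report uses the same pinned version."""
    return {"stock": store.read("stock", t), "price": store.read("price", t)}


store = VersionedStore()
store.commit({"stock": 10, "price": 5})
pinned_t = store.latest_t              # a consumer locks in this moment

before = report(store, pinned_t)
store.commit({"stock": 7})             # a concurrent writer changes the data
after = report(store, pinned_t)        # same pinned time, same answer
live = report(store, store.latest_t)   # an unpinned consumer sees the change
```

Two consumers that agree on `pinned_t` get identical results no matter when their queries arrive, while a consumer that wants the newest data simply asks for the latest version.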

Traceability and Reproducibility: Fluree’s time travel capability allows apps to query as of any specific point in time, or across multiple versions over time. History queries allow apps to follow a single data point throughout time, listing the complete trail of changes to that one piece of information. Time travel can even be deployed in the future tense by adding mock data and generating “what if?” analysis within a query. Learn more about time travel here.
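A history query and a “what if?” query can both be pictured as filters over the same log of changes. The sketch below is a plain-Python illustration, not Fluree's query syntax: the `Change` tuple, the sample order data, and the `history` and `timeline` helpers are hypothetical.

```python
from typing import NamedTuple


class Change(NamedTuple):
    t: int            # transaction number
    subject: str
    predicate: str
    value: str
    asserted: bool    # False marks a retraction of an earlier value


LOG = [
    Change(1, "order-17", "status", "placed", True),
    Change(2, "order-17", "status", "placed", False),
    Change(2, "order-17", "status", "shipped", True),
    Change(3, "order-18", "status", "placed", True),
    Change(4, "order-17", "status", "shipped", False),
    Change(4, "order-17", "status", "delivered", True),
]


def history(log, subject, predicate):
    """Every assertion and retraction ever recorded for one datum."""
    return [c for c in log if c.subject == subject and c.predicate == predicate]


def timeline(log, subject, predicate):
    """The values the datum held, paired with when each took effect."""
    return [(c.t, c.value) for c in history(log, subject, predicate) if c.asserted]


order_timeline = timeline(LOG, "order-17", "status")

# "What if?" analysis: layer mock changes on a copy; the real log is untouched.
what_if_log = LOG + [
    Change(5, "order-17", "status", "delivered", False),
    Change(5, "order-17", "status", "returned", True),
]
what_if_timeline = timeline(what_if_log, "order-17", "status")
```

The hypothetical branch of the analysis never modifies `LOG`, so the recorded history stays intact while the projected outcome is explored.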

Data Audit and Compliance: As data pervades consumers’ lives, privacy and compliance have become central concerns for enterprise data management teams. With Fluree’s ledger of immutable (and provable) transactions, organizations can review and report on historical data. Organizations can effectively show auditors the exact state of any database, who has or had access to data, and the complete path of CRUD operations over time.

Multi-Version Command and Control: Fluree’s immutable structure supports versioning, branching, and merging of data ledgers. This allows developers and data scientists to build efficient collaborative workflows around data, akin to Git.
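As a rough picture of what Git-style branching and merging means for data, consider the sketch below. It is a simplified model under our own assumptions, not Fluree's branching mechanism: the `Branch` class, the `merge` function, and the rule of refusing any key changed on both sides are hypothetical choices made for the example.

```python
class Branch:
    """A line of history; each commit is a dict of key -> new value."""

    def __init__(self, commits=None, base=0):
        self.commits = list(commits or [])
        self.base = base  # how many commits were shared at fork time

    def fork(self):
        """Start a new branch that shares all history so far."""
        return Branch(self.commits, base=len(self.commits))

    def commit(self, changes):
        self.commits.append(dict(changes))

    def state(self):
        """Fold all commits, oldest first, into the current view."""
        merged = {}
        for changes in self.commits:
            merged.update(changes)
        return merged


def merge(target, source):
    """Replay commits made on source since the fork onto target."""
    source_new = source.commits[source.base:]
    target_new = target.commits[source.base:]
    touched_by_source = {key for changes in source_new for key in changes}
    touched_by_target = {key for changes in target_new for key in changes}
    conflicts = touched_by_source & touched_by_target
    if conflicts:
        raise ValueError(f"conflicting keys: {sorted(conflicts)}")
    for changes in source_new:
        target.commit(changes)


main = Branch()
main.commit({"price": 10})
experiment = main.fork()
experiment.commit({"discount": 2})   # work in isolation on the branch
main.commit({"stock": 50})           # main keeps moving meanwhile
merge(main, experiment)
merged_state = main.state()
```

A real system needs a richer conflict policy than "refuse on overlap", but the sketch shows why immutable history makes branching cheap: a fork only has to remember where it diverged.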

Potential Drawbacks of Immutability, and How Fluree Solves Each

Let’s explore a few theoretical drawbacks to immutable data management:

Disk space will balloon

Q: If we are effectively creating a new database at the point of every transaction, won’t our storage requirements explode?

A: Fluree handles transactions efficiently: the ledger system stores only the deltas to the database. This means that although we effectively have access to every database version that has ever existed, storage grows with the volume of changes rather than with the number of versions. Learn more about how we deal with immutable indexes here.
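A little arithmetic shows why storing deltas matters. The sketch below compares the number of stored values when keeping only deltas against keeping a full copy per version. The record counts are made up for the example, and `materialize` is a hypothetical helper, not part of Fluree.

```python
def materialize(deltas, t):
    """Rebuild the database as of version t by folding the first t deltas."""
    state = {}
    for delta in deltas[:t]:
        state.update(delta)
    return state


# Version 1 loads 1,000 records; each of the next 100 versions changes one.
deltas = [{f"user{i}": 0 for i in range(1000)}]
for step in range(1, 101):
    deltas.append({f"user{step}": step})

versions = len(deltas)

# Values stored when the ledger keeps only what changed.
delta_cells = sum(len(delta) for delta in deltas)

# Values stored if every version were saved as a complete copy.
snapshot_cells = sum(len(materialize(deltas, t)) for t in range(1, versions + 1))

# Any version is still reproducible from the deltas alone.
user50_before = materialize(deltas, 50)["user50"]
user50_after = materialize(deltas, 51)["user50"]
```

In this example the delta approach stores 1,100 values while full copies would store 101,000, and the gap widens with every additional version.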

Time travel queries are slow

Q: Does accounting for time add significant latency to queries against historical versions of a Fluree database?

A: One might expect query latency to be significantly longer when adding a temporal dimension to data. In Fluree, this is not an issue, because time is a metadata value that is indexed along with every other component of a transaction. As a result, an end user should never have to treat a query against a historical version differently from a query against the current one. A query as of the present will return just as efficiently as the same query as of 100 days and three seconds ago. Learn more about “time” as an element of every index here.
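To see why indexing time keeps historical queries cheap, consider a sorted index whose entries are ordered by key and then by transaction time. The sketch below uses Python's standard `bisect` module. It is a simplified stand-in that we assume for illustration, not Fluree's actual index structure, and `TemporalIndex` is a hypothetical name.

```python
import bisect


class TemporalIndex:
    """Sorted index over (key, t); time is part of the index itself."""

    def __init__(self):
        self._keys = []     # sorted list of (key, t) tuples
        self._values = []   # values, kept parallel to _keys

    def put(self, key, t, value):
        pos = bisect.bisect_left(self._keys, (key, t))
        self._keys.insert(pos, (key, t))
        self._values.insert(pos, value)

    def get_as_of(self, key, t):
        """Binary-search for the newest entry of key written at or before t."""
        pos = bisect.bisect_right(self._keys, (key, t))
        if pos == 0:
            return None
        found_key, _found_t = self._keys[pos - 1]
        return self._values[pos - 1] if found_key == key else None


index = TemporalIndex()
index.put("temperature", 1, 20)
index.put("humidity", 3, 40)
index.put("temperature", 5, 25)

historical = index.get_as_of("temperature", 4)   # served by one binary search
current = index.get_as_of("temperature", 5)      # same lookup, later time
before_any = index.get_as_of("temperature", 0)   # nothing recorded yet
```

The lookup does the same amount of work whether `t` is the present or far in the past, which is the intuition behind the claim that current and historical queries return equally efficiently.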

Immutable data cannot be deleted, thereby violating privacy regulations

Q: An immutable database like Fluree cannot delete data – isn’t this a direct violation of GDPR’s “right to be forgotten”?

A: Fluree pairs immutability with data-centric security, the powerful ability to embed data access and usage control policies as data-layer executable code. With Fluree, granular and arbitrary permissions for data access are enforced at the data layer using cryptographic identities to sign queries and transactions. While there is no one-size-fits-all solution to data privacy regulations, Fluree’s SmartFunctions allow for flexibility in enforcing data compliance policies and tying them to cryptographic identities and keys. Read more about data-centric security here.
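The notion of enforcing access rules at the data layer, rather than in each application, can be sketched as below. This is a hypothetical illustration and not Fluree's SmartFunction syntax: the `POLICIES` table, the role names, and the record fields are invented for the example, and the caller's identity is assumed to have already been verified from a signed request.

```python
RECORDS = [
    {"id": 1, "owner": "alice", "email": "alice@example.com"},
    {"id": 2, "owner": "bob", "email": "bob@example.com"},
]

# Hypothetical rules stored alongside the data they protect.
# Each rule decides whether one identity may see one record.
POLICIES = {
    "customer": lambda identity, record: identity["name"] == record["owner"],
    "auditor": lambda identity, record: True,
}


def query(identity, records):
    """Every query passes through the policy, whichever app issued it."""
    rule = POLICIES.get(identity["role"])
    if rule is None:
        return []  # unknown roles see nothing by default
    return [record for record in records if rule(identity, record)]


alice_view = query({"name": "alice", "role": "customer"}, RECORDS)
auditor_view = query({"name": "carol", "role": "auditor"}, RECORDS)
stranger_view = query({"name": "mallory", "role": "guest"}, RECORDS)
```

Because the rule lives with the data, adding a new application does not create a new place where the policy could be forgotten or implemented differently.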

Thanks for taking the time to read about time, the fourth layer of Fluree’s Data-Centric Architecture. The next blog in this series, “Data-Centric Sharing,” is available here.

More resources: