Fluree Blog | Blog Post | Kevin Doubleday | 05.05.21

Data-Centric Security

Letting information defend itself from the inside out for better collaboration, compliance, and governance.

Introduction

Welcome to part three of Fluree’s Data-Centric Architecture series, where we peel back each layer of Fluree’s data-centric architecture stack.

Our first installment, “Data-Centric Trust,” describes the ways in which data provenance, lineage, and integrity are central to a healthy data ecosystem. Our second installment, “Semantic Data Interoperability,” covers how formatting data under a common vocabulary can help technologies exchange information with meaning. 

Together, these concepts of “trust” and “interoperability” open the door to a new era of data collaboration, defined by dynamic data ecosystems, where data is published, accessed, and collaborated on by a wide variety of stakeholders. 

However, as more stakeholders enter these data ecosystems, we’ll need to enforce various permissions and rules related to identity and access. In other words, as we broaden the audiences that are accessing data or transacting against data sets, we must rethink how and where we implement security. 

This is where Fluree’s next “layer” is required: data-centric security.

What is Data-Centric Security?


The data-centric philosophy involves moving data management responsibilities from the application tier to the data tier, and security is no exception. With data-centric security, permissions related to data are baked into the architecture as a core ingredient:

Data-centric security is an approach to information cybersecurity that emphasizes the security of data itself rather than the security of applications or networks. In a data-centric security framework, security policies and protocols are defined and enforced at the data layer, rather than deferred to a server, application, or network. 

There are three key objectives to data-centric security: Manage, Track, and Protect.

Manage – Define policies that determine who can access, contribute to, or use data, and how

Track – Monitor data’s supply chain as it moves through systems and users

Protect – Enforce identity and access management protocols

[Figure: A Möbius loop depicting the three objectives of data-centric security as a continuous cycle. (1) Manage: define strict identity and access policies determining data access, contribution, or use. (2) Track: monitor data's supply chain as it moves through systems and users. (3) Protect: enforce identity and access management policies.]

In Fluree, these security objectives are defined, codified, stored, and executed as data in the database in the form of SmartFunctions.

Why is data-centric security needed today? 


More data silos = more attack surfaces: Data today is used and reused across multiple contexts, shared via webs of APIs, and duplicated into data silos for analytics. At every stage of reuse we introduce a new potential attack surface that must be monitored. Attackers know this – according to Akamai’s 2020 State of the Internet / Security report, as many as 75% of credential abuse attacks against the financial services industry targeted APIs. Re-implementing data security in every middleware, data lake, and API along this digital supply chain is simply not scalable. 

Once you’re in, you’ve got root access: As exemplified by many of the data breaches in recent news, information security in an application-centric architecture is only as good as its endpoint security. The more attacks grow in complexity, the deeper security measures get pushed into online infrastructure. Yet data, the ultimate reward at the core of every hack, often remains unprotected.

Cloud computing, SaaS, and the era of remote devices: Thanks to the advent of cloud computing, enterprise data is now published to the cloud and accessed by many users across many devices across many networks (especially in today’s work-from-home era). The proliferation of bring-your-own personal devices and wifi networks is just one example of our inability to control how our information is being accessed and passed through systems.

The data supply chain is becoming complex and regulated: Data has been called the new “oil.” Whether or not the analogy is accurate, we can all recognize data’s ubiquity and importance in the global economy. And when something becomes ubiquitous, regulation follows:

  • Motor vehicles were followed by motor vehicle laws and the DMV
  • Mass-produced food products led to nutritional labels and the FDA
  • The airline industry was met with FAA policies

The first wave of data regulation has already taken place in the form of GDPR, CCPA, and the like. At the same time, data has evolved into something of an asset that can be passed around, exchanged, and even brokered. In other words, data now has a supply chain with various stakeholders. Add these emerging compliance pressures to an already complex supply chain, and you’ve got quite the set of security demands to manage.

While there is certainly still merit to securing endpoints and tightening up network security, the above trends demonstrate a clear need to bring data-centric security into the overall enterprise strategy. 

Data-Centric Security, Applied


In a data-centric security context, information remains protected as it moves in and out of storage systems and applications and across changing business contexts, regardless of the network or application security around it. We call this “data defending itself.” Security is baked in, and thus inseparable from the data it protects. 

As you might imagine, data-centric security can simplify and automate data governance and security for data sets. By baking security directly into the data tier, we find many benefits, among them: 

  • Data is self-defending as it travels across contexts, domains, users, and networks – identity and access policies work consistently throughout information’s entire lifecycle
  • Security logic becomes automated and scalable, instead of having to be re-implemented across all sources (apps, data lakes, middleware, APIs)
  • Compliance can roll into the overall data governance strategy and reap the same benefits
  • Security and governance practice becomes more aligned with changing business contexts
  • Developers can focus on building great applications and APIs without bearing the responsibility of accounting for security or governance 

To sum up these benefits, when data can “defend itself,” we can (1) mitigate data theft or loss, (2) build better governance and compliance strategies, and (3) provide improved delivery velocity to end users while reducing attack surfaces. 

Fluree brings security into the data tier


With Fluree’s SmartFunctions, it is possible to embed data permissioning logic within the system as data itself. Because these policies are embedded within the data layer, they can leverage any conceivable set of data conditions or linked data in the Fluree system as context for evaluation.

This is made possible by Fluree’s notion of identity, in which all users transact or query via provable cryptographic identities that can be tied to various authorizations. These authorization rules can be complex and arbitrary, and can be enforced by evaluating a wide range of possible connections in the database (e.g. is the user linked to this data? is this data’s security score less than or equal to the user’s security score? are both the user and the data linked to the same organization, and is that organization located in country X, Y, or Z?). 
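The example checks in the parenthetical above can be pictured as plain predicates over linked data. What follows is a minimal Python sketch under stated assumptions, not Fluree's SmartFunction syntax or API: the `Organization`, `User`, and `Record` classes, their field names, and the `ALLOWED_COUNTRIES` set are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

# Hypothetical in-memory model for illustration only.
# This is NOT Fluree's SmartFunction syntax or API.
ALLOWED_COUNTRIES = {"X", "Y", "Z"}  # placeholder country codes from the text


@dataclass(frozen=True)
class Organization:
    name: str
    country: str


@dataclass(frozen=True)
class User:
    identity: str  # stands in for a provable cryptographic identity
    security_score: int
    organization: Organization


@dataclass(frozen=True)
class Record:
    linked_identities: frozenset  # identities this data is linked to
    security_score: int
    organization: Organization


def is_linked(user: User, record: Record) -> bool:
    """Is the user linked to this data?"""
    return user.identity in record.linked_identities


def score_permits(user: User, record: Record) -> bool:
    """Is the data's security score <= the user's security score?"""
    return record.security_score <= user.security_score


def same_allowed_org(user: User, record: Record) -> bool:
    """Do user and data share an organization located in an allowed country?"""
    return (
        user.organization == record.organization
        and user.organization.country in ALLOWED_COUNTRIES
    )


def can_access(user: User, record: Record) -> bool:
    """Combine the individual checks into one authorization decision."""
    return (
        is_linked(user, record)
        and score_permits(user, record)
        and same_allowed_org(user, record)
    )


acme = Organization("Acme", "X")
alice = User("alice-key", 5, acme)
bob = User("bob-key", 2, acme)
report = Record(frozenset({"alice-key", "bob-key"}), 3, acme)

print(can_access(alice, report))  # True
print(can_access(bob, report))    # False: bob's score is below the record's
```

The point of the sketch is that each rule reads only data that already lives alongside the record, so the decision needs no application-specific context.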

Security = Identity + Rules

SmartFunctions can reliably evaluate user identity and user data because Fluree requires cryptographic signatures on all queries and transactions. To read more about this set of features and values, check out Fluree’s guide to identity.


SmartFunctions allow the enforcement of arbitrarily complex data security policies and data shape validation at the data layer and support data-centric security initiatives such as cell-level security, attribute-based access control (ABAC) models, and granular permissioning logic that relies on linked data relationships. Within this framework, we can open up our data sets to be queried directly by virtually anyone; data will be filtered out according to the established rules associated with the user’s identity. 
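To make the idea of identity-based filtering concrete, here is a small Python sketch of cell-level filtering. It is an illustration under assumptions rather than Fluree's implementation: the `FIELD_RULES` table, the `query` function, and the `role`, `id`, and `salary` fields are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical sketch of rule-based result filtering; not Fluree's API.
Identity = Dict[str, Any]
Row = Dict[str, Any]
Rule = Callable[[Identity, Row], bool]


def allow_all(identity: Identity, row: Row) -> bool:
    return True


def deny(identity: Identity, row: Row) -> bool:
    return False


def hr_or_self(identity: Identity, row: Row) -> bool:
    # Visible to HR, or to the person the row describes.
    return identity["role"] == "hr" or identity["id"] == row["id"]


# One rule per field; fields without a rule are denied by default.
FIELD_RULES: Dict[str, Rule] = {
    "id": allow_all,
    "name": allow_all,
    "salary": hr_or_self,
}


def filter_row(identity: Identity, row: Row) -> Row:
    """Keep only the cells this identity is allowed to see."""
    return {
        name: value
        for name, value in row.items()
        if FIELD_RULES.get(name, deny)(identity, row)
    }


def query(identity: Identity, rows: List[Row]) -> List[Row]:
    """Run the same query for anyone; results differ by identity."""
    return [filter_row(identity, row) for row in rows]


rows = [
    {"id": "u1", "name": "Ada", "salary": 100},
    {"id": "u2", "name": "Bob", "salary": 90},
]

print(query({"id": "u2", "role": "staff"}, rows))
# [{'id': 'u1', 'name': 'Ada'}, {'id': 'u2', 'name': 'Bob', 'salary': 90}]
```

Denying fields that have no rule is a deliberate choice in this sketch: a newly added attribute stays hidden until someone writes a policy for it.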

Using these powerful embedded operations, we can turn business logic into enforced policies that travel alongside the data, forever. 

Simple SmartFunction Examples

Let’s explore some simple ideas of how SmartFunctions can enforce data-driven policies around queries and transactions:

Here is a series of business logic statements and their corresponding Fluree programming explanations:

1. Business logic statement: Only you can update your own data.
   Fluree programming explanation: The transaction is permitted only if the public identity signing the update belongs to the user record being updated.

2. Business logic statement: Wallet balances cannot be negative.
   Fluree programming explanation: A transaction against wallet balance data is rejected entirely if it would result in a value < 0.

3. Business logic statement: University course catalogs are visible to all, but only editable by university admins.
   Fluree programming explanation: Query attempts against course catalog data are unrestricted (either for specific fields or all fields). Transactions are rejected entirely unless the identity signing the transaction is linked to a university via graph edges that describe admin relationships.

4. Business logic statement: A research database is freely accessible to students, but otherwise requires a user to have an active subscription.
   Fluree programming explanation: Queries by identities with a STUDENT role are unrestricted. Queries by identities with any other role are permitted only if the identity is linked to subscription data with a status of ACTIVE.

5. Business logic statement: Users can’t invent money.
   Fluree programming explanation: Transactions involving wallet balances must sum to the same aggregate value as existed before the transaction, and users can only add to other balances if they subtract an equal amount from their own balance. Transactions that don’t follow these rules are rejected.
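The ownership and wallet rules above lend themselves to a short worked example. The Python below is a hedged sketch of the checks themselves, not of how SmartFunctions are written in Fluree; `can_update_user`, `validate_transfer`, and the dictionary shapes are hypothetical names introduced for illustration.

```python
from typing import Dict

# Hypothetical sketch of transaction validation; not Fluree's SmartFunction syntax.
Balances = Dict[str, int]  # wallet owner identity -> balance


def can_update_user(signer: str, user_record: Dict[str, str]) -> bool:
    """Only you can update your own data."""
    return user_record["identity"] == signer


def validate_transfer(signer: str, before: Balances, after: Balances) -> bool:
    """Return True only if the proposed balances obey the wallet rules."""
    # No balance may drop below zero.
    if any(balance < 0 for balance in after.values()):
        return False
    # Users can't invent money: the aggregate must be unchanged.
    if sum(before.values()) != sum(after.values()):
        return False
    # Only the signer's own balance may decrease.
    for wallet, old_balance in before.items():
        if after.get(wallet, 0) < old_balance and wallet != signer:
            return False
    return True


before = {"alice": 50, "bob": 10}

print(validate_transfer("alice", before, {"alice": 30, "bob": 30}))   # True
print(validate_transfer("alice", before, {"alice": 50, "bob": 30}))   # False: total grew
print(validate_transfer("alice", before, {"alice": 60, "bob": 0}))    # False: took from bob
print(validate_transfer("alice", before, {"alice": -10, "bob": 70}))  # False: negative
```

Each check sees only the signer and the before and after state, which is what lets rules like these sit next to the data instead of being repeated in every application.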

Thanks for taking the time to read about security, the third layer of Fluree’s Data-Centric Architecture. Check out our next post on Data-Centric Time.

Additional Resources on Data-Centric Security:

Read up about SmartFunctions in the Fluree Docs

Read Co-founder Brian Platz’s byline in Forbes on Data-Centric Security

Watch a demo Fluree application of data-centric security within a master data management scenario

Watch our one-hour webinar on data-centric security in Fluree