Welcome to part three of Fluree’s Data-Centric Architecture series, where we peel back Fluree’s data-centric architecture stack one layer at a time.
Our first installment, “Data-Centric Trust,” describes the ways in which data provenance, lineage, and integrity are central to a healthy data ecosystem. Our second installment, “Semantic Data Interoperability,” covers how formatting data under a common vocabulary can help technologies exchange information with meaning.
Together, these concepts of “trust” and “interoperability” open the door to a new era of data collaboration, defined by dynamic data ecosystems, where data is published, accessed, and collaborated on by a wide variety of stakeholders.
However, as more stakeholders enter these data ecosystems, we’ll need to enforce various permissions and rules related to identity and access. In other words, as we broaden the audiences that are accessing data or transacting against data sets, we must rethink how and where we implement security.
This is where Fluree’s next “layer” is required: data-centric security.
What is Data-Centric Security?
The Data-Centric philosophy involves moving data management responsibilities from the application tier to the data tier, and security is no exception. With data-centric security, permissions related to data are baked into the architecture as a core ingredient:
Data-centric security is an approach to information cybersecurity that emphasizes the security of data itself rather than the security of applications or networks. In a data-centric security framework, security policies and protocols are defined and enforced at the data layer, rather than deferred to a server, application, or network.
There are three key objectives to data-centric security: Manage, Track, and Protect.
- Manage – Define policies that determine who can access, contribute, or use data, and how
- Track – Monitor data’s supply chain as it moves through systems and users
- Protect – Enforce identity and access management protocols
In Fluree, these security objectives are defined, codified, stored, and executed as data in the database, in the form of SmartFunctions.
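To make the idea of "policy as data" concrete, here is a minimal Python sketch. It is not Fluree's SmartFunction syntax or API. The `Policy` class, the `DATABASE` dictionary, and the `is_allowed` function are hypothetical names invented for illustration. The sketch only shows how Manage, Track, and Protect might sit together when policies are stored alongside the records they guard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical policy record stored in the same database as the data."""
    collection: str     # kind of record the policy covers
    action: str         # "query" or "transact"
    required_role: str  # role an identity must hold


# Hypothetical in-memory stand-in for a database (not a Fluree structure).
DATABASE = {
    "policies": [
        Policy("invoice", "query", "accountant"),
        Policy("invoice", "transact", "finance_admin"),
    ],
    "audit_log": [],
}


def is_allowed(identity: str, roles: set, collection: str, action: str) -> bool:
    # Manage: the rules are looked up as data, not hard-coded in the app.
    matching = [
        p for p in DATABASE["policies"]
        if p.collection == collection and p.action == action
    ]
    # Protect: deny by default when no policy grants access.
    allowed = any(p.required_role in roles for p in matching)
    # Track: every decision is recorded next to the data.
    DATABASE["audit_log"].append((identity, collection, action, allowed))
    return allowed
```

Because the policies are ordinary records, changing who may query invoices means updating data rather than redeploying an application.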
Why is data-centric security needed today?
More data silos = more attack surfaces
Data today is used and reused across multiple contexts, shared via webs of APIs, and duplicated into data silos for analytics. At every stage of reuse we introduce a new potential attack surface that must be monitored. Attackers know this – according to Akamai’s 2020 State of Internet Security report, 75% of total cyberattacks in the financial services industry targeted APIs. Re-implementing data security in every middleware, data lake, and API along this digital supply chain is simply not scalable.
Once you’re in, you’ve got root access
As exemplified by many of the data breaches in recent news, information security in an application-centric architecture is only as good as its endpoint security. The more attacks grow in complexity, the deeper security measures get pushed into online infrastructure. Yet data, the ultimate reward at the core of every hack, often remains unprotected.
Cloud computing, SaaS, and the era of remote devices
Thanks to the advent of cloud computing, enterprise data is now published to the cloud and accessed by many users on many devices over many networks (especially in today’s work-from-home era). The proliferation of bring-your-own personal devices and Wi-Fi networks is just one example of our inability to control how our information is being accessed and passed through systems.
The data supply chain is becoming complex and regulated
Data has been called the new “oil” – whether this is an accurate or poor analogy, we can all recognize its ubiquity and importance in our global economy. And when something becomes ubiquitous, regulation follows:
- Motor vehicles were followed by motor vehicle laws and the DMV
- Mass-produced food products led to nutritional labels and the FDA
- The airline industry was met with FAA policies
The first wave of data regulation has already taken place in the form of GDPR, CCPA, and the like. At the same time, data has evolved to become something of an asset that can be passed around, exchanged, and even brokered. In other words, data now has a supply chain with various stakeholders. Add these emerging compliance pressures to this already complex supply chain, and now you’ve got quite the set of security demands to manage.
While there is certainly still merit to securing endpoints and tightening up network security, the above trends demonstrate a clear need to bring data-centric security into the overall enterprise strategy.
Data-Centric Security, Applied
In a data-centric security context, information remains protected as it moves in and out of storage systems or applications and as business contexts change, regardless of the network or application security around it. We call this “data defending itself.” Security is baked in, and thus inseparable from the data it protects.
As you might imagine, data-centric security can simplify and automate data governance and security for data sets. By baking security directly into the data tier, we find many benefits, among them:
- Data is self-defending as it travels across contexts, domains, users, and networks – identity and access policies work consistently throughout information’s entire lifecycle
- Security logic becomes automated and scalable, instead of having to be re-implemented across all sources (apps, data lakes, middleware, APIs)
- Compliance can roll into the overall data governance strategy and reap the same benefits
- Security and governance practice becomes more aligned with changing business contexts
- Developers can focus on building great applications and APIs without bearing the responsibility of accounting for security or governance
To sum up these benefits, when data can “defend itself,” we can (1) mitigate data theft or loss, (2) build better governance and compliance strategies, and (3) provide improved delivery velocity to end users while reducing attack surfaces.
Fluree brings security into the data tier
With Fluree’s SmartFunctions, it is possible to embed data permissioning logic within the system as data itself. Because these policies are embedded within the data layer, they can leverage any conceivable set of data conditions or linked data in the Fluree system as context for evaluation.
This is made possible by Fluree’s notion of identity, in which all users transact or query via provable cryptographic identities that can be tied to various authorizations. These authorization rules can be complex and arbitrary, and can be enforced by evaluating a wide range of possible connections in the database (e.g. is the user linked to this data? is this data’s security score less than or equal to the user’s security score? are both the user and the data linked to the same organization, and is that organization located in country X, Y, or Z?).
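The example questions above can be sketched as small predicates over linked records. This is an illustrative Python model, not Fluree code. The `Organization`, `User`, and `Record` classes, the `ALLOWED_COUNTRIES` set, and the way `can_read` combines the checks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Organization:
    name: str
    country: str


@dataclass
class User:
    name: str
    security_score: int
    organization: Organization


@dataclass
class Record:
    owner: User
    security_score: int
    organization: Organization


# Placeholder country codes standing in for "country X, Y, or Z".
ALLOWED_COUNTRIES = {"X", "Y", "Z"}


def is_linked(user: User, record: Record) -> bool:
    # Is the user linked to this data?
    return record.owner is user


def score_ok(user: User, record: Record) -> bool:
    # Is the data's security score <= the user's security score?
    return record.security_score <= user.security_score


def same_org_in_allowed_country(user: User, record: Record) -> bool:
    # Are user and data linked to the same organization, located in X, Y, or Z?
    return (user.organization is record.organization
            and user.organization.country in ALLOWED_COUNTRIES)


def can_read(user: User, record: Record) -> bool:
    # Assumed combination: owners always read; others need clearance and org match.
    return is_linked(user, record) or (
        score_ok(user, record) and same_org_in_allowed_country(user, record)
    )
```

Each predicate follows a relationship in the data (owner, organization, country), which is the kind of context an application-tier check often has to fetch separately.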
Security = Identity + Rules
SmartFunctions can reliably evaluate user identity and user data because of Fluree’s fundamental implementation of cryptographic signatures for all queries and transactions. To read more about this set of features and values, check out Fluree’s guide to identity.
SmartFunctions allow the enforcement of arbitrarily complex data security policies and data shape validation at the data layer and support data-centric security initiatives such as cell-level security, attribute-based access control (ABAC) models, and granular permissioning logic that relies on linked data relationships. Within this framework, we can open up our data sets to be queried directly by virtually anyone; data will be filtered out according to the established rules associated with the user’s identity.
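As a rough illustration of identity-based filtering at the cell level, the Python sketch below applies a per-field rule to every row before returning query results. It is a toy model, not Fluree's query engine. The `RECORDS` data, the `FIELD_RULES` mapping, and the `query` function are hypothetical.

```python
# Hypothetical data set that anyone may query directly.
RECORDS = [
    {"name": "Ada", "department": "R&D", "salary": 120000},
    {"name": "Lin", "department": "Sales", "salary": 95000},
]

# One rule per field; each rule sees the caller's identity and the row.
FIELD_RULES = {
    "name": lambda identity, row: True,
    "department": lambda identity, row: True,
    # Salary is visible to HR, or to the person the row describes.
    "salary": lambda identity, row: (
        identity["role"] == "hr" or identity["name"] == row["name"]
    ),
}


def query(identity: dict, fields: list) -> list:
    """Return only the cells that the identity's attributes allow."""
    results = []
    for row in RECORDS:
        visible = {
            field: row[field]
            for field in fields
            if field in FIELD_RULES and FIELD_RULES[field](identity, row)
        }
        if visible:  # rows with no visible cells are filtered out entirely
            results.append(visible)
    return results
```

The caller never receives an "access denied" error for a hidden cell. The cell is simply absent from the result, which matches the idea of data being filtered according to the rules tied to an identity.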
Using these powerful embedded operations, we can turn business logic into enforced policies that travel alongside the data, forever.
Simple SmartFunction Examples
Let’s explore some simple ideas of how SmartFunctions can enforce data-driven policies around queries and transactions:
| Business Logic Statement | How Fluree Enforces It |
|---|---|
| Only you can update your own data | This transaction is permitted only if the public identity signing the update belongs to the user record being updated |
| Wallet balances can never be negative | This transaction is rejected entirely if an attempted transaction against wallet balance data would result in a value < 0 |
| University course catalogs are visible to all, but only editable by university admin | Query attempts against course catalog data are unrestricted (either for specific fields or all fields). Transactions are rejected entirely unless the identity signing the transaction is linked to a university via graph edges that describe admin relationships. |
| A research database is freely accessible to students, but otherwise requires a user to have an active subscription | Queries by identities with a STUDENT role are unrestricted. Queries by identities with other roles are only permitted if the identity is linked to subscription data with a status of ACTIVE |
| Users can’t invent money | Transactions involving wallet balances must leave the aggregate balance unchanged, and users can only add to other balances if they subtract an equal amount from their own balance. Transactions that don’t follow these rules are rejected. |
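The wallet rules in the table can be sketched as a transaction validator. This is a minimal Python illustration under stated assumptions, not SmartFunction code. `TransactionRejected` and `apply_transfer` are hypothetical names, and balances are modeled as a plain dictionary keyed by owner.

```python
class TransactionRejected(Exception):
    """Raised when a proposed transaction breaks a data-tier rule."""


def apply_transfer(balances: dict, signer: str, updates: dict) -> dict:
    """Validate proposed balance updates signed by `signer`.

    Returns the new balances, or raises TransactionRejected.
    The input `balances` dictionary is never modified.
    """
    proposed = {**balances, **updates}

    # Rule: wallet balances can never be negative.
    for owner, amount in proposed.items():
        if amount < 0:
            raise TransactionRejected(f"{owner} would have a negative balance")

    # Rule: users can't invent money, so the aggregate must be unchanged.
    if sum(proposed.values()) != sum(balances.values()):
        raise TransactionRejected("aggregate balance changed")

    # Rule: a signer may only subtract from their own balance.
    for owner, amount in updates.items():
        if amount < balances.get(owner, 0) and owner != signer:
            raise TransactionRejected(f"{signer} cannot debit {owner}")

    return proposed
```

For example, with balances of `{"alice": 10, "bob": 5}`, a transfer signed by `alice` that sets `alice` to 7 and `bob` to 8 passes all three checks, while any update that lowers `bob` is rejected because `alice` did not sign for that wallet.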
Additional Resources on Data-Centric Security:
- Read up about SmartFunctions in the Fluree Docs: https://docs.flur.ee/guides/1.0.0/smart-functions/smart-functions
- Watch an example Fluree application of data-centric security within a master data management scenario: https://www.youtube.com/watch?v=e_r-L8ySpUg
- Watch our one-hour webinar on data-centric security in Fluree: https://www.youtube.com/watch?v=tkcQcFUV6gA
- Read Co-founder Brian Platz’s byline in Forbes on Data-Centric Security: https://www.forbes.com/sites/forbestechcouncil/2021/02/02/how-data-centric-security-can-protect-data-lakes-and-safeguard-innovation/?sh=2dd058f7c9fe