Welcome to part two of our Data-Centric Blog Series, where we peel back each layer of Fluree’s data-centric architecture stack. Our first installment, “Data-Centric Trust,” describes how the concepts around data provenance, lineage, and governance are central to a healthy data ecosystem.
Today, we will focus on making that data useful to a wide array of consumers through practicing semantic interoperability.
What is Semantic Interoperability?
Data integration now eats up over a third of the average IT department budget, when that overhead should really be spent on innovation. Why? Our systems aren’t natively speaking the same language.
To meet the increasingly common need to combine disparate data, software developers are tasked with building and maintaining schema translation middleware, while data scientists busy themselves with data reduction, cleansing, and harmonization.
Computer systems have always struggled with formatting discrepancies and ambiguity, analogous to human language barriers or even cultural semantics – “chips” in the UK (freshly-cut fried potatoes) versus “chips” in the US (thin, dry baked potatoes from a can/bag). When information needs to be integrated across systems, a lack of universal understanding can create unimaginable costs and headaches for the enterprise data team.
Semantic interoperability is the ability for systems to exchange information with shared meaning through the use of universal standards. In a semantic implementation, data arrives pre-packaged with self-described context, and the consumer of that information can derive meaning from that data through a universal vocabulary.
Topics related to Semantic Interoperability
Semantic Graphs are networks of semantic relationships within a database. Importantly, semantic graphs not only define the relationships and contexts between data elements, but also store those relationships themselves in the graph as data.
Semantic graphs are powerful, flexible tools that provide a home for data, relationships, and meaning. They support relationship-rich queries without the joins or primary/foreign keys that relational databases require. If you are dealing with relationship-rich data, semantic graph technology can cut query latency and provide greater insight.
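To make the "no joins" point concrete, here is a minimal sketch in plain Python. It is not Fluree's query engine; the sample edges and the helper names `build_index` and `follow` are made up for illustration. Relationships are stored as data, and a multi-hop question is answered by following them one hop at a time.

```python
from collections import defaultdict

# Hypothetical sample data: each edge is (subject, relationship, object).
EDGES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("acme", "locatedIn", "raleigh"),
]


def build_index(edges):
    """Index edges by (subject, relationship) so each hop is a dictionary lookup."""
    index = defaultdict(list)
    for subject, relationship, obj in edges:
        index[(subject, relationship)].append(obj)
    return index


def follow(index, start, path):
    """Follow a chain of relationships from a starting node and return the end nodes."""
    frontier = [start]
    for relationship in path:
        frontier = [
            obj
            for node in frontier
            for obj in index.get((node, relationship), [])
        ]
    return frontier


INDEX = build_index(EDGES)

# "Where is Alice's employer located?" expressed as a two-hop path.
print(follow(INDEX, "alice", ["worksFor", "locatedIn"]))  # ['raleigh']
```

In a relational schema the same question would typically join an employee table to a company table on a foreign key; here the relationship itself is the thing being traversed.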
Ontologies are shared vocabularies that provide semantic-capable systems with a basis for defining and representing data within a context; they give us a way of formally representing concepts within a given domain.
Ontologies help wrangle common representations of data under a single vocabulary so that disparate and heterogeneous systems can interoperate with ease.
An example of a set of ontologies within a given domain would be FIBO, the Financial Industry Business Ontology, a trademarked effort by EDM Council. The FIBO effort addresses the complex nature of financial terms sprawled across industry data repositories and improves data quality for analysis and business applications.
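As a rough illustration of what "a single vocabulary" buys you, the sketch below maps records from two systems onto shared terms. Everything here is hypothetical: the system names, the field names, and the `ex:` terms are invented for the example and are not taken from FIBO or any published ontology.

```python
# Hypothetical per-system mappings from local field names to shared terms.
MAPPINGS = {
    "billing": {"cust_nm": "ex:legalName", "ctry": "ex:countryOfIncorporation"},
    "crm": {"company_name": "ex:legalName", "country": "ex:countryOfIncorporation"},
}


def to_shared(system, record):
    """Rewrite a record's local field names into the shared vocabulary."""
    mapping = MAPPINGS[system]
    unmapped = set(record) - set(mapping)
    if unmapped:
        # Fail loudly rather than silently dropping fields with no shared term.
        raise KeyError(f"no shared term for: {sorted(unmapped)}")
    return {mapping[field]: value for field, value in record.items()}


billing_row = {"cust_nm": "Acme", "ctry": "US"}
crm_row = {"company_name": "Acme", "country": "US"}

# Once both records use the shared terms, they can be compared directly.
print(to_shared("billing", billing_row) == to_shared("crm", crm_row))  # True
```

The mapping work is done once per system; after that, any consumer that understands the shared terms can read data from either source.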
RDF is a data model that represents propositional data in “triples.” Each triple is essentially an assertion of a fact. Fluree, for example, uses subject-predicate-object.
Fluree actually extends this triple to include information about time to allow for retracting facts and to support extensible metadata inclusion.
RDF is incredibly flexible and therefore can become the basis of an RDF-graph, a set of interconnected triples. As a universal way of representing information, RDF can provide an excellent model for expressing ontologies.
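The sketch below shows the general idea of a triple, and of a triple extended with time, retraction, and metadata. It is an assumption-laden illustration, not Fluree's actual storage format: the field names `t`, `asserted`, and `meta`, the `ex:` identifiers, and the `facts_as_of` helper are all hypothetical.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TimedFact(NamedTuple):
    triple: Triple
    t: int                       # hypothetical logical timestamp
    asserted: bool               # True = assertion, False = retraction
    meta: Optional[dict] = None  # hypothetical slot for extra metadata


def facts_as_of(log, t):
    """Replay the log in time order and return the triples that hold at time t."""
    current = set()
    for fact in sorted(log, key=lambda f: f.t):
        if fact.t > t:
            break
        if fact.asserted:
            current.add(fact.triple)
        else:
            current.discard(fact.triple)
    return current


OLD_JOB = Triple("ex:alice", "ex:worksFor", "ex:acme")
NEW_JOB = Triple("ex:alice", "ex:worksFor", "ex:globex")

LOG = [
    TimedFact(OLD_JOB, t=1, asserted=True),
    TimedFact(OLD_JOB, t=3, asserted=False),
    TimedFact(NEW_JOB, t=3, asserted=True, meta={"source": "hr-system"}),
]

print(facts_as_of(LOG, 2) == {OLD_JOB})  # True: the original fact still holds
print(facts_as_of(LOG, 3) == {NEW_JOB})  # True: retracted and replaced
```

Because a retraction is recorded rather than applied as an in-place update, the earlier state remains recoverable by replaying the log up to an earlier point in time.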
Knowledge graphs are semantic graphs that align to an ontology. A knowledge graph is essentially a fully integrated platform for data unification, analytics, and sharing, made possible by semantic graph formatting and ontological rules.
Why Semantic Interoperability is Important
The front-loaded effort of creating or mapping to a semantic ontology might seem burdensome (and for a simple application stack, it may very well be). The top three areas in which you should consider a semantic implementation are data exchange, data federation, and data inference.
Will your data need to interact with other sources?
Semantic systems are able to automatically interpret the meaning of incoming data, making your data future-proof for use by any consumer. Semantic data exchange is absolutely essential for machine-to-machine communication frameworks, wherein data flows between systems with limited or no human interaction.
Will your data architecture involve the need to combine multiple data sources?
Semantic data integration provides a seamless and autonomous way of combining data sources and presenting them to applications as if they were pulled from the same source. For example, Fluree’s support for SPARQL allows for queries to hit multiple data sources and return the combined result within a single response.
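Here is a minimal sketch of the idea behind a federated query, using two in-memory sets of triples in place of real data sources. It does not use SPARQL or any Fluree API; the source names, the `ex:` identifiers, and the `match` and `employer_locations` helpers are hypothetical.

```python
# Two hypothetical sources that share identifiers such as "ex:acme".
SOURCE_HR = {("ex:alice", "ex:worksFor", "ex:acme")}
SOURCE_CRM = {("ex:acme", "ex:locatedIn", "ex:raleigh")}


def match(sources, subject=None, predicate=None, obj=None):
    """Return triples from every source that fit the pattern; None is a wildcard."""
    results = set()
    for source in sources:
        for s, p, o in source:
            if subject in (None, s) and predicate in (None, p) and obj in (None, o):
                results.add((s, p, o))
    return results


def employer_locations(sources, person):
    """Answer one question whose facts are spread across several sources."""
    locations = set()
    for _, _, employer in match(sources, subject=person, predicate="ex:worksFor"):
        for _, _, place in match(sources, subject=employer, predicate="ex:locatedIn"):
            locations.add(place)
    return locations


print(employer_locations([SOURCE_HR, SOURCE_CRM], "ex:alice"))  # {'ex:raleigh'}
```

The caller gets a single combined answer and never has to know which source held which fact. That only works because both sources use the same identifier for the same thing.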
Are you looking to uncover hidden or unrealized relationships within existing data?
Semantic web technologies that adhere to shared vocabularies can automatically “fill in” relationships given a set of rules. W3C provides two perfect examples of automatic data inference:
The data set to be considered may include the relationship (Flipper isA Dolphin). An ontology may declare that “every Dolphin is also a Mammal”. That means that a Semantic Web program understanding the notion of “X is also Y” can add the statement (Flipper isA Mammal) to the set of relationships, although that was not part of the original data. One can also say that the new relationship was “discovered”. Another example is to express [the] fact that “if two persons have the same name, home page, and email address, then they are identical”. In this case, the “identity” of two resources can be discovered via inferencing.
Source: W3C
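The Flipper example can be sketched as a few lines of forward-chaining inference. This is a toy illustration, not a real reasoner: the `infer_types` helper is hypothetical, and the extra rule "every Mammal is also an Animal" is added here only to show that inferred facts can trigger further inferences.

```python
def infer_types(facts, subclass_of):
    """Apply 'every X is also a Y' rules repeatedly until no new facts appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, cls in list(inferred):
            if predicate != "isA":
                continue
            parent = subclass_of.get(cls)
            if parent is not None and (subject, "isA", parent) not in inferred:
                inferred.add((subject, "isA", parent))
                changed = True
    return inferred


FACTS = {("Flipper", "isA", "Dolphin")}

# Ontology rules: "every Dolphin is also a Mammal" comes from the quoted example;
# "every Mammal is also an Animal" is an assumption added for illustration.
SUBCLASS_OF = {"Dolphin": "Mammal", "Mammal": "Animal"}

for fact in sorted(infer_types(FACTS, SUBCLASS_OF)):
    print(fact)
```

The statement (Flipper isA Mammal) appears in the result even though it was never in the original data, which is the "discovery" the W3C passage describes.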
What can semantic interoperability do for you?
Practicing semantic interoperability allows organizations to treat data as a living asset that can be used and reused, combined, exchanged, and understood by various systems. Semantic data repositories bolster analytical capabilities, ease the integration of diverse sources, and, ultimately, provide the foundation for data-centric ecosystems. A few benefits include:
- Integrate data across disparate data sets and sources and federate data to multiple consumers
- Prepare your data to become machine-readable
- Uncover patterns and relationships within data
- Break through data silos and resolve data discrepancies faster
- Provide an entire view of an enterprise’s information landscape (data federation)
- Improve database queries in both efficiency and quality
- Enhance data discoverability
Fluree – Semantic Graph Database
Fluree is a semantic RDF graph database with native support for W3C data standards. Organizations building on Fluree’s data platform gain the instant benefit of interoperability with other standards-based systems.
Fluree’s ledger files are persisted in the RDF data format standard, a W3C standard designed to promote universal data interoperability even when underlying schemas differ. As a result, data interoperability in Fluree is native, enabling linked data, shared vocabularies, inference, ontologies, decentralized identifiers, and verifiable credentials. Complex and multi-modal queries of data can not only interoperate across various Fluree ledgers and data stores but can interoperate globally with any other data built on the same W3C standards.
Fluree’s extensible metadata model (RDF ++) also allows for FAIR principles of data management (making data and metadata Findable, Accessible, Interoperable, and Reusable).
Thanks for taking the time to read about semantic interoperability, the second layer in Fluree’s Data-Centric Architecture. Next month, we’ll cover Data-Centric Security.