In 2020, we published a blog post on the FAIR principles for data management.
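As a quick recap, the FAIR principles of data management emphasize the Findability, Accessibility, Interoperability, and Reusability of data. They were outlined in a 2016 article in the journal Scientific Data as a way to curb silos and promote broader collaboration within the academic community. Here’s a quick summary of the FAIR data principles:
Findable: data and metadata are assigned globally unique, persistent identifiers and are described with rich metadata that is indexed in a searchable resource.
Accessible: data and metadata can be retrieved by their identifiers using a standardized, open protocol, and metadata remain available even when the data no longer are.
Interoperable: data and metadata use a formal, shared language for knowledge representation and vocabularies that themselves follow the FAIR principles.
Reusable: data and metadata are richly described, released with a clear usage license, associated with detailed provenance, and meet domain-relevant community standards.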
Like scholarly data, enterprise information today is often lost in silos and rarely reused, integrated, or leveraged in a meaningful way beyond its original purpose.
We call this forever-lost information “dark data.” According to CIO magazine, 40% to 90% of enterprise data is considered “dark,” depending on the industry.
Even the information that could be reused through extraction methods such as ETL, APIs, or data clouds is often of such poor quality that it may be incomprehensible without expensive data engineering to normalize it. We call this information “dirty data.” Experian recently reported that, on average, U.S. organizations believe 32 percent of their data is inaccurate. The cost is equally staggering: in the U.S. alone, bad data costs businesses $3 trillion per year.
Behind these problems lies the broken promise of “Big Data”: the widely sold illusion that an abundance of data plus some fancy analytics tools would unlock a new era of knowledge discovery and decision-making. In practice, blindly implementing big data solutions required substantial investments in technology, infrastructure, and personnel. Moreover, the time and effort needed to integrate disparate data sources and ensure data quality often outweighed the potential savings. In many cases, the costs of big data initiatives exceeded their benefits, leaving organizations disillusioned.
This helps explain why Gartner recently predicted that 70% of organizations will shift their focus from Big Data to “Small and Wide” data, emphasizing high-quality, linked information over large quantities of low-quality data. Brian Platz, CEO of Fluree, covered this idea in 2022 in a Forbes opinion piece entitled “How Small Data Is Fulfilling The Big Data Promise.”
What does this have to do with FAIR? The FAIR principles provide an agnostic but prescriptive framework for making data high quality, accessible, and reusable so that all stakeholders along the data value chain can glean insights with minimal friction. Applying the FAIR principles as a governance framework can help organizations reduce the risk of dark or dirty data.
Today, we are making the case to extend the FAIR principles to include notions of extensibility, security, and trust. While FAIR provides an excellent framework for open data reuse, these three additions contribute guidance for organizations looking to treat data as a strategic, collaborative asset across boundaries.
Data-centricity is the ethos driving these additional principles. In a data-centric architecture, many of the capabilities required to share information across applications, business units, or organizational boundaries are pushed from the application layer (software) to the data layer (database). Organizations moving to a data-centric architecture begin to strip away the layers of middleware and software that seek to accomplish specific tasks related to interoperability, trust, security, and sharing and instead focus on data-centric and model-driven architectures that enable composability and collaboration out of the box.
Let’s dive into the three proposed additions that extend FAIR to FAIR(EST):
Extensibility
Data extensibility is the capacity to expand a data model dynamically to support additional capabilities while preserving its original structure and meaning.
In a data-centric architecture, data is the central product, while “agents” such as applications, data science workflows, or machine learning systems interact with and revolve around this core of interconnected data. This requires the data to be useful in a variety of contexts, and, importantly, freed from proprietary formatting.
By leveraging standardized vocabularies in the form of semantic ontologies, data producers and consumers can extend data’s potential value and applicability across operational or analytical contexts. Relational systems tend to be rigid and are often proprietary; semantic graph databases are the opposite: flexible, extensible, and built on open standards such as RDF and OWL.
Extensibility through open semantic standards allows data models to grow and adapt as new analytical, operational, or regulatory requirements emerge. This saves time and resources, because organizations can extend existing data models as needed instead of creating entirely new ones or investing in a mess of ETL jobs and disconnected data lakes.
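As a minimal sketch of this idea, the Python below stores facts as subject-predicate-object triples, the shape RDF uses. It is a toy in-memory store, not any particular graph database or Fluree’s API. The “ex:” and “reg:” prefixes and the property names are hypothetical. Meeting a new regulatory requirement only adds triples, so a consumer written before the extension keeps returning exactly the same answer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tiny graph using hypothetical "ex:" vocabulary terms.
graph: set[Triple] = {
    Triple("ex:acme", "ex:name", "Acme Corp"),
    Triple("ex:acme", "ex:country", "US"),
}


def values(g: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object recorded for (subject, predicate), sorted for stable output."""
    return sorted(t.obj for t in g if t.subject == subject and t.predicate == predicate)


def legacy_report(g: set[Triple]) -> dict[str, list[str]]:
    """A consumer written before any extension; it only knows ex:name and ex:country."""
    return {
        "name": values(g, "ex:acme", "ex:name"),
        "country": values(g, "ex:acme", "ex:country"),
    }


before = legacy_report(graph)

# Later, a new requirement arrives (hypothetical "reg:" vocabulary).
# Extending the model means adding triples, not migrating a schema.
graph |= {Triple("ex:acme", "reg:leiCode", "HYPOTHETICAL-LEI-0001")}

after = legacy_report(graph)
assert before == after  # the pre-existing consumer is unaffected
print(values(graph, "ex:acme", "reg:leiCode"))  # ['HYPOTHETICAL-LEI-0001']
```

In a fixed relational table, the same change would typically require a schema migration. Here the existing facts and queries are untouched, and only consumers that care about the new term need to know it exists.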
Security
While FAIR alone provides an excellent framework for interoperability and usability where data is an open resource, enterprise ecosystems often operate within closed or hybrid boundaries where privacy, compliance, and competitive advantage are key factors. In 2020, we presented on data-centric security at the Data-Centric Architecture Forum (DCAF), making the case that identity and access management functions should be accomplished at the data layer, as data. In a data-centric security framework, security policies and protocols are defined and enforced at the data layer rather than deferred to a server, application, or network.
We call this “data defending itself.” Security is baked in, and thus inseparable from the data it protects. Because these policies are embedded in the data layer, business logic becomes enforced policy that travels alongside the data wherever it goes.
Data-centric security also makes data more immediately and freely available. Within this framework, we can open our data sets to be queried directly by virtually anyone, without moving the data or building purpose-built APIs that hide certain elements; each query’s results are filtered according to the rules associated with the requesting user’s identity.
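A minimal sketch of “data defending itself” follows. It assumes a hypothetical policy model in which each record carries the set of roles allowed to read it. This is not Fluree’s policy syntax or API. It only illustrates the pattern: every query, whoever issues it, passes through the same identity-based filter at the data layer, so no bespoke API per audience is needed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    id: str
    payload: dict
    # Hypothetical policy: the set of roles permitted to read this record.
    readable_by: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Identity:
    user: str
    roles: frozenset


DATA = [
    Record("r1", {"customer": "Acme", "region": "EU"}, frozenset({"sales", "audit"})),
    Record("r2", {"customer": "Globex", "salary_band": "C"}, frozenset({"hr"})),
    Record("r3", {"customer": "Initech", "region": "US"}, frozenset({"sales"})),
]


def query(identity: Identity, predicate=lambda payload: True) -> list[str]:
    """Run a query, returning only records whose policy admits one of the caller's roles."""
    return [
        r.id
        for r in DATA
        if r.readable_by & identity.roles  # the policy check travels with the data
        and predicate(r.payload)
    ]


alice = Identity("alice", frozenset({"sales"}))
bob = Identity("bob", frozenset({"hr"}))

print(query(alice))  # ['r1', 'r3']
print(query(bob))    # ['r2']
print(query(alice, lambda p: p.get("region") == "EU"))  # ['r1']
```

The same query function serves every caller. Alice and Bob see different results because the filter is driven by the policy attached to each record, not by which endpoint they call.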
Read more on data-centric security here.
Trust
We make the case in our data-centric series that for data to be shared effectively across trust boundaries, it must have digital authenticity built in. If we are to share data dynamically across departments, partners, and other stakeholders, that information should come pre-packaged with proof of where it came from and that it did, in fact, originate from a trusted source.
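A few elements make up what we call “trust” when it comes to information, starting with the provenance and authenticity described above: knowing where data came from and being able to prove it originated from a trusted source.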
As more and more data consumers are unleashed onto enterprise data (*clears throat* AI), ensuring the digital authenticity of information becomes critical. This is a data quality, risk, and operational issue that cannot wait until tomorrow.
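To make “proof of where it came from” concrete, here is a minimal sketch using only Python’s standard library. It wraps a payload in a provenance envelope containing a source name and a SHA-256 digest of the canonicalized payload, then adds an HMAC-SHA256 tag over the whole envelope. HMAC is a stand-in chosen to keep the example self-contained. It needs a shared secret, whereas trust across organizations would more likely use public-key signatures, for example W3C Verifiable Credentials. The key, the source name, and the envelope layout are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret for the trusted source; a real deployment would
# use public-key signatures so that consumers never hold the signing key.
SOURCE_KEY = b"hypothetical-secret-for-demo-only"


def canonical(obj: dict) -> bytes:
    """Serialize deterministically so the same data always hashes the same way."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def package(payload: dict, source: str, key: bytes) -> dict:
    """Wrap a payload with provenance metadata and an authentication tag."""
    envelope = {
        "payload": payload,
        "provenance": {
            "source": source,
            "sha256": hashlib.sha256(canonical(payload)).hexdigest(),
        },
    }
    # The tag covers both payload and provenance (computed before "tag" is added).
    envelope["tag"] = hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()
    return envelope


def verify(envelope: dict, key: bytes) -> bool:
    """Check that payload and provenance are unchanged since the source tagged them."""
    body = {k: v for k, v in envelope.items() if k != "tag"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    digest = hashlib.sha256(canonical(body["payload"])).hexdigest()
    digest_ok = body["provenance"]["sha256"] == digest
    return digest_ok and hmac.compare_digest(expected, envelope.get("tag", ""))


record = package({"customer": "Acme", "balance": 100}, "ex:ledger-service", SOURCE_KEY)
print(verify(record, SOURCE_KEY))  # True

tampered = json.loads(json.dumps(record))  # deep copy
tampered["payload"]["balance"] = 1_000_000
print(verify(tampered, SOURCE_KEY))  # False
```

Because the tag also covers the provenance block, forging the source name fails verification just as editing the payload does.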
Read more about data-centric trust here.
FAIR(EST) can serve as a high-level framework for moving your organization away from legacy data infrastructure. Modeling data according to the FAIR principles future-proofs it for reuse, and adding extensibility, security, and trust enables true composability across boundaries without expensive data engineering or data security issues.
The result might closely align with what our industry calls “data mesh”, a lightweight infrastructure for decentralized sources to collaborate on data across technological and organizational boundaries. However you might define it, the path to FAIR(EST) is a technological and cultural journey worth hiking.
We will cover how organizations can take their first (second and third) steps to achieve data-centricity in our upcoming webinar with Dave McComb from Semantic Arts. Check it out here: Data-Centric Strategies to Break the Cycle of Legacy Addiction.
What other additions would you make to the FAIR principles? Email us and let us know.