We’ve all heard that “data is the new oil” or a similar analogy to describe the potential business value of enterprise information. But can that oil be found? According to a new Forrester report, the answer is likely not, seeing as employees lose 12 hours a week chasing data on average.
Can that data be leveraged? Forrester has the grim answer again: between 60 percent and 73 percent of all data within an enterprise goes unused for analytics.
While we’ve made impressive strides in IT to accomplish tasks at scale and speed (storage, compute, AI), we seem to have treated data as a by-product of these functions, without accounting for the need to re-use or share that data beyond its originating source system.
More specifically, we’ve treated data as a siloed by-product of the average 367 software apps and systems large organizations use to manage their workflows, none of which “speak the same language.”
As a result, we are left with sprawling, disconnected heterogeneous data sources that are potentially duplicated, out-of-date, inaccessible, and most likely never used.
It’s no surprise that “democratizing data across organizations” is at the top of most Chief Data Officers’ priority lists, but how do we accomplish this at scale and without adding yet another data silo in the form of a fancy new data lake or warehouse?
Chief Data Officers and other professionals in the enterprise data management space are turning to knowledge graphs as the desired tool to connect disparate heterogeneous data assets across organizational disciplines. Gartner predicts that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the organization.
A knowledge graph is a database that represents knowledge as a network of entities and the relationships between them. Its basic elements are nodes, which represent the entities, and edges, which represent the relationships that connect them.
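To make the "network of entities and relationships" idea concrete, here is a minimal sketch in Python. It assumes facts are stored as (subject, predicate, object) triples; the entity names, predicate names, and helper functions are hypothetical and not drawn from any particular product.

```python
from collections import defaultdict

# Hypothetical example data: each fact is a (subject, predicate, object) triple.
triples = [
    ("acme_corp", "type", "Organization"),
    ("jane_doe", "type", "Person"),
    ("jane_doe", "works_for", "acme_corp"),
    ("acme_corp", "located_in", "denver"),
]

def build_index(facts):
    """Index outgoing edges by subject so relationships can be followed quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in facts:
        index[subject].append((predicate, obj))
    return index

def objects_of(index, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [obj for pred, obj in index.get(subject, []) if pred == predicate]

index = build_index(triples)
employer = objects_of(index, "jane_doe", "works_for")
# Follow a second edge from the employer to find where Jane's employer is located.
locations = [loc for org in employer for loc in objects_of(index, org, "located_in")]
```

Answering "where is Jane's employer located?" is a matter of following two edges, which is the kind of question a graph model is built for.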
Knowledge graphs offer a powerful way to organize and connect data across an organization through the use of semantic standards and universal ontologies (read more on the Fluree Blog: Semantic Interoperability – exchanging data with meaning). Knowledge graph use cases are growing rapidly, as the need to connect and integrate disparate data sources grows every day for organizations looking to play effective “data offense” and “data defense” simultaneously.
However, despite the many benefits of knowledge graphs, most enterprises are not yet ready for such an initiative, specifically due to poor, duplicated, and non-interoperable data. Those 300+ SaaS application silos are contributing to poor data quality, lack of interoperability, and lack of data governance.
Challenge #1: Lack of Data Quality
One of the biggest challenges facing organizations looking to implement a knowledge graph is the quality of their existing data. In many cases, enterprise data is duplicated, incomplete, or simply not fit for purpose. This can lead to a range of problems, from difficulty in extracting insights from the data to confusion and errors when trying to make sense of it all. Poor data quality can be a major roadblock for any knowledge graph initiative, as it can make it difficult to build an accurate and comprehensive understanding of an organization’s data.
Challenge #2: Lack of Data Interoperability
Another challenge for knowledge graph implementation is the issue of interoperability. Most organizations have data stored in various formats and systems, making it difficult to connect the dots and derive meaningful insights from the data. In addition, many enterprises rely on proprietary software and data formats, which can make it even harder to integrate disparate data sources into a single knowledge graph. Without a standard way to connect all of their data sources, organizations are unable to build a comprehensive knowledge graph that reflects the true complexity of their business.
Challenge #3: Lack of Data Governance
Lastly, many enterprises struggle with data governance and management, which can be a significant barrier to knowledge graph implementation. Data governance encompasses a wide range of practices and policies that are designed to ensure that data is managed effectively, from the way it is stored and secured to the way it is used and shared. Without robust data governance, organizations may be unable to ensure that their data is of sufficient quality and consistency to support a knowledge graph initiative. This can lead to a lack of trust in the data and make it difficult to build meaningful insights from it.
While knowledge graphs offer a powerful way to unlock the potential of enterprise data, most organizations are not yet ready for such an initiative. The challenges of poor, duplicated, and non-interoperable data, as well as data governance and management, pose significant barriers to implementation.
And – given the state of enterprise data management – the average data source is not quite ready for inclusion in a knowledge graph.
Best Practices in Prepping Enterprise Data for Knowledge Graphs
Knowledge graphs provide a powerful way to capture, organize, and analyze information from various sources, enabling organizations to gain insights that were previously hidden or difficult to access. However, preparing legacy data for an enterprise knowledge graph can be a complex and challenging process. Let’s dive into the common steps needed to build an effective enterprise knowledge graph:
1 – Define the scope and goals of the knowledge graph project: The first step in preparing legacy data for an enterprise knowledge graph is to clearly define the scope and goals of the project. This involves identifying the data sources that will be included in the knowledge graph, the types of entities and relationships that will be represented, and the business use cases that the knowledge graph will support.
2 – Cleanse and standardize data: Before data can be added to a knowledge graph, it must be cleansed and standardized to ensure accuracy and consistency. This involves identifying and correcting errors, removing duplicate entries, and standardizing formats and values across different data sources.
3 – Transform data into a graph-friendly format: Once data has been cleansed and standardized, it must be transformed into a graph-friendly format. This involves mapping the data to a graph schema that defines the entities and relationships that will be represented in the knowledge graph. The schema should be designed to support the business use cases and goals of the project, and it should be flexible enough to accommodate changes and additions as the knowledge graph evolves over time.
4 – Map data to graph schema: After the schema has been defined, data must be mapped to the schema to create the knowledge graph. This involves identifying the entities and relationships in the data and creating nodes and edges in the graph that represent them. The process of mapping data to a graph schema can be automated to some extent, but it often requires human input and expertise to ensure that the resulting graph accurately reflects the data.
5 – Validate and refine the knowledge graph: Once the knowledge graph has been created, it must be validated and refined to ensure that it accurately represents the data and supports the business use cases of the project. This involves testing the graph against various scenarios and use cases, refining the schema and data mappings as needed, and incorporating feedback from stakeholders and users.
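Steps 2 through 5 above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the sample rows, field names, the choice of email as the deduplication key, and the Person/Organization schema are all hypothetical, and a real project would need far more robust matching and validation rules.

```python
import re

# Hypothetical legacy rows from two source systems (assumed field names).
raw_rows = [
    {"name": "  Jane DOE ", "email": "Jane.Doe@Example.com", "company": "Acme Corp."},
    {"name": "jane doe", "email": "jane.doe@example.com ", "company": "ACME Corp"},
    {"name": "Raj Patel", "email": "raj@example.org", "company": "Globex"},
]

def standardize(row):
    """Step 2: trim whitespace, normalize case, and strip trailing punctuation."""
    return {
        "name": " ".join(row["name"].split()).title(),
        "email": row["email"].strip().lower(),
        "company": re.sub(r"[.,]+$", "", row["company"].strip()).title(),
    }

def deduplicate(rows, key="email"):
    """Step 2: keep the first row seen for each value of the key field."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())

def to_graph(rows):
    """Steps 3-4: map each row to Person and Organization nodes plus a works_for edge."""
    nodes, edges = {}, set()
    for row in rows:
        person_id = "person:" + row["email"]
        org_id = "org:" + row["company"].lower().replace(" ", "_")
        nodes[person_id] = {"type": "Person", "name": row["name"]}
        nodes[org_id] = {"type": "Organization", "name": row["company"]}
        edges.add((person_id, "works_for", org_id))
    return nodes, edges

def validate(nodes, edges):
    """Step 5: report edges whose endpoints are missing from the node set."""
    return [e for e in edges if e[0] not in nodes or e[2] not in nodes]

clean_rows = deduplicate([standardize(r) for r in raw_rows])
nodes, edges = to_graph(clean_rows)
problems = validate(nodes, edges)
```

In this sketch the two spellings of Jane Doe collapse into one record because their standardized emails match, so the graph ends up with one Person node for her rather than two. Exact-match deduplication like this is the simplest possible choice; fuzzy or ML-based matching would be needed for records that lack a shared key.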
Most data transformation projects are costly and time-consuming, and the same barriers exist for any knowledge graph initiative. While we certainly need to address the above challenges (data cleanliness, interoperability, structure, and standardization), the data engineering required can carry quite a price tag.
Fluree Sense automates these processes: using machine learning and AI to find patterns inherent in data and to help map data across multiple ontologies, Fluree Sense transforms data silos into structured, semantic data assets that are optimized for knowledge graphs. With Fluree Sense, you can automatically transform your legacy data into a format that is compatible with your enterprise knowledge graph.
By using Fluree Sense to prepare your legacy data for an enterprise knowledge graph, you can streamline the data preparation process and ensure that your knowledge graph is built on a solid foundation of reliable and accurate data. Data is now semantically described in multiple ontologies, and can therefore be accessed by many users within and outside a company’s four walls based on whichever vocabulary they are comfortable interacting in. Data is also saved in RDF-friendly form so that it can be loaded into knowledge graphs, which enable users to analyze and introspect the data using queries more powerful than traditional SQL database queries alone. With Fluree Sense, you get best-in-class data cleansing technology that is business user-friendly.
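As a rough illustration of what "described in multiple ontologies" and "RDF-friendly form" can mean in practice, the sketch below writes one record as N-Triples lines, describing the same field with both schema.org and FOAF terms. This is not Fluree Sense's actual method or output: the record, the `https://example.com/id/` base IRI, the field mapping, and the function names are all assumptions made for the example. Only the two vocabulary namespaces are real.

```python
# Real vocabulary namespaces; everything else here is hypothetical.
SCHEMA = "https://schema.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "https://example.com/id/"  # assumed base IRI for record identifiers

# Each source field maps to one or more predicates, possibly from different vocabularies.
FIELD_MAP = {
    "name": [SCHEMA + "name", FOAF + "name"],
    "email": [SCHEMA + "email"],
}

def escape_literal(value):
    """Escape backslash, quote, and line-break characters for an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def to_ntriples(record_id, record):
    """Emit one N-Triples line per (field, predicate) pair found in the record."""
    lines = []
    subject = f"<{BASE}{record_id}>"
    for field, predicates in FIELD_MAP.items():
        if field not in record:
            continue
        literal = f'"{escape_literal(record[field])}"'
        for predicate in predicates:
            lines.append(f"{subject} <{predicate}> {literal} .")
    return lines

record = {"name": 'Jane "JD" Doe', "email": "jane.doe@example.com"}
ntriples = to_ntriples("person-1", record)
print("\n".join(ntriples))
```

Because the name is stated under both vocabularies, a consumer that only understands FOAF and one that only understands schema.org can each find it without a translation layer.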
Is data cleansing alone enough? A thousand times, no. Data cleansing is a great way to get enterprise data into a usable state, but it does not address the fundamental problem that enterprises must address: their source data is, by nature, siloed.
The ideal scenario is that the necessity for data cleansing diminishes over time, as the underlying reasons for data problems are addressed. Without tackling the fundamental issues of native interoperability, semantics, trust, quality, and security, we will only be applying temporary fixes to a convoluted and deeply ingrained architectural problem.
We cover the fundamentals of addressing each of these “data problems” in our data-centric architecture series. Read it here.