Each person and thing moves, communicates, and connects within an ecosystem. Your home, for example, could be considered an ecosystem of appliances, family members, plumbing fixtures, waste disposal, and so on. A bank is an ecosystem of ATMs, credit cards, checking accounts, wealth management, loan servicing, and other services. Because so much of our lives is digitized, it is possible to monitor these movements in the form of data. This is called a data ecosystem.
A data ecosystem is a way of gaining insight into what’s already happening to create new opportunities. It’s not about collecting every piece of imaginable data and trying to understand everything at once. Rather, it’s about linking together key pieces of data from across the ecosystem to fit a new use case. The speed and volume of data available today, combined with regulatory and economic considerations, creates urgency around building a data ecosystem. But where to begin?
In this two-part series, we’ll cover what a data ecosystem is, how organizations use data ecosystems in real life, how to think about your own, and the risks involved. This article will cover the what and how of data ecosystems. The next article will show you how to mitigate security, privacy, and sharing risks.
Information ecosystems are nothing new. What’s new, rather, is the ability to ingest vast amounts of digital data, some of it unstructured, and run it through smart software to fuel a data ecosystem.
In the early 1900s, for example, companies like Ford and General Motors found success within the context of manufacturing and service ecosystems. Steel mills, tire manufacturers, auto repair shops, road maintenance crews, and a slew of other industries communicated within the auto ecosystem to ensure that cars were built, maintained, and operating on well-paved roads. Information moved through phones, ledgers, letters, and so on.
The auto ecosystem catalyzed other new ecosystems. Real-estate developers masterminded suburbs as a desirable place for newly car-endowed families to live. Highways, bridges, and tunnels led to a new form of tourism (the road trip) and roadside attractions. A cultural ecosystem also emerged around cars. They became not only a way to get from point to point, but a statement about one’s personal preferences and social status.
In the time of the first automobiles, information moved through in-person conversations, phone calls, telegraphs, and letters. In the modern ecosystem, data moves in packets of bytes, emitted by online interactions and sensors and traveling by way of texts, phone calls, emails, and so on.
These bytes are not, by themselves, complicated. The complication comes from the various architectures we’ve built to process bytes. We don’t use different forms of the US dollar for banking, credit cards, retail, groceries, and housing. A dollar is a dollar is a dollar; its value is collectively understood. Yet data is stored, labeled, transported, and applied in unique and application-specific ways. This has to do with the rush to digitize, and the various custom solutions built to facilitate digitalization. Our complex ways of dealing with data get in the way of turning it into a commodity for everyone’s benefit.
Different data architectures have evolved based on use case and budget. Marketing collects, processes, stores, and integrates data very differently from DevOps, finance, or customer support. Instead of sharing data, these ecosystems rest beneath the organizational umbrella like separate islands.
Even if you’re a sole proprietorship with cloud storage and website analytics, you are plugged into a rudimentary data ecosystem. The point is to think about how you can make it yours, employing it for new areas of innovation, service, and cost savings. When you find a use case that links data together in new ways across the organization, you begin to build a bigger data ecosystem.
Recently, several factors have increased the pressure to leverage a data ecosystem. One comes from regulation. GDPR, healthcare interoperability requirements, and other regulations are pushing organizations to pay closer attention to their organization-wide data ecosystem. Supply-chain problems lingering beyond the Covid-19 pandemic have compelled leaders to gain a view into data ecosystems to pinpoint shortages and revenue leaks. Inflation is another factor. Industries like utilities and aviation use data ecosystems to optimize pricing and usage models.
The resurgence of AI adds urgency. A lot of organizations are wondering how best to use foundation models like ChatGPT. While building your own, similar model costs hundreds of millions of dollars, limiting development to the most well-endowed corporations, anyone can tune a model to fit their own needs. For example, a salon chain might fine-tune a model to create a salon-specific scheduling bot that aligns with the company’s tone, brand, and policies, rather than sounding generic. A tech company could tune a model to detect cybersecurity threats specific to its infrastructure.
If you lack oversight into your data sources, quality, and pipeline, the AI model can’t be tuned until you clean everything up, nor can it continue to learn. If your data security, privacy, and compliance are questionable, your model could be fed compromised data. AI is thus another catalyst to build an organization-wide data ecosystem. But where do you begin?
It would be impractical to wire together an entire organization-wide data ecosystem at once. Of the many combinations of storage, data processing, and other components that make up your broader ecosystem, you need to prioritize in order to stay on budget and actually make use of that glut of data. Think of a use case that aligns with your business goals. Audit what’s possible, then build a proof of concept from there.
Fluree CEO Brian Platz recently covered the most common use cases for data ecosystems in Forbes.
Does one of these use cases fit your business goals? If so—or even if you haven’t yet found a use case—know that there are discrete steps you can take to begin to build your own data ecosystem, one use case at a time.
There is a systematic approach to identifying, evaluating, and implementing a data ecosystem. Here’s a step-by-step guide.
In a previous blog post on data hoarding, I described how, in the mid-2000s, “IT suggested copying all relevant data into a single data warehouse, where it would be easier to pull and analyze.” This led to an excess of data sitting in data warehouses. The lesson: Don’t store everything you can; it gets messy and expensive.
Data collection is not one size fits all. Instead of setting up systems just to collect data, be strategic. Which data should you be collecting and why? The data you collect should align with your company’s business goals and challenges. For example, if you’re in retail, you might prioritize inventory optimization or customer segmentation. But if you’re running a mobile app, then you should be collecting device information, payment information, and usage data instead.
Think about the established use cases above. Do you most urgently want to save money, create a new line of business, build predictive models, accelerate R&D, or tune or build AI models? What are other companies in your industry doing? What do the domain experts in your own organization want and need for their pain points? Once you answer these questions, you should come up with a short list of best fits. The next step is to validate these best fits against the volume, quality, and type of data you are already collecting.
What data do you currently collect and store? Compile a comprehensive list of all the data sources your organization uses, both internal and external. This includes databases, spreadsheets, cloud storage, third-party data providers, IoT devices, and any other repositories where you collect and store data.
Once you’ve answered that question, create a data inventory that catalogs the types of data each source contains. Categorize the data into structured (e.g., databases), semi-structured (e.g., JSON or XML), and unstructured (e.g., text documents) formats. Note the frequency and volume of data generated or collected from each source. Assess the quality of your data. Evaluate data accuracy, completeness, consistency, and timeliness. Identify any data anomalies, duplicates, or errors that need to be addressed.
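To make the inventory step concrete, here is a minimal sketch of a data inventory with a basic completeness check. The source names, volumes, and the 90% completeness threshold are all hypothetical, invented for illustration rather than drawn from any particular tool:

```python
import json

# Hypothetical catalog of data sources; in practice this would be
# populated by surveying your databases, buckets, and SaaS exports.
# "fields" maps field name -> share of records where it is non-null.
inventory = [
    {"source": "crm_db", "format": "structured", "kind": "PostgreSQL",
     "daily_rows": 12000, "fields": {"email": 0.98, "phone": 0.61}},
    {"source": "clickstream", "format": "semi-structured", "kind": "JSON logs",
     "daily_rows": 450000, "fields": {"user_id": 0.99, "referrer": 0.87}},
    {"source": "support_tickets", "format": "unstructured", "kind": "text",
     "daily_rows": 300, "fields": {"body": 1.0}},
]

def completeness_issues(inv, threshold=0.9):
    """Flag fields whose completeness falls below the threshold."""
    issues = []
    for src in inv:
        for field, ratio in src["fields"].items():
            if ratio < threshold:
                issues.append((src["source"], field, ratio))
    return issues

print(json.dumps(inventory[0], indent=2))
print(completeness_issues(inventory))
```

Even a toy inventory like this makes the audit actionable: the 61%-complete `phone` field surfaces immediately as a cleanup target before that source joins the ecosystem.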
Review the security measures in place to protect your data. Is sensitive data encrypted and are your access controls robust? Is data regularly backed up to prevent data breaches or loss? Who owns and has access to each data source? Do you have defined rights and permissions so that you can ensure data security and compliance with privacy regulations like GDPR or HIPAA? Your data audit should clarify which data sources you should leverage first. If your sensitive data is not secured, it is not ready to be part of a data ecosystem.
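One lightweight way to operationalize that security review is a qualification check per source. The source metadata below is made up, and the three rules (designated owner, encryption at rest, no wildcard grants) are illustrative examples rather than a compliance standard:

```python
# Hypothetical source metadata: a source qualifies for the ecosystem
# only if it has an owner, is encrypted at rest, and has no "*" grant.
sources = {
    "crm_db":      {"owner": "sales-ops", "encrypted": True, "grants": ["sales", "analytics"]},
    "clickstream": {"owner": None,        "encrypted": True, "grants": ["*"]},
}

def audit(src):
    """Return a list of reasons a source is not ready for the ecosystem."""
    problems = []
    if not src["owner"]:
        problems.append("no designated owner")
    if not src["encrypted"]:
        problems.append("unencrypted at rest")
    if "*" in src["grants"]:
        problems.append("wildcard access grant")
    return problems

for name, src in sources.items():
    issues = audit(src)
    print(f"{name}: {'ready' if not issues else 'blocked: ' + ', '.join(issues)}")
```

A source that comes back “blocked” is exactly the kind of unsecured sensitive data that, per the audit above, is not yet ready to be part of a data ecosystem.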
Now that you have an idea of the state of your data, think about how to match the data to your most urgent and important use cases. Data will to some extent inform what you can do. What additional data sources and technologies will you need to achieve your use case? Is it within reach? Cross off use cases that are too far out of reach, and focus on the one you can build now.
Write down KPIs that your proof of concept will measure. Identify stakeholders that should be part of the proof of concept, such as data engineers, data scientists, analysts, and business stakeholders. Ensure everyone understands their roles and responsibilities. Now you’re ready to put together a proof of concept of your use case.
To test the waters, develop a small-scale project to demonstrate the feasibility and benefits of your data ecosystem. Pick the data sources, storage, data integration platforms, and visualization platforms that you will use (for the least risk, use ones that are already within your organization instead of buying new software).
Next, extract, transform, and load data into your database. Note that you can use AI at this stage. Fluree Sense, for example, uses supervised machine learning that is trained on labeled data. It can help uncover patterns in data and then predict patterns in new and unseen data. Thus, you can use machine learning to quickly identify and match records of the same entity type from diverse sources, even if there are mild discrepancies in the data.
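Fluree Sense’s internals aren’t shown here, but the general technique of matching records for the same entity despite mild discrepancies can be sketched with fuzzy string similarity from Python’s standard library. The records and the 0.85 similarity threshold are invented for illustration:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Records for (possibly) the same customers from two silos,
# with typos and formatting differences between them.
crm = [{"id": 1, "name": "Jonathan Smyth"}, {"id": 2, "name": "Ana Lopez"}]
billing = [{"id": "B-17", "name": "Jonathon Smith"}, {"id": "B-22", "name": "Chris Wong"}]

def match_records(left, right, threshold=0.85):
    """Pair records whose names are similar enough to be the same entity."""
    pairs = []
    for l in left:
        for r in right:
            score = similarity(l["name"], r["name"])
            if score >= threshold:
                pairs.append((l["id"], r["id"], round(score, 2)))
    return pairs

print(match_records(crm, billing))
```

Here “Jonathan Smyth” and “Jonathon Smith” pair up despite the spelling differences, while unrelated names fall below the threshold. A supervised model learns richer signals than raw string similarity, but the matching idea is the same.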
Process and analyze your data to meet your use case. If there’s sensitive data, implement governance policies, access controls, and security measures. Test the performance and scalability of your data ecosystem under realistic workloads. Identify and address any bottlenecks or issues. Set up monitoring and logging to track the health and performance of your data ecosystem in real-time.
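For the monitoring and logging step, even a small proof of concept benefits from timing and logging each pipeline stage. A minimal sketch using Python’s standard logging module, where the step name and the stand-in load are hypothetical:

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ecosystem-poc")

@contextmanager
def monitored(step: str):
    """Log the duration and success or failure of a pipeline step."""
    start = time.perf_counter()
    try:
        yield
        log.info("%s succeeded in %.2fs", step, time.perf_counter() - start)
    except Exception:
        log.error("%s failed after %.2fs", step, time.perf_counter() - start)
        raise

with monitored("load_customers"):
    records = [{"id": i} for i in range(1000)]  # stand-in for a real load
```

Wrapping each stage this way gives you a running record of health and performance, which is the raw material for spotting the bottlenecks mentioned above.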
Regularly review and evaluate progress against the defined objectives and KPIs. Adjust as necessary to meet your goals. Collect feedback from all stakeholders involved. Use this feedback to fine-tune your data ecosystem and address any issues. Create a final report summarizing the results of the proof of concept, including insights gained, challenges faced, and recommendations for future steps. Based on the proof of concept’s results and feedback, make an informed decision on whether to proceed with the full-scale implementation of your data ecosystem or make necessary adjustments and conduct further testing.
Running a proof of concept for a data ecosystem can be a complex process. Once you validate your ideas and technologies, your organization will be able to move to a production-level deployment of a data ecosystem. Key pieces of data across your organization will be linked together to serve a use case that matches your business goals and capabilities. You’ll be one step closer to data-driven transformation.
In the next blog post in this two-part series, we’ll return to the proof of concept to cover the risks you need to think about when building a data ecosystem, covering data security, privacy, and sharing, both within and outside of your own ecosystem.