Local or Cloud
AI/ML Data Cleansing
Golden Record Pipeline
486 Patterson Ave
Winston-Salem, NC 27101
– – –
11 Park Place
New York, NY, 1007
– – –
Bagmane Laurel, Krishnappa
Garden, C V Raman Nagar,
Karnataka 560093, India
– – –
1644 Platte Street
Denver, CO 80202
– – –
Lange Dreef 11
4131 NJ Vianen
It’s a rainy Sunday afternoon, and my friend, Alice, and I are playing “Where’s Waldo.”* This page is a particularly challenging one, and we’ve been looking for nearly twenty minutes. I’m about ready to give up, when- finally- I spot Waldo, and I yell out, “I got him!”
“Oh really? Prove it then,” Alice says, hand on her hip.
I raise my finger, and I’m about to point to Waldo’s location, but she stops me, exclaiming, “Wait, wait, wait! Wait just one second! I’ve put so much work into finding Waldo, and I don’t want you to ruin it for me.”
“Hmm…” I say. “If only there was a way for me to prove to you that I know where Waldo is without revealing his location.”
“Here, I have an idea,” Alice says. “I’ll make a photocopy of this page in the book, and you can cut Waldo out of the photocopy and show him to me. That way, I know for sure that you know where Waldo is, AND I still won’t know where he is.”
As you might have guessed by the title of this blog, the scenario I’ve described is a classic example of a real-life zero knowledge proof. In this story (which **definitely** happened in real life), I am proving that I possess a piece of information (Waldo’s location), while at the same time not revealing that information.
There are many types and potential applications of zero-knowledge proofs, including nuclear disarmament (yes, really) and verified anonymous voting. In this example project, I’ve focused on how using Fluree in conjunction with zero-knowledge proofs can begin to tackle the challenge of traceable fishing.
Before we begin, I need to make the disclaimer that I am, by no means, an expert in zero-knowledge proofs. I welcome and encourage comments, suggestions, corrections, and more in the comments. This project is an exploration of the possibility of using zero-knowledge proofs with Fluree.
With three billion people relying on fish as their main source of protein, oceans are a critical food source. Given the importance of marine life to human nourishment, efforts have been made to regulate fishing locations, methods, and quantity. These regulation attempts are sometimes at odds with the desires of fishers who, among other concerns, are not too keen to reveal their fishing locations. This is where zero-knowledge proofs can come in. Zero-knowledge proofs can allow a fisher to prove that they are, for example, fishing within an allowed area, without exposing the exact location.
To follow along with this example, you can download the fluree/legal-fishing repository. If video is more your speed, you can also check out this project’s accompanying video above or on youtube.
Using circom, we’ll create a representation of a circuit, but let’s imagine for a second that it’s a real-world circuit. You and I sit down, and we build an electrical circuit with 5 switches on one side. The circuit has a light bulb that will always light up if exactly 3 of the switches are on- it doesn’t matter which three.
I leave the room, and you flip any three of the switches that you want. You then cover up the switches with a cardboard box so that no one can see them. I walk into the room, see the light bulb is shining, and I know that the circuit is complete. I know that your input (which switches you turned on) fit our criteria (exactly 3 switches turned on), but I won’t know exactly what your input was.This type of zero-knowledge proof is analogous to what we are mathematically accomplishing using circom and snarkjs. We won’t delve into the math here, but hopefully this has given you some intuition.
The specific type of circuit we are creating is an arithmetic circuit, which is a circuit that can perform some arithmetic operations. If you are following along with the GitHub repository, the circuit is `src/circuits/InRange.circom`. Our circuit will take two public inputs:
These two inputs will be visible to the public. This circuit will also take a private input:
The circuit will output a 0 (if the circuit is not satisfied) or a 1 (if it is). To compile the circuit, you’ll need to have circom installed:
npm install -g circom
Then you’ll need compile the circuit:
circom InRange.circom -o InRange.json
This will take InRange.circom as an input an output InRange.json in the same directory.
Now, we need to setup the circuit. In order to do this, we’ll need to have snarkjs installed.
npm install -g snarkjs
And then we can issue:
snarkjs setup -c InRange.json
This will create two files, proving_key.json and verification_key.json. As the names suggest, the proving key is the key you’ll need to prove that your input (your location) is valid. The verification key is the key you’ll need to verify anyone else’s proofs. When we set up our Fluree ledger, we’ll be putting both the proving and verification key (as well as the circuit) on the ledger.Note – This type of zero-knowledge proof requires a trusted setup. The process of generating these keys will also create some toxic data that must be deleted. Participants need to trust that the toxic data was deleted. It is important to note that this toxic data would allow an untrustworthy participant to create a fake proof using inputs that don’t match the constraints. The toxic data would NOT allow a user to discover someone else’s secret location. But this toxic data is only created once, and there are methods to minimize the risk. For example, a multi-party trusted setup creates a situation where a number of participants come together to generate the proving and verification keys, and each of them possess a piece of toxic data. In a setup like this, the only way to create a fake proof would be if every single party was untrustworthy, and they all kept their toxic data and then colluded by bringing their toxic data together (hopefully an unlikely occurrence!).
Before we can create a proof, we need to calculate all the signals in the circuit (including all the intermediate inputs) that match the circuit’s constraints. In order to do this, we’ll need to create an input.json, which has all of our inputs (including the private inputs). Neither the inputs.json, nor the witness.json files will be shared with anyone, but we do need to calculate them first.Our input file needs to be a map, where the keys are the names of all of the circuit’s inputs, and the values are our specific inputs. For example:
"latitudeRange": [ 20, 21],
"longitudeRange": [ 176, 190],
"fishingLocation": [ 20, 180]
We can then calculate the witness, which will generate the witness.json file.
snarkjs calculatewitness -c InRange.json
Now, we have all of the pieces to create the proof:
This command uses the proving key and the signals in witness.json to generate a proof.json (the actual proof) and public.json, which is a subset of your witness.json containing only your public inputs and the outputs.
You can now give any other party your verification_key.json, proof.json, and public.json, and they can verify that you put in an input that matched the constraints (a location within the legal range).
In the fluree/legal-fishing repo, we not only have an example circuit with example keys and inputs, but we also have a small demo React app that makes it easy to connect this zero-knowledge proof to a Fluree instance.To get this running, you’ll need to:
Now, you can run `npm install` and `npm start` to start up the lightweight React app, which integrates Fluree with the zero-knowledge proofs.
You can use the app to generate a proof and submit the proof and public signals to the Fluree ledger.You can also click on the Verify Proofs pages to see all the proofs that have been submitted to this ledger. You can click on “Verify Proof” to verify any given proof. Note that verifying a proof takes a little while, so expect to wait 10 – 20 seconds before a green “Verified!” alert comes up. For a full tour of the application, as well as a visual walk-through of getting the circuit and app setup, check out this project’s accompanying video.
This small project is only a tiny part of the puzzle needed to ensure seafood traceability. For starters, this example only deals with a single rectangular-shaped area. A real-life project would, of course, be much more complicated than this. In the case of zero-knowledge proofs, verifying and creating proofs can be time-intensive, so implementing a real-world project would require careful consideration of timing. There are zero-knowledge proofs that specifically are optimized for range-proofs, which might be a better fit for this example. This could be an area of future exploration for us. Additionally, even if the proof itself took the full scope of real-world restrictions into account, a fisher’s location at the time of catch would have to be reported by a source that is reliable. For example, we might want a piece of hardware that is sufficiently tamper-proof reporting a fisher’s location, rather than, say, the fisher’s word. We would also need a reliable way to correlate a GPS location to a particular catch. For hardware, considerations of cost, hassle to the fishers, and tamper-proofness would all have to be weighed. A final area to consider is public knowledge and trust of zero-knowledge proofs. Even if mathematically, we can show that a zero-knowledge proof does not reveal a fisher’s location, the fisher would have to trust the organization implementing this system. The fisher would first have to trust that their location is not hidden somewhere in the proof they are uploading to the database. The proof is a large, JSON object that could conceivably hide information. The fisher would also have to trust that the hardware they are using to report their location is not sending it out through some backdoor. These are assuredly not insurmountable concerns, but they should be considered as food-for-thought. Research, implementation, and public understanding of zero-knowledge proofs have really grown in the past few years due to projects like ZCash, so this is definitely an area to look out for!Thanks everyone for reading, and I’m interested in any and all feedback. If this piqued your interest, you might be interested in checking out other projects that tackle the challenge of proof of location, as well as this curated list of zero-knowledge proof content. You can also get started with Fluree here, and make sure to check out our documentation as well!
* For those unfamiliar with the “Where’s Waldo” books (or “Where’s Wally” outside of North America), Waldo is a cartoon man in a red-and-white striped shirt. “Where’s Waldo” books have page after page of hectic scenes, filled with people and colors. The object of the game is to try and spot Waldo.
Follow us on Linkedin
Join our Mailing List
Subscribe to our LinkedIn Newsletter
Subscribe to our YouTube channel
Partner, Analytic Strategy Partners; Frederick H. Rawson Professor in Medicine and Computer Science, University of Chicago and Chief of the Section of Biomedical Data Science in the Department of Medicine
Robert Grossman has been working in the field of data science, machine learning, big data, and distributed computing for over 25 years. He is a faculty member at the University of Chicago, where he is the Jim and Karen Frank Director of the Center for Translational Data Science. He is the Principal Investigator for the Genomic Data Commons, one of the largest collections of harmonized cancer genomics data in the world.
He founded Analytic Strategy Partners in 2016, which helps companies develop analytic strategies, improve their analytic operations, and evaluate potential analytic acquisitions and opportunities. From 2002-2015, he was the Founder and Managing Partner of Open Data Group (now ModelOp), which was one of the pioneers scaling predictive analytics to large datasets and helping companies develop and deploy innovative analytic solutions. From 1996 to 2001, he was the Founder and CEO of Magnify, which is now part of Lexis-Nexis (RELX Group) and provides predictive analytics solutions to the insurance industry.
Robert is also the Chair of the Open Commons Consortium (OCC), which is a not-for-profit that manages and operates cloud computing infrastructure to support scientific, medical, health care and environmental research.
Connect with Robert on Linkedin
Founder, DataStraits Inc., Chief Revenue Officer, 3i Infotech Ltd
Sudeep Nadkarni has decades of experience in scaling managed services and hi-tech product firms. He has driven several new ventures and corporate turnarounds resulting in one IPO and three $1B+ exits. VC/PE firms have entrusted Sudeep with key executive roles that include entering new opportunity areas, leading global sales, scaling operations & post-merger integrations.
Sudeep has broad international experience having worked, lived, and led firms operating in US, UK, Middle East, Asia & Africa. He is passionate about bringing innovative business products to market that leverage web 3.0 technologies and have embedded governance risk and compliance.
Connect with Sudeep on Linkedin
CEO, Data4Real LLC
Julia Bardmesser is a technology, architecture and data strategy executive, board member and advisor. In addition to her role as CEO of Data4Real LLC, she currently serves as Chair of Technology Advisory Council, Women Leaders In Data & AI (WLDA). She is a recognized thought leader in data driven digital transformation with over 30 years of experience in building technology and business capabilities that enable business growth, innovation, and agility. Julia has led transformational initiatives in many financial services companies such as Voya Financial, Deutsche Bank Citi, FINRA, Freddie Mac, and others.
Julia is a much sought-after speaker and mentor in the industry, and she has received recognition across the industry for her significant contributions. She has been named to engatica 2023 list of World’s Top 200 Business and Technology Innovators; received 2022 WLDA Changemaker in AI award; has been named to CDO Magazine’s List of Global Data Power Wdomen three years in the row (2020-2022); named Top 150 Business Transformation Leader by Constellation Research in 2019; and recognized as the Best Data Management Practitioner by A-Team Data Management Insight in 2017.
Connect with Julia on Linkedin
Senior Advisor, Board Member, Strategic Investor
After nine years leading the rescue and turnaround of Banco del Progreso in the Dominican Republic culminating with its acquisition by Scotiabank (for a 2.7x book value multiple), Mark focuses on advisory relationships and Boards of Directors where he brings the breadth of his prior consulting and banking/payments experience.
In 2018, Mark founded Alberdi Advisory Corporation where he is engaged in advisory services for the biotechnology, technology, distribution, and financial services industries. Mark enjoys working with founders of successful businesses as well as start-ups and VC; he serves on several Boards of Directors and Advisory Boards including MPX – Marco Polo Exchange – providing world-class systems and support to interconnect Broker-Dealers and Family Offices around the world and Fluree – focusing on web3 and blockchain. He is actively engaged in strategic advisory with the founder and Executive Committee of the Biotechnology Institute of Spain with over 50 patents and sales of its world-class regenerative therapies in more than 30 countries.
Prior work experience includes leadership positions with MasterCard, IBM/PwC, Kearney, BBVA and Citibank. Mark has worked in over 30 countries – extensively across Europe and the Americas as well as occasional experiences in Asia.
Connect with Mark on Linkedin
Chair of the Board, Enterprise Data Management Council
Peter Serenita was one of the first Chief Data Officers (CDOs) in financial services. He was a 28-year veteran of JPMorgan having held several key positions in business and information technology including the role of Chief Data Officer of the Worldwide Securities division. Subsequently, Peter became HSBC’s first Group Chief Data Officer, focusing on establishing a global data organization and capability to improve data consistency across the firm. More recently, Peter was the Enterprise Chief Data Officer for Scotiabank focused on defining and implementing a data management capability to improve data quality.
Peter is currently the Chairman of the Enterprise Data Management Council, a trade organization advancing data management globally across industries. Peter was a member of the inaugural Financial Research Advisory Committee (under the U.S. Department of Treasury) tasked with improving data quality in regulatory submissions to identify systemic risk.
Connect with Peter on Linkedin
Turn Data Chaos into Data Clarity
"*" indicates required fields
Enter details below to access the whitepaper.
Pawan came to Fluree via its acquisition of ZettaLabs, an AI based data cleansing and mastering company.His previous experiences include IBM where he was part of the Strategy, Business Development and Operations team at IBM Watson Health’s Provider business. Prior to that Pawan spent 10 years with Thomson Reuters in the UK, US, and the Middle East. During his tenure he held executive positions in Finance, Sales and Corporate Development and Strategy. He is an alumnus of The Georgia Institute of Technology and Georgia State University.
Connect with Pawan on Linkedin
Andrew “Flip” Filipowski is one of the world’s most successful high-tech entrepreneurs, philanthropists and industry visionaries. Mr. Filipowski serves as Co-founder and Co-CEO of Fluree, where he seeks to bring trust, security, and versatility to data.
Mr. Filipowski also serves as co-founder, chairman and chief executive officer of SilkRoad Equity, a global private investment firm, as well as the co-founder, of Tally Capital.
Mr. Filipowski was the former COO of Cullinet, the largest software company of the 1980’s. Mr. Filipowski founded and served as Chairman and CEO of PLATINUM technology, where he grew PLATINUM into the 8th largest software company in the world at the time of its sale to Computer Associates for $4 billion – the largest such transaction for a software company at the time. Upside Magazine named Mr. Filipowski one of the Top 100 Most Influential People in Information Technology. A recipient of Entrepreneur of the Year Awards from both Ernst & Young and Merrill Lynch, Mr. Filipowski has also been awarded the Young President’s Organization Legacy Award and the Anti-Defamation League’s Torch of Liberty award for his work fighting hate on the Internet.
Mr. Filipowski is or has been a founder, director or executive of various companies, including: Fuel 50, Veriblock, MissionMode, Onramp Branding, House of Blues, Blue Rhino Littermaid and dozens of other recognized enterprises.
Connect with Flip on Linkedin
Brian is the Co-founder and Co-CEO of Fluree, PBC, a North Carolina-based Public Benefit Corporation.
Platz was an entrepreneur and executive throughout the early internet days and SaaS boom, having founded the popular A-list apart web development community, along with a host of successful SaaS companies. He is now helping companies navigate the complexity of the enterprise data transformation movement.
Previous to establishing Fluree, Brian co-founded SilkRoad Technology which grew to over 2,000 customers and 500 employees in 12 global offices. Brian sits on the board of Fuel50 and Odigia, and is an advisor to Fabric Inc.
Connect with Brian on Linkedin
Eliud Polanco is a seasoned data executive with extensive experience in leading global enterprise data transformation and management initiatives. Previous to his current role as President of Fluree, a data collaboration and transformation company, Eliud was formerly the Head of Analytics at Scotiabank, Global Head of Analytics and Big Data at HSBC, head of Anti-Financial Crime Technology Architecture for U.S.DeutscheBank, and Head of Data Innovation at Citi.
In his most recent role as Head of Analytics and Data Standards at Scotiabank, Eliud led a full-spectrum data transformation initiative to implement new tools and technology architecture strategies, both on-premises as well as on Cloud, for ingesting, analyzing, cleansing, and creating consumption ready data assets.
Connect with Eliud on Linkedin
Get the right data into the right hands.
Build your Verifiable Credentials/DID solution with Fluree.
Wherever you are in your Knowledge Graph journey, Fluree has the tools and technology to unify data based on universal meaning, answer complex questions that span your business, and democratize insights across your organization.
Build real-time data collaboration that spans internal and external organizational boundaries, with protections and controls to meet evolving data policy and privacy regulations.
Fluree Sense auto-discovers data fitting across applications and data lakes, cleans and formats them into JSON-LD, and loads them into Fluree’s trusted data platform for sharing, analytics, and re-use.
Transform legacy data into linked, semantic knowledge graphs. Fluree Sense automates the data mappings from local formats to a universal ontology and transforms the flat files into RDF.
Whether you are consolidating data silos, migrating your data to a new platform, or building an MDM platform, we can help you build clean, accurate, and reliable golden records.
Our enterprise users receive exclusive support and even more features. Book a call with our sales team to get started.
Download Stable Version
Download Pre-Release Version
Register for Alpha Version
By downloading and running Fluree you agree to our terms of service (pdf).
Hello this is some content.