Cardano Analytics Data Hub

If you are a developer, investor, or user of a blockchain, you will want up-to-date data that shows how the network is currently performing. Information is power.

The data in the Cardano ecosystem is available today, but it is scattered across multiple sources, which makes it difficult to access and to use in applications.

The number of transactions, transaction speed, the distribution of funds across wallets, the average network fee, and the number of delegations per stake pool, among other data of interest, are essential to understanding the ecosystem.

The current implementation of Cardano is highly modular. It includes the following components, which can be combined in different ways depending on the use case:

  • Node: the backbone of the blockchain. Many nodes are distributed across the network and communicate with each other to reach consensus on the state of the system.
  • Command Line Interface (CLI): the node’s CLI tool is the “Swiss Army Knife” of the system. It can do almost anything, although it is not very intuitive because it is text-based and lacks a graphical user interface (GUI).
  • Daedalus wallet: Daedalus is a wallet and also a full node. It helps users manage their ada and also propagates the blockchain, since the device on which it runs holds a full copy of the chain.
  • Cardano DB Sync: the Cardano node only stores the blockchain itself and the associated information needed to validate it. DB Sync connects to the local node as a client and synchronizes on-chain activity into a PostgreSQL database.
  • GraphQL API Server (Apollo): contains multiple packages to compose GraphQL services to meet specific application demands.
  • REST API components: Cardano REST provides a set of APIs to interact with on-chain data via JSON over HTTP.
  • SMASH server: aggregates common metadata about stake pools that are registered on the Cardano blockchain, including the name of the stake pool, its “ticker” name, etc.

https://docs.cardano.org/explore-cardano/cardano-architecture/overview

DB Sync is the most detailed source of data about the chain, but it is not easy to run, and the data it contains is highly normalized, so querying it requires a high level of knowledge about the database schema.

The details it provides include block information, which lets users follow the chain and explore the transactions within each block, but it excludes cryptographic signatures.
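To illustrate why a normalized schema demands schema knowledge, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for a tiny, simplified slice of DB Sync (a `block` table and a `tx` table). The table layout is heavily reduced, the rows are made-up sample values, and the function name `fee_stats_by_epoch` is hypothetical. Even a simple question such as "what was the average fee per epoch?" already requires joining two tables, because the epoch is recorded on the block and the fee on the transaction.

```python
import sqlite3

# In-memory stand-in for a tiny slice of the DB Sync schema.
# A real DB Sync instance has many more tables and columns;
# the rows below are made-up sample values, not chain data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE block (id INTEGER PRIMARY KEY, epoch_no INTEGER);
    CREATE TABLE tx (
        id INTEGER PRIMARY KEY,
        block_id INTEGER REFERENCES block(id),
        fee INTEGER  -- fee in lovelace (1 ada = 1,000,000 lovelace)
    );
    INSERT INTO block (id, epoch_no) VALUES (1, 300), (2, 300), (3, 301);
    INSERT INTO tx (id, block_id, fee) VALUES
        (1, 1, 170000), (2, 1, 180000), (3, 2, 190000), (4, 3, 200000);
""")


def fee_stats_by_epoch(connection):
    """Return (epoch_no, tx_count, avg_fee_lovelace) rows, ordered by epoch."""
    query = """
        SELECT b.epoch_no, COUNT(t.id), AVG(t.fee)
        FROM tx AS t
        JOIN block AS b ON b.id = t.block_id
        GROUP BY b.epoch_no
        ORDER BY b.epoch_no
    """
    return connection.execute(query).fetchall()


stats = fee_stats_by_epoch(conn)
for epoch_no, tx_count, avg_fee in stats:
    print(f"epoch {epoch_no}: {tx_count} txs, average fee {avg_fee / 1_000_000:.6f} ada")
```

Questions that touch delegations, rewards, or tokens would need more joins of the same kind, which is the effort a pre-modeled data set is meant to save.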

Other sources need to be integrated for the data to be optimally useful, such as the stake pool metadata server (SMASH), extended metadata files that follow the adapools.org standard, token data from sources such as the Cardano Foundation token registry and CNFT.io’s policy ID database, market data, social metrics, and many others.

A Data Analytics Hub

Currently, there are several excellent sites that offer pool-specific data, such as adapools.org and pooltool.io, as well as several block explorers. However, none of these sites provide complete historical data, custom queries, or data that has been modeled for analytics or machine learning use cases.

The developers propose a data hub for the Cardano ecosystem, which makes certain historical and modeled data sets available through multiple access mechanisms.

Thus, the current proposal is to build an initial minimum viable product (MVP) of a data hub, which will provide consolidated analytics-ready data to the Cardano ecosystem.

At a minimum, there will be data available from DB Sync and other sources (such as those listed above), which have been modeled for various analytics activities. 

The DB Sync data will have additional aggregated views such as those in the following repository: https://github.com/cardanocanuck/db-sync-queries.

They then propose to continue to add special purpose datasets, for various domains within the Cardano ecosystem, with several smaller proposals to be modeled and developed, such as:

  • On-chain Analytics – Transactions, volume, rewards, etc.
  • NFT Analytics – information about several aspects of NFT projects
  • Pool Analytics – Machine Learning ready dataset on historical pool performance
  • Smart Contract Analytics – (still in ideation phase)

The initial MVP Data Hub will allow the download of CSV data sets. In the future, the range of exchange methods will be expanded.
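As a rough sketch of how a downloaded CSV data set might be consumed, the snippet below parses an export and derives the average fee per transaction for each epoch. The file layout is an assumption: the column names (`epoch_no`, `tx_count`, `total_fees_lovelace`), the sample values, and the helper names are hypothetical and do not come from the proposal.

```python
import csv
import io

# Hypothetical export: column names and values are illustrative only.
SAMPLE_CSV = """epoch_no,tx_count,total_fees_lovelace
300,1200,216000000
301,1500,285000000
"""


def load_epoch_rows(csv_text):
    """Parse CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{key: int(value) for key, value in row.items()} for row in reader]


def average_fee_ada(row):
    """Average fee per transaction in ada (1 ada = 1,000,000 lovelace)."""
    return row["total_fees_lovelace"] / row["tx_count"] / 1_000_000


rows = load_epoch_rows(SAMPLE_CSV)
for row in rows:
    print(row["epoch_no"], round(average_fee_ada(row), 6))
```

With a real download, `io.StringIO(csv_text)` would be replaced by an open file handle; the rest of the logic would stay the same.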

Some of these exchange methods will be:

  • API access
  • Web-based data explorer
  • Google Sheets shared with the community
  • Direct access to the database in the cloud
  • Direct data sharing (Azure / Snowflake)

Priority will be given to free community access methods, but some access methods can be monetized with a subscription model, such as direct database access or data sharing. 

Roadmap

They propose to follow a hybrid agile and waterfall methodology, starting with initial architecture, design, and feature planning, followed by four feature development sprints.

The project plan will be updated throughout this process as team members are added and the idea and feature set are refined.

Budget

The budget will fund the first 3 months of development and 6 months of infrastructure costs.

The approximate budget breakdown by function is as follows:

  • Architect / Senior Dev – 100h x USD 75 = USD 7,500
  • Graphic Designer – 60h x USD 75 = USD 4,500
  • Web Developer – 80h x USD 75 = USD 6,000
  • Data Engineer – 280h x USD 75 = USD 21,000
  • Project Manager – 80h x USD 75 = USD 6,000
  • QA – 40h x USD 75 = USD 3,000

Total development costs: USD 48,000

Infrastructure costs: estimated at USD 2,000/month x 6 months = USD 12,000

Requested funds: USD 60,000
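The arithmetic behind these figures can be checked with a short script. The hours, hourly rate, and infrastructure cost are taken from the breakdown above; only the variable names are introduced here for illustration.

```python
# All figures come from the budget breakdown in the proposal.
HOURLY_RATE_USD = 75
HOURS_BY_ROLE = {
    "Architect / Senior Dev": 100,
    "Graphic Designer": 60,
    "Web Developer": 80,
    "Data Engineer": 280,
    "Project Manager": 80,
    "QA": 40,
}
INFRA_MONTHLY_USD = 2000
INFRA_MONTHS = 6

# Development cost is total hours times the flat hourly rate.
development_total = sum(hours * HOURLY_RATE_USD for hours in HOURS_BY_ROLE.values())
infrastructure_total = INFRA_MONTHLY_USD * INFRA_MONTHS
requested_total = development_total + infrastructure_total

print(development_total, infrastructure_total, requested_total)
# prints: 48000 12000 60000
```

The 640 development hours at a flat USD 75 rate account for USD 48,000, and six months of infrastructure add the remaining USD 12,000.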

The Team

The two developers are the founders of Cardano Canucks and Canuckz NFT, with over 30 years of combined experience in data infrastructure, analytics, and data visualization for enterprises.

Michael Stewart

  • 17+ years of software development and architecture experience.
  • 10+ years focused on the data and analytics space
  • Led the development team of a boutique data / analytics firm, where he designed and architected cloud-based data warehouse solutions for Fortune 500 companies
  • Member of the Cardano community since 2017
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)

Vivek Nankissoor

  • 15+ years of experience in database requirements, design and development
  • Established and grew web analytics, marketing automation and QA practices
  • Engaged in marketing, data, and analytics strategy development with enterprise retail and CPG organizations, banks, automotive, pharma, fintech, and others
  • Co-Founder of Cardano Canucks stake pool and Canuckz NFTs
  • Co-Founder of CCSPA (Canadian Cardano Stake Pool Association)
  • Participant in community work such as financial literacy relating to crypto and raising awareness with various investment groups

You can read the original proposal at Catalyst.

The developers have also submitted another related proposal in Fund6: Dataset – On-Chain Analytics
