by Evgeny Medvedev.

In this article we’ll cover our Ethereum 2.0 ETL tools for exporting Ethereum 2.0 blockchain data, the public Medalla dataset in BigQuery, and a Nansen.ai dashboard we built with some interesting charts and tables.

The article is broken down into three parts:

  1. A quickstart guide for Ethereum 2.0 ETL. The tools allow you to export beacon blocks, attestations, deposits, slashings, voluntary exits, validators, and validator committees.
  2. Some sample queries for BigQuery.
  3. Visualisations in Nansen.

Now, let’s go through the details.

Ethereum 2.0 ETL Quickstart Guide

The easiest way to get started is to request access to a Medalla node on Infura. Scroll to the bottom of this page until you see “apply now and register your interest” link. Click on that link and fill out the form to request access.

Install ethereum2-etl:

pip install ethereum2-etl

Export beacon blocks, attestations, deposits, proposer slashings, attester slashings, and voluntary exits:

ethereum2etl export_beacon_blocks --start-block 0 --end-block 200 \
--output-dir output --output-format json --rate-limit 5 \
--provider-uri https://projectid:secret@medalla.infura.io

Make sure to replace projectid and secret with your Infura project id and secret. The output will be written to the ./output directory.

For the--output-format option you can use either json or csv. JSON output is useful for data storages that support nested and repeated structures, such as Google BigQuery and AWS Athena.

You can find the json and csv schema in the documentation: JSON Schema, CSV Schema, Reference. Here is an example beacon_blocks.json schema:

block_slot: integer
block_epoch: integer
block_timestamp: timestamp
proposer_index: integer
skipped: boolean
block_root: string
parent_root: string 
state_root: string
randao_reveal: string
graffiti: string
eth1_block_hash: string
eth1_deposit_root: string 
eth1_deposit_count: string
signature: string 
attestations: record (repeated) 
|- aggregation_bits: string
|- slot: integer
|- index: integer 
|- beacon_block_root: string
|- source_epoch: integer
|- source_root: string
|- target_epoch: integer
|- target_root: string
|- signature: string
deposits: record (repeated)
|- pubkey: string 
|- withdrawal_credentials: string
|- amount: integer
|- signature: string
proposer_slashings: record (repeated) 
|- header_1_slot: integer 
|- header_1_proposer_index: integer
|- header_1_parent_root: float
|- header_1_state_root: float 
|- header_1_body_root: float
|- header_1_signature: float
|- header_2_slot: integer 
|- header_2_proposer_index: integer
|- header_2_parent_root: float
|- header_2_state_root: float 
|- header_2_body_root: float
|- header_2_signature: float
attester_slashings: record (repeated) 
|- attestation_1_attesting_indices: integer (repeated)
|- attestation_1_slot: integer
|- attestation_1_index: integer 
|- attestation_1_beacon_block_root: string
|- attestation_1_source_epoch: integer
|- attestation_1_source_root: string
|- attestation_1_target_epoch: integer
|- attestation_1_target_root: string
|- attestation_1_signature: string
|- attestation_2_attesting_indices: integer (repeated)
|- attestation_2_slot: integer
|- attestation_2_index: integer 
|- attestation_2_beacon_block_root: string
|- attestation_2_source_epoch: integer
|- attestation_2_source_root: string
|- attestation_2_target_epoch: integer
|- attestation_2_target_root: string
|- attestation_2_signature: string
voluntary_exits: record (repeated)
|- epoch: integer 
|- validator_index: integer
|- signature: string

Other commands —  export_beacon_validators and export_beacon_committees — allow you to export validators and committees respectively.

To schedule daily exports of the data and loads into Google BigQuery you can use our Ethereum 2.0 ETL Airflow repository. Or you can access our public datasets instead.

Medalla BigQuery Dataset

You can access the public-data-finance.crypto_ethereum2_medalla dataset in the BigQuery console. The dataset contains these three tables:

  • beacon_blocks — contains block data such as block timestamp, proposer index, graffiti, attestations, deposits, slashings, and voluntary exits.
  • beacon_validators — contains block validators data such as validator public key, index, and balance.
  • beacon_committees — groups of validators chosen by the beacon chain to attest the validity of blocks.

Here is how you can query blocks, their graffiti and attestations:

SELECT block_slot, block_timestamp, SAFE_CONVERT_BYTES_TO_STRING(FROM_HEX(REPLACE(graffiti, '0x', '')))  AS graffiti
FROM `public-data-finance.crypto_ethereum2_medalla.beacon_blocks`
ORDER BY block_timestamp DESC
LIMIT 100

This query will give you the top 100 validators:

SELECT pubkey, validator_index, balance, effective_balance
FROM `public-data-finance.crypto_ethereum2_medalla.beacon_validators`
ORDER BY balance DESC
LIMIT 100

The query below joins block attestations with committees to figure out validators that attested blocks, using the aggregation_bits field in attestations. This bitlist field has bits set in the positions corresponding to validator indexes in committees, as explained in the specification:

WITH filtered_blocks as (
  SELECT *
  FROM `public-data-finance.crypto_ethereum2_medalla.beacon_blocks`  
  WHERE DATE(block_timestamp) = "2020-08-04" -- to scan less data
),
attestations_with_committees as (
  SELECT block_slot, attestation.slot, attestation.index, attestation.aggregation_bits, committee
  FROM filtered_blocks as blocks
  JOIN UNNEST(attestations) as attestation
  JOIN `public-data-finance.crypto_ethereum2_medalla.beacon_committees` as committees
    ON block_epoch = (committees.epoch + 1)
      AND attestation.slot = committees.slot and attestation.index = committees.index
  WHERE length(attestation.aggregation_bits) = array_length(committee) -- checking just in case
)
SELECT block_slot, slot, index, committee_member_index, aggregation_bits
FROM attestations_with_committees,
UNNEST(committee)  AS committee_member_index WITH OFFSET off
WHERE SUBSTR(REVERSE(aggregation_bits), off + 1, 1) = '1'

BigQuery result for block attesters

On beaconscan.com the validators that attested a block have highlighted indexes in the “Beacon Committee Indices” section, e.g. on this page: https://beaconscan.com/slot/33#attestations

beaconscan.com block attestations

Nansen.ai dashboard

Below is a dashboard we put together at Nansen.ai using the Medalla data:

pro.nansen.ai/medalla dashboard

The interactive version of this dashboard and many other dashboards are available at https://pro.nansen.ai/medalla.

Please contribute to Ethereum 2.0 ETL to add support for more ETH2 clients and more data. You can find the list of open issues here.