by Evgeny Medvedev.
In this article we’ll cover our Ethereum 2.0 ETL tools for exporting Ethereum 2.0 blockchain data, the public Medalla dataset in BigQuery, and a Nansen.ai dashboard we built with some interesting charts and tables.
The article is broken down into three parts:
- A quickstart guide for Ethereum 2.0 ETL. The tools allow you to export beacon blocks, attestations, deposits, slashings, voluntary exits, validators, and validator committees.
- Some sample queries for BigQuery.
- Visualisations in Nansen.
Now, let’s go through the details.
Ethereum 2.0 ETL Quickstart Guide
The easiest way to get started is to request access to a Medalla node on Infura. Scroll to the bottom of this page until you see “apply now and register your interest” link. Click on that link and fill out the form to request access.
Install ethereum2-etl:
pip install ethereum2-etl
Export beacon blocks, attestations, deposits, proposer slashings, attester slashings, and voluntary exits:
ethereum2etl export_beacon_blocks --start-block 0 --end-block 200 \
--output-dir output --output-format json --rate-limit 5 \
--provider-uri https://projectid:secret@medalla.infura.io
Make sure to replace projectid
and secret
with your Infura project id and secret. The output will be written to the ./output
directory.
For the--output-format
option you can use either json
or csv
. JSON output is useful for data storages that support nested and repeated structures, such as Google BigQuery and AWS Athena.
You can find the json and csv schema in the documentation: JSON Schema, CSV Schema, Reference. Here is an example beacon_blocks.json
schema:
block_slot: integer
block_epoch: integer
block_timestamp: timestamp
proposer_index: integer
skipped: boolean
block_root: string
parent_root: string
state_root: string
randao_reveal: string
graffiti: string
eth1_block_hash: string
eth1_deposit_root: string
eth1_deposit_count: string
signature: string
attestations: record (repeated)
|- aggregation_bits: string
|- slot: integer
|- index: integer
|- beacon_block_root: string
|- source_epoch: integer
|- source_root: string
|- target_epoch: integer
|- target_root: string
|- signature: string
deposits: record (repeated)
|- pubkey: string
|- withdrawal_credentials: string
|- amount: integer
|- signature: string
proposer_slashings: record (repeated)
|- header_1_slot: integer
|- header_1_proposer_index: integer
|- header_1_parent_root: float
|- header_1_state_root: float
|- header_1_body_root: float
|- header_1_signature: float
|- header_2_slot: integer
|- header_2_proposer_index: integer
|- header_2_parent_root: float
|- header_2_state_root: float
|- header_2_body_root: float
|- header_2_signature: float
attester_slashings: record (repeated)
|- attestation_1_attesting_indices: integer (repeated)
|- attestation_1_slot: integer
|- attestation_1_index: integer
|- attestation_1_beacon_block_root: string
|- attestation_1_source_epoch: integer
|- attestation_1_source_root: string
|- attestation_1_target_epoch: integer
|- attestation_1_target_root: string
|- attestation_1_signature: string
|- attestation_2_attesting_indices: integer (repeated)
|- attestation_2_slot: integer
|- attestation_2_index: integer
|- attestation_2_beacon_block_root: string
|- attestation_2_source_epoch: integer
|- attestation_2_source_root: string
|- attestation_2_target_epoch: integer
|- attestation_2_target_root: string
|- attestation_2_signature: string
voluntary_exits: record (repeated)
|- epoch: integer
|- validator_index: integer
|- signature: string
Other commands — export_beacon_validators
and export_beacon_committees
— allow you to export validators and committees respectively.
To schedule daily exports of the data and loads into Google BigQuery you can use our Ethereum 2.0 ETL Airflow repository. Or you can access our public datasets instead.
Medalla BigQuery Dataset
You can access the public-data-finance.crypto_ethereum2_medalla
dataset in the BigQuery console. The dataset contains these three tables:
beacon_blocks
— contains block data such as block timestamp, proposer index, graffiti, attestations, deposits, slashings, and voluntary exits.beacon_validators
— contains block validators data such as validator public key, index, and balance.beacon_committees
— groups of validators chosen by the beacon chain to attest the validity of blocks.
Here is how you can query blocks, their graffiti and attestations:
SELECT block_slot, block_timestamp, SAFE_CONVERT_BYTES_TO_STRING(FROM_HEX(REPLACE(graffiti, '0x', ''))) AS graffiti
FROM `public-data-finance.crypto_ethereum2_medalla.beacon_blocks`
ORDER BY block_timestamp DESC
LIMIT 100
This query will give you the top 100 validators:
SELECT pubkey, validator_index, balance, effective_balance
FROM `public-data-finance.crypto_ethereum2_medalla.beacon_validators`
ORDER BY balance DESC
LIMIT 100
The query below joins block attestations with committees to figure out validators that attested blocks, using the aggregation_bits
field in attestations. This bitlist field has bits set in the positions corresponding to validator indexes in committees, as explained in the specification:
WITH filtered_blocks as (
SELECT *
FROM `public-data-finance.crypto_ethereum2_medalla.beacon_blocks`
WHERE DATE(block_timestamp) = "2020-08-04" -- to scan less data
),
attestations_with_committees as (
SELECT block_slot, attestation.slot, attestation.index, attestation.aggregation_bits, committee
FROM filtered_blocks as blocks
JOIN UNNEST(attestations) as attestation
JOIN `public-data-finance.crypto_ethereum2_medalla.beacon_committees` as committees
ON block_epoch = (committees.epoch + 1)
AND attestation.slot = committees.slot and attestation.index = committees.index
WHERE length(attestation.aggregation_bits) = array_length(committee) -- checking just in case
)
SELECT block_slot, slot, index, committee_member_index, aggregation_bits
FROM attestations_with_committees,
UNNEST(committee) AS committee_member_index WITH OFFSET off
WHERE SUBSTR(REVERSE(aggregation_bits), off + 1, 1) = '1'

On beaconscan.com the validators that attested a block have highlighted indexes in the “Beacon Committee Indices” section, e.g. on this page: https://beaconscan.com/slot/33#attestations

Nansen.ai dashboard
Below is a dashboard we put together at Nansen.ai using the Medalla data:

The interactive version of this dashboard and many other dashboards are available at https://pro.nansen.ai/medalla.
Please contribute to Ethereum 2.0 ETL to add support for more ETH2 clients and more data. You can find the list of open issues here.