Getting Started with the Eurostat API

First pull, dataset codes, and what to expect when building pipelines on the Eurostat JSON-UI and SDMX APIs.

Source on EconIndx: Eurostat — free, no registration, 7,000+ datasets, 27 EU member states + EEA.

Access & Pricing

Fully free, no registration, no API key. Eurostat is the European Union’s statistical office — all data is publicly available for commercial and non-commercial use with attribution. The JSON-UI API at ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/ is the easiest entry point.
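Requests are plain GETs: the dataset code goes in the URL path and each dimension filter is a query parameter. A minimal sketch of the URL shape (the dataset code and filters here are only illustrative):

```python
from urllib.parse import urlencode

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def build_url(dataset: str, **filters) -> str:
    """Compose a JSON-UI request URL: dataset code in the path,
    dimension filters as query parameters."""
    params = {"format": "JSON", "lang": "EN", **filters}
    return f"{EUROSTAT_BASE}/{dataset}?{urlencode(params)}"

url = build_url("namq_10_gdp", geo="DE,FR", na_item="B1GQ")
print(url)
```

Paste the printed URL into a browser to inspect the raw JSON before writing any parsing code.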

Your First Data Pull

Eurostat organizes data by dataset code (e.g., namq_10_gdp for quarterly national accounts). Each dataset has dimensions you filter with query parameters:

import requests
import pandas as pd

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def fetch_eurostat(dataset: str, **filters) -> pd.DataFrame:
    """Fetch a Eurostat dataset with dimension filters."""
    params = {"format": "JSON", "lang": "EN", **filters}
    r = requests.get(f"{EUROSTAT_BASE}/{dataset}", params=params, timeout=60)
    r.raise_for_status()
    data = r.json()

    # Unpack SDMX-lite JSON structure
    dims = data["dimension"]
    dim_order = data["id"]
    values = data["value"]
    size = data["size"]

    # Build position → label maps for each dimension
    # (category.index maps code → position; category.label maps code → label)
    label_maps = {
        dim: {str(pos): dims[dim]["category"]["label"].get(code, code)
              for code, pos in dims[dim]["category"]["index"].items()}
        for dim in dim_order
    }

    # Flatten the multi-dimensional array
    rows = []
    for flat_idx, obs_val in values.items():
        idx = int(flat_idx)
        coords = []
        for s in reversed(size):
            coords.append(idx % s)
            idx //= s
        coords.reverse()

        row = {dim: label_maps[dim].get(str(c), c)
               for dim, c in zip(dim_order, coords)}
        row["value"] = obs_val
        rows.append(row)

    return pd.DataFrame(rows)

# Quarterly GDP for Germany and France
gdp = fetch_eurostat(
    "namq_10_gdp",
    geo="DE,FR",
    unit="CP_MEUR",     # current prices, millions EUR
    na_item="B1GQ",     # GDP
    s_adj="NSA",        # not seasonally adjusted
    freq="Q"
)

print(f"Rows: {len(gdp)}")
print(gdp.tail(5)[["geo", "time", "value"]])

First Pull: What to Expect

Dataset         Description                   Filter example        Rows (2 countries, 20yr)
namq_10_gdp     Quarterly national accounts   geo=DE,FR             ~200–400 per unit
une_rt_m        Monthly unemployment rate     geo=EU27_2020,DE,FR   ~600
prc_hicp_midx   HICP monthly index            geo=EU,DE,FR          ~800
ext_lt_maineu   Trade with main partners      —                     ~50,000+ (large)
demo_pjan       Population on 1 Jan, annual   geo=EU27_2020         ~300

Without geo filtering, a dataset like namq_10_gdp can return 100,000+ rows (all countries, all dimensions). Always filter by geo and time for your first pull.
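Time can be restricted server-side too. The statistics 1.0 API documents window parameters such as sinceTimePeriod, untilTimePeriod, and lastTimePeriod; treat the exact names as something to verify against the current API docs. A small helper sketch, assuming those parameter names:

```python
def with_time_window(filters: dict, since=None, last=None) -> dict:
    """Merge server-side time filtering into a filter dict.
    sinceTimePeriod / lastTimePeriod are query parameters documented for
    the statistics 1.0 API (verify names against the current docs)."""
    out = dict(filters)
    if since is not None:
        out["sinceTimePeriod"] = since   # earliest period to return
    if last is not None:
        out["lastTimePeriod"] = last     # only the N most recent periods
    return out

params = with_time_window({"geo": "DE,FR", "na_item": "B1GQ"}, since="2015-Q1")
print(params)
```

Passing these through the fetch function's **filters keeps the first pull small even on wide datasets.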

Flags embedded in TSV: When using the bulk TSV format (different endpoint), flags like : (not available), b (break in series), e (estimated) are embedded in value cells (e.g., "1234.5 e"). The JSON-UI API returns clean numeric values without flags — use it for pipeline work.
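If you do end up parsing the bulk TSV, a cell splitter handles the embedded flags. A sketch covering the flag conventions mentioned above:

```python
import math

def split_tsv_cell(cell: str) -> tuple:
    """Split a bulk-TSV value cell like '1234.5 e' into (value, flag).
    ':' means not available and maps to NaN."""
    cell = cell.strip()
    if cell.startswith(":"):
        # ':' alone or ': c' (not available, possibly with a flag)
        return math.nan, cell[1:].strip() or None
    parts = cell.split()
    value = float(parts[0])
    flag = parts[1] if len(parts) > 1 else None
    return value, flag

print(split_tsv_cell("1234.5 e"))   # (1234.5, 'e')
print(split_tsv_cell(":"))          # (nan, None)
```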

Key Datasets to Start With

National accounts:

  • namq_10_gdp — quarterly GDP by expenditure approach
  • nama_10_gdp — annual GDP, broader coverage
  • nama_10_pc — GDP per capita, annual

Labor market:

  • une_rt_m — monthly unemployment rate, by sex and age
  • lfsq_urgan — quarterly unemployment by geography

Prices:

  • prc_hicp_midx — HICP monthly price index (EU inflation measure)
  • prc_hicp_aind — HICP annual average index

Trade:

  • ext_lt_maineu — extra-EU trade by main partners (large dataset)
  • tet00002 — exports/imports summary

Population:

  • demo_pjan — population on 1 January
  • demo_gind — population change indicators

# Monthly unemployment for all EU countries (one call)
unemp = fetch_eurostat(
    "une_rt_m",
    sex="T",      # total (M/F/T)
    age="TOTAL",
    unit="PC_ACT", # % of active population
    s_adj="SA",   # seasonally adjusted
    freq="M"
)
print(f"Countries available: {unemp['geo'].nunique()}")
print(f"Total rows: {len(unemp)}")
# Expected: ~30 countries × ~300 months = ~9,000 rows
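Long-format frames like this pivot naturally into one column per country for charting or cross-country comparison. A sketch on a toy frame with the same columns as the pull above:

```python
import pandas as pd

# Synthetic stand-in for the long frame returned by fetch_eurostat
long_df = pd.DataFrame({
    "geo":   ["DE", "FR", "DE", "FR"],
    "time":  ["2024-01", "2024-01", "2024-02", "2024-02"],
    "value": [3.1, 7.5, 3.2, 7.4],
})

# One row per period, one column per country
wide = long_df.pivot(index="time", columns="geo", values="value")
print(wide)
```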

Data Tolerance & Validation

What’s normal:

  • Eurostat harmonizes national data, so coverage depends on member state reporting. Member states that joined more recently (e.g., Bulgaria and Romania, 2007) have shorter series. Some indicators only go back to the year of EU accession.
  • Quarterly GDP (namq_10_gdp) is revised for 2+ years after each release. Download timestamps matter — store them.
  • NUTS geography levels add complexity: namq_10_r3 provides regional (NUTS 2) data, which is much sparser than national (NUTS 0) data.
  • The time dimension format is YYYY-QN for quarterly (2023-Q4), YYYY-MM for monthly, YYYY for annual.
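Those time codes convert cleanly to pandas Periods, which makes resampling and date arithmetic straightforward. A sketch assuming the three formats listed above:

```python
import pandas as pd

def parse_time(code: str) -> pd.Period:
    """Parse a Eurostat time code into a pandas Period.
    Handles '2023-Q4' (quarterly), '2023-12' (monthly), '2023' (annual)."""
    if "-Q" in code:
        year, quarter = code.split("-Q")
        return pd.Period(f"{year}Q{quarter}", freq="Q")
    if "-" in code:
        return pd.Period(code, freq="M")
    return pd.Period(code, freq="Y")

print(parse_time("2023-Q4"))   # 2023Q4
print(parse_time("2023-12"))   # 2023-12
```

Mapping this over the time column gives you a proper PeriodIndex instead of sorting strings.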

Validation checks:

def validate_eurostat_pull(df: pd.DataFrame, dataset: str) -> dict:
    country_count = df["geo"].nunique() if "geo" in df.columns else None
    row_count = len(df)

    # For quarterly data, parse time to find latest
    if "time" in df.columns:
        # Codes are zero-padded ("2023-Q4", "2023-12"), so within one
        # frequency they sort lexicographically and string max() works
        times = df["time"].dropna().unique()
        latest = max(times)
    else:
        latest = None

    null_count = df["value"].isna().sum() if "value" in df.columns else None

    return {
        "dataset": dataset,
        "row_count": row_count,
        "country_count": country_count,
        "null_count": null_count,
        "latest_period": latest,
        "alert": row_count == 0,  # empty response = bad filter or dataset moved
    }

report = validate_eurostat_pull(gdp, "namq_10_gdp")
print(report)

Alert thresholds:

  • Zero rows returned: dataset code may have changed (Eurostat reorganizes periodically — check the catalog)
  • Country count drops more than 20% from last pull: investigate API or dimension filter change
  • Latest quarterly period more than 3 months behind current calendar quarter: data is stale
  • HICP for euro area more than 45 days old: data is stale (published ~2 weeks after month end)
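The quarterly staleness threshold reduces to simple arithmetic on quarter indices. A minimal sketch of that check (quarters_behind is a hypothetical helper, not part of the API):

```python
from datetime import date

def quarters_behind(latest: str, today: date) -> int:
    """How many calendar quarters a latest period like '2023-Q4' trails today."""
    year, q = latest.split("-Q")
    latest_idx = int(year) * 4 + (int(q) - 1)
    today_idx = today.year * 4 + (today.month - 1) // 3
    return today_idx - latest_idx

# With today in 2024-Q3, data ending 2023-Q4 trails by 3 quarters
print(quarters_behind("2023-Q4", date(2024, 8, 15)))  # 3
```

Alerting on quarters_behind(...) > 1 roughly matches the "more than 3 months behind" rule above.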

Bulk TSV for Large Initial Loads

For datasets with millions of rows, use the bulk TSV endpoint:

import gzip
import io

def fetch_eurostat_bulk(dataset: str) -> pd.DataFrame:
    url = (f"https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
           f"{dataset}?format=TSV&compressed=true")
    r = requests.get(url, timeout=300)
    r.raise_for_status()
    with gzip.open(io.BytesIO(r.content)) as f:
        df = pd.read_csv(f, sep="\t", dtype=str)
    return df

# Example: full HICP dataset (~50MB compressed)
# hicp_bulk = fetch_eurostat_bulk("prc_hicp_midx")
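The bulk TSV comes back wide: one combined key column (dimension codes joined by commas, header ending in \TIME_PERIOD) and one column per period. A reshaping sketch on a synthetic two-row sample (the real files follow the same layout, at scale):

```python
import io
import pandas as pd

# Synthetic sample mimicking the bulk TSV layout
sample = (
    "freq,unit,geo\\TIME_PERIOD\t2023-Q3 \t2023-Q4 \n"
    "Q,CP_MEUR,DE\t1000.5 \t1010.2 p\n"
    "Q,CP_MEUR,FR\t700.1 \t: \n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", dtype=str)
key_col = df.columns[0]                   # 'freq,unit,geo\TIME_PERIOD'
dims = key_col.split("\\")[0].split(",")  # ['freq', 'unit', 'geo']

# Split the combined key into one column per dimension, then melt to long
df[dims] = df[key_col].str.split(",", expand=True)
long = df.drop(columns=[key_col]).melt(id_vars=dims, var_name="time",
                                       value_name="raw")
long["time"] = long["time"].str.strip()
# Strip flags ('1010.2 p' -> 1010.2) and map ':' to NaN
long["value"] = pd.to_numeric(long["raw"].str.extract(r"([\d.]+)")[0],
                              errors="coerce")
print(long[["geo", "time", "value"]])
```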

Schema Stability

Dataset codes are stable but Eurostat reorganizes its catalog every few years. Track your dataset codes in a registry and add a health-check that calls the catalog endpoint to confirm they still exist. Dimension codes (geo, unit, na_item, etc.) follow SDMX codelists and are very stable. Geographic codes follow Eurostat conventions (e.g., EU27_2020 for current EU, DE for Germany).
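The health-check itself is just set arithmetic once you have the catalog's current dataset codes (e.g., from the SDMX dataflow listing; the fetch step is elided here, and check_registry is a hypothetical helper):

```python
def check_registry(registry: list, catalog_codes: set) -> dict:
    """Flag registered dataset codes missing from the latest catalog pull."""
    missing = [code for code in registry if code not in catalog_codes]
    return {"checked": len(registry),
            "missing": missing,
            "ok": not missing}

# catalog_codes would come from the SDMX dataflow listing in practice
report = check_registry(
    ["namq_10_gdp", "une_rt_m", "old_code_xyz"],
    {"namq_10_gdp", "une_rt_m", "prc_hicp_midx"},
)
print(report)  # {'checked': 3, 'missing': ['old_code_xyz'], 'ok': False}
```

Run it on a schedule and alert when ok is False, before a renamed dataset breaks the pipeline silently.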
