Source on EconIndx: Eurostat — free, no registration, 7,000+ datasets, 27 EU member states + EEA.
Access & Pricing
Fully free, no registration, no API key. Eurostat is the European Union’s statistical office — all data is publicly available for commercial and non-commercial use with attribution. The JSON-UI API at ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/ is the easiest entry point.
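A quick smoke test confirms the open access (a minimal sketch; the `demo_pjan` dimension filters shown are assumptions — adjust them if the API reports an unknown parameter):

```python
import requests

# No key, no registration: a plain GET against the JSON-UI API
url = ("https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"
       "demo_pjan")
r = requests.get(url, params={"format": "JSON", "lang": "EN", "freq": "A",
                              "geo": "DE", "sex": "T", "age": "TOTAL"},
                 timeout=60)
print(r.status_code)       # 200 without any auth
print(r.json()["label"])   # dataset title from the response metadata
```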
Your First Data Pull
Eurostat organizes data by dataset code (e.g., `namq_10_gdp` for quarterly national accounts). Each dataset has dimensions you filter with query parameters:
```python
import requests
import pandas as pd

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def fetch_eurostat(dataset: str, **filters) -> pd.DataFrame:
    """Fetch a Eurostat dataset with dimension filters.

    Pass list values (e.g., geo=["DE", "FR"]) to query multiple codes;
    requests expands them into the repeated query parameters the API expects.
    """
    params = {"format": "JSON", "lang": "EN", **filters}
    r = requests.get(f"{EUROSTAT_BASE}/{dataset}", params=params, timeout=60)
    r.raise_for_status()
    data = r.json()

    # Unpack the JSON-stat-style structure
    dims = data["dimension"]
    dim_order = data["id"]      # dimension order, e.g. ["freq", "unit", ..., "time"]
    values = data["value"]      # {flat_index_as_str: observation}
    size = data["size"]         # dimension lengths, same order as dim_order

    # Build position -> code maps for each dimension
    # (category["index"] maps code -> position; invert it)
    pos_to_code = {}
    for dim in dim_order:
        cat = dims[dim]["category"]
        index = cat.get("index") or {code: 0 for code in cat["label"]}
        pos_to_code[dim] = {pos: code for code, pos in index.items()}

    # Flatten the multi-dimensional array (row-major: last dimension varies fastest)
    rows = []
    for flat_idx, obs_val in values.items():
        idx = int(flat_idx)
        coords = []
        for s in reversed(size):
            coords.append(idx % s)
            idx //= s
        coords.reverse()
        row = {dim: pos_to_code[dim][c] for dim, c in zip(dim_order, coords)}
        row["value"] = obs_val
        rows.append(row)
    return pd.DataFrame(rows)
```
```python
# Quarterly GDP for Germany and France
gdp = fetch_eurostat(
    "namq_10_gdp",
    geo=["DE", "FR"],   # list -> repeated geo= parameters
    unit="CP_MEUR",     # current prices, millions EUR
    na_item="B1GQ",     # GDP
    s_adj="NSA",        # not seasonally adjusted
    freq="Q",
)

print(f"Rows: {len(gdp)}")
print(gdp.tail(5)[["geo", "time", "value"]])
```
First Pull: What to Expect
| Dataset | Description | Filter example | Rows (2 countries, 20 yrs) |
|---|---|---|---|
| `namq_10_gdp` | Quarterly national accounts | `geo=DE,FR` | ~200–400 per unit |
| `une_rt_m` | Monthly unemployment rate | `geo=EU27_2020,DE,FR` | ~600 |
| `prc_hicp_midx` | HICP monthly index | `geo=EU,DE,FR` | ~800 |
| `ext_lt_maineu` | Trade with main partners | (none) | ~50,000+ (large) |
| `demo_pjan` | Population on 1 Jan, annual | `geo=EU27_2020` | ~300 |
Without geo filtering, a dataset like `namq_10_gdp` can return 100,000+ rows (all countries, all dimensions). Always filter by `geo` and time for your first pull; a time-bounded pull is sketched below.
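The time window can be bounded server-side as well. The dissemination API documents `sinceTimePeriod`/`untilTimePeriod` parameters; a hedged sketch using the `fetch_eurostat` helper above (verify the parameter names and period format against the current API docs):

```python
# Only quarters from 2005-Q1 onward, far fewer rows than the full series
gdp_recent = fetch_eurostat(
    "namq_10_gdp",
    geo=["DE", "FR"],
    unit="CP_MEUR",
    na_item="B1GQ",
    s_adj="NSA",
    freq="Q",
    sinceTimePeriod="2005-Q1",   # assumed to match the time dimension format
)
print(f"Rows since 2005: {len(gdp_recent)}")
```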
Flags embedded in TSV: When using the bulk TSV format (different endpoint), flags like `:` (not available), `b` (break in series), and `e` (estimated) are embedded in value cells (e.g., `1234.5 e`). The JSON-UI API returns clean numeric values without flags — use it for pipeline work.
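If you do land on the TSV path, splitting the flag off each cell is straightforward. A minimal sketch (the helper name is ours):

```python
def split_tsv_cell(cell: str) -> tuple[float | None, str | None]:
    """Split a bulk-TSV cell like '1234.5 e' or ':' into (value, flags)."""
    parts = cell.strip().split(" ", 1)
    head = parts[0]
    flags = parts[1] if len(parts) > 1 else None
    if head == ":":              # ':' marks a missing value
        return None, flags or ":"
    return float(head), flags

print(split_tsv_cell("1234.5 e"))   # (1234.5, 'e')
print(split_tsv_cell(":"))          # (None, ':')
```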
Key Datasets to Start With
National accounts:
- `namq_10_gdp` — quarterly GDP by expenditure approach
- `nama_10_gdp` — annual GDP, broader coverage
- `nama_10_pc` — GDP per capita, annual

Labor market:
- `une_rt_m` — monthly unemployment rate, by sex and age
- `lfsq_urgan` — quarterly unemployment by geography

Prices:
- `prc_hicp_midx` — HICP monthly price index (EU inflation measure)
- `prc_hicp_aind` — HICP annual average index

Trade:
- `ext_lt_maineu` — extra-EU trade by main partners (large dataset)
- `tet00002` — exports/imports summary

Population:
- `demo_pjan` — population on 1 January
- `demo_gind` — population change indicators
```python
# Monthly unemployment for all EU countries (one call)
unemp = fetch_eurostat(
    "une_rt_m",
    sex="T",          # total (M/F/T)
    age="TOTAL",
    unit="PC_ACT",    # % of active population
    s_adj="SA",       # seasonally adjusted
    freq="M",
)

print(f"Countries available: {unemp['geo'].nunique()}")
print(f"Total rows: {len(unemp)}")
# Expected: ~30 countries × ~300 months = ~9,000 rows
```
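For cross-country comparison, pivoting to one column per country is a handy next step (plain pandas, nothing Eurostat-specific):

```python
# One row per month, one column per country
wide = unemp.pivot_table(index="time", columns="geo", values="value")
print(wide[["DE", "FR"]].tail())
```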
Data Tolerance & Validation
What’s normal:
- Eurostat harmonizes national data, so coverage depends on member state reporting. Newer EU members (e.g., Bulgaria, Romania) have shorter series; some indicators only go back to the year of EU accession.
- Quarterly GDP (`namq_10_gdp`) is revised for 2+ years after each release. Download timestamps matter — store them.
- NUTS geography levels add complexity: `namq_10_r3` provides regional (NUTS 2) data, which is much sparser than national (NUTS 0) data.
- The time dimension format is `YYYY-QN` for quarterly (`2023-Q4`), `YYYY-MM` for monthly, and `YYYY` for annual; a parser sketch follows this list.
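A small parser normalizes all three period formats to timestamps (a sketch; the helper name is ours):

```python
import pandas as pd

def parse_eurostat_time(t: str) -> pd.Timestamp:
    """Convert '2023-Q4', '2023-12', or '2023' to a period-start timestamp."""
    if "-Q" in t:
        return pd.Period(t, freq="Q").start_time
    if "-" in t:
        return pd.Period(t, freq="M").start_time
    return pd.Period(t, freq="Y").start_time

gdp["date"] = gdp["time"].map(parse_eurostat_time)
```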
Validation checks:
```python
def validate_eurostat_pull(df: pd.DataFrame, dataset: str) -> dict:
    country_count = df["geo"].nunique() if "geo" in df.columns else None
    row_count = len(df)
    # Find the latest period; string max works within a single
    # format ("2023-Q4" and "2023-12" both sort chronologically)
    if "time" in df.columns:
        times = df["time"].dropna().unique()
        latest = max(times) if len(times) else None
    else:
        latest = None
    null_count = df["value"].isna().sum() if "value" in df.columns else None
    return {
        "dataset": dataset,
        "row_count": row_count,
        "country_count": country_count,
        "null_count": null_count,
        "latest_period": latest,
        "alert": row_count == 0,  # empty response = bad filter or dataset moved
    }

report = validate_eurostat_pull(gdp, "namq_10_gdp")
print(report)
```
Alert thresholds:
- Zero rows returned: dataset code may have changed (Eurostat reorganizes periodically — check the catalog)
- Country count drops more than 20% from last pull: investigate API or dimension filter change
- Latest quarterly period more than 3 months behind current calendar quarter: data is stale (a check sketch follows this list)
- HICP for euro area more than 45 days old: data is stale (published ~2 weeks after month end)
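The quarterly staleness rule translates into a few lines (our helper, built on the validation report above):

```python
import datetime as dt

def quarters_behind(latest_period: str) -> int:
    """How many calendar quarters a '2023-Q4'-style period lags today."""
    year, q = latest_period.split("-Q")
    today = dt.date.today()
    current_q = (today.month - 1) // 3 + 1
    return (today.year - int(year)) * 4 + (current_q - int(q))

if quarters_behind(report["latest_period"]) > 1:   # more than ~3 months behind
    print("ALERT: quarterly data is stale")
```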
Bulk TSV for Large Initial Loads
For datasets with millions of rows, use the bulk TSV endpoint:
```python
import gzip
import io

def fetch_eurostat_bulk(dataset: str) -> pd.DataFrame:
    url = (f"https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
           f"{dataset}?format=TSV&compressed=true")
    r = requests.get(url, timeout=300)
    r.raise_for_status()
    # Response is a gzipped TSV; keep everything as strings because
    # value cells can carry flags ("1234.5 e", ":")
    with gzip.open(io.BytesIO(r.content)) as f:
        df = pd.read_csv(f, sep="\t", dtype=str)
    return df

# Example: full HICP dataset (~50MB compressed)
# hicp_bulk = fetch_eurostat_bulk("prc_hicp_midx")
```
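The bulk TSV arrives wide: the first column packs the dimension codes (header like `freq,unit,coicop,geo\TIME_PERIOD`) and each remaining column is one period. A melt sketch, assuming that standard layout:

```python
def tsv_to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Melt a Eurostat bulk TSV (one column per period) into long form."""
    key_col = df.columns[0]                   # "freq,unit,...\\TIME_PERIOD"
    dims = key_col.split("\\")[0].split(",")
    long = df.melt(id_vars=[key_col], var_name="time", value_name="raw")
    long[dims] = long[key_col].str.split(",", expand=True)
    long["time"] = long["time"].str.strip()
    # Strip embedded flags: "1234.5 e" -> 1234.5, ":" -> NaN
    long["value"] = pd.to_numeric(
        long["raw"].str.strip().str.split(" ").str[0], errors="coerce"
    )
    return long.drop(columns=[key_col, "raw"])
```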
Schema Stability
Dataset codes are stable, but Eurostat reorganizes its catalog every few years. Track your dataset codes in a registry and add a health check that calls the catalog endpoint to confirm they still exist (a simple variant probing the data endpoint directly is sketched below). Dimension codes (`geo`, `unit`, `na_item`, etc.) follow SDMX codelists and are very stable. Geographic codes follow Eurostat conventions (e.g., `EU27_2020` for the current EU, `DE` for Germany).
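A minimal variant probes each registered code against the data endpoint itself; `lastTimePeriod=1` keeps responses tiny (a documented parameter of the dissemination API; verify against current docs), and a non-2xx status flags a moved or renamed dataset:

```python
REGISTRY = ["namq_10_gdp", "une_rt_m", "prc_hicp_midx"]

def check_dataset_codes(codes: list[str]) -> dict[str, bool]:
    """Probe each dataset code; False means it no longer resolves."""
    status = {}
    for code in codes:
        r = requests.get(
            f"{EUROSTAT_BASE}/{code}",
            params={"format": "JSON", "lang": "EN", "lastTimePeriod": 1},
            timeout=60,
        )
        status[code] = r.ok
    return status

print(check_dataset_codes(REGISTRY))
```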
Next Steps
- Full access, rate limit, and NUTS geography details at the Eurostat source page on EconIndx
- Python: `pip install eurostat` — wraps the JSON-UI API and handles TSV parsing
- R: `library(eurostat)` — mature, widely used, handles bulk downloads well
- Browse datasets at ec.europa.eu/eurostat/data/database