Source on EconIndx: Eurostat (free, no registration, 7,000+ datasets, 27 EU member states + EEA).
Access & Pricing
Fully free, no registration, no API key. Eurostat is the European Union's statistical office, and all data is publicly available for commercial and non-commercial use with attribution. The JSON-UI API at ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/ is the easiest entry point.
Your First Data Pull
Eurostat organizes data by dataset code (e.g., namq_10_gdp for quarterly national accounts). Each dataset has dimensions you filter with query parameters:
Note: Eurostat dataset codes look opaque but follow a pattern: une_rt_m = unemployment (une) rate (rt) monthly (m). Browse the Eurostat Data Browser to find codes visually. The URL of any dataset page contains the code you need for the API.
import requests
import pandas as pd

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def fetch_eurostat(dataset: str, **filters) -> pd.DataFrame:
    """Fetch a Eurostat dataset with dimension filters."""
    params = {"format": "JSON", "lang": "EN"}
    # Multi-value filters are sent as repeated parameters (geo=DE&geo=FR),
    # so a string like "DE,FR" is split into ["DE", "FR"] first
    for key, val in filters.items():
        params[key] = val.split(",") if isinstance(val, str) else val
    r = requests.get(f"{EUROSTAT_BASE}/{dataset}", params=params, timeout=60)
    r.raise_for_status()
    data = r.json()

    # Unpack the JSON-stat style structure
    dims = data["dimension"]
    dim_order = data["id"]
    values = data["value"]
    size = data["size"]

    # Build position -> code maps for each dimension.
    # category["index"] maps each code to its position; category["label"]
    # only holds display names, so it cannot be used for this lookup.
    code_maps = {
        dim: {pos: code for code, pos in dims[dim]["category"]["index"].items()}
        for dim in dim_order
    }

    # Flatten the multi-dimensional array (row-major: last dimension varies fastest)
    rows = []
    for flat_idx, obs_val in values.items():
        idx = int(flat_idx)
        coords = []
        for s in reversed(size):
            coords.append(idx % s)
            idx //= s
        coords.reverse()
        row = {dim: code_maps[dim].get(c, c)
               for dim, c in zip(dim_order, coords)}
        row["value"] = obs_val
        rows.append(row)
    return pd.DataFrame(rows)
# Quarterly GDP for Germany and France
gdp = fetch_eurostat(
    "namq_10_gdp",
    geo="DE,FR",
    unit="CP_MEUR",   # current prices, millions EUR
    na_item="B1GQ",   # GDP
    s_adj="NSA",      # not seasonally adjusted
    freq="Q"
)
print(f"Rows: {len(gdp)}")
print(gdp.tail(5)[["geo", "time", "value"]])
First Pull: What to Expect
| Dataset | Description | Filter example | Rows (2 countries, 20yr) |
|---|---|---|---|
| namq_10_gdp | Quarterly national accounts | geo=DE,FR | ~200-400 per unit |
| une_rt_m | Monthly unemployment rate | geo=EU27_2020,DE,FR | ~600 |
| prc_hicp_midx | HICP monthly index | geo=EU,DE,FR | ~800 |
| ext_lt_maineu | Trade with main partners | --- | ~50,000+ (large) |
| demo_pjan | Population on 1 Jan, annual | geo=EU27_2020 | ~300 |
Without geo filtering, a dataset like namq_10_gdp can return 100,000+ rows (all countries, all dimensions). Always filter by geo and time for your first pull.
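As a rough illustration of a geo- and time-bounded request, the sketch below only builds the URL and does not call the API. It assumes the API accepts a sinceTimePeriod parameter as the lower time bound and repeated geo parameters for several countries; build_query is a hypothetical helper, not part of any Eurostat client.

```python
from urllib.parse import urlencode

EUROSTAT_BASE = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

def build_query(dataset: str, since: str, **filters) -> str:
    """Build a request URL bounded by geo and time (hypothetical helper)."""
    # Assumption: sinceTimePeriod sets the earliest period to return
    params = {"format": "JSON", "lang": "EN", "sinceTimePeriod": since}
    for key, val in filters.items():
        # Lists become repeated parameters: geo=DE&geo=FR
        params[key] = list(val) if isinstance(val, (list, tuple)) else val
    return f"{EUROSTAT_BASE}/{dataset}?{urlencode(params, doseq=True)}"

url = build_query("namq_10_gdp", since="2015-Q1", geo=["DE", "FR"], na_item="B1GQ")
print(url)
```

Printing the URL before sending it makes it easy to paste into a browser and confirm the filters behave as expected.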
Flags embedded in TSV: When using the bulk TSV format (different endpoint), markers like : (not available), b (break in series), and e (estimated) are embedded in value cells (e.g., "1234.5 e"). The JSON-UI API returns clean numeric values and reports flags separately, so use it for pipeline work.
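If you do work from TSV cells, the value and its flags need to be separated before any numeric conversion. A minimal sketch, assuming cells look like "1234.5 e", "98.7", or ": " as described above; split_tsv_cell is a hypothetical helper name.

```python
from typing import Optional, Tuple

def split_tsv_cell(cell: str) -> Tuple[Optional[float], str]:
    """Split a bulk-TSV cell such as '1234.5 e' into (value, flags)."""
    parts = cell.strip().split()
    if not parts or parts[0] == ":":
        # ':' marks a missing observation; anything after it is a flag
        return None, "".join(parts[1:])
    try:
        return float(parts[0]), "".join(parts[1:])
    except ValueError:
        # Unexpected token: keep it as a flag rather than guess a number
        return None, " ".join(parts)

print(split_tsv_cell("1234.5 e"))
print(split_tsv_cell(": "))
```

Keeping the flag as its own field means a later model can decide whether to drop estimated or break-in-series observations.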
Key Datasets to Start With
National accounts:
- namq_10_gdp: quarterly GDP by expenditure approach
- nama_10_gdp: annual GDP, broader coverage
- nama_10_pc: GDP per capita, annual
Labor market:
- une_rt_m: monthly unemployment rate, by sex and age
- lfsq_urgan: quarterly unemployment by geography
Prices:
- prc_hicp_midx: HICP monthly price index (EU inflation measure)
- prc_hicp_aind: HICP annual average index
Trade:
- ext_lt_maineu: extra-EU trade by main partners (large dataset)
- tet00002: exports/imports summary
Population:
- demo_pjan: population on 1 January
- demo_gind: population change indicators
# Monthly unemployment for all EU countries (one call)
unemp = fetch_eurostat(
    "une_rt_m",
    sex="T",          # total (M/F/T)
    age="TOTAL",
    unit="PC_ACT",    # % of active population
    s_adj="SA",       # seasonally adjusted
    freq="M"
)
print(f"Countries available: {unemp['geo'].nunique()}")
print(f"Total rows: {len(unemp)}")
# Expected: ~30 countries × ~300 months = ~9,000 rows
Data Tolerance & Validation
What's normal:
- Eurostat harmonizes national data, so coverage depends on member state reporting. New EU members (Bulgaria, Romania) have shorter series. Some indicators only go back to the year of EU accession.
- Quarterly GDP (namq_10_gdp) is revised for 2+ years after each release. Download timestamps matter, so store them.
- NUTS geography levels add complexity: namq_10_r3 provides regional (NUTS 2) data, which is much sparser than national (NUTS 0) data.
- The time dimension format is YYYY-QN for quarterly (2023-Q4), YYYY-MM for monthly, YYYY for annual.
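Because the three time formats do not sort consistently against each other as plain strings, it can help to convert labels to dates before comparing periods across datasets. The sketch below is one way to do that, assuming only the three formats listed above; period_start is a hypothetical helper.

```python
import re
from datetime import date

_QUARTER = re.compile(r"^(\d{4})-Q([1-4])$")
_MONTH = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])$")
_YEAR = re.compile(r"^(\d{4})$")

def period_start(label: str) -> date:
    """Return the first day of the period named by a Eurostat time label."""
    if m := _QUARTER.match(label):
        # Q1 -> January, Q2 -> April, Q3 -> July, Q4 -> October
        return date(int(m[1]), 3 * (int(m[2]) - 1) + 1, 1)
    if m := _MONTH.match(label):
        return date(int(m[1]), int(m[2]), 1)
    if m := _YEAR.match(label):
        return date(int(m[1]), 1, 1)
    raise ValueError(f"Unrecognized time label: {label!r}")

print(period_start("2023-Q4"))  # 2023-10-01
print(period_start("2023-12"))  # 2023-12-01
```

Raising on unknown labels is a deliberate choice here: a silent fallback would hide a format change in the source.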
⚠️ Flag codes: Eurostat uses observation flag codes alongside values. A p flag means provisional, e means estimated, b means break in series. Always read the status field alongside value in JSON-UI responses and store flags in your schema, because they affect how the data should be used in models.
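A sketch of how flags might be carried through next to each value. It assumes the response follows the JSON-stat layout, with value and status both keyed by the same flat index and each dimension's category index mapping codes to positions; the sample payload is made up for illustration, and observations_with_flags is a hypothetical helper.

```python
from typing import Any, Dict, List

def observations_with_flags(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each observation with its flag, assuming a JSON-stat style payload."""
    dim_order = data["id"]
    size = data["size"]
    status = data.get("status", {})  # assumed: flat index -> flag string
    # Position -> code lookup per dimension
    codes = {
        dim: {pos: code
              for code, pos in data["dimension"][dim]["category"]["index"].items()}
        for dim in dim_order
    }
    rows = []
    for flat_idx, value in data["value"].items():
        idx = int(flat_idx)
        coords = []
        for s in reversed(size):  # row-major: last dimension varies fastest
            idx, pos = divmod(idx, s)
            coords.append(pos)
        coords.reverse()
        row = {dim: codes[dim][pos] for dim, pos in zip(dim_order, coords)}
        row["value"] = value
        row["flag"] = status.get(flat_idx, "")
        rows.append(row)
    return rows

# Made-up payload: 2 countries x 2 quarters, one provisional value
sample = {
    "id": ["geo", "time"],
    "size": [2, 2],
    "dimension": {
        "geo": {"category": {"index": {"DE": 0, "FR": 1}}},
        "time": {"category": {"index": {"2023-Q3": 0, "2023-Q4": 1}}},
    },
    "value": {"0": 100.0, "1": 101.5, "3": 88.2},
    "status": {"1": "p"},
}
for row in observations_with_flags(sample):
    print(row)
```

Storing the flag as an ordinary column keeps it available for filtering without a second pass over the raw response.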
Validation checks:
def validate_eurostat_pull(df: pd.DataFrame, dataset: str) -> dict:
    """Summarize a pull and flag empty responses."""
    country_count = df["geo"].nunique() if "geo" in df.columns else None
    row_count = len(df)
    # Labels within one dataset share a format ("2023-Q4" or "2023-12"),
    # so a plain string max gives the latest period
    latest = None
    if "time" in df.columns:
        times = df["time"].dropna().unique()
        if len(times):
            latest = max(times)
    null_count = df["value"].isna().sum() if "value" in df.columns else None
    return {
        "dataset": dataset,
        "row_count": row_count,
        "country_count": country_count,
        "null_count": null_count,
        "latest_period": latest,
        "alert": row_count == 0,  # empty response = bad filter or dataset moved
    }
report = validate_eurostat_pull(gdp, "namq_10_gdp")
print(report)
Alert thresholds:
- Zero rows returned: dataset code may have changed (Eurostat reorganizes periodically; check the catalog)
- Country count drops more than 20% from last pull: investigate API or dimension filter change
- Latest quarterly period more than 3 months behind current calendar quarter: data is stale
- HICP for euro area more than 45 days old: data is stale (published ~2 weeks after month end)
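The first three thresholds can be expressed as a small check. This is a sketch under stated assumptions: "more than 3 months behind" is read as more than one quarter behind the current calendar quarter, the previous country count comes from wherever you store pull history, and check_alerts and quarters_behind are hypothetical names.

```python
from datetime import date
from typing import List, Optional

def quarters_behind(latest: str, today: date) -> int:
    """Quarters between a 'YYYY-QN' label and today's calendar quarter."""
    year, quarter = int(latest[:4]), int(latest[-1])
    current_quarter = (today.month - 1) // 3 + 1
    return (today.year - year) * 4 + (current_quarter - quarter)

def check_alerts(row_count: int, country_count: int,
                 prev_country_count: Optional[int],
                 latest_quarter: Optional[str], today: date) -> List[str]:
    """Return the names of any alert thresholds that were crossed."""
    alerts = []
    if row_count == 0:
        alerts.append("empty_response")
    # A drop of more than 20% versus the previous pull
    if prev_country_count and country_count < 0.8 * prev_country_count:
        alerts.append("country_count_drop")
    # Assumption: stale means more than one quarter behind the current one
    if latest_quarter and quarters_behind(latest_quarter, today) > 1:
        alerts.append("stale_quarterly")
    return alerts

print(check_alerts(400, 27, 27, "2023-Q4", date(2024, 2, 15)))  # []
```

Passing today in explicitly, rather than calling date.today() inside the function, keeps the check easy to test.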
Bulk TSV for Large Initial Loads
For datasets with millions of rows, use the bulk TSV endpoint:
import gzip
import io

def fetch_eurostat_bulk(dataset: str) -> pd.DataFrame:
    """Download a full dataset as gzip-compressed TSV; cells stay as strings."""
    url = (f"https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/"
           f"{dataset}?format=TSV&compressed=true")
    r = requests.get(url, timeout=300)
    r.raise_for_status()  # fail loudly instead of gunzipping an error page
    with gzip.open(io.BytesIO(r.content)) as f:
        df = pd.read_csv(f, sep="\t", dtype=str)
    return df
# Example: full HICP dataset (~50MB compressed)
# hicp_bulk = fetch_eurostat_bulk("prc_hicp_midx")
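The bulk file is wide rather than long, so it usually needs reshaping before it can be joined with API pulls. The sketch below assumes the layout where the first header cell lists the dimension names followed by a backslash and the time dimension name (for example freq,unit,geo\TIME_PERIOD) and each remaining column is one period; the sample text is made up, and tidy_bulk_tsv is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterator

def tidy_bulk_tsv(text: str) -> Iterator[Dict[str, str]]:
    """Yield one record per (series, period) from wide bulk-TSV text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Assumed first header cell: 'freq,unit,geo\TIME_PERIOD'
    dim_names = header[0].split("\\")[0].split(",")
    periods = [p.strip() for p in header[1:]]
    for line in reader:
        if not line:
            continue
        dims = dict(zip(dim_names, line[0].split(",")))
        for period, cell in zip(periods, line[1:]):
            # Keep the raw cell; value and flag are split in a later step
            yield {**dims, "time": period, "raw": cell.strip()}

# Made-up sample in the assumed layout
sample = (
    "freq,unit,geo\\TIME_PERIOD\t2023-11 \t2023-12 \n"
    "M,I15,DE\t117.3 \t117.4 p\n"
    "M,I15,FR\t: \t118.0 \n"
)
records = list(tidy_bulk_tsv(sample))
print(records[1])
```

Yielding records one at a time keeps memory use flat, which matters for the multi-million-row files this endpoint is meant for.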
Schema Stability
Dataset codes are stable but Eurostat reorganizes its catalog every few years. Track your dataset codes in a registry and add a health-check that calls the catalog endpoint to confirm they still exist. Dimension codes (geo, unit, na_item, etc.) follow SDMX codelists and are very stable. Geographic codes follow Eurostat conventions (e.g., EU27_2020 for current EU, DE for Germany).
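One possible shape for the registry and health check is sketched below. The registry contents and find_missing_datasets are hypothetical, and the existence probe is passed in as a function because the exact catalog lookup is left to your pipeline; the stand-in probe here just checks a local set.

```python
from typing import Callable, Dict, List

# Hypothetical registry: dataset code -> what the pipeline uses it for
DATASET_REGISTRY: Dict[str, str] = {
    "namq_10_gdp": "quarterly GDP",
    "une_rt_m": "monthly unemployment rate",
    "prc_hicp_midx": "HICP monthly index",
}

def find_missing_datasets(registry: Dict[str, str],
                          exists: Callable[[str], bool]) -> List[str]:
    """Return registry codes for which the supplied probe reports 'not found'."""
    return sorted(code for code in registry if not exists(code))

# Stand-in probe for illustration; swap in a real catalog or API lookup
known_codes = {"namq_10_gdp", "prc_hicp_midx"}
missing = find_missing_datasets(DATASET_REGISTRY, lambda code: code in known_codes)
print(missing)  # ['une_rt_m']
```

Running this on a schedule and alerting on a non-empty result gives early warning of a catalog reorganization before a pipeline run fails.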