BBC Monitoring
China's Public Diplomacy on Social Media
Completed · 2019–2022
The Project
Context & Objectives
An engagement with BBC Monitoring to map China's diplomatic social media activity across Twitter and Facebook: 432,800 messages collected from 372 diplomatic accounts (2019–2020), with 102,883 of those messages classified across 9 themes, 4 languages and 6 global regions (2021).
The aim was to give BBC Monitoring a systematic, scalable view of how Chinese diplomatic accounts framed narratives across regions and platforms, on topics ranging from COVID-19 and geopolitics to human rights and technology.
My Role
Contributions
Phase 1 (2019–2020): restructured and standardised the account seed list into a unified collection schema covering 372 diplomatic accounts (13 handle/link types). Set up and maintained the data collection pipeline, using cron-based scheduling for both ongoing ingestion and historic backfill, which collected 432,800 messages between October 2019 and December 2020. Led the keyword-based thematic pilot across all four languages to establish an agreed thematic breakdown. Named contributor on the published BBC Monitoring report.
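A minimal sketch of that kind of seed-list normalisation, assuming the raw list mixed @-handles, bare handles and profile URLs. The AccountRecord fields, the host mapping and the example handles used with it are all hypothetical, not the project's actual schema or accounts.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical mapping from profile-URL hosts to platform names.
PLATFORM_HOSTS = {
    "twitter.com": "twitter",
    "facebook.com": "facebook",
    "fb.com": "facebook",
}


@dataclass(frozen=True)
class AccountRecord:
    """Illustrative unified record; not the project's actual schema."""
    platform: str                  # "twitter" or "facebook"
    handle: str                    # lower-cased, without a leading "@"
    account_type: str              # e.g. "embassy", "consulate", "ambassador"
    country: Optional[str] = None


def normalise_handle(raw: str, default_platform: str = "twitter") -> tuple[str, str]:
    """Turn an @-handle, bare handle or profile URL into (platform, handle)."""
    raw = raw.strip()
    if "://" in raw:
        parsed = urlparse(raw)
        host = parsed.netloc.lower()
        for prefix in ("www.", "mobile.", "m."):
            if host.startswith(prefix):
                host = host[len(prefix):]
                break
        platform = PLATFORM_HOSTS.get(host)
        if platform is None:
            raise ValueError(f"unrecognised host: {host}")
        parts = [p for p in parsed.path.split("/") if p]
        if not parts:
            raise ValueError(f"no handle in URL: {raw}")
        return platform, parts[0].lower()
    # Bare handles carry no platform information, so fall back to the default.
    return default_platform, raw.lstrip("@").lower()


def to_record(raw: str, account_type: str, country: Optional[str] = None,
              default_platform: str = "twitter") -> AccountRecord:
    platform, handle = normalise_handle(raw, default_platform)
    return AccountRecord(platform, handle, account_type, country)


def dedupe(records):
    """Keep the first record seen for each (platform, handle) pair."""
    seen = {}
    for record in records:
        seen.setdefault((record.platform, record.handle), record)
    return list(seen.values())
```

Deduplicating on (platform, handle) means an account listed once as a URL and once as an @-handle collapses into a single record.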
Phase 2 (2021): contributed to training and evaluating 34 binary classifiers across 4 languages and 9 themes. Supported annotation coordination across the language specialist teams, ran active learning cycles to iteratively improve model quality, and refined the pipeline throughout.
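Uncertainty sampling is one common active learning strategy; the page does not say which selection rule the project used, so the sketch below assumes it. Each round retrains a theme classifier, then sends the unlabelled posts it is least sure about (score nearest 0.5) to annotators. The train and annotate callables are hypothetical stand-ins for the model-fitting step and the specialist annotation teams.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Scorer = Callable[[str], float]  # text -> estimated P(post belongs to the theme)


def select_for_annotation(pool: Iterable[Tuple[str, str]], predict_proba: Scorer,
                          batch_size: int = 50) -> List[str]:
    """Uncertainty sampling: ids of the posts whose score is closest to 0.5."""
    scored = ((abs(predict_proba(text) - 0.5), post_id) for post_id, text in pool)
    return [post_id for _, post_id in heapq.nsmallest(batch_size, scored)]


def run_round(labelled: List[Tuple[str, int]], pool: Dict[str, str],
              train: Callable[[List[Tuple[str, int]]], Scorer],
              annotate: Callable[[str], int], batch_size: int = 50):
    """One active learning cycle: retrain, pick uncertain posts, collect their labels."""
    model = train(labelled)
    picked = select_for_annotation(pool.items(), model, batch_size)
    for post_id in picked:
        # Move each picked post from the unlabelled pool into the training set.
        labelled.append((pool.pop(post_id), annotate(post_id)))
    return labelled, pool
```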
Scope
Project Scope
Phase 1 — Data Collection (Oct 2019 – Dec 2020)
- Accounts mapped: 372 diplomatic accounts across 13 handle/link types (embassies, consulates, ambassadors, press officers)
- Messages collected: 432,800
- Languages: English, Arabic, French, Spanish
- Pipeline: cron-based scheduling for ongoing ingestion and historic backfill
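A minimal sketch of how a cron-triggered job could split the historic backfill into resumable monthly windows over the Oct 2019 to Dec 2020 collection period. The crontab line, the collect.py script name and the monthly granularity are all assumptions; each run would fetch the next unfinished window alongside routine ingestion of new posts.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple

Window = Tuple[date, date]

# Hypothetical crontab entry driving the job (schedule is illustrative only):
#   0 * * * *  python collect.py


def month_windows(start: date, end: date) -> List[Window]:
    """Split [start, end) into calendar-month windows for historic backfill."""
    windows = []
    current = start
    while current < end:
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        windows.append((current, min(next_month, end)))
        current = next_month
    return windows


def next_backfill_window(windows: List[Window], done: Iterable[Window]) -> Optional[Window]:
    """Return the earliest window not yet collected, or None once backfill is complete."""
    finished = set(done)
    for window in windows:
        if window not in finished:
            return window
    return None
```

Tracking completed windows means an interrupted run simply resumes at the next gap on the following cron tick.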
Phase 2 — Classification (2021)
- Posts classified: 102,883 across Twitter and Facebook
- Regions: Asia-Pacific, Africa, Americas, Europe, Middle East, Eurasia
- Themes: 9 (Geopolitics, Economy, COVID-19, Politics & Society, Culture & People, Military & Security, Technology, Environment, Human Rights)
- Classifiers trained: 34 binary classifiers spanning 9 themes and 4 languages (34 of the 36 possible theme–language combinations)
- Evaluation: manually annotated gold-standard datasets per language and theme
Method
Approach & Pipeline
Phase 1 — Keyword Discovery (2019–2020): account seed list compiled and unified into a consistent collection schema across 372 accounts and 13 diplomatic handle types. Keyword-based thematic pilot run across all four languages to agree on a thematic framework before moving to supervised classification.
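A keyword pilot of this kind can be sketched as per-language keyword lists matched against each message, with theme counts used to compare candidate breakdowns. The keyword lists below are invented for illustration and cover only two languages and two themes; word-initial prefix matching is an assumed design choice.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical seed keywords per language and theme; the pilot's real lists are not given here.
KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "en": {
        "COVID-19": ["covid", "coronavirus", "pandemic", "vaccine"],
        "Technology": ["5g", "artificial intelligence"],
    },
    "fr": {
        "COVID-19": ["covid", "coronavirus", "pandémie", "vaccin"],
    },
}


def compile_patterns(keywords: Dict[str, Dict[str, List[str]]]) -> dict:
    """Build one case-insensitive regex per (language, theme) key."""
    patterns = {}
    for lang, themes in keywords.items():
        for theme, terms in themes.items():
            alternation = "|".join(re.escape(term) for term in terms)
            # (?<!\w) anchors each keyword at the start of a word, so "vaccin"
            # also matches "vaccins" but not "antivaccin".
            patterns[(lang, theme)] = re.compile(rf"(?<!\w)(?:{alternation})", re.IGNORECASE)
    return patterns


def tag_message(text: str, lang: str, patterns: dict) -> Set[str]:
    """Return every theme whose keywords appear in a message of the given language."""
    return {theme for (pattern_lang, theme), pattern in patterns.items()
            if pattern_lang == lang and pattern.search(text)}


def theme_counts(messages: Iterable[Tuple[str, str]], patterns: dict) -> Counter:
    """Tally theme hits over (text, language) pairs."""
    counts = Counter()
    for text, lang in messages:
        counts.update(tag_message(text, lang, patterns))
    return counts
```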
Phase 2 — ML Classification (2021): 34 binary classifiers spanning 9 themes and 4 languages (34 of the 36 possible theme–language combinations), each trained on manually annotated gold-standard data. Annotation coordinated across four language specialist teams, with defined guidelines and documented positive/negative examples. Active learning cycles used to iteratively surface high-value training examples and improve performance. Pipeline maintained and refined throughout.
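The page does not name the model family behind the 34 classifiers, so the sketch below swaps in a deliberately simple multinomial naive Bayes, written with the standard library only, to show the structure: one independent binary model per (theme, language) key, each fitted on that pair's positive and negative annotations. Class and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text.lower())


class BinaryNaiveBayes:
    """Multinomial naive Bayes for one theme in one language: 1 = on-theme, 0 = not."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts: List[str], labels: List[int]) -> "BinaryNaiveBayes":
        self.doc_counts = Counter(labels)
        if self.doc_counts[0] == 0 or self.doc_counts[1] == 0:
            raise ValueError("need both positive and negative examples")
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def _log_joint(self, tokens: List[str], label: int) -> float:
        n_docs = self.doc_counts[0] + self.doc_counts[1]
        log_p = math.log(self.doc_counts[label] / n_docs)  # class prior
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for token in tokens:
            if token in self.vocab:  # words unseen in training are skipped
                log_p += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return log_p

    def predict_proba(self, text: str) -> float:
        """Estimated P(on-theme | text), computed stably from log joint scores."""
        tokens = tokenize(text)
        l0, l1 = self._log_joint(tokens, 0), self._log_joint(tokens, 1)
        top = max(l0, l1)
        return math.exp(l1 - top) / (math.exp(l0 - top) + math.exp(l1 - top))


def train_all(datasets: Dict[Tuple[str, str], Tuple[List[str], List[int]]],
              alpha: float = 1.0) -> Dict[Tuple[str, str], BinaryNaiveBayes]:
    """Fit one independent binary model per (theme, language) key."""
    return {key: BinaryNaiveBayes(alpha).fit(texts, labels)
            for key, (texts, labels) in datasets.items()}
```

Because predict_proba returns a probability, a model built this way could also supply the scores for the active learning cycles mentioned above.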
Outcomes
Results & Impact
- Average F1: 80.5% across 34 classifiers, with consistent performance across 4 languages and 9 themes
- Coverage: 102,883 posts classified across 6 global regions, spanning English, Arabic, Spanish and French
- Published report: "China's Public Diplomacy on Twitter and Facebook" — BBC Monitoring, May 2022
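Assuming the reported average is an unweighted mean of per-classifier F1 scores for the positive class (the page does not specify the aggregation), the figure could be computed from each classifier's predictions on its gold-standard set as sketched below. Function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def f1_score(gold: List[int], predicted: List[int]) -> float:
    """Binary F1 for the positive class (1) against gold-standard labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f1(results: Dict[Tuple[str, str], Tuple[List[int], List[int]]]):
    """Map (theme, language) -> (gold, predicted) to per-classifier F1 and their unweighted mean."""
    per_classifier = {key: f1_score(gold, pred) for key, (gold, pred) in results.items()}
    return per_classifier, mean(per_classifier.values())
```

An unweighted mean gives every theme–language classifier equal say, so a small, hard theme in one language moves the average as much as a large, easy one.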