Ofcom (UK Communications Regulator)
Disinformation Narrative Mapping on Social Media
Pilot · 2023
The Project
Context & Objectives
A pilot project for Ofcom — the UK's communications regulator — to map disinformation narratives across social media platforms from a curated list of known accounts. The goal was to understand what narratives existed, how they clustered thematically, and what each account's contribution to the broader disinformation ecosystem looked like.
A layered topic modelling approach was used — similar to the Swedish Institute project — with a first pass across the full corpus followed by theme-specific models to surface granular sub-narratives.
My Role
Data Scientist
Co-designed the analytical approach and research methodology. Ran an iterative two-week sprint cycle: week one to design, run and sample the topic model; week two for domain expert review of the samples. Outputs were generated and presented at every stakeholder meeting throughout the pilot.
This included preparing stratified samples for domain expert annotation, managing the annotation workflow, cleaning annotated outputs, and running the statistical analysis on the final classified data. The data collection pipeline and account list were pre-existing; my contribution focused on the analysis, methodology and annotation workflow stages.
Scope
Project Scope
- Platforms: Facebook, Twitter, Telegram, Instagram, 4chan, YouTube
- Scale: ~1M messages (stratified sample from a much larger corpus)
- Account list: pre-curated list of known disinformation-adjacent accounts
- Themes: 5 dominant themes selected by the client for deeper investigation (e.g. Health → COVID conspiracy, vaccine hesitancy, pharma narratives)
Method
Approach & Pipeline
The full corpus was too large to process directly, so a stratified sample of ~1M messages was drawn first. Content then passed through a two-layer pipeline:
- Layer 1 — global BERTopic across the sampled corpus to surface broad narrative themes, annotated into a general thematic breakdown
- Layer 2 — the client selected 5 dominant themes for deeper investigation; per-theme BERTopic models then surfaced granular sub-narratives within each (e.g. Health → COVID conspiracy, vaccination, pharmaceutical narratives)
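The two layers above can be sketched roughly as follows. This is an illustration only, not the project code: the helper names (`keyword_model`, `two_layer`) and the toy documents are hypothetical, and a keyword matcher stands in for the topic models so the example runs with the standard library alone. In practice each callable would wrap a fitted topic model such as BERTopic, with one global model and one model per selected theme.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumption: a "topic model" is any callable that maps documents to one
# label per document. A real pipeline would wrap BERTopic here instead.
TopicModel = Callable[[Sequence[str]], List[str]]


def keyword_model(keywords: Dict[str, List[str]], default: str = "other") -> TopicModel:
    """Toy stand-in for a fitted topic model: first matching keyword wins."""
    def assign(docs: Sequence[str]) -> List[str]:
        labels = []
        for doc in docs:
            text = doc.lower()
            label = next(
                (name for name, words in keywords.items()
                 if any(word in text for word in words)),
                default,
            )
            labels.append(label)
        return labels
    return assign


def two_layer(
    docs: Sequence[str],
    global_model: TopicModel,
    theme_models: Dict[str, TopicModel],
) -> List[Tuple[str, Optional[str]]]:
    """Layer 1 assigns a broad theme; layer 2 refines selected themes only."""
    themes = global_model(docs)

    # Group document indices by their layer-1 theme.
    by_theme: Dict[str, List[int]] = defaultdict(list)
    for index, theme in enumerate(themes):
        by_theme[theme].append(index)

    sub_narratives: List[Optional[str]] = [None] * len(docs)
    for theme, indices in by_theme.items():
        model = theme_models.get(theme)
        if model is None:
            continue  # theme was not selected for deeper investigation
        labels = model([docs[i] for i in indices])
        for i, label in zip(indices, labels):
            sub_narratives[i] = label

    return list(zip(themes, sub_narratives))


# Hypothetical toy data, not taken from the project corpus.
docs = [
    "New vaccine side effects hidden from the public",
    "Covid was planned all along",
    "The election was stolen",
]
global_model = keyword_model({"health": ["vaccine", "covid"], "politics": ["election"]})
theme_models = {
    "health": keyword_model({"vaccination": ["vaccine"], "covid conspiracy": ["covid"]}),
}
result = two_layer(docs, global_model, theme_models)
```

Documents whose theme has no second-layer model keep `None` as their sub-narrative, which mirrors the idea that only client-selected themes were modelled in depth.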
Each iteration ran on a two-week cycle. Domain experts reviewed stratified samples from each cluster via structured annotation workflows. Outputs were consolidated, cleaned and quality-checked before statistical analysis and stakeholder reporting.
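One plausible way to draw the per-cluster samples for expert review is a fixed-size random draw from each cluster, so that small clusters are not crowded out by large ones. The sketch below assumes messages are dictionaries carrying a cluster label; the function name, field names and sample size are hypothetical, and the actual sampling design used in the pilot may have differed.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List


def sample_per_cluster(
    rows: List[dict],
    cluster_key: str,
    n_per_cluster: int,
    seed: int = 0,
) -> List[dict]:
    """Draw up to n_per_cluster rows from every cluster, reproducibly."""
    rng = random.Random(seed)  # local generator so the draw is repeatable

    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    sample: List[dict] = []
    for cluster in sorted(groups):  # fixed order keeps results stable
        members = groups[cluster]
        k = min(n_per_cluster, len(members))  # small clusters are taken whole
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: cluster "A" has 10 messages, cluster "B" has 3.
rows = [{"id": i, "cluster": "A"} for i in range(10)]
rows += [{"id": 100 + i, "cluster": "B"} for i in range(3)]
picked = sample_per_cluster(rows, "cluster", n_per_cluster=5, seed=42)
```

Fixing the seed means the same sample can be regenerated later when cleaning and reconciling the annotated outputs.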
Outcomes
Results & Impact
Produced a multilayered disinformation narrative map across the curated account set, and built account-level thematic profiles from the classified corpus.
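As an illustration of what an account-level thematic profile could look like, the sketch below computes, for each account, the share of its classified messages that falls under each theme. The function name, record layout and toy data are assumptions for the example; the profiles built in the pilot may have used different measures.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def account_profiles(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Map each account to the proportion of its messages per theme."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for account, theme in records:
        counts[account][theme] += 1

    profiles: Dict[str, Dict[str, float]] = {}
    for account, theme_counts in counts.items():
        total = sum(theme_counts.values())
        profiles[account] = {
            theme: n / total for theme, n in theme_counts.items()
        }
    return profiles


# Hypothetical (account, theme) pairs, one per classified message.
records = [
    ("account_a", "health"),
    ("account_a", "health"),
    ("account_a", "politics"),
    ("account_b", "health"),
]
profiles = account_profiles(records)
```

Normalising to proportions makes accounts with very different posting volumes comparable, at the cost of hiding absolute volume, which would need to be reported alongside.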
Findings and methodology presented to Ofcom stakeholders on multiple occasions throughout the pilot. Internal deliverable — no published output. Further phases pending at time of completion.