The Project

Context & Objectives

FCDO and BII (British International Investment) commissioned ITAD to evaluate their efforts to mobilise private investment in South Africa. ITAD engaged CASM Technology to build the NLP analysis module — a sentence-level investor sentiment and driver index tracking how investor confidence in South Africa's energy sector shifted over 25 years (1996–2022) and identifying the key drivers of those shifts.

The central methodological contribution was distinguishing genuine investor sentiment — explicit statements of intent to invest or disinvest — from general market commentary, enabling a cleaner and more predictive signal than article-level approaches.

My Role

Contributions

Defined the sentiment framework, distinguishing positive and negative investor sentiment based on explicit investment actions and stated intent rather than general market tone. Designed the driver framework: 5 primary drivers (Economic, Political, Enabling Environment, Investor Priorities, COVID-19) and 14 sub-drivers, grounded in domain knowledge and validated with subject matter experts.

Trained and evaluated the classifier suite using active learning to generate high-quality training examples for transformer-based models — covering investment relevance, sentiment direction, and all driver categories. Led the annotation and evaluation strategy, coordinating domain experts throughout.

Conducted all analysis — event detection, sentiment trends over time, driver attribution, volumetric analysis — and produced the findings and presentations delivered to FCDO and BII stakeholders.

Scope

Project Scope

  • Articles collected: 80,622 from 17 online news sources
  • Sentences analysed: ~50,000 extracted and segmented from relevant articles
  • Time period: May 1996 – January 2022
  • Geography: South Africa — energy sector
  • Language: English
  • Sentiment classes: Positive / Negative (explicit investor actions and intent only)
  • Drivers: 5 primary drivers, 14 sub-drivers
  • Classifiers trained: 7 binary classifiers (relevance, sentiment, 5 drivers)

Method

Approach & Pipeline

Sentiment Framework: sentence-level rather than article-level analysis, isolating explicit investor statements (investments made, withdrawals, stated intent) from background market commentary. Positive and negative sentiment were classified independently rather than as mutually exclusive labels, so a single sentence can carry both, which makes divergence between the two signals visible.

Classification Pipeline: a multi-stage pipeline covering investment relevance filtering, sentence splitting, and sentiment and driver classification. Contributions focused on the sentiment and driver layers:

  1. Positive sentiment classifier — 500 annotated sentences (F1: 89%)
  2. Negative sentiment classifier — 500 annotated sentences (F1: 80%)
  3. Driver classifiers — 5 binary models: Economics (82%), Politics (88%), Enabling Environment (81%), Investor Priorities (78%), COVID-19 (93%)
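
The stages above can be sketched in a few lines. This is an illustrative outline only, not the project's code: the keyword lambdas stand in for the fine-tuned transformer classifiers, the regex splitter is a naive placeholder, applying the relevance filter at article level is an assumption, and all names (`SentenceResult`, `run_pipeline`, and so on) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: any binary classifier maps a text to True/False.
BinaryClassifier = Callable[[str], bool]


@dataclass
class SentenceResult:
    text: str
    positive: bool          # independent flag, not mutually exclusive
    negative: bool          # with `positive`
    drivers: List[str] = field(default_factory=list)


def split_sentences(article: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", article.strip())
    return [p for p in parts if p]


def run_pipeline(
    article: str,
    is_relevant: BinaryClassifier,
    is_positive: BinaryClassifier,
    is_negative: BinaryClassifier,
    driver_models: Dict[str, BinaryClassifier],
) -> List[SentenceResult]:
    # Stage 1: drop articles that are not investment-relevant (assumed level).
    if not is_relevant(article):
        return []
    results = []
    # Stage 2: sentence splitting. Stage 3: sentiment and driver classification.
    for sentence in split_sentences(article):
        drivers = [name for name, model in driver_models.items() if model(sentence)]
        results.append(
            SentenceResult(sentence, is_positive(sentence), is_negative(sentence), drivers)
        )
    return results


# Toy stand-ins for the trained models (keyword rules, for illustration only).
article = (
    "Acme Power will invest R2bn in new solar plants. "
    "Load shedding continued this week. "
    "The fund withdrew from coal projects due to policy uncertainty."
)
results = run_pipeline(
    article,
    is_relevant=lambda t: "invest" in t.lower() or "withdrew" in t.lower(),
    is_positive=lambda s: "will invest" in s.lower(),
    is_negative=lambda s: "withdrew" in s.lower(),
    driver_models={
        "Enabling Environment": lambda s: "policy" in s.lower(),
        "Economic": lambda s: "price" in s.lower(),
    },
)
```

In this toy run the second sentence receives neither sentiment flag, which is the kind of background commentary the sentence-level framing is meant to separate from explicit investor statements.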

Active learning was used throughout to iteratively select high-value training examples for transformer fine-tuning, reducing annotation effort while improving precision near the decision boundary.
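
One common way to pick "high-value" examples is uncertainty sampling: send annotators the sentences the current model is least sure about. The sketch below assumes that strategy; the query strategy actually used in the project is not stated here. The function names and the `predict_proba` and `annotate` callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def select_most_uncertain(probabilities: Sequence[float], k: int) -> List[int]:
    """Return indices of the k items whose predicted positive-class
    probability is closest to 0.5, the binary decision boundary."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


def active_learning_round(
    pool: List[str],
    labeled: List[Tuple[str, bool]],
    predict_proba: Callable[[str], float],
    annotate: Callable[[str], bool],
    batch_size: int,
) -> Tuple[List[str], List[Tuple[str, bool]]]:
    """Score the unlabeled pool, label the most uncertain sentences,
    and move them from the pool to the labeled set."""
    probs = [predict_proba(sentence) for sentence in pool]
    picked = select_most_uncertain(probs, batch_size)
    picked_set = set(picked)
    for i in picked:
        labeled.append((pool[i], annotate(pool[i])))
    remaining = [s for i, s in enumerate(pool) if i not in picked_set]
    return remaining, labeled


# Toy example: model scores for five unlabeled sentences.
scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}
remaining, labeled = active_learning_round(
    pool=list(scores),
    labeled=[],
    predict_proba=scores.__getitem__,
    annotate=lambda s: s == "b",   # stand-in for a human annotator
    batch_size=2,
)
```

In a full loop the model would be fine-tuned again on the enlarged labeled set before the next round, so each batch targets the region where the classifier is currently weakest.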

Outcomes

Results & Impact

The analysis tracked positive and negative investor sentiment over time, attributed shifts to their underlying drivers, and identified the specific events that explain those shifts.
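
To make "sentiment over time" concrete, the sketch below aggregates per-sentence flags into a monthly series. The net-balance formula, (positives minus negatives) divided by sentences in the month, is an assumption chosen for illustration; the index definition used in the evaluation is not given here. The function name and sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_net_sentiment(
    records: Iterable[Tuple[date, bool, bool]],
) -> Dict[str, float]:
    """records: (publication date, is_positive, is_negative) per sentence.
    Returns {"YYYY-MM": (positives - negatives) / sentences}, in date order."""
    counts = defaultdict(lambda: [0, 0, 0])  # positives, negatives, total
    for day, pos, neg in records:
        bucket = counts[day.strftime("%Y-%m")]
        bucket[0] += int(pos)
        bucket[1] += int(neg)
        bucket[2] += 1
    return {
        month: (p - n) / total
        for month, (p, n, total) in sorted(counts.items())
    }


# Made-up sentences, not project data.
sample = [
    (date(2015, 3, 2), True, False),
    (date(2015, 3, 15), True, False),
    (date(2015, 3, 20), False, True),
    (date(2015, 3, 28), False, False),
    (date(2015, 4, 1), False, True),
    (date(2015, 4, 10), True, True),   # both flags set: allowed by design
]
index = monthly_net_sentiment(sample)
```

Because the two flags are independent, the positive and negative counts could equally be plotted as separate series, which keeps periods of simultaneous optimism and pessimism visible instead of netting them out.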

Key analytical findings: mapping drivers against sentiment direction showed Enabling Environment as the strongest correlate of positive sentiment and Economic factors as the dominant negative signal. The per-driver sentiment breakdown revealed distinct contribution patterns across the time period.
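
A simple way to read a per-driver breakdown is the share of each driver's sentences that are positive or negative. The sketch below computes that share; it is one possible association measure, assumed here for illustration, and not necessarily the statistic behind the findings above. The function name and toy rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sentiment_share_by_driver(
    sentences: Iterable[Tuple[List[str], bool, bool]],
) -> Dict[str, Dict[str, float]]:
    """sentences: (driver labels, is_positive, is_negative) per sentence.
    Returns, per driver, the fraction of its sentences flagged positive
    and the fraction flagged negative."""
    totals = defaultdict(lambda: {"n": 0, "positive": 0, "negative": 0})
    for drivers, pos, neg in sentences:
        for driver in drivers:
            totals[driver]["n"] += 1
            totals[driver]["positive"] += int(pos)
            totals[driver]["negative"] += int(neg)
    return {
        driver: {
            "positive": c["positive"] / c["n"],
            "negative": c["negative"] / c["n"],
        }
        for driver, c in totals.items()
    }


# Made-up rows, not project data.
toy = [
    (["Enabling Environment"], True, False),
    (["Enabling Environment", "Economic"], True, False),
    (["Economic"], False, True),
    (["Economic"], False, True),
]
shares = sentiment_share_by_driver(toy)
```

A sentence tagged with several drivers counts once under each, so the shares describe each driver separately and do not sum to one across drivers.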

Key themes within positive and negative sentiment periods were identified and grounded in specific events — providing an interpretable, evidence-based narrative alongside the quantitative index.

The sentiment index was validated as predictive of net investment flows, the strongest validation signal across all markets in the evaluation. The work is referenced in a published ITAD blog post (March 2023).

Tech Stack

Python, HuggingFace Transformers, PyTorch, sentence-transformers, scikit-learn, Pandas, Plotly