AI Engineer • Data Scientist • NLP Researcher

Ahmed Younes

Navigating data into decisions

I help organisations navigate their data challenges, from raw unstructured text to intelligence that drives real decisions. My work sits at the intersection of data science, natural language processing and engineering, combining research depth with hands-on building and strategic consultancy. I've worked with organisations including the Gates Foundation, FCDO, ISD, BBC Monitoring and Ofcom, and I guide professionals across 50+ organisations in applying AI and data science in practice.

Available for consulting, research & building AI solutions

9 Projects · 15 Methods · 8 Sectors · 50+ Organisations

Projects

Emotech · Contract

LLM-as-a-Judge Evaluation Framework for Conversational AI

Built an automated evaluation pipeline for a customer service voice AI — GPT generates adversarial test scenarios, drives live conversations, and judges outputs across design bugs, logic bugs, and quality metrics. Scaled from 7 to 42 test cases across 14 categories.

LLM-as-a-Judge · Gen AI · Conversational AI · Rasa CALM

AxessAll · Contract

AI-Driven Accessibility Platform — Strategy & Implementation

Building a production RAG system for accessibility consultants, an NLP analytics pipeline over audit data, and an agentic auditing system — underpinned by a seven-layer national AI platform strategy.

RAG · Agentic AI · AI Strategy · NLP

EMIF / ISD

Pro-Kremlin Influence Network Mapping

Mapped pro-Kremlin influence networks across 7.8 million Telegram posts in France, Germany and Italy.

NLP · Topic Modelling · Contrastive Learning · Python

Gates Foundation

Global Fund Advocacy Evaluation

Evaluated social media advocacy for the Global Fund's 7th replenishment across 5 markets and 43,000 posts.

Social Media Analysis · Thematic Framework · NLP

FCDO / BII

Investor Sentiment in South African Energy

Built an investor sentiment index for South Africa's energy sector, validated as predictive of investment flows.

Sentiment Analysis · Transformers · NLP · Python

Swedish Institute

Sweden's Global Image Under Pressure

Analysed the impact of two major international incidents, the Quran-burning protests and the LVU child welfare controversy, on Sweden's image across 9 countries and 7 languages.

Multilingual NLP · Topic Modelling · Machine Translation

Ofcom

Disinformation Landscape Mapping

Built a multilayered topic modelling system to profile disinformation actors on social media.

Topic Modelling · BERTopic · Actor Profiling

BBC Monitoring

China's Public Diplomacy on Twitter and Facebook

Mapped China's diplomatic social media activity across Twitter and Facebook — 432,800 messages collected across 372 accounts, with 102,883 classified across 9 themes in 4 languages.

Python · scikit-learn · Twitter API · CrowdTangle · Pandas · Plotly

Spike Insight · via University of Sussex RISE

Customer Review Topic, Theme & Sentiment Analysis

Analysed ~13,000 customer reviews for a guided-walking-holiday operator — topic modelling, thematic annotation and a zero-shot correction layer lifted macro-F1 from 0.69–0.79 to 0.85–0.95.

Topic Modelling · BERTopic · Zero-Shot Classification · Sentiment Analysis · UMAP

Experience

Jan 2026 — Present · Contract

AxessAll

Jan 2026 — Present · AI Integration Lead
RAG · Agentic AI · AI Strategy · WCAG 2.2

Aug 2025 — Present · Full-time

QA Ltd

Aug 2025 — Present · Digital Learning Consultant

Jan 2025 — Mar 2025 · Contract

Emotech

Jan 2025 — Mar 2025 · Applied LLM Researcher
LLM-as-a-Judge · Conversational AI · Rasa CALM · Arabic NLP

2020 — Dec 2024 · Full-time

CASM Technology

Oct 2022 — Dec 2024 · Senior Data Scientist
2020 — Oct 2022 · NLP Research Practitioner (PhD Placement)
NLP & Classification · Data Engineering · Pipeline Architecture · Social Media Collection (Brandwatch)

Industries & Sectors

Through my role at QA, I've worked with teams across 50+ organisations in multiple sectors, helping them incorporate data science and AI into their work.

Research

PhD Doctoral Output — University of Sussex, 2025

DeformAr

A diagnostic framework for NER evaluation across Arabic and English — introducing an entity-level error taxonomy and cross-lingual benchmarking protocols to surface where and why NER models fail on morphologically complex languages. DeformAr moves beyond aggregate F1 scores to provide actionable diagnostics for low-resource Arabic NER.

Python · PyTorch · HuggingFace · Arabic NLP · NER Evaluation

Methods

Topic Modelling Recipes

A practitioner's guide to topic modelling — checklists, investigations, and methodological decisions accumulated across multiple production projects.

Multilayered Topic Modelling

Hierarchical topic discovery from broad themes to granular sub-topics, enabling scalable narrative analysis across large corpora.

NER Evaluation Workflow

Three-stage benchmarking framework across low- and high-resource languages, surfacing systematic error patterns at the entity level.

Machine Translation Evaluation

Comparative metric framework using BERTScore as the primary signal, combining automatic and human evaluation for robust translation quality assessment.

Contact

Location
Markfield, UK