Methods — Ahmed Younes

Delivery Methodology

Stakeholder Communication Framework

A four-stage protocol for presenting analytical findings to mixed technical and non-technical audiences — anchor to the decision, show the funnel, translate outputs to business meaning, and engineer next steps that build momentum.

decision-first framing stakeholder communication delivery

View method →

Documentation Project

Topic Modelling Recipes

A practitioner's guide accumulated across multiple projects — methodological checklists, stage-specific investigations, and documented decisions from running topic modelling in production across languages and client contexts.

BERTopic methodology documentation checklist

View project →

Text Analysis

Topic Allocation Workflow

Five-stage pipeline for large-scale narrative discovery, with sentence-transformer embeddings, UMAP, and HDBSCAN at its core. Adapts to qualitative exploration and quantitative classification, with multilayered and guided variants.

BERTopic sentence-transformers SetFit UMAP HDBSCAN

View method →

Research Investigation

Outlier Mitigation in HDBSCAN

Empirical investigation into the HDBSCAN noise cluster (label -1), which consistently captures 40–60% of the data. Compares soft clustering (membership vectors) and k-means as strategies for recovering analytically valuable fringe content.

HDBSCAN soft clustering k-means UMAP

View method →

Text Analysis

Granular Annotation Scheme

Entropy-based stopping criteria for topic annotation. Replaces fixed-sample characterisation with proportional sampling and per-message description, using Shannon entropy to determine when a topic's description has stabilised sufficiently to stop.

Shannon entropy annotation sampling

View method →
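As a rough illustration of the entropy-based stopping idea in the Granular Annotation Scheme, the sketch below assumes each annotated message receives a short descriptive label, and that annotation stops once the Shannon entropy of the label distribution changes by less than a tolerance for several consecutive messages. The function names and the `tol` and `window` parameters are hypothetical choices for this example, not the scheme's actual settings.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def stopping_point(labels, tol=0.05, window=3):
    """Return how many messages were annotated when entropy stabilised.

    "Stabilised" is an assumption of this sketch: the absolute change in
    entropy stayed below `tol` for `window` consecutive messages.
    Returns len(labels) if that never happens.
    """
    stable = 0
    previous = None
    for n in range(1, len(labels) + 1):
        current = shannon_entropy(labels[:n])
        if previous is not None and abs(current - previous) < tol:
            stable += 1
            if stable >= window:
                return n
        else:
            stable = 0
        previous = current
    return len(labels)
```

A topic whose messages keep producing new descriptions keeps shifting the entropy and so keeps the annotator sampling; a homogeneous topic stabilises after a handful of messages.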

Text Analysis

Topic Model Evaluation

Review vs blind evaluation of thematic allocation — when each is appropriate, how anchoring bias emerges, and a hybrid approach that splits the evaluation sample into a review subset and a blind test subset.

evaluation stratified sampling blind evaluation

View method →
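One way the hybrid split in Topic Model Evaluation could be sketched: stratify the evaluation sample by assigned topic, then divide each stratum into a review subset (the annotator sees the model's label) and a blind subset (the label is withheld until scoring). The `review_fraction` and `seed` parameters and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_review_blind(items, review_fraction=0.5, seed=0):
    """Split (doc_id, topic) pairs into review and blind subsets, per topic."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_topic = defaultdict(list)
    for doc_id, topic in items:
        by_topic[topic].append(doc_id)

    review, blind = [], []
    for topic in sorted(by_topic):
        doc_ids = by_topic[topic][:]
        rng.shuffle(doc_ids)
        cut = round(len(doc_ids) * review_fraction)
        review += [(d, topic) for d in doc_ids[:cut]]
        # The topic is kept here for scoring only; in a blind pass the
        # annotator would not be shown it.
        blind += [(d, topic) for d in doc_ids[cut:]]
    return review, blind
```

Splitting within each topic keeps both subsets representative of the topic distribution, so agreement rates from the two passes can be compared directly.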

Research Investigation

Translation for Topic Modelling

Empirical comparison of clustering on source text vs translated text — three setups on Arabic-English data showing homogeneity tradeoffs, when each approach is appropriate, and what translation quality means for cluster coherence.

mBART multilingual BERTScore METEOR UMAP

View method →

Text Analysis

Guided Topic Modelling

Fine-tuning sentence transformer embeddings with contrastive learning to steer clustering toward predefined analytical objectives — bridging unsupervised discovery and analytically defined frameworks.

contrastive learning sentence-transformers SetFit UMAP HDBSCAN

View method →

Classification

Contrastive Fine-tuning for Classification

Reshaping the sentence transformer embedding space with contrastive learning to improve k-NN classification accuracy on population-scale annotated data — a standalone evaluation experiment with a structured pipeline and comparison dashboard.

contrastive learning k-NN classification sentence-transformers Streamlit

View method →
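The k-NN evaluation step that Contrastive Fine-tuning for Classification relies on can be illustrated with a small sketch. It covers only the nearest-neighbour vote, not the fine-tuning itself, and uses toy vectors where a real run would use sentence-transformer embeddings. The function names and the default `k` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, examples, k=3):
    """Majority label among the k examples most similar to `query`.

    `examples` is a list of (embedding, label) pairs.
    """
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because the vote depends entirely on which neighbours are closest, reshaping the embedding space so that same-class examples sit nearer to each other is what would move k-NN accuracy.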

Text Analysis

Multilayered Topic Modelling

Iterative clustering passes applied to heterogeneous topics. Layer 1 produces a broad thematic breakdown; subsequent layers dissect heterogeneous topics into sub-narratives discovered from the data rather than declared by classifiers.

UMAP HDBSCAN sentence-transformers annotation

View method →

Classification

Classification Approaches

Three strategies for assigning categories to documents at scale (zero-shot NLI, exemplar-based k-NN, and keyword- and ML-based classifiers) and a framework for choosing between them based on available labels and accuracy requirements.

k-NN zero-shot NLI sentence-transformers ChromaDB

View method →

Information Extraction

Broadcast Transcript Analysis

Pipeline for parsing multilingual broadcast STT transcripts — speaker diarisation, Arabic prefix merging, HuggingFace NER extraction, and fuzzy duplicate detection — with a Streamlit dashboard for exploration across Iraqi Arabic, Indonesian, and English channels.

HuggingFace speaker diarisation NER Arabic NLP Streamlit

View method →
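The fuzzy duplicate detection step in Broadcast Transcript Analysis could look something like the sketch below. It assumes a simple character-level similarity from Python's standard `difflib`; the actual pipeline may use a different measure. The `threshold` value and function names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and collapse whitespace before comparison."""
    return " ".join(text.lower().split())


def find_fuzzy_duplicates(segments, threshold=0.9):
    """Return index pairs (i, j), i < j, whose similarity ratio >= threshold."""
    cleaned = [normalise(s) for s in segments]
    pairs = []
    # Pairwise comparison is quadratic, so this suits small batches only.
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            ratio = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs
```

Repeated news bulletins tend to differ only by small transcription noise, which is the case a ratio threshold like this is meant to catch.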

Data Engineering

Social Media Data Collection

Multi-platform collection infrastructure managing continuous ingestion pipelines across Brandwatch, Telegram, YouTube, Twitter/X, and CrowdTangle — with daily quota management, allowlist curation, and automated ingestion into a centralised data warehouse for downstream NLP analysis.

Brandwatch Telegram API YouTube API Twitter / X API data engineering

View method →

Information Extraction

Multilingual NER Evaluation

Three-stage benchmarking framework for selecting and validating NER models across 26 languages, from high-resource to low-resource. Separates benchmark performance from project-specific domain validation.

XLM-RoBERTa seqeval HuggingFace CoNLL spaCy

View method →
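For Multilingual NER Evaluation, benchmark scores of this kind are usually computed at the entity level rather than per token. The sketch below is a simplified, dependency-free version of that idea for BIO-tagged sequences. It is not seqeval's implementation: as an assumption of this sketch, an `I-` tag with no matching open entity is ignored.

```python
def extract_spans(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence."""
    spans = set()
    start, ent_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and ent_type == tag[2:]
        if not inside:
            if ent_type is not None:
                spans.add((ent_type, start, i))
            if tag.startswith("B-"):
                start, ent_type = i, tag[2:]
            else:
                start, ent_type = None, None
    return spans


def entity_f1(gold, pred):
    """Entity-level F1: a prediction counts only if type and span both match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Running the same scorer on a public benchmark and on a project-specific annotated sample gives two numbers that can be reported separately, in line with the framework's separation of benchmark performance from domain validation.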

Cross-lingual NLP

Machine Translation Evaluation

Comparative framework using BERTScore as the primary quality signal alongside METEOR. Includes structured error analysis covering named entity preservation, tokenisation, domain shift, and idiomatic language across morphologically rich languages.

BERTScore METEOR mBERTScore Python

View method →