About Dr. Tosin Ojajuni

Comprehensive overview of my professional background, experience, and expertise in Data Engineering and AI/ML systems.

About Me

Distinguished Senior Data & AI Engineer with PhD credentials, specializing in architecting enterprise-scale data platforms and AI-powered solutions across AWS, GCP, and Azure ecosystems. With 6+ years focused on Data and AI Engineering, I deliver mission-critical data pipelines processing petabytes of data for 50M+ users, achieving 40% performance improvements and 20% productivity gains.

I excel at building production-grade ML/AI systems, including GenAI agents and agentic AI solutions using Vertex AI and LLMs, with deep expertise in modern data stack technologies like Databricks, dbt, and Dataform. My diverse professional foundation, spanning 15+ years, provides analytical rigor and a systematic problem-solving approach that drive innovative data solutions.

Proven track record of driving $M+ cost optimizations through intelligent automation and architectural excellence. Certified across major cloud platforms, including the latest 2025 GCP certifications, with expertise in technical leadership and team mentoring to deliver scalable data engineering solutions that create measurable business impact.

15+
Years Professional Experience
50M+
Users Served
$M+
Cost Savings

Professional Projects & Solutions

Specialized Data & AI Engineering expertise delivering enterprise-scale solutions with measurable business impact. Enhanced by 15+ years of analytical rigor and systematic problem-solving across diverse professional domains.

Enterprise-Scale Data Platform Architecture

Entertainment Technology
Mission-Critical Infrastructure
Key Results & Impact
  • Architected enterprise data platform reducing data processing costs by 35% while improving pipeline reliability to 99.9% SLA
  • Pioneered Delta Live Tables implementation for real-time event streaming analytics, processing 10M+ events daily with sub-minute latency
  • Built production ML pipelines that improved user engagement by 28% and fraud detection that prevents $2M+ in annual losses
  • Established MLOps best practices increasing ML deployment velocity by 3x and reducing model debugging time by 45%
  • Led cross-functional collaboration driving 15% increase in user retention
  • Mentored 4 junior engineers on advanced data engineering and ML principles
Technologies & Tools
Databricks, AWS, Delta Lake, PySpark, TensorFlow, MLflow, Python, SQL

Petabyte-Scale Cloud Migration & Data Lake Architecture

Telecommunications
Enterprise Infrastructure Transformation
Key Results & Impact
  • Orchestrated petabyte-scale data migration to GCP, completing migration 20% ahead of schedule with zero data loss
  • Built scalable data lake architecture processing 5+ petabytes of network telemetry data for 30M+ customers
  • Implemented automated workflows reducing manual intervention by 80% and improving data freshness from hours to minutes
  • Optimized data warehouse achieving 40% reduction in query response times and $100K+ annual cost savings
  • Developed comprehensive data catalog improving data discoverability by 60% across 200+ datasets
  • Engineered real-time streaming pipelines processing 1M+ messages/hour with 99.95% reliability
Technologies & Tools
GCP, BigQuery, Dataflow, Pub/Sub, Cloud Composer, Python, SQL, Apache Beam, Terraform

Enterprise Data Solutions & BI Platform Development

Consulting & Analytics
Fortune 500 Digital Transformation
Key Results & Impact
  • Designed dimensional data models for Fortune 500 clients supporting $10M+ analytics initiatives
  • Automated data ingestion pipelines improving data quality from 60% to 95% accuracy
  • Built serverless ETL orchestration reducing infrastructure costs by 40%
  • Implemented Apache Airflow as central workflow orchestration platform, reducing pipeline development time by 30%
  • Architected BI platform reducing report generation time from days to hours
  • Delivered data governance framework ensuring GDPR compliance across 15+ enterprise applications
Technologies & Tools
AWS, Glue, Redshift, Lambda, S3, Airflow, PostgreSQL, MySQL, Python, SQL, Tableau

Retail Analytics Platform & Customer 360 Solution

Retail & E-commerce
Global Customer Data Unification
Key Results & Impact
  • Architected cloud data warehouse consolidating customer data from 15+ global markets
  • Improved data accuracy from 55% to 92% enabling unified customer 360 view
  • Tripled BI capability reducing time-to-insight from weeks to hours for marketing teams
  • Built retail analytics platform processing 5M+ customer transactions monthly
  • Optimized transformations achieving 60% reduction in processing times
  • Established data governance standards ensuring data quality and compliance
Technologies & Tools
GCP BigQuery, Python, SQL, Cloud Functions, Pub/Sub, Looker, dbt

End-to-End ML Applications & GenAI Solutions

Cross-Industry ML Deployment
Production ML Systems
Key Results & Impact
  • Developed 10+ production ML applications serving 50K+ monthly users across finance, healthcare, and e-commerce
  • Built GenAI prototypes leveraging LLMs achieving 85% user satisfaction in pilot programs
  • Implemented MLOps pipelines enabling automated model training, versioning, and deployment
  • Engineered scalable data pipelines supporting ML model training and inference workloads
  • Conducted comprehensive EDA and feature engineering improving model performance by 20-30%
  • Designed scalable inference APIs with <100ms latency and 99.9% uptime using containerized deployments
Technologies & Tools
Python, TensorFlow, PyTorch, Scikit-learn, Streamlit, GCP, AWS, Databricks, MLflow, Docker

Technical Skills

Expertise across the full data engineering and AI/ML stack

Cloud Platforms

  • AWS (Glue, Redshift, S3, Lambda, EMR, Step Functions)
  • GCP (BigQuery, Pub/Sub, Dataflow, Vertex AI)
  • Azure (Databricks, Data Factory, Synapse Analytics)

Data Engineering

  • ETL/ELT: dbt, Dataform, Apache Airflow, Databricks Workflows
  • Big Data: Apache Spark, PySpark, Kafka, Delta Lake
  • Data Warehousing: BigQuery, Redshift, Snowflake

Programming

  • Python, SQL, PL/pgSQL
  • Scala, Bash
  • Processing: Pandas, NumPy, Polars, DuckDB

Machine Learning & AI

  • Platforms: Databricks MLflow, Vertex AI, SageMaker
  • Frameworks: TensorFlow, PyTorch, Scikit-learn, XGBoost
  • GenAI: LangChain, RAG pipelines, LLMs, Prompt Engineering

Data Visualization

  • Tableau, Power BI, Looker
  • Streamlit
  • Chart.js, Plotly

DevOps & MLOps

  • Docker, Kubernetes
  • Git, GitHub Actions, GitLab CI/CD
  • Model versioning, A/B testing, CI/CD for ML

Streaming & Real-time

  • Apache Kafka
  • Google Pub/Sub
  • AWS Kinesis

Specializations

  • Data Lake Architecture
  • Dimensional Modeling
  • Cost Optimization
  • Performance Tuning

Technical Leadership & Innovation

Bridging academic research with industry innovation through computational methods, technical leadership, and open source contributions to the data engineering community.

Research Specializations

PhD Research Focus

Data-Driven Engineering

Advanced computational methods for optimizing complex engineering systems through data analytics and machine learning.

Academic Specialization

Computational Methods

Development of novel algorithms for processing large-scale engineering datasets and predictive modeling.

Industry Application

AI in Production Systems

Bridging the gap between academic research and enterprise-scale AI implementation.

Technical Leadership

Cross-functional Team Leadership

Team development

Enterprise Organizations

Led engineering teams across multiple projects, driving technical strategy and mentoring junior engineers.

Architecture & System Design

Enterprise scale

Mission-Critical Systems

Architected data platforms serving 50M+ users with focus on scalability and reliability.

Knowledge Transfer & Mentoring

Technical Mentorship & University Teaching

10+ engineers mentored

Academic & Enterprise Teams

Established mentorship programs that draw on university lecturing experience to develop advanced data engineering and ML engineering capabilities across teams.

Technical Workshop Leadership

Knowledge transfer

Academic & Industry Events

Led workshops on Modern Data Stack, MLOps, and Cloud Architecture, drawing on 15+ years of civil engineering and academic experience.

Academic & Industry Impact

15+
Years Experience
PhD
Advanced Degree
10+
Engineers Mentored

Education & Certifications

Education

PhD in Engineering

University of Surrey

Guildford, UK

Focus: Data-driven Engineering and Computational Methods

MSc Civil Engineering (Merit)

University of Surrey

Guildford, UK

Focus: Engineering Analytics and Optimization

Professional Certifications

Cloud

Google Cloud Professional Cloud Architect

Google Cloud

2025

AI/ML

Google Cloud Professional Machine Learning Engineer

Google Cloud

2025

AI/ML

Google Generative AI Leader Certification

Google Cloud

2024

Data Engineering

Databricks Certified Data Engineer Associate

Databricks

2023

Cloud

GCP Professional Data Engineer

Google Cloud

2023

AI/ML

TensorFlow Developer Certificate

Google

2023

Business

BCS Certificate in Business Analysis

BCS

2022

Get In Touch

Interested in collaborating on data engineering or AI projects? Let's connect!

Contact Information

Location

Portsmouth, England, UK

Email

Available via contact form

Open to Opportunities

I'm always interested in hearing about new opportunities in data engineering, ML engineering, AI engineering, agentic AI, and cloud architecture. Feel free to reach out!