Aarush Verma

01 / Research Overview

Overview

Delhi air pollution is driven by overlapping sources — crop burning, traffic, construction dust, weather trapping, festivals, and seasonal patterns. This project builds a digital twin that identifies these contributing elements and models their relationship to observed AQI.

Rather than only reporting pollution after it occurs, the framework maps which sources are active, how they interact, and how much each contributes to overall air quality — supporting targeted understanding and planning.

Sources

Contributing Elements

Fire, traffic, weather, construction, calendar events, and human activity mapped as distinct inputs.

Method

Digital Twin Modelling

Multi-source data fused into a unified model representing Delhi's pollution system.

Output

Contribution & AQI

Relative source impact estimated alongside AQI, with an error of approximately ±15 AQI points on predicted levels.

Use

Source-Level Insight

Clarifies what is contributing to pollution and by how much — for policy, schools, and communities.

02 / Case Study Selection

Why Delhi?

Delhi was selected because many pollution drivers interact there at once: crop residue burning, traffic, construction, festivals, winter seasonality, and weather conditions such as wind direction and low wind speed. This makes Delhi a strong location for testing whether machine learning can combine many variables better than simple monitoring.

Severity

High Pollution Severity

Delhi regularly records AQI levels among the worst globally.

Complexity

Many Interacting Causes

Traffic, fires, construction, weather, and festivals overlap simultaneously.

Seasonality

Severe Winter Episodes

Low wind and temperature inversion trap pollutants for weeks.

Policy

National Importance

India's capital — policy relevance and public attention are exceptionally high.

Public Life

Public Health & School Impact

School closures and health advisories directly affect millions of residents.

Data

Rich Data Availability

Multiple APIs and monitoring stations provide sufficient data for ML modelling.

03 / Research Objective

Research Objective

The objective is to build a Delhi pollution digital twin that identifies contributing sources, models their interaction with observed AQI, and quantifies how much each element contributes to pollution levels.

Identify Sources

Map crop burning, traffic, construction, weather, festivals, and human activity as distinct contributing elements.

Quantify Contribution

Model how much each source contributes to observed AQI — not just that pollution occurred, but what drove it.

Enable Targeted Action

Provide source-level insight that supports policy, school planning, and community awareness.

04 / Economic & Social Impact

Economic & Social Impact

Delhi pollution affects productivity, healthcare costs, schools, government activity, construction, transport, aviation, tourism, and local business activity.

Economy

GDP Loss

Citywide economic output declines during prolonged pollution episodes.

Workforce

Productivity Loss

Workers and businesses operate below capacity due to health impacts.

Healthcare Burden

Increased hospital visits, medicine costs, and respiratory treatment.

School Closures

Education disrupted when AQI exceeds safe thresholds for children.

Construction Shutdowns

Project delays and compliance costs when dust controls are enforced.

Aviation

Airport / Flight Disruption

Visibility and health concerns affect aviation and business travel.

All 10 disruptions

Citywide GDP loss
Business productivity loss
Worker absenteeism
Healthcare and medicine spending
Hospital load and public health cost
Construction shutdowns and project delays
Construction compliance costs and fines
Transport and logistics disruption
Aviation and business travel disruption
Tourism and consumer spending loss

05 / Data Architecture

Data Sources

Eight data sources — AQI monitoring, NASA FIRMS fire data, weather APIs, traffic congestion, calendar and festival windows, construction activity, temporal features, and human activity proxies — are collected, stored, and prepared for model input.

API Source Layer: AQI Data, NASA FIRMS, Weather API, Traffic API, Calendar, Construction, Time Features, Human Activity

↓

Storage Layer: Supabase Database

↓

Model Layer: Gradient Descent ML, Feature Engineering

↓

Output Layer: AQI & Source Contribution Output

Source	Data Used	Model Role
Delhi AQI / Pollution Data	AQI, PM2.5, PM10, gases, station, timestamp	Target value
NASA FIRMS	Fire count, brightness, confidence, fire radiative power	Crop-burning signal
Weather API	Wind speed, wind direction, humidity, rainfall, temperature	Dispersion / trapping signal
Traffic API	Time per km, congestion, road speed	Vehicular emissions proxy
Calendar / Festival Data	Diwali, festival window, harvest season	Sudden event signal
Date / Time Features	Hour, weekday, month, season	Seasonal and daily cycles
Construction / Urban Activity	Dust risk, construction activity index	PM10 / dust proxy
Human Activity Patterns	Commute hour, night/day, activity score	Human movement proxy
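
One way to picture the fusion of these sources is as a join on a shared hourly timestamp. The sketch below is illustrative only: the record layout, the `fuse_by_hour` helper, and the sample values are assumptions made for this example, not the project's actual schema or data.

```python
from collections import defaultdict
from datetime import datetime


def fuse_by_hour(*sources):
    """Merge per-source readings into one row per hour.

    Each source is a list of (timestamp, dict_of_fields) pairs.
    Timestamps are truncated to the hour so that feeds sampled at
    different minutes still line up in the same row.
    """
    rows = defaultdict(dict)
    for source in sources:
        for ts, fields in source:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            rows[hour].update(fields)
    # Return rows in chronological order.
    return [dict(timestamp=hour, **rows[hour]) for hour in sorted(rows)]


# Made-up sample values, for illustration only.
aqi = [(datetime(2024, 11, 1, 9, 5), {"aqi": 310})]
weather = [(datetime(2024, 11, 1, 9, 40), {"wind_speed_kmph": 4.0})]
fires = [(datetime(2024, 11, 1, 9, 0), {"fire_count_near_delhi": 120})]

fused = fuse_by_hour(aqi, weather, fires)
print(fused[0])
```

Rows that are missing a field for a given hour simply lack that key, which makes gaps from delayed or unavailable APIs easy to detect before training.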

06 / Feature Engineering

Feature Engineering

Each contributing element — crop burning, traffic, weather trapping, construction dust, festival periods — is encoded as a structured feature so the digital twin can learn how much each source drives observed AQI.

External Pollution

fire_count_near_delhi
average_fire_intensity
wind_direction_degrees
wind_speed_kmph

Local Pollution

traffic_index
average_time_per_km_minutes
construction_activity_index
dust_risk_score

Weather Trapping

temperature_celsius
humidity_percent
rainfall_mm
is_winter

Calendar Shocks

is_diwali_period
is_festival_period
is_harvesting_season
day_of_week
hour_sin
hour_cos
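
As a minimal sketch of how the time-based features above could be derived, the function below encodes the hour as a sine and cosine pair so that 23:00 and 00:00 end up close together rather than 23 units apart. The `time_features` helper and the November to February definition of winter are assumptions for illustration; the project's actual rules may differ.

```python
import math
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive calendar features from a single timestamp."""
    angle = 2 * math.pi * ts.hour / 24  # position of the hour on a 24-hour circle
    return {
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        "day_of_week": ts.weekday(),  # Monday = 0, Sunday = 6
        # Assumption: winter is taken as November through February.
        "is_winter": int(ts.month in (11, 12, 1, 2)),
    }


features = time_features(datetime(2024, 12, 2, 6))
print(features)
```

Flags such as is_diwali_period or is_harvesting_season could follow the same pattern, given a table of date windows for each year.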

07 / Machine Learning Model

ML Model

The model is a linear predictor trained with gradient descent: it predicts AQI from the input features, compares predicted AQI with actual AQI, calculates the error as a root-mean-square error (RMSE, also called RMSD), and updates its weights iteratively until the error converges.

Input historical data

Predict AQI

Compare with actual AQI

Calculate error

Update weights

Repeat

RMSE / RMSD Error

Error = √( Σ (predicted − actual)² / n )

Weights updated iteratively to minimize this error across training epochs.
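
The loop above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes a linear model with a bias term, uses the gradient of mean squared error (which has the same minimum as RMSE), and trains on a tiny synthetic dataset. The function names, learning rate, and data values are all hypothetical.

```python
import math


def predict(weights, bias, row):
    """Linear prediction: bias plus the weighted sum of the features."""
    return bias + sum(w * x for w, x in zip(weights, row))


def rmse(weights, bias, X, y):
    """Root-mean-square error between predictions and actual values."""
    squared = [(predict(weights, bias, r) - t) ** 2 for r, t in zip(X, y)]
    return math.sqrt(sum(squared) / len(y))


def train(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, r) - t for r, t in zip(X, y)]
        # Gradient of the mean squared error for each weight and the bias.
        grad_w = [2 / n * sum(e * r[j] for e, r in zip(errors, X)) for j in range(d)]
        grad_b = 2 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Synthetic data generated as AQI = 200 + 100 * x1 + 50 * x2 (made up).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [200.0, 300.0, 250.0, 350.0]

weights, bias = train(X, y)
print([round(w, 3) for w in weights], round(bias, 3))
print(round(rmse(weights, bias, X, y), 6))
```

Because the model is linear, each learned weight can be read directly as the change in predicted AQI per unit change in that feature.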

Why Gradient Descent?

It provides an explainable starting model whose feature weights can be read directly, and a baseline for future Random Forest, XGBoost, or Neural Network comparisons.

08 / Results & Evaluation

Results & Evaluation

±15

Digital Twin AQI Validation

The twin model's predictions fall within approximately ±15 AQI points of observed AQI in the examples below, which indicates that source contributions are mapped to realistic pollution levels.

Actual AQI	Predicted AQI	Difference
250	263	+13
310	296	−14
400	415	+15
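
As a quick check of the three rows above, the differences and summary errors can be recomputed directly. The snippet uses only the values from the table; mean absolute error is added here as an extra summary and is not a metric named in the source.

```python
import math

# (actual AQI, predicted AQI) pairs copied from the table above.
rows = [(250, 263), (310, 296), (400, 415)]

diffs = [predicted - actual for actual, predicted in rows]
mae = sum(abs(d) for d in diffs) / len(diffs)
rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))

print(diffs)           # [13, -14, 15]
print(mae)             # 14.0
print(round(rmse, 2))  # 14.02
```

On these three examples the largest absolute difference is 15 AQI points, which matches the ±15 figure quoted for the model.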

Validation

Twin Model Accuracy

An error of about ±15 AQI points indicates that the digital twin reproduces observed pollution levels from combined source inputs.

Data Quality

Stronger With Complete Data

Source attribution improves when all contributing-element inputs are timely and complete.

Attribution

Source Contribution

Identifies which contributing elements are driving AQI and estimates their relative impact on pollution levels.
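
The source does not spell out how relative impact is computed, so the following is only one possible approach for a linear model. It assumes features are standardized, so that weight times value is the feature's push on AQI relative to an average hour, and it reports each feature's share of the total absolute push. The `contribution_shares` helper, the weights, and the input values are all hypothetical.

```python
def contribution_shares(weights, features):
    """Split a linear prediction into per-feature shares of the total push.

    Both arguments are dicts keyed by feature name. Feature values are
    assumed to be standardized (zero mean, unit variance).
    """
    pushes = {name: weights[name] * features[name] for name in weights}
    total = sum(abs(p) for p in pushes.values())
    if total == 0:
        return {name: 0.0 for name in pushes}
    return {name: abs(p) / total for name, p in pushes.items()}


# Hypothetical learned weights and one hour of standardized inputs.
weights = {"fire_count_near_delhi": 40.0, "traffic_index": 25.0, "wind_speed_kmph": -30.0}
features = {"fire_count_near_delhi": 1.5, "traffic_index": 0.8, "wind_speed_kmph": -1.0}

shares = contribution_shares(weights, features)
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

In this made-up hour, below-average wind speed combined with a negative weight raises predicted AQI, so weak wind counts as a contributor alongside fires and traffic.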

09 / Limitations & Next Steps

Limitations & Next Steps

Outlier events, spatial variation across monitoring stations, limited construction data, indirect human-activity proxies, and API dependency define the current model boundary and shape the next phase of research.

Current Limitations

Outlier events can cause higher error
Local station sensitivity — AQI varies by location
Limited construction data availability
Human activity estimated via proxies
API dependency — delays or missing data
Nonlinear pollution interactions

Future Research Upgrades

Station-level localized data
Improved construction & road-dust datasets
Real-time traffic feed integration
Random Forest, XGBoost, Neural Networks
School-facing AQI awareness dashboard
Framework adaptation to other cities

Technical details: model assumptions

The initial model, fitted by gradient descent, assumes approximately linear feature–AQI relationships.
Feature scaling and normalization applied before training.
Train/test split used historical time-series windows.
RMSE-style error metric used for weight updates.
Future models may capture nonlinear interactions more effectively.
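
Two of the assumptions above, scaling before training and a time-ordered train/test split, can be sketched as follows. This is an illustration under assumptions rather than the project's code: the 80/20 split fraction, the helper names, and the wind values are made up. The scaler is fitted on the training window only, so no information from the later test window leaks into training.

```python
import statistics


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows so the test set is strictly later than training."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(values):
    """Return the mean and standard deviation used for standardization."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against constant features
    return mean, std


def scale(values, mean, std):
    """Standardize values with a previously fitted mean and deviation."""
    return [(v - mean) / std for v in values]


# Made-up hourly wind speeds, oldest first.
wind = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

train, test = chronological_split(wind)
mean, std = fit_scaler(train)
train_scaled = scale(train, mean, std)
test_scaled = scale(test, mean, std)
print(train_scaled)
print(test_scaled)
```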

10 / Research Foundation & Citations

Research Foundation & Citations

This research is supported by published work on Delhi AQI forecasting, machine-learning-based PM2.5 prediction, crop-residue burning, winter pollution episodes, Diwali/firecracker effects, source apportionment, and the health/economic burden of air pollution.

Research foundation built on 18 high-value sources

Delhi-specific forecasting

Satellite fire data

Source apportionment

Health/economic impact


Predicting Delhi Air Pollution Before It Peaks

