01 / Research Overview
Overview
Delhi air pollution is driven by overlapping sources — crop burning, traffic, construction dust, weather trapping, festivals, and seasonal patterns. This project builds a digital twin that identifies these contributing elements and models their relationship to observed AQI.
Rather than only reporting pollution after it occurs, the framework maps which sources are active, how they interact, and how much each contributes to overall air quality — supporting targeted understanding and planning.
Contributing Elements
Fire, traffic, weather, construction, calendar events, and human activity mapped as distinct inputs.
Digital Twin Modelling
Multi-source data fused into a unified model representing Delhi's pollution system.
Contribution & AQI
Relative source impact estimated alongside AQI, with an error of approx. ±15 AQI points on predicted levels.
Source-Level Insight
Clarifies what is contributing to pollution and by how much — for policy, schools, and communities.
02 / Case Study Selection
Why Delhi?
Delhi was selected because many pollution drivers interact there at the same time: crop residue burning, traffic, construction, weather (including wind direction and low wind speed), festivals, and winter seasonality. This makes Delhi a strong location for testing whether machine learning, by combining many variables, can explain AQI better than simple monitoring alone.
High Pollution Severity
Delhi regularly records AQI levels among the worst globally.
Many Interacting Causes
Traffic, fires, construction, weather, and festivals overlap simultaneously.
Severe Winter Episodes
Low wind and temperature inversion trap pollutants for weeks.
National Importance
India's capital — policy relevance and public attention are exceptionally high.
Public Health & School Impact
School closures and health advisories directly affect millions of residents.
Rich Data Availability
Multiple APIs and monitoring stations provide sufficient data for ML modelling.
03 / Research Objective
Research Objective
The objective is to build a Delhi pollution digital twin that identifies contributing sources, models their interaction with observed AQI, and quantifies how much each element contributes to pollution levels.
Identify Sources
Map crop burning, traffic, construction, weather, festivals, and human activity as distinct contributing elements.
Quantify Contribution
Model how much each source contributes to observed AQI — not just that pollution occurred, but what drove it.
Enable Targeted Action
Provide source-level insight that supports policy, school planning, and community awareness.
04 / Economic & Social Impact
Economic & Social Impact
Delhi pollution affects productivity, healthcare costs, schools, government activity, construction, transport, aviation, tourism, and local business activity.
GDP Loss
Citywide economic output declines during prolonged pollution episodes.
Productivity Loss
Workers and businesses operate below capacity due to health impacts.
Healthcare Burden
Increased hospital visits, medicine costs, and respiratory treatment.
School Closures
Education disrupted when AQI exceeds safe thresholds for children.
Construction Shutdowns
Project delays and compliance costs when dust controls are enforced.
Airport / Flight Disruption
Visibility and health concerns affect aviation and business travel.
- Citywide GDP loss
- Business productivity loss
- Worker absenteeism
- Healthcare and medicine spending
- Hospital load and public health cost
- Construction shutdowns and project delays
- Construction compliance costs and fines
- Transport and logistics disruption
- Aviation and business travel disruption
- Tourism and consumer spending loss
05 / Data Architecture
Data Sources
Eight data sources — AQI monitoring, NASA FIRMS fire data, weather APIs, traffic congestion, calendar and festival windows, construction activity, temporal features, and human activity proxies — are collected, stored, and prepared for model input.
| Source | Data Used | Model Role |
|---|---|---|
| Delhi AQI / Pollution Data | AQI, PM2.5, PM10, gases, station, timestamp | Target value |
| NASA FIRMS | Fire count, brightness, confidence, fire radiative power | Crop-burning signal |
| Weather API | Wind speed, wind direction, humidity, rainfall, temperature | Dispersion / trapping signal |
| Traffic API | Time per km, congestion, road speed | Vehicular emissions proxy |
| Calendar / Festival Data | Diwali, festival window, harvest season | Sudden event signal |
| Date / Time Features | Hour, weekday, month, season | Seasonal and daily cycles |
| Construction / Urban Activity | Dust risk, construction activity index | PM10 / dust proxy |
| Human Activity Patterns | Commute hour, night/day, activity score | Human movement proxy |
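As a rough illustration of how the sources in the table could be aligned before modelling, the sketch below joins per-source hourly records on a shared timestamp. The function name `fuse_sources`, the dictionary layout, and every value shown are hypothetical placeholders, not the project's actual pipeline or data.

```python
from typing import Dict, List


def fuse_sources(target: Dict[str, dict], *sources: Dict[str, dict]) -> List[dict]:
    """Join hourly records on timestamp, keeping only hours every source reported."""
    rows = []
    for ts in sorted(target):  # ISO-8601 strings sort chronologically
        if all(ts in src for src in sources):
            row = {"timestamp": ts, **target[ts]}
            for src in sources:
                row.update(src[ts])
            rows.append(row)
    return rows


# Made-up placeholder records, for illustration only.
aqi = {"2023-11-01T08:00": {"aqi": 310}, "2023-11-01T09:00": {"aqi": 325}}
weather = {
    "2023-11-01T08:00": {"wind_speed_kmph": 4.0},
    "2023-11-01T09:00": {"wind_speed_kmph": 3.5},
}
fires = {"2023-11-01T08:00": {"fire_count_near_delhi": 42}}

rows = fuse_sources(aqi, weather, fires)
print(rows)
```

Dropping hours with a missing source is only one possible policy. Forward-filling or imputing gaps would be alternatives, and which one suits this project is not stated in the source.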
06 / Feature Engineering
Feature Engineering
Each contributing element — crop burning, traffic, weather trapping, construction dust, festival periods — is encoded as a structured feature so the digital twin can learn how much each source drives observed AQI.
- fire_count_near_delhi
- average_fire_intensity
- wind_direction_degrees
- wind_speed_kmph
- traffic_index
- average_time_per_km_minutes
- construction_activity_index
- dust_risk_score
- temperature_celsius
- humidity_percent
- rainfall_mm
- is_winter
- is_diwali_period
- is_festival_period
- is_harvesting_season
- day_of_week
- hour_sin
- hour_cos
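The `hour_sin` and `hour_cos` features in the list above suggest a cyclical encoding of the hour of day. A minimal sketch of that idea follows, assuming hours run from 0 to 23. The helper names `encode_hour` and `circular_distance` are illustrative and do not come from the project.

```python
import math


def encode_hour(hour: int) -> tuple:
    """Map an hour (0..23) onto the unit circle so 23:00 and 00:00 are neighbours."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return math.sin(angle), math.cos(angle)  # (hour_sin, hour_cos)


def circular_distance(h1: int, h2: int) -> float:
    """Euclidean distance between two encoded hours."""
    s1, c1 = encode_hour(h1)
    s2, c2 = encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)


# 23:00 is as close to midnight as 01:00 is, and far closer than noon.
print(circular_distance(23, 0), circular_distance(0, 1), circular_distance(0, 12))
```

With a raw 0 to 23 integer, 23:00 and 00:00 would look 23 units apart. The sine/cosine pair removes that artificial jump, which matters for capturing overnight pollution patterns.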
07 / Machine Learning Model
ML Model
The initial model is a linear predictor trained with gradient descent: it predicts AQI from the input features, compares predicted AQI with actual AQI, calculates the error as root-mean-square error (RMSE, also called RMSD), and updates the weights iteratively until the error converges.
Input historical data
Predict AQI
Compare with actual AQI
Calculate error
Update weights
Repeat
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\text{predicted}_i - \text{actual}_i\right)^2}$$

Weights are updated iteratively to minimize this error across training epochs.
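The predict, compare, update loop described above can be sketched as batch gradient descent on a linear predictor. This is a minimal illustration under stated assumptions, not the project's implementation: the learning rate, epoch count, function names, and toy data are all hypothetical. The gradient is taken on the mean squared error, which has the same minimiser as RMSE.

```python
import math


def predict(weights, bias, row):
    """Linear prediction: bias plus the weighted sum of features."""
    return bias + sum(w * x for w, x in zip(weights, row))


def rmse(weights, bias, X, y):
    """Root-mean-square error between predictions and targets."""
    n = len(y)
    return math.sqrt(sum((predict(weights, bias, r) - t) ** 2 for r, t in zip(X, y)) / n)


def train(X, y, lr=0.05, epochs=2000):
    """Batch gradient descent on mean squared error (same minimiser as RMSE)."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(weights, bias, r) - t for r, t in zip(X, y)]
        grad_w = [2.0 / n * sum(e * r[j] for e, r in zip(errors, X)) for j in range(d)]
        grad_b = 2.0 / n * sum(errors)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy, made-up data generated from y = 5 + 3*x1 - 2*x2 (not real AQI data).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.5]]
y = [5 + 3 * a - 2 * b for a, b in X]

weights, bias = train(X, y)
print(weights, bias, rmse(weights, bias, X, y))
```

On real inputs the features would normally be scaled first, since gradient descent with a single learning rate converges slowly when features have very different ranges.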
08 / Results & Evaluation
Results & Evaluation
On the sample comparisons below, the twin model's predicted AQI falls within approximately ±15 AQI points of observed AQI, which suggests that the combined source inputs reproduce realistic pollution levels.
| Actual AQI | Predicted AQI | Difference |
|---|---|---|
| 250 | 263 | +13 |
| 310 | 296 | −14 |
| 400 | 415 | +15 |
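To make the ±15 figure concrete, the short sketch below recomputes the error statistics from the three rows of the table above. Only those three rows are used; the variable names are illustrative.

```python
import math

# Values copied from the Actual / Predicted columns of the table.
actual = [250, 310, 400]
predicted = [263, 296, 415]

diffs = [p - a for p, a in zip(predicted, actual)]          # signed differences
mae = sum(abs(d) for d in diffs) / len(diffs)               # mean absolute error
rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))    # root-mean-square error
max_abs = max(abs(d) for d in diffs)                        # worst-case miss

print(diffs, mae, round(rmse, 2), max_abs)
```

For these three samples the mean absolute error is 14, the RMSE is about 14.02, and the largest miss is 15 AQI points. Three rows are too few to characterise the error on the full evaluation set.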
Twin Model Accuracy
An error of about ±15 AQI points indicates that the digital twin reproduces observed pollution levels from combined source inputs.
Stronger With Complete Data
Source attribution improves when all contributing-element inputs are timely and complete.
Source Contribution
Identifies which contributing elements are driving AQI and estimates their relative impact on pollution levels.
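The source does not state how relative impact is computed. One simple possibility for a linear model, shown here purely as an assumption, is to treat each feature's contribution as its weight times its standardized value and then normalise the absolute contributions into shares. The function name, weights, and feature values below are hypothetical.

```python
def contribution_shares(weights, scaled_row, names):
    """Share of each feature in a linear prediction: |w_j * x_j| / sum_k |w_k * x_k|."""
    contributions = {n: w * x for n, w, x in zip(names, weights, scaled_row)}
    total = sum(abs(c) for c in contributions.values())
    if total == 0:
        return {n: 0.0 for n in names}
    return {n: abs(c) / total for n, c in contributions.items()}


# Hypothetical weights and standardized inputs, for illustration only.
names = ["fire_count_near_delhi", "traffic_index", "wind_speed_kmph"]
weights = [30.0, 20.0, -25.0]
scaled_row = [2.0, 1.0, -1.0]

shares = contribution_shares(weights, scaled_row, names)
print(shares)
```

This attribution is only meaningful when features are on a comparable scale and not strongly correlated. Correlated inputs, such as low wind speed and winter, can split or swap credit between themselves.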
09 / Limitations & Next Steps
Limitations & Next Steps
Outlier events, spatial variation across monitoring stations, limited construction data, indirect human-activity proxies, and API dependency define the current model boundary and shape the next phase of research.
Current Limitations
- Outlier events can cause higher error
- Local station sensitivity — AQI varies by location
- Limited construction data availability
- Human activity estimated via proxies
- API dependency — delays or missing data
- Nonlinear pollution interactions
Future Research Upgrades
- Station-level localized data
- Improved construction & road-dust datasets
- Real-time traffic feed integration
- Random Forest, XGBoost, Neural Networks
- School-facing AQI awareness dashboard
- Framework adaptation to other cities
- The initial model, trained with gradient descent, assumes approximately linear feature–AQI relationships.
- Feature scaling and normalization applied before training.
- Train/test split used historical time-series windows.
- RMSE-style error metric used for weight updates.
- Future models may capture nonlinear interactions more effectively.
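The notes on feature scaling and time-series train/test windows can be sketched as follows, assuming rows are already sorted by time. The 80/20 split fraction, the z-score scaling choice, and all function names are assumptions for illustration; the source does not specify them.

```python
import math


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows so the test window lies strictly after the training window."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(train_rows):
    """Compute per-feature mean and standard deviation from training rows only."""
    n, d = len(train_rows), len(train_rows[0])
    means = [sum(r[j] for r in train_rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in train_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant column: divide by 1 instead of 0
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


# Made-up, time-ordered feature rows (not real data).
rows = [[float(i), 10.0 + 2 * i] for i in range(10)]
train_rows, test_rows = chronological_split(rows)
means, stds = fit_scaler(train_rows)
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)
print(len(train_rows), len(test_rows), means, stds)
```

Fitting the scaler on the training window only, and splitting by time instead of at random, keeps information from later periods out of training.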
10 / Research Foundation & Citations
Research Foundation & Citations
This research is supported by published work on Delhi AQI forecasting, machine-learning-based PM2.5 prediction, crop-residue burning, winter pollution episodes, Diwali/firecracker effects, source apportionment, and the health/economic burden of air pollution.
Research foundation built on 18 high-value sources
| Source | Supports | Gist | Link |
|---|---|---|---|