
I Tried to Predict Singapore's Rain and It Humbled Me

In a country with three weather modes – hot, hotter, and raining sideways – I built a real-time weather dashboard with LightGBM rainfall forecasting, SHAP explainability, animated radar, and 8 years of tropical weather data. Now with interactive charts so you can explore the data yourself.

Tech Stack
Python · FastAPI · React 18 · LightGBM · SHAP · Leaflet · Tailwind CSS · PostgreSQL · APScheduler · Railway

Why build a weather app?

Honestly? Because I wanted a playground to learn a bunch of things at once. Not just "call an API and show a number" but the full loop: real-time data ingestion, exploratory data analysis, feature engineering, ML training, model explainability, frontend visualisation, and deployment. A weather app turned out to be the perfect vehicle because the data is free, always updating, and everyone has an opinion about whether the forecast is right.

The idea started at NUSS, drinking Ang Moh Liang Teh when the sky opened up and we got completely stuck in the rain. Apple Weather had been showing "Cloudy" all day — no warning, no umbrella, just vibes and regret. NEA's forecasts are actually solid, but the default weather app on our phones didn't quite catch this one. We decided right there that we had to take things into our own hands — if we could predict the rain even a few hours out, we'd know whether to grab an umbrella or when it's safe to head back. Predicting rain in Singapore is genuinely hard, and I wanted to see how far I could push a gradient-boosted model against that chaos.

This project was also a chance to get serious about EDA — not just df.describe() on a toy dataset, but actually understanding 8 years of messy real-world sensor data.

Try it yourself

Head to lionweather.kooexperience.com. Drop a pin anywhere on the Singapore map and the dashboard shows you the current conditions for that spot, the ML rain forecast, and the animated radar overlay.

Your location never leaves the browser. Everything is stored in localStorage. I'm not tracking you; I just want to show you if it's going to rain.

⚠️ Not a replacement for NEA. This is a learning project. The ML model is decent but definitely not perfect. If you need to decide whether to bring an umbrella, check both and use your own judgment.

How it all fits together

LionWeather is a two-service setup on Railway: a FastAPI backend and a React frontend. They talk through a Vite proxy in dev, and Railway handles routing in production.

┌─────────────────────────────────────────────────────┐
│  Backend (FastAPI + Uvicorn)                        │
│                                                     │
│  17 API routers                                     │
│  ├── /api/weather         Current conditions        │
│  ├── /api/forecasts       4-day + hourly            │
│  ├── /api/ml/rain-forecast  ML predictions          │
│  ├── /api/ml/full-analysis  EDA + SHAP + benchmarks │
│  ├── /api/radar           Animated rain imagery     │
│  └── /admin/*             Retrain, export, health   │
│                                                     │
│  APScheduler (background jobs)                      │
│  ├── Every 10 min   → Collect NEA observations      │
│  ├── Every 1 hour   → Collect official forecasts    │
│  ├── Every 2 min    → Fetch radar frames            │
│  └── Sunday 2 AM    → Retrain ML model              │
│                                                     │
│  PostgreSQL (Railway) / SQLite (local)              │
└─────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────┐
│  Frontend (React 18 + Vite + Tailwind)              │
│                                                     │
│  ├── Interactive Leaflet map with pin placement     │
│  ├── Detailed weather cards (feels-like, UV, wind)  │
│  ├── ML forecast comparison panel                   │
│  ├── Animated radar layer                           │
│  ├── Full EDA + SHAP analysis dashboard             │
│  └── Browser notifications on rain start/stop       │
└─────────────────────────────────────────────────────┘

Yes, 17 routers is a lot. The project grew organically as I kept adding features. If I were starting over I'd probably consolidate a few, but honestly each one does one thing and they're easy to find, so I'm at peace with it.

Collecting the data

The backend polls multiple data sources on different intervals: NEA observations every 10 minutes, official forecasts every hour, and radar frames every 2 minutes, plus a model retrain every Sunday at 2 AM (see the scheduler sketch below).
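Wiring that up with APScheduler takes only a few lines. A minimal sketch, with hypothetical job-function names standing in for the real collectors:

# Hedged sketch of the scheduler wiring; job names are illustrative
from apscheduler.schedulers.background import BackgroundScheduler

def collect_nea_observations(): ...    # hypothetical stubs; the real
def collect_official_forecasts(): ...  # collectors live in the backend's
def fetch_radar_frames(): ...          # service modules
def retrain_models(): ...

scheduler = BackgroundScheduler()
scheduler.add_job(collect_nea_observations, "interval", minutes=10)
scheduler.add_job(collect_official_forecasts, "interval", hours=1)
scheduler.add_job(fetch_radar_frames, "interval", minutes=2)
scheduler.add_job(retrain_models, "cron", day_of_week="sun", hour=2)
scheduler.start()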

Everything normalises to a WeatherRecord dataclass before hitting the database. There's bounds checking too: if the temperature comes in at 60 °C or rainfall is negative, something went wrong and the reading gets flagged. Singapore is tropical but not that tropical.

# Validation bounds (Singapore tropical range)
TEMP_MIN, TEMP_MAX = 15.0, 42.0     # °C
HUMIDITY_MIN, HUMIDITY_MAX = 20, 100  # %
WIND_MAX = 60.0                       # km/h
# Plus 3-sigma statistical outlier detection
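Applied to a record, the check might look roughly like this. A sketch, assuming an illustrative WeatherRecord shape (the real dataclass surely has more fields):

from dataclasses import dataclass

@dataclass
class WeatherRecord:               # illustrative subset of fields
    timestamp: str
    location: str
    temperature: float             # °C
    humidity: float                # %
    rainfall: float                # mm
    wind_speed: float              # km/h
    flagged: bool = False

def validate(rec: WeatherRecord) -> WeatherRecord:
    ok = (TEMP_MIN <= rec.temperature <= TEMP_MAX
          and HUMIDITY_MIN <= rec.humidity <= HUMIDITY_MAX
          and rec.rainfall >= 0
          and 0 <= rec.wind_speed <= WIND_MAX)
    rec.flagged = not ok           # keep the row, but mark it suspect
    return rec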

The database uses a unique constraint on (timestamp, country, location) with upsert logic, so duplicate polls are harmless. This matters because sometimes NEA returns the same data twice, and I'd rather handle that at the DB level than add brittle deduplication logic everywhere.
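With SQLAlchemy against Postgres, that upsert might look roughly like this (table and column names assumed):

# Hedged sketch: idempotent upsert keyed on the unique constraint
from sqlalchemy.dialects.postgresql import insert

def upsert_weather(conn, weather_table, record: dict) -> None:
    keys = ("timestamp", "country", "location")
    stmt = insert(weather_table).values(**record)
    stmt = stmt.on_conflict_do_update(
        index_elements=list(keys),
        set_={k: stmt.excluded[k] for k in record if k not in keys},
    )
    conn.execute(stmt)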

Meet the data

Before we get into the heavy statistical analysis, let's look at what the data actually feels like. The charts below are rendered from LionWeather's live analysis API — the same data that powers the ML dashboard.

Start with the hourly pattern. Singapore's rainfall has a very distinctive daily rhythm that anyone living here knows instinctively: mornings are usually dry, and the afternoon hours bring the thunderstorms. The data confirms it.

Hourly patterns

[Interactive chart: Average by Hour of Day · 8 years of hourly observations (2016–2024)]
🔎 What to look for: Rainfall peaks sharply between 3–6 PM — that's Singapore's afternoon convective thunderstorm window. Temperature mirrors this: it climbs through the morning, dips when the rain hits, then slowly cools into the night. Humidity inverts temperature almost perfectly.

Annual rainfall trends

How much does it actually rain each year? And is it getting worse? The annual totals tell a story of variability — some years are significantly wetter than others, driven by El Niño / La Niña cycles and shifting monsoon patterns.

[Interactive chart: Annual Rainfall Totals · total recorded rainfall per year in mm]

EDA: learning what the data actually looks like

If you're new to time series analysis, this section is for you. EDA (Exploratory Data Analysis) is the step where you look at your data before building any model. It sounds obvious, but skipping it is one of the most common reasons ML projects go sideways.

I had 8 years of NEA historical data (2016 to 2024) and I wanted to understand it properly before throwing it at a model. The training script runs a full statistical workup and saves everything to a full_analysis.json that the frontend renders as an interactive dashboard. Here's each technique, what it means, and what it revealed.

Step 1: Descriptive statistics

Start simple. Before any fancy analysis, compute the basics: mean, min, max, standard deviation, and percentile distributions. For LionWeather, this meant looking at annual rainfall totals, percentage of rainy hours, breakdown by intensity category (no rain, light, heavy, thundery), temperature ranges, and humidity.

What I found: In 2017, Singapore recorded 3,198 rainy hours (36.7% of all hours) with 790 thundery events. Mean temperature across years hovered around 27 to 28°C, with max readings touching 33 to 34°C. Average humidity sat around 78 to 81%. These numbers set your baseline expectations before you go deeper.

If you want to learn more about starting with descriptive statistics, this Towards Data Science guide walks through a solid six-step EDA framework for time series.

Step 2: Time-series decomposition (STL)

What is STL? STL stands for Seasonal and Trend decomposition using Loess. It splits a time series into three components: trend (the long-term direction), seasonal (repeating patterns at fixed intervals), and residual (everything left over that the first two can't explain). Think of it like separating a song into bass (trend), melody (seasonal), and noise (residual).

Why it matters: If the seasonal component is strong, your model can exploit those repeating patterns. If the residual is large, your data has a lot of randomness and your model will struggle. STL tells you upfront how predictable your data is.

[Interactive chart: STL Decomposition · monthly data decomposed into trend, seasonal, and residual components]
🔎 What to look for: Look at the seasonal component — the peaks align with the northeast monsoon (Nov–Jan). The trend line reveals whether Singapore is getting wetter or drier over time. The residual shows pure randomness: the larger it is relative to the other components, the harder prediction will be.

What I found: The seasonal component was crystal clear. The northeast monsoon (November to January) brings sustained rain, while the southwest monsoon (May to September) brings shorter, more intense afternoon showers. But the residual was large, meaning rainfall has a lot of unexplained variance. This told me early on: don't expect 90% accuracy. Tropical rain is inherently noisy.

I used the statsmodels STL implementation. For a more detailed walkthrough of decomposition techniques, Sandeep Pawar's forecasting series covers STL alongside other decomposition methods with code examples.
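The core call is only a few lines. A minimal sketch on synthetic monthly data standing in for the real NEA series:

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Synthetic monthly rainfall with a 12-month cycle plus noise
idx = pd.date_range("2016-01-01", periods=108, freq="MS")
rng = np.random.default_rng(0)
m = idx.month.to_numpy()
rain = pd.Series(180 + 60 * np.sin(2 * np.pi * m / 12)
                 + rng.normal(0, 40, 108), index=idx)

res = STL(rain, period=12, robust=True).fit()
print(res.trend.iloc[-1])        # long-term level
print(res.seasonal.iloc[:12])    # the repeating seasonal shape
print(res.resid.std())           # how much is left unexplained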

Step 3: Autocorrelation (ACF and PACF)

What is ACF? Autocorrelation Function (ACF) measures how correlated a time series is with itself at different lag intervals. If the ACF at lag 1 is high, it means the current value is strongly related to the value one time step ago. PACF (Partial Autocorrelation) does the same but removes the indirect effects of intermediate lags, giving you the direct relationship at each lag. Think of it like asking a chain of people how accurately a message reaches person N — ACF measures the total distortion, PACF measures how much each individual person adds.

Why it matters: ACF and PACF tell you which lag features are worth creating. They also help determine the order of traditional time series models (AR, MA, ARMA). Even if you're using gradient boosting like I did, understanding autocorrelation guides your feature engineering.

[Interactive chart: ACF / PACF · autocorrelation and partial autocorrelation with 95% confidence bands]
🔎 What to look for: Notice how rainfall correlation drops to near-zero by lag 6–8. That's the "memory" of Singapore rain — after about 6 hours, the current rain tells you almost nothing about the future. Temperature shows strong periodicity at lag 24 (one full day cycle). Bars exceeding the dashed confidence lines are statistically significant.

What I found: Rainfall's ACF showed significant correlation at lag 1 (0.55) that dropped sharply to 0.20 at lag 2 and faded to near-zero by lag 6 to 8. This means: if it's raining now, it will probably still be raining in an hour (lag 1), maybe in two hours (lag 2), but by six hours the signal is gone. Temperature showed strong 24-hour periodicity, which makes sense since days are warm and nights are cool, even in Singapore.

These results directly informed my lag features: I created rainfall lags at 1h, 3h, 6h, and 24h. The ACF told me that anything beyond 6h for rainfall is mostly noise. For a deeper dive into reading ACF/PACF plots, this Towards Data Science article explains the interpretation with visual examples.
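Computing both with statsmodels is a one-liner each. A sketch on synthetic hourly data with one hour of injected memory:

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
x = rng.gamma(0.3, 2.0, 10_000)   # spiky, mostly-dry stand-in for rainfall
x[1:] = x[1:] + 0.5 * x[:-1]      # inject lag-1 correlation

acf_vals = acf(x, nlags=24)
pacf_vals = pacf(x, nlags=24)
print(acf_vals[1], acf_vals[6])   # strong at lag 1, near zero further out
print(pacf_vals[1])               # the direct lag-1 effect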

Step 4: FFT spectral analysis

What is FFT? Fast Fourier Transform converts your time-domain data into the frequency domain. Instead of asking "what happened at each time step?" you're asking "what cycles exist in this data and how strong are they?" It's like using a prism on sunlight — it separates the signal into its constituent frequencies. FFT is foundational to audio processing, signal analysis, and many areas of engineering.

Why it matters: FFT reveals hidden periodicities that might not be obvious from just plotting the raw data. Annual cycles, weekly patterns, or diurnal (day/night) rhythms all show up as distinct peaks.

[Interactive chart: FFT Power Spectrum · dominant cycles in the data, log scale]
🔎 What to look for: The tallest peak at ~24 hours confirms the diurnal cycle. The secondary peak at ~180 days captures monsoon transitions. Annual peaks (~8,760 hours) reflect the yearly monsoon cycle. Each peak is a "frequency" the model can learn from.

What I found: The diurnal (~24h) cycle was the dominant peak for rainfall, driven by afternoon convective storms. Longer-period peaks corresponding to monsoon transitions also appeared in the spectrum. Toggle to temperature to see an even sharper 24-hour spike. These periodicities validated my decision to include time-of-day encoding and monsoon flags as features.
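The heart of the analysis is a few lines of NumPy. A sketch on a synthetic signal with a planted 24-hour cycle:

import numpy as np

rng = np.random.default_rng(2)
t = np.arange(24 * 365 * 2)                           # two years, hourly
signal = (np.sin(2 * np.pi * t / 24)                  # diurnal cycle
          + 0.3 * np.sin(2 * np.pi * t / (24 * 180))  # monsoon-scale cycle
          + rng.normal(0, 0.5, t.size))

power = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0)       # cycles per hour
peak = freqs[1:][np.argmax(power[1:])]       # skip the DC component
print(f"dominant period = {1 / peak:.1f} hours")   # ~24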

All of this lives in the ML Analysis tab of the app. I wanted it visible, not buried in a Jupyter notebook, because the whole point was to learn how to present EDA findings in a way that's useful to anyone curious. You can explore the charts yourself at lionweather.kooexperience.com.

📚 Want to learn time series EDA from scratch? I recommend starting with this practical guide on Towards Data Science, then working through STL and ACF/PACF on your own dataset. The best way to learn is to pick a dataset you actually care about.

Stationarity: can we even model this?

Before throwing data at a model, there's a fundamental question: is the data stationary? A stationary time series has a constant mean and variance over time. If it's not stationary, many statistical techniques fall apart, and even ML models can be fooled by drifting distributions.

The Augmented Dickey-Fuller (ADF) test checks this. It tests the null hypothesis that the series has a unit root (non-stationary). A very negative test statistic and a small p-value (< 0.05) mean we can reject the null and conclude the data is stationary. Good news: it means the patterns we see are stable enough to learn from.
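Running it with statsmodels looks like this (a sketch on stand-in data; the real check runs over each of the four variables):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
series = rng.gamma(0.3, 2.0, 5_000)   # stand-in for an hourly variable

stat, pvalue, _, _, crit, _ = adfuller(series)
print(f"ADF statistic: {stat:.2f}  p-value: {pvalue:.4f}")
print(f"5% critical value: {crit['5%']:.2f}")
# p < 0.05: reject the unit-root null and treat the series as stationary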

[Interactive table: ADF stationarity test results for each variable]
🔎 What to look for: All four variables should pass the ADF test (p-value < 0.05). The more negative the test statistic compared to critical values, the stronger the evidence for stationarity. This is good — it means the patterns we extracted from 8 years of data should remain valid for future predictions.

Feature engineering

Raw sensor readings aren't enough. The model needs context. Here's what I engineered from the base data:

Temporal features

Why sin/cos encoding? If you feed "hour = 23" and "hour = 0" to a model as raw integers, it thinks they're 23 units apart. But in reality, 11 PM and midnight are one hour apart. By encoding time as sin(2π × hour/24) and cos(2π × hour/24), you preserve the circular nature of time. Same idea for day of year. This is a standard trick in time series ML and it genuinely improves model performance.

I also added monsoon flags: NE monsoon (November to January) and SW monsoon (May to September). These are binary features that tell the model which seasonal regime it's operating in.
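In pandas that's a handful of columns. A sketch; the column names are illustrative, not necessarily LionWeather's exact ones:

import numpy as np
import pandas as pd

def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    ts = df.index                  # assumes an hourly DatetimeIndex
    df["hour_sin"] = np.sin(2 * np.pi * ts.hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * ts.hour / 24)
    df["doy_sin"] = np.sin(2 * np.pi * ts.dayofyear / 365.25)
    df["doy_cos"] = np.cos(2 * np.pi * ts.dayofyear / 365.25)
    df["ne_monsoon"] = ts.month.isin([11, 12, 1]).astype(int)
    df["sw_monsoon"] = ts.month.isin([5, 6, 7, 8, 9]).astype(int)
    return df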

Lag features

What are lag features? A lag feature is simply a past value of a variable used as input. rainfall_lag_1h is "what was the rainfall one hour ago." The ACF analysis told me which lags carry signal (1h, 3h have strong correlation; beyond 6h it fades). So I created lags at 1, 3, 6, and 24 hours.

The data leakage trap: This is the most common mistake in time series ML. If you don't use .shift() correctly, your lag features can accidentally include future data. I learned this the hard way when my first model hit 95% accuracy and I got suspicious. Sure enough, it was peeking into the future. After fixing the alignment, accuracy dropped to ~73%. That's the real number. Painful, but honest.
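The leakage-safe version is short: shift(h) pulls values from h rows back, so row t only ever sees data from t-h and earlier. A sketch, assuming a time-sorted hourly frame:

import pandas as pd

def add_rain_lags(df: pd.DataFrame) -> pd.DataFrame:
    for h in (1, 3, 6, 24):        # lags chosen from the ACF analysis
        df[f"rainfall_lag_{h}h"] = df["rainfall"].shift(h)
    return df.dropna()             # first rows have no history; drop them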

Thunderstorm indicators

Singapore's rain often comes from convective thunderstorms that build up in the afternoon, so I engineered features to catch the warning signs.

Spatial features

Latitude, longitude, distance from CBD (Haversine), and a coastal flag. Coastal stations tend to get different rain patterns due to sea breeze convergence.
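The distance feature is plain Haversine. A sketch (the CBD reference point here is my approximation):

import math

CBD_LAT, CBD_LON = 1.2850, 103.8510    # roughly Raffles Place (assumed)

def distance_from_cbd_km(lat: float, lon: float) -> float:
    r = 6371.0                          # mean Earth radius, km
    p1, p2 = math.radians(lat), math.radians(CBD_LAT)
    dphi = math.radians(CBD_LAT - lat)
    dlam = math.radians(CBD_LON - lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))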

The star feature: dry spell hours

How many consecutive hours without rain before the prediction window. This ended up being the most important feature by LightGBM's built-in gain metric, and ranks in the top 5 by SHAP. Makes intuitive sense: a long dry spell in tropical Singapore often means conditions are building toward a release.
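Computing it is a neat pandas trick: restart a counter every time it rains. A sketch:

import pandas as pd

def dry_spell_hours(rain: pd.Series) -> pd.Series:
    wet = rain > 0
    # each rainy hour starts a new group; count dry hours within each group
    return (~wet).groupby(wet.cumsum()).cumsum()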

Training the model

I went with LightGBM over XGBoost or random forests for a few reasons: it handles categorical features natively, trains fast on large datasets, and has good support for multi-class classification. Also, I'd used XGBoost before and wanted to try something different. Learning was the whole point.

Data split

This matters more than people realise. A random split would leak temporal patterns. I used a strict year-based split (sketched below).

No shuffling. No interleaving. If your lag features reference data from 3 hours ago, and that 3-hour-ago sample is in the test set while you're training, congratulations: you've just built a time machine, not a model.
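In code the split is trivial; the discipline is in never deviating from it. A sketch (the train/validation boundaries below are my assumption; the test year is the held-out 2024 data described later):

import pandas as pd

def year_split(df: pd.DataFrame):
    # assumes a DatetimeIndex covering 2016-2024
    train = df[df.index.year <= 2022]   # assumed boundary
    val = df[df.index.year == 2023]     # assumed boundary
    test = df[df.index.year == 2024]    # held out until the very end
    return train, val, test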

Two types of models

I trained both regression (predict rainfall amount in mm) and classification (predict rain category) models. The classification models use a 3-class scheme for the app's UI: no rain, light rain, and heavy or thundery rain.

And a 4-class scheme for benchmarking against NEA's own forecast categories, which separates heavy rain from thundery showers.

Four prediction horizons

Separate models for 1h, 3h, 6h, and 12h ahead. Each was trained on 48,687 samples, validated on 8,296, and tested on 8,765 (the held-out 2024 data). As expected, accuracy drops with the horizon, from roughly 73% at 1 hour down to about 59% at 12 hours.

Loss curves

Every model's training story is told by its loss curves. These show how the model improved over training rounds, and more importantly, whether it started overfitting (when validation loss starts climbing while training loss keeps falling).

[Interactive chart: Training Loss Curves · classification (multi_logloss), train vs. validation]
🔎 What to look for: The gap between train and validation curves indicates overfitting. A small, stable gap is healthy. If the val curve diverges upward while train keeps dropping, the model is memorising noise. Early stopping (500 rounds with patience) prevents this.

Reading the confusion matrix: the 1-hour model correctly labelled 4,616 "no rain" hours but misclassified 905 light rain samples as "no rain." It's conservative: it would rather tell you it's dry than cry wolf. Recall for heavy rain is lower because those events are rarer. Class imbalance is a real challenge; most hours in Singapore are technically dry.

The 12-hour model is barely better than a coin flip. In Singapore's climate, that's not surprising — a thunderstorm can form and dissipate in 30 minutes. Some things aren't meant to be predicted 12 hours ahead.

Hyperparameters

# LightGBM config (classification)
params = {
    "objective": "multiclass",
    "num_class": 3,
    "learning_rate": 0.05,
    "num_leaves": 63,
    "min_child_samples": 50,
    "feature_fraction": 0.8,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "reg_alpha": 0.1,      # L1
    "reg_lambda": 0.1,     # L2
    "n_estimators": 500,
}

Nothing exotic. Conservative regularisation to avoid overfitting on 8 years of data. The model artifacts range from 90 KB (12h) to 14 MB (1h classification), small enough to commit to git and load at startup.
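For reference, here's roughly how that params dict plugs into training with early stopping. The data below is a synthetic stand-in and the patience value is my assumption:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(2_000, 10)), rng.integers(0, 3, 2_000)
X_val, y_val = rng.normal(size=(500, 10)), rng.integers(0, 3, 500)

train_set = lgb.Dataset(X_train, label=y_train)
val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)

booster = lgb.train(
    {k: v for k, v in params.items() if k != "n_estimators"},
    train_set,
    num_boost_round=params["n_estimators"],
    valid_sets=[val_set],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],   # assumed patience
)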

SHAP: why did you predict that?

What is SHAP? SHAP (SHapley Additive exPlanations) is a method from game theory that explains individual predictions. Instead of just saying "the model predicts heavy rain," SHAP tells you why: "the model predicts heavy rain because humidity is at 95%, there hasn't been rain for 8 hours, and wind just accelerated." Each feature gets a score showing how much it pushed the prediction up or down. It turns a black box into something you can actually reason about.

Why it matters: If your model is wrong, SHAP helps you figure out why. Maybe it's relying too heavily on a leaky feature, or ignoring something important. Without explainability, you're just trusting a number. With SHAP, you can validate whether the model's reasoning makes physical sense.
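Getting these numbers out of a trained LightGBM model takes only a few lines. A self-contained sketch on toy data:

import numpy as np
import lightgbm as lgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 1_000)   # feature 0 should dominate

model = lgb.LGBMRegressor(n_estimators=100).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)   # (n_samples, n_features)
print(np.abs(sv).mean(axis=0))                  # mean |SHAP| per feature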

[Interactive chart: SHAP Feature Importance (1-Hour Regression) · top 15 features by mean |SHAP value|; higher means more influence]
🔎 What to look for: The top features should make physical sense. Spatial rainfall variation (rain_spatial_std) dominates because localised showers strongly predict nearby rain. Pressure features rank surprisingly low — in the tropics, spatial patterns trump barometric signals.

Here are the top features from LionWeather's 1-hour regression model, ranked by mean absolute SHAP value (higher means the feature has more influence on predictions):

  1. rain_spatial_std (6.65) – how unevenly rain is distributed across stations. High variance means localised showers, which strongly predicts nearby rainfall.
  2. rain_max_station (3.43) – the heaviest rainfall at any single station. If one station is getting hammered, neighbours are next.
  3. rain_west (3.08) – rainfall in western Singapore. Weather systems often approach from the west, so this acts as an early warning.
  4. rain_region_max (1.96) – peak rainfall across all regions.
  5. dry_spell_hours (1.76) – how long since the last rain. A long dry spell in tropical Singapore often means conditions are building for a release.

What surprised me: pressure features ranked much lower than I expected. In temperate climates, a falling barometer is the go-to rain predictor. In the tropics, spatial rainfall patterns and humidity dynamics matter more. Singapore's pressure variance is tiny compared to, say, London's. The data showed me this; I wouldn't have guessed it.

The monsoon flags also appear in the importance rankings, suggesting the model distinguishes between monsoon regimes. Combined with the strong showing of spatial features (rain_west, rain_spatial_std), this indicates the model picked up on the directional nature of Singapore's weather systems without me explicitly encoding it.

For a deeper introduction to SHAP in the context of time series, this Towards Data Science article on SHAP for time series covers the key concepts and pitfalls.

Results: confusion matrices and benchmarks

Numbers are nice, but a confusion matrix tells the full story. It shows not just overall accuracy, but what kinds of mistakes the model makes. Think of it like a fire alarm — high recall catches every fire but may give false alarms (annoying but safe), while high precision means every alarm is real but you might miss some fires (dangerous).

3-Class confusion matrix

[Interactive chart: Confusion Matrix (3-Class) · No Rain / Light Rain / Heavy + Thundery]
🔎 What to look for: Strong values along the diagonal mean correct predictions. Off-diagonal values are errors. Notice how the model gets more "confused" at longer horizons — the diagonal weakens and off-diagonal spreads. The model is conservative: it rarely predicts heavy rain when there's none, but sometimes misses light rain.

NEA benchmark: ML vs. the national forecast

The real test: how does a gradient-boosted model compare to Singapore's official weather agency? NEA uses numerical weather prediction (NWP) with satellite data, doppler radar, and human meteorologists. My model uses 8 years of historical sensor data and some clever features. I also tested a 60/40 ensemble blend of ML and NEA.

[Interactive chart: NEA vs. ML vs. Ensemble Accuracy · 3-class accuracy comparison on held-out test period]
🔎 What to look for: If the ensemble bar is taller than either individual model, that means the two approaches complement each other — they make different kinds of mistakes. An ensemble that beats both components is a strong signal that the models capture different aspects of the problem.

The radar overlay

This was the "fun feature that turned out to be annoying to build" part. NEA publishes radar imagery on weather.gov.sg, but there's no official API. The backend fetches the images directly from their CDN, caches them with a 120-second TTL, and serves them to the frontend.

The Leaflet map overlays these frames as an animation, so you can watch rain systems move across the island. The bounds are hardcoded to Singapore's coordinates: 1.155°N to 1.475°N, 103.565°E to 104.130°E. There's a 500ms throttle between image fetches to be polite to their server.
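The caching layer is simple. A sketch of the idea, using the TTL and throttle values above (everything else is assumed):

import time
import requests

_cache: dict[str, tuple[float, bytes]] = {}
_last_fetch = 0.0
TTL_SECONDS = 120.0
THROTTLE_SECONDS = 0.5

def get_radar_frame(url: str) -> bytes:
    global _last_fetch
    now = time.monotonic()
    cached = _cache.get(url)
    if cached and now - cached[0] < TTL_SECONDS:
        return cached[1]                  # still fresh, serve from cache
    wait = THROTTLE_SECONDS - (now - _last_fetch)
    if wait > 0:
        time.sleep(wait)                  # stay polite to the upstream CDN
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    _last_fetch = time.monotonic()
    _cache[url] = (_last_fetch, resp.content)
    return resp.content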

It's one of those features that took a disproportionate amount of time but makes the app feel alive. Watching a rain blob drift toward your location on the map is weirdly satisfying, even if it means your laundry is about to get wet.

Frontend: making it feel like a real app

The frontend is React 18 with Vite, Tailwind, Leaflet for maps, and Recharts for the EDA visualisations. A few things I'm proud of:

NEA area snapping

When you drop a pin, it snaps to the nearest official NEA neighborhood (Ang Mo Kio, Tampines, etc.) via lat/lon distance matching. This means the weather readings actually correspond to a real station, not some random point in a park.
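The snapping itself is nearest-neighbour over the area centroids. The real app does this in the browser; here's the idea in Python, with illustrative coordinates:

AREAS = {
    "Ang Mo Kio": (1.3691, 103.8454),   # illustrative coordinates
    "Tampines": (1.3496, 103.9568),
    "Jurong West": (1.3404, 103.7090),
}

def nearest_area(lat: float, lon: float) -> str:
    # squared planar distance is fine at Singapore's scale
    return min(AREAS, key=lambda name: (AREAS[name][0] - lat) ** 2
                                       + (AREAS[name][1] - lon) ** 2)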

Sun and moon arc

Calculated client-side using SunCalc. No API needed. The card shows a live arc with the sun's current position, golden hour, and after sunset flips to show the moon and tomorrow's sunrise. A small detail, but it makes the dashboard feel polished.

Rain notifications

The app sends browser notifications when rain starts or stops at your saved locations. Crucially, it only fires on state transitions with a 1-hour cooldown. Nobody wants 50 notifications during a patchy drizzle.
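The transition-plus-cooldown logic boils down to a tiny state machine. The real version runs in the browser; a Python sketch of the idea:

import time

class RainNotifier:
    def __init__(self, cooldown_s: float = 3600.0):
        self.was_raining: bool | None = None   # None = no reading yet
        self.last_sent = 0.0
        self.cooldown_s = cooldown_s

    def update(self, is_raining: bool) -> str | None:
        msg = None
        changed = (self.was_raining is not None
                   and is_raining != self.was_raining)
        if changed and time.time() - self.last_sent >= self.cooldown_s:
            msg = "Rain started" if is_raining else "Rain stopped"
            self.last_sent = time.time()
        self.was_raining = is_raining
        return msg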

The ML analysis dashboard

This is the EDA nerd section. It renders the entire full_analysis.json as interactive charts: annual rainfall breakdown, STL decomposition, ACF/PACF with confidence intervals, FFT power spectra, SHAP waterfall plots, confusion matrices by horizon, precision/recall/F2 scores, and sortable NEA benchmark tables.

Is it too much? Probably. But this was the whole point of the project: to take EDA from a notebook and put it in front of anyone who's curious. Not everyone cares about SHAP waterfall plots, but everyone wants to know if they should bring an umbrella. The app works for both audiences.

Deployment on Railway

Two services on Railway: the backend on a Python buildpack, the frontend served through Vite's preview mode.

Lessons learned

This project taught me more than any course I've taken. Here are the big ones:

EDA is not optional, it's the job. Before I did proper EDA, my first model was garbage. After spending a week on decomposition, autocorrelation, and feature analysis, the second model was meaningfully better. Not because I used a fancier algorithm, but because I understood the data.

Data leakage is sneaky and will flatter your metrics. My first model hit 95% accuracy. I was thrilled for about ten minutes, then realised my lag features were leaking future data. After fixing the split and using proper .shift() alignment, accuracy dropped to ~73%. That's the real number. It hurt, but it's honest.

Tropical weather is humbling. Convective thunderstorms can form in 15 minutes and vanish in 30. A model trained on 8 years of data still only hits ~59% at the 12-hour horizon. To be fair, if Singapore's weather were predictable, we wouldn't all carry umbrellas in our bags 365 days a year. Sometimes the best answer is "I don't know, check the radar."

SHAP makes you a better engineer. When you can see that your model relies on dry_spell_hours more than pressure, you start questioning your assumptions. The features that "should" matter don't always match the features that do. Let the data tell you.

Ship the messy version. LionWeather started as a single-page app with three API calls. Now it has 17 routers, 4 prediction horizons, animated radar, and a full EDA dashboard. None of that would exist if I'd waited until it was "ready." I shipped early, showed friends, got feedback, and iterated. That's the only way to build something real.

If you made it this far, go open the app and see what Singapore's weather is doing right now. And if the model says "no rain" and it's pouring outside your window, well, welcome to Singapore. The weather here has been ignoring forecasts since before machine learning existed. At least my wrong predictions come with SHAP explanations.

