Weather forecasting

March, 2019

Edit from the future, December 2025: When I thought up this project, I had no idea that around this time, some of the bigger weather forecasting services would actually start doing exactly this. This type of consensus modeling is now nearly mainstream in weather forecasting, so what follows is a naive investigation into something that, unbeknownst to me, would indeed become commonplace.

Can I make a more accurate temperature forecasting model than the leading sources, by just combining them together in different ways?

This project provided a proof-of-concept that says, yes! (potentially). That said, it wasn’t especially scientific, the data is somewhat biased, and I only tested for accuracy with regard to daily high/low temperatures. Even so, it may pave the way for something interesting in the future…

I tracked ~5 weeks of 7-day high/low temperature forecasts from 6 major weather forecasting services: AccuWeather, DarkSky, the National Weather Service (NWS), OpenWeatherMap, Yahoo Weather, and MyWeather2. I also retroactively recorded each day’s actual, observed high and low temperatures.

I calculated the signed and absolute differences between each source's prediction and the observed temperatures. I also trained a few basic linear regression models on part of the data and recorded their error in testing, but this was mainly just to have another reference point; the main "model" I was curious about was a simple average of all sources' forecasts.
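The bookkeeping here is straightforward, so a minimal sketch may help. The source names below match the post, but the forecast and observed temperatures are made up for illustration; the real data came from the ~5 weeks of tracked forecasts.

```python
# Hypothetical forecast highs (°F) per source over three days, plus the
# observed highs for those days. Real data would be ~5 weeks of 7-day forecasts.
forecast_highs = {
    "Yahoo": [58, 41, 37],
    "DarkSky": [56, 43, 36],
    "NWS": [57, 42, 38],
}
observed_highs = [61, 40, 35]

def mean_abs_error(predictions, observations):
    """Mean absolute error between a forecast series and the observations."""
    return sum(abs(p - o) for p, o in zip(predictions, observations)) / len(observations)

# Score each source individually.
per_source = {name: mean_abs_error(preds, observed_highs)
              for name, preds in forecast_highs.items()}

# The "consensus" model: average the sources' forecasts day by day,
# then score that averaged series exactly the same way.
consensus = [sum(day) / len(day) for day in zip(*forecast_highs.values())]
consensus_error = mean_abs_error(consensus, observed_highs)
```

Scoring the averaged series with the same metric as the individual sources keeps the comparison apples-to-apples.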

The results

Out of the 6 original sources, Yahoo surprisingly had the most accurate weekly high/low temperature predictions. OpenWeatherMap was pretty bad, especially when there were large deviations from the average temperature, like on Feb 4, when the observed temperature got up to 61°F but they'd put the high at 39°F. The top-performing model overall, based on absolute error, was one of my simple averages: an average of Yahoo's, DarkSky's, and the National Weather Service's temperature forecasts!

This may not be the most surprising outcome, given that I'm selectively averaging the best-performing sources and ranking by lowest absolute error. But it does suggest that averaging several good weather sources can produce a prediction better than any single one of them, not just one that matches their average performance. I should also note that my average performed best only over this small sample: 5 weeks between February and March in Somerville, Massachusetts, so it is by no means conclusive. But it is potentially a proof-of-concept for something larger. Simplicity wins in this case, but what if I ran the same test for 50 major cities across the US, for a whole year? The regression models (and the few others I played around with) would likely perform much better with that increase in training data, being able to account for differences in source accuracy by region and time of year, separate models for high vs. low temperatures, and so on. Many of these weather services have APIs you can easily plug into; I wrote Python scripts against several of their endpoints to test this out.
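The regression idea above can be sketched as a least-squares fit where each day is a row, each source's forecast is a feature, and the observed temperature is the target. The post doesn't say which library was used, so the NumPy `lstsq` call and all the numbers below are illustrative assumptions, not the original implementation.

```python
import numpy as np

# Each row is one day; columns are (hypothetical) forecast highs from
# three sources, e.g. Yahoo, DarkSky, NWS. y holds the observed highs.
X = np.array([
    [58, 56, 57],
    [41, 43, 42],
    [37, 36, 38],
    [49, 50, 48],
    [52, 51, 53],
], dtype=float)
y = np.array([61, 40, 35, 49, 54], dtype=float)

# Add an intercept column and solve the least-squares problem directly,
# giving one learned weight per source plus a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(day_forecasts):
    """Blend one day's source forecasts using the fitted weights."""
    return float(np.dot(np.append(np.asarray(day_forecasts, float), 1.0), coef))
```

Because "use one source alone" is itself a linear combination, the fit can never do worse on its training data than the best single source; the interesting question, which a larger dataset would answer, is whether that edge survives on held-out days.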

And what about doing the same for forecasted precipitation? Cloud cover? With enough time and resources, I believe it could be possible to create a model that predicts basic weather patterns slightly better than the leading sources, by simply selectively combining their own predictions.

I’m still impressed with Yahoo, though.
