Accuracy of GRIB weather data


It would be very helpful to me to have some indication of what the met office thinks the accuracy/probability is.

First, the GFS, the usual source of GRIBs, is not the Met Office; it is NOAA. The reliability factor, or whatever, will vary with location and length of the forecast period. That is feasible in principle but impossible in practice for too many reasons to rehearse here.

My only advice, if you are wanting some indication of confidence over the next few days, is to look at D + 8, D + 7, D + 6, D + 5 on successive days. If they are consistent, there is a good chance that the next 5 days will be well predicted. If they differ greatly, the next few days will be uncertain. All, of course, in general terms.

Forecasts for the next couple of days will generally be good – in general terms. Detail of typical size less than about 100 km is unlikely to be good. Detailed short-period forecasts, eg on the Met Office app, may be useful for rain and other weather parameters for much of the next 24 hours or so. Wind detail will always be problematical. Use GRIBs, meso-scale forecasts – Met Office and HIRLAM particularly, as they are free – GMDSS texts, PLUS your own nous. The last named is of fundamental importance when it comes to interpretation of any forecast in the short term and for a small area.
 
First, the GFS, the usual source of GRIBs, is not the Met Office; it is NOAA.

Well, first, I did not mention GRIBs or GFS. And second, I am American, and for me NOAA IS the met office.

The reliability factor, or whatever, will vary with location and length of the forecast period. That is feasible in principle but impossible in practice for too many reasons to rehearse here.

Incorrect. If we are talking about GRIBs . . . they could run with various perturbations in the initial conditions and in the model coefficients (a 'super ensemble') and, in each GRIB square for each period, provide some indication of the variance. They could also do exactly as you suggest below - and indicate in each GRIB square at each time how stable the data has been over the past several model runs. Those two approaches are both very possible/practical to do, and would provide useful information. If we are not talking about GRIBs but about human-assisted forecasts, the forecaster could again put in a confidence indication.

My only advice, if you are wanting some indication of confidence over the next few days, is to look at D + 8, D + 7, D + 6, D + 5 on successive days. If they are consistent, there is a good chance that the next 5 days will be well predicted. If they differ greatly, the next few days will be uncertain. All, of course, in general terms.

Yes, of course, and also looking at several different models gives some further indication of 'confidence'.

 

Agree that an ensemble can be and is run, but the area where the perturbations are introduced would be different for different areas of interest. You would have to run several ensembles.

Then, you might well get different probabilities for different lengths of forecast. Starting from T=0, you would have a different probability for day 8 than for an earlier day.

All that makes it possible in principle but nigh impossible in practice just now.

PS. I have never heard an American refer to NOAA as the Met Office, but I take your point.
 

All that makes it possible in principle but nigh impossible in practice just now.

I really don't understand/agree at all that this would be 'nigh impossible'. Rather, it seems technically simple, with the question being more whether NOAA thinks an analytic confidence measure is worth the incremental cost.

There are three ways you could do it, in increasing difficulty, but none at all 'impossible'.

Say we have a global model. Let's consider just one single grid square, and one single output for that square (let's say MSLP), for 40 forward time periods (T0, T+3hr, ..., T+120hrs).

1. Now for the first, simplest confidence measure, let's simply look at the past 40 model runs' output for that square. For the current T0 output you have 40 prior forecasts, for the current T+3 you have 39 prior forecasts, etc. Calculate a weighted variance (weighting the more recent ones more highly) of those 'similar-time' matched outputs. That would give you a 'confidence' measure for that grid square for each of the forward time periods, as a new GRIB data layer. In the user interface it could be displayed as a color - blue = low variance, red = high variance. Easy as pie to do, with all the data right there in one place. You just need to run a lot of arithmetic on a lot of data, something supercomputers are very good at, and it could be run 'off line' from the model.

2. Slightly more complex, but still very doable, would be an 'ensemble variance'. You would take your global initialization data and create (say) 40 different versions with small random perturbations to each and every cell (or perhaps to random cells, or perhaps to 'key' cells), take your key model coefficient matrix and create 40 different versions with small random perturbations to each coefficient, and then make 40 model runs, randomly matching initializations with coefficient matrices. You now have 40 forecast outputs for each single cell for each time period. Again, calculate a variance for that cell for each time period and you have a confidence measure that could be a new GRIB layer. Again, all the data is right there in one place, but this technique requires full model runs (not just some simple arithmetic as in #1 above), so more supercomputer run time.

3. Finally, take the output from, say, the 10 best world models and re-grid them to a common grid (that of the model you want to publish the confidence layer on), and you now have 10 data points for each grid square. Again, do some sort of variance. This is more difficult for human reasons - because the data is not all in the same place or controlled by the same organizations, so you would need to negotiate. But computationally it is not much more difficult than #1 above.

And you could do some sort of combination of the above as a 'super confidence metric'.
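To make the three options concrete, here are rough toy sketches in Python. Everything in them - array shapes, member counts, grid spacings, the little stand-in 'model', the function names - is invented purely for illustration; none of it is NOAA's (or anyone else's) actual code or data.

For #1, the lag-matched weighted variance for a single grid square could look something like this (assuming, as in the description above, one archived run per output step):

```python
import numpy as np

# Toy sketch of option 1: a per-lead-time "confidence" number for ONE grid
# square, from the spread of archived runs that verify at the same time.
# Assumes, as in the text above, one archived run per output step, so the
# run issued r cycles ago verifies at our lead t at its own lead t + r.
# runs[r, t] = MSLP forecast from archived run r (0 = most recent) at lead t.

def lag_weighted_variance(runs, decay=0.9):
    n_runs, n_leads = runs.shape
    confidence = np.zeros(n_leads)
    for t in range(n_leads):
        vals, wts = [], []
        for r in range(n_runs):
            if t + r < n_leads:                 # older runs run out of range
                vals.append(runs[r, t + r])
                wts.append(decay ** r)          # weight recent runs more highly
        vals, wts = np.array(vals), np.array(wts)
        mean = np.average(vals, weights=wts)
        confidence[t] = np.average((vals - mean) ** 2, weights=wts)
    return confidence                           # low = stable (blue), high = jumpy (red)

# fabricated archive: 40 runs x 40 lead times of noisy MSLP
rng = np.random.default_rng(0)
runs = 1013.0 + 0.5 * rng.normal(size=(40, 40)).cumsum(axis=1)
print(lag_weighted_variance(runs).round(2))
```

For #2, the bookkeeping is much the same once you have the perturbed members; the 'model step' below is only a placeholder where a real NWP integration would go:

```python
import numpy as np

# Toy sketch of option 2 (ensemble variance).  toy_model_step is only a
# stand-in for a real NWP time step -- a bit of smoothing plus a shift,
# purely so the example runs end to end.

def toy_model_step(field):
    smoothed = 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0) +
                       np.roll(field, 1, 1) + np.roll(field, -1, 1))
    return np.roll(smoothed, 1, axis=1)          # crude "advection"

def ensemble_spread(analysis, n_members=40, n_steps=40, eps=0.1, seed=0):
    """Perturb the analysis, step every member forward, return per-cell variance."""
    rng = np.random.default_rng(seed)
    members = [analysis + rng.normal(0.0, eps, analysis.shape)
               for _ in range(n_members)]
    layers = []
    for _ in range(n_steps):
        members = [toy_model_step(m) for m in members]
        layers.append(np.var(np.stack(members), axis=0))   # the confidence layer
    return np.stack(layers)                      # shape (n_steps, ny, nx)

analysis = 1013.0 + np.random.default_rng(1).normal(0.0, 2.0, (20, 20))
print(ensemble_spread(analysis).shape)           # (40, 20, 20)
```

And for #3, the only extra ingredient is putting each centre's field onto a common grid before taking the variance; plain bilinear interpolation stands in here for proper regridding:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Toy sketch of option 3: re-grid several models' fields onto one common
# grid and use the cross-model variance as the confidence layer.  The
# "models" here are just random fields on different (invented) grid spacings.

def regrid(lats, lons, field, t_lats, t_lons):
    interp = RegularGridInterpolator((lats, lons), field)   # bilinear by default
    glat, glon = np.meshgrid(t_lats, t_lons, indexing="ij")
    pts = np.stack([glat.ravel(), glon.ravel()], axis=-1)
    return interp(pts).reshape(glat.shape)

def multi_model_spread(models, t_lats, t_lons):
    """models: list of (lats, lons, field) tuples from the different centres."""
    common = [regrid(la, lo, f, t_lats, t_lons) for la, lo, f in models]
    return np.var(np.stack(common), axis=0)       # per-cell disagreement

rng = np.random.default_rng(2)
def fake_model(step):                             # stand-in for one centre's output
    lats = np.arange(40.0, 50.0 + 1e-6, step)
    lons = np.arange(-10.0, 0.0 + 1e-6, step)
    return lats, lons, 1013.0 + rng.normal(0.0, 1.0, (lats.size, lons.size))

models = [fake_model(s) for s in (0.25, 0.5, 1.0)]
t_lats = np.arange(40.0, 50.0 + 1e-6, 0.5)
t_lons = np.arange(-10.0, 0.0 + 1e-6, 0.5)
print(multi_model_spread(models, t_lats, t_lons).shape)     # (21, 21)
```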

NOAA could easily do #1 & #2 above anytime they considered it a priority. The programming would take no time at all (say a week for #1 and a month for #2), but there would be incremental supercomputer time each and every time it was run. The question is whether analytic knowledge of the confidence level is worth that incremental model run cost. I personally would find the data valuable, but I don't know what the incremental cost level is.

The third one would require some level of coordination between the different organizations controlling the major models. Some of them are already quite friendly, and if they already have 'data feeds' between each other it would not be so difficult; if not, they would have to be negotiated and built (which could take years).
 
I really don't understand/agree at all that this would be 'nigh impossible'. Rather, it seems technically simple, with the question being more whether NOAA thinks an analytic confidence measure is worth the incremental cost.

These models are extremely complex and computer intensive. They run on state of the art supercomputers. Yes, running them multiple times sounds easy - but doing it in time to be useful would overwhelm the resources available.

It wouldn't be an "incremental cost" - it would be a straight multiplier for each model run. And to make the results statistically valid (Monte-Carlo simulation style) would require an enormous number of runs, to ensure that global perturbations were taken into account.

One thing that hasn't been mentioned here is that the process of ingesting new data is incredibly complex; it isn't as simple as "change the value here for the one measured". You have to take account of the fact that you've got measurements at spot locations and extrapolate them over entire grid cells; that is not an easy thing to do. How do you weight the values you've measured at spot locations so they represent the whole grid square? It's the inverse of the reason why the forecast is right over long distances but wrong locally.
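(Just to picture what "weighting spot values over a grid square" even means in the crudest possible form, here is a toy inverse-distance sketch in Python. Real assimilation schemes - 3D/4D-Var, ensemble Kalman filters - are vastly more sophisticated than this; every number and name below is invented.)

```python
import numpy as np

# Crude toy of turning spot observations into grid-cell values with
# inverse-distance weights.  Real assimilation is far more involved; this
# only illustrates the "how do you weight spot values over a grid square"
# question, with invented numbers throughout.

def spot_obs_to_grid(obs_xy, obs_vals, grid_x, grid_y, background, radius=150.0):
    """Nudge a background field toward spot obs, weights tapering to 0 at radius (km)."""
    gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
    field = background.copy()
    for (ox, oy), val in zip(obs_xy, obs_vals):
        d = np.hypot(gx - ox, gy - oy)                 # distance to the observation
        w = np.clip(1.0 - d / radius, 0.0, None)       # zero beyond the radius
        field = field + w * (val - background)         # nudge toward the obs value
    return field

# three fabricated pressure obs nudging a uniform 1013 hPa background
grid_x = grid_y = np.arange(0.0, 500.0, 25.0)          # km
background = np.full((grid_x.size, grid_y.size), 1013.0)
obs_xy = [(100.0, 120.0), (300.0, 320.0), (410.0, 80.0)]
obs_vals = [1009.0, 1016.0, 1011.0]
field = spot_obs_to_grid(obs_xy, obs_vals, grid_x, grid_y, background)
print(round(field.min(), 1), round(field.max(), 1))    # pulled below/above 1013 near the obs
```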

The current ensembles only run a few times (I've seen 5 or 7), with limited variations suggested by expert knowledge in the initial parameters. Getting a global confidence parameter would be extremely costly.

I've pointed out that GRIB could carry the information - that's simply a matter of data representation, and trivial. Obtaining a reliable parameter is another matter entirely, and we are probably a few generations of super-computers away from being able to do it.
 
I really don't understand/agree at all that this would be 'nigh impossible'. Rather, it seems technically simple, with the question being more whether NOAA thinks an analytic confidence measure is worth the incremental cost...


If it was as easy as you suggest, I guess that somebody would have done so. Quick thoughts.

Option 1 assumes that the error distribution would be meaningful. I see no reason why that should be the case.

Option 2 overlooks my point that you would have to run, say, one ensemble for W Europe, another for the E US seaboard, one for the W seaboard, etc.

Option 3 is the most sensible, but are there enough models? GFS, NOGAPS, CMC, UK Met, JMA, ECMWF. Is that a big enough sample?

But none is really a starter because none would meet national needs. Who would pay for running these various options?

PS. At present, the UK (I cannot speak for NOAA) can only run the global model with a 25 km grid. That stretches their computer to its limits. Their next target will be to get more computer power so that they can get a more detailed analysis, so reducing the scope for analysis errors/uncertainties.
 

One thing that hasn't been mentioned here is that the process of ingesting new data is incredibly complex; it isn't as simple as "change the value here for the one measured". You have to take account of the fact that you've got measurements at spot locations and extrapolate them over entire grid cells; that is not an easy thing to do. How do you weight the values you've measured at spot locations so they represent the whole grid square? It's the inverse of the reason why the forecast is right over long distances but wrong locally.

The current ensembles only run a few times (I've seen 5 or 7), with limited variations suggested by expert knowledge in the initial parameters. Getting a global confidence parameter would be extremely costly...

A couple of minor corrections.

The UK runs its ensembles 24 times to get a useful sample. These are “degraded,” insofar as, for example, the grid is 60 km as opposed to 25 km. See http://www.metoffice.gov.uk/research/modelling-systems/unified-model/weather-forecasting.

But, you are quite correct; the models do require enormous computer resources. The most powerful computers are puny compared to the atmosphere.

PS I forgot to say that they run a 24-member ensemble.
 
...The UK runs its ensembles 24 times to get a useful sample. These are “degraded,” insofar as, for example, the grid is 60 km as opposed to 25 km...

Frank - it is great that you join in these threads as it increases all our understanding of this complex subject.

I have done some modelling and analysis in the past, and one thing that always stood out was not so much the centre of the error bars but, much more, what those results outside the error bars could portend. Effectively they revealed a lot of the areas of risk, in both size and frequency. Thus, looking at the average forecast from the model, I am always wondering what the models have been suggesting outside the error bars.
 
A couple of minor corrections.

The UK runs its ensembles 24 times to get a useful sample. These are “degraded,” insofar as, for example, the grid is 60 km as opposed to 25 km. See http://www.metoffice.gov.uk/research/modelling-systems/unified-model/weather-forecasting.

But, you are quite correct; the models do require enormous computer resources. The most powerful computers are puny compared to the atmosphere.

PS I forgot to say that they run a 24-member ensemble.

Thanks, Frank. I'd still call 24 runs "a few" with respect to the number of degrees of freedom, I think :) As you say, the atmosphere is a better analogue computer than any digital machine yet built :D
 
If you have, for example, a strong NW airstream behind a cold front, the average gradient might correspond to, say, 30 kn. Within that there will be areas and times when the wind is above or below 30 kn. The computer model cannot resolve these to better than about 130 to 150 km size.

I think I'm 100% clear on the significance of the resolution.

The bit I don't get is why the real wind is typically one force over the model above 15 knots, rather than sometimes higher and sometimes lower, and why it isn't so throughout the wind strengths.

You even say above that the different areas can be above or below.




http://www.ybw.com/forums/archive/index.php/t-281075.html
 
Frank - it is great that you join in these threads as it increases all our understanding of this complex subject.

I have done some modelling and analysis in the past, and one thing that always stood out was not so much the centre of the error bars but, much more, what those results outside the error bars could portend. Effectively they revealed a lot of the areas of risk, in both size and frequency. Thus, looking at the average forecast from the model, I am always wondering what the models have been suggesting outside the error bars.

I think that you may not understand the nature of NWP models. These have nothing in common with socio-economic or other such models that rely upon empirical or statistical relationships.

NWP is an initial value problem in physics. They start with the best analysis possible, then apply the laws of physics - basic Newtonian type - and use forward integration, stepwise, in steps of 15 minutes or thereabouts. The outcome depends on the analysis. There is no statistical history built in.
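For the programmers reading along, the "initial value problem stepped forward in roughly 15-minute increments" idea looks, in toy form, something like the one-dimensional sketch below. The real governing equations are of course far richer than simple advection; this only shows the shape of the calculation, with invented numbers.

```python
import numpy as np

# Toy illustration of NWP as an initial value problem: start from an
# analysed state and march it forward with the governing equation.  Here
# the "physics" is just 1-D advection (du/dt = -c du/dx) on a periodic
# domain, solved with a first-order upwind scheme; a real model solves a
# far richer set of equations, but the structure is the same.

def integrate_forward(initial_state, c=10.0, dx=25_000.0, dt=900.0, n_steps=96):
    """March the analysed state forward in 900 s (15 min) steps for 24 hours."""
    u = initial_state.copy()
    states = [u.copy()]
    for _ in range(n_steps):
        u = u - c * dt / dx * (u - np.roll(u, 1))   # upwind finite difference
        states.append(u.copy())
    return np.array(states)

# analysed initial state: a single pressure-like bump on a 100-point grid
x = np.arange(100)
initial = 1013.0 + 5.0 * np.exp(-((x - 30) ** 2) / 50.0)
history = integrate_forward(initial)
print(history.shape)        # (97, 100): the whole forecast follows from `initial`
```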
 
I think I'm 100% clear on the significance of the resolution.

The bit I don't get is why the real wind is typically one force over the model above 15 knots, rather than sometimes higher and sometimes lower, and why it isn't so throughout the wind strengths.

You even say above that the different areas can be above or below.




http://www.ybw.com/forums/archive/index.php/t-281075.html

Effectively it is a smoothing effect. You smooth out the peaks and the valleys.
 
My last comment on the topic, since we are obviously not going to agree on this.

If it was as easy as you suggest, I guess that somebody would have done so. Quick thoughts.

Option 1 assumes that the error distribution would be meaningful. I see no reason why that should be the case.

What?! If the model has predicted the exact same MSLP for grid xx/yy for all of the last 40 model runs, I think we can be quite confident, certainly more so than if it had been predicting very different MSLP over each of the last 10 model runs. You yourself already agreed that this exact process was useful to do manually. Why would it not be useful to do globally, automatically?

Option 2 overlooks my point that you would have to run, say, one ensemble for W Europe, another for the E US seaboard, one for the W seaboard, etc.

Why? Why not do a global ensemble? All we are trying to do is spot the areas/features that are more sensitive to perturbations. You might be able to make the perturbations in a smarter or more accurate way by area, but a global ensemble would provide useful data.

Option 3 is the most sensible, but are there enough models? GFS, NOGAPS, CMC, UK Met, JMA, ECMWF. Is that a big enough sample?

Yes, of course it is. Again, we already know it provides useful data when we do the comparison manually. Why would it be less so to do it automatically/globally? It certainly was useful to see the spaghetti map of Hurricane Sandy. BTW, you could easily incorporate regional GRIB models into this option.

But none is really a starter because none would meet national needs. Who would pay for running these various options?

Well, we can at least agree that that is the key question. I personally think that this confidence data would be valuable to users. The answer to who would pay is ultimately the US Defense Department (and thus the US taxpayer).

PS. At present, the UK (I cannot speak for NOAA) can only run the global model with a 25 km grid. That stretches their computer to its limits. Their next target will be to get more computer power so that they can get a more detailed analysis, so reducing the scope for analysis errors/uncertainties.

Yes, understood, but do realize that my approaches #1 & #3 would NOT require huge amounts of computer resources. Option #2 would, except that they already run ensemble analyses and calculate the mean; it would again NOT add vast resources to also calculate a weighted variance.

It wouldn't be an "incremental cost" - it would be a straight multiplier for each model run. And to make the results statistically valid (Monte-Carlo simulation style) would require an enormous number of runs, to ensure that global perturbations were taken into account.

Not for my options #1 & #3; those do NOT require extra model runs. And my option #2 might also not, if one can simply use the ensembles that are already being run.

One thing that hasn't been mentioned here is that the process of ingesting new data is incredibly complex; it isn't as simple as "change the value here for the one measured". You have to take account of the fact that you've got measurements at spot locations and extrapolate them over entire grid cells; that is not an easy thing to do. How do you weight the values you've measured at spot locations so they represent the whole grid square? It's the inverse of the reason why the forecast is right over long distances but wrong locally.

Well, they have obviously already solved this problem in the ensembles they already run. And to do further perturbations you can vary the grid-level data directly, rather than the spot-location data.

we are probably a few generations of super-computers away from being able to do it.

Again, not for my options #1 & #3; perhaps, or perhaps not, for #2.
 
My last comment on the topic, since we are obviously not going to agree on this.

No, I doubt that we can agree.

Option 1. What I tried to say is that I do not see why what happened over the past 40 days would be a useful statistical predictor for today.

Option 2. I must make the point again that forecast errors over western Europe will be related to errors in analysis over one area. The UK will run its ensembles using perturbations in that area. NOAA will use perturbations in other areas. I cannot guess at the effects of perturbing the whole system.


Option 3 is certainly a way of pointing at greater or lesser uncertainty, in the same way as looking at forecasts on successive days and looking for (in)consistencies. Using regional models would not increase the sample size, as they are nested in global models, taking in data around their boundaries from the global model, and using the same data and the same or very similar assimilation on a finer scale. I understand that the UK is discontinuing the NAE because it tells them little if anything different from the global model.
 
I think I'm 100% clear on the significance of the resolution.
The bit I don't get is why the real wind is typically one force over the model above 15 knots, rather than sometimes higher and sometimes lower, and why it isn't so throughout the wind strengths.
You even say above that the different areas can be above or below.

Effectively it is a smoothing effect. You smooth out the peaks and the valleys.

I don't get why that smoothing, consistently leads to an underestimate. Surely it would lead to a value somewhere in the middle. Yes, you're missing the peaks, but you're also missing the valleys.
 
I don't get why that smoothing, consistently leads to an underestimate. Surely it would lead to a value somewhere in the middle. Yes, you're missing the peaks, but you're also missing the valleys.


See my two diagrams at
http://weather.mailasail.com/Franks-Weather/Grid-Length-Resolution.

Then remember that any grid can only describe features of about 5 grid lengths in size, i.e. of size ~130 km. It is all too easy for the strongest winds to be not well represented.
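A quick numerical way to see that effect (all numbers invented): put a narrow band of strong wind on a fine "truth" grid, average it into coarse grid cells, and the peak largely disappears while the background hardly changes.

```python
import numpy as np

# Illustration of the resolution limit: a narrow band of strong wind in the
# "truth" is flattened when averaged into coarse grid cells, so the grid
# value under-reports the peak.  All numbers here are invented.

def grid_average(truth, cell_size):
    """Average a fine 1-D field into coarse cells of `cell_size` points each."""
    n_cells = truth.size // cell_size
    return truth[:n_cells * cell_size].reshape(n_cells, cell_size).mean(axis=1)

fine_x = np.arange(0.0, 600.0, 1.0)                  # km, 1 km "truth" spacing
background = 20.0                                    # kn
squall_band = 15.0 * np.exp(-((fine_x - 300.0) ** 2) / (2 * 10.0 ** 2))  # ~20 km wide
truth = background + squall_band                     # peaks at 35 kn

coarse = grid_average(truth, cell_size=25)           # 25 km grid cells
print(round(truth.max(), 1))     # 35.0 kn in the "truth"
print(round(coarse.max(), 1))    # noticeably less: the narrow peak is smoothed away
```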
 
Then remember that any grid can only describe features of about 5 grid lengths in size, i.e. of size ~130 km. It is all too easy for the strongest winds to be not well represented.

Yes, I can see that: they may well miss the strongest localized winds due to the limitations of the resolution. Why don't they also miss the lightest winds and come up with a value somewhere in the middle?

It's the fact that all the errors in the model are one way that I'm trying to comprehend.

You say: "Within that there will be areas and times when the wind is above or below 30 kn. The computer model cannot resolve these to better than about 130 to 150 km size."

I can't reconcile that with:

"Add on at least one wind force for GFS above F3 seems to be a good rule of thumb."

For example: imagine the GRIB at a location predicts 20 kts. A local high wind area might have been missed, giving a real wind of 30 kts, but surely it's equally likely that a low wind area might have been missed, giving a real 10 kts.
 
Yes, I can see that: they may well miss the strongest localized winds due to the limitations of the resolution. Why don't they also miss the lightest winds and come up with a value somewhere in the middle?

It's the fact that all the errors in the model are one way that I'm trying to comprehend.

You say: "Within that there will be areas and times when the wind is above or below 30 kn. The computer model cannot resolve these to better than about 130 to 150 km size."

I can't reconcile that with:

"Add on at least one wind force for GFS above F3 seems to be a good rule of thumb."

For example: imagine the GRIB at a location predicts 20 kts. A local high wind area might have been missed, giving a real wind of 30 kts, but surely it's equally likely that a low wind area might have been missed, giving a real 10 kts.

Frank will correct me if I've got this wrong, but I think it is because the distribution of wind strengths is asymmetric around the mean. In other words, the light winds observed will be nearer the mean than the strong winds.
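For anyone who likes numbers, here is a quick illustration of that skew. A Weibull distribution is a common rough stand-in for wind speeds; the shape and scale below are invented, so only the asymmetry matters, not the exact figures.

```python
import numpy as np

# Wind speeds are positively skewed (often roughly Weibull-shaped), so
# excursions above the area mean are larger than excursions below it.  A
# grid value near the mean therefore understates the peaks more than it
# overstates the lulls.  Shape and scale below are invented for illustration.

rng = np.random.default_rng(42)
winds = 18.0 * rng.weibull(2.0, size=100_000)        # knots; shape 2, scale 18

mean = winds.mean()
p10, p90 = np.percentile(winds, [10, 90])
print(f"mean wind       : {mean:5.1f} kn")
print(f"10th percentile : {p10:5.1f} kn  (mean - p10 = {mean - p10:4.1f} kn)")
print(f"90th percentile : {p90:5.1f} kn  (p90 - mean = {p90 - mean:4.1f} kn)")
# The gap above the mean is larger than the gap below it, which is why the
# sensible correction to a smoothed forecast is to add a bit, not subtract.
```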
 
Frank will correct me if I've got this wrong, but I think it is because the distribution of wind strengths is asymmetric around the mean. In other words, the light winds observed will be nearer the mean than the strong winds.



Apologies for the delay in replying, we have been travelling.

Thank you, yes. The distribution of wind speed is asymmetric.

Remember that the atmosphere is not precise; forecasts are not precise; neither are many statements about forecasts, including mine.

Wind, for several reasons upon which I could expand, is extremely variable. You only have to sail a few miles just about anywhere to see that. If a GMDSS forecast says F4-5, I assume that I will have spells of F3 and F6.

Because of the use of a grid for NWP there is some inevitable smoothing. When I look at GRIBs, bearing in mind that my wife and I are in our late 70s (very, in my case), my concern is to avoid being out in an F8 and, preferably, an F7 also. If I see the GRIB saying F6, then I know that there is a good chance of having some F7-8. Stan Honey, a Volvo RTW-winning navigator, always says that he automatically adds 20% to wind speeds generally and 25% in the Southern Ocean. I am not sure why he adds more there, but that is what he finds.

These are rules of thumb. I cannot defend the figures, except to say that is how it seems to work out in my and other people’s experience. It is pretty obvious in a qualitative manner why that is the case.
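Purely for illustration, here are those rules of thumb wired into a few lines of Python - the margins are just the "20% / 25%" and "add a force" figures quoted above, the Beaufort bands are the usual approximate knot limits, and none of it is anything official:

```python
# Trivial sketch of the rules of thumb above -- the "+20% (+25% in the
# Southern Ocean)" and "add a force" figures quoted in this thread, wired
# into a helper.  Nothing official; the Beaufort bands are the usual
# approximate upper limits in knots.

BEAUFORT_UPPER_KN = [1, 3, 6, 10, 16, 21, 27, 33, 40, 47, 55, 63]   # F0..F11

def knots_to_force(kn):
    for force, upper in enumerate(BEAUFORT_UPPER_KN):
        if kn <= upper:
            return force
    return 12

def planning_wind(grib_kn, southern_ocean=False):
    """Scale a GRIB wind speed by the rule-of-thumb margin."""
    return grib_kn * (1.25 if southern_ocean else 1.20)

for grib in (10, 15, 22, 28):
    adj = planning_wind(grib)
    print(f"GRIB {grib:2d} kn (F{knots_to_force(grib)}) -> plan for "
          f"{adj:4.1f} kn (F{knots_to_force(adj)})")
```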

Using weather forecasts is a mixture of commonsense, nous and pragmatism. Nothing wrong with any of those.
 