DON’T SEARCH TO THE RANGE OF THE VARIOGRAM

How many times do we need to stumble over this one, folks? I was just reading yet another report of a project failure linked to poor resource estimation, where someone (who called themselves a Competent Person) set their search distances to the range of the variogram.

I’m thinking this is in part a problem of language, definitions and lazy software development.

Language, because we use the term ‘range’ when modelling and describing variograms, and we use the same term when talking about search distances. Same term, but two totally different meanings and uses.

Lazy software, because so many developers set the default search range to the variogram range (falling straight into the language trap). Guys, guys, guys… just leave the search range empty – don’t set a default. Force the user to at least decide for themselves, and stop pandering to the lowest common denominator. I mean, if you can’t even independently think about the parameters in your model, you shouldn’t be creating the model in the first place.

OK, let’s dig into the variogram and the meaning of variogram range. The range of a variogram is the distance, in a given direction, beyond which sample pairs have no correlation. For example, if your north-south variogram has a range of 100m, then for separations of less than 100m a pair of samples/data points will have some degree of correlation. Beyond 100m the data points are not correlated. This, however, has nothing to do with the search distance. It is OK, and in fact often necessary, to search beyond (or well short of) the range of the variogram.
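To make the distinction concrete, here’s a rough sketch (Python/NumPy, with a made-up spherical model and invented parameters) of what the range actually tells you: the covariance implied by the variogram drops to zero at the range. That is a statement about the data, not an instruction about how far to search.

```python
import numpy as np

def spherical_variogram(h, nugget=0.1, sill=1.0, range_a=100.0):
    """Spherical variogram model: gamma rises from the nugget to the sill
    at the range, beyond which pairs are uncorrelated."""
    h = np.asarray(h, dtype=float)
    hr = np.clip(h / range_a, 0.0, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * hr - 0.5 * hr ** 3)
    return np.where(h == 0.0, 0.0, gamma)   # gamma(0) = 0 by definition

# Covariance implied by the model: C(h) = sill - gamma(h)
for h in [10, 50, 99, 100, 150, 500]:
    cov = 1.0 - spherical_variogram(h)
    print(f"h = {h:>3} m  ->  covariance = {cov:.3f}")
# Beyond 100 m the covariance is zero: the *statistic* says pairs are unrelated,
# but that says nothing about how far you should search when estimating a block.
```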

Consider this simple example and the logic behind it:

Take a variogram with a long range. What does that mean? It means that pairs of data are well correlated (i.e. similar) over long distances. Therefore, how far do you need to search to make a good estimate of a block value? Well, if the samples are all similar over long distances then you really only need to search over a short distance – after all, if you use a very large search, the extra samples add little to the quality of the estimate.

In contrast, look at a variogram with a short range. That means that even nearby data are not correlated, so how on earth do you make a meaningful estimate? The easiest way to envision this is to consider a variogram with 100% nugget (i.e. a zero range). In that case, the ‘best’ estimate is to average the grades of all the samples in the domain – there is no spatial correlation, so there’s not much else you can do. How do you average the grades in the entire domain? Well, you need to search the entire domain. Your short-range variogram means you need a long search distance (there’s a small sketch of this below).

Pure nugget or highly continuous – how far should you search?
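Here’s a minimal sketch of that pure-nugget case (Python/NumPy, point ordinary kriging, locations invented for illustration). With no spatial correlation the kriging system hands every sample the same weight, so the estimate is simply the average of whatever you let into the search.

```python
import numpy as np

def ok_weights(sample_xy, target_xy, cov_func):
    """Solve a point ordinary-kriging system and return the sample weights."""
    n = len(sample_xy)
    d = np.linalg.norm(sample_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = cov_func(d)
    lhs[:n, n] = lhs[n, :n] = 1.0                      # unbiasedness constraint
    rhs = np.append(cov_func(np.linalg.norm(sample_xy - target_xy, axis=1)), 1.0)
    return np.linalg.solve(lhs, rhs)[:n]

# Pure-nugget covariance: a sample is only 'correlated' with itself.
pure_nugget = lambda h: np.where(np.asarray(h) == 0.0, 1.0, 0.0)

rng = np.random.default_rng(0)
samples = rng.uniform(0, 500, size=(8, 2))             # 8 scattered samples in the domain
w = ok_weights(samples, np.array([250.0, 250.0]), pure_nugget)
print(np.round(w, 3))   # every weight is 1/8 - the 'estimate' is just the domain average
```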

It’s easy to be fooled. The sample correlation you are modelling in your variogram tells you how similar your samples are at a certain distance in a certain direction. That is not the same as telling you how far you should look to make a good estimate. The ‘goodness’ of your estimate is entirely dependent on what you plan to use it for. Is it a global estimate where you are only interested in the big picture? Is it a local-scale estimate where you are making ore/waste allocations on a block-by-block basis? How does the scale of interest compare to the selective mining unit (SMU)? And so on.

The search range and strategy also depend on your sample spacing and sampling configuration. Does anyone else out there remember the old DOS program ‘Playkrige’? It was a fantastic learning tool: a 2D kriging engine where you could adjust the block size, the sampling pattern or the variogram and see what happened. It would show the change in weights and other metrics and gave a great indication of what happened in different scenarios. The most important lesson? All of that behaviour was totally independent of the sample values. All that matters is the variogram, the block size and the search strategy.
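If you don’t have Playkrige handy, the same lesson drops out of a few lines of code. In this sketch (Python/NumPy; the spherical model, locations and grade sets are all invented) the weights are computed before any grades are looked at, so swapping the grades for a completely different set leaves the weights untouched.

```python
import numpy as np

def cov(h, nugget=0.2, sill=1.0, range_a=100.0):
    """Covariance implied by a spherical variogram: C(h) = sill - gamma(h)."""
    h = np.asarray(h, dtype=float)
    hr = np.clip(h / range_a, 0.0, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * hr - 0.5 * hr ** 3)
    return np.where(h == 0.0, sill, sill - gamma)

rng = np.random.default_rng(7)
samples = rng.uniform(0, 200, size=(10, 2))
block = np.array([100.0, 100.0])

# Ordinary kriging system: built from the sample locations and the variogram only.
n = len(samples)
lhs = np.zeros((n + 1, n + 1))
lhs[:n, :n] = cov(np.linalg.norm(samples[:, None] - samples[None, :], axis=2))
lhs[:n, n] = lhs[n, :n] = 1.0
rhs = np.append(cov(np.linalg.norm(samples - block, axis=1)), 1.0)
weights = np.linalg.solve(lhs, rhs)[:n]

# The grades only appear now, at the very last step.
grades_a = rng.lognormal(0.0, 1.0, n)    # one hypothetical set of grades
grades_b = rng.normal(0.5, 0.1, n)       # a completely different set
print("weights          :", np.round(weights, 3))
print("estimate (set A) :", round(weights @ grades_a, 3))
print("estimate (set B) :", round(weights @ grades_b, 3))
# Same weights either way - the kriging system never sees the sample values.
```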

There tend to be two schools of thought about search strategy and search range. One school seeks to minimise conditional bias, albeit at the cost of smoothing the estimate. The other says to hell with local conditional bias, I want to mimic the grade trends I see in my data. In a high nugget environment these two schools tend to be diametrically opposed. The first calls for a wide search; the second calls for a much shorter one. The impact? The first school gives smoother estimates – more samples used from greater distances gives results that approach the average of the domain. The second school tends towards a moving window average – samples are averaged within the confines of the smaller search window (remember it’s high nugget, so the sample weights will be more similar to each other than in a low nugget domain).
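To see the smoothing effect without getting lost in kriging detail, here is a deliberately crude sketch (Python/NumPy; simulated unstructured grades, and near-equal weights are assumed, which is roughly what a high nugget gives you): the same blocks estimated with a short and a wide search window.

```python
import numpy as np

rng = np.random.default_rng(1)
xy = rng.uniform(0, 1000, size=(400, 2))          # pseudo drill-hole collars
z = rng.lognormal(mean=0.0, sigma=1.0, size=400)  # high-nugget style grades (no spatial structure)

def windowed_average(block, radius):
    """With a high nugget the weights are near-equal, so the estimate behaves
    like a simple average of the samples inside the search window."""
    d = np.linalg.norm(xy - block, axis=1)
    inside = d <= radius
    return z[inside].mean() if inside.any() else np.nan

blocks = np.array([[x, y] for x in range(100, 1000, 200) for y in range(100, 1000, 200)])
for radius in (75.0, 400.0):
    est = np.array([windowed_average(b, radius) for b in blocks])
    print(f"search radius {radius:>5} m: estimate spread (std) = {np.nanstd(est):.3f}")
# The wide search smooths the estimates towards the domain mean (smaller spread);
# the short search behaves like a moving-window average and mimics local trends.
```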

On top of all the confusion and differences of opinion about search ranges, we have additional parameters we can apply. Arguably, settings that limit the maximum number of samples allowed to inform a block can have more impact than the search range itself – particularly in a grade control setting or where there is dense sampling. In fact, there’s a third approach to the search range: don’t worry about the range at all. Set it to a large number and then apply a maximum-number-of-samples restriction. That way you always use the same number of samples to estimate each block, and the distance to those samples varies depending on the sampling density.
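Here is a sketch of that third approach (Python/NumPy; the nearest_n_samples helper and the sample locations are invented for illustration): the radius is set so large it rarely bites, and the maximum-sample limit does the work.

```python
import numpy as np

def nearest_n_samples(block_xy, sample_xy, max_samples=16, search_radius=1e9):
    """'Third approach': make the search radius effectively unlimited and let a
    maximum-sample restriction decide which data inform the block."""
    d = np.linalg.norm(sample_xy - block_xy, axis=1)
    order = np.argsort(d)
    order = order[d[order] <= search_radius]       # the radius rarely bites when set large
    return order[:max_samples]                     # indices of the samples to use

rng = np.random.default_rng(2)
samples = np.vstack([rng.uniform(0, 200, (80, 2)),      # densely drilled area
                     rng.uniform(800, 1000, (10, 2))])  # sparsely drilled area
for block in (np.array([100.0, 100.0]), np.array([900.0, 900.0])):
    idx = nearest_n_samples(block, samples)
    print(f"block {block}: {len(idx)} samples, furthest at "
          f"{np.linalg.norm(samples[idx] - block, axis=1).max():.0f} m")
# Same number of samples everywhere; the distance to them adapts to the drilling density.
```

The trade-off is obvious: you get a consistent sample count for every block, but in sparsely drilled areas you may be reaching a long way past the variogram range – which is exactly the point of this post.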

And there are even more ways you can modify the search and the samples used to estimate blocks:

We have octant settings designed to spatially decluster the data (these are largely a hangover from IDW estimates but still get used today). You can specify the number of octants that must contain samples and the minimum/maximum numbers of samples in each octant.

We have the ability to restrict the number of samples from any one drill hole (or other index/key value).

We have the ability to restrict the range of influence of ‘outlier’ samples.

Octants, key-field restrictions and outlier restrictions all have the same impact: they force more distant samples to be used. Hmmmm… what does your variogram look like again? The thing is, each of these additional modifications to the search neighbourhood can have unpredictable and unexpected results – more so in cases where the sample spacing is highly variable and/or where there are drill holes in multiple orientations. A small sketch of how two of these restrictions might be applied follows below.
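As a rough illustration (Python/NumPy; the restrict_samples helper, the octant and per-hole limits and the sample locations are all invented for the sketch – real packages differ in the details), here is one way an octant cap and a per-drill-hole cap could be applied together:

```python
import numpy as np
from collections import defaultdict

def restrict_samples(block_xy, sample_xy, hole_ids,
                     max_per_octant=4, max_per_hole=2):
    """Apply octant and per-drill-hole limits to a candidate sample set.
    Samples are assigned to one of 8 sectors around the block and the
    closest ones are kept, subject to both limits."""
    rel = sample_xy - block_xy
    angle = np.arctan2(rel[:, 1], rel[:, 0]) + np.pi          # 0 .. 2*pi
    octant = np.floor(angle / (np.pi / 4)).astype(int) % 8    # sector index 0..7
    dist = np.linalg.norm(rel, axis=1)

    kept, per_octant, per_hole = [], defaultdict(int), defaultdict(int)
    for i in np.argsort(dist):                     # nearest samples first
        if per_octant[octant[i]] >= max_per_octant:
            continue
        if per_hole[hole_ids[i]] >= max_per_hole:
            continue
        kept.append(i)
        per_octant[octant[i]] += 1
        per_hole[hole_ids[i]] += 1
    return np.array(kept)

# Tiny illustration: three holes, each contributing several samples around a block.
rng = np.random.default_rng(3)
samples = rng.uniform(-50, 50, size=(30, 2))
holes = np.repeat(["DH1", "DH2", "DH3"], 10)
idx = restrict_samples(np.array([0.0, 0.0]), samples, holes)
print(len(idx), "samples kept after octant and per-hole limits")
```

Run restrictions like these together and the sample set informing a block can change in ways that are hard to predict by eye – which is the point being made above.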

It’s a science!

I’d like to see some science and common sense applied to designing your search. Use some metrics to evaluate the quality of your choices and experiment with different settings. These days, with the computing power we have available, it’s possible to ‘optimise’ the samples used to estimate every block on a block-by-block basis. Of course, the optimisation depends on your objective function, be it the slope of regression, kriging efficiency, the number and proportion of negative kriging weights or the grade outcome you’ve already announced to the market!
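As a sketch of what that evaluation can look like (Python/NumPy; the spherical model, block size and sample locations are invented, and the metric formulas follow one common convention – packages differ in how they define and report them): ordinary kriging of a single discretised block under two different sample limits, reporting the slope of regression, kriging efficiency and negative-weight count.

```python
import numpy as np

def cov(h, nugget=0.3, sill=1.0, range_a=100.0):
    """Covariance from a spherical variogram: C(h) = sill - gamma(h)."""
    h = np.asarray(h, dtype=float)
    hr = np.clip(h / range_a, 0.0, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * hr - 0.5 * hr ** 3)
    return np.where(h == 0.0, sill, sill - gamma)

def ok_block(sample_xy, block_pts):
    """Ordinary kriging of a discretised block: returns weights, sample-to-block
    covariances, the Lagrange multiplier and a (crudely discretised) block variance."""
    n = len(sample_xy)
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = cov(np.linalg.norm(sample_xy[:, None] - sample_xy[None, :], axis=2))
    lhs[:n, n] = lhs[n, :n] = 1.0
    c_sb = cov(np.linalg.norm(sample_xy[:, None] - block_pts[None, :], axis=2)).mean(axis=1)
    sol = np.linalg.solve(lhs, np.append(c_sb, 1.0))
    w, mu = sol[:n], sol[n]
    # Average within-block covariance (the discretisation keeps the nugget on the diagonal).
    bv = cov(np.linalg.norm(block_pts[:, None] - block_pts[None, :], axis=2)).mean()
    return w, c_sb, mu, bv

rng = np.random.default_rng(4)
samples = rng.uniform(0, 300, size=(60, 2))
gx = np.linspace(142.5, 157.5, 4)                      # 20 m block at (150, 150), 4 x 4 discretisation
block_pts = np.array([[x, y] for x in gx for y in gx])

dists = np.linalg.norm(samples - [150, 150], axis=1)
for n_used in (8, 32):                                 # two different sample limits ('searches')
    w, c_sb, mu, bv = ok_block(samples[np.argsort(dists)[:n_used]], block_pts)
    kv = bv - w @ c_sb - mu                            # kriging variance
    slope = (w @ c_sb) / (w @ c_sb - mu)               # slope of regression of true on estimate
    ke = (bv - kv) / bv                                # kriging efficiency
    print(f"{n_used:>2} samples: slope = {slope:.2f}, KE = {ke:.2f}, "
          f"negative weights = {int(np.sum(w < 0))}")
```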

So, let’s see an end to the default of “search to the range of the variogram”. The variogram is a statistic. It describes spatial correlation. It tells you nothing about the number and location of samples you may (or may not) want to use to estimate a block.

Most modern geostatistics software lets you see what samples are used during an estimate and even the weights of those samples. While it may not be as elegant as Playkrige, I encourage anyone who is serious about resource estimation to use these features. Set up a number of different block/sample/search configurations and look at what happens.

What’s the most important driver of the estimation result? The samples that are used to estimate each and every individual block!