2.3.2 Spatial Analysis and Smoothing

There are several types of geographical analysis of disease incidence:

disease mapping, which aims to provide an estimate of the disease rate in each small area which is as close as possible to the true value;
cluster studies, which specifically search for “clusters”—areas or groups of areas where risk is significantly higher than in the rest of the population;
point source studies, which investigate disease risk around a "point source" of possible risk which has been defined a priori (e.g. an industrial site).

Because the primary aim was to estimate risks precisely in each small area (ED or ward), disease mapping methodology was used.

Incidence rates, whether crude or standardised, are subject to high variability because of the small number of cases occurring in each small area and the often small population at risk. In many instances, areas with small populations can appear to have a particularly high or low risk purely by chance. The average population of an ED or ward in Ireland overall was about 1,420, but some were considerably smaller. Colorectal cancer, one of the most common cancers, had an incidence rate of 0.5 cases per 1,000 persons per year, so even over the 13-year period examined here only about 9 cases (1,420 × 0.0005 × 13 ≈ 9.2) would be expected in an average ED or ward; most cancers analysed in this report have considerably lower incidence rates than this. With such small numbers, random variation is the major factor in the variation of incidence rates between EDs or wards, and this “noise” tends to obscure any other patterns. Therefore, simply mapping the SIRs for each ED or ward can be seriously misleading, as the SIRs tend to be more extreme in areas where the population is sparse. These sparsely populated areas are often the largest geographically and so can dominate a map visually. This is illustrated for colorectal cancer in men in Map 2.7.
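The scale of this chance variation can be checked with a short sketch. It uses only the figures quoted above (average population of about 1,420, 0.5 cases per 1,000 persons per year, 13 years) and assumes, purely for illustration, that case counts follow a Poisson distribution in an area whose true risk equals the national average. The function names and the population of 300 for the small area are hypothetical.

```python
import math

def expected_cases(population, rate_per_1000_per_year, years):
    """Expected case count for an area with the given population."""
    return population * rate_per_1000_per_year / 1000.0 * years

def poisson_pmf(k, mean):
    """Probability of observing exactly k cases under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def prob_sir_at_least(threshold, expected):
    """P(observed / expected >= threshold) when the true relative risk is 1."""
    k_min = math.ceil(threshold * expected)
    return 1.0 - sum(poisson_pmf(k, expected) for k in range(k_min))

avg = expected_cases(1420, 0.5, 13)    # figures quoted in the text
small = expected_cases(300, 0.5, 13)   # hypothetical small area

print(f"expected cases, average area: {avg:.1f}")
print(f"P(SIR >= 1.5), average area: {prob_sir_at_least(1.5, avg):.3f}")
print(f"P(SIR >= 1.5), small area:   {prob_sir_at_least(1.5, small):.3f}")
```

Under these assumptions the small area is considerably more likely than the average area to show an SIR of 1.5 or more by chance alone, even though its true risk is the same.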

The usual way of dealing with this problem is to “smooth” the estimates of disease risk (Elliott et al., 1996). Smoothing reduces the noise (i.e. it smooths out the random variation) and shows more clearly the geographical pattern of the true underlying distribution of cancer rates or relative risks (RR). The effect of smoothing is illustrated in Map 2.8, which shows smoothed RRs for male colorectal cancer, compared with the unsmoothed SIRs in Map 2.7.

Map 2.7 Colorectal cancer, crude standardised incidence ratios: males, 1995-2007

Map 2.8 Colorectal cancer, smoothed relative risks: males, 1995-2007

The principle of spatial smoothing is straightforward. If we assume that the risk of cancer does not vary much between areas which are close to each other, then differences between EDs or wards are more likely to be due to random variation than to real differences in risk. The smaller the population of the area, the larger will be the element of random variation and the crude SIR will be quite an unreliable indicator of real risk. Smoothing the SIR for an ED or ward allows us to strengthen the estimate for the ED or ward by “borrowing strength” from adjacent areas (local smoothing) and/or from the overall/national map (global smoothing) in order to increase the stability of the estimated RR. Therefore, smoothing adjusts risk estimates based on small numbers towards a local mean—based on the rates in the neighbouring areas—and also towards the national value.
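The idea of borrowing strength can be illustrated with a toy shrinkage estimator. This is not the Bayesian CAR model used in the atlas; it is a minimal sketch in which the weight given to an area's own crude SIR grows with its expected count, and the remainder is pulled towards a blend of the neighbours' mean and the national value. The function name and the constants `prior_strength` and `local_share` are hypothetical.

```python
def shrink_sir(observed, expected, neighbour_sirs, national_rr=1.0,
               prior_strength=10.0, local_share=0.5):
    """Toy shrinkage of a crude SIR towards local and national values.

    prior_strength and local_share are hypothetical tuning constants,
    not quantities estimated in the atlas.
    """
    crude = observed / expected
    local_mean = sum(neighbour_sirs) / len(neighbour_sirs)
    target = local_share * local_mean + (1.0 - local_share) * national_rr
    # Weight on the area's own data grows with its expected count.
    weight = expected / (expected + prior_strength)
    return weight * crude + (1.0 - weight) * target

neighbours = [1.1, 0.9, 1.2]
small_area = shrink_sir(observed=4, expected=2.0, neighbour_sirs=neighbours)
large_area = shrink_sir(observed=80, expected=40.0, neighbour_sirs=neighbours)
print(f"small area: crude 2.00 -> {small_area:.2f}")
print(f"large area: crude 2.00 -> {large_area:.2f}")
```

Both areas have a crude SIR of 2.00, but the estimate based on 4 cases is pulled much further towards its surroundings than the one based on 80 cases.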

Many methods have been proposed for smoothing disease rates (Elliott et al., 1996; Best et al., 2005). We have chosen to use a Bayesian approach (Best et al., 2005). The main advantage of Bayesian techniques is that they work well in situations of limited information and high uncertainty. They are better at accurately depicting the geographical pattern in risk than other techniques, such as non-hierarchical approaches, which are more likely to be visually misleading (Pascutto et al., 2000).

The SIRs were smoothed by estimating relative risks using conditional autoregressive (CAR) models (Clayton and Kaldor, 1987) based on a spatial Poisson model with two random effects, as follows:

$$O_i \sim \mathrm{Poisson}(E_i \theta_i), \qquad \log \theta_i = \alpha + u_i + v_i$$

where

$O_i$ was the observed number of cancer cases in area $i$;

$E_i$ was the expected number in area $i$, based on age-adjusted national incidence rates;

$\theta_i$ was the estimated relative risk in area $i$;

$\alpha$ was the intercept;

$u_i$ was a random effect which models the unstructured heterogeneity; and

$v_i$ was a spatially structured random effect (which is given a CAR prior distribution).
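
For readers unfamiliar with CAR priors, one commonly used specification for the two random effects is sketched below. The exact form used is not stated here, so this is an assumption: it is the intrinsic CAR prior as implemented in the WinBUGS `car.normal` distribution. Here $u_i$ denotes the unstructured effect, $v_i$ the spatially structured effect, $w_{ij}$ the neighbourhood weights (positive if areas $i$ and $j$ are neighbours, zero otherwise), and $\sigma_u^2$ and $\sigma_v^2$ are variance parameters.

```latex
u_i \sim \mathrm{N}(0, \sigma_u^2), \qquad
v_i \mid v_{j \neq i} \sim \mathrm{N}\!\left(
  \frac{\sum_j w_{ij} v_j}{\sum_j w_{ij}},\;
  \frac{\sigma_v^2}{\sum_j w_{ij}} \right)
```

Under this form, the prior mean of each area's spatial effect is the weighted average of its neighbours' effects, and the prior variance decreases as the total neighbour weight increases.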

Use of CAR models is widespread in disease mapping and this particular model is considered to be appropriate in most situations (Lawson et al., 2000; Best et al., 2005). The suitability of the specific model above for Ireland was evaluated by comparing it with several alternative models which included covariates for population density and/or country. However, it was decided to use the basic model in this atlas as, while the alternative models were successful in detecting covariate effects, it was not clear what the covariates were actually markers for. Any effects due to socio-economic factors, for example, would be identified by means of the negative binomial regression analysis (section 2.3.3).

Other disease mapping methods (e.g. kernel smoothers, mixture models) seem to give poorer results than CAR (Lawson et al., 2000). Although risk estimates can be somewhat underestimated, CAR models have a high specificity (Richardson et al., 2004), and this conservative approach means that high or low estimates are more likely to be real. However, as with any smoothing method, it is possible that areas of genuinely high risk may be missed by smoothing with neighbouring areas. The method also assumes that risk varies smoothly at the scale studied, an assumption which may not be justified if risk factors vary considerably at a purely local level.

Models were fitted using Markov Chain Monte Carlo (MCMC) algorithms with WinBUGS software (Lunn et al., 2000). Estimates were checked to ensure convergence had been reached. A burn-in of 150,000 iterations was performed and the posterior distributions were derived using one in three iterations from the subsequent 10,000 iterations of 2 chains.
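
The number of posterior draws implied by this sampling scheme can be worked out as follows. The sketch assumes that the 10,000 post-burn-in iterations apply to each of the 2 chains and that thinning keeps the first of every three iterations; the text leaves both points open, and the function name is hypothetical.

```python
def retained_draws(iterations_per_chain, thin, chains):
    """Number of posterior draws kept after thinning (burn-in already discarded)."""
    per_chain = len(range(0, iterations_per_chain, thin))
    return per_chain, per_chain * chains

per_chain, total = retained_draws(iterations_per_chain=10_000, thin=3, chains=2)
print(per_chain, total)
```

If the 10,000 iterations were instead split across the two chains, the totals would be roughly halved.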

Ireland has a number of off-shore islands which form EDs but which have no neighbours (i.e. adjacent areas). Smoothing is based on a shared boundary between EDs, and the absence of such a boundary means that the risk for islands cannot be smoothed in the same way as that for mainland EDs. A similar situation arises with a number of headlands and small peninsulas, which share a boundary with only one other ED. It is common for such EDs or wards to appear as “hotspots” on smoothed maps. To minimise this problem, we created artificial “neighbours” for islands and those headlands which had only one neighbour, by assigning the nearest mainland EDs or wards as “additional neighbours”, so that each island and headland had a minimum of two neighbours (Appendix table A2.5). The “additional neighbours” were given a weighting half that of true neighbours in the smoothing algorithm.
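
The effect of half-weighted “additional neighbours” can be sketched as a weighted neighbour average. This assumes a CAR-style smoother in which each area is pulled towards the weighted mean of its neighbours' values; how the weights were actually passed to the smoothing algorithm is not described here. The area names, effect values and function name are hypothetical.

```python
def conditional_mean(area, effects, true_neighbours, additional_neighbours,
                     additional_weight=0.5):
    """Weighted mean of neighbouring effects for one area.

    True neighbours get weight 1; artificial "additional neighbours"
    get additional_weight (half, as described in the text).
    """
    weights = {j: 1.0 for j in true_neighbours.get(area, [])}
    for j in additional_neighbours.get(area, []):
        weights[j] = additional_weight
    total = sum(weights.values())
    return sum(w * effects[j] for j, w in weights.items()) / total

# Hypothetical areas: "headland" touches only "A"; "island" touches nothing.
effects = {"A": 0.2, "B": -0.1, "C": 0.4}
true_neighbours = {"headland": ["A"], "island": []}
additional_neighbours = {"headland": ["B"], "island": ["B", "C"]}

headland_mean = conditional_mean("headland", effects, true_neighbours,
                                 additional_neighbours)
island_mean = conditional_mean("island", effects, true_neighbours,
                               additional_neighbours)
print(round(headland_mean, 3), round(island_mean, 3))
```

Without the additional neighbours, the island would have no neighbour average at all and the headland would be pulled towards a single area.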

Relative risks (RR) were mapped for each cancer site individually using ArcMap 9.3. For those cancers which affect both sexes, maps are included for both sexes combined and for men and women separately. County and district council boundaries are shown faintly on the maps to help the reader with geographical orientation; a map of these is on page 4 (Map 2.1). A further map at the same scale shows the same boundaries together with some towns and cities on the island (Map 2.2). To facilitate comparisons between cancer sites, each map uses the same colour ramp, which ranges from dark green for an estimated RR less than 0.50 to dark blue for an RR higher than 2.00 (i.e. the same colour represents the same value of RR on each map). The grid of intervals from 0.50 to 1.00 was based on the assumption of normality of the estimated relative risks, so that approximately equal numbers would fall into each interval. The grid from 1.00 to 2.00 was chosen as the reciprocal of the 0.50-1.00 intervals (e.g. the reciprocal of 0.50-0.55 is 1.82-2.00), as this was considered appropriate for ratios (relative risks). This scale is different from that used in the RoI atlas (Carsin et al., 2009) and so the maps are not directly comparable.
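
The reciprocal construction of the upper half of the scale can be sketched as follows. Only the breaks 0.50 and 0.55 are given in the text; the other lower breaks in this example are hypothetical placeholders, as is the function name.

```python
def reciprocal_breaks(lower_breaks):
    """Mirror class breaks below 1 to the range above 1 by taking reciprocals."""
    return sorted(round(1.0 / b, 2) for b in lower_breaks if b < 1.0)

# Only 0.50 and 0.55 are given in the text; the remaining values are hypothetical.
lower_breaks = [0.50, 0.55, 0.70, 0.85, 1.00]
upper_breaks = reciprocal_breaks(lower_breaks)
print(upper_breaks)
```

Mirroring the breaks in this way treats a halving and a doubling of risk symmetrically, which is the usual reason for using reciprocal intervals with ratios.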

Appendix table A3.1 contains summary information from the mapping of each cancer site, including average numbers of cases per ED and ward, and ranges of SIRs and smoothed RRs.
