2.3 Statistical Methods

2.3.1 Standardised Incidence Ratio

In comparing cancer incidence between areas or over time, three important factors must be considered: the number of people at risk, their sex and their age. In this report, cancer incidence for men and women was analysed separately, which deals with possible differences between the sexes. Correcting for the number of people at risk is straightforward: the number of cases is divided by the number of people resident in the area during a specified period (as reported by the census) to produce an incidence rate.

Since the risk of developing cancer doubles with every eight or nine years of life, an area with an older population would be expected, all else being equal, to have more incident cancer cases than an area with a younger population. Several approaches are available to adjust for differences in age. This atlas has used indirect standardisation, which is the most appropriate method for small area comparisons because it provides more stable rates than other standardisation techniques and works even if there is no population at risk in some age groups within the area (Estève et al., 1994). For each small area i, the national incidence rate for each age group j was applied to the population count (N) in that age group to calculate the total expected number of cancers (E) in the area. This can be compared with the number actually observed (O) in the area, in the form of an observed to expected ratio or percentage. This is called the standardised incidence ratio, abbreviated to SIR. The SIR for any cancer, for either men or women, for Ireland as a whole is by definition 1 (or 100%). For any small area (ED or ward) i:

$$SIR_i = \frac{O_i}{E_i}, \qquad E_i = \sum_j N_{ij} \, r_j$$

where O_i is the observed number of cases in area i, N_ij is the population of area i in age group j, and r_j is the national incidence rate in age group j.
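A minimal sketch of the indirect standardisation above, assuming populations are supplied as person-years over the study period and national rates as cases per person-year. The area, age groups and numbers are hypothetical and purely illustrative.

```python
def expected_cases(person_years_by_age, national_rates):
    """E_i: national age-specific rates r_j applied to the area's population N_ij."""
    return sum(n * r for n, r in zip(person_years_by_age, national_rates))


def standardised_incidence_ratio(observed, expected):
    """SIR_i = O_i / E_i (multiply by 100 to express as a percentage)."""
    return observed / expected


# Hypothetical area with three broad age groups (person-years, national rates per person-year)
e = expected_cases([10_000, 3_000, 1_000], [0.0001, 0.001, 0.005])
sir = standardised_incidence_ratio(12, e)  # 12 cases observed against about 9 expected
print(round(e, 2), round(sir, 3))
```

Because E_i uses national rates, applying the same calculation to the whole country reproduces the observed national total, which is why the national SIR is 1 by construction.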

2.3.2 Spatial Analysis and Smoothing

There are several types of geographical analysis of disease incidence:

disease mapping, which aims to provide an estimate of the disease rate in each small area that is as close as possible to the true value;
cluster studies, which specifically search for “clusters”: areas or groups of areas where risk is significantly higher than in the rest of the population;
point source studies, which investigate disease risk around a “point source” of possible risk which has been defined a priori (e.g. an industrial site).

Because the primary aim was to estimate risks precisely in each small area (ED or ward), disease mapping methodology was used.

Incidence rates, whether crude or standardised, are subject to high variability due to the small number of cases occurring in each small area, and the often small population-at-risk. In many instances, areas with small populations can appear to have a particularly high or low risk, purely by chance. The average population of an ED or ward in Ireland overall was about 1,420, but some were considerably smaller. One of the commonest cancers, colorectal cancer, had an incidence rate of 0.5 cases per 1,000 persons per year, so even over the 13-year period examined here, only about 9 cases would be expected in an average ED or ward, and most cancers analysed in this report have considerably lower incidence rates than this. With such small numbers, random variation is the major factor in the variation of incidence rates between EDs or wards, and this “noise” tends to obscure any other patterns. Therefore, simply mapping the SIRs for each ED or ward can be seriously misleading, as the SIRs tend to be more extreme in areas where the population is sparse. These areas are often the largest in area and can dominate a map visually. This is illustrated for colorectal cancer in men in Map 2.7.
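The effect of small numbers can be made concrete with the Poisson distribution. The sketch below assumes an area whose true SIR is exactly 1 and asks how often chance alone produces a crude SIR of at least 2. The expected count for an average ED or ward repeats the arithmetic in the paragraph above (1,420 people × 0.5 per 1,000 per year × 13 years); the one-case area is a hypothetical comparison.

```python
import math


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), computed as 1 minus the lower tail."""
    lower = sum(math.exp(-mu) * mu**x / math.factorial(x) for x in range(k))
    return 1.0 - lower


def prob_sir_at_least(threshold, expected):
    """Chance that an area with true SIR 1 shows a crude SIR >= threshold."""
    k = math.ceil(threshold * expected)  # smallest observed count reaching the threshold
    return poisson_sf(k, expected)


average_area = 1420 * 0.5 / 1000 * 13  # about 9.2 expected colorectal cases
print(prob_sir_at_least(2, 1.0))           # sparsely populated area, 1 case expected
print(prob_sir_at_least(2, average_area))  # average ED or ward
```

With only one case expected, a doubling of the crude SIR occurs by chance in roughly a quarter of areas; with about nine expected it is rare, which is why sparse areas dominate a map of unsmoothed SIRs.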

This problem is dealt with by "smoothing" the estimates of disease risk (Elliott et al., 1996). Smoothing removes much of the noise (i.e. it smooths out the random variation) and shows more clearly the geographical pattern of the true underlying cancer risk, expressed as relative risks (RR). The effect of smoothing is illustrated in Map 2.8, which shows smoothed RRs for male colorectal cancer, compared with the unsmoothed SIRs in Map 2.7.

Map 2.7 Colorectal cancer, crude standardised incidence ratios: males, 1995-2007
Map 2.8 Colorectal cancer, smoothed relative risks: males, 1995-2007

The principle of spatial smoothing is straightforward. If we assume that the risk of cancer does not vary much between areas which are close to each other, then differences between neighbouring EDs or wards are more likely to be due to random variation than to real differences in risk. The smaller the population of an area, the larger the element of random variation, and the less reliable the crude SIR becomes as an indicator of real risk. Smoothing strengthens the estimate for an ED or ward by “borrowing strength” from adjacent areas (local smoothing) and/or from the overall national map (global smoothing), which increases the stability of the estimated RR. Smoothing therefore adjusts risk estimates based on small numbers towards a local mean, based on the rates in neighbouring areas, and also towards the national value.
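To illustrate "borrowing strength" in its simplest global form, the sketch below uses a Poisson-gamma model, which is a different and simpler smoother than the CAR model actually used in this atlas. It assumes O_i ~ Poisson(E_i θ_i) with a gamma prior on θ_i whose mean is 1 (the national value); the prior parameters are hypothetical. The posterior mean shrinks a crude SIR towards 1, and shrinks it more when the expected count is small.

```python
def shrunk_rr(observed, expected, prior_shape=5.0, prior_rate=5.0):
    """Posterior mean relative risk under a Poisson-gamma model.

    O ~ Poisson(E * theta), theta ~ Gamma(shape=a, rate=b) gives
    theta | O ~ Gamma(a + O, b + E), with mean (O + a) / (E + b).
    The default a = b = 5 (prior mean 1) is a hypothetical choice.
    """
    return (observed + prior_shape) / (expected + prior_rate)


# Two hypothetical areas with the same crude SIR of 3
small = shrunk_rr(3, 1)    # 1 case expected: pulled strongly towards 1
large = shrunk_rr(30, 10)  # 10 cases expected: pulled much less
print(round(small, 3), round(large, 3))
```

A CAR model adds the local component: instead of a single national prior mean, each area's prior is centred on a weighted mean of its neighbours.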

Many methods have been proposed for smoothing disease rates (Elliott et al., 1996; Best et al., 2005). We have chosen to use a Bayesian approach (Best et al., 2005). The main advantage of Bayesian techniques is that they work well in situations of limited information and high uncertainty. They are better at accurately depicting the geographical pattern in risk than other techniques, such as non-hierarchical approaches, which are more likely to be visually misleading (Pascutto et al., 2000).

The SIRs were smoothed by estimating relative risks using conditional autoregressive (CAR) models (Clayton and Kaldor, 1987), based on a spatial Poisson model with two random effects, as follows:

$$O_i \sim \mathrm{Poisson}(E_i \, \theta_i), \qquad \log \theta_i = \alpha + v_i + u_i$$

where

O_i was the observed number of cancer cases in area i;

E_i was the expected number based on age-adjusted national incidence rates in area i;

θ_i was the estimated relative risk in area i;

α was the intercept;

v_i was a random effect which models the unstructured heterogeneity; and

u_i was a spatially structured random effect (which is given a CAR prior distribution).
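For reference, the intrinsic CAR prior commonly given to a spatially structured effect of this kind can be written as below, writing u_i for that effect, ∂i for the set of neighbours of area i, and w_ij for neighbour weights (w_ij = 1 for areas sharing a boundary). The weighted notation is an assumption here, chosen so that it can also accommodate down-weighted neighbours; the unstructured effect is usually given an independent normal prior.

```latex
u_i \mid u_{j \neq i} \sim \mathrm{N}\!\left(
  \frac{\sum_{j \in \partial i} w_{ij}\, u_j}{\sum_{j \in \partial i} w_{ij}},\;
  \frac{\sigma_u^2}{\sum_{j \in \partial i} w_{ij}}
\right),
\qquad
v_i \sim \mathrm{N}(0, \sigma_v^2)
```

Each u_i is thus centred on a weighted average of its neighbours, and its conditional variance falls as the total neighbour weight grows, which is what produces local smoothing.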

CAR models are widely used in disease mapping, and this particular model is considered appropriate in most situations (Lawson et al., 2000; Best et al., 2005). Its suitability for Ireland was evaluated by comparing it with several alternative models which included covariates for population density and/or country. The alternative models were successful in detecting covariate effects, but it was not clear what the covariates were actually markers for, so the basic model was used in this atlas. Any effects due to socio-economic factors, for example, would be identified by the negative binomial regression analysis (section 2.3.3).

Other disease mapping methods (e.g. kernel smoothers, mixture models) appear to give poorer results than CAR models (Lawson et al., 2000). Although CAR models can somewhat underestimate the size of raised or lowered risks, they have high specificity (Richardson et al., 2004), and this conservative behaviour means that high or low estimates are more likely to be real. However, as with any smoothing method, genuinely high-risk areas may be missed because their estimates are pulled towards those of neighbouring areas. The method also assumes that risk varies smoothly at the scale studied, an assumption which may not hold if risk factors vary considerably at a purely local level.

Models were fitted using Markov chain Monte Carlo (MCMC) algorithms in WinBUGS software (Lunn et al., 2000). Estimates were checked to ensure convergence had been reached. A burn-in of 150,000 iterations was performed, and the posterior distributions were derived from one in every three of the subsequent 10,000 iterations of 2 chains.
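The report does not say which convergence check was used. A minimal sketch of one common choice, the Gelman-Rubin potential scale reduction factor (R-hat), is shown below together with the one-in-three thinning, assuming 10,000 post-burn-in iterations per chain. The draws are simulated stand-ins, not WinBUGS output; values of R-hat close to 1 suggest the chains have mixed.

```python
import numpy as np


def thin(draws, every=3):
    """Keep one in every `every` post-burn-in draws, starting with the first."""
    return np.asarray(draws, dtype=float)[::every]


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m chains of equal length n."""
    x = np.asarray(chains, dtype=float)      # shape (m, n)
    _, n = x.shape
    w = x.var(axis=1, ddof=1).mean()         # mean within-chain variance
    b = n * x.mean(axis=1).var(ddof=1)       # between-chain variance
    var_hat = (n - 1) / n * w + b / n        # pooled estimate of posterior variance
    return float(np.sqrt(var_hat / w))


# Simulated stand-in for 2 chains of 10,000 post-burn-in iterations of one parameter
rng = np.random.default_rng(2010)
chains = [thin(rng.normal(loc=0.1, scale=0.2, size=10_000)) for _ in range(2)]
print(len(chains[0]), round(gelman_rubin(chains), 3))
```

Thinning by three leaves roughly 3,333 draws per chain from which posterior summaries such as the mean RR and its credible interval are computed.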

Ireland has a number of offshore islands which form EDs but which have no neighbours (i.e. adjacent areas). Smoothing is based on a shared boundary between EDs, and the absence of such a boundary means that the risk for islands cannot be smoothed in the same way as that for mainland EDs. A similar situation arises with a number of headlands and small peninsulas, which share a boundary with only one other ED. Such EDs or wards commonly appear as “hotspots” on smoothed maps. To minimise this problem, we created artificial “neighbours” for islands and for those headlands which had only one neighbour, by assigning the nearest mainland EDs or wards as “additional neighbours”, so that each island and headland had a minimum of two neighbours (Appendix table A2.5). The “additional neighbours” were given half the weight of true neighbours in the smoothing algorithm.
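The neighbour scheme can be sketched as a weighted adjacency structure. This assumes that weights are symmetric (if an island borrows from a mainland area, the mainland area also counts the island with weight 0.5) and that the CAR conditional mean is a weighted average of neighbouring effects; the area labels and effect values are hypothetical.

```python
from collections import defaultdict


def build_weights(adjacency, extra_links, extra_weight=0.5):
    """Return {area: {neighbour: weight}}.

    adjacency   -- pairs of areas sharing a boundary (weight 1)
    extra_links -- hypothetical "additional neighbour" pairs for islands and headlands
    """
    w = defaultdict(dict)
    for a, b in adjacency:
        w[a][b] = 1.0
        w[b][a] = 1.0
    for a, b in extra_links:
        # setdefault keeps a true neighbour at full weight if the pair is already adjacent
        w[a].setdefault(b, extra_weight)
        w[b].setdefault(a, extra_weight)
    return dict(w)


def car_conditional_mean(area, u, weights):
    """Weighted mean of the neighbours' spatial effects u_j."""
    nbrs = weights[area]
    total = sum(nbrs.values())
    return sum(wt * u[j] for j, wt in nbrs.items()) / total


# Headland H touches only A; island I touches nothing
adjacency = [("A", "B"), ("H", "A")]
extra = [("H", "B"), ("I", "A"), ("I", "B")]
weights = build_weights(adjacency, extra)
u = {"A": 0.2, "B": -0.1, "H": 0.5, "I": 0.0}
print(weights["I"], car_conditional_mean("H", u, weights))
```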

Relative risks (RR) were mapped for each cancer site individually using ArcMap 9.3. For those cancers which affect both sexes, maps are included for both sexes combined and for men and women separately. County and district council boundaries are shown faintly on the maps to help the reader with geographical orientation; a map of these is on page 4 (Map 2.1). To aid orientation, a map is also provided at the same scale, showing the same boundaries, as well as some towns and cities on the island (Map 2.2). To facilitate comparisons between cancer sites, each map uses the same colour ramp, which ranges from dark green for an estimated RR less than 0.50 to dark blue for an RR higher than 2.00 (i.e. the same colour represents the same value of RR on each map). The class intervals between 0.50 and 1.00 were based on the assumption that the estimated relative risks are normally distributed, so that approximately equal numbers would fall into each interval. The intervals between 1.00 and 2.00 were chosen as the reciprocals of the 0.50-1.00 intervals (e.g. the reciprocal of 0.50-0.55 is 1.82-2.00), as this was considered appropriate for ratios (relative risks). This scale is different from that used in the RoI atlas (Carsin et al., 2009), so the maps are not directly comparable.
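The reciprocal construction of the upper half of the colour scale can be sketched as follows. Only the 0.50-0.55 interval is given in the text; any other lower breakpoints passed in are hypothetical.

```python
def reciprocal_breaks(lower_breaks, digits=2):
    """Mirror breakpoints in [0.50, 1.00] onto [1.00, 2.00] by taking reciprocals."""
    return sorted(round(1.0 / b, digits) for b in lower_breaks)


# The interval 0.50-0.55 maps to 1.82-2.00; 1.00 maps to itself
print(reciprocal_breaks([0.50, 0.55, 1.00]))
```

Using reciprocals makes the scale symmetric on the ratio scale: an RR of 0.55 (45% lower) and an RR of 1.82 (82% higher) sit the same distance from 1 on a log scale and so receive mirror-image colours.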

Appendix table A3.1 contains summary information from the mapping of each cancer site, including average numbers of cases per ED and ward, and ranges of SIRs and smoothed RRs.

2.3.3 Regression Analysis: Ward/ED Characteristics and Cancer Incidence

A count of the number of cases of cancer by type and sex was available for each ward/ED. Relating these counts to ward/ED characteristics is traditionally done by modelling the count data using Poisson regression. However, a key assumption behind this approach is that the mean and variance of the counts being modelled are equal. The mean number of cancer cases diagnosed in each small geographic area, and the variance of these counts between areas, show that this assumption is not valid and that the data are over-dispersed; that is, the variance is greater than the mean (Table 2.7) (Breslow, 1984).

Table 2.7 Mean and variance in the number of cancer cases diagnosed in each ward/ED: 1995-2007

cancer	males mean	males variance	females mean	females variance
non-melanoma skin cancer	14.2	347.5	12.5	334.7
breast	-	-	9.8	186.5
colorectal	5.4	48.9	4.3	35.0
lung	5.3	58.7	3.3	30.7
prostate	8.4	108.3	-	-
non-Hodgkin’s lymphoma	1.3	3.7	1.2	3.4
stomach	1.5	5.0	0.9	2.7
melanoma of the skin	0.9	2.5	1.4	5.3
bladder	1.6	5.4	0.6	1.4
head and neck	1.4	5.3	0.6	1.2
leukaemia	1.2	2.9	0.8	1.9
pancreas	0.9	1.9	0.9	2.3
kidney	1.0	2.7	0.6	1.3
oesophagus	1.0	2.5	0.6	1.3
ovary	-	-	1.6	5.7
brain and other central nervous system	0.8	1.6	0.6	1.1
cervix uteri	-	-	1.0	3.1
corpus uteri	-	-	1.3	4.1
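The degree of over-dispersion in Table 2.7 can be summarised with two simple statistics: the variance-to-mean ratio (1 under a Poisson model) and a method-of-moments estimate of the negative binomial dispersion parameter α, from Var = μ + αμ². This is a crude illustration only: it uses the raw between-area moments transcribed from a few rows of the table and ignores differences in population size, so it overstates the extra-Poisson variation that remains after modelling.

```python
# (mean, variance) of cases per ward/ED, males, 1995-2007, transcribed from Table 2.7
table_males = {
    "colorectal": (5.4, 48.9),
    "lung": (5.3, 58.7),
    "prostate": (8.4, 108.3),
    "stomach": (1.5, 5.0),
}


def dispersion(mean, var):
    """Variance-to-mean ratio and moment estimate of alpha from Var = mu + alpha * mu**2."""
    return var / mean, (var - mean) / mean**2


for site, (m, v) in table_males.items():
    ratio, alpha = dispersion(m, v)
    print(f"{site}: variance/mean = {ratio:.1f}, alpha = {alpha:.2f}")
```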

Although much of this variance may be explained by the differing population sizes of the geographic areas, which is adjusted for in a Poisson regression model, we decided to use a modification of Poisson regression, known as negative binomial regression, to adjust more fully for the over-dispersion. This model produces a relative risk (RR) for each category of each variable included in the model, relative to a baseline category. For example, if RoI is taken as the baseline (by definition, RR=1) for a variable indicating which country the geographic area is in, then a relative risk greater than 1 for NI means that the incidence of cancer is higher in NI than in RoI; conversely, a relative risk lower than 1 means that incidence is lower in NI than in RoI. Five small area characteristics were examined for a relationship with cancer incidence using this approach: country, population density tertile, and quintiles of unemployment, third-level education and elderly living alone (see section 2.2.4.3).
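Negative binomial regression uses a log link, so each fitted coefficient β for a category is turned into a relative risk by exponentiation, and a Wald 95% confidence interval follows from its standard error. A minimal sketch, with a hypothetical coefficient and standard error for NI versus the RoI baseline:

```python
import math


def relative_risk(beta, se, z=1.96):
    """RR = exp(beta) with Wald 95% CI exp(beta -/+ z * se) for a log-link model."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for NI (baseline RoI): beta = 0.10, SE = 0.05
rr, lo, hi = relative_risk(0.10, 0.05)
print(f"RR = {rr:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

An interval lying wholly above 1, as here, would indicate significantly higher incidence than in the baseline category at the 5% level.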

It has already been noted (section 2.2.4.3) that the variables we are studying are not completely independent of each other. Therefore, if we see a relationship between cancer risk and a specific variable (for instance level of unemployment), part of this relationship might be due to another factor, such as the average age of the population, which would influence both cancer rates and unemployment levels. For this reason, measures of the effect of each variable must be adjusted for the effects of the others (see section 2.3.1). The most important adjustment is for age, as cancer risk rises rapidly with age. Two comparisons were made between NI and RoI, one of which was adjusted for age alone, and the other for age, population density, unemployment, education and percentage of elderly living alone. All other relative risks reported were adjusted for the effects of all the other variables. Thus, risk estimates are reported for:

country, adjusted for age only;
country, adjusted for age, population density, unemployment, education and elderly living alone;
population density, adjusted for age, country, unemployment, education and elderly living alone;
unemployment, adjusted for age, country, population density, education and elderly living alone;
education, adjusted for age, country, population density, unemployment and elderly living alone; and
elderly living alone, adjusted for age, country, population density, unemployment and education.

The risk estimates with 95% confidence intervals and tests of statistical significance are given in full for each site in Appendix 1. Summary figures are presented in each chapter.

2.3.4 Summary Measures

A series of summary measures was computed for each cancer site. The incidence of each cancer is expressed in terms of the average number of new cases each year between 1995 and 2007, and as a percentage of all new cancer cases, both including and excluding non-melanoma skin cancer.

Time trends

The estimated annual percentage change in the number of cases over the period 1995-2007 (13 years, i.e. 12 years of growth) was calculated by taking the 12th root of the total growth ratio over the period, subtracting 1 and expressing the result as a percentage.
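A minimal sketch of this calculation, assuming the total growth ratio is the number of cases in the final year divided by the number in the first year; the case counts are hypothetical.

```python
def annual_percentage_change(cases_start, cases_end, years_of_growth=12):
    """Constant yearly % change that turns cases_start into cases_end over the period."""
    growth_ratio = cases_end / cases_start
    return (growth_ratio ** (1.0 / years_of_growth) - 1.0) * 100.0


# Hypothetical site whose annual case count doubled between 1995 and 2007
print(round(annual_percentage_change(100, 200), 2))
```

Taking the 12th root, rather than dividing the total percentage growth by 12, gives the compound yearly rate: a doubling over 12 years corresponds to about 5.9% per year, not 8.3%.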

Cumulative risk

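Cumulative risk to age 74 (CR_0-74) is the risk of developing a specified cancer or cancers up to and including age 74, in the absence of competing risks (Estève et al., 1994). This was calculated as follows:

$$CR_{0\text{-}74} = 100 \times \left[ 1 - \exp\left( -5 \sum_{x=1}^{15} t_x \right) \right]$$

where x indexes the 15 five-year age groups from 0-4 to 70-74, and

t_x = age-specific incidence rate (per person-year) in age group x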

The cumulative risk is given as a percentage and also as a ratio (e.g. a cumulative risk of 4% is expressed as 1 in 25).
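A minimal sketch of the cumulative risk calculation and its "1 in N" expression, assuming the 15 age-specific rates are supplied per person-year (rates quoted per 100,000 would first be divided by 100,000). The flat rate used in the example is hypothetical.

```python
import math


def cumulative_risk_0_74(rates_per_person_year):
    """Cumulative risk to age 74 from 15 five-year age-specific rates (ages 0-4 ... 70-74)."""
    if len(rates_per_person_year) != 15:
        raise ValueError("expected 15 five-year age groups from 0-4 to 70-74")
    cumulative_rate = 5.0 * sum(rates_per_person_year)  # each group spans 5 years
    return 1.0 - math.exp(-cumulative_rate)


def one_in(risk):
    """Express a risk as '1 in N', rounding N to the nearest whole number."""
    return round(1.0 / risk)


risk = cumulative_risk_0_74([1e-4] * 15)  # hypothetical flat rate of 10 per 100,000
print(f"{100 * risk:.2f}%, 1 in {one_in(risk)}")
```

For small risks the exponential term makes little difference (1 - exp(-0.0075) is about 0.0075), but it keeps the result below 100% for common cancers.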

Prevalence

Fifteen-year prevalence was estimated as the total number of individuals diagnosed between 1/1/1994 and 31/12/2008 who were still alive on 31/12/2008. Numbers are given separately for those aged under 65 years on 31/12/2008 and for those aged 65 years or older on that date.
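The prevalence count can be sketched as a filter over patient records. This assumes one hypothetical record per individual as (date of birth, date of diagnosis, date of death or None), and treats anyone who died on or before 31/12/2008 as not alive on that date; the record layout is illustrative, not the registry's own.

```python
from datetime import date

WINDOW_START = date(1994, 1, 1)
INDEX_DATE = date(2008, 12, 31)


def age_on(birth, on):
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def prevalence_15yr(records):
    """Return (count aged under 65, count aged 65+) on INDEX_DATE.

    records: iterable of (birth_date, diagnosis_date, death_date or None),
    assumed to hold one record per individual.
    """
    under_65, aged_65_plus = 0, 0
    for birth, diagnosed, died in records:
        if not (WINDOW_START <= diagnosed <= INDEX_DATE):
            continue  # diagnosed outside the 15-year window
        if died is not None and died <= INDEX_DATE:
            continue  # not alive on the index date
        if age_on(birth, INDEX_DATE) < 65:
            under_65 += 1
        else:
            aged_65_plus += 1
    return under_65, aged_65_plus


records = [
    (date(1950, 6, 1), date(2000, 3, 1), None),              # alive, aged 58
    (date(1930, 1, 1), date(1996, 1, 1), None),              # alive, aged 78
    (date(1940, 1, 1), date(1993, 12, 31), None),            # diagnosed before window
    (date(1945, 1, 1), date(2005, 1, 1), date(2007, 5, 5)),  # died before index date
]
print(prevalence_15yr(records))
```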