Summary of Cancer Data 1988-1996
Appendix D: Statistical Methods
Estimated Annual Percent Change (EAPC)
The EAPC was calculated using the same method as employed by the NCI's SEER
program. A regression line was fit to the natural logarithm of the rates (r)
using the calendar year as the independent variable. That is, y = mx + b where
y = ln(r), x = the calendar year, and m is the slope of the line. The EAPC was
estimated as 100 (e^m - 1). The determination of whether the EAPC was different
than zero was made by testing whether the slope of the regression line was statistically
different than zero.
Lifetime Cancer Risk
A theoretical population of 1,000 males
or 1,000 females is "aged" according to current life expectancy. That is, the
current age-specific, all-cause mortality rates are applied to each five-year
age group, 0-4, 5-9, . . ., 95+, to estimate the number of individuals that
would be alive at each age. The individuals surviving into each group are at
risk of developing cancer and the risk is specified by the current age-specific
incidence for the cancer in question. The expected number of cancers for a given
age group is calculated and the number expected in the lifetimes of the theoretical
1,000 person cohort is obtained by summing the expected numbers for each age
group.
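The lifetime risk calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the report: the function name, the example rates, and the simplification that everyone alive at the start of an age group contributes the full five years at risk are all hypothetical. The open-ended 95+ group is also treated as a five-year group here for simplicity.

```python
def lifetime_cancer_cases(mortality_rates, incidence_rates, cohort=1000.0, width=5):
    """Expected cancers over the lifetime of a hypothetical cohort.

    mortality_rates: annual all-cause death rates per person, one per age group.
    incidence_rates: annual cancer incidence rates per person, one per age group.
    """
    alive = cohort
    total_cases = 0.0
    for mortality, incidence in zip(mortality_rates, incidence_rates):
        # Assumption: all persons alive at the start of the group are at risk
        # for the full width of the age group.
        total_cases += alive * incidence * width
        # Apply all-cause mortality to find survivors entering the next group.
        alive *= (1.0 - mortality) ** width
    return total_cases


# Made-up rates for three age groups, for illustration only.
example_mortality = [0.001, 0.005, 0.02]
example_incidence = [0.0001, 0.001, 0.004]
print(lifetime_cancer_cases(example_mortality, example_incidence))
```

Dividing the result by the cohort size gives the lifetime risk as a proportion.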
Regression
There are many models for creating a regression line that "best fits" empirical data.
The purpose of modeling the data is to smooth out random variation from an underlying
relationship and to enhance the parsimonious interpretation of that relationship.
Least squares regression, used in this report, is one of the methods used to model data
(e.g., cancer incidence rates as a function of calendar year). A straight line
is estimated that minimizes the sum of the squared differences between the observed
and expected values. In this context, the best fitting straight line is the
one that minimizes this sum. Once the characteristics of the best fitting
line are determined, analytic parameters, such as the slope and intercept required
for specific estimates, can be easily defined.
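As an illustration, the least squares fit can be applied to the EAPC calculation described earlier in this appendix. The sketch below is not the report's code. The function name and the sample rates are hypothetical, and the standard error of the slope is included on the assumption that the significance test is the conventional t-test (slope divided by its standard error, with n - 2 degrees of freedom), which the text does not specify.

```python
import math


def eapc(years, rates):
    """Fit ln(rate) = m * year + b by least squares and return the EAPC."""
    y = [math.log(r) for r in rates]  # y = ln(r)
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(years, y))
    m = sxy / sxx                # slope of the best fitting line
    b = y_mean - m * x_mean      # intercept
    # Residual sum of squares, used for the standard error of the slope.
    sse = sum((yi - (m * x + b)) ** 2 for x, yi in zip(years, y))
    slope_se = math.sqrt(sse / (n - 2) / sxx)
    return 100.0 * (math.exp(m) - 1.0), m, slope_se


# Hypothetical rates per 100,000 for 1988-1996, for illustration only.
years = list(range(1988, 1997))
rates = [410.2, 415.8, 421.0, 419.5, 428.3, 431.1, 436.7, 440.2, 445.9]
change, slope, slope_se = eapc(years, rates)
print(f"EAPC = {change:.2f}% per year, slope = {slope:.5f}, SE = {slope_se:.5f}")
```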
Standard Error of Age Standardized Rates
Age-standardized rates are computed from weighted averages of the age-specific rates. The weights
are traditionally calculated from the 1970 U.S. census as the proportion of
the total census that the specific age group represents. Age-standardized rates
are then considered age-adjusted in that differences in age distributions of
two populations will not distort the comparison of the (directly) age-standardized
rates.
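A minimal Python sketch of direct age standardization follows. The function name and all numbers are hypothetical. In particular, the standard population values are placeholders, not actual 1970 U.S. census counts.

```python
def age_standardized_rate(cases, populations, standard_pop):
    """Directly age-standardized rate: weighted average of age-specific rates.

    cases, populations: counts per age group in the study population.
    standard_pop: counts per age group in the standard population.
    """
    total_standard = sum(standard_pop)
    # Each weight is the share of the standard population in that age group.
    weights = [s / total_standard for s in standard_pop]
    return sum(w * d / n for w, d, n in zip(weights, cases, populations))


# Placeholder values for three broad age groups, for illustration only.
cases = [12, 85, 240]
populations = [50000, 40000, 20000]
standard_pop = [400, 350, 250]
rate = age_standardized_rate(cases, populations, standard_pop)
print(f"{rate * 100000:.1f} per 100,000")
```

Because both populations being compared are weighted by the same standard, a difference in their standardized rates does not reflect a difference in age structure.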
Statistical inference about whether two rates differ requires consideration
of the variability (standard error) of the age-standardized rates. Keyfitz (Human
Biology 26:301-7, 1966) developed estimates of the standard error using the
Poisson probability distribution. The larger the population and the resultant
number of cases, the smaller the standard error of the estimated rate.
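The sketch below shows one common Poisson-based standard error for a directly standardized rate, in which the variance of each age-specific count is taken to equal the count itself. It is offered as an assumption about the general approach. It is not necessarily the exact estimator in the cited paper or the one used in the report, and the function name and numbers are hypothetical.

```python
import math


def standardized_rate_se(cases, populations, standard_pop):
    """Approximate standard error of a directly age-standardized rate."""
    total_standard = sum(standard_pop)
    weights = [s / total_standard for s in standard_pop]
    # Poisson assumption: Var(d) is approximately d, so Var(d / n) is d / n^2.
    variance = sum(w ** 2 * d / n ** 2
                   for w, d, n in zip(weights, cases, populations))
    return math.sqrt(variance)


# Same age-specific rates, but the second population is 100 times larger.
small = standardized_rate_se([10, 20], [1000, 1000], [3, 1])
large = standardized_rate_se([1000, 2000], [100000, 100000], [3, 1])
print(small, large)
```

In this example the larger population yields a standard error one tenth the size of the smaller one, consistent with the statement above.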
Standardized Incidence Ratio (SIR)
The SIR is a ratio of two similarly age-standardized incidence rates. Under the
null hypothesis that the two populations from which the rates were drawn have
the same underlying incidence, the expected value of the SIR is 1.0.
Standardized Morbidity (Mortality) Ratios (SMR)
SMRs are calculated from the ratio of the number of observed cancers (deaths) to
that expected based on the assumption of an underlying rate. The observed numbers
are just that, the number of cancers or cancer deaths that occurred during a
specific time period. The expected numbers are calculated using a standard rate
such as the rates for the entire state of Minnesota. The number of persons in
each age group of the community of interest is multiplied by the standard rate
for that age group to estimate the number of cancers that would occur if the
community rate and the state's rate were the same. The numbers for each age group
are summed, and the result represents the total number of expected cancers for
the community if the community had experienced the state's cancer incidence rates.
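The expected-count calculation and the resulting ratio can be sketched in Python as follows. The function names and all numbers are hypothetical and are not taken from the report.

```python
def expected_cases(community_pop, standard_rates):
    """Sum over age groups of community population times the standard rate."""
    return sum(n * r for n, r in zip(community_pop, standard_rates))


def smr(observed, community_pop, standard_rates):
    """Ratio of observed cancers (or deaths) to the number expected."""
    return observed / expected_cases(community_pop, standard_rates)


# Made-up community population and standard (e.g., statewide) rates
# for three age groups, for illustration only.
community_pop = [3000, 2500, 1200]
state_rates = [0.0002, 0.002, 0.01]
observed = 21
print(expected_cases(community_pop, state_rates))
print(smr(observed, community_pop, state_rates))
```

A ratio above 1.0 indicates more observed cancers than expected under the standard rates, and a ratio below 1.0 indicates fewer.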