Youth Substance Use: State Estimates From the 1999 National Household Survey on Drug Abuse
Appendix F: Limitations of the Data
F.1 Target Population
An important limitation of the National Household Survey on Drug Abuse (NHSDA) estimates of drug use prevalence is that they are designed to describe only the target population of the survey: the civilian, noninstitutionalized population aged 12 or older. Although this population includes almost 98 percent of the total U.S. population aged 12 or older, it excludes some important and unique subpopulations whose drug use patterns may differ considerably. The survey excludes active-duty military personnel, who have been shown to have significantly lower rates of illicit drug use. Persons living in institutional group quarters, such as prisons and residential drug treatment centers, are not covered in the NHSDA and have been shown in other surveys to have higher rates of illicit drug use. Also excluded are homeless persons not living in a shelter on the survey date, another population shown to have higher than average rates of illicit drug use. Appendix H describes other surveys that provide data for these populations.
The sampling error of an estimate is the error caused by selecting a sample instead of conducting a census of the population. Sampling error is reduced by selecting a large sample and by using efficient sample design and estimation strategies, such as stratification, optimal allocation, and ratio estimation.
With the use of probability sampling methods in the NHSDA, it is possible to develop estimates of sampling error from the survey data. These estimates have been calculated for all prevalence estimates presented in this report using a Taylor series linearization approach that takes into account the effects of the complex NHSDA design features. The sampling errors are used to identify unreliable estimates and to test for the statistical significance of differences between estimates.
As was done in the past, direct survey estimates considered unreliable because of unacceptably large sampling error are not shown in this report and are marked by asterisks (*) in the tables containing such estimates. The criterion used for suppressing all direct survey estimates was based on the relative standard error (RSE), defined as the ratio of the standard error to the estimate.
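For proportion estimates p in the range 0 < p < 1, rates and the corresponding estimated numbers of users were suppressed if

$$\frac{se(p)/p}{-\ln(p)} > 0.175 \quad \text{when } p < 0.5$$

or

$$\frac{se(p)/(1-p)}{-\ln(1-p)} > 0.175 \quad \text{when } p > 0.5.$$

To first order, the left-hand side of the first inequality is the RSE of -ln(p), and that of the second is the RSE of -ln(1 - p).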
This is an ad hoc rule that requires an effective sample size in excess of 50 when 0.10 < p < 0.90. As p approaches 0.00 or 1.00, it requires increasingly larger effective sample sizes. Estimates were also suppressed if they were close to 0 or 100 percent (if p < .00005 or if p > .99995).
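The effective-sample-size claim can be checked numerically. The sketch below assumes simple random sampling, so that se(p) = sqrt(p(1 - p)/n) for an effective sample size n, and solves the suppression inequality for the smallest n at which an estimate would be shown. The function name `min_effective_n` is hypothetical.

```python
import math

THRESHOLD = 0.175  # cutoff on the relative standard error of -ln(p)


def min_effective_n(p: float) -> float:
    """Smallest effective sample size at which proportion p is not suppressed.

    Assumes se(p) = sqrt(p * (1 - p) / n), i.e., simple random sampling with
    effective sample size n. For p > 0.5 the rule is applied to q = 1 - p;
    at p = 0.5 the two forms of the rule coincide.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = p if p <= 0.5 else 1.0 - p
    # Setting [sqrt(q(1-q)/n) / q] / (-ln q) = THRESHOLD and solving for n:
    return (1.0 - q) / (q * math.log(q) ** 2 * THRESHOLD ** 2)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.20, 0.50, 0.90, 0.99):
        print(f"p = {p:.2f}: n_eff must exceed {min_effective_n(p):.1f}")
```

Under this assumption the required effective sample size bottoms out at roughly 50 near p = 0.2, rises to about 68 at p = 0.5, and grows quickly toward the extremes (about 152 at p = 0.01).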
For estimates of other totals and of means (which are not bounded between 0 and 1), estimates were suppressed if

$$\frac{se(\hat{x})}{\hat{x}} > 0.5,$$

where $\hat{x}$ is the estimated total or mean. Additionally, estimates of mean age were suppressed if the sample size was smaller than 10 respondents.
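The suppression rules above can be collected into two checks. This is a minimal sketch: the function and argument names (`suppress_proportion`, `suppress_total_or_mean`, `is_mean_age`, `n_respondents`) are hypothetical, and the standard errors are assumed to come from the Taylor series variance estimation described earlier.

```python
import math
from typing import Optional

RSE_LN_CUTOFF = 0.175  # cutoff used for proportions
RSE_CUTOFF = 0.5       # cutoff used for totals and means


def suppress_proportion(p: float, se: float) -> bool:
    """True if a proportion estimate (and its estimated number of users) is suppressed."""
    # Estimates very close to 0 or 100 percent are always suppressed.
    if p < 0.00005 or p > 0.99995:
        return True
    # The rule is stated for p < 0.5 and p > 0.5; both forms agree at p = 0.5.
    if p <= 0.5:
        return (se / p) / -math.log(p) > RSE_LN_CUTOFF
    return (se / (1.0 - p)) / -math.log(1.0 - p) > RSE_LN_CUTOFF


def suppress_total_or_mean(estimate: float, se: float,
                           is_mean_age: bool = False,
                           n_respondents: Optional[int] = None) -> bool:
    """True if a total or mean (not bounded by 0 and 1) is suppressed.

    Assumes a positive estimate.
    """
    if se / estimate > RSE_CUTOFF:
        return True
    # Estimates of mean age additionally require at least 10 respondents.
    return is_mean_age and n_respondents is not None and n_respondents < 10
```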
When making comparisons of estimates for different population subgroups from the same data year, the covariance term, which is usually small and positive, has typically been ignored. This results in somewhat conservative tests of hypotheses that will sometimes fail to establish statistical significance when in fact it exists.
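A short numerical sketch of why ignoring the covariance is conservative: the variance of a difference is Var(a - b) = Var(a) + Var(b) - 2Cov(a, b), so dropping a positive covariance overstates that variance and shrinks the test statistic. The function name `z_statistic` and all inputs are illustrative, not NHSDA estimates.

```python
import math


def z_statistic(a: float, b: float, se_a: float, se_b: float, cov_ab: float = 0.0) -> float:
    """Test statistic for the difference a - b, optionally including Cov(a, b)."""
    var_diff = se_a ** 2 + se_b ** 2 - 2.0 * cov_ab
    return (a - b) / math.sqrt(var_diff)


# Hypothetical prevalence estimates for two subgroups (illustrative values only).
a, b = 0.110, 0.095
se_a, se_b = 0.006, 0.005
cov_ab = 0.000006  # a small positive covariance

z_ignored = z_statistic(a, b, se_a, se_b)        # covariance ignored
z_full = z_statistic(a, b, se_a, se_b, cov_ab)   # covariance included
print(f"z ignoring covariance:  {z_ignored:.3f}")
print(f"z including covariance: {z_full:.3f}")
```

With these made-up numbers the statistic moves from about 1.92 to about 2.14, so the difference would be declared significant at the two-sided 5 percent level (critical value 1.96) only when the covariance is included.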
Nonsampling errors arise from nonresponse, coding errors, computer processing errors, errors in the sampling frame, reporting errors, and other sources. Nonsampling errors are reduced through data editing, statistical adjustments for nonresponse, and close monitoring and periodic retraining of interviewers.
Although nonsampling errors can often be much larger than sampling errors, measurement of most nonsampling errors is difficult or impossible. However, some indication of the effects of some types of nonsampling errors can be obtained through proxy measures, such as response rates, and from other research studies.
Response rates for the NHSDA were stable for the period from 1994 to 1998, with the screening response rate at about 93 percent and the interview response rate at about 78 percent. Of the 187,842 eligible households sampled for the 1999 NHSDA main study, 169,166 were successfully screened for a weight-adjusted screening response rate of 89.6 percent. In these screened households, a total of 89,883 sample persons were selected, and completed interviews were obtained from 66,706 of these sample persons, for a weighted interview response rate of 68.6 percent. A total of 11,276 (18.0 percent) sample persons were classified as refusals, 5,692 (6.7 percent) were not available or never at home, and 6,209 (6.8 percent) did not participate for various other reasons, such as physical or mental incompetence or language barrier. The response rate was highest among the 12- to 17-year-old age group (78.1 percent). The response rate was 71.2 percent for the 18- to 25-year-old age group and 66.7 percent for adults aged 26 or older.
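The rates quoted above are weighted. As a quick arithmetic check, the unweighted rates implied by the raw counts can be computed directly; they differ from the weight-adjusted figures, which cannot be reproduced from the counts alone.

```python
# Counts reported for the 1999 NHSDA main study.
eligible_households = 187_842
screened_households = 169_166
selected_persons = 89_883
completed_interviews = 66_706
refusals = 11_276
not_available = 5_692
other_nonresponse = 6_209

# Unweighted rates (the report's 89.6 and 68.6 percent figures are weighted).
screening_rate = screened_households / eligible_households
interview_rate = completed_interviews / selected_persons

# The four outcome categories should account for every selected person.
outcomes_total = completed_interviews + refusals + not_available + other_nonresponse

print(f"Unweighted screening rate: {screening_rate:.1%}")
print(f"Unweighted interview rate: {interview_rate:.1%}")
print(f"Outcomes account for all selected persons: {outcomes_total == selected_persons}")
```

The unweighted rates (about 90.1 and 74.2 percent) are higher than the weighted rates, and the four outcome counts sum exactly to the 89,883 selected persons.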
The increase in nonresponse in the 1999 NHSDA can be attributed primarily to an insufficient number of field interviewers (FIs) and their inexperience. Recruiting and training FIs were major challenges because of the number required for the large sample and the tight labor market, which resulted in a relatively inexperienced FI staff. Of the 2,010 FIs hired and trained, more than a third (37.6 percent) did not complete the survey year. Both prior NHSDA experience and on-the-job experience were shown to be related to nonresponse: previously experienced interviewers and interviewers with one, two, or three quarters of on-the-job experience were more successful at obtaining an interview. Overall nonresponse was also shown to reflect the combined influences of urbanicity and the respondent's age and gender. Interviews were completed at a higher rate in rural areas than in urban areas, and among younger and female respondents.
Among survey participants, item response rates were above 98 percent for most questionnaire items. However, inconsistent responses are common for some items, including the drug use items. Estimates of drug use from the NHSDA are based on respondents' answers to multiple questions, so that the maximum amount of information is used in determining whether a respondent is classified as a drug user. Inconsistencies in responses are resolved through a logical editing process that involves some judgment on the part of survey analysts and is a potential source of nonsampling error. Because of the automatic routing in the computer-assisted interviewing (CAI) questionnaire (e.g., skipping an entire module when the lifetime drug use question is answered "no"), less editing of this type is needed than with the paper-and-pencil interviewing (PAPI) questionnaire used in previous years. In addition, less logical editing is used with the CAI data because statistical imputation is relied upon more heavily to determine the final values of drug use variables in cases where logical editing could otherwise have been used. The combined amount of editing and imputation in the CAI data is still considerably less than the total amount in the PAPI data. For the 1999 CAI data, 2 percent of the estimate of past month hallucinogen use was based on logically edited cases and 4 percent on imputed cases, for a combined 6 percent. In the 1998 NHSDA, the amounts of editing and imputation for past month hallucinogen use were 60 and 0 percent, respectively, for a total of 60 percent. The combined amount of editing and imputation for the estimate of past month heroin use was 15 percent for the 1999 CAI data and 37 percent for the 1998 PAPI data.
NHSDA estimates are based on self-reports of drug use, and their value depends on respondents' truthfulness and memory. Although many studies have generally established the validity of self-report data and the NHSDA procedures were designed to encourage honesty and recall, some degree of underreporting is assumed. No adjustment to NHSDA data is made to correct for this. (Appendix H mentions a number of references addressing the validity of self-reported drug use data.) The methodology used in the NHSDA has been shown to produce more valid results than other self-report methods (e.g., by telephone) (Aquilino, 1994; Turner, Lessler, & Gfroerer, 1992). However, comparisons of NHSDA data with data from surveys conducted in classrooms suggest that underreporting of drug use by youths in their homes may be substantial (Gfroerer, 1993; Gfroerer, Wright, & Kopstein, 1997).
The following is a general description of the procedure used to measure incidence rates and some of the limitations of those data. Although much of the discussion here is applicable to the incidence estimates discussed in Chapter 4, the actual calculations in that chapter are based on the formula in Section 1.3.1.
For diseases, the incidence rate, IR, for a population is defined as the number of new cases of the disease, N, divided by the person time, PT, of exposure, or

$$IR = N / PT.$$
The person time of exposure can be measured for the full period of the study or for a shorter period. The person time of exposure ends at the time of diagnosis (e.g., Greenberg, Daniels, Flanders, Eley, & Boring, 1996, pp. 16-19). Similar conventions were followed for the NHSDA when defining the incidence of first use of a substance.
In order to stabilize the annual rate, the incidence was calculated over a 2-year period and later divided by 2. The time period for recording incidence cases in this report was the 24 months prior to the date of interview. This moving 2-year window for defining incidence cases differs from the calendar year time periods used to estimate incidence at the national level. An approximation was also used to simplify the estimation of the person time, PT, of exposure in the denominator of the incidence rate. It was assumed that the date of first use for initiates was uniformly distributed over the 2 years prior to the interview. With this assumption, the expected number of 2-year units of exposure experienced by initiates was (½)N because the expected fraction of the interval that initiates were at risk was (½).
If O denotes the number of persons who would report never having used marijuana if a census of the population were conducted, the number of 2-year units of exposure experienced by the population at risk at the beginning of the period is $PT = \tfrac{1}{2}N + O$, because each of the O persons who had still not used marijuana at the time of interview was exposed for one full 2-year period. This leads to a 2-year incidence rate of the form

$$IR_2 = N \div \left[\tfrac{1}{2}N + O\right].$$

The average annual incidence rate (AAIR) over the 2 years prior to the interview is then defined as $AAIR = IR_2 \div 2$.
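The 2-year incidence rate and the AAIR follow directly from the two counts. The sketch below uses hypothetical counts N and O, chosen only for illustration, and applies the uniform-initiation assumption (each initiate contributes half of a 2-year unit of exposure). The function name is illustrative.

```python
def average_annual_incidence_rate(n_initiates: float, n_never_users: float) -> float:
    """AAIR assuming initiation dates are uniform over the 2-year window."""
    # Person time in 2-year units: initiates were at risk for half the window on
    # average; never users were at risk for the whole window.
    person_time = 0.5 * n_initiates + n_never_users
    ir2 = n_initiates / person_time  # 2-year incidence rate
    return ir2 / 2.0                 # average annual rate


# Hypothetical counts (not NHSDA estimates): 150 past-24-month initiates, 1,850 never users.
aair = average_annual_incidence_rate(150, 1_850)
print(f"AAIR = {aair:.4f}")
```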
The AAIR is an appealing approximation because it can be recast in terms of two population prevalences that can be estimated by the survey-weighted hierarchical Bayes software developed for NHSDA small area estimation (SAE). This software fits logistic mixed models to binary (one/zero) outcome variables. Letting M denote the total survey-eligible population, with $P_I \equiv N/M$ and $P_O \equiv O/M$ denoting the associated population fractions of past 24-month initiates and never users, respectively, then

$$AAIR = \tfrac{1}{2}\left\{ P_I \div \left[\tfrac{1}{2}P_I + P_O\right] \right\}.$$
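The prevalence form follows from the count form by dividing the numerator and denominator of the 2-year rate by M:

```latex
AAIR = \frac{1}{2}\cdot\frac{N}{\tfrac{1}{2}N + O}
     = \frac{1}{2}\cdot\frac{N/M}{\tfrac{1}{2}(N/M) + O/M}
     = \frac{1}{2}\cdot\frac{P_I}{\tfrac{1}{2}P_I + P_O}
     = \frac{P_I}{P_I + 2P_O}
```

Because M cancels, the AAIR depends only on the two prevalences, which is what allows each of them to be estimated separately with the small area models.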
The national incidence estimate uses the reported month and day of initiation to calculate the PT of exposure for initiates. This national calculation uses the observed average fraction of the time period that initiates are at risk, say $\bar{f}_I$, in place of the assumed uniform fraction of ½ in the calculation of PT. Although the uniform distribution assumption for initiation dates will lead to some bias relative to the estimator incorporating $\bar{f}_I$, State by age-group-specific versions of $\bar{f}_I$ could not be used to form average annual incidence estimates because they would be much too unstable. Jointly modeling $P_I$, $P_O$, and $\bar{f}_I$ would be the ideal solution, but this is currently beyond the scope of the project.
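A small numerical sketch shows how the two treatments of initiates' exposure differ. The per-initiate at-risk fractions and the never-user count below are hypothetical, survey weights are omitted, and the function name `aair` is illustrative.

```python
from statistics import mean


def aair(n_initiates: int, n_never_users: int, initiate_fraction: float) -> float:
    """AAIR with each initiate contributing `initiate_fraction` of a 2-year unit."""
    person_time = initiate_fraction * n_initiates + n_never_users
    return (n_initiates / person_time) / 2.0


# Hypothetical fractions of the 24-month window that each initiate spent at risk.
at_risk_fractions = [0.15, 0.30, 0.40, 0.55, 0.70, 0.85, 0.90]
n_never = 90

uniform = aair(len(at_risk_fractions), n_never, 0.5)                       # uniform 1/2
observed = aair(len(at_risk_fractions), n_never, mean(at_risk_fractions))  # observed mean
print(f"uniform: {uniform:.5f}  observed (mean {mean(at_risk_fractions):.2f}): {observed:.5f}")
```

Because the observed mean fraction in this made-up example (0.55) exceeds one-half, the uniform assumption slightly understates person time and so slightly overstates the rate; with real data the direction and size of the difference depend on how initiation dates are actually distributed.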
A more important distinction between the model-based State-level average annual incidence estimates and their design-based national analogs is the way that age groups are handled. To produce the age-group-specific average annual incidence estimates, we simply condition the $P_I$ and $P_O$ prevalence estimates on the survey respondents' age at interview. This is consistent with how all the State-level age-specific small area estimates are produced. The design-based national estimates, on the other hand, assign incidence cases to age groups according to the respondents' age at initiation. Thus, in the State-level estimates, someone just turning 12 at the time of the interview could have his or her initiation included in the 12- to 17-year-old count of incidence cases even if it occurred when he or she was just turning 10. Similarly, respondents aged 18 or 19 at the time of the interview who reported first use during the 24-month period prior to the interview, when they were 17, would not have their initiation included in the 12- to 17-year-old value of N.
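The two conventions for assigning an initiate to an age group can be contrasted with a few hypothetical respondents. The helper `age_group` and the respondent records are illustrative only.

```python
from typing import Optional


def age_group(age: float) -> Optional[str]:
    """Map an age in years to the report's age groups (None if younger than 12)."""
    if age < 12:
        return None
    if age < 18:
        return "12-17"
    if age < 26:
        return "18-25"
    return "26+"


# (age at interview, age at first use within the past 24 months), hypothetical.
initiates = [(12.0, 10.1), (14.5, 13.0), (18.5, 17.2), (19.0, 18.4)]

for at_interview, at_initiation in initiates:
    state_group = age_group(at_interview)       # State model-based: age at interview
    national_group = age_group(at_initiation)   # national design-based: age at initiation
    print(f"interview {at_interview}, initiation {at_initiation}: "
          f"State={state_group}, national={national_group}")
```

The first and third respondents land in different groups under the two conventions, mirroring the two cases described in the text.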
The assignment of exposure fractions also differs between the design-based and model-based estimates. In the national design-based estimate, the fraction of the time interval that an initiate is at risk is restricted to the fraction during which he or she is both at risk and in the age interval. In the calculation of PT, the average of these age-a-restricted fractions, say $\bar{f}_{I,a}$, multiplies the count of initiations that occur to respondents when they are aged a, say $N_a$. The never users' exposure time of one unit is also restricted to the fraction of the time interval during which they are aged a. If the average of these age-restricted exposure fractions for never users is $\bar{f}_{O,a}$, then

$$PT_a = \bar{f}_{I,a}\,N_a + \bar{f}_{O,a}\,O_a,$$

where $O_a$ is the count of never users at interview who have nonzero fractions of the 2-year time period when they are aged a. This difference in how age grouping is handled means that the national design-based incidence estimates and the national aggregates of the State-level model-based estimates will not be comparable. The State-level model-based estimates are incidence-like rates that can be compared across States.
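The age-restricted person time can be sketched for a handful of hypothetical respondents. This follows the verbal description literally and is unweighted; the function names are illustrative, ages are treated as continuous, and the survey's production calculation may differ in detail. Summing the individual fractions is equivalent to multiplying each average fraction by its count.

```python
from typing import Iterable, Tuple

WINDOW = 2.0  # length of the incidence window, in years


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def initiate_fraction(age_at_interview: float, age_at_initiation: float,
                      lo: float, hi: float) -> float:
    """Fraction of the window an initiate was both at risk and aged in [lo, hi)."""
    start = age_at_interview - WINDOW
    return overlap(start, age_at_initiation, lo, hi) / WINDOW


def never_user_fraction(age_at_interview: float, lo: float, hi: float) -> float:
    """Fraction of the window a never user spent aged in [lo, hi)."""
    start = age_at_interview - WINDOW
    return overlap(start, age_at_interview, lo, hi) / WINDOW


def person_time(initiates: Iterable[Tuple[float, float]],
                never_users: Iterable[float], lo: float, hi: float) -> float:
    """Age-restricted PT: initiations at ages in [lo, hi) plus never users' exposure."""
    pt = sum(initiate_fraction(at_int, at_init, lo, hi)
             for at_int, at_init in initiates if lo <= at_init < hi)
    # Never users with no time in the age interval contribute zero.
    pt += sum(never_user_fraction(age, lo, hi) for age in never_users)
    return pt


# Hypothetical respondents for the 12-17 age group.
initiates = [(13.0, 12.5)]          # (age at interview, age at first use)
never_users = [13.5, 19.0, 20.5]    # ages at interview
print(person_time(initiates, never_users, 12, 18))
```

Here the 19-year-old never user still contributes half a unit, because half of that respondent's 2-year window fell before age 18.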
The 95 percent prediction intervals quoted for the AAIRs also involved an approximation. Because the $P_I$ and $P_O$ prevalences were modeled separately, there was no direct way to produce 95 percent prediction intervals for the State-level AAIRs that would account for the posterior correlation between the two prevalences. The Pearson correlation between the two sets of State-level prevalence estimates was used as a substitute.
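To illustrate why the correlation matters, one common first-order approximation (the delta method) expresses the uncertainty in AAIR = P_I / (P_I + 2P_O) in terms of the two standard errors and their correlation. This is only an illustration: the report does not state how the substitute correlation entered its interval calculation, and all inputs and function names below are hypothetical.

```python
import math


def aair_from_prevalences(p_i: float, p_o: float) -> float:
    """AAIR = (1/2) * P_I / ((1/2) * P_I + P_O) = P_I / (P_I + 2 * P_O)."""
    return p_i / (p_i + 2.0 * p_o)


def aair_se_delta(p_i: float, p_o: float, se_i: float, se_o: float, rho: float) -> float:
    """First-order (delta-method) standard error of the AAIR given correlation rho."""
    denom = (p_i + 2.0 * p_o) ** 2
    d_i = 2.0 * p_o / denom    # partial derivative with respect to P_I
    d_o = -2.0 * p_i / denom   # partial derivative with respect to P_O
    var = (d_i * se_i) ** 2 + (d_o * se_o) ** 2 + 2.0 * d_i * d_o * rho * se_i * se_o
    return math.sqrt(var)


# Hypothetical State-level prevalences and standard errors.
p_i, p_o, se_i, se_o = 0.08, 0.70, 0.010, 0.020
est = aair_from_prevalences(p_i, p_o)
for rho in (0.0, 0.5):
    se = aair_se_delta(p_i, p_o, se_i, se_o, rho)
    print(f"rho = {rho}: AAIR = {est:.4f}, approx. 95% interval "
          f"({est - 1.96 * se:.4f}, {est + 1.96 * se:.4f})")
```

Because the AAIR rises with P_I and falls with P_O, a positive correlation between the two prevalences narrows the interval in this approximation, so ignoring it would overstate the uncertainty.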
Bias due to differential mortality occurs because some persons who were alive and exposed to the risk of first drug use in the historical periods shown in the tables died before the 1999 NHSDA was conducted. This bias is probably very small for estimates shown in this report. Incidence estimates are also affected by memory errors, including recall decay (the tendency to forget events occurring long ago) and forward telescoping (the tendency to report that an event occurred more recently than it actually did). These memory errors would both tend to result in estimates for earlier years (i.e., the 1960s and 1970s) that are downwardly biased (because of recall decay) and estimates for later years that are upwardly biased (because of telescoping). There is also likely to be some underreporting bias due to the social unacceptability of drug use behaviors and respondents' fear of disclosure. This is likely to have the greatest impact on recent estimates, which reflect more recent use and reporting by younger respondents. Finally, for drug use that is frequently initiated at age 10 or younger, estimates based on retrospective reports 1 year later underestimate total incidence because 11-year-old children are not sampled by the NHSDA. Prior analyses showed that alcohol and cigarette (any use) incidence estimates could be significantly affected by this. Therefore, no 1998 estimates were made for these drugs.
Johnson, Gerstein, and Rasinski (1998) concluded that the marijuana incidence trend from the NHSDA was biased because the reporting of initiation declines as the length of time between initiation and the survey increases. However, this study did not address very recent estimates (i.e., 1996 to 1998), which could be biased because they reflect recent drug use and because they are heavily based on the reports of adolescents. To better understand the size of the biases and to assess the reliability of estimates for recent years, the Office of Applied Studies (OAS) analyzed estimates based on single years of NHSDA data. This analysis focused on three drugs: marijuana, cocaine, and heroin. Using the survey data from 1994 to 1998, estimates were made of the number of initiates, the rate of initiation for youths aged 12 to 17, and the rate of initiation for persons aged 18 to 25. From the 1994 survey, an estimate was made for the year 1993; from the 1995 survey, another estimate was made for 1993. In this way, two recent estimates of the same year could be compared. Similarly, the 1995 and 1996 surveys provided two estimates for 1994, the 1996 and 1997 surveys provided two estimates for 1995, and the 1997 and 1998 surveys provided two estimates for 1996. Because each pair represents two measurements of the same population characteristic, the two values would ideally be the same. Examples of these estimates are shown in the following table.
Drug initiation rates for youths aged 12 to 17 for the more hard-core drugs (such as cocaine and heroin) appear to be most prone to bias. For example, averaged across the 4 initiation years, the estimate of the rate of initiation of cocaine use among youths aged 12 to 17 was 48 percent higher the first time the estimate could be made than the second time. This indicates a probable bias in the estimation; however, it is unclear which estimate is correct. As a result, one should be cautious in interpreting any changes between the prior year and the most recent year in youth initiation rates for the more stigmatized drugs. Because only 5 years of data were used to estimate how the incidence rate changes between the first year it can be estimated and the second, one should also be cautious about inferring the magnitude of the bias (e.g., that it is 48 percent for cocaine). In 1999 and thereafter, the youth and young adult samples will be much larger, and more precise estimates of the bias will be possible.
| Measure | Drug | 1993 (1994 survey) | 1993 (1995 survey) | 1994 (1995 survey) | 1994 (1996 survey) | 1995 (1996 survey) | 1995 (1997 survey) | 1996 (1997 survey) | 1996 (1998 survey) | Average of ratio of 1-year recall to 2-year recall |
|---|---|---|---|---|---|---|---|---|---|---|
| Rate for youths aged 12 to 17 | Marijuana | 59.2 | 53.7 | 74.2 | 75.2 | 75.7 | 73.6 | 83.2 | 75.6 | 1.055 |
| | Cocaine | 8.9 | 5.0 | 10.2 | 5.7 | 10.6 | 8.0 | 11.3 | 11.0 | 1.480 |
| | Heroin | 0.7 | 0.5 | 2.1 | 1.4 | 2.5 | 1.8 | 3.9 | 1.5 | 1.722 |
| Rate for young adults aged 18 to 25 | Marijuana | 46.9 | 41.4 | 42.1 | 55.9 | 47.7 | 53.4 | 53.6 | 50.5 | 0.960 |
| | Cocaine | 12.8 | 12.8 | 9.9 | 11.8 | 13.8 | 14.7 | 14.8 | 13.9 | 0.961 |
| | Heroin | 0.1 | 1.4 | 1.4 | 2.1 | 2.4 | 1.9 | 2.3 | 3.0 | 0.692 |
| Number of initiates | Marijuana | 2,035 | 1,783 | 2,251 | 2,548 | 2,368 | 2,443 | 2,540 | 2,384 | 1.015 |
| | Cocaine | 595 | 538 | 533 | 530 | 652 | 654 | 675 | 664 | 1.031 |
| | Heroin | 41 | 62 | 122 | 97 | 141 | 93 | 171 | 127 | 1.195 |
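The last column of the table can be reproduced from the other columns. This sketch assumes, as the values imply, that each ratio divides the estimate from the survey conducted 1 year after the initiation year by the estimate from the survey conducted 2 years after it, and that the four ratios are averaged with equal weight. The dictionary keys and the function name are illustrative.

```python
# Estimates in column order: 1993 (1994, 1995 surveys), 1994 (1995, 1996),
# 1995 (1996, 1997), 1996 (1997, 1998).
TABLE = {
    ("Youths 12-17", "Marijuana"): [59.2, 53.7, 74.2, 75.2, 75.7, 73.6, 83.2, 75.6],
    ("Youths 12-17", "Cocaine"): [8.9, 5.0, 10.2, 5.7, 10.6, 8.0, 11.3, 11.0],
    ("Youths 12-17", "Heroin"): [0.7, 0.5, 2.1, 1.4, 2.5, 1.8, 3.9, 1.5],
    ("Young adults 18-25", "Marijuana"): [46.9, 41.4, 42.1, 55.9, 47.7, 53.4, 53.6, 50.5],
    ("Young adults 18-25", "Cocaine"): [12.8, 12.8, 9.9, 11.8, 13.8, 14.7, 14.8, 13.9],
    ("Young adults 18-25", "Heroin"): [0.1, 1.4, 1.4, 2.1, 2.4, 1.9, 2.3, 3.0],
    ("Number of initiates", "Marijuana"): [2035, 1783, 2251, 2548, 2368, 2443, 2540, 2384],
    ("Number of initiates", "Cocaine"): [595, 538, 533, 530, 652, 654, 675, 664],
    ("Number of initiates", "Heroin"): [41, 62, 122, 97, 141, 93, 171, 127],
}


def average_recall_ratio(row):
    """Mean over initiation years of (1-year-recall estimate / 2-year-recall estimate)."""
    ratios = [first / second for first, second in zip(row[0::2], row[1::2])]
    return sum(ratios) / len(ratios)


for (group, drug), row in TABLE.items():
    print(f"{group:<20} {drug:<10} {average_recall_ratio(row):.3f}")
```

Running this should reproduce each published average to within rounding, which supports reading the column pairs as (1-year recall, 2-year recall) estimates of the same initiation year.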
This page was last updated on June 03, 2008.
SAMHSA, an agency in the Department of Health and Human Services, is the Federal Government's lead agency for improving the quality and availability of substance abuse prevention, addiction treatment, and mental health services in the United States.