Sample Construction

Most of the statistical formulas associated with sampling theory are based upon the assumption of simple random sampling. Specifically, the formulas for sampling precision (estimates of sampling variance) at a given sample size are premised on simple random sampling. Unfortunately, simple random sampling requires that every element in the population have an equal chance of being selected, which in turn requires a complete enumeration of that population. Since no enumeration of the total population of the United States (or its subdivisions) is available, all surveys of the general public are based upon an approximation of the actual population, and survey samples are generated by a process closely resembling true random sampling.
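As an illustration of the precision formulas referred to above, the following minimal sketch computes the margin of error for an estimated proportion under simple random sampling, using the standard normal approximation. The function name and the 95 percent multiplier (z = 1.96) are assumptions made for this example. The sample size of 4,500 is the national cross-section size named in the title of Table 1. The calculation ignores any design effect from stratification or weighting.

```python
import math


def srs_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a
    proportion p estimated from a simple random sample of size n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard error of a proportion under simple random sampling.
    standard_error = math.sqrt(p * (1.0 - p) / n)
    return z * standard_error


# Worst case (p = 0.5) for a sample of 4,500 respondents.
moe = srs_margin_of_error(0.5, 4500)
print(f"Margin of error: +/- {100 * moe:.2f} percentage points")
```

Under these assumptions the worst-case margin of error is roughly 1.5 percentage points. Estimates based on subgroups of the sample would have wider margins.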

The survey samples were based on a modified stratified random digit dialing method, using an area probability/RDD sample rather than a single-stage/RDD sample. There are several important advantages to using an area probability base: (1) it draws the sample proportionate to the geographic distribution of the target population rather than the geographic distribution of telephone households, which is vital to constructing unbiased population estimates from telephone surveys; (2) it allows greater geographic stratification of the sample to control for known geographic differences in nonresponse rates; and (3) it facilitates the use of Census estimates of population characteristics to weight the completed sample to correct for other forms of sampling and non-sampling bias. Moreover, the precision of sample estimates is generally improved by stratification.

Hence, as specified in the study design for the survey, the adult household population of the United States was stratified by the ten NHTSA regions. The estimated distribution of the population for 2003 by stratum was calculated on the basis of the Bureau of the Census, State Population Projections: 2001 to 2005 (release date, November 2, 2000). At the time of the survey, these were the most recent projections of the distribution of the adult population by state. Based on the Census data on the geographic distribution of the target population, the total sample was proportionately allocated by stratum. The expected geographic allocation of the cross-sectional sample for the survey is presented in Table 1.

Table 1. Population Aged 16 and Older by NHTSA Region:
2003 Projections for National Cross-Section Survey of 4,500

Region        States
Region I      CT, ME, MA, NH, RI, VT
Region II     NJ, NY
Region III    DE, DC, MD, PA, VA, WV
Region IV     AL, FL, GA, KY, MS, NC, SC, TN
Region V      IL, IN, MI, MN, OH, WI
Region VI     AR, LA, NM, OK, TX
Region VII    IA, KS, MO, NE
Region VIII   CO, MT, ND, SD, UT, WY
Region IX     AZ, CA, HI, NV
Region X      AK, ID, OR, WA

Source: State Population Projections: 2001 to 2005, U.S. Bureau of the Census, Population Division, Population Projections Branch (release date, November 2, 2000).
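The proportionate allocation described above can be sketched as follows. This is an illustrative example only: the stratum names and population counts are placeholders, not the Census projections used in the study, and the largest-remainder rounding rule is an assumption, since the report does not say how fractional allocations were rounded.

```python
def allocate_proportionally(populations: dict, total_sample: int) -> dict:
    """Split total_sample across strata in proportion to population,
    rounding by the largest-remainder rule so the parts sum exactly."""
    total_population = sum(populations.values())
    # Exact (fractional) share of the sample for each stratum.
    exact = {s: total_sample * pop / total_population
             for s, pop in populations.items()}
    # Start from the integer part of each share.
    allocation = {s: int(share) for s, share in exact.items()}
    shortfall = total_sample - sum(allocation.values())
    # Hand the leftover interviews to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s],
                          reverse=True)
    for stratum in by_remainder[:shortfall]:
        allocation[stratum] += 1
    return allocation


# Placeholder populations for three hypothetical strata (not real data).
example_populations = {"Stratum A": 5_000_000,
                       "Stratum B": 3_000_000,
                       "Stratum C": 2_000_000}
print(allocate_proportionally(example_populations, 4500))
```

With real inputs, the dictionary would hold the projected population aged 16 and older for each of the ten NHTSA regions, and the result would give the expected number of interviews per region.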

Once the sample had been geographically stratified with sample allocation proportionate to population distribution, a sample of assigned telephone banks was randomly selected from an enumeration of the Working Residential Hundreds Blocks of the active telephone exchanges within the region. A Working Residential Hundreds Block was defined as a block of 100 potential telephone numbers within an exchange that included three or more residential listings. (Blocks with one or two listings were excluded because in most cases such listings represent errors in the published listings.)
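The definition of a Working Residential Hundreds Block can be illustrated with a short sketch. It assumes ten-digit North American numbers, so that a hundreds block is identified by the first eight digits (area code, exchange, and the first two digits of the line number). The function names and the sample listings are hypothetical; the listings use the fictional 555 exchange.

```python
from collections import Counter

MIN_LISTINGS = 3  # threshold stated in the text: three or more listings


def hundreds_block(phone_number: str) -> str:
    """Return the eight-digit prefix that identifies the block of 100
    numbers containing phone_number."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError(f"expected a ten-digit number, got {phone_number!r}")
    return digits[:8]


def working_residential_blocks(listed_numbers) -> list:
    """Blocks that contain at least MIN_LISTINGS residential listings."""
    counts = Counter(hundreds_block(n) for n in listed_numbers)
    return sorted(b for b, c in counts.items() if c >= MIN_LISTINGS)


# Hypothetical residential listings.
listings = ["212-555-0101", "212-555-0102", "212-555-0199",  # three in one block
            "212-555-0201"]                                  # a lone listing
print(working_residential_blocks(listings))
```

Here only the block with three listings would be retained in the sampling frame; the block with a single listing would be dropped.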

The use of residential listings to identify working residential exchanges is generally described as "list-assisted" or "truncated" RDD sampling. In a series of empirical studies, Brick et al.1 demonstrated that only about four percent of all telephone households are excluded in national samples using this method. In addition, these studies indicate that the differences between covered and uncovered samples are trivial in most instances. The principal advantage of list-assisted sampling is that an equal probability systematic sample of telephone numbers can be selected under this procedure, and the variances of estimates from the list-assisted sample are usually lower than those from a clustered design like the Mitofsky-Waksberg RDD method.

In the third-stage sample, a two-digit number was randomly generated by computer for each Working Residential Hundreds Block selected in the second-stage sample. This third-stage sampling process is the random digit dialing (RDD) component. Every telephone number within the Hundreds Block has an equal probability of selection, regardless of whether it is listed or unlisted.
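The second and third sampling stages can be sketched together as follows. This is a simplified illustration, not the sampling software used in the study: the block identifiers are hypothetical eight-digit prefixes, the blocks are drawn with equal probability without replacement, and the function names are invented for the example.

```python
import random


def rdd_number(block: str, rng: random.Random) -> str:
    """Append a random two-digit suffix (00 to 99) to an eight-digit block,
    so listed and unlisted numbers in the block are equally likely."""
    if len(block) != 8 or not block.isdigit():
        raise ValueError(f"expected an eight-digit block, got {block!r}")
    suffix = rng.randrange(100)
    return f"{block}{suffix:02d}"


def draw_rdd_sample(blocks, n_blocks: int, rng: random.Random) -> list:
    """Select n_blocks blocks at random, then one random number per block."""
    selected = rng.sample(list(blocks), n_blocks)
    return [rdd_number(block, rng) for block in selected]


# Hypothetical frame of Working Residential Hundreds Blocks in one stratum.
frame = ["21255501", "21255502", "21255537", "30355512"]
rng = random.Random(2003)  # fixed seed so the illustration is repeatable
print(draw_rdd_sample(frame, 2, rng))
```

Because the suffix is drawn uniformly from the 100 possible values, each number in a selected block has a 1-in-100 chance of being generated.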

The third-stage RDD sample of telephone numbers was then dialed by SRBI interviewers to determine which were currently working residential household phone numbers. Non-working numbers and non-residential numbers were immediately replaced by other RDD numbers selected within the same stratum in the same fashion as the initial number. Ineligible households (e.g., no adult in the household, or a language barrier in a language other than Spanish) were also immediately replaced. Non-answering numbers were not replaced until the research protocol (in this study, a ten-call protocol) was exceeded.
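The replacement rules above can be summarized in a small decision function. The disposition labels and function name are hypothetical and do not reflect SRBI's actual call-disposition codes. The sketch also assumes that the ten-call protocol is "exceeded" once ten attempts have been made without an answer, which is one reading of the text.

```python
from enum import Enum

MAX_CALLS = 10  # ten-call protocol stated in the text


class Disposition(Enum):
    NON_WORKING = "non-working number"
    NON_RESIDENTIAL = "non-residential number"
    INELIGIBLE = "ineligible household"
    NO_ANSWER = "no answer"
    ELIGIBLE = "eligible household"


# Outcomes that trigger an immediate replacement within the same stratum.
IMMEDIATE_REPLACEMENT = {Disposition.NON_WORKING,
                         Disposition.NON_RESIDENTIAL,
                         Disposition.INELIGIBLE}


def needs_replacement(disposition: Disposition, attempts: int) -> bool:
    """Decide whether a sampled number should be replaced by a new RDD number."""
    if disposition in IMMEDIATE_REPLACEMENT:
        return True
    if disposition is Disposition.NO_ANSWER:
        # Assumption: replace only after all ten attempts have been used.
        return attempts >= MAX_CALLS
    return False


print(needs_replacement(Disposition.NO_ANSWER, attempts=4))
print(needs_replacement(Disposition.NON_WORKING, attempts=1))
```

Keeping non-answering numbers in the sample until the call protocol is exhausted avoids replacing hard-to-reach households with easier-to-reach ones.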

1 Brick, J., Waksberg, J., Kulp, D., and Starer, A. Bias in List-Assisted Telephone Samples. Public Opinion Quarterly, Summer 1995, Vol. 59, No. 2, pp. 218-235.