Model Years

Results

Research Question: What is the distribution of vehicle model years in the target population?

Note that we include vehicles that are driven in Utah County but not registered there. Our results therefore add information beyond what is publicly available from government registration records. That said, we place a strong prior on the government registration data, so our results closely track those records. Notice the dips in 2009 and 2020 in Figure 1.

Figure 1: Vehicle Model Years as of March 2025.
Figure 2: Vehicle Model Ages as of March 2025.

Appendix: Methods

Strategy

Use the registration counts as the concentration parameters of a Dirichlet prior. Because the Dirichlet distribution is conjugate to the multinomial, these concentration parameters act as pseudocounts that are added to our observed counts. The summed counts then serve as the concentration parameters of the posterior Dirichlet distribution over the relative frequencies of vehicle model years in the population.

response variable: vector of means (proportion for each model year)
posterior parameter(s): concentration vector for Dirichlet distribution
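
The pseudocount update described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the study: the function names and the three-category example numbers are hypothetical, and the sketch assumes the observed counts are treated as a multinomial sample.

```python
import numpy as np

def dirichlet_posterior(prior_counts, observed_counts):
    """Posterior concentration vector: prior pseudocounts plus observed counts."""
    prior = np.asarray(prior_counts, dtype=float)
    observed = np.asarray(observed_counts, dtype=float)
    if prior.shape != observed.shape:
        raise ValueError("prior and observed counts must have the same shape")
    return prior + observed

def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_i / sum(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

# Hypothetical numbers for three model years (not taken from the study).
prior_counts = [100.0, 300.0, 600.0]   # registration counts used as pseudocounts
observed_counts = [5, 10, 35]          # counts from our own sample

alpha_post = dirichlet_posterior(prior_counts, observed_counts)
means = dirichlet_mean(alpha_post)     # posterior mean proportion per model year
print(alpha_post)
print(means)
```

Because the prior pseudocounts are much larger than the observed counts in this example, the posterior means stay close to the registration proportions, which is the behavior described in the Results section.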

Transform Registration Counts based on Prior Beliefs

Extrapolate the registration counts for pre-1913 model year vehicles

The first steam-powered vehicle dates back to 1672 (Wikipedia). The Utah registration data starts at model year 1913. We assume that the number of vehicle registrations is 0 for model years not listed in the government data. We also assume that the model year of a vehicle must be an integer between 1672 and 2026.

Extrapolate the registration counts for post-2023 model year vehicles

New vehicles are still being sold for model years 2024, 2025, and 2026, but not for model year 2023. Our registration data covers vehicles registered from 2024 through February 17, 2025. Thus, additional registrations of the newer model years between February 2025 and March 2025 may be missing from our dataset. We adjust the registration counts for these new model years using the count for model year 2023: use linear interpolation to obtain extrapolated (March 2025) counts for model years 2024-2026 based on the count for model year 2023.

Non-business Registrations

Use a distribution of Richards curves (generalized logistic functions) to model the non-business registrations in the government data. For pre-2005 model years, most vehicles were probably registered by individuals rather than businesses. For model years after about 2005, a substantial share of the registrations is likely for commercial vehicles.
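
For reference, a single Richards curve can be evaluated as in the sketch below. The parameterization shown is one common form of the generalized logistic function; the parameter names and every numeric value (asymptotes, growth rate, midpoint) are hypothetical and chosen only to illustrate a non-business share that declines around model year 2005. The report does not state the actual parameters, and a distribution of curves could be obtained by sampling these parameters.

```python
import numpy as np

def richards_curve(t, left, right, growth, midpoint, nu=1.0, q=1.0):
    """Generalized logistic (Richards) curve.

    left  : asymptote as t -> -infinity (for growth > 0)
    right : asymptote as t -> +infinity (for growth > 0)
    With nu = 1 and q = 1 this reduces to an ordinary logistic curve.
    """
    t = np.asarray(t, dtype=float)
    base = 1.0 + q * np.exp(-growth * (t - midpoint))
    return left + (right - left) / base ** (1.0 / nu)

# Hypothetical share of registrations that are non-business, by model year.
model_years = np.arange(1995, 2016)
share = richards_curve(model_years, left=1.0, right=0.7,
                       growth=0.5, midpoint=2005.0)
for year, s in zip(model_years[::5], share[::5]):
    print(year, round(float(s), 3))
```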

Dirichlet

Assume that the proportion of vehicles in the population for each model year follows a Dirichlet distribution whose parameters are a function of the registration counts. The total number of on-highway vehicle registrations (all vehicle types, including motorcycles and heavy trucks) in Utah County, Utah, United States with an expiration date on or after January 1, 2025, as of February 17, 2025, was 584,221.

The July 1, 2024 estimate of the number of residents of Utah County, Utah is 747,234. The five-year 2019-2023 ACS estimate of the number of households in Utah County is 195,602. According to DataUSA, the number of vehicles per household in Utah is about 2. This puts a rough lower bound on the target population size of about 195,602 * 2 = 391,204. Note that vehicles can be registered by businesses as well as households, and not all vehicles in the target population appear in government registration records.
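
The arithmetic behind this lower bound, and its gap to the registration total quoted earlier, can be checked with a short script. The variable names are ours; the figures are the ones given in the text.

```python
households = 195_602            # ACS 2019-2023 five-year estimate
vehicles_per_household = 2      # approximate DataUSA figure for Utah
registrations = 584_221         # Utah County registrations as of February 17, 2025

lower_bound = households * vehicles_per_household
gap = registrations - lower_bound

print(lower_bound)  # 391204
print(gap)          # 193017
```

The gap is consistent with the note above that businesses also register vehicles and that the per-household figure is only approximate.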

Appendix: Sensitivity Analysis for Non-sampling Error

Strategy: We specify a plausible model for the non-responses. We use a higher non-response rate for older cars because they are more likely to be in a car shop or a rarely driven private collection than in a parking lot.
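
One way such an age-dependent non-response model might be simulated is sketched below. Everything in it is an assumption made for illustration: the linear-with-cap form of the non-response probability, its parameter values, and the randomly generated "true" counts are hypothetical and are not the model or data used in this analysis.

```python
import numpy as np

def nonresponse_prob(age, base=0.05, slope=0.01, cap=0.9):
    """Hypothetical non-response probability that rises linearly with vehicle age."""
    age = np.asarray(age, dtype=float)
    return np.clip(base + slope * age, 0.0, cap)

rng = np.random.default_rng(0)           # fixed seed for reproducibility

ages = np.arange(0, 41)                  # vehicle ages in years
true_counts = rng.integers(50, 500, size=ages.size)   # made-up population counts

# Each vehicle is observed independently with probability 1 - p(age).
p_missing = nonresponse_prob(ages)
observed = rng.binomial(true_counts, 1.0 - p_missing)
missed = true_counts - observed          # simulated non-responders per age

print(p_missing[[0, 20, 40]])            # non-response rate at ages 0, 20, 40
print(missed.sum(), "of", true_counts.sum(), "vehicles missed")
```

Comparing the age distribution of `observed` against `true_counts` then shows how strongly the assumed non-response pattern would bias the estimated share of older vehicles.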

Figure 3: Non-response Simulation. The estimate in orange (shown for comparison only) is the result of multiplying the posterior predictive distribution’s median by N - n, where N is assumed to be 500,000.

The simulated results shown in Figure 3 give rise to the following possible correlations between responders and non-responders.

Next, we show the simulated absolute errors based on the simulated characteristics of the non-responders.