Problems with CR!
I learned about a blog post by Michael Karesh of TrueDelta from a news article here at allpar. Michael's original post can be found here.
In his post, Michael discusses the "anomalies" in CR's reporting on the "reliability" of particular vehicles. Most of the vehicles he discusses are pairs with different body styles built on the same basic car structure, e.g., the Dodge Magnum and the Chrysler 300. As Michael states, these vehicles "share the great majority of their parts, including powertrains. This is especially the case if both are made in the same plant."
Michael then reports on "anomalies" such as the V8 Magnum scoring 55 points higher than the V8 300, while the V6 Magnum scores 25 points worse than the V6 300. He goes on to discuss a wide variety of other vehicles where the same kind of "anomaly" occurs.
Michael runs a site named TrueDelta, where his analysis includes not only the mean (average) number of repair trips per year but also the standard deviation, or more generally the uncertainty, of his data, shown as error bars (confidence intervals). This type of data reporting is crucial to any analytic undertaking. An example of his data reporting can be found here at his website.
From Michael's analysis it is easy to see, without question, that the "reliability" of the 06 Chevy HHR, measured as the number of repair trips per year, clearly differs from the average for the 06 VW Jetta. Error bars are included!!! Similarly, his analysis shows that one cannot say an 06 Chevy HHR is more "reliable" than an 06 Charger/Magnum/300, or vice versa.
Through surveys, people report to CR on a number of items about their particular vehicle: its cooling system, suspension, transmission, engine, etc. (CR obtains its data by polling its subscribers.) For each vehicle, each of these sub-classifications, e.g., the transmission, will have a distribution of numbers related to the "reliability" of the vehicle or the happiness of the consumer.
To calculate the average, CR would add all of the data points and divide by the number of data points. To calculate the uncertainty associated with that average, CR would (or does, but who knows, since they do not publish these results) calculate a root-mean-square uncertainty: subtract the calculated mean from each data point, square each difference, sum the squares, divide by the number of data points N (or by N-1), and finally take the square root. The result is the standard deviation. Equivalently, the standard deviation is the square root of the second central moment of the distribution, $\sigma = \sqrt{\langle x^2 \rangle - \langle x \rangle^2}$, where $\langle \cdot \rangle$ denotes an average over the data; if the data are fit to a Gaussian, the fitted width $\sigma$ estimates the same quantity. In summary, each item that CR reports, e.g., transmission or engine, should have a mean and a standard deviation associated with that mean.
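Why the two recipes for the standard deviation agree: expanding the square in the root-mean-square definition (dividing by N) gives the moment form, and the divide-by-(N-1) version differs only by a constant factor.

```latex
\langle x \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \langle x \rangle\right)^2
         = \langle x^2 \rangle - 2\langle x \rangle^2 + \langle x \rangle^2
         = \langle x^2 \rangle - \langle x \rangle^2,
\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \langle x \rangle\right)^2 = \frac{N}{N-1}\,\sigma^2 .
```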
For example, suppose CR asks 4 owners of a Chrysler 300 to rate the suspension of their vehicle on a scale from 0 to 10. There may be a variety of answers. Let's say, for illustrative purposes, that the responses are as follows:
Person A may say 7
Person B may say 3
Person C may say 8
Person D may say 10
CR would report the suspension of the 300 as (7+3+8+10)/4 = 7. To calculate the error of that reported average, they would calculate {[(7-7)^2+(3-7)^2+(8-7)^2+(10-7)^2]/4}^(1/2) ≈ 2.5. (If we used N-1 = 4-1 = 3 for the normalization constant, the error would be about 2.9.)
It would be wrong for CR to report only that the average Chrysler 300 customer rates the suspension of their vehicle as 7 out of 10. They must also report the spread of their data (the standard deviation) and state that the average 300 customer rates the suspension as 7 +/- 2.5 out of 10.
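The suspension example can be checked with Python's standard `statistics` module. This is only a sketch of the arithmetic above (the ratings are the illustrative ones from the text); it also confirms that the moment form $\sqrt{\langle x^2 \rangle - \langle x \rangle^2}$ matches the divide-by-N result.

```python
import math
import statistics

# Illustrative suspension ratings (0 to 10) from the four hypothetical owners.
ratings = [7, 3, 8, 10]

mean = statistics.mean(ratings)       # (7 + 3 + 8 + 10) / 4
sigma_n = statistics.pstdev(ratings)  # root-mean-square deviation, divide by N
sigma_n1 = statistics.stdev(ratings)  # same, but divide by N - 1

# Shortcut form: sqrt(<x^2> - <x>^2), which matches the divide-by-N result.
mean_sq = statistics.mean(x * x for x in ratings)
sigma_moment = math.sqrt(mean_sq - mean ** 2)

print(f"mean = {mean}")
print(f"std (N)   = {sigma_n:.2f}")
print(f"std (N-1) = {sigma_n1:.2f}")
print(f"moment form = {sigma_moment:.2f}")
```

The printed values are 7, 2.55, 2.94 and 2.55, which round to the 7, 2.5 and 2.9 quoted above.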
CR must then combine the sub-classifications of each vehicle into one number. (I take it this is where they rate or calculate the reliability of the vehicle.) To do this, they find the average of the sub-classification averages. To find the uncertainty of the vehicle's "reliability," they must add the squares of the sub-classification uncertainties, divide by the number of sub-classifications, and take the square root, giving an "average uncertainty" for each vehicle.
For example, suppose Vehicle A was surveyed and the average and uncertainty of each of the 5 sub-classes were found to be the following (for illustrative purposes):
Engine: 5 +/- 2
Transmission: 7 +/- 3
Suspension: 4 +/- 2
Cooling Ability: 7 +/- 3
Noise Levels: 5 +/- 4
The average score for the overall vehicle would be (5+7+4+7+5)/5 = 5.6 (With equal weighting for each of the sub-classifications).
We would not simply average the uncertainties, (2+3+2+3+4)/5 = 2.8; this is wrong.
Instead, you add the squares of the uncertainties, divide by the number of sub-classes, and then take the square root: [(2^2+3^2+2^2+3^2+4^2)/5]^(1/2) = 2.898.
So the “reliability” of Vehicle A is found to be 5.6 +/- 2.9.
The same survey for Vehicle B might show:
Engine: 6 +/- 4
Transmission: 8 +/- 2
Suspension: 5 +/- 5
Cooling Ability: 7 +/- 3
Noise Levels: 4 +/- 3
This would give a "reliability" of 6 +/- 3.5.
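A minimal Python sketch of the per-vehicle roll-up, assuming equal weights for the five sub-classes and the root-mean-square rule described above. Vehicle A's pairs are the ones that reproduce the arithmetic in the text (5.6, 2.8 and 2.898); the list names and the `overall_rating` helper are hypothetical.

```python
import math

# Illustrative sub-class scores as (mean, uncertainty) pairs:
# engine, transmission, suspension, cooling, noise.
vehicle_a = [(5, 2), (7, 3), (4, 2), (7, 3), (5, 4)]
vehicle_b = [(6, 4), (8, 2), (5, 5), (7, 3), (4, 3)]

def overall_rating(subscores):
    """Return (mean, rms_uncertainty) with equal weight per sub-class.

    The uncertainty is the root-mean-square of the sub-class
    uncertainties: square, average, then take the square root.
    """
    n = len(subscores)
    mean = sum(m for m, _ in subscores) / n
    rms = math.sqrt(sum(s * s for _, s in subscores) / n)
    return mean, rms

for name, scores in [("A", vehicle_a), ("B", vehicle_b)]:
    mean, rms = overall_rating(scores)
    naive = sum(s for _, s in scores) / len(scores)  # the straight average the text rejects
    print(f"Vehicle {name}: {mean:.1f} +/- {rms:.1f} (naive: {naive:.1f})")
```

This prints `Vehicle A: 5.6 +/- 2.9 (naive: 2.8)` and `Vehicle B: 6.0 +/- 3.5 (naive: 3.4)`. The sketch follows the post's rule; standard error propagation for the average of independent scores would instead use sqrt(sum of sigma_i^2)/N, which gives smaller error bars.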
CR, in my opinion, would rate Vehicle B higher, or more "reliable," than Vehicle A. Sure, the average is higher, as is the upper end of the error bars (6+3.5 = 9.5 versus 5.6+2.9 = 8.5), but the lower end of the error bars puts Vehicle B at 2.5 (6-3.5 = 2.5) versus 2.7 for Vehicle A (5.6-2.9 = 2.7).
However, in my opinion it would be silly, or at least not prudent, to say that Vehicle B is more "reliable" than Vehicle A, because the error bars of the two vehicles overlap so heavily: Vehicle A's entire range (2.7 to 8.5) lies inside Vehicle B's (2.5 to 9.5). The error bars would have to shrink before any firm conclusion could be drawn about the relative reliability of either vehicle. One way to shrink them is to collect more data points: the uncertainty of an average (its standard error) falls roughly as 1/sqrt(N), where N is the number of responses.
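A small sketch of the overlap argument, plus the reason more data helps: the uncertainty of an average of N responses (its standard error) is the spread divided by sqrt(N). The helper names and the 3-point per-response spread are hypothetical, chosen only for illustration.

```python
import math

def error_bar(mean, err):
    """Return the (low, high) range covered by mean +/- err."""
    return (mean - err, mean + err)

def overlap(a, b):
    """Length of the overlap between two (low, high) ranges; 0.0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def standard_error(spread, n):
    """Uncertainty of an average of n responses with the given spread."""
    return spread / math.sqrt(n)

vehicle_a = error_bar(5.6, 2.9)  # about (2.7, 8.5)
vehicle_b = error_bar(6.0, 3.5)  # (2.5, 9.5)
shared = overlap(vehicle_a, vehicle_b)
width_a = vehicle_a[1] - vehicle_a[0]
print(f"overlap = {shared:.1f}, share of A's range = {shared / width_a:.0%}")

# Hypothetical spread of 3 points per response: more responses, smaller error bar.
for n in (25, 100, 400):
    print(n, "responses ->", round(standard_error(3.0, n), 2))
```

The overlap covers all of Vehicle A's range, and quadrupling the number of responses halves the standard error (0.6, 0.3, 0.15 here).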
To calculate the reliability of a brand, the average reliability of the brand's vehicles must then be calculated. To do this, CR has to decide how to weight each vehicle in the calculation.
For illustrative purposes, suppose brand X produces three vehicles: A, B, and C.
Vehicle A has a calculated “reliability” rating of 5 +/- 2.
Similarly Vehicle B has 7 +/- 1 and Vehicle C has 8 +/- 1.
Assuming equal weighting for each vehicle, brand X would have an average reliability rating of 6.7 +/- 1.4.
However, suppose brand X sells 10,000 units a year: 9,000 units of Vehicle A, 750 units of Vehicle B, and 250 units of Vehicle C. You would think these sales figures should be included as weights in the calculation of brand X's "reliability." Weighting by sales, and combining the uncertainties in the same root-mean-square way, brand X would have a "reliability" rating of 5.23 +/- 1.92 (instead of 6.7 +/- 1.4 with equal weighting of the vehicles).
CR does not state how it weights each vehicle when it calculates the average for each brand. This is important information, since the weighting can change the results dramatically.
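A sketch of the brand roll-up for the illustrative brand X, comparing equal weights with weights proportional to units sold. The root-mean-square combination of uncertainties is an assumption carried over from the per-vehicle rule, since CR does not publish its method; the `brand_rating` helper is hypothetical. Under that rule the sales-weighted mean is 5.225 (about 5.23) and the uncertainty comes out near 1.92.

```python
import math

# Illustrative brand X: (rating, uncertainty, units sold per year).
vehicles = {
    "A": (5.0, 2.0, 9000),
    "B": (7.0, 1.0, 750),
    "C": (8.0, 1.0, 250),
}

def brand_rating(vehicles, by_sales):
    """Weighted mean rating and weighted RMS uncertainty for a brand.

    by_sales=False gives each vehicle equal weight; by_sales=True weights
    each vehicle by its units sold. The RMS combination is an assumption.
    """
    rows = list(vehicles.values())
    weights = [units if by_sales else 1 for _, _, units in rows]
    total = sum(weights)
    mean = sum(w * r for w, (r, _, _) in zip(weights, rows)) / total
    var = sum(w * s * s for w, (_, s, _) in zip(weights, rows)) / total
    return mean, math.sqrt(var)

for label, by_sales in [("equal weights", False), ("sales weights", True)]:
    mean, err = brand_rating(vehicles, by_sales)
    print(f"{label}: {mean:.3f} +/- {err:.2f}")
```

This prints `equal weights: 6.667 +/- 1.41` and `sales weights: 5.225 +/- 1.92`. Three decimals are used for the mean because 5.225 is stored as a binary float just below 5.225, so two-decimal formatting would show 5.22.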
In summary, I want to congratulate Michael Karesh of TrueDelta for finally creating a second option, besides CR, for consumers to obtain information about automobile manufacturers. His analysis has been open and honest. It contains the most BASIC ingredients of the statistical analysis of collected data, ingredients that are lacking in CR's reporting.