Consumer Reports reliability ratings for cars and trucks: are they reliable?
Consumer Reports response rates
A high response rate is the key to validity; survey findings are questioned when a low percentage of the population answers. So how many people respond to a Consumer Reports survey?
“Of over 4 million questionnaires sent this year, the magazine received responses regarding about 480,000 vehicles,” wrote the Detroit News. If most people reported on two cars (because most families have two or more cars), that would put the response rate at about 6%. Even assuming one car per respondent - a highly dubious assumption - we have a paltry 12% response rate.
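The arithmetic is easy to check. A minimal sketch, using the Detroit News figures quoted above and assuming (as the article does) that each responding household reports on either one or two vehicles:

```python
# Back-of-envelope response-rate estimate from the quoted figures.
# Assumption: each respondent reports on one or two vehicles.
questionnaires_sent = 4_000_000
vehicles_reported = 480_000

for cars_per_respondent in (1, 2):
    respondents = vehicles_reported / cars_per_respondent
    rate = respondents / questionnaires_sent
    print(f"{cars_per_respondent} car(s) per respondent -> {rate:.0%} response rate")
```

Either assumption leaves the rate far below what survey researchers usually consider safe.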
Response rates can be boosted with rewards, e-mail reminders, and follow-up cards; or a survey with low response rates can be “verified” by using follow-up calls or other measures to ensure that the non-respondents would not have given different answers than the respondents. This costs money, but when you have the world's most influential auto reliability study, investing relatively small amounts in validity makes sense.
Differences in the nature of car owners
People who buy different car models also maintain them differently, according to various studies. As an example, buyers of American Hondas clean their garage floors much more often than buyers of American cars. Can this meticulous approach to garages also extend to maintenance? Does that affect reliability? (From an 8/27/97 Detroit News article, via Bob Meyer).
What causes a person to buy a car might also cause them to change the transmission fluid frequently, or not at all. This may result in different reliabilities.
Those who buy from one manufacturer are also likely to have different driving habits than those who buy from another. Some people drive their cars more aggressively than others, which may wear them out faster. Some vehicles, e.g. Jeeps, are more likely to be used and abused off-road, resulting in earlier suspension repairs.
John Greenstreet: "the CR survey may over/understate the reliability of certain cars because the people that own them are not homogeneous. ... many people will have a subconscious need to justify their purchase of a Japanese auto over a domestic one, and they could do this by believing superior reliability is the reason they bought it. Because of cognitive dissonance, they would tend to overlook or downplay anything that would attack this mind-set. We do see many people who vehemently defend Japan's cars' reliability and smear that of others."
All car owners are not alike, and they can have personality traits that directly influence their choice of vehicle, their vehicle expectations, and how they subsequently treat and maintain their cars. Consumer Reports does not control for this kind of systematic error in their surveys. (As far as we know, nobody does, so it's more of a "heads up" than a criticism). Perhaps a statistical analysis one year could accompany the reviews with a footnote given when reviews are quoted.
Note that CR has in recent years lumped together siblings sold under different labels to give the appearance of validity.
The oil debacle
In July 1996, Consumer Reports tested motor oils for their readers, but instead of using normal cars, they used New York City taxis, which typically run 24 hours a day and are never allowed to cool down. That means the most strenuous test of motor oil, the cold start (which causes most engine damage), occurred rarely, if ever, during their testing. They found no difference between any of the motor oils, from the cheapest to the best synthetic, and concluded that all “natural” oils are interchangeable, but that synthetics still hold an advantage for some drivers. The possibility that the research was meaningless because the methods were badly flawed was never raised; nor did they reach the natural conclusion that if they couldn't tell the difference between Mobil 1 and the cheapest oil on the shelf, the same research model probably couldn't tell whether individual natural oils differed in quality. More on synthetic oils
Lumping cars together in the car ratings
Different drivetrains have different reliabilities -- CR often lumps them all together. (Now they are also combining "corporate twins" to hide the anomalies of years past). Standard and Grand Caravans are listed in the same category, despite the very different repair histories of the two transmissions and the different engines. They separate some engines but lump the 3.0 V6 oil-leaker in with the more reliable 3.3 engine.
David Ta wrote: "I'd expect CR to point me to those unique problem(s) from different make-model-year combination. Those CR reliability reports, regardless how they were done, did not reveal those problems. For example, 6 months ago, I noticed a bunch of postings on [Japanese SUV] problem of blown head gasket, within the first 70k miles. I checked CR report on [vehicle] and compared to CAA report on the same make-model-year. Not surprisingly, CR reported a [best rating] under the "engine" category. And surprisingly CAA reported the same make-model-year a "much worse than average" two red-dots for that category."
Lloyd says that unreliable options or components are sometimes pointed out in the ratings. This is indeed true, with an emphasis on "sometimes." We have to trust Consumer Reports on that, which we'd rather not do, considering what they do not tell us. WhatCar? used to have a very consistent approach to this; and WhatCar? also used to point out what the manufacturer had done, or had not done, to solve the problem. (WhatCar? used actual repair records from vehicles leased in the UK.)
Setting expectations in car and trucks ratings
Will Mast said that Consumer Reports' harping on some cars may sensitize owners to existing issues. For example, people who never noticed "bad" shifting may suddenly "see" problems where none existed before; while those who experience similar issues on a "good" car will not see anything.
This is related to a common problem in psychology: people who volunteer normally have a high need for approval. Therefore, they may try to bring their experience into line with what Consumer Reports seems to want. Consumers Union provides clear "demand characteristics" - we know which cars they believe are the best. The research on this topic indicates that people will change their perceptions to match what they think the experimenter (in this case Consumer Reports) wants.
We should note that these issues are certainly present in car reviewers, whether they work with CR or not. We've suggested blindfolding testers for the ride and noise evaluations, and covering up internal badging and identity cues (and blindfolding testers on their way to the cars) as ways to help avoid this bias. Yes, some cars will of course be clearly recognizable anyway; but the ride and noise would be thoroughly unbiased, if opinions were given before the blindfold came off, and eventually perhaps the other evaluations would become more fair. Consumers Guide says they're going to work on the second idea; we hope Consumer Reports will better them.
Consumer Reports never defines "serious"
People who are inclined to buy different brands may define "serious" differently (see above). If you've never received a survey, ask a friend who subscribes to see theirs before they return it (if they return it). You will notice that Consumer Reports really doesn't say what a "serious" problem is. I believe they should define it, or simply count "any" problem.
This was evident in reactions to the problem of sludge in the engines of many Toyotas - a problem which Toyota, to its credit, eventually admitted and acted on. The Corolland forums were full of people claiming the problem was not real but simply in the minds of those who claimed they had it; and if it was real, it was the fault of owners and not Toyota. We doubt they'd feel the same way if, say, Neons were victims of sludge. [Parris Boyd wrote: “over three thousand customers have now signed an online petition alleging that Toyota refuses to comply with the terms of the sludge settlement.”]
Routine maintenance varies by vehicle
Jim Eldridge essentially wrote this for us by example:
I have a 1985 Dodge Daytona that has 135,000 miles on it. It runs great. At about 85,000 miles the timing belt broke, stranding my wife. The maintenance schedule says nothing about replacing the belt. Dodge thinks it's OK to wait till it breaks and then replace it; the design is such that it does nothing bad to the engine. However, to my wife, the car broke down and had a "serious engine problem." [Note: the manual actually does suggest replacing the belt at 105,000 miles.]
My friend with a Nissan Maxima just had his 60,000 mile maintenance at the dealer. He had the timing belt replaced, the fuel injectors cleaned, oil change, etc. and a fuel injector replaced. Cost, $850! If he filled out the CR form, he would show no major problems, just routine maintenance.
He then told me he was considering replacing all of his shocks because "it was about time." No Dodge owner would ever consider replacing shocks before the car bounced down the road. All Dodge had to do was recommend the belt change at 60,000 miles to avoid a "serious engine problem."
Will Mast said, “A friend with a Toyota used to brag about how trouble free it was until I showed him all the repairs, including a cracked exhaust valve, that were hidden in his 30,000 mile "maintenance" visits to the dealer.”
The solution is to get far more specific - and perhaps, to be really careful, to find out something about owners' routine maintenance.
We do not think a sample of two people is significant. These are illustrations of a general principle.
Consumers Union's self-selected sample
Those who send in their surveys are different from those who do not. Most studies try to raise their response rates through follow-up calls, letters, even postcards. Many studies check the characteristics of the nonrespondents to see what the error might be. Consumer Reports does neither of these, as far as I know. Brent Peterson wrote a wonderful analogy:
[A controlled experiment could use 30 carefully bred rats in cages]... A survey would be like having 100 lab rats starting the experiment and then letting them roam freely around the building with access to doors leading outside. Then measuring those who came back for dinner in their cages at the end of the experiment. Say 8 rats returned; you do not know what happened with those other 92 rats that escaped. ... [We presume they have different characteristics than the eight that returned, just as people who do not return surveys are different from those who do. Note that we've adjusted the example slightly to reflect Consumer Reports' apparent response rate.]
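The rat analogy can be made concrete with a toy simulation of self-selection bias. All of the numbers below (the true problem rate and the response probabilities) are invented for illustration and do not reflect CR's actual data; the point is only that unequal response rates skew the estimate:

```python
import random

random.seed(1)

# Hypothetical population: 10% of owners have a "serious" problem.
population = [random.random() < 0.10 for _ in range(100_000)]

def responds(has_problem):
    # Assumed response probabilities: owners with problems are more
    # motivated to mail the survey back. These figures are invented.
    return random.random() < (0.20 if has_problem else 0.05)

responders = [p for p in population if responds(p)]

true_rate = sum(population) / len(population)
survey_rate = sum(responders) / len(responders)
print(f"true problem rate:      {true_rate:.1%}")
print(f"rate among responders:  {survey_rate:.1%}")  # noticeably inflated
```

With these (made-up) response probabilities, the survey roughly triples the apparent problem rate, even though nobody lied on a single form.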
Raymond DeGennaro II pointed out that
CR does not draw their data from the general public, only from subscribers....They have to prove that their data represents the general public, and they haven't.
The solution here is to get a larger non-subscriber sample and compare the results; if there's no difference, the check could be repeated every ten years or so.
Splitting hairs in the auto ratings
In some cases, it seems that the difference between an "average" vehicle and a "better than average" or "worse than average" vehicle is quite small, especially considering the actual number of people reporting problems. This opens the possibility that it's just random chance. We can't tell because they don't report standard deviations. We believe that there should probably only be three categories, considering their research methods: low, average, and high. Borderline cases can be footnoted or described in text.
The new bar charts that show exact standings (rather than lumping into categories) are a wonderful alternative, but really should have error bars to show whether, say, the difference between the Civic and the Neon, or the Corolla and PT Cruiser, are greater than the sampling error. But that might make their ratings look absurd... which in itself would be valuable information.
CR not reporting all needed information
Consumer Reports does not report the number of people responding to each item, or the standard deviation. Suppose we found out that the standard deviation was fairly large: we could not reliably differentiate an above-average rating from a below-average one. Perhaps they should use fewer categories - "above average," "average," "below average" - where "above average" took the place of today's "much better than average."
What is the difference between an average and below average rating, in terms of actual owner ratings? Would five owners reporting "serious" problems cause a car to get a black dot instead of a red dot? How many does it take? When will they tell us?
Incidentally, what is the sampling error for each model? We really need this to know whether there is really a difference between two models we are considering.
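To illustrate why the sampling error matters, here is a hedged sketch using a standard two-proportion significance test. The helper function, problem rates, and sample sizes are all invented for illustration, since CR publishes none of these numbers:

```python
from math import sqrt

def differ_significantly(p1, n1, p2, n2, z=1.96):
    """Hypothetical check: is the gap between two models' problem rates
    larger than ~95% sampling error? (Textbook two-proportion z-test.)"""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return abs(p1 - p2) > z * se

# 8% vs 11% problem rates, 400 responses per model (invented figures):
print(differ_significantly(0.08, 400, 0.11, 400))
# -> False: a 3-point gap can be pure sampling noise at this sample size
```

In other words, a difference large enough to move a car from a red dot to a black dot could easily fall inside the noise, and without the sample sizes, readers have no way to tell.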
No advertising does not equal no bias.
Consumer Reports' ads imply that they have no bias. Their articles prove otherwise. When they say they are unbiased because they do not accept advertising, think about their logic for a moment. Is the Dodge Dart enthusiast site unbiased because it does not have advertising? Our ideas on reducing bias are shown up front - essentially, try to make sure testers do not know which car they are testing (to avoid self-fulfilling test results) and also to keep an eye out for bias. Our friends at Consumers Guide could do that better, too - they seem to have a need to write "but not up to the best of the European/Japanese imports" at the end of every American review. Well, some of the imports aren't up to the best of the Americans - but we never read that. (If you still believe Consumer Reports, take a look at Mercedes quality ratings, and tell me about "the best of the imports!")
For that matter, we can look at their February 2005 issue, where they call the interior of the Caravan "plasticky" (no more than the Sienna in our experience); and, as "Grim" said,
[Regarding a comparison of the Acura TSX and Volvo S40], seeing them trash a car from a company I don't like confirmed their bias more than seeing them trash a car from a company I do like (where I might be biased myself). For example, in a one page review, they said five times that the Volvo had unacceptably tight rear legroom. This despite the fact that in the objective measurements published on the next page, the Volvo had as much legroom as any other car in the comparo (there were four) and more than most...They also call the Acura's gas mileage "good," while they call the Volvo's "acceptable." That's interesting, since they get the exact same mileage and the Volvo gets it on regular gas rather than premium like the Acura. They also ding the Volvo a couple of times for sluggish acceleration, despite the fact that it's only two-tenths slower to 60 than the Acura (which was "good" and "peppy"). Two-tenths falls well within the range of measurement error.
John Phillips wrote: "A few years ago, they had the [2 domestic nameplates and one foreign nameplate all of the same car] owner's satisfaction. The [domestic nameplate] had the least owner satisfaction of these three. Next was the [other domestic nameplate]. The best owner support was for the [foreign nameplate]. There was a fair spread between them. Funny thing: all of these are built at the same American plant, only varying, primarily, in "hood ornaments." How can the same car be perceived differently when the only real difference was the label?"
Chris Jardine wrote:
I've noticed a number of occasions where data they have presented simply CANNOT be correct. Example 1 - a few years ago I looked at their reliability chart for the [car and car with another engine]. They claim that exterior fit and finish was [good rating] on the [one engine] and [terrible rating] for the [other engine]. This translates to a 4 and a 1 on a 1 to 5 scale. Since these vehicles were produced by the same workers, tools, raw materials, etc., it is not possible for this to happen! I could buy a difference of one, but not three, between the two. A short statistical analysis lesson would be appropriate here. You can expect a variation of one when working with something like this. If you see the deviation that you do here, you simply have not sampled the data properly! This is basic statistics. If this difference came in something that was not common to the two, like the engine, cooling system, transmission, etc., I would be able to accept the variation as correct. However, there is no way that this deviation from one to the next can occur with items common to the two.
Example 2 - [same cars, different nameplates]. There were major differences with the engine, electrical, fit and finish, etc. between these two. The only difference between them was the nameplate applied near the end of the assembly line and a code in the VIN. There were differences in standard levels of equipment, but that should not statistically affect what CR would have us believe it did. This is another case of improper statistical procedures.
For these reasons, I for one simply cannot believe much of anything CR prints as statistical data.
(Webmaster note: the reliability differences could have been based on different types of people buying each car, and treating them differently. If we generalize from this, are any Consumer Reports ratings worth looking at? Can we really compare a "sporty" car with a regular sedan, or cars in different price classes? Or even cars in the same "general" price class but with a couple of thousand dollars' difference in price?)
- Since the time this section was written, CR has "solved" (we would say "hidden") the problem by merging statistics for under-the-skin-twins. That makes it harder to criticize them, but does nothing to solve their underlying validity issues.
Other Consumer Reports auto ratings gripes
Steven Lee posted: (Edited for length)
...a good survey should not allow responses to be optional. Phone surveys are good because it is harder for the surveyed to decline to respond. Surveys that solicit responses through magazine inserts, web pages, open forums (e.g. TV or newspaper ads requesting responses to a post office box) generally range from horrible to completely useless, regardless of the sample size, because only a small proportion of those who read the surveys eventually respond; they are susceptible to the "fail safe syndrome," where only those with the "expected" responses end up responding. Most of the "failure" cases would never get reported.
A magazine once conducted a survey of "unhappy marriages in the United States" using postage-paid inserts. A large portion of the responses reported unhappy marriages, so most marriages in the U.S. were concluded to be bad. To unsuspecting readers this survey would be believable because the magazine collected thousands of responses. However, one can quickly realize that those with unhappy marriages were much more likely to respond. People with happy marriages would probably dismiss such surveys as pointless or media hype. The sample size, regardless how big it was, became irrelevant [because the response rate was so low].
As CR conducts their survey with voluntary responses, the conclusions are probably worthless. People with problematic [car make] cars would be more likely to complain and whine about the expected [car make]'s lack of reliability and swear that "they will never buy another [car make] again." People with lemon [other car makes] would be more likely to keep their problems to themselves because they don't want others to know that they were unlucky [or at fault themselves - Webmaster] to own lemon [foreign country] cars because "[foreign country]'s companies don't make lemons."
Surveys aren't just about math! Technique counts more!
Eric Bechtol wrote:
CU never figures in the "periodic maintenance" required by the dealer. Some Hondas require the dealer to repack front wheel bearings every 15K miles. Also, with the solid lifters, they need to be adjusted at certain intervals [Honda has switched to hydraulic lifters, one of the last companies to do so, since this was written in the late 1990s].
My brother is a Honda guy. He has had to have his gas tank, exhaust system, and AC condenser all replaced on his 1988 Honda, all due to rust. Well, my 1993 Spirit has been driven in the same weather and I have not had to replace any of these because of the superior rust prevention and stainless steel exhaust. All of my repair parts on the car have added up to less than $150, including installation! Does CU give credit for this? No, because the owners will say "well, mufflers should go out at 140K miles." Funny, mine never did. Also, just the muffler was over $100 from the dealer. He stated that he could run his engine much longer than mine and that may be true, however, I can buy a rebuilt head or even an engine for the price of his repairs and maintenance. [This story is meant as an illustration, for those who prefer concrete explanations to abstract concepts, not as proof.]
Summary - Consumers Reports reliability ratings
Everybody needs to understand the limitations and bias in their information.
CR needs to disclose all of their information and be skeptical about their own methods. They need to report more about their methods and formulas, and any problems they can see. They also need to address the other problems noted above.
I have been very disturbed by the quality of some recent reviews, and no longer trust their performance figures. Be careful!
I sometimes get e-mail saying things like "I had a Dodge and it stunk, so CR is right and you're just full of sour grapes." Hey, I don't make up the statistical and logical rules, I just report on them. It's my job to know this stuff. If you had a bad Dodge...well...you're not the only ones, there are plenty of lemons made by all the automakers. That's why we don't use sample sizes of one. No statistic can be used with n=1. [also, lately, Chrysler has been scoring high - and we're keeping this page up.]
The more rare a breakdown is, the more units are needed to get a reliable figure. I wouldn't want to publish any auto reliability data using fewer than 100 units.
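The link between rarity and sample size can be sketched with a textbook approximation (this is standard survey statistics, not CR's actual methodology): to estimate a failure rate p within a relative error r at roughly 95% confidence, you need about n = (1.96/r)² × (1−p)/p units. The numbers below are illustrative only:

```python
from math import ceil

def units_needed(p, rel_error=0.5):
    """Approximate sample size to estimate failure rate p to within
    +/- rel_error * p at ~95% confidence (normal approximation)."""
    return ceil((1.96 / rel_error) ** 2 * (1 - p) / p)

# The rarer the failure, the more vehicles you need to measure it:
for p in (0.10, 0.01, 0.001):
    print(f"failure rate {p:.1%}: ~{units_needed(p):,} units")
```

Even with a generous 50% relative error, a failure affecting one car in a thousand takes over fifteen thousand reports to pin down, which is why small per-model samples can't resolve rare defects.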
Lloyd Klein wrote:
I have been a reader and later a member of CU since my college days, about 1960. The shortcomings of the "Reports", which you mostly capture, have been apparent to me for nearly 40 years. Nevertheless, I remain an enthusiastic supporter of CU, simply because in many areas they are the "only game in town," while in other areas they complement the leading reviewers. (Cars, photography, audio equipment, and computers being some areas that interest me keenly, and for which other good sources of review are available.)
I am pleased that you have taken a formal approach to pointing out, to CU and the world, that there is definitely room for improvement. Although, you sometimes only hint at the direction in which improvement is needed, rather than providing a useful road-map. Your criticisms are mostly well taken, even when not well stated, and ought to be treated seriously by CU.
Bias is a universally human characteristic, even among, those of us who consider ourselves to be scientists and have received training on how to avoid bias in our research. Your web site illustrates a lot of bias, while Consumer Reports illustrates noticeably less in my opinion. However biased or not, both of you have an important role to play, and ought to play it to the best of your respective abilities. CU can definitely do better, not just on the basis of eliminating bias, but in the quality of their reports, which carry both ambiguity and outright errors; more so recently (15 years) than in the past. I can often identify the existence of an error, while reading the reports, even before doing the research that tells me the extent of the error. But I don't subscribe to the notion that 'the critic or her critique must be perfect', rather, all I require is a rigorous and honest approach. By that standard both your Web site and CU's reports meet my criterion.
CR's annual recommendations are nothing more than subjective judgment. What would be a better source for determining quality? Seek out the numbers from an unbiased source: NHTSA.
When Jim Press left Toyota, after pushing the creation of Lexus and placing Toyota on the global manufacturing map, he knew it was only a matter of time before Toyota began to slide in quality and ended up like GM. Growth spurs quality problems. Jim mentioned last week, in a short interview, how Toyota was striving to cover up its mistakes in QC and QA, focusing instead on numbers and profit.
After an initial review of recalls, investigations, and TSBs, Toyota moved from its usual fifth position among the nation's most problematic brands to the number one spot (in 2010), beating out Ford, which had held the spot for two straight years as the worst brand on the market (in terms of recalls and investigations).
CR gave Chrysler strong recommendations in the mid-Sixties very nearly across the board, and A-bodies continued getting favorable reports through the end of production. Their ratings of B- and C-bodies started plummeting in the late 1960s (say 1969 or so) but I can't remember which they started trashing first. The Volare and Aspen initially got high recommendations based on goodwill generated by their predecessors, but that didn't last.
Ed from PA
Your info about Consumer Reports' inconsistent ratings is on the mark. My father treats the CU ratings as gospel, so when we bought a new 1996 Dodge Grand Caravan, he almost fell over. He showed me all the poor CU ratings for the Mopar minivans, then sat back and waited for all the big repairs we were supposedly in for. Never happened. We owned that van for 8 years and, with two boys, used it hard. It got an oil change every 3-4k miles and my wife kept up with the in and out appearance. We only sold it because we needed a small SUV for the 4wd and so my wife could pull our son's dirt bikes if needed. Ironically, we ended up getting an Isuzu Trooper, a vehicle CU declared unacceptable. It's been a year and this vehicle has been equally outstanding. If I understand correctly, all SUVs are at risk of tipping, but CU went after Isuzu with a vengeance - which is probably why Isuzu is all but out of the passenger vehicle business. I have a hard time believing these ratings are impartial, when for over 20 years this magazine insisted we should all be driving a Toyota or Honda.
Great article. I lost confidence in Consumer Reports when they refused to investigate - or even expose - the large number of engine failures in MR2 Spyders, even though two major car clubs had been screaming about the issue for years. I've been blogging about Toyota for quite some time, featuring further details on these matters, along with links to the petition and the Spyder club websites.