How Do You Know if Your Doctor Is Doing a Good Job?

We’ve spent a lot of ink in this blog discussing how difficult it is to measure quality in the various U.S. healthcare systems. One large-scale effort to measure quality is the “Medicare Merit-based Incentive Payment System,” or MIPS. MIPS is a big deal for health systems: quality isn’t just a matter of professional pride, because the MIPS program has a significant impact on the reimbursement received by U.S. physicians.

Some of the surveys or questions you’ve had to answer in doctors’ offices over the last few years are undoubtedly tied to their efforts to improve their MIPS scores. MIPS rates physicians based on measures in four categories:

  1. Quality (30% weight), mostly in terms of clinical outcomes and patient experience. Doctors might be scored on the percentage of hypertensive patients who have their blood pressure controlled or the percentage of their patients who report a high level of satisfaction with their care.

  2. Promoting interoperability (25% weight), how well a physician uses technology to improve the quality and efficiency of their care. Measures in this category might include the percentage of patients using the electronic health record (EHR) portal or how many prescriptions are sent to the pharmacy electronically.

  3. Improvement activities (15% weight), how well a physician is working to improve her practice through activities like quality improvement programs.

  4. Cost (30% weight), how much a physician’s care costs compared to his peers. Think: the number of seemingly unnecessary tests and procedures ordered.

Because the work that, say, a psychiatrist does is so different from the work a urologist does, doctors who participate in MIPS may choose six of a possible 257 performance measures to report, only one of which must be an “outcome measure,” such as hospital admission for a particular illness. The others can be “process measures” like rates of cancer screening. Docs are given a composite MIPS score between zero and 100. To avoid a “negative payment adjustment” (that is, a reduced fee), physicians must score >75, which seems high to me unless I frame it as a solid “C” grade. Also, 86% of the docs in the sample achieved at least that score, indicating either that they are good at gaming the system or that the score isn’t terribly difficult to achieve.
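If it helps to see how those category weights combine, here is a minimal sketch in Python. It assumes each category is already scored on a 0 to 100 scale and that the composite is simply the weighted sum checked against the 75-point threshold; the real CMS calculation adds measure-level scoring, reweighting, and bonuses that this toy example ignores.

```python
# A toy, illustrative MIPS-style composite: each category scored 0-100,
# combined by the published weights, then compared to the 75-point
# threshold described above. The real CMS formula includes measure-level
# scoring, reweighting, and bonuses that are deliberately ignored here.

WEIGHTS = {
    "quality": 0.30,
    "promoting_interoperability": 0.25,
    "improvement_activities": 0.15,
    "cost": 0.30,
}

PERFORMANCE_THRESHOLD = 75  # score a physician must beat to avoid a reduced fee


def composite_score(category_scores: dict) -> float:
    """Weighted sum of the four category scores (each on a 0-100 scale)."""
    return sum(WEIGHTS[category] * category_scores[category] for category in WEIGHTS)


def payment_adjustment(score: float) -> str:
    """Crude direction of the fee adjustment implied by a composite score."""
    return "neutral or positive" if score > PERFORMANCE_THRESHOLD else "negative"


if __name__ == "__main__":
    example = {
        "quality": 82,
        "promoting_interoperability": 90,
        "improvement_activities": 100,
        "cost": 55,
    }
    score = composite_score(example)
    print(f"Composite MIPS score: {score:.1f} -> {payment_adjustment(score)} adjustment")
```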

In spite of the massive effort put into MIPS by regulators, docs, and health systems, it’s unclear whether the MIPS program really reflects the quality of care provided by participating physicians. To find out, investigators analyzed 3.4 million patients treated in 2019 by 80,246 primary care physicians using Medicare datasets (paywall). They looked specifically at five “process measures,” like rates of diabetic eye examinations and breast cancer screens, and the “patient outcomes” of all-cause hospitalizations and emergency department visits.

They found that physicians with low MIPS scores (<30) had worse performance on three of the five process measures compared to those with high (>75) MIPS scores. Specifically, the low-scoring docs had lower rates of diabetic eye exams, HbA1c screening for diabetes, and mammography for breast cancer screening. However, the low-scoring docs had better rates of flu vaccination and tobacco screening. In the “patient outcomes,” there was no consistent association with MIPS scores: emergency department visits were lower (i.e., better) for those with low MIPS scores, while all-cause hospitalizations were higher (worse).

Overall, these inconsistent findings suggest that the MIPS program may not be an effective way of measuring and incentivizing quality improvement among U.S. physicians. The “patient outcomes,” which I think most of us would be most interested in, showed no clear association with MIPS scores. In addition, the study found that some physicians with low MIPS scores had very good composite outcomes, while others with high MIPS scores had poor outcomes. As in every correlative study, there were outliers. This suggests that other, more nuanced factors that the MIPS program does not capture may influence a physician’s performance.

The study is recent enough that we don’t have peer-reviewed criticism or hypothesizing yet about the potential mechanism of MIPS failure. But a blog post from Cornell puts it this way: “…there is inadequate risk adjustment for physicians who care for more medically complex and socially vulnerable patients and that smaller, independent primary care practices have fewer resources to dedicate to quality reporting, leading to low MIPS scores.” So, sicker patients going to smaller, independent practices may drag down results. Or, as Dr. Amy Bond puts it more frankly in the same blog post, “MIPS scores may reflect doctors’ ability to keep up with MIPS paperwork more than it reflects their clinical performance.” For our comrades in Human Resources, I suspect this criticism rings especially true.

As the Medical Director of the Kansas Business Group on Health, I’m sometimes asked to weigh in on hot topics that might affect employers or employees. This is a reprint of a blog post from KBGH.

Wanting to Improve Is Not the Same Thing as Improving

Around 2013, I was diagnosed with pre-diabetes. Doctors fear dying of the diseases they know best: gastroenterologists of colon cancer, infectious disease specialists of sepsis, you get the idea. So, for the sake of my blood sugars, after a decade-plus of abstention due to medical school, young kids, and a growing medical practice, I decided to get back into racing bicycles. I dusted off my old cyclocross bike, aired up the tires, bought some Chamois Butt’r, and congratulated myself for my reentry into competition. But note what I did not do, which is put in the miles that it takes to be a competent, competitive rider. In my first race back, a fifty-mile gravel race around Sun City, Kansas, I barely finished. If not for the help of a fellow rider who felt sorry for me, I may not even have crossed the finish line. To say the least, I had not earned my smug self-congratulatory attitude going into the race. On the drive home, the Stuart Smalley voice in my head told me I wasn’t a bad person. He told me I was human. All I had done was give myself credit for wanting to improve when I should have waited to give myself credit for actually taking the steps to improve.

I was reminded of my past foibles recently when I came across a report in the British Medical Journal pithily titled “Wanting to improve is not always the same as knowing how to improve.” The authors described a quality improvement project in an English hospital that aimed to reduce the length of stay of patients after knee replacement surgery. Early in the project, the investigators decided that the method of anesthesia (light sedation, heavy sedation, local anesthesia, and so on) was the primary obstacle to getting patients out of the hospital quickly. Then they went through five (five!) different anesthesia protocols over the course of seven (seven!) years. In all that time, they didn’t budge patients’ length of stay. They didn’t show an effect on any other indicator of quality, either, like time until the patient first walked, the patient’s reported pain, or overall pain medication use.

But I’m willing to bet that in those seven years, the folks involved in the study were proud of their work on the project, in spite of what we can see now, with the benefit of time and perspective. After all, even outside of quality improvement, we give ourselves credit in a number of ways that we refuse to extend to others when we have some distance from the problem. I may have no problem recognizing how reckless another driver is when he blasts through a fading yellow light. But when I tap the accelerator to do the same thing, since I’m generally intending to be a safe driver, it never occurs to me that my actions, too, have put other people in danger. Princeton University psychologist Emily Pronin calls this the “introspection illusion.” Our distorted self-image, blinded by the stage lights of our own personal sitcom, sees our desire to be good and ignores the fact that our objective goodness might fall short.

This intersects with the “Dunning-Kruger effect,” the demonstration that the more incompetent people are, the less aware they are of their incompetence (like a slightly chubby cyclist entering his first race in years, blind to his abject lack of fitness or preparation). The more we miss the mark on a given task, the more our estimation of our success departs from reality. Physicians are especially prone to the Dunning-Kruger effect.

So the next time you set out to improve a process in your work, avoid the mistakes made by our English friends. First, measure the outcome you’re interested in and don’t rely on an intuitive understanding of the issue. This may require pressuring your administrative consultants to help you get meaningful data. Then put some distance between yourself and the problem through “meta-cognition.” Instead of saying, “Let’s increase the number of employees getting their diabetes screenings,” say to yourself, “Here is a company [i.e., the company you work for] in which xx% of employees received diabetes screening in the last three years.” Instead of jumping to a presumed problem to solve, think of the environment that led to the outcome you’ve measured. As Don Berwick famously said, paraphrasing others, “every outcome is the product of a system perfectly designed to achieve that outcome.” At KBGH we do this through a process called Ishikawa Analysis, first applied in post-World War II Yokohama shipyards:

An example fishbone diagram: https://vanguardcommunications.net/fishbone-problem-solving/

Here, the company has divided inputs into Procedures, Technology, Patients, and People. But those categories are relatively arbitrary. You may find you have more or fewer input classes, and that they’re more process- or environment-oriented than in the example.
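If it helps to think of that loose framework as a structure rather than a picture, here is a minimal sketch in Python. The branch names and example causes are purely hypothetical, not data from any real analysis; the point is just to collect candidate causes under whichever branches fit your situation before you start interviewing stakeholders.

```python
# A toy Ishikawa (fishbone) structure: one measured "effect" and candidate
# causes grouped under whichever input classes fit your situation.
# Branch names and causes below are hypothetical, for illustration only.

from collections import defaultdict

effect = "xx% of employees received diabetes screening in the last three years"

fishbone = defaultdict(list)

# Populate branches as stakeholder interviews surface candidate causes.
fishbone["Procedures"].append("Lab requires fasting, but offers no early-morning slots")
fishbone["Technology"].append("Portal reminders go to out-of-date email addresses")
fishbone["Patients"].append("Unclear which screenings the health plan covers")
fishbone["People"].append("Front-desk staff unsure how to book screening-only visits")

# Print a simple text outline of the diagram.
print(f"Effect: {effect}")
for branch, causes in fishbone.items():
    print(f"  {branch}:")
    for cause in causes:
        print(f"    - {cause}")
```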

Next, use that loose framework to talk to employees and the health care team about what is holding them back, what we call a “stakeholder analysis.” You may have gone into the problem assuming, based on claims data, that employees are swamping their doctors with complaints of back pain or depression. But you may find instead that scheduling issues aren’t allowing patients to arrive at the lab fasting before work. This process is what led the folks in the study above to a breakthrough. In talking to staff, the investigators realized that, in concentrating so closely on anesthesia, they had overlooked (for seven years!) other potential contributing factors, such as patient expectations, limited staff, time constraints, and cultural factors like lack of staff ‘buy-in’ to the project.

Eventually, I went on to finish and do well in many, many bike races. More importantly, I lost enough weight and stayed active enough to return my blood sugars to a normal range, where they thankfully remain. And I did it not by wishing my way to better performance, but by eliminating problem foods I knew I was over-eating, increasing my fiber intake to a pre-specified goal, scheduling time to ride, working on specific skills, and measuring specific outputs. But getting modestly faster on a bicycle is trivial compared to the challenge of improving the lives of our covered employees.

As the Medical Director of the Kansas Business Group on Health, I’m sometimes asked to weigh in on hot topics that might affect employers or employees. This is a reprint of a blog post from KBGH.

Do Online Physician Ratings Actually Help?

Toward the end of my full-time clinical career, I attended a speech by a physician who encouraged doctors to “own” their online personas. He said we should actively manage our social media presence, our clinic websites, and our ratings by third-party sites like Angie’s List and Yelp. Against my instincts, I took his advice and Googled myself. Reader, I don’t mean to be histrionic. Many factors contributed to the end of my clinical career. But that innocent internet search did not, to put it lightly, make me excited to show up for work the next day:

[Screenshot of the search results, April 13, 2021]

I don’t share this anecdote as a bid for your pity. My experience with online ratings represents a tiny fraction of the “feedback” that a politician or a college football coach gets daily. I share the story as an entrée to a question: do online physician ratings accurately reflect the quality of care people receive? If the ratings are accurate, then we should encourage our employees to use them. If they’re inaccurate, we should encourage employees (and practitioners) to ignore them.

This is no idle inquiry. Some studies have suggested that up to 60 percent of patients consider online reviews important in choosing a provider. A recent national survey (paywall) of Americans aged 50 to 80, the heart of an internist’s practice like mine, revealed that more than 40 percent had looked up a physician’s rating for themselves at some point in their lives. Women, people with higher education levels, and (predictably) people with at least one chronic condition were more likely to have looked up a physician rating. The survey’s investigators also looked at several factors contributing to how prospective patients choose a physician, and online ratings came in only ninth, behind factors like “accepts my health insurance” and “convenient office location.” But the physician’s rating was still considered important almost as often as word-of-mouth reputation among family and friends, consistent with the results of smaller surveys.

But the ratings themselves are less influenced by clinical outcomes, like death, infection, or well-being, than they are by the patient’s experience. As we’ve blogged about before, denial of a patient request, especially for pain medications or lab tests, results in a dramatic decrease in patient satisfaction. That is surely poison for an online rating, regardless of the appropriateness of the denial. A very sophisticated study of dentist ratings showed that things like wait time were strongly associated with higher ratings, while raters barely mentioned clinical outcomes like infection or tooth loss. These experience-centric ratings may also reinforce biases that we already know exist. One study showed that, globally, male surgeons were rated higher on technical skills, while female surgeons were more highly rated for interpersonal skills.

It’s hard to tell whether the ratings correlate with those harder clinical outcomes. A study of orthopedic surgeons’ online ratings found no correlation between ratings and total knee replacement outcomes. And one study found that the design of the rating website itself, like the presence or absence of advertisements for other doctors on the page, affected the quality of the data. But there is a hint of better outcomes in certain situations. For example, a retrospective study showed that patients who had hip replacement surgery at hospitals highly ranked on physician rating sites did slightly better than patients at lower-ranked hospitals.

If we can draw any conclusions from this muddled body of research, the most important lessons seem to be these: First, patients should understand the limitations of online reviews. A negative review of a highly skilled oncologist with a gruff bedside manner may obscure the fact that his staff has experience steering patients into clinical trials that may help complex cases. That skill may be known only to other providers. And second, doctors need to learn to use their online reviews as a source of quality improvement data. Someone who gives a doctor a lousy review may well have a valid complaint; the patient experience in American healthcare hardly has a sterling reputation. Instead of simply bristling at negative reviews, doctors should use them as a tool to enact positive change.

As the Medical Director of the Kansas Business Group on Health, I’m sometimes asked to weigh in on hot topics that might affect employers or employees. This is a reprint of a blog post from KBGH.