Indoor rock climbing is a lot of fun. It’s challenging and demanding in all of the best ways. It takes strength and coordination and endurance. It demands mental clarity and concentration. You get to feel strong (sometimes), and weak (most of the time), and if you stick with it you get to see yourself getting stronger and progressing to harder climbs. And best of all, you get to indulge your “childish” joy of climbing on things.
As fulfilling as it is, however, you do quickly realize that what you thought was fit enough, really isn’t. And while down on the ground you may think that that extra 10lbs around the middle isn’t such a big deal, when you’re hanging by your fingertips 40 feet off the ground, you discover that gravity has a very different opinion on the matter. (If you don’t believe me that 10lbs is more than you think, go to the supermarket and pick up a 10lb bag of rice and carry it around for 20 minutes.)
Obviously there are all kinds of diets and supplement regimens to choose from when you’re trying to lose weight, but the simplest of all is CICO (calories in/calories out). You don’t have to torture yourself with keto or veganism or Atkins or intermittent fasting or any of the other bajillion ways of indulging your culturally-induced orthorexia. Basically, you eat less food than you need (calories out > calories in) and keep track of your weight. That’s it.
Weight-tracking is important because losing weight is the point of trying to lose weight, so you want to see whether you’re losing weight or not (duh). The other, less obvious reason is that keeping track lets you actually see your overall progress. Safe and healthy weight loss takes a long time, and so won’t become visible in the mirror until you’re a few weeks in. Using some method of tracking helps you hold the belief that your efforts are actually working.
This being the digital age, there are a number of apps out there which take this a step further and automatically generate graphs to make things even clearer (I use Libra). There’s a snag, though. However well-balanced and consistent your new weight-loss diet might be, your day-to-day weight is going to fluctuate, sometimes wildly, and even with a graph it can be hard to tell if and what kind of progress you’re making.
This variation is mostly because of your hydration level, which fluctuates according to how much you drink, sweat, pee, bleed, or cry. Water is also really heavy, weighing 1kg (2.2lbs) per liter. So you can “put on” half a kilo (1lb) just by drinking a big cup of coffee, and “lose” it again a couple of hours later when you pee it all out. So if you drink a bunch of Fresca and then hop on the scale, you’re going to get an artificially high reading.
This fluctuation adds a bunch of uncontrolled “noise” to your data which hides the trend you’re looking for behind a bunch of random ups and downs.
There are a number of practical tricks to managing this noise, like controlling the conditions under which you do your measurements. This means always using the same scale, weighing-in at the same time every day, and with the same amount of clothing on (i.e. none). This makes the best time to weigh in, for most people, before breakfast and after your morning trip to the bathroom. That way you establish reasonably consistent conditions for your weigh-in, meaning that you can always trust what the scale is telling you. But you’re still going to have a good amount of variability no matter what you do.
So, once you’re sure that the problem isn’t some variable in your data-collection methods, you can make the forest a bit easier to see by getting rid of some of the trees. Which is to say, the variability can be managed mathematically. Libra, for example, shows you all of the data points you’ve entered, but it also gives you a statistically-averaged trend-line which smooths out the data and lets you see what your progress actually looks like (left chart). It also gives you a projection of what your progress will look like and when you should reach your goal weight (right chart).
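To make the idea of a smoothed trend-line concrete, here is a minimal sketch of one common smoothing method, an exponentially weighted moving average. I am not claiming this is what Libra does under the hood; the function name `smooth`, the smoothing factor `alpha`, and the daily weights are all made up for illustration.

```python
def smooth(weights, alpha=0.1):
    """Exponentially weighted moving average of a list of daily weights.

    Each new trend value moves only a fraction (alpha) of the way from the
    previous trend value towards the latest reading, so a single odd
    weigh-in barely moves the line.
    """
    trend = []
    for w in weights:
        if not trend:
            trend.append(w)  # start the trend at the first reading
        else:
            trend.append(trend[-1] + alpha * (w - trend[-1]))
    return trend


# Hypothetical daily weigh-ins in kg: a slow loss hidden under water-weight noise.
daily = [85.0, 85.6, 84.7, 85.3, 84.5, 85.1, 84.2, 84.9, 84.0, 84.6]
trend = smooth(daily)

for day, (raw, smoothed) in enumerate(zip(daily, trend)):
    print(f"day {day}: scale {raw:.1f} kg, trend {smoothed:.2f} kg")
```

The raw readings bounce around by more than a kilo, while the trend values drift down slowly. A smaller `alpha` gives a calmer line that reacts more slowly to real change, which is itself an analysis decision someone has to make.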
This is all great and wonderful, but there’s another snag: you start to get attached to your nice, downward-sloping projection, and any upward change can be very upsetting. This means that if you have a day when you drank a little too much Fresca, or had a moment of weakness with a box of cookies, and your weight suddenly spikes (left chart below) you might suddenly experience an almost insatiable urge to delete the problem entry so that you can get your nice, downward line back (right chart below). It’s not really dishonest, you tell yourself, it’s just one value, just a bit of noise in the data, so what difference does it make?
(I didn’t actually delete the point, by the way. But for the sake of the example we’ll pretend I did.)
The problem is one of principle; my desires and expectations are now starting to influence my data management. I am deciding which data to keep and which to throw away on the basis of how that data makes me feel. Not very scientific.
We all want to see our hard work come out with a happy ending, and the effect this has on our behaviour is called optimism bias. Basically, optimism bias makes you unconsciously change the conditions of your testing or your analysis so that you get the results you want. This is a problem because scientific research isn’t about what you want, it’s about trying to discover what is objectively true. In fact, optimism bias is literally exactly why the scientific method was invented.
And compare the projection with the ugly data point and without it. See the difference when the unwanted 87kg value gets deleted? The projected trend drops downwards dramatically, and goes from basically horizontal to a much more pleasing downward slope. It also changes the projected completion date from August 2028 to April 2021.
I should say that deleting this single point won’t have a huge overall effect in the end because its influence quickly gets wiped out by the values that follow it (as you can see in table 1). Nevertheless, by deleting that point, I am effectively lying to myself and the rest of the world to make myself feel better.
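The effect of one deleted point on a projection can be sketched with a plain least-squares line. This is only an illustration under my own assumptions: the weigh-ins below are invented (a steady 0.05kg loss per day followed by one 87kg spike), the 80kg goal is hypothetical, and the helpers `fit_line` and `days_to_goal` are not taken from any app.

```python
def fit_line(points):
    """Ordinary least-squares line through (day, weight) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_to_goal(points, goal):
    """Day on which the fitted line reaches the goal, or None if it never does."""
    slope, intercept = fit_line(points)
    if slope >= 0:
        return None  # flat or rising trend never reaches a lower goal
    return (goal - intercept) / slope


# Hypothetical data: 19 well-behaved days, then one bad weigh-in on day 19.
steady = [(day, 85.0 - 0.05 * day) for day in range(19)]
spike = (19, 87.0)

with_spike = days_to_goal(steady + [spike], goal=80.0)
without_spike = days_to_goal(steady, goal=80.0)

print(f"keeping the spike:  goal reached around day {with_spike:.0f}")
print(f"deleting the spike: goal reached around day {without_spike:.0f}")
```

With these made-up numbers, one deleted reading moves the projected finish date several times closer. Nothing about my actual weight changed, only my choice of which data to keep.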
What’s the big deal? And wait, what on earth does this have to do with neuroimaging?
Well, an fMRI experiment is doing essentially the same thing as my little weight-loss study: tracking the observed change over time in a fixed physical property under controlled conditions. The difference is that an fMRI tracks that property by taking a 3D “picture” of about 240,000 data points, once a second, for a few minutes at a time. Each participant is scanned this way multiple times in a session, and the study might need to scan as many as 100 people’s brains in total. That is a lot of data.
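For a rough sense of scale, the numbers above can be multiplied out. The 240,000 points per second and the 100 participants come from the paragraph above; the run length (5 minutes) and the number of runs per session (4) are my own guesses standing in for "a few minutes" and "multiple times".

```python
POINTS_PER_IMAGE = 240_000   # data points in one 3D "picture" (from the text)
IMAGES_PER_SECOND = 1        # one picture per second (from the text)
SECONDS_PER_RUN = 5 * 60     # assumed: "a few minutes" taken as 5 minutes
RUNS_PER_SESSION = 4         # assumed: "multiple times" taken as 4 runs
PARTICIPANTS = 100           # upper figure given in the text

points_per_run = POINTS_PER_IMAGE * IMAGES_PER_SECOND * SECONDS_PER_RUN
points_per_person = points_per_run * RUNS_PER_SESSION
total_points = points_per_person * PARTICIPANTS

print(f"per run:    {points_per_run:,}")     # 72,000,000
print(f"per person: {points_per_person:,}")  # 288,000,000
print(f"whole study: {total_points:,}")      # 28,800,000,000
```

Compare that with my weight-loss study, which collects one number a day.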
An MRI machine takes its “pictures” by measuring the magnetic response of the different kinds of blood that are flowing in the brain. Fresh, oxygen-rich blood is more responsive than used-up, oxygen-poor blood, and so the MRI machine can track which kind of blood is where. The basic idea is that the harder a part of the brain is working, the larger the amount of oxygen-rich blood that part will need. So the images we get of pretty multi-coloured brains have been filtered so that we’re seeing only the activity of fresh, oxygen-rich blood.
Once the scans are complete, sophisticated graphics software converts this information about where the fresh blood is into a colour-coded 3D “movie” of the subject’s brain activity. So the fascinating images we see of multi-coloured brains are actually only a single “frame” of the person’s brain frozen in time.
The cost and complexity of this technology means that the researchers who conduct fMRI studies have to put a huge amount of thought and effort into designing their experiments and controlling the conditions of their testing environment. But no technology is perfect, especially not one which is trying to measure something as subtle as this, so there is going to be some noise.
And because an MRI machine is so much more complicated than my little experiment, it also has many more sources of noise; some of it is because of the machine, some of it is because of the way our brains work, and some of it is because of the physics of atoms, molecules, and cells.
The most important kind of noise in an fMRI comes from the brain itself. Our brains are immensely complicated, interconnected systems which are constantly processing multiple streams of information at the same time. If you ask someone to sort a deck of cards, a part of their brain is sorting the cards, but another part is thinking about dinner, and yet another is thinking about how boring it is to sort cards. All of this happening at once makes it impossible to guarantee that your results won’t be affected by the by-products of some other neural process.
As mentioned, noise is extra information that you don’t need, and that makes it more difficult to clearly see what is going on in a given data-set. There are various ways of getting rid of noise, but each involves a decision about which data is important and which isn’t. And the problem is that no matter how well-trained and diligent they are, the researchers who are making these decisions are just as prone to optimism bias as I am.
Even this wouldn’t be that big a deal if everyone had the same goals, made the same assumptions, and took the same decisions when analyzing fMRI data. At least then everyone could be sure that even if the image is incomplete, they would all be looking at the same thing in the same way.
But there’s a snag. They don’t. There is no standardized set of assumptions or statistical methods for handling fMRI data. Every single person does it differently. Which means that each person’s or team’s own set of assumptions, expectations, and, sadly, financial needs plays at least some part in how the data and analysis come out. Basically, this means that the same set of data can and will be interpreted totally differently depending on who’s doing the math and what results they want to see. And so there is a real possibility that actually important information might be mistaken for noise and get thrown out.
In my case, there was no harm in this at all. No one is conducting medical diagnoses based on my findings, or creating governmental policies. It doesn’t cost me any money and it won’t have much of an effect on my career or anyone else’s life.
Does it really matter?
But what then if the data being gathered cost tens of thousands, or even millions of dollars to collect? What then if its analysis and the conclusions drawn from it will be used for medical and psychiatric diagnoses for millions of people, or new government policies which would affect funding for healthcare or education or mental health?
What then if getting a positive result was vital to a researcher’s future as a scientist or as a professor at a university? Would it matter then that they fudged the numbers to make the colourful brain look pretty? Um, yes. It would definitely matter. Like, a lot.
(At this point, I should say that while there is a small number of cases in which statistical manipulation is used for selfish or malicious purposes, the majority is done with the best intentions and is rationally well-justified. However, that doesn’t change the fact that people’s hopes and expectations influence experimental outcomes.)
Remember how I mentioned the financial needs of researchers? Well, unfortunately, funding is one of the major drivers of optimism bias. Aside from the natural human desire to succeed, there are two reasons for this: 1) the more papers you have published, the more likely you are to find work as a researcher, and 2) publishers don’t like to publish negative findings. This means that in order to be “successful,” or even to find a job to pay the bills, researchers always need to produce positive results. The resulting effect on behaviour is called publication bias and it unconsciously encourages scientists to manipulate their data so that they always get positive, publishable results.
One of the challenges of dealing with optimism and publication bias is that they’re difficult to demonstrate. Much of the time, a single study is conducted and never replicated, and the statistical analysis is done once and assumed to be correct. On top of this, the original data themselves have until recently never been published, and the conditions of most experiments are at best poorly described. This makes it extremely difficult either to re-do the research and compare findings, or to even re-evaluate the original data. And so, because there’s nothing to compare it to, we just have to assume that everything is alright.
But, when you do go back and try to replicate a lot of these studies, you often find that you can’t. Either the data you get looks completely different, or you come to a completely different conclusion. This has led to what is known as the replication crisis, which has thrown a huge shadow over much of social psychology, and psychological research in general.
But fMRI is different, right?
Because of the tightly-controlled conditions of MRI technology and the fancy, high-resolution images it generates, MRI is generally assumed to be more accurate and less vulnerable to bias and incompetence than other experimental methods. In spite of this, there have been some major issues which have called the technology itself into doubt, like a programming glitch discovered in 2016 which supposedly invalidated a total of 40,000 fMRI studies. (Ironically, the study which uncovered the glitch seems to have been deeply flawed itself, and its authors have been accused of over-inflating the number of affected studies – suggesting that it was deeply influenced by optimism and publication biases.)
And on top of this, there is the aforementioned fact that there is no standardized method of noise reduction and statistical analysis of fMRI data.
Ask 70 doctors and you’ll get 71 different opinions.
To investigate this problem, a team of researchers gave the same set of fMRI imaging data to 70 different teams for analysis. Each team was given a set of 9 hypotheses which they were asked to confirm or disconfirm and no specific methods of analysis were required or suggested. All of the teams were members of or affiliated with accredited institutions.
When they compared the analyses of the 70 teams, they found something very disturbing. Not only did every one of the 70 teams do their analysis in a different way, there was almost no consistency between the results each team returned. Even when teams did use similar methods, they ended up with different conclusions. And, most importantly, many of the teams said that their results were clearer and more meaningful than they actually were.
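How can the same data produce different answers? Here is a toy sketch, nothing like a real fMRI pipeline, in which six imaginary "teams" each pick their own smoothing window and their own cut-off for what counts as a real effect. The signal values, the windows, the thresholds, and the function names are all invented for illustration.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hypothesis_confirmed(values, window, threshold):
    """A 'team' confirms the hypothesis if any smoothed value beats its threshold."""
    return max(moving_average(values, window)) > threshold


# One made-up, noisy "activation" signal that every team receives.
signal = [0.2, 1.4, 0.1, 1.3, 0.3, 1.5, 0.2, 1.2]

# Each team makes its own, individually defensible, choices.
results = {
    (window, threshold): hypothesis_confirmed(signal, window, threshold)
    for window in (1, 2, 4)
    for threshold in (0.7, 1.0)
}

for (window, threshold), confirmed in results.items():
    verdict = "confirmed" if confirmed else "not confirmed"
    print(f"window={window}, threshold={threshold}: {verdict}")
```

Four of the six teams confirm the hypothesis and two do not, even though nobody touched the data and nobody did anything unreasonable. A real analysis involves far more choices than two, so there is far more room to disagree.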
What does this mean?
This lack of consistency in any part of the analysis is quite troubling. It means, basically, that the conclusions drawn by every fMRI study that has ever been done are now in doubt. This doesn’t mean that they are wrong, necessarily, but it does mean that we can’t prove that they’re correct either. And judging by the results of Botvinik-Nezer’s study, every single one of them would have given different results if they’d been conducted by different people. This is very, very not scientific.
Sadly, Botvinik-Nezer’s study is not alone. It is just one of a number of findings which suggest that the images and conclusions which come out of fMRI studies are at best misleading, and at worst completely wrong. There are so many influencing factors involved, and so many different ways of analyzing the data, that it’s getting harder and harder to trust the conclusions drawn by fMRI research. Optimism and publication biases, sketchy assumptions about how well fMRI technology actually measures what it’s supposed to measure, and the basic statistical issue of how to calculate results – all of this requires a lot of blind-eye turning and assumption-making. The prevention of which, again, is literally exactly why the scientific method was invented.
Unfortunately, it’s impossible to predict what the outcome of all of this is going to be. It could be that researchers develop actual standards of practice and analysis which will solve the current issues. It could also be that the whole technology will have to be reconsidered, and perhaps abandoned altogether. Perhaps most optimistically, fMRI might prove to be a stepping stone to a new, more accurate, and trustworthy brain scanning technology.
For a non-psychologist and non-scientist like me, the main takeaway is that all psychological research, especially fMRI research, needs to be taken with a much larger grain of salt than it generally has been. Our beliefs about its infallibility often trick us into accepting its conclusions without questioning either the methods or the results.
And, again, it’s not that fMRI research is worthless and should be ignored. The point is that, like everything in life, science and research are much more complicated than they appear to be. The social and economic dynamics of the science world have a much larger influence than many are willing to admit. All of us should be wary of accepting any scientific research at face value, and doubly so when it’s accompanied by a bunch of impressive-looking pictures full of pretty colours.
Botvinik-Nezer, R., Holzmeister, F., Camerer, C.F., et al. (2020). Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 582, 84–88. https://doi.org/10.1038/s41586-020-2314-9