The Meta-Analysis as a Genre
Meta-analyses occupy a particular place in the evidence hierarchy. They aggregate findings across many studies to produce a more stable estimate of an effect than any single trial can offer. Done well, they are among the most useful documents in any research field. Done poorly, they can lend an aura of authority to conclusions that the underlying studies do not actually support. The neurofeedback literature has been the subject of an unusually contentious series of meta-analyses over the last fifteen years, and reading them carefully requires understanding both what they share and what makes the better ones genuinely informative.
What These Reviews Are Trying to Answer
The headline question in most neurofeedback meta-analyses is some version of: does this intervention produce effects beyond what placebo or sham conditions would produce? The specific outcomes vary by review, but the methodological structure is similar. Investigators pool studies that meet inclusion criteria, calculate effect sizes for each, and produce a weighted average effect with a confidence interval. The interpretation usually distinguishes between effects on EEG metrics themselves and effects on behavioral or clinical outcomes. These two categories of effect tell related but distinct stories, and conflating them is a common reader error.
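The pooling step described above can be sketched in a few lines. This is a minimal illustration that assumes a fixed-effect, inverse-variance model; many published reviews use a random-effects model instead, which widens the interval. The function name `pool_fixed_effect` and the input numbers are hypothetical and do not come from any real review.

```python
import math


def pool_fixed_effect(effects, variances):
    """Return the inverse-variance weighted mean effect and its 95% CI."""
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and equal length")
    # More precise studies (smaller sampling variance) get more weight.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    # Standard error of the pooled estimate under the fixed-effect model.
    se = math.sqrt(1.0 / total_weight)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Hypothetical per-study effect sizes and sampling variances (illustration only).
effects = [0.20, 0.50, 0.35]
variances = [0.04, 0.10, 0.05]

pooled, ci_low, ci_high = pool_fixed_effect(effects, variances)
print(f"pooled effect = {pooled:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Note how the second study, despite reporting the largest effect, contributes the least because its variance is the largest. That weighting is why a pooled estimate can sit well below the most striking individual result.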
The Sham Control Problem
The methodological hinge of nearly every serious neurofeedback meta-analysis is how studies handled control conditions. A study comparing neurofeedback against a waitlist is asking a different question than a study comparing neurofeedback against sham feedback in a double-blind protocol. The sham-controlled designs are harder to run, easier to underpower, and tend to produce smaller effect sizes than open-label comparisons. Martijn Arns and colleagues have been notably explicit about this distinction in their published work, including a series of papers examining the placebo response in neurofeedback specifically. The pattern that emerges across reviews is that open-label and unblinded studies report effects that are roughly twice as large, on average, as well-blinded sham-controlled trials. This is not unique to neurofeedback; it is the general pattern in any field with strong patient expectations.
Protocol Heterogeneity
A second methodological hinge is whether the meta-analysis pools all neurofeedback as a single intervention or distinguishes between protocols. The umbrella term covers approaches as varied as SMR training, theta/beta protocols, alpha-theta training, slow cortical potential training, infra-low frequency training, and various individualized z-score protocols. Pooling effect sizes across all of these is like pooling effects across all forms of exercise without distinguishing between resistance training, endurance work, and yoga. The 2016 meta-analysis by Cortese and colleagues, published in the Journal of the American Academy of Child and Adolescent Psychiatry, was notable for paying attention to protocol differences, though its subgroup analyses were constrained by the limited number of well-controlled trials in each protocol category.
Effect Sizes in Context
When you do see a pooled effect size, the next question is interpretation. The standardized mean difference, reported as Cohen’s d or Hedges’ g, has rough conventions: 0.2 is small, 0.5 is medium, and 0.8 is large. Pooled effects from sham-controlled neurofeedback studies typically fall in the small-to-medium range, with substantial variability by outcome and protocol. This is neither nothing nor a triumph; it is the range most effective behavioral interventions occupy. Effect sizes of 1.5 or higher in single open-label studies should raise questions about expectation and selection bias.
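To make the standardized mean difference concrete, here is a small sketch that computes Cohen's d from group summary statistics, applies the usual small-sample correction to get Hedges' g, and attaches the rough conventional label. The function names and the input numbers are hypothetical, and the label thresholds are only the rules of thumb quoted above, not hard cutoffs.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Apply the common approximate small-sample correction to Cohen's d."""
    correction = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)
    return d * correction


def describe_magnitude(smd):
    """Map an effect size onto the rough 0.2 / 0.5 / 0.8 conventions."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical trial: treatment mean 55, control mean 50, SD 10, 20 per group.
d = cohens_d(55.0, 50.0, 10.0, 10.0, 20, 20)
g = hedges_g(d, 20, 20)
print(f"d = {d:.3f}, g = {g:.3f}, label = {describe_magnitude(g)}")
```

With small groups the correction pulls g slightly below d, which is one reason reviews of small trials tend to report Hedges' g.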
Reading the Forest Plot
The forest plot is more informative than the pooled headline number. Tightly clustered effect sizes with overlapping confidence intervals tell a different story than wildly heterogeneous effects, even when pooled means are identical. Heterogeneity statistics like I-squared quantify this by estimating the share of variation in effect sizes that reflects real differences between studies rather than sampling error; values above 50% suggest pooling may obscure more than it reveals.
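The point about identical pooled means hiding very different forest plots can be shown numerically. The sketch below computes Cochran's Q and I-squared for two hypothetical sets of studies that share the same pooled effect; the function name `heterogeneity` and all input values are invented for illustration.

```python
def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, I-squared as a percentage)."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of Q in excess of what chance alone would produce.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Two hypothetical literatures with equal precision and the same average effect.
variances = [0.02, 0.02, 0.02]
tight_effects = [0.28, 0.30, 0.32]
wide_effects = [-0.30, 0.30, 0.90]

pooled_tight, q_tight, i2_tight = heterogeneity(tight_effects, variances)
pooled_wide, q_wide, i2_wide = heterogeneity(wide_effects, variances)

print(f"tight: pooled = {pooled_tight:.2f}, I2 = {i2_tight:.0f}%")
print(f"wide:  pooled = {pooled_wide:.2f}, I2 = {i2_wide:.0f}%")
```

Both sets pool to the same headline number, but the second has an I-squared far above the 50% threshold, which is the situation where the single pooled estimate says little about what any given study found.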
The Pigott Critique
H. Edmund Pigott and collaborators have argued that dismissive readings of neurofeedback research have themselves been methodologically selective: standards applied to neurofeedback are not consistently applied to comparison interventions, and some pharmacological alternatives have weaker long-term evidence than the skeptical literature implies. Whether or not you find this fully convincing, it is a useful corrective against assuming the field is settled.
A Reader’s Checklist
When you encounter a neurofeedback meta-analysis, a short checklist helps separate the useful from the misleading. Did the review distinguish sham-controlled from open-label studies? Did it distinguish protocols? What were the inclusion criteria, and did exclusions track methodological quality or results? Do the heterogeneity statistics justify the pooling? Are the outcome measures the ones you actually care about? Most meaningful disagreements in the literature are not about whether effects exist but about how large, how protocol-specific, and how durable they are. Those are the questions to bring to any new review you read.
NeuroSphere is a wellness and cognitive training tool, not a medical device or treatment for any condition. It does not replace care from a licensed clinician, therapist, or physician. Neurofeedback research is ongoing and findings vary; this post discusses general scientific context, not personalized clinical advice. If you are experiencing significant emotional distress, please reach out to a qualified professional. U.S. resources: 988 Suicide & Crisis Lifeline (call or text 988), SAMHSA (1-800-662-4357), National Institute of Mental Health.