The Problem with Evidence-Based Research & Education

Since the passage of the No Child Left Behind law in 2001, there’s been a requirement that schools (at least those receiving federal funds) use “evidence-based” research to back up the effectiveness of an instructional intervention in the classroom. The evidence-based research most often regarded as optimal is the experimental study, or randomized controlled trial (RCT). This is where there are two groups, the experimental group and the control group (matched with respect to socio-economic status and other variables). The experimental group receives the “treatment” (e.g. the instructional intervention being evaluated), and the control group doesn’t. Standardized pre- and post-tests are administered to both groups, and at the end of the “trial,” evidence is gathered to compare the two groups, with the overarching question being: did the experimental group make more progress (from pre- to post-test) than the control group?

One statistical method used to evaluate these results is the so-called “effect size,” where the difference between the average scores of the two groups is expressed in standard deviation units. An effect size of 1.0 would mean that a student at the 50th percentile in the experimental group scores one standard deviation above the mean of the control group. An effect size of 0.0 would mean there was no difference between the groups. A negative effect size would mean the control group did better on the outcome than the experimental group. Consensus has gathered around the idea that an effect size of .4 and above is generally indicative of evidence supporting the use of the instructional intervention (assuming normally distributed scores, that would mean students at the 50th percentile in the experimental group would be equivalent to students at roughly the 66th percentile in the control group). In other words, an average student in the experimental group does as well as an above-average student in the control group.
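For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes the common “Cohen’s d” form of effect size (difference in means divided by a pooled standard deviation) and normally distributed scores; the function names and the sample gain scores are hypothetical, made up purely for illustration, and individual studies may compute effect size differently.

```python
from statistics import NormalDist, mean, stdev


def cohens_d(experimental, control):
    """Difference in group means, in pooled-standard-deviation units."""
    n_e, n_c = len(experimental), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_e - 1) * stdev(experimental) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_e + n_c - 2)
    return (mean(experimental) - mean(control)) / pooled_variance ** 0.5


def control_percentile(effect_size):
    """Percentile in the control group matched by the average experimental
    student, assuming scores in both groups are normally distributed."""
    return 100 * NormalDist().cdf(effect_size)


# Hypothetical pre-to-post gain scores, invented for this example only.
experimental_gains = [12, 15, 9, 14, 10]
control_gains = [8, 11, 5, 10, 6]

print(round(cohens_d(experimental_gains, control_gains), 2))
print(round(control_percentile(0.4), 1))  # about 65.5
print(round(control_percentile(1.0), 1))  # about 84.1
```

Under these assumptions, an effect size of .4 puts the average experimental student at about the 66th percentile of the control group, and an effect size of 1.0 at about the 84th.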

There are several problems with evidence-based research of the kind described above. First of all, there are problems defining a “control” group. This presumably is a group that does not receive the intervention being evaluated. But the control group students are not just sitting in their classrooms with their hands folded, staring off into space, doing nothing. They are also exposed to classroom interventions, both intended and incidental, and more importantly both positive and negative. Perhaps the control group has a grouchy teacher and the experimental group has an even-tempered teacher. Unless the study accounts for this particular variable (and good luck finding someone willing to admit that they’re a grouchy teacher!), this factor may account for a large part of the difference between the two groups.

Of course, researchers have thought about this and often use “meta-analyses,” which pool several studies evaluating the same intervention so that the impact of these confounding variables on the results might be diluted. However, this type of statistical maneuver might well do the reverse and actually increase the number of confounding variables. In addition, when you start to pool studies, you run the risk of comparing apples and oranges. In other words, one study of, say, graphic organizers may use “mind-mapping” as its “treatment,” while another might use 4W charts (Who, What, When, Where), and a third might employ decision trees. Each individual study is consistent within itself, and all are using “graphic organizers,” but each study is using an essentially different approach.

I think the biggest problem with using evidence-based research in education is that it runs counter to the belief that every student is unique and that educators need to individualize according to each student’s particular needs. Those needs can be determined only by individual teachers trying out different teaching methods in real classrooms with particular students, noticing what has worked in the past, trying out new things, using teacher intuition born of years of teaching experience, and sometimes even doing quite outlandish things to reach the unreachable student (recall Jaime Escalante with a chef’s hat on and a large cleaver in the movie Stand and Deliver).

This is the way things used to be when I was getting my teaching credentials in the 1970s. Now, however, teachers must consult a list of “approved” evidence-based methods, and use an approach that may work for the “mean” of hundreds or even thousands of students, but may in reality not work with individual outliers who do not resemble the “mean” of the studies involved. The funny thing is, such evidence-based strategies may not even work with any ONE single student, because they are based upon statistical averages. The numbers 2, 6, 1 and 7 average out to 4, which is a number not even represented in the series. Similarly, although researchers may have come up with a statistical “effect size,” it may simply be a statistical artifact and not speak the truth regarding any of the individual students who were involved in the studies. This process of bowing to a statistic, which is such an unfortunate part of today’s educational climate, is an example of what is called “positivism,” where the quantitative dimensions of science are given a value far above true human experience. The evidence-based studies now used in education come originally from medicine, where it makes sense to study the impact of, say, a new medication by having a group that takes it and a control group that is instead given a placebo. In education, however, there are simply too many complex variables involved in being human to be able to come up with any truly valid results.

There’s a great story to illustrate this kind of “number worship.” A man goes to a friend’s house and says, “I’d like to borrow your donkey.” The friend says, “I’m sorry, but I lent him out for the day.” The man is disappointed and starts to walk down the friend’s sidewalk, but then he hears the donkey braying in the back yard. He rushes back up to the house and knocks on the door. The friend answers, and the man says, “I thought you said you lent your donkey out for the day.” The friend says, “Yes, I certainly did.” The man says, “But I can hear your donkey braying out in the back yard!” The friend then replies: “Who are you going to believe, the donkey or me?” This is the central question: are we going to believe the “experts” who give us purported “truths” in the form of statistics that point to “approved” teaching methods, or are we going to believe our own senses as teachers working in the trenches, knowing what works and what doesn’t work for our own unique students?

For more information see my book If Einstein Ran the Schools: Revitalizing U.S. Education