Education research: all in it together

Christian Bokhove

Evidence explained

Imagine you are designing a textbook and only at the end you decide to evaluate its effectiveness. This would be a risky thing to do, as it would mean making an enormous investment, without any inkling of ‘whether it works’. This would not be money well spent. During the whole design process, there are opportunities to gather ‘evidence’ to improve the quality of the materials, using sound principles along the way.

In the last five years, ‘evidence’ has become an increasingly important word in the context of education, and even more so in the context of teaching. Rather than ‘evidence-based’, perhaps ‘evidence-informed’ is a more appropriate term, as we will seldom base our decisions on one or more particular studies, but rather combine different sources into a decision-making process. I think most people would agree that at least using some form of ‘evidence’ to inform your decisions might not be a bad thing.

So, what constitutes ‘evidence’? Within the realm of education practice and education research there is a range of views, which we could perhaps mark on a spectrum. At one end of the spectrum, one might find teachers’ perceptions of good-quality materials and experiences, and at the other end we might see robust and extensive field tests in classrooms – including Randomised Controlled Trials (RCTs) – and lab-based experiments. In almost all cases we would be dealing with an ‘intervention’: some kind of treatment, for example, a lesson series, a digital tool or a textbook.

Giving design research the attention it deserves

Last autumn, Oxford University Press asked my research department at the University of Southampton to take a look at the RCT of Inspire Maths carried out by the University of Oxford in 2016. First, it was clear that the Inspire Maths programme had been designed based on sound principles like ‘concreteness fading’ (Fyfe et al., 2014), which is akin to Singapore’s Concrete-Pictorial-Abstract approach, and bar modelling.

The brief for the University of Oxford RCT was to trial and evaluate the impact of a Singapore textbook programme using Inspire Maths. The conclusion of this evaluation was that children made significant progress using Inspire Maths. A further and final step was to ask another party (my department) to review the evaluation. We scrutinised the design and analysis of the RCT and found the evaluation was rigorous and well conducted. The transparency of the research was strong and we had every reason to feel confident about the robustness of the findings. In my view, we see here in action the importance of evidence in the design process.

I have always liked the model of ‘layers of formative evaluations’ (Tessmer, 1993), whereby a design or research project moves from more informal evaluations in the early stages of the project (self-evaluation, one-to-one evaluation, expert review) to small group evaluations aimed at testing the practicality and effectiveness, to a full evaluation. We see similar principles in developments around ‘design research’. One could speak of a ‘research pipeline’ going from separate, individual ideas to a rigorous, independent evaluation. One large advantage is that the design process of the intervention gets the attention it, in my view, deserves. Before a full evaluation takes place, everything has been done to maximise the quality of the intervention materials and the measurement instruments.

Testing the pipeline

I attempted to adopt the same principles during my PhD (2007-2011) when designing an algebra intervention (Bokhove & Drijvers, 2012). I started off with small-scale, one-to-one think-aloud sessions with first prototypes of the algebra intervention tool that were based on design principles from the existing literature. Then I scaled up to a few classes in a school I knew. Finally, I visited nine schools. I also encourage the PhD students I now supervise to give an intervention study the best possible intervention quality by putting additional effort into its design.

Of course, it remains a challenge to know what to do if, after most of this ‘pipeline’, the final evaluation turns out not to yield the fantastic effects one was expecting. However, the whole endeavour has still served a useful purpose. Along the way, numerous ‘research-worthy’ activities have been carried out, ranging from implementing research-informed principles to testing materials at a smaller scale.

Building insight together

To conclude, evidence-informed research can benefit education in several ways. By systematically studying the way we design and implement classroom practices we can gain valuable insights into ‘what works’. In my opinion, though, this ‘what works’ should cover more than just the final effectiveness. In a ‘research pipeline’, small-scale and large-scale, quantitative and qualitative, formative evaluation should take place. It should involve several contributors with a range of expertise. Rather than look for a silver bullet in education research, we slowly and incrementally build more insight into education processes. We don’t do this alone in our ivory towers, but we are all in it together.


Dr Christian Bokhove is a member of the research team at the University of Southampton which, in October 2017, produced a review report on the University of Oxford RCT into the impact of Inspire Maths on teaching and learning. In its executive summary, the Southampton team rated the study highly and said that ‘the evaluation has been rigorous and well conducted. The transparency of the research is strong and this is every reason to feel confident about the robustness of the findings.’ Find out more about the findings of the RCT carried out by the University of Oxford.



Bokhove, C., & Drijvers, P. (2012). Effects of a digital intervention on the development of algebraic expertise. Computers & Education, 58(1), 197-208. DOI: 10.1016/j.compedu.2011.08.010

Fischer, G. (2001). Communities of interest: Learning through the interaction of multiple knowledge systems. In S. Bjornestad, R. Moe, A. Morch, & A. Opdahl (Eds.), Proceedings of the 24th IRIS Conference (pp. 1-14). August 2001, Ulvik, Department of Information Science, Bergen, Norway.

Fyfe, E., McNeil, N.M., Son, J.Y., & Goldstone, R.L. (2014). Concreteness fading in mathematics and science instruction: A systematic review. Educational Psychology Review, 26, 9-25.

Tessmer, M. (1993). Planning and conducting formative evaluation. London: Kogan Page.