Tag Archives: Scalability

Why Not: 3 Ingredients Enable Universal, Annual Digital Program Evaluations

This post originally appeared in an EdSurge guide, Measuring Efficacy in Ed Tech. A prior post covers similar content from the perspective of sharing the accountability that teachers alone have shouldered.

Curriculum-wide programs purchased by districts need to show that they work. Even products aimed mainly at efficiency or access should at minimum show that they can maintain status quo results. Rigorous evaluations at the student level have been complex, expensive and time-consuming. However, given a digital math or reading program that has reached a scale of 30 or more sites statewide, there is a straightforward yet rigorous evaluation method, using public grade-average proficiencies, which can be applied post-adoption. The method enables not only districts but also publishers to hold their programs accountable for results, in any year and for any state.

Three ingredients come together to enable this cost-effective evaluation method: annual school grade-average proficiencies in math and reading, posted by each state for each grade; a program adopted across all classrooms of each grade that uses it at each school; and digital records of grade-average program usage. In my experience, school cohorts of 30 or more sites using a program across a state can be statistically evaluated. Once methods and state-posted data are in place, the marginal cost and time per state-level evaluation can be as little as a few person-weeks.
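To make the shape of such an evaluation concrete, here is a minimal sketch in Python. It is not the method used in any published study. It simply assumes that the comparison is the year-over-year change in grade-average proficiency for grades using the program versus all other grades in the state, summarized with a Welch t-statistic. The record layout (`GradeRecord`), the function name `compare_cohorts`, and the `min_sites` default are all hypothetical names chosen for illustration; only the 30-site threshold comes from the text above.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass
class GradeRecord:
    """One grade at one school, with state-posted proficiency for two years.

    Hypothetical layout: real state data files will differ.
    """
    school: str
    grade: int
    pct_proficient_prior: float    # percent proficient, year before adoption
    pct_proficient_current: float  # percent proficient, evaluation year
    uses_program: bool             # from the publisher's digital usage records


def compare_cohorts(records, min_sites=30):
    """Compare mean proficiency change: program grades vs. all other grades."""
    program = [r for r in records if r.uses_program]
    comparison = [r for r in records if not r.uses_program]

    # The text suggests cohorts of 30 or more sites; enforce that here.
    n_sites = len({r.school for r in program})
    if n_sites < min_sites:
        raise ValueError(f"only {n_sites} program sites; need {min_sites}")
    if len(program) < 2 or len(comparison) < 2:
        raise ValueError("need at least two grades in each group")

    prog_changes = [r.pct_proficient_current - r.pct_proficient_prior
                    for r in program]
    comp_changes = [r.pct_proficient_current - r.pct_proficient_prior
                    for r in comparison]

    difference = mean(prog_changes) - mean(comp_changes)
    # Welch standard error: the two groups may have unequal variances.
    std_err = sqrt(variance(prog_changes) / len(prog_changes)
                   + variance(comp_changes) / len(comp_changes))
    welch_t = difference / std_err if std_err > 0 else float("nan")

    return {
        "n_program_sites": n_sites,
        "n_comparison_grades": len(comparison),
        "difference": difference,  # in percentage points of proficiency
        "welch_t": welch_t,
    }
```

A real evaluation would presumably also account for usage levels, match comparison schools on prior performance and demographics, and handle changes in the state assessment between years. This sketch leaves all of that out.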

A recently published WestEd study of MIND Research Institute’s ST Math, a supplemental digital math curriculum using visualization, validates and exemplifies this method of evaluating grade-average changes longitudinally, aggregating program usage across 58 districts and 212 schools. (Disclosure: I am Chief Strategist for MIND Research.) In alignment with this methodological validation, in 2014 MIND began evaluating all new implementations of its elementary-grade ST Math program in any state with 20 or more implementing grades (from grades 3, 4, and 5).

Clearly, evaluations of every program, every year have not been the prior market norm: it wasn’t possible before annual assessment and school proficiency posting requirements, and wasn’t possible before digital program usage measurements. Moreover, the education market has greatly discounted the possibility that curriculum makes all that much difference to outcomes, to the extent of not even trying to uniformly record what programs are being used by what schools. (Choosing Blindly: Instructional Materials, Teacher Effectiveness, and the Common Core by Matthew Chingos and Russ Whitehurst crisply and logically highlights this “scandalous lack of information” on usage and evaluation of instructional materials, as well as pointing out the high value of improving knowledge in this area.)

But publishers themselves are now in a position, in many cases, to aggregate their own digital program usage records from schools across districts, and generate timely, rigorous, standardized evaluations of their own products, using any state’s posted grade-level assessment data. It may be too early or too risky for many publishers. Currently, even just one rigorous, student-level study can serve as sufficient proof for a product. It’s an unnecessary risk for publishers to seek more universal, annual product accountability. It would be as surprising as if, were the anonymized data available, a fitness company started evaluating and publishing its overall average annual fitness impact on club member cohorts, by usage. By observation of the health club market, this level of accountability is neither a market requirement, nor even dreamed of. No reason for those providers to take on extra accountability.

But while we may accept that member-paid health clubs are not accountable for average health improvements, we need not accept that digital content’s contribution to learning outcomes in public schools goes unaccounted for. And universal content evaluation, enabled for digital programs, can launch a continuous improvement cycle, both for content publishers and for supporting teachers.

Once rigorous program evaluations start becoming commonplace, there will be many findings which lack statistical significance, and even some outright failures. Good to know. We will find that some local district implementation choices, as evidenced by digital usage patterns, turn out to be make-or-break for any given program’s success. Where and when robust teacher and student success is found, and as confidence is built, programs and implementation expertise can also start to be baked into sustained district pedagogical strategies and professional development.


If something Actually Worked in Education… 4 things to Look For

“Education” is a huge conversation, but almost all of the conversation is about the education problem. There is not a serious or mature conversation about solutions.

Meanwhile lots of resources and, as important, attention are going into an ever-changing, wide variety of programs which, it is hoped, will be part of some solution. When we’re looking at any given program, how do we know whether it can have game-changing impact on K-12 schools, or has no chance? I call the former programs that “Actually Work”.

Here I identify and describe four characteristics of any program that Actually Works: Scope, Results, Robustness, and Scalability. Failure at any one of these is failure to Work.

1 – Scope

A program that Actually Works must cover or affect an entire course curriculum. For example, let’s think of mathematics. A program that Actually Works can be as small as one grade level, or one course. But the scope can’t be less than that — it can’t be just a specific concept — because you need to be able to Tell if something Actually Works, and that takes an end-of-course assessment. And what Actually Works needs to help educators and students with what they are held accountable to: summative assessments. So, in mathematics, a cool interactive visualization of parabola parameters on a graphing calculator doesn’t meet the Scope requirement.

2 – Results

To Actually Work, a program needs Positive Results. As stated above in Scope, the results need to be measured using a summative assessment. The Positive Results need to be essentially this: for educators and teachers using the program, all the students become Proficient in the given subject area. I’m willing to accept Proficient as defined by states as meeting state standards on a state standardized test, and “all” as 90%+, but a more general way to define Proficient is: functionally competent to create work product, or solve problems.

So, do these Positive Results need to pass a clinical, random-assignment gold-standard evaluation to scientifically attempt to identify the Cause? No. For a program that Actually Works, the results will speak for themselves, robustly at scale in any real world setting, and there need be no argument about what caused the results. Empirically, on implementing the program, you get the Positive Results. In a wonderfully luxurious later phase of this education conversation, when multiple programs Actually Work, we can fret about what we don’t yet know. We can fret over whether extra hours made the difference, or how much of the effect came from the human, material, or process components of the program, or from motivation, or from factors external to a school. We can fret over whether one program that Actually Works has been properly compared to another program that Actually Works. And we can fret over exactly how we evaluate the answers to those questions. But the breakthrough step is to have “one” exemplar program that Actually Works.

3 – Robustness

If a program Actually Works, it will work for any educator and any student.  A program that only works for some educators — for the most motivated, or the most experienced, or the most skilled — does not Actually Work. A program that Actually Works needs to work for a first-year teacher with low confidence, right? I mean if it doesn’t work in that situation, then we have a Big Problem, right? And a program that Actually Works has to work with every student subgroup. In the area of math, for example, that means that it works for students who are multiple grades below grade-level. It works for students who have been testing low. It works for students who don’t get much help at home on math. It works for students who have low confidence in their math abilities. It works for students who are suffering from low motivation to do well. And it works for students who are low in language arts proficiency, and students who are learning English.

And the educator or student for whom the program Actually Works can be in any school, in any school district, in any city, and in any calendar year. In other words, Positive Results should be replicable across geography and time.

4 – Scalability

If a program that Actually Works can’t Scale, then it is of no use. By Scale I mean reach millions of students. And there are three aspects of scaling: Implementability, Economics, and Speed.

First Scalability Aspect: Implementability. The program needs to be able to be implemented across the dominant education model. In other words, it must fit within the existing structure of facilities, people, and time. The program should *not* require additional changes, such as re-tooling school buildings, re-tooling school hardware, re-tooling school personnel, requiring home technology, or changing the time or location of education. Another consideration for scalability is that the program must not only be do-able by teachers, but also be embraced by teachers. Anything less than a full embrace by teachers will result in failure to achieve Positive Results, either from outright rejection of the program, or from spotty or sub-par implementation of the program.

Second Scalability Aspect: Economics. The program needs to fit within current school economics. This applies both to initialization of the program and to sustaining it. For start-up, there is not only the outright cost of the program but also the cost of site facilitization (see above) and especially the cost of teacher time for training. Outright cost needs to be in the neighborhood of current instructional materials costs per student. Facilitization costs need to be near zero. And teacher training time costs need to also be near zero, as PD days and substitute days become artifacts of the past. Sustaining costs (e.g. annual renewals) need to be low enough to survive tough priority battles during even the toughest budget years (see above: teachers must embrace and highly value any program, for it to be Scalable).

Third Scalability Aspect: Speed. If there is a program that Actually Works with all of the above, to be worthwhile it must also have the characteristic of quick scale-up. By quick I mean that it must have the capability, if the market demands it, of adding millions of students and their teachers per year to the program. In a country with 49 million students and 3.5 million teachers in K-12, anything less than this means we have no solution at all.
