Fall is the time of year when data describing achievements of the prior school year are made public and available for analysis. One of the many research projects we conduct annually is a complete evaluation of every item we employ in our data-collection tools. The focus of this evaluation is twofold: first, to determine whether our items continue to discriminate different levels of the conditions they assess; and second, to determine whether our items individually and collectively continue to predict important school outcomes. For the latter analysis, we include results from end-of-level or standardized measures of literacy and numeracy conducted in the spring at every public school we work with. For schools in the state of Utah, however, there exists a unique problem, and perhaps a unique opportunity.
As far back as the 19th century, students across schools and districts were given identical or similar academic tests: tests that required students to recall facts they were assumed to have learned at school. In those days, however, the results served to evaluate students for high school and college placement, not to appraise school or classroom efficacy. The assumption was that if a student did poorly, it was the student's fault. In the early part of the 20th century, academics like Edward Thorndike developed systematic methods of assessment that school professionals gradually adopted, seeing them as a means to promote education as a science; in their minds, testing made schools look good. Although these tests now included items that assessed more sophisticated curricular elements, they were still intended to sort students for placement, not to judge practice.
By the late 1950s, in response to Brown v. Board of Education and later the Coleman Report, federal and state governments began to pass legislation mandating the testing of students to determine whether progress was being made toward equalizing educational opportunities for every school-aged child in America. In addition, the results of these tests gave politicians and the taxpayers they represented a way to hold schools accountable for the funds they consumed. So although standardized testing was first intended to sort students for placement, and was adopted to lend legitimacy to the profession, the age of school accountability had arrived.
Accountability testing became common practice across the nation, and although each new president rolled out a new educational initiative, the consequences to schools for poor results were often negligible, or at least inconsistent. Then in 2002, No Child Left Behind tied test results to explicit consequences that schools could no longer ignore. Test scores became the 800-pound gorilla in the classroom.
In 2010, the Common Core initiative emerged, and many states responded by aligning their accountability testing with the new curriculum. Moreover, many agreed it was time to replace existing assessments that encouraged schools to spend considerable time “teaching to the test,” a practice loathed by educators and perceived by many as a form of cheating devoid of real educational value. Instead of partnering with other states seeking the same testing upgrades, Utah, whose public perceived the Common Core curriculum as federal overreach, decided to go solo and created an online dynamic testing system whose item pool was written by Utah educators. Thus was born the Student Assessment of Growth and Excellence, or SAGE.
Despite its technological core and modern appearance, the public has quickly grown skeptical of SAGE. This is not surprising, as Utahns, wary of the Common Core and seemingly unaware that SAGE was created locally, associate the two. Educators aren’t happy about SAGE either. Test scores from the first year cast Utah schools in a dim light: the results implied that more than half of Utah students failed to master language arts, math, and science, a proportion very different from the one suggested by prior tests. Our records from the old criterion-referenced tests show typical language arts proficiency percentages in the 70s and 80s, with elementary school math close behind. Subsequent years have shown slight improvement in SAGE scores, but the tipping point is in sight, and the SAGE heartbeat grows faint.
It’s interesting to note that the features SAGE used to promote its value have turned out to be its downfall. Beyond boasting a better item pool than prior tests, SAGE was marketed as a modern, high-tech tool with artificial intelligence. That is, it was administered online and adapted to individual student responses, creating the impression that it could deliver a more finely tuned exam for each student.
Adaptive tests can be powerful tools and have their place, but they are inappropriate for annual school-wide testing. Because students can experience very different item sets, adaptive tests are not standardized and often defy the kinds of comparisons expected for accountability. Adaptive tests are at their best when they summarize the achievement status of individual students. Aggregating these data creates mud.
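To make the comparability problem concrete, here is a minimal sketch of how two students taking the "same" adaptive test can end up answering largely different item sets. The up/down difficulty rule and the item labels are invented for illustration; this is not SAGE's actual algorithm.

```python
import random

# Hypothetical illustration (not SAGE's actual algorithm): a crude
# adaptive test that always picks the next item at the student's
# running ability estimate. Items are labeled by difficulty 1-10.
def adaptive_item_set(true_ability, n_items=10, seed=0):
    rng = random.Random(seed)
    estimate = 5                 # every student starts at middling difficulty
    administered = []
    for _ in range(n_items):
        item = max(1, min(10, estimate))   # select item at current estimate
        administered.append(item)
        # Student likely answers correctly when the item is at or below
        # their true ability, and likely misses it otherwise.
        correct = rng.random() < (0.9 if item <= true_ability else 0.2)
        estimate += 1 if correct else -1   # step difficulty up or down
    return administered

strong = adaptive_item_set(true_ability=8)
struggling = adaptive_item_set(true_ability=3)
# The two students see largely disjoint item sets, so averaging their
# scores compares performances on what are effectively different tests.
```

Because the item sets barely overlap, aggregating such results for school-to-school comparison mixes scores earned on different tests, which is exactly the mud described above.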
Although online systems avoid mountains of paper and enable instantaneous reporting, they can introduce user-interface problems. Students with limited computer experience or limited English proficiency are at a disadvantage before the test begins. On the surface this seems unlikely, given the assumption that every student has a smart device stuck to their nose, but our data scream confirmation. Correlations between a school's SAGE results and the socioeconomic status of the student population it serves are nearly perfect. Similarly high correlations appear with the proportion of students who are English language learners, indicating that no matter how effective or ineffective a school's teachers are, they will be unable to influence its SAGE scores. In the end, SAGE is little more than a measure of poverty and primary language; it fails to discriminate good instructional practice from bad. This essentially renders SAGE useless as an accountability measure, since it is unreasonable to hold educators accountable for the socioeconomic makeup of their school neighborhoods.
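As an illustration of what a "nearly perfect" correlation means here, the sketch below computes a Pearson correlation on hypothetical school-level numbers. The data are invented for demonstration only; they are not drawn from our dataset.

```python
# Illustrative only: hypothetical school-level data showing what a
# near-perfect correlation between proficiency and poverty looks like.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

pct_low_ses = [10, 25, 40, 55, 70, 85]   # hypothetical share of low-income students
proficiency = [72, 61, 50, 41, 30, 20]   # hypothetical SAGE pass rates

r = pearson_r(pct_low_ses, proficiency)
# An r near -1 means poverty alone nearly determines the score,
# leaving almost no variance for instructional quality to explain.
```

When the correlation is this strong, knowing a school's poverty rate is nearly as good as knowing its test results, which is the core of the accountability objection.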
Neither of these problems really changes the more fundamental flaws in current approaches to accountability testing in Utah and across the country. An annual examination that covers far too much material and fails to incentivize student performance is doomed from the outset. It also makes little sense to put schools through a time-consuming process that provides no specific feedback on the performance objectives that would help them reach student achievement goals. Annual accountability testing conducted this way will always feel like punishment and guarantee distrust between educators and the public, or at least between educators and the legislature. Think of it this way: the current systems for annual accountability testing are the Simon Cowell of education. They appear valid, evoke fear, are indifferent to participant discomfort, speak loudly, and in the end don’t really help.
Fortunately, there is a better way.
Successful businesses typically need to know more than whether they are making or losing money. To succeed, they must collect data describing which features of the business (e.g., sales, inventory, marketing, personnel, technology, distribution) are working and which are not. With that information they can make adjustments and check those changes against the next profit-and-loss statement. Although schools aren’t businesses, they can employ the same improvement model; they can do more than assess learning as educational profit. Successful schools of the future will persistently evaluate malleable teacher and school performance features so that adjustments can be made and checked against the next learning assessment. This gives schools the means to influence the data that determine their perceived worth. That is tough to do in Utah right now, as the only predictor of SAGE is poverty, and schools can’t control that.
Monitoring teacher and school performance is the next step in education, but it does not eliminate the need to assess student achievement as part of school accountability. Unlike in the current model, however, learning measures should serve students, teachers, administrators, and the public all at the same time. This integration will reduce school and teacher alienation and increase incentives for students to do their best, giving everyone better data to help improve the system.
In addition to the tests given in spring, educators conduct assessments at other intervals. As educators well know, accountability tests are currently conducted annually, cover an entire school year, and, as previously discussed, aren’t much help. Benchmark tests are conducted three to six times per year, cover material generally taught during a single grading period, and provide better, but still insufficient, data for accountability. Finally, teachers conduct assessments that provide feedback on student learning for the most recently covered curriculum standards. These are either used for grading or take the form of common formative assessments that contribute data to professional learning communities. Such tests have their own issues, but it should be clear that they are far from standardized and cannot serve the purpose of addressing school accountability.
For the most part, the assessments discussed above are curriculum oriented, which seems like the right thing to test: teach a topic, or many topics, and determine whether students retained the knowledge they were supposed to learn. Unfortunately, knowing one thing isn’t always what leads a student to learning the next thing. What we know carries a learner from one curricular standard to the next is fluency in basic skills: being able, for example, to read quickly with comprehension and to rapidly solve four-function math problems. There are other basic skills to be sure, such as vocabulary, logic, and interpersonal and self-management skills, but when students develop basic skills to fluency, they can learn and retain any part of the K-12 curriculum with ease. Think of skills as tools: when you have the right ones and know how to use them, you can build useful things from any materials set before you. Best of all, every student from any background can learn these skills given sufficient instruction and practice. This is also where teachers and schools can make the biggest impact.
Fortunately, tests that assess basic literacy and numeracy skills already exist. Many are standardized, normed to place individuals within the population, widely used, easy to administer, and suitable for frequent use; some are already employed in Utah and many other states. They provide feedback to the learner, they inform teachers about the impact of their instruction, and their results can be aggregated in ways that allow taxpayers to better understand the efficacy and comparability of school practices. Thus, they incentivize performance, help teachers, and inform decision makers.
Most importantly, using skills testing as part of accountability encourages teachers to help students who are behind. Right now, if a ninth grader fails algebra, the solution is to have them take it again. On one level this makes sense, since the school is accountable for that student learning that material. On another it makes no sense, because the vast majority of ninth graders who fail algebra do so because they have basic-skills deficits. The teachers and administrators facing this problem are short-circuited by an accountability system that doesn’t reward successful intervention in this regard. In fact, the current accountability paradigm feels as though it punishes anything that detracts from teaching grade-aligned curricular standards. Once students fall behind, even through no fault of their own, they receive little support from schools to catch up. If schools were to emphasize skills testing, they would be incentivized to provide tailored support to students, regardless of each student’s current mastery of the K-12 curriculum.
Keep in mind that skills testing is not the be-all and end-all of testing, and it wasn’t designed for accountability, but it is far better for that purpose than what we currently employ. At a minimum, it should be in the conversation about what should, in part, replace SAGE in the state of Utah.
Those data would certainly help us each year as we refine the learning environment inventories that we believe should be the first basis for holding schools accountable.
Matthew J. Taylor, PhD