A judge in New Mexico appears to think a protest filed by the American Institutes for Research involving the PARCC multi-state testing consortium has merit—or at least that its filing didn’t miss important deadlines, as New Mexico officials at first claimed, Education Week reports.
According to the complaint, the main PARCC testing request for proposals was written such that Pearson was the only company that stood a chance. That’s because the RFP bundled test development and test delivery in the first year with the next three years, and Pearson was already working on the first year. This bundling gave Pearson an automatic advantage for getting the contract in subsequent years, and the AIR, a nonprofit, complained.
While practices like bundling may not be illegal in RFPs, they are unethical. New Mexico officials, who wrote the RFP and approved Pearson’s bid on behalf of more than a dozen states in the Partnership for Assessment of Readiness for College and Careers consortium, should have known better.
The AIR, based in Washington, D.C., raised this issue in a protest filed with New Mexico officials in December, but those officials said the protest should have been filed with different people in the state. In her ruling, Judge Sarah M. Singleton of the Santa Fe First Judicial District concluded that the RFP was “ambiguous” about where protests over the bidding process should be submitted, according to a transcript of the hearing, and that the AIR had “substantially” complied with the protest process.
But in the meantime, much of the work at PARCC, being carried out by Pearson, has come to a halt while New Mexico officials sort this out.
The contract is potentially lucrative for the company that gets it, as PARCC now plans to charge states about $24 per kid and PARCC’s website says about 15 million testable students are in PARCC states and would have to take PARCC’s tests under federal law.
Do the math: That’s $360 million a year for Pearson, more than $1 billion over the three-year term of the contract.
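For anyone who wants to check that arithmetic, here is a minimal sketch. It uses only the two figures quoted above (about $24 per student and about 15 million testable students) and assumes both stay flat across a three-year term; the variable and function names are mine, not anything from the contract or the RFP.

```python
# Rough figures quoted in the post; estimates, not actual contract terms.
PRICE_PER_STUDENT_USD = 24       # approximate per-student charge
TESTABLE_STUDENTS = 15_000_000   # approximate count in PARCC states
CONTRACT_YEARS = 3               # term discussed in the post


def annual_revenue(price_per_student: int, students: int) -> int:
    """Revenue for one year if every testable student is tested once."""
    return price_per_student * students


def contract_revenue(price_per_student: int, students: int, years: int) -> int:
    """Total over the term, assuming price and enrollment do not change."""
    return annual_revenue(price_per_student, students) * years


annual = annual_revenue(PRICE_PER_STUDENT_USD, TESTABLE_STUDENTS)
total = contract_revenue(PRICE_PER_STUDENT_USD, TESTABLE_STUDENTS, CONTRACT_YEARS)

print(f"Per year:        ${annual:,}")  # Per year:        $360,000,000
print(f"Three-year term: ${total:,}")   # Three-year term: $1,080,000,000
```

If the per-student price or the number of tested students shifts, the totals move proportionally, so these numbers are best read as an order-of-magnitude estimate.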
I wouldn’t be so upset about spending that kind of money on tests if I knew the tests were valid, reliable, and fair, as PARCC’s fiscal agent contract with the state of Maryland says they will be. But that is not the case.
Let’s start with fairness, because it’s the easiest one to pick on. Math tests taken on the computer and scored for showing work, as many PARCC test items will be, are not fair, because kids who do their work with a pencil don’t transfer all of it to the computer. They therefore don’t receive the same scores as kids who show all their work in a test book.
Don’t even get me started about how the tests give an unfair advantage to rich kids who attend schools in affluent neighborhoods and whose parents spill endless dollars for private tutoring and other education-related services.
Let’s move on to “valid”: Because of a concept known as “error,” which test designers admit, PARCC tests cannot possibly measure what we say they’re measuring. For example, the math test cannot possibly measure how well a student understands concepts in the Common Core State Standards for math. The main reason is that the tests can only look at a sub-part of the whole set of standards.
Consider a 1998 book by Noel Wilson, Educational Standards and the Problem of Error. He writes:
Other things being equal, more sources of evidence are better than fewer. However, the quality of the evidence is of primary importance, and a single line of solid evidence is preferable to numerous lines of evidence of questionable validity.
That is, the more questions you ask, the more valid the test is, but since we can’t keep kids in a testing room all year long, we have to take shortcuts. So, on the one hand, we have to ask ourselves, How much of a shortcut are we willing to accept before we say we don’t have a valid measure of students’ math understanding?
You can also look at this in terms of question type. Multiple-choice questions don’t provide very solid evidence of a student’s understanding, so we have to put more questions on tests if they’re multiple-choice than we would if we use only constructed-response questions, since constructed-response questions provide a stronger body of evidence.
But on the other hand, the question is moot. It’s especially moot if the test questions themselves aren’t really solid lines of evidence.
Assessing the Common Core may, in fact, be impossible, so politicians and corporations often try to direct our attention to the first understanding of validity: more tests, more questions on the test, and so on. The second understanding is more important, though, as it concerns the quality of the evidence used, the quality of the test questions themselves. There can never be enough good questions to satisfy some people’s requirements for test validity, but all the bad questions in the world will never make a valid test.
Now for “reliability”: I generally think of this as “an estimate of the error you’d expect if the student did a hypothetical parallel test. And in generalizability theory, it’s an estimate of the difference between the ‘universe’ score and the score on any particular test,” quoting again from Mr Wilson’s book.
In other words, reliability just means that students with the same ability in math will get the same score no matter how many times they take the math test. It’s a very theoretical concept, since we only administer tests one time, but that’s what reliability is all about.
And if you think about it like that, it means that what we’re really looking at is how well a test interacts with the student, not how reliably the test measures a student’s ability compared to other pieces of evidence about the student’s achievement, like grades or classroom work.
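The "hypothetical parallel test" idea can be made concrete with a small simulation. This is only a sketch under assumptions I am adding for illustration: each observed score is a true score plus independent random error, scores are normally distributed, and the spreads (10 points for true scores, 5 for error) are made up. None of this comes from PARCC or from Mr Wilson's book.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_parallel_forms(n_students=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Correlate two hypothetical parallel forms taken by the same students.

    Assumed model (illustrative only): observed = true score + random error.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_students)]
    form_a = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    form_b = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pearson(form_a, form_b)


reliability = simulate_parallel_forms()
# Under the assumed model, the share of score variance that is not error:
expected = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)
print(f"simulated: {reliability:.2f}, expected under the model: {expected:.2f}")
```

Notice what the simulation never touches: grades, classroom work, or any other evidence about the student. The number it produces only says how well one form of the test agrees with another form of the same test.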
What we want to know, under No Child Left Behind, is how well a student’s grades, given by his teacher, reflect the true ability level of the student with respect to the standards in the Common Core. What we have received, for the billions of dollars we have paid to testing companies over the years since NCLB went into effect, is a test that may be reliable with regard to other tests.
We’ve stopped asking if the test is even reliable with regard to the other forms of evidence we could use, so we have completely missed the boat on accountability. The ship sailed the moment we started counting tests as measuring more than they possibly can measure.
Once again, with NCLB, we had the right idea, but the execution failed us.
New Mexico officials may or may not rule that’s what happened with the current contract for PARCC, but state judges don’t usually bring work that affects a dozen other states to a screeching halt if they don’t think there’s something wrong with the way the respondents are doing business.
Stay tuned …












You are correct that this judge did not make her decision lightly. There have been several scandals involving the NM PED Secretary-designate’s appointees over the past three years. Most recently, as reported in the Santa Fe Reporter yesterday (see link below) and in Politico’s Morning Education story by Stephanie Simon (also see link below), a former NM PED General Counsel and Secretary-designate Hanna Skandera signed off on a contract that PED administers for the federally funded 21st Century Community Learning Center, and the company receiving the $150,000 contract was owned by the attorney who at the time was working for Skandera’s office … major conflict of interest … It is also interesting that the same company was turned down on a contract prior to the attorney becoming the General Counsel for the granting state agency. Her response was basically … oops, I made a mistake … but she is still fighting to get the money. There are more ethical issues involving another attorney who did a short stint as the Charter School Czarina for Skandera … even steering clients to her law partner while she was working for the state … she resigned shortly after her conflict was exposed. And then in late 2011 Skandera’s office was reprimanded for other violations of the state’s procurement codes and processes.
See the stories of the current conflict exposed:
http://www.sfreporter.com/santafe/article-8746-double-signature.html
For the Politico link (here below) make sure you scroll to the June 5th edition (today’s) as it may automatically move to the next day as each day goes by:
http://www.politico.com/morningeducation/
I just know that even if New Mexico’s procurement officer finds that there’s not a single thing wrong with the RFP or Pearson’s contract, New Mexico school or procurement officials messed this one up big time. Now several other states can continue to negotiate with Pearson et al., but can’t sign anything, because it all could be thrown back into a rebidding process, thanks to New Mexico’s dismissive response to the original protest. That could, of course, cause delays that would make testing in the fall (for students on semester block schedules) or even the spring all but impossible. And then, who knows what could happen? I hope states didn’t throw away their old tests, because this could take a while, depending on how appeals work.
I think the N.M. procurement officer has until mid-June to evaluate the protest on its merits, since the state can’t just throw it out now by saying the AIR didn’t file it properly. A billion dollars of all our tax money is in the balance, and N.M. officials were sloppy and “ambiguous” in handling the competitive bidding process. What a mess! And thanks for your comment.
-Paul