Using Statistical Methodologies in Detection of Cheating in Academic Integrity Cases

As an attorney who defends academic integrity disciplinary actions at state universities, I am often asked to give an opinion on charges brought against students for cheating based on similarities in their exam answers. For example, consider the following hypothetical:

A professor suspects two students of cheating and studies their exam answers for similarities. The professor notes that both students got the same questions wrong. For example, on a multiple-choice exam on American presidents, they both incorrectly answered 1) that Theodore Roosevelt was the longest serving president, 2) that George W. Bush initiated the invasion of Panama, and 3) that James Madison presided over the Civil War. The two students each got all other test questions correct.

To some professors, that would seem to be hard evidence of cheating.  What are the chances of two unrelated students each getting the same answers wrong in the same way, and all other questions correct? Not so fast, say statistical experts. As one statistician has noted:

It is our position, echoed by courts and statisticians alike, that at no time can one accept probabilistic evidence as sufficient merely because the occurrence of some value of a test statistic is highly probable. Reasonable competing explanation must be considered. The limitations of the mechanistic detection strategies, and the inherent variability in test design and administration reliability and validity found in all except the most rigorous of standardized tests and testing situations, preclude an automatic acceptance of probabilistic data as prima facie demonstration of misconduct.

See Dwyer, David J.; Hecht, Jeffrey B., “Cheating Detection: Statistical, Legal, and Policy” (1994). The authors go on to explain:

Finally, we must answer the question “is the sample of students being compared merely random or is it representative of the class as a whole?” If the class is comprised of distinct subgroups (by achievement, ethnicity, gender, etc.) then the sample from which we draw an inference must be representative of the subgroup(s) as well. It is our opinion that no mechanistic detection method currently available sufficiently addresses these concerns to an adequate degree, casting doubt as to the utility of mechanistic methods to detect wrongdoing with a known and consistent degree of accuracy.

The problem with using statistical error analysis in academic integrity cases is that students often prepare for a test together in a study group. Studying together can leave students with a similar understanding of the material, including the same errors and misunderstandings, which in turn can produce similar incorrect answers on an exam. Legal precedent supports this idea. One such case is Boehm v. Univ. of Pennsylvania Sch. of Veterinary Med., 392 Pa. Super. 502, 573 A.2d 575 (1990). In that case, a veterinary student at the University of Pennsylvania was accused of cheating and was ultimately found to have committed the offense. Although the student was found guilty, the university's panel threw out the supposed statistical analysis and ruled:

While the information raises suspicion as to the cheating charges, the panel considers the comparison [of the test answers] to be unreliable due to its lack of statistical foundation particularly since the influence on test scores of studying together is unknown. Accordingly this information was not considered in the panel’s deliberations.

Id., 392 Pa. Super. 502, 516, 573 A.2d 575, 582 (1990) (emphasis added). Similar statistical reasoning was also rejected in Papelino v. Albany Coll. of Pharmacy of Union Univ., 633 F.3d 81 (2d Cir. 2011). In that case, three pharmacy students were accused of cheating on a test. The court described the circumstances as follows:

In support of the charges, Nowak [a teacher] presented evidence, which consisted primarily of  “statistical” charts that she had prepared based on her review of exams taken by Papelino, Basile, and Yu in various courses. Papelino, Basile, and Yu countered with (1) the lack of evidence of the means by which the three might have managed to cheat; (2) the fact that the three studied together, and therefore had similar knowledge bases; and (3) the lack of validity of the “statistical” evidence.

Papelino v. Albany Coll. of Pharmacy of Union Univ., 633 F.3d 81, 86-87 (2d Cir. 2011). The opinion goes on to explain that the Supreme Court of New York rejected such evidence and “…concluded that the Honor Code Committee’s determinations were based ‘solely’ on a ‘statistical compilation’ that was based upon ‘false assumptions’ and did not provide ‘a rational basis to conclude that petitioners cheated.’” Id. at 87.
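The study-group objection raised in both cases can be put in rough numerical terms. The short Python calculation below is only an illustrative sketch, not a method used by any court or by the authors quoted above. Every number in it is hypothetical: it assumes a four-option multiple-choice question, a 10 percent chance that a student working alone misses a given question, wrong answers spread evenly across the wrong options, and, for the study-group scenario, a 5 percent chance per question that the two students picked up the same mistaken belief while studying together.

```python
def same_wrong_answer_prob(p_err, n_options, p_shared=0.0):
    """Probability that two students give the same wrong answer to one question.

    p_err     -- assumed chance that a student working alone misses the question
    n_options -- number of answer choices; wrong picks are assumed uniform
    p_shared  -- assumed chance the pair learned the same mistake together
    """
    # Working independently, both must miss AND land on the same wrong option.
    independent = p_err ** 2 / (n_options - 1)
    # A shared misconception produces the same wrong answer outright.
    return p_shared + (1 - p_shared) * independent


def match_on_k_questions(per_question_prob, k):
    """Probability of an identical wrong answer on each of k given questions,
    treating the questions as independent of one another (an assumption)."""
    return per_question_prob ** k


# Hypothetical inputs chosen only for illustration.
apart = match_on_k_questions(same_wrong_answer_prob(0.10, 4), 3)
together = match_on_k_questions(same_wrong_answer_prob(0.10, 4, p_shared=0.05), 3)

print(f"studied apart:    {apart:.2e}")
print(f"studied together: {together:.2e}")
print(f"ratio:            {together / apart:,.0f}x")
```

Under these made-up inputs, even a small chance of a shared misconception makes three matching wrong answers thousands of times more likely than the independence assumption would suggest. Because the true size of that study-group effect is usually unknown, a probability computed as if the students had prepared separately says little about whether they cheated.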

While statistical similarities can be a starting point for an academic integrity investigation, such proof should not be taken as conclusive evidence by itself.

Steve Graham is a criminal defense lawyer who splits his time between Spokane and Seattle, Washington.
Law Office of Steve Graham
1312 North Monroe Street, #140
Spokane, WA 99201
(509) 252-9167