Bayesian Litigation
(An Application of Bayes’ Theorem to Law)
F. E. Guerra-Pujol
Image Credit: Lars Pålsson Syll
Abstract Building on our previous work (Guerra-Pujol, 2011), we present a simple Bayesian model of
federal litigation in this paper. The emphasis of this paper is on methodology. We show how one
can build a simple and straightforward Bayesian model of appellate cases even with very little
information, e.g. a small sample of previously-decided cases. As we shall see, the government not
only has a high probability of winning its cases (the main result in Guerra-Pujol, 2011); it will
most likely win its non-criminal cases during the trial or post-trial stages of litigation.
Keywords Bayesian Analysis, Law, Litigation
JEL Codes C11, K41
Date 23 May 2018
Word Count 1862
Our original motivation for this work is set forth in a previous paper (Guerra-Pujol, 2011, p. 46).
In that paper, we developed a simple “relative frequency” model of federal litigation in order to
estimate the probability distribution of litigation outcomes in various categories of federal
appellate cases. Building on our previous relative frequency model (ibid., pp. 47-48), we will
now present a simple Bayesian model of probability to estimate the probability distribution of
appellate cases.
Here, we shall use a sample of cases collected in our previous work (Guerra-Pujol, 2011), i.e. the
set of all reported cases in Volume 287 of the Federal Reporter, excluding criminal cases. [In
future work, we will sample the set of all cases decided by the United States Supreme Court
during the 2017-2018 Term.] In all, our sample of appellate cases contained 54 non-criminal
“government cases” and 103 “non-government cases” that were initially decided at various
stages during the initial litigation: pre-trial, trial, and post-trial. [See Tables 2AA and 2D in
Guerra-Pujol, 2011, pp. 58-59. The term “trial” includes non-jury adjudications, such as
injunction hearings and other equity cases. We are excluding criminal cases from our sample
since plea bargains are not reported in the Federal Reporter.]
From a probabilistic perspective, our sample of appellate cases in Guerra-Pujol (2011) is like a
“legal urn” or black box containing a set of 157 federal cases (events) instead of 157 balls. Based
on our modest sample, our legal urn contains the following mix of cases (cf. Schlaifer, 1959, pp.
330-331):
I. Just under 1/2 (i.e. 71/148 or .48) of the cases were decided or disposed of during the
pre-trial stage; of these cases, about 1/5 (i.e. 15/71 or .21) were non-criminal
government cases; about 4/5 (i.e. 56/71 or .79) were non-government cases.
II. A little over 1/2 (i.e. 86/148 or .52) of the cases were decided at trial or post-trial; of
these cases, over 2/5 (i.e. 39/86 or .45) were non-criminal government cases; a little less
than 3/5 (i.e. 47/86 or .55) were non-government cases.
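The conditional relative frequencies in items I and II can be recomputed directly from the case counts quoted above. The following Python sketch is illustrative only; the dictionary layout and the variable names are our own assumptions and are not part of the original data set.

```python
from fractions import Fraction

# Case counts quoted in items I and II, keyed by stage of disposition.
counts = {
    "pre-trial": {"government": 15, "non-government": 56},
    "trial/post-trial": {"government": 39, "non-government": 47},
}

# Conditional relative frequency of each party type within each stage.
conditional = {}
for stage, by_party in counts.items():
    stage_total = sum(by_party.values())  # 71 and 86, respectively
    conditional[stage] = {
        party: Fraction(n, stage_total) for party, n in by_party.items()
    }

for stage, by_party in conditional.items():
    for party, freq in by_party.items():
        print(f"P({party} | {stage}) = {freq} = {float(freq):.2f}")
```

Keeping the values as exact fractions until the final print statement avoids compounding rounding error when the frequencies are reused in later steps.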
We can visualize these priors in Figure 1 below as follows:
-------------------------------------------------------------------------------------------------------------------------------
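[INSERT FIGURE 1 HERE: PRIOR PROBABILITIES. Above the box: .48 (pre-trial) and .52
(trial/post-trial). Inside the box: .21 and .79 (pre-trial); .45 and .55 (trial/post-trial).]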
-------------------------------------------------------------------------------------------------------------------------------
The values above the box represent the prior probabilities [or unconditional relative frequencies]
that a federal case will be decided at the pre-trial stage or trial/post-trial stage, while the four
values inside the box represent the conditional probabilities [or conditional relative frequencies]
that any given case is either a non-criminal government case or a non-government case.
[Although our sample consists of appellate cases, we carefully determined at which stage the
case was decided before it went up on appeal. Also, we have equalized the first set of priors
(above the box) for mathematical simplicity.]
This prior distribution is depicted in Table 1 below as follows:
-------------------------------------------------------------------------------------------------------------------------------
[TABLE 1 – PRIOR PROBABILITIES IN TABULAR FORM]
Event of interest                    Column A: Prior probability    Column B: Probability the case is a
                                     of event*                      (non-criminal) government case
                                                                    given the event
Pre-trial disposition                .48                            .21
Disposition at trial or post-trial   .52                            .45
Total                                .48 + .52 = 1.0
[* Note that Column A is essential in order to avoid the base rate fallacy. (See generally
Bar-Hillel, 1980.)]
-------------------------------------------------------------------------------------------------------------------------------
Next, suppose a new case is now pending before a federal court (or suppose we draw any federal
case at random), and further suppose that the only additional information we have is the identity
of the parties, i.e. we know that it is a non-criminal government case, but we have no idea
whether the case was decided during the pre-trial stage or during the trial/post-trial stage.
What is the probability that such a government case was decided during the pre-trial stage? The
solution to this problem requires us to revise or update our priors, using the two sets of prior
probabilities shown in Table 1 as follows:
We first use the multiplication rule to compute the joint probabilities of “pre-trial decision” and
“non-criminal government case” (i.e. .48 times .21 = .1008) and “trial/post-trial decision” and
“non-criminal government case” (i.e. .52 times .45 = .2340). These joint probabilities are shown
in column C of Table 2 below. We can then use the addition rule to compute the marginal
probability that a case will be a non-criminal government case. Lastly, we apply the definition
of conditional probability to compute the revised or posterior probabilities in column D of Table
2:
-------------------------------------------------------------------------------------------------------------------------------
[TABLE 2 – POSTERIOR PROBABILITIES IN TABULAR FORM]
Event of interest                    Column C: Joint probability of event    Column D: Probability of event given
                                     and (non-criminal) government case      the case is a (non-criminal)
                                                                             government case
Pre-trial disposition                .1008 ≈ .1                              .1/.3 ≈ .33
Disposition at trial or post-trial   .2340 ≈ .2                              .2/.3 ≈ .66
Total                                .1 + .2 = .3 (as per addition rule)     .33 + .66 ≈ 1
-------------------------------------------------------------------------------------------------------------------------------
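The three steps just described (multiplication rule, addition rule, definition of conditional probability) can be sketched in a few lines of Python. This is a minimal illustration under the priors of Table 1; the function name `posterior` and its argument names are our own. Note that Table 2 rounds the joint probabilities to .1 and .2 before dividing; carried through without intermediate rounding, the same steps give roughly .30 and .70, which leaves the qualitative conclusion unchanged.

```python
def posterior(priors, likelihoods):
    """Return (joint, marginal, posterior) for the events in `priors`.

    priors:      P(event), e.g. P(pre-trial)
    likelihoods: P(government case | event)
    """
    # Multiplication rule: P(event and government case)
    joint = {e: priors[e] * likelihoods[e] for e in priors}
    # Addition rule: P(government case)
    marginal = sum(joint.values())
    # Definition of conditional probability: P(event | government case)
    post = {e: joint[e] / marginal for e in joint}
    return joint, marginal, post


priors = {"pre-trial": 0.48, "trial/post-trial": 0.52}       # Column A
likelihoods = {"pre-trial": 0.21, "trial/post-trial": 0.45}  # Column B

joint, marginal, post = posterior(priors, likelihoods)
for event in priors:
    print(f"{event}: joint = {joint[event]:.4f}, posterior = {post[event]:.2f}")
print(f"marginal = {marginal:.4f}")
```

Because the posterior is obtained by dividing each joint probability by their sum, the posterior values always add up to 1, whatever priors are supplied.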
Table 2 can also be depicted in visual form. The information in Column C of Table 2 is
represented in visual form in Figure 2 below:
-------------------------------------------------------------------------------------------------------------------------------
[INSERT FIGURE 2 HERE – JOINT PROBABILITIES: .1 (pre-trial and government case); .2
(trial/post-trial and government case)]
-------------------------------------------------------------------------------------------------------------------------------
Figure 2 reproduces only that part of Figure 1 that corresponds to the event “non-criminal
government case” decided at either the pre-trial stage or the trial/post-trial stage.
Next, we create Figure 3 by enlarging Figure 2 in such a way that its total area becomes 1. How
do we do this? First, we calculate the original area of each part of Figure 2 (the joint
probabilities) from the data in Figure 1 and then divide the area of each part of Figure 2 by the
total of the areas in Figure 2 [.3; the sum of .1 and .2] to raise the revised area to 1. (Cf.
Schlaifer, 1959, p. 332.) The information in Column D of Table 2 is thus represented in visual
form in Figure 3 below:
-------------------------------------------------------------------------------------------------------------------------------
[INSERT FIGURE 3 HERE: POSTERIOR PROBABILITIES: .33 (pre-trial and government case); .66
(trial/post-trial and government case)]
-------------------------------------------------------------------------------------------------------------------------------
That is to say, given our base rates (i.e. the ratio of pre-trial cases to trial/post-trial
cases), the probability that a case was decided or disposed of during the trial/post-trial stage
(instead of the pre-trial stage), given that it is a government case, is high: about .66. In
other words, the government not only has a high probability of winning its cases (Guerra-Pujol,
2011). There is also a high probability that it will win its cases during the trial or
post-trial stage.
Conclusion
The advantage of the Bayesian approach is that it allows us to make the best use of a minimum
amount of information and experience. Of course, it is no substitute for experience, since it, by
itself, cannot determine the probability of any event. [Cf. Schlaifer, 1959, p. 333.] Instead, our
Bayesian formulation allows us to make more effective use of our experience by assigning
probabilities to those events of which we have some experience or prior knowledge, rather than
to events that may in reality determine litigation outcomes but of which we have no direct
knowledge or experience.
Bibliography
Maya Bar-Hillel, “The base-rate fallacy in probability judgments.” Acta Psychologica, Vol. 44,
no. 3 (1980), pp. 211-233.
F. E. Guerra-Pujol, “Chance and litigation.” The Boston University Public Interest Law
Journal, Vol. 21, no. 1 (2011), pp. 45-59.
________________, “Visualizing probabilistic proof.” Washington University Jurisprudence
Review, Vol. 7, no. 1 (2013), pp. 39-75.
Robert Schlaifer, Probability and statistics for business decisions. McGraw-Hill (1959).
Acknowledgements
I would like to begin by thanking my wife Sydjia for lending me her heart and my in-laws, Erle
and Andrea Robinson, for lending me their home, where I wrote most of this paper. I also wish
to thank professors Kevin Johnson and John Patrick Hunt. Unbeknownst to them, a conversation
we had in the fall of 2008 at the Marriott Wardman Park Hotel in Washington, D.C. led me to
refine my thinking about the role of probability in the law. I also need to thank J. B. Ruhl, Dan
Katz, and Orlando Martinez-Garcia for many fruitful conversations (in some cases, going back to
the early 2000s) about computation, mathematics, and Bayesian probability over the years.
Although it has taken me many years to understand Bayesian probability and its possible
applications to law, without their intellectual friendship and devotion to truth I would have never
gotten this far …
Appendix: Algebraic Notation
The Bayesian logic of Tables 1 and 2 and Figures 1 through 3 in the main text can be expressed
using the following algebraic notation:
Let A denote a case decided at the pre-trial stage [striped]; let C denote a case decided at the
trial or post-trial stage [dotted]; and let B denote a non-criminal government case.
P(B, A) = P(A) times P(B|A) = (.48) times (.21) ≈ .1
P(B, C) = P(C) times P(B|C) = (.52) times (.45) ≈ .2
The addition rule gives us the marginal probability as follows:
P(B) = P(A) times P(B|A) + P(C) times P(B|C) = (.48) times (.21) + (.52) times (.45) ≈ .1 + .2 ≈ .3
Lastly, the definition of conditional probability then gives us Bayes’ Theorem:
P(A|B) = P(A, B) divided by P(B)
and
P(A, B) divided by P(B) = [P(A) times P(B|A)] divided by [P(A) times P(B|A) + P(C) times P(B|C)]
Next, recall that P(A) = .48; P(B|A) = .21; P(C) = .52; and P(B|C) = .45. Accordingly, we can now
plug in the relevant numerical values from our tables and figures in the main text as follows:
P(A|B) = (.48 times .21) divided by [(.48 times .21) + (.52 times .45)]
P(A|B) ≈ .1 divided by (.1 + .2)
P(A|B) ≈ .1 divided by .3
P(A|B) ≈ .33
By the same steps, P(C|B) ≈ .2 divided by .3 ≈ .66, the value shown in Column D of Table 2.
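For readers who prefer conventional notation, the same derivation can be typeset as follows. This block only restates the steps above, using the symbols A, B, and C as defined in this appendix; it carries the products through without the intermediate rounding to .1 and .2, which is why it arrives at roughly .30 and .70 rather than .33 and .66.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(C)\,P(B \mid C)}
             = \frac{(.48)(.21)}{(.48)(.21) + (.52)(.45)}
             = \frac{.1008}{.3348} \approx .30 \\[4pt]
P(C \mid B) &= \frac{P(C)\,P(B \mid C)}{P(A)\,P(B \mid A) + P(C)\,P(B \mid C)}
             = \frac{(.52)(.45)}{(.48)(.21) + (.52)(.45)}
             = \frac{.2340}{.3348} \approx .70
\end{align*}
```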
Note to this appendix: the numerical values above and in the main text can also be expressed
more intuitively in discrete numbers, instead of fractions. For an example of such an approach,
see Guerra-Pujol (2013).
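One way to read the note above is in terms of whole-number counts. The sketch below is our own illustration, not a reproduction of the method in Guerra-Pujol (2013): it assumes a hypothetical docket of 10,000 cases (an arbitrary round number) and applies the probabilities of Table 1 to it.

```python
TOTAL_CASES = 10_000  # hypothetical docket size, chosen only for round numbers

# Stage of disposition (Column A of Table 1), expressed as whole cases
pre_trial = round(TOTAL_CASES * 0.48)         # 4,800 cases
trial_post_trial = round(TOTAL_CASES * 0.52)  # 5,200 cases

# Government cases within each stage (Column B of Table 1)
gov_pre_trial = round(pre_trial * 0.21)                # 1,008 cases
gov_trial_post_trial = round(trial_post_trial * 0.45)  # 2,340 cases

# All government cases, regardless of stage
gov_total = gov_pre_trial + gov_trial_post_trial       # 3,348 cases

print(f"Of {gov_total} government cases, {gov_pre_trial} were decided pre-trial "
      f"and {gov_trial_post_trial} at trial or post-trial.")
print(f"Share decided at trial or post-trial: {gov_trial_post_trial / gov_total:.2f}")
```

Stated this way, the posterior is simply a head count: among the government cases in the hypothetical docket, how many were decided at each stage.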