Level 01
______________________________________________________________________________________________
40. Introduction to Probability
Learning Objective Statements
● Define Probability
● Explain the impact of the law of large numbers on a series of outcomes
● Define random variable and the phrase "independent and identically distributed"
● Identify skew and kurtosis
The Search for the High Probability Trade
● Even the newest trader understands that there is no guarantee that a given trade will be
profitable, even if it is a so-called “high-probability trade.”
● Because financial markets are incredibly noisy from a statistical standpoint, randomness can play a
large role in investment returns.
● That poses a challenge to investment managers because it means that even valid trading systems
can produce prolonged streaks of underperformance.
● If those runs happen to present themselves in the early stages of a strategy’s performance, a
manager may be met with a drawdown that takes many years to recover from, or that even prompts
unsurvivable redemptions from investors.
● On the other hand, a less often acknowledged result of the noisy nature of the markets is that
positive performance due to randomness can lead to false discoveries, that is, the adoption of
strategies that made money through luck alone.
● It is important to realize that the question of whether a strategy's results reflect a genuine edge
or mere randomness is rarely easy to answer.
Properties of Probability
Definition of Probability
● Probability measures the extent to which an event is likely to occur.
● Probabilities are measured on a scale between 0 and 1.
● A probability of 0 indicates that an event is effectively impossible, while a probability of 1
indicates that it is effectively certain.
● Practically speaking, the majority of probabilities we are concerned with fall somewhere in between.
● For an experiment with N mutually exclusive and equally likely outcomes, the probability of a given
event E is classically defined by: P(E) = N_E / N, where N_E is the number of outcomes in which E occurs.
● The law of large numbers suggests that, assuming a true high-probability trading strategy (that is,
one where the alpha generated is not due purely to chance), the outcome of any single trade might be
random, but over a large number of trades the observed win rate is expected to converge to the true
win rate of the strategy.
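The convergence described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name observed_win_rates, the 60% true win rate, and the fixed seed are all hypothetical choices for illustration, and trades are assumed to be independent of one another.

```python
import random


def observed_win_rates(true_win_rate, checkpoints, seed=0):
    """Simulate independent trades and record the observed win rate at each checkpoint."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    wins = 0
    results = {}
    for trade in range(1, max(checkpoints) + 1):
        # Each simulated trade wins with probability true_win_rate.
        if rng.random() < true_win_rate:
            wins += 1
        if trade in checkpoints:
            results[trade] = wins / trade
    return results


if __name__ == "__main__":
    # Hypothetical strategy with a true win rate of 60%.
    rates = observed_win_rates(0.6, checkpoints={10, 100, 1_000, 100_000})
    for n in sorted(rates):
        print(f"after {n:>7} trades: observed win rate = {rates[n]:.3f}")
```

With only a handful of trades the observed rate can sit far from 0.6; as the number of trades grows, it should settle close to the true value.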
www.yubha.com | December, 2022 Edition
Independent and Identically Distributed Variables
● Independence means that the occurrence of one event does not affect the probability of the
occurrence of the other.
○ With a coin flip, independence means that the event of flipping heads is not impacted by how
many heads were flipped in prior coin tosses.
○ This can be a hard concept for many people to grasp: even after flipping nine heads in a row, the
probability of flipping heads on the 10th toss of a fair coin is still 0.5.
● Identically distributed variables are variables with the same probability distribution.
○ For example, if we flip two identical coins back-to-back, each has the same probability of
resulting in heads.
○ If, however, one coin has a different probability of heads coming up than the other, then the
two coins are not identically distributed.
○ It is important to note that identically distributed does not necessarily mean that outcomes
must be equally probable; two coins that each have a 70% probability of resulting in heads
would be identically distributed.
● The i.i.d. assumption is a key assumption of many statistical calculations. This is important to be
aware of, because a large body of academic research has shown that financial return series data is
not an i.i.d. process.
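Independence can be checked empirically with a simulated fair coin. The sketch below is illustrative only: the function name heads_rate_after_streak and the seed are hypothetical, and a streak of three heads is used instead of nine so that enough streaks occur to give a stable estimate.

```python
import random


def heads_rate_after_streak(streak_length, num_flips, seed=0):
    """Estimate P(heads) on flips that immediately follow at least `streak_length` heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    run = 0          # consecutive heads immediately before the current flip
    followers = 0    # flips that came right after a qualifying streak
    heads_after = 0  # how many of those flips were heads
    for _ in range(num_flips):
        is_heads = rng.random() < 0.5  # fair coin
        if run >= streak_length:
            followers += 1
            heads_after += is_heads
        run = run + 1 if is_heads else 0
    return heads_after / followers if followers else float("nan")


if __name__ == "__main__":
    rate = heads_rate_after_streak(streak_length=3, num_flips=200_000)
    print(f"share of heads right after three heads in a row: {rate:.3f}")
```

If the flips are independent, the estimate should stay near 0.5 regardless of the streak length chosen.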
The Probability Distribution
● Recall Example 1 (from Chapter 39).
● If we add more observations to our dataset of Dow Jones Industrial Average returns, we start to get a
clearer picture of how the Dow’s annual returns are distributed.
○ Figure: Histogram of Dow Jones Industrial Average returns from 1921 through 2017.
○ Just as with the shorter span of Dow returns in the previous lesson, the dataset shown in the
histogram above can provide us with descriptive statistics that summarize a “typical” return for
the Dow.
○ In fact, despite being longer by 86 years, the mean return for the Dow in the longer dataset is
7.95%, a mere 0.14 percentage points away from the 7.81% average return between 2007 and 2017.
● One interesting takeaway from the histogram is that as the number of observations increases, the
histogram of returns begins to look more like a “bell curve.”
● This is an example of a probability distribution, a function that provides the probabilities of
occurrences of different possible outcomes for a random variable.
● A probability distribution helps us understand what is likely given some amount of randomness.
The Normal Distribution
● The best-known bell curve is the normal distribution (also sometimes called a Gaussian distribution).
The normal distribution can be found in many applications; for instance, human height and weight
across a population tend to be more or less normally distributed.
● We can also plot on our distribution the descriptive statistics that we discussed in the previous lesson
in order to get a better visual sense of where our data lie, as well as how probable a given observation
is within the context of the distribution.
● This is useful when evaluating strategy performance as well as understanding the likelihood behind
surprising events, like the deep drawdowns the Dow took in 2008.
● A variation of the normal distribution is the log-normal distribution, which refers to a random variable
whose logarithm is normally distributed. This is a commonly used distribution in finance because in
some markets it has been found to be a better approximation of return distributions.
● Before moving on, it is important to address the commonly repeated observation that stock price
returns do not follow a normal distribution. Academic studies have repeatedly shown that the actual
return distributions empirically exhibited by stock prices are not normally distributed. This is one
reason why price shocks occur far more frequently than academic models would suggest.
● One of the consequences of this is that the assumption of normality made by many indicators (such
as those using standard deviation) means that they do not necessarily do what investors think they
do.
● For example, under a normal distribution, we expect observations to fall more than two standard
deviations above or below the mean only about 5% of the time. However, in 2008, daily returns for the
S&P 500 exceeded this threshold by almost a factor of three.
● Many statistics have a normality assumption in their interpretation. Likewise, many academic models
have some assumption of normally distributed returns baked in. For that reason, it is wise for technical
analysts to use caution when relying on common statistical tools (particularly complex ones) to draw a
conclusion.
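The two-standard-deviation rule of thumb can be checked directly, and the same threshold can be applied to an observed return series. This is a minimal sketch: the function names normal_two_sided_tail and empirical_tail_share are hypothetical, and the sample series is made-up illustrative data, not market data. Under an exact normal distribution the share beyond two standard deviations is about 4.55%; the familiar 5% figure corresponds to roughly 1.96 standard deviations.

```python
import math
import statistics


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z, using the complementary error function."""
    return math.erfc(k / math.sqrt(2))


def empirical_tail_share(returns, k):
    """Share of observations more than k standard deviations away from the sample mean."""
    mean = statistics.mean(returns)
    sd = statistics.pstdev(returns)
    return sum(abs(r - mean) > k * sd for r in returns) / len(returns)


if __name__ == "__main__":
    print(f"normal share beyond 2 standard deviations: {normal_two_sided_tail(2):.4f}")
    # Made-up series for illustration only: mostly flat with two large moves.
    sample = [0.0] * 98 + [10.0, -10.0]
    print(f"empirical share beyond 2 standard deviations: {empirical_tail_share(sample, 2):.4f}")
```

Comparing the two numbers for a real return series is one simple way to see how far the data depart from the normal assumption.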
Other Common Distributions
● Another (even simpler) type of distribution is the uniform distribution, which represents a situation
where all intervals of the same length on the distribution are equally probable.
○ For instance, the outcomes of repeated rolls of a single fair die follow a (discrete) uniform
distribution.
○ The uniform distribution is sometimes used to detect fraud by examining the rightmost digits of
a study or investment return series where the numbers are reported with a very high degree of
precision. In these cases, the rightmost digit of the number is effectively random, and follows a
uniform distribution.
○ Spotting cases where certain digits appear more frequently than is likely under the uniform
distribution can be a tip off that the numbers were fabricated.
● Another useful distribution is the binomial distribution, which gives the probability of a given
number of successes in a fixed number of independent trials that each have two distinct outcomes.
○ For instance, a trading strategy with a 70% win rate could be modeled as a binomial
distribution with P(win) = 0.7 and P(loss) = 0.3.
○ Of course, this naively assumes that our wins and losses don’t have the same statistical
issues that returns do, but it can still be a useful model to have available.
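Both ideas in this section lend themselves to short calculations. The sketch below is illustrative only: the function names are hypothetical, the binomial part assumes independent trades with a constant win probability, and the digit check uses a plain chi-square statistic as one possible way to compare observed last digits with a uniform expectation (the source does not name a specific test).

```python
import math
from collections import Counter


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent trades, each won with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k, n, p):
    """Probability of k or fewer wins in n trades."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def last_digit_chi_square(digits):
    """Chi-square statistic comparing observed last digits (0-9) with a uniform expectation."""
    counts = Counter(digits)
    expected = len(digits) / 10  # each digit is expected equally often
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


if __name__ == "__main__":
    # Strategy with P(win) = 0.7, as in the text, over 10 trades.
    print(f"P(exactly 7 wins in 10) = {binomial_pmf(7, 10, 0.7):.4f}")
    print(f"P(4 or fewer wins in 10) = {prob_at_most(4, 10, 0.7):.4f}")
    # Made-up digits for illustration: the digit 7 is heavily over-represented.
    suspicious = [7] * 40 + list(range(10)) * 6
    print(f"chi-square statistic = {last_digit_chi_square(suspicious):.1f}")
```

With nine degrees of freedom, a chi-square statistic above roughly 16.9 would be unusual at the 5% level if the digits were truly uniform, so a much larger value is a reason to look more closely rather than proof of fabrication.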
Skewness and Kurtosis
Skewness and kurtosis are statistics that describe the way a probability distribution looks.
Skewness is the degree to which returns are asymmetric around the mean.
○ A normal distribution has a skewness of 0.
○ The skewness is positive when the right “tail” of the distribution is longer than the left tail
(that is, it is stretched to the right).
○ A comparison of a normal probability density function versus one with negative skew is shown
in the figure below.
○ All things being equal, positive skewness is preferred because it means that large deviations from
the mean are more likely to be gains than losses; the right tail of the density curve is longer than
the left. One important side effect of high levels of skewness is that conventional measures of
risk that assume symmetric distributions (such as standard deviation) no longer do a good job
of assessing risk.
Kurtosis measures the degree to which returns show up in the tails of a distribution.
○ Distributions with higher kurtosis have more returns out in the tails.
○ A normal distribution has kurtosis of 3.
○ Kurtosis is often measured relative to the normal distribution (excess kurtosis), so a
distribution with excess kurtosis of 1 would actually have kurtosis of 4.
○ The right panel of the figure above shows a probability density function with excess kurtosis of
1.95.
○ Generally speaking, lower kurtosis is seen as a good thing because the size and occurrence
of “tail events” are lower.
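Skewness and kurtosis can be computed from the third and fourth central moments of a sample. The sketch below uses the simple population (moment-based) formulas; the function names are hypothetical, the sample lists are made-up illustrative numbers, and statistical packages often apply small-sample corrections that will give slightly different values.

```python
import statistics


def skewness(values):
    """Third central moment divided by the cubed (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd**3)


def kurtosis(values):
    """Fourth central moment divided by the fourth power of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 4 for v in values) / (len(values) * sd**4)


def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution, whose kurtosis is 3."""
    return kurtosis(values) - 3


if __name__ == "__main__":
    symmetric = [-2, -1, 0, 1, 2]       # made-up data, symmetric about 0
    right_skewed = [1, 1, 1, 1, 10]     # made-up data with one large positive outlier
    print(f"symmetric sample skewness: {skewness(symmetric):.3f}")
    print(f"right-skewed sample skewness: {skewness(right_skewed):.3f}")
    print(f"symmetric sample excess kurtosis: {excess_kurtosis(symmetric):.3f}")
```

A symmetric sample gives a skewness of zero, while a sample with a long right tail gives a positive value, matching the sign convention described above.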