CSC 378: Data Structures and Algorithm Analysis

Probability Review

Probability

Suppose we have a linked list and we pick a key. We then search the list to recover the data associated with that key. What's the expected search time? To answer this, we resort to random variables and ``expectation.''

Events and Sample Spaces

Defining the problem more formally, we perform an experiment. The experiment consists of picking a key and searching for it. The act of picking a key and searching for it is an event.

Suppose the possible search keys are index1.gif . Then the sample space of the experiment is the set, index2.gif , of events that can occur:

index3.gif

where, in event index4.gif , we pick key index5.gif and search for it.

Each event, index6.gif , has some probability, index7.gif , of occurring. If each event is equally likely to occur, the distribution of probabilities is said to be uniform, and index8.gif .

Of course, index9.gif and index10.gif .

Random Variables and Expectation

A random variable is a variable that is associated with an event. For example, when event index11.gif occurs, some amount of time is required to search for key index12.gif . Let index13.gif be that amount of time. index14.gif is a random variable.

The expected value of a random variable is the average value of the random variable, given the probability distribution of events.

The expected value of a random variable, index15.gif , is denoted index16.gif .

If the probability distribution is uniform, the expected value is is just the average value that we're familiar with:

index17.gif

But if the probability distribution is not uniform, more probable events are given greater weight in the sum. Each event has a corresponding value of the random variable. In the sum, that value is given a weight (or ``importance'') equal to the probability that the event occurs:

index18.gif

Note that the uniform distribution is just a special case of this.

Searching in a Linked List

Going back to our original question: What's the expected search time in a linked list of index19.gif elements? Here's the list:

index20.gif

The experiment: Given a linked list of index21.gif elements, pick one key which is known to be in the list and search for it.

The sample space: Let index22.gif be the set of keys in the list, so index23.gif denotes the event of picking key index24.gif and searching for it:

index25.gif

It's sometimes convenient, as we've done here, to name the events after parts of the experiment, like the keys. There's nothing wrong with calling an event index26.gif (that is, naming the event after the key, index27.gif ). Whether index28.gif denotes a key or an event should be clear from the context.

The probability distribution: We will assume that each event is equally likely: index29.gif .

The random variable: Let index30.gif be a random variable associated with event index31.gif . index32.gif is the time to search for key index33.gif .

So, what's index34.gif ?

index35.gif

Thus, on average, we'll search about half-way down the list.

A Problem

Here's a problem for you to solve: Suppose that the search times are not uniform and that

index36.gif

Answer the next two questions in order. Don't click on the button until you've worked on the problem yourself!




Back to the lecture index

Copyright 1998 James Stewart