ECSE-2500 Engineering Probability, RPI, Spring 2011 Home Page
Loading...
|
|
Search all EngProbSpring2011 pages: |
|
1. Lectures
1.1 Lecture 1, Tues Jan 25
Toggle display
Topics:
- Syllabus and Intro.
- Why probability is useful
- AT&T installed bandwidth to provide level of iphone service (not all users want to use it simultaneously).
- also web servers, roads, cashiers, ...
- fair price for financial CDO that reduces risk
- Model
- Real thing too expensive, dangerous, time-consuming (aircraft design).
- Capture the relevant, ignore the rest.
- Coin flip: relevant: it's fair? not relevant: copper, tin, zinc, ...
- Validate model if possible.
- Computer simulation model
- For systems too complicated for a simple math equation (i.e., most systems outside school)
- Often a graph of components linked together, e.g., with
- Matlab Simulink
- PSPICE
- many examples, e.g. antilock brake, US economy
- Can do experiments on it.
- Deterministic model
- Resistor: V=IR
- Limitations: perhaps not if I=1000000 amps. Why?
- Limitations: perhaps not if I=0.00000000001 amps. Why?
- Probability model
- Roulette wheel: {$p_i=\frac{1}{38} $} (ignoring http://www.amazon.com/Eudaemonic-Pie-Thomas-Bass/dp/0595142362 )
- Statistical regularity
- {$ lim_{n\rightarrow\infty}f_k(n) =p_k $}
- law of large numbers
- weird distributions (e.g., Cauchy) violate this, but that's probably beyond this course.
- Properties of relative frequency
- the frequencies of all the possibilities sum to 1.
- if an event is composed of several outcomes that are disjoint, the event's probability is the sum of the outcomes' probabilities.
- E.g., If the event is your passing this course and the relevant outcomes are grades A, B, C, D, with probabilities .3, .3, .2, .1, then ppass=.9. (These numbers are fictitious.)
- Axiomatic approach
- Probability is between 0 and 1.
- Probs sum to 1.
- If the events are disjoint, then the probs add.
- Building a model
- Want to model telephone conversations where speaker talks 1/3 of time.
- Could use an urn with 2 black, 1 white ball.
- Computer random number generator easier.
- Detailed example in more detail - phone system
- Design telephone system for 48 simultaneous users.
- Transmit packet of voice every 10msecs.
- Only 1/3 users are active.
- 48 channels wasteful.
- Alloc only M<48 channels.
- In the next 10msec block, A people talked.
- If A>M, discard A-M packets.
- How good is this?
- n trials
- Nk(n) trials have k packets
- frequency fk(n)=Nk(n)/n
- fk(n)->pk probability
- We'll see the exact formula (Poisson) later.
- average number of packets in one interval: {$$\frac{\sum_{k=1}^{48} kN_k(n)}{n} \rightarrow \sum_{k=1}^{48} kp_k = E[A] $$}
- That is the expected value of A.
- Probability application: unreliable communication channel.
- Transmitter transmits 0 or 1.
- Receiver receives 0 or 1.
- However, a transmitted 0 is received as a 0 only 90% of the time, and
- a transmitted 1 is received as a 1 only 80% of the time, so
- if you receive a 0 what's the probability that a 0 was transmitted?
- ditto 1.
- (You don't have enough info to answer this; you need to know also the probability that a 0 was transmitted. Perhaps the transmitter always sends a 0.)
- Another application: stocking spare parts:
- There are 10 identical lights in the classroom ceiling.
- The lifetime of each bulb follows a certain distribution. Perhaps it dies uniformly anytime between 1000 and 3000 hours.
- As soon as a light dies, the janitor replaces it with a new one.
- How many lights should the janitor stock so that there's a 90% chance that s/he won't run out within 5000 hours?
Reading:
- Leon-Garcia, chapter 1.
Announcements:
- Homework 1 available.
- Bring your clicker on Fri.
1.2 Lecture 2, Fri Jan 28
Toggle display
Topics:
- Why not use LMS?
- This wiki is much easier to use. I can create and save material more easily.
- You don't need to login to see the content. You can deep-link into the material. People outside the course can see, and save, the material.
- A few years ago, LMS sued an open source competitor. (They lost!)
- Coin toss applet demonstrates how the observed frequencies converge (slowly) to the theoretical probability.
- Example of unreliable channel
- Want to transmit a bit: 0, 1
- It arrives wrong with probability e, say 0.001
- Idea: transmit each bit 3 times and vote.
- 000 -> 0
- 001 -> 0
- 011 -> 1
- 3 bits arrive correct with probability (1-e)3 = 0.997002999
- 1 error with probability 3(1-e)2e = 0.002994
- 2 errors with probability 3(1-e)e2 = 0.000002997
- 3 errors with probability e3 = 0.000000001
- corrected bit is correct if 0 or 1 errors, with probability (1-e)3+3(1-e)2e = 0.999996999
- We reduced probability of error by factor of 1000.
- Cost: triple the transmission plus a little logic HW.
- Example of text compression
- Simple way: Use 5 bits for each letter: A=00000, B=00001
- In English, 'E' common, 'Q' rare
- Use fewer bits for E than Q.
- Morse code did this 170 years ago.
- E = .
- Q = _ _ . _
- Aside: An expert Morse coder is faster than texting.
- English can be compressed to about 1 bit per letter (with difficulty); 2 bits is easy.
- Aside: there is so much structure in English text, that if you add the bit strings for 2 different texts bit-by-bit, they can usually mostly be reconstructed.
- That's how cryptoanalysis works.
- Example of reliable system design
- Nuclear power plant fails if
- water leaks
- and operator asleep (a surprising number of disasters happen in the graveyard shift).
- and backup pump fails
- or was turned off for maintenance
- .....
- What's the probability of failure? This depends on the probabilities of the various failure modes. Those might be impossible to determine accurately.
- Design a better system? Coal mining kills.
- The backup procedures themselves can cause problems (and are almost impossible to test). A failure with the recovery procedure was part of the reason for the recent Skype outage.
- Nuclear power plant fails if
Chapter 2
Toggle display
- A random experiment has 2 parts:
- experimental procedure
- set of measurements
- Random experiment may have subexperiments and sequences of experiments.
- Outcome or sample point {$\zeta$}: a non-decomposable observation.
- Sample space S: set of all outcomes
- |S|:
- finite, e.g. {H,T}
- discrete = countable, e.g., 1,2,3,4,... Sometimes discrete includes finite.
- uncountable, e.g., {$\Re$}, aka continuous.
- Types of infinity:
- Some sets have finite size, e.g., 2 or 6.
- Other sets have infinite size.
- Those are either countable or uncountable.
- A countably infinite set can be arranged in order so that its elements can be numbered 1,2,3,...
- The set of natural numbers is obviously countable.
- The set of positive rational numbers between 0 and 1 is also countable. You can order it thus: {$$ \frac{1}{1}, \frac{1}{2}, \frac{1}{3}, \frac{2}{3}, \frac{1}{4}, \ \frac{3}{4}, \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \ \cdots $$}
- The set of real numbers is not countable (aka uncountable). Proving this is beyond this course. (It uses something called diagonalization.
- Uncountably infinite is a bigger infinity than countably infinite, but that's beyond this course.
- Why is this relevant to probability?
- We can assign probabilities to discrete outcomes, but not to individual continuous outcomes.
- We can assign probabilities to some events, or sets of continuous outcomes.
- E.g. Consider this experiment to watch an atom of sodium-26.
- Its half-life is 1 second (Applet: Nuclear Isotope Half-lifes)
- Define the outcomes to be the number of complete seconds before it decays: {$ S=\{0, 1, 2, 3, \cdots \} $}
- {$ |S| $} is countably infinite, i.e., discrete.
- {$ p(0)=\frac{1}{2}, p(1)=\frac{1}{4}, \cdots $} {$ p(k)=2^{-(k+1)} $}
- {$$ \sum_{k=0}^\infty p(k) = 1 $$}
- We can define events like these:
- The atom decays within the 1st second. p=.5.
- The atom decays within the first 3 seconds. p=.875.
- The atom's lifetime is an even number of seconds. {$$ p = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots = \frac{2}{3} $$}
- Now consider another experiment: Watch another atom of Na-26
- But this time the outcome is defined to be the real number, x, that is the time until it decays.
- {$ S = \{ x | x\ge0 \} $}
- |S| is uncountably infinite.
- We cannot talk about the probability that x=1.23 exactly. (It just doesn't work out.)
- However, we can define the event that {$ 1.23 < x < 1.24 $}, and talk about its probability.
- {$ P[x>x_0] = 2^{-x_0} $} (corrected)
- {$ P[1.23 < x < 1.24] $} {$ = 2^{-1.23} - 2^{-1.24} = 0.003 $}
- Event
- collection of outcomes, subset of S
- what we're interested in.
- e.g., outcome is voltage, event is V>5.
- certain event: S
- null event: {$\emptyset$}
- elementary event: one discrete outcome
- Set theory
- Sets: S, A, B, ...
- Universal set: U
- elements or points: a, b, c
- {$ a\in S, a\notin S $}, {$A\subset B$}
- Venn diagram

- empty set: {} or {$\emptyset$}
- operations on sets: equality, union, intersection, complement, relative complement
- properties (axioms): commutative, associative, distributive
- theorems: de Morgan
- Prove deMorgan 2 different ways.
- Use the fact that A equals B iff A is a subset of B and B is a subset of A.
- Look at the Venn diagram; there are only 4 cases.
- 2.1.4 Event classes
- Remember: an event is a set of outcomes of an experiment, e.g., voltage.
- In a continuous sample space, we're interested only in some possible events.
- We're interested in events that we can measure.
- E.g., we're not interested in the event that the voltage is exactly an irrational number.
- Events that we're interested in are intervals, like [.5,.6] and [.7,.8].
- Also unions and complements of intervals.
- This matches the real world. You can't measure a voltage as 3.14159265...; you measure it in the range [3.14,3.15].
- Define {$\cal F$} to be the class of events of interest: those sets of intervals.
- We assign probabilities only to events in {$\cal F$}.
- 2.2 Axioms of probability
- An axiom system is a general set of rules. The probability axioms apply to all probabilities.
- Axioms start with common sense rules, but get less obvious.
- I: 0<=P[A]
- II: P[S]=1
- III: {$ A\cap B=\emptyset \rightarrow $} {$ P[A\cup B] = P[A]+P[B] $}
- III': For {$ A_1, A_2, .... $} if {$ \forall_{i\ne j} A_i \cap A_j = \emptyset$} then {$P[\bigcup_{i=1}^\infty A_i] $} {$ = \sum_{i=1}^\infty P[A_i] $}
- Example: cards. Q=event that card is queen, H=event that card is heart.
These events are not disjoint. Probabilities do not sum.
- {$Q\cap H \ne\emptyset $}
- P[Q] = 1/13=4/52, P[H] = 1/4=13/52, P[Q {$\cup$} H] = 16/52!=17/52.
- Example C=event that card is clubs. H and C are disjoint. Probabilities
do sum.
- {$C\cap H \ne\emptyset $}
- P[C] = 13/52, P[H] = 1/4=13/52, P[Q {$\cup$} H] = 26/52.
- Example. Flip a fair coin {$A_i$} is the event that the first time you see
heads is the i-th time, for {$i\ge1$}.
- We can assign probabilities to these countably infinite number of events.
- {$ P[A_i] = 1/2^i $}
- They are disjoint, so probabilities sum.
- Probability that the first head occurs in the 10th or later toss = {$ \sum_{i=10}^\infty 1/2^i $}
- Corollory 1
- {$ P[A^c] = 1-P[A] $}
- E.g., P[heart] = 1/4, so P[not heart] = 3/4
- Corollory 2: P[A] <=1
- Corollory 3: P[{$ \emptyset $}] = 0
- Corollory 4:
- For {$ A_1, A_2, .... A_n $} if {$ \forall_{i\ne j} A_i \cap A_j = \emptyset$} then {$P\left[\bigcup_{i=1}^n A_i\right] = \sum_{i=1}^n P[A_i] $}
- Proof by induction from axiom III.
1.3 Lecture 3, Tues Feb 1
Toggle display
- Homework 2 online.
- Teaching assistants' photos online.
- How Did Economists Get It So Wrong? is the article by Paul Krugman (2008 Nobel Memorial Prize in Economic Science) that I mentioned in the first lecture. It says, "the economics profession went astray because economists, as a group, mistook beauty, clad in impressive-looking mathematics, for truth." You might see a certain relevance to this course. You have to get the model right before trying to solve it. Though I don't know much about it, I'll cheerfully try to answer any questions about econometrics. Another relevance to this course, in an enrichment sense, is that some people believe that the law of large numbers does not apply to certain variables, like stock prices. They think that larger and larger sample frequencies do not converge to a probability, because the variance of the underlying distribution is infinite. See Do financial returns have finite or infinite variance? A paradox and an explanation. This also is beyond this course.
- Corrections: 1, 2, 3
- iClicker (2nd try)
- Answer this question (won't be graded): 2+2=? 1, B:2, C:3.9999, D:4, E: whatever you want it to be.
- Register your iClicker, using your RCS ID. E.g., I would register as W Randolph Franklin frankwr
- iclicker exercise based on retransmitting a noisy bit 3 times here (not graded)
- Set e=0.1
- What is probability of no error in 3 bits:
- A 0.1
- B 0.3
- C 0.001
- D 0.729
- E 0.9
- Corollory 5:
{$ P[A\cup B] $} {$ = P[A] + P[B] - P[A\cap B] $}
- Proof from the fact that {$ P[A], P[B], P[A\cap B] $} are disjoint.
- Example: Queens and hearts. P[Q]=4/52, P[H]=13/52, P[Q {$\cup$} H]=16/52, P[Q {$\cap$} H]=1/52.
- {$ P[A\cup B] \le P[A] + P[B] $}
- Corollory 6:
{$ \begin{array}{c} P\left[\cup_{i=1}^n A_i\right] = \\ \sum_{i=1}^n P[A_i] \\ - \sum_{i<j} P[A_i\cap A_j] \\ + \sum_{i<j<k} P[A_i\cap A_j\cap A_k] \cdots \\ + (-1)^{n+1} P[\cap_{i=1}^n A_i] \end{array} $}
- Example Q=queen card, H=heart, F=
face card
- P[Q]=4/52, P[H]=13/52, P[F]=12/52,
- P[Q {$\cap$} H]=1/52, P[Q {$\cap$} F] = you tell me
- P[H {$\cap$} F]= you tell me
- P[Q {$\cap$} H {$\cap$} F] = you tell me
- So P[Q {$\cup$} H {$\cup$} F] = ?
- Example from Roulette:
- R=red, B=black, E=even, A=1-12
- P[R] = P[B] = P[E] = 16/38. P[A]=12/38
- {$ P[R\cup E \cup A] $} = ?
- You could try at http://www.roulette4fun.com/online_game.php but it's slow.
- Example Q=queen card, H=heart, F=
face card
- Corollory 7: if {$A\subset B$} then P[A] <= P[B] Example: Probability of a repeated coin toss having its first head in the 2nd-4th toss (1/2+1/4+1/8) {$\ge$} Probability of it happening in the 3rd toss (1/4).
- 2.2.1 Discrete sample space
- If sample space is finite, probabilities of all the outcomes tell you everything.
- sometimes they're all equal.
- Then {$ \text{P[event]} $} {$ = \frac{\text{# outcomes in event}}{\text{total # outcomes}} $}
- For countably infinite sample space, probabilities of all the outcomes also tell you everything.
- E.g. fair coin. P[even] = 1/2
- E.g. example 2.9. Try numbers from http://random.org/.
- What probabilities to assign to outcomes is a good question.
- Example 2.10. Toss coin 3 times.
- Choice 1: outcomes are TTT ... HHH, each with probability 1/8
- Choice 2: outcomes are # heads: 0...3, each with probability 1/4.
- Incompatible. What are probabilities of # heads for choice 1?
- Which is correct?
- Both might be mathematically ok.
- It depends on what physical system you are modeling.
- You might try doing the experiment and observing.
- You might add a new assumption: The coin is fair and the tosses independent.
- Example 2.11: countably infinite sample space.
- Toss fair coin, outcome is # tosses until 1st head.
- What are reasonable probabilities?
- Do they sum to 1?
- 2.2.2 Continuous sample spaces
- Usually we can't assign probabilities to points on real line. (It just doesn't work out mathematically.)
- Work with set of intervals, and Boolean operations on them.
- Set may be finite or countable.
- This set of events is a Borel set.
- Notation:
- [a,b] closed. includes both. a<=x<=b
- (a,b) open. includes neither. a<x<b
- [a,b) includes a but not b, a<=x<b
- (a,b] includes b but not a, a<x<=b
- Assign probabilities to intervals (open or closed).
- E.g., uniform distribution on [0,1] {$$ P[a\le x\le b] = \frac{1}{b-a} $$}
- Nonuniform distributions are common.
- Even with a continuous sample space, a few specific points might have
probabilities. The following is mathematically a valid probability
distribution. However I can't immediately think of a physical system
that it models.
- {$ S = \{ x | 0\le x\le 1 \} $}
- {$ p(x=1) = 1/2 $}
- For {$ 0\le x_0 \le 1, p(x<x_0) = x_0/2 $}
- For fun: Heads you win, tails... you win. You can beat the toss of a coin and here's how...
- Example 2.13, page 39, nonuniform distribution: chip lifetime.
- Propose that P[(t, {$ \infty $} )] = {$ e^{-at} $} for t>0.
- Does this satisfy the axioms?
- I: yes >0
- II: yes, P[S] = {$ e^0 $} = 1
- III here is more like a definition for the probability of a finite interval
- P[(r,s)] = P[(r, {$ \infty $} )] - P[(s, {$ \infty $} )] = {$ e^{-ar} - e^{-as} $}
- Probability of a precise value occurring is 0, but it still can occur, since SOME value has to occur.
- Example 2.14: picking 2 numbers randomly in a unit square.
- Assume that the probability of a point falling in a particular region is proportional to the area of that region.
- E.g. P[x>1/2 and y<1/10] = 1/20
- P[x>y] = 1/2
- Recap:
- Problem statement defines a random experiment
- with an experimental procedure and set of measurements and observations
- that determine the possible outcomes and sample space
- Make an initial probability assignment
- based on experience or whatever
- that satisfies the axioms.
1.4 Lecture 4, Fri Feb 4
Toggle display
- iclickers: We'll try yet again. Bruce Laplante writes, "Some students do not realize that the iClicker takes 3 AAA batteries. The third battery is all the way down the tube and requires a few taps of the clicker to get it out. When batteries die, students who don't know this replace only 2, and the transmitter still won't work."
- iClicker (2nd try)
- Register your iClicker, using your RCS ID. E.g., I would register as W Randolph Franklin frankwr
- Answer this question (won't be graded): 2+2=?
- 1
- 2
- 3.9999
- 4
- Whatever you want it to be.
- Continuous probability:
- S is the real interval [0,1].
- P([a,b]) = b-a if 0<=a<=b<=1.
- Event A = [.2,.6].
- Event B = [.4,1].
- What is P[A]?
- .2
- .4
- .6
- .8
- 1.
- What is P[B]?
- .2
- .4
- .6
- .8
- 1.
- What is P[A {$\cup$} B]?
- .2
- .4
- .6
- .8
- 1.
- What is P[A {$\cap$} B]?
- .2
- .4
- .6
- .8
- 1.
- What is P[A {$\cup$} Bc ]?
- .2
- .4
- .6
- .8
- 1.
- Today: counting methods.
- We have an urn with n balls.
- Maybe the balls are all different, maybe not.
- W/o looking, we take k balls out and look at them.
- Maybe we put each ball back after looking at it, maybe not.
- Suppose we took out one white and one green ball. Maybe we care about their order, so that's a different case from green then white, maybe not.
- Applications:
- How many ways can we divide a class of 12 students into 2 groups of 6?
- How many ways can we pick 4 teams of 6 students from a class of 88 students (leaving 64 students behind)?
- We pick 5 cards from a deck. What's the probability that they're all the same suit?
- We're picking teams of 12 students, but now the order matters since they're playing baseball and that's the batting order.
- We have 100 widgets; 10 are bad. We pick 5 widgets. What's the probability that none are bad? Exactly 1? More than 3?
- In the approval voting scheme, you mark as many candidates as you please. The candidate with the most votes wins. How many different ways can you mark the ballot?
- In preferential voting, you mark as many candidates as you please, but rank them 1,2,3,... How many different ways can you mark the ballot?
- Efficiency rankings have been calculated for each of 32 football teams. Assuming that everything were uniform and random, what's the probability that both Pittsburgh and Green Bay are in the top 5?
- Leon-Garcia 2.3: Counting methods, pp 41-46.
- finite sample space
- each outcome equally probable
- get some useful formulae
- warmup: consider a multiple choice exam where 1st answer has 3 choices,
2nd answer has 5 choices and 3rd answer has 6 choices.
- Q: How many ways can a student answer the exam?
- A: 3x5x6
- If there are k questions, and the i-th question has {$n_i$} answers then the number of possible combinations of answers is {$n_1n_2 .. n_k $}
- 2.3.1 Sampling WITH replacement and WITH ordering
- Consider an urn with n different colored balls.
- Repeat k times:
- Draw a ball.
- Write down its color.
- Put it back.
- # Distinct ordered k-tuples = {$n^k$}
- Example 2.1.5. How many distinct ordered pairs for 2 balls from 5? 5*5.
- iClicker. Suppose I want to eat one of the following 4 places, for
tonight and again tomorrow, and don't care if I eat at the same place
both times: Commons, Sage, Union, Knotty Pine. How many choices to I
have where to eat?
- 16
- 12
- 8
- 4
- something else
- 2.3.2 Sampling WITHOUT replacement and WITH ordering
- Consider an urn with n different colored balls.
- Repeat k times:
- Draw a ball.
- Write down its color.
- Don't put it back.
- # Distinct ordered k-tuples = n(n-1)(n-2)...(n-k+1)
- iClicker. Suppose I want to visit two of the following four
cities: Buffalo, Miami, Boston, New York. I don't want to visit one
city twice, and the order matters. How many choices to I have how to
visit?
- 16
- 12
- 8
- 4
- something else
- Example 2.1.6: Draw 2 balls from 5 w/o replacement.
- 5 choices for 1st ball, 4 for 2nd. 20 outcomes.
- Probability that 1st ball is larger?
- List the 20 outcomes. 10 have 1st ball larger. P=1/2.
- Example 2.1.7: Draw 3 balls from 5 with replacement. What's the
probability they're all different?
- P = {$ \small \frac{\text{# cases where they're different}} {\text{# cases where I don't care}} $}
- P = {$ \small \frac{\text{# case w/o replacement}} {\text{# cases w replacement}} $}
- P = {$ \frac{5*4*3}{5*5*5} $}
- 2.3.3 Permutations of n distinct objects
- Distinct means that you can tell the objects apart.
- This is sampling w/o replacement for k=n
- 1.2.3.4...n = n!
- It grows fast. 1!=1, 2!=2, 3!=6, 4!=24, 5!=120, 6!=720, 7!=5040
- Stirling approx: {$$ n! $$} {$$ \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n\left(1+\frac{1}{12n}+...\right) $$}
- Therefore if you ignore the last term, the relative error is about 1/(12n).
- Example 2.1.8. # permutations of 3 objects. 6!
- Example 2.1.9. 12 airplane crashes last year. Assume independent,
uniform, etc, etc. What's probability of exactly one in each month?
- For each crash, let the outcome be its month.
- # events for all 12 crashes = {$ 12^{12} $}
- # events for 12 crashes in 12 different months = 12!
- Probability = {$ 12!/(12^{12}) = 0.000054 $}
- Random does not mean evenly spaced.
- 2.3.4 Sampling w/o replacement and w/o ordering
- We care what objects we pick but not the order
- E.g., drawing a hand of cards.
- term: Combinations of k objects selected from n. Binomial coefficient. {$ C^n_k = {n \choose k} = \frac{n!}{k! (n-k)!} $}
- Permutations is when order matters.
- Example 2.20. Select 2 from 5 w/o order. {$ 5\choose 2 $}
- Example 2.21 # permutations of k black and n-k white balls. This is choosing k from n.
- Example 2.22. 10 of 50 items are bad. What's probability 5 of 10
selected randomly are bad?
- # ways to have 10 bad items in 50 is {$ 50\choose 10$}
- # ways to have exactly 5 bad is 3 ways to select 5 good from 40 times # ways to select 5 bad from 10 = {$ {40\choose5} {10\choose5} $}
- Probability is ratio.
- Multinomial coefficient: Partition n items into sets of size {$ k_1, k_2, ... k_j, \sum k_i=n $} {$ \frac{n!}{k_1! k_2! ... k_j!} $}
- 2.3.5. skip
- Probability in the news: Statistician Cracks Code For Lottery Tickets
Reading for Tues: 2.4 Conditional probability, page 47-
1.5 Lecture 5, Tues Feb 8
Toggle display
- Homework 3 out.
- Review last time by solving the applications I posted then.
- New stuff, pp. 47-66:
- Conditional probability - If you know that event A has occurred, does that change the probability that event B has occurred?
- Independence of events - If no, then A and B are independent.
- Sequential experiments - Find the probability of a sequence of experiments from the probabilities of the separate steps.
- Binomial probabilities - tossing a sequence of unfair coins.
- Multinomial probabilities - tossing a sequence of unfair dice.
- Geometric probabilities - toss a coin until you see the 1st head.
- Sequences of dependent experiments - What you see in step 1 influences what you do in step 2.
- 2.4 Conditional probability, page 47.
- big topic
- E.g., if it snows today, is it more likely to snow tomorrow? next week? in 6 months?
- E.g., what is the probability of the stock market rising tomorrow given that (it went up today, the deficit went down, an oil pipeline was blown up, ...)?
- What's the probability that a CF bulb is alive after 1000 hours given that I bought it at Walmart?
- definition {$ P[A|B] = \frac{P[A\cap B]}{P[B]} $}
- Consider a fictional university that has both undergrads and grads. It
also has both Engineers and others:

- iClicker: What's the probability that a student is an Engineer?
- 1/7
- 4/7
- 5/7
- 3/4
- 3/5
- iClicker: What's the probability that a student is an Engineer,
given that s/he is an undergrad?
- 1/7
- 4/7
- 5/7
- 3/4
- 3/5
- P[A{$\cap$}B] = P[A|B]P[B] = P[B|A]P[A]
- Example 2.26 Binary communication. Source transmits 0 with probability (1-p) and 1 with probability p. Receiver errs with probability e. What are probabilities of 4 events?
- Total probability theorem
- {$B_i$} mutually exclusive events whose union is S
- P[A] = P[A {$\cap B_1$} + P[A {$\cap B_2$} + ...
- {$ P[A] = P[A|B_1]P[B_1] $} {$ + P[A|B_2]P[B_2] + ... $}
What's the probability that a student is an undergrad, given ... (Numbers are fictitious.)
- Example 2.28. Chip quality control.
- Each chip is either good or bad.
- P[good]=(1-p), P[bad]=p.
- If the chip is good: P[still alive at t] = {$ e^{-at} $}
- If the chip is bad: P[still alive at t] = {$ e^{-1000at} $}
- What's the probability that a random chip is still alive at t?
- 2.4.1, p52. Bayes' rule. This lets you invert the conditional probabilities.
- Bj partition S. That means that
- If {$i\ne j$} then {$ B_i\cap B_j=\emptyset $} and
- {$ \bigcup_i B_i = S $}
- {$$ P[B_j|A] = \frac{B_j\cap A}{P[A]} $$} {$$ = \frac{P[A|B_j] P[B_j]}{\sum_k P[A|B_k] P[B_k]} $$}
- application:
- We have a priori probs {$ P[B_j] $}
- Event A occurs. Knowing that A has happened gives us info that changes the probs.
- Compute a posteriori probs {$ P[B_j|A] $}
- Bj partition S. That means that
- In the above diagram, what's the probability that an undergrad is an engineer?
- Example 2.29 comm channel: If receiver sees 1, which input was more probable? (You hope the answer is 1.)
- Example 2.30 chip quality control: For example 2.28, how long do we have to burn in chips so that the survivors have a 99% probability of being good? p=0.1, a=1/20000.
- Example:
False positives in a medical test
- T = test for disease was positive; T' = .. negative
- D = you have disease; D' = .. don't ..
- P[T|D] = .99, P[T'|D'] = .95, P[D] = 0.001
- P[D'|T] (false positive) = 0.98 !!!
- Example: Pick a cookie, from same page.
- 2.5 Independent events
- {$ P[A\cap B] = P[A] P[B] $}
- P[A|B] = P[A], P[B|A] = P[B]
- A,B independent means that knowing A doesn't help you with B.
- Mutually exclusive events w.p.>0 must be dependent.
- Example 2.33. Last condition above is required.

- More that 2 events:
- N events are independent iff the occurrance of no combo of the events affects another event.
- Each pair is independent.
- Also need {$ P[A\cap B\cap C] = P[A] P[B] P[C] $}
- This is not intuitive A, B, and C might be pairwise independent, but, as a group of 3, are dependent.
- See example 2.32, page 55. A: x>1/2. B: y>1/2. C: x>y
- Common application: independence of experiments in a sequence.
- Example 2.34: coin tosses are assumed to be independent of each other. P[HHT] = P[1st coin is H] P[2nd is H] P[3rd is T].
- Example 2.35 System reliability
- Controller and 3 peripherals.
- System is up iff controller and at least 2 peripherals are up.
- Add a 2nd controller.
- 2.6 p59 Sequential experiments: maybe independent
- 2.6.1 Sequences of independent experiments
- Example 2.36
- 2.6.2 Binomial probability
- Bernoulli trial flip a possibly unfair coin once. p is probability of head.
- (Bernoulli did stats, econ, physics, ... in 18th century.)
- Example 2.37
- P[TTH] = {$ (1-p)^2 p $}
- P[1 head] = {$ 3 (1-p)^2 p $}
- Probability of exactly k successes = {$$ p_n(k) = {n \choose k} p^k (1-p)^{n-k} $$}
- {$ \sum_{k=0}^n p_n(k) = 1 $}
- Example 2.38
- Can avoid computing n! by computing {$ p_n(k) $} recursively, or by using approximation. Also, in C++, using double instead of float helps. (Almost always you should use double instead of float. It's the same speed.)
- Example 2.39
- Example 2.40 Error correction coding
- Multinomial probability law
- There are M different possible outcomes from an experiment, e.g., faces of a die showing.
- Probability of particular outcome: {$p_i$}
- Now run the experiment n times.
- Probability that i-th outcome occurred {$k_i$} times, {$ \sum_{i=1}^M k_i = n $} {$$ P[(k_1,k_2,...,k_M)] $$} {$$ = \frac{n!}{k_1! k_2! ... k_M!} p_1^{k_1} p_2^{k_2}...p_M^{k_M} $$}
- Example 2.41 dartboard.
- Example 2.42 random phone numbers.
- 2.7 Computer generation of random numbers
- Skip this section, except for following points.
- Executive summary: it's surprisingly hard to generate good random numbers. Commercial SW has been known to get this wrong. By now, they've gotten it right (I hope), so just call a subroutine.
- Arizona lottery got it wrong in 1998.
- Even random electronic noise is hard to use properly. The best selling 1955 book A Million Random Digits with 100,000 Normal Deviates had trouble generating random numbers this way. Asymmetries crept into their circuits perhaps because of component drift. For a laugh, read the reviews.
- Pseudo-random number generator: The subroutine returns numbers according to some algorithm (e.g., it doesn't use cosmic rays), but for your purposes, they're random.
- Computer random number routines usually return the same sequence of number each time you run your program, so you can reproduce your results.
- You can override this by seeding the generator with a genuine random number from linux /dev/random.
- 2.8 and 2.9 p70 Fine points: Skip.
1.6 Lecture 6, Fri Feb 11
Toggle display
- Review Bayes theorem, since it is important. Here is a fictitious
(because none of these probilities have any justification) SETI
example.
- A priori probability of extraterrestrial life = P[L] = {$10^{-8} $}.
- For ease of typing, let L' be the complement of L.
- Run a SETI experiment. R (for Radio) is the event that it has a positive result.
- P[R|L] = {$ 10^{-5} $}, P[R|L'] = {$ 10^{-10} $}.
- What is P[L|R] ?
- Some specific probability laws
- In all of these, successive events are independent of each other.
- A Bernoulli trial is one toss of a coin where p is probability of head.
- We saw binomial and multinomial probilities on Tues.
- The binomial law gives the probability of exactly k heads in n tosses of an unfair coin.
- The multinomial law gives the probability of exactly ki occurrances of the i-th face in n tosses of a die.
- iClicker: You have a coin where the probability of a head is p=2/3
If you toss it twice, what's the probability that you will see one head
and one tail?
- 1/2
- 1/3
- 2/9
- 5/9
- 4/9
- 2.6.4 p63 Geometric probability law
- Repeat Bernoulli experiment until 1st success.
- Define outcome to be # trials until that happens.
- Define q=(1-p).
- {$ p(m) = (1-p)^{m-1}p = q^{m-1}p $} (p has 2 different uses here).
- {$ \sum_{m=1}^\infty p(m) =1$}
- Probability that more than K trials are required = {$q^K$}.
- Example: probability that more than 10 tosses of a die are required to get a 6 = {$ \left(\frac{5}{6}\right)^{10} = 0.16 $}
- iClicker: You have a coin where the probability of a head is p=2/3
What's the probability that the 1st head occurs on the 2nd toss?
- 1/2
- 1/3
- 2/9
- 5/9
- 4/9
- Example 2.43: error control by retransmission. A sent over a noisy channel is checksummed so the receiver can tell if it got mangled, and then ask for retransmission. TCP/IP does this. Aside: This works better when the roundtrip time is reasonable. Using this when talking to Mars is challenging.
- 2.6.5 p64 Sequences, chains, of dependent experiments.
- This is an important topic, but mostly beyond this course.
- In many areas, there are a sequence of observations, and the probability of each observation depends on what you observed before.
- It relates to Markov chains.
- Motivation: speech and language recognition, translation, compression
- E.g., in English text, the probability of a u is higher if the previous char was q.
- The probability of a b may be higher if the previous char was u (than if it was x), but is lower if the previous two chars are qu.
- Need to look at probabilities of sequences, char by char.
- Same idea in speech recognition: phonemes follow phonemes...
- Same in language understanding: verb follows noun...
- Example 2.44, p64
- In this example, you repeatedly choose an urn to draw a ball from, depending on what the previous ball was.
- Chapter 3, p 96. Discrete random variables
- From now on our random experiments will always produce numbers, called random variables, at least indirectly.
- Then we can compute, e.g., a fair value to pay for a gamble. What should you pay to play roulette so that betting on red breaks even on average?
- Discrete is different from discreet.
- Random experiment → nonnumerical outcome {$ \zeta $} →
- Random Variable {$ X(\zeta ) $}. Any real number.
- Random vars in general: X, Y, ...
- particular values: x, y, ...
- It's the outcome that's random, not the r.v., which is a deterministic function of the outcome.
- Example 3.1 Coin tosses
- Define X to be the number of heads from 3 tosses.
- {$ \zeta $}=HHH, X(HHH) =3, {$ S_X=\{0,1,2,3\} $}
- Example 3.2 Betting game addon to 3.1
- Define another random var Y to be payoff: 8 if X=3, 1 if X=2, 0 else.
- Y is derived from X
- Example 3.3 add probs to 3.2, assuming fair coin. P[X=2], P[Y=8]
- 3.1.1 Ignore since it's starred.
- 3.2 Discrete r.v. and Probability mass function (pmf).
- The pmf shows the probability of every value of random variable X, and of every real number.
- If X cannot have the value x, then the pmf is 0 at x.
- {$ p_X(x) = P[X=x] = P[\{\zeta:X(\zeta)=x\}] $}
- p100: 3 properties of pmf. They're all common sense.
- Nonnegative.
- Sums to one.
- The probability of an event B is the sum of the probabilities of the outcomes in B.
- Example 3.5 probability of # heads in 3 coin tosses: {$ p_X(0)=1/8 $}
- Example 3.6 betting game {$ p_Y(1)=3/8 $}
- Fig 3.4. You can graph the pmf.
- There are many types of random variables, depending on the shape of the pmf. These start out the same as the various probability laws in Chapter 2. However we'll see more types (e.g., Poisson) and more properties of each type (e.g., mean, standard deviation, generating function).
- Example 3.7 random number generator
- produces integer X equally likely in range 0..M-1
- {$ S_X=\{0, 1, ... M-1 \} $}
- pmf: {$ p_X(k)=1/M $} for k in 0..M-1.
- X is a uniform random variable over that set.
- Example 3.8 Bernoulli random variable
- indicator function {$ I_A(\zeta)=1 $} iff {$ \zeta\in A $}
- pmf of {$ I_A $}: {$ p_I(0)=1-p, p_I(1)=p $}
- {$ I_A $} is a Bernoulli random variable.
- Example 3.9 Message transmission until success
- {$ p_X(k)=q^{k-1}p, k=1,2,3,... $}
- Geometric random variable
- What about P[X is even]?
- Example 3.10 Number of transmission errors
- {$ p_X(k) = {n \choose k} p^k q^{n-k}, k=0,1,...n $}
- binomial random variable
- Fig 3.5 You can graph the relative frequencies from running an experiment
repeatedly.
- It will approach the pmf graph (absent pathological cases like the Cauchy distribition that are beyond this course.)
- 3.3 p104 Expected value and other moments
- This is a way to summarize a r.v., and capture important aspects.
- E.g., What's a fair price to pay for a lottery ticket?
- Mean or expected value or center of mass: {$ m_X = E[X] = \sum_{x\in S_X} x p_X(x) $}
- Defined iff absolute convergence: {$ \sum |x| p(x) < \infty $}
- Example 3.11 Mean of Bernoulli r.v.
- Example 3.12 Mean of Binomial r.v. What's the expected # of heads in 3 tosses?
- Example 3.13 Mean of uniform discrete r.v.
- Run an experiment n times and observe {$ x(1), x(2), ... $}
- {$ N_k(n) $} # times {$ x_k $} was seen
- {$ f_k(n) = N_k(n)/n $} frequencies
- Sample mean {$ <X>_n = \sum x_kf_k(n) $}
- With lots of experiments, frequencies approach probabilities and sample mean converges to E[X]
- However it may take a long time, which is why stock market investors can go broke first.
- Example 3.14 p 106 Betting game
- Example 3.15 Mean of a geometric r.v.
- Example 3.16 St Petersburg paradox
- 3.1.1 Expected value of a function of a r.v.
- Z=g(X)
- E[Z] = E[g(x)] = {$ \sum_k g(x_k) p_X(x_k) $}
- Example 3.17 square law device
- {$ E[a g(X)+b h(X)+c] = a E[g(X)] + b E[h(x)] + c $}
- Example 3.18 Square law device continued
- Example 3.19 Multiplexor discards packets
- Next time:
- Variance, how spread out is the random variable?
- Lots more specific distribitions.
- Continuous random variables.
1.7 Lecture 7, Tues Feb 15
Toggle display
- This chapter covers Discrete (finite or countably infinite) r.v.. This contrasts to continuous, to be covered later.
- Discrete r.v.s we've seen so far:
- uniform: M events 0...M-1 with equal probs
- bernoulli: events: 0 w.p. q=(1-p) or 1 w.p. p
- binomial: # heads in n bernoulli events
- geometric: # trials until success, each trial has probability p.
- Compute mean of a binomial distribution (started last Fri).
- Compute mean of a geometric distribution (started last Fri).
- 3.3.1, page 107: Operations on means: sums, scaling, functions
- iclicker: From a deck of cards, I draw a card, look at it, put it back
and reshuffle. Then I do it again. What's the probability that exactly one of
the 2 cards is a heart?
- A: 2/13
- B: 3/16
- C: 1/4
- D: 3/8
- E: 1/2
- iclicker: From a deck of cards, I draw a card, look at it, put it back
and reshuffle. I keep repeating this. What's the probability that the 2nd card
is the 1st time I see hearts?
- A: 2/13
- B: 3/16
- C: 1/4
- D: 3/8
- E: 1/2
1.8 Lecture 8, Fri Feb 18
Toggle display
- 3.3.2 page 109 Variance of an r.v.
- That means, how wide is its distribution?
- Example: compare the performance of stocks vs bonds from year to year. The expected values (means) of the returns may not be so different. (This is debated and depends, e.g., on what period you look at). However, stocks' returns have a much larger variance than bonds.
- {$ \sigma^2_X = VAR[X] = E[(X-m_X)^2] = \sum (x-m_x)^2 p_X(x) $}
- standard deviation {$ \sigma_X = \sqrt{VAR[X]} $}
- {$ VAR[X] = E[X^2] - m_X^2 $}
- 2nd moment: {$ E[X^2] $}
- also 3rd, 4th... moments, like a Taylor series for probability
- shifting the distribution: VAR[X+c] = VAR[X] ''(corrected from VAR[c])''
- scaling: {$ VAR[cX] = c^2 VAR[X] $}
- Derive variance for Bernoulli.
- Example 3.20 3 coin tosses
- general rule for binomial: VAR[X]npq
- Derive it.
- Note that it sums since the events are independent.
- Note that variance/mean shrinks as n grows.
- iclicker: The experiment is drawing a card from a deck, seeing if it's
hearts, putting it back, shuffling, and repeating for a total of 100
times. The random variable is the total # of hearts seen, from 0 to 100. What's
the mean of this r.v.?
- A: 1/4
- B: 25
- C: 1/2
- D: 50
- E: 1
- iclicker: The experiment is drawing a card from a deck, seeing if it's
hearts, putting it back, shuffling, and repeating for a total of 100
times. The random variable is the # of hearts seen, from 0 to 100. What's
the variance of this r.v.?
- A: 3/16
- B: 1
- C: 25/4
- D: 75/4
- E: 100
- Example 3.22 Variance of geometric r.v. We'll derive it.
- 3.4 page 111 Conditional pmf
- Example 3.23 random clock - skip
- Example 3.24 Residual waiting time
- X, time to xmit message, is uniform in 1...L.
- If X is over m, what's probability that remaining time is j?
- {$ p_X(m+j|X>m) = \frac{P[X =m+j]}{P[X>m]} = \frac{1/L}{(L-m)/L} = 1/(L-m) $}
- {$ p_X(x) = \sum p_X(x|B_i) P[B_i] $}
- Example 3.25 p 113 device lifetimes
- 2 classes of devices, geometric lifetimes.
- Type 1, probability {$\alpha$}, parameter r. Type 2 parameter s.
- What's pmf of the total set of devices?
- Example 3.26.
- 3.5 More important discrete r.v
- Table 3.1: We haven't seen {$ G_X(z) $} yet.
- 3.5.4 Poisson r.v.
- The experiment is observing how many of a large number of rare events happen in, say, 1 minute.
- E.g., how many cosmic particles hit your DRAM, how many people to call center.
- The individual events are independent.
- The r.v. is the number that happen in that period.
- There is one parameter, {$\alpha$}. Often this is called {$\lambda$}.
- {$$ p(k) = \frac{\alpha^k}{k!}e^{-\alpha} $$}
- Mean and std dev are both {$\alpha$}.
1.9 Lecture 9, Tues Feb 22
Toggle display
- Correction: Variance of shifted r.v.: VAR[X+c] = VAR[X]
- Hang Zhang's office hours are: Mon 7:10-8pm (or as long as people are there asking questions), and Wednesday at 10:00-11:10am in ECSE flip flop lounge.
- Poisson r.v. continued.
- Poisson approximates binomial for fixed np as {$n\rightarrow\infty $} . I'll try to prove this. One useful approximation is this: {$ (1-x)^a \rightarrow e^{-ax} $} as {$ x\rightarrow 0 $} You can prove it, with some work, from the following equation: {$ e^x = \sum_i \frac{x^i}{i!} $}
- Proof that Poisson mean is {$ \alpha $}
- Since its variance is also {$ \alpha $}, about 2/3 of the time, {$ \mu-\sigma < x < \mu + \sigma $} so {$ \alpha-\sqrt{\alpha} < x < \alpha+\sqrt{\alpha} $}
- Poisson example 3.30 p 122 call center queries; Probability of over 4 calls in 10 seconds.
- Example 3.31 packet arrivals: Probability of no packets in a particular interval.
- Example 3.32 Optical transmission errors.
- iclicker: Your home page gets on average 1 hit per hour. The number
of hits per 25 hours with about 2/3 probability is
in the following range:
- -4 to 6
- 0 to 2
- 0 to 50
- 20 to 30
- 24 to 26
- Section 3.5.5 Discrete uniform r.v. - We'll derive the mean and variance.
- Zipf r.v.
- {$ p(k) = c/k $} where c is chosen to make {$ \sum_{k=1}^L p(k)=1 $}
- L is important because summing to {$ \infty $} slowly diverges. That is, it keeps increasing, proportional to {$ \ln n $} .
- The population of the i-th largest city is said to follow this.
- This may apply to other statistics, like the sales of the i-th most popular book on Amazon.
- I used weasel words above because there is some argument about that, which is beyond this course.
- Zipf has a long tail because rare events, cumulatively, are more common than you'd think. (Again, some people dispute this.)
- Example: Try Zipf on the 1000 most common words. For L=1000, c=1/7. (corrected).
- iclicker: What's the probability of the most common word?
- 1/1000
- 1/100
- 1/70
- 1/7
- 1/2
- iclicker: What's the probability of the 10th most common word?
- 1/1000
- 1/100
- 1/70
- 1/7
- 1/2
- zeta r.v. - generalizes Zipf by adding exponent.
- {$ p(k) = c/k^\alpha $} where c is chosen to make {$ \sum_{k=1}^L p(k)=1 $}
- Sum to infinity exists for {$ \alpha>1 $}
- Mean exists for {$ \alpha>2 $}, variance exists for {$ \alpha>3 $}
- This is an example of a perfectly well defined probability distribution whose moments may not exist.
- Review questions
- Sampling with replacement with ordering: Each day I eat lunch at either the Union, Mcdonalds, Brueggers, or Sage. How many ways can I eat lunch over 5 days next week?
- sampling w/o replacement and w/o order: How many different possible teams of 3 people can you pick from a group of 5?
- sampling w/o replacement and with order: 5 people run a race for gold, silver, bronse. How many ways can the medals be won, w/o any ties?
- binomial: A coin falls heads with p=.6. You toss it 3 times. What's the probability of 2 heads and 1 tail?
- multinomial: You play 38-slot roulette 3 times. Once you got red, once black and once 0 or 00. What was the probability?
- conditional probability: You have 2 dice, one 6-sided and one 12-sided. You pick one of them at random and throw it w/o looking; the top is 2. What's the probability that you threw the 6-sided die?
- What's the expected number of times you'll have to toss the unknown die to get your first 2?
- Special office hour: On Thurs 2/24, Hang will be in the flip flop lounge from 7 to 8pm. As always, I will also be available after Friday class, and Hang will be available Monday 7:10-8pm in the FF loung.
1.10 Lecture 10, Fri Feb 25
Toggle display
- I post announcements at the top of the this web page, before moving them to the current day.
- Last year's 1st midterm exam is here: Exam1
- The different counting formulae for selecting k items from n.
- With replacement; order matters: {$ n^k $}.
- W/o replacement; order matters: {$ n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!} $}.
- With replacement; order does not matter: {$ {{n-1+k} \choose k} $} (corrected; thanks Bryan). See page 47 in the text.
- W/o replacement; order does not matter: {$ {n\choose k}=\frac{n!}{k!(n-k)!} $}.
- More review:
- Independence: Consider {1,2,3,...12}. Is the set of even numbers independent of the set of multiples of 3? What if we use {1,2,..10}?
- iclicker
We often add a check bit to an 8-bit byte, and set it so there are an odd
number of 1 bits. When we read the byte, which is now 9 bits, if
there are an even number of 1s, then we know that there was an error.
Assume that the probability of any one bit going bad is 1e-10. (The real
number is much smaller.)
What is the probability of the byte going bad (within 1 significant
digit)?
- 1e-10
- 8e-10
- 9e-10
- 3.6e-19
- 7.2e-19
- 1e-10
- 8e-10
- 9e-10
- 3.6e-19
- 7.2e-19
- Several review questions from the book.
- 2.83 on page 90.
- 2.99.
- 3.5 on page 130.
- 3.9.
- 3.15.
- 3.26.
- 3.88 on page 139.