Statistics I, Math 127A, Fall 2010

Contact Information

  • Professor Mrinal Raghupathi
  • 1523 Stevenson Center
  • Email: firstname [dot] lastname [at] universityname [dot] edu
  • Office hours: Monday, Wednesday, Friday: 10:30 am — 11:30 am
  • Syllabus.

Teaching Assistants

Name              Section  Recitation            Office Hours
Lizzie Young      3        R 8:35 am, SC 1210    W, 10 am to 12 noon, SC 1218 or 1232
                  10       R 4:10 pm, SC 1210
Audra Crosby      5        R 11:10, SC 1312      M, 2 pm to 4 pm, SC 1218 or 1232
Melissa Johnston  8        R 2:10 pm, SC 1214    M, 2 pm to 4 pm, SC 1218 or 1232
                  9        R 3:05 pm, SC 1210
Xiaoyu Zhou       6        R 12:15 pm, SC 1210   R, 10 am to 12 noon, SC 1218 or 1232

Exams

  • Exam 1
  • Exam 2
    • October 20, 2010 in-class. Chapters 8 — 11.1.
    • Review Monday, October 18, at 7pm in Wilson 103.
    • Practice exam, Solutions
  • Final Exam
    • December 11, 2010 in-class. Chapters 13 — 18.
    • Review Friday, December 10, 103 Wilson hall from 1 pm — 2 pm.
    • Practice exam, Solutions.

Homework

  • Due 9/9. Chapter 3. Review exercises 4, 5, 7, 9, 11. Due on 9/9 in recitation
  • Due 9/16. Chapter 4. Review exercises 4, 5, 7, 9, 12.
  • Due 9/23. Chapter 5. Review exercises 3, 4, 5, 9, 11.
  • Due 9/30. Chapter 6. Review exercises 1 — 5 on page 104.
  • Due 10/7. Chapter 8. Review exercises: 2, 5, 6, 10, 12.
  • Due 10/21. Chapter 9. Review exercises: 6, 8, 10; Chapter 10: Review exercises: 2, 4.
  • Due 10/28. Chapter 11. Review exercises: 2, 3, 4, 5, 12.
  • Due 11/4. Chapter 12 Review exercises: 1, 5, 9; Chapter 13 Review exercises: 2, 5.
  • Due 11/11. Chapter 13 Review exercises: 7, 9, 12; Chapter 14 Review exercises: 2, 4.
  • Due 12/2. Chapter 14 Review exercises: 11, 13; Chapter 15 Review exercises: 2, 8, 11.

Quizzes

  • 9/2. Chapters 1 and 2.
  • 9/9. Chapter 3.
  • 9/16. Chapter 4.
  • 9/23. Chapter 5.
  • 9/30. Chapter 7.
  • 10/7. Chapter 8.
  • 10/14. No quiz, fall break
  • 10/21. Chapter 11.1, 11.2.
  • 10/28. Chapter 11.
  • 11/4. Chapter 13.
  • 11/11. Chapter 14.
  • 11/18. No recitation
  • 12/2. Chapter 16, 17.

Online Reading Quizzes

Here is the list of online quizzes with their due dates and times. Participation in the quizzes is mandatory but does not carry a grade.

Online quizzes are available at text-book.appspot.com. Your Vanderbilt gmail account will not work on this site. If you are signed into it, sign out and then log in with a different account.

  • Chapter 6: 9/27 at 8 am
  • Chapter 7: 9/29 at 8:30 am
  • Chapter 8: 10/4 at 8 am
  • Chapter 9.1 — 9.3: 10/6 at 8 am
  • Chapter 9.4, 9.5, 10.1: 10/11 at 8 am
  • Chapter 10.1 — 10.3: 10/13 at 8 am
  • Chapter 10.4 — 10.5, 11.1: 10/18 at 8 am
  • Chapter 11.2 — 11.5: 10/27 at 8 am
  • Chapter 12: 11/1 at 8 am
  • Chapter 13.1, 13.2: 11/3 at 8 am
  • Chapter 14: 11/10 at 8 am
  • Chapter 15: 11/15 at 8 am
  • Chapter 16.1: 11/17 at 8 am
  • Chapter 17: 12/6 at 8 am

Problems

  • Chapter 2:
    • A: 2, 3, 5, 7
  • Chapter 3:
    • A: 1, 2, 4, 5, 8
    • B: 1 — 4
    • C: 1, 2
    • D: 1, 2
    • E: 1,2
    • F: 1,2
  • Chapter 4:
    • A: 1, 2, 4, 5
    • B: 1, 2, 3, 5, 6
    • C: 1, 2, 3, 6
    • D: 1, 2, 7, 8, 9
    • E: 4, 8, 11, 12
  • Chapter 5:
    • A: 1, 2
    • B: 1, 3, 4, 5
    • C: 2
    • D: 2, 4
    • E: 1, 2, 3
    • F: 1
  • Chapter 7:
    • A: 1 — 3
    • B: 1 — 6
    • C: 1
    • D: 1 — 6
    • E: 1 — 6
  • Chapter 8:
    • A: 1, 2, 3, 6.
    • B: 1, 3, 6, 8, 9.
    • C: 1 — 4
    • D: 1, 4
  • Chapter 9:
    • A: 1, 6, 7, 10.
    • B: 2, 3, 4.
    • C: 1 — 4
    • D: 1, 2
    • E: 1, 2, 6
  • Chapter 10:
    • A: 1 — 3, 5.
    • B: 1 — 4.
    • C: 1 — 5.
    • D: 1 — 3.
    • E: 1 — 3.
  • Chapter 11:
    • A: 3, 4, 5, 6, 8.
    • B: 1 — 3.
    • C: 1 — 3.
    • D: 1, 3, 4, 5, 6.
    • E: 1 — 3.
  • Chapter 12:
    • A: 2, 3, 4.
    • B: 1 — 5.
  • Chapter 13:
    • A: 1 — 5.
    • B: 1 — 4.
    • C: 1 — 7.
    • D: 1 — 8.
  • Chapter 14:
    • A: 1, 4.
    • B: 1, 2, 3, 4, 6.
    • C: 1, 2, 3, 5.
    • D: 1, 2, 4, 5.
  • Chapter 15: All
  • Chapter 16: All
  • Chapter 17: All
  • Chapter 18: All

External Links

  • Online quizzes are available at text-book.appspot.com. Your Vanderbilt gmail account will not work on this site. If you are signed into it, sign out and then log in with a different account.

Lecture 26. The normal approximation: problems.

Wednesday, December 8, 2010

Today we did problems about using the normal approximation to estimate the sum of draws from a box.

Other Stuff

Lecture 25. The normal approximation.

Monday, December 6, 2010

Pretend that making a certain number of draws from a box is a game, that the sum is the player's score, and that several players play this game independently of each other. The normal approximation allows us to decide what fraction of these players have scores in a certain range. It tells us that the histogram of scores looks approximately normal. In chapter 18, examples are given of 4 boxes and the corresponding histograms are shown. Note that the histograms all look normal after a sufficient number of draws have been made.
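The game described above can be tried out on a computer. The following is only a sketch: the box (one ticket per face of a die), the number of draws, the number of players, and the function names are all made up for illustration. It counts what fraction of the players score within 1 and 2 standard errors of the expected value, which should come out close to the 68% and 95% given by the normal curve.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def play_games(box, n_draws, n_players, seed=0):
    """Each player's score is the sum of n_draws draws, made with replacement."""
    rng = random.Random(seed)
    return [sum(rng.choices(box, k=n_draws)) for _ in range(n_players)]

def fraction_within(scores, center, half_width):
    """Fraction of scores no further than half_width from center."""
    return sum(abs(s - center) <= half_width for s in scores) / len(scores)

box = [1, 2, 3, 4, 5, 6]      # hypothetical box: one ticket per face of a die
n_draws, n_players = 100, 5000
scores = play_games(box, n_draws, n_players)

expected = n_draws * mean(box)            # expected value of the sum
std_error = sqrt(n_draws) * pstdev(box)   # standard error of the sum

within_one_se = fraction_within(scores, expected, std_error)
within_two_se = fraction_within(scores, expected, 2 * std_error)
print(f"within 1 SE: {within_one_se:.1%} (normal curve: about 68%)")
print(f"within 2 SE: {within_two_se:.1%} (normal curve: about 95%)")
```

Changing the box or the number of draws shows how quickly (or slowly) the histogram of scores settles into the normal shape.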

Things to do

  • Read chapter 18.

Other Stuff

Lecture 24. The law of averages and the normal curve.

Wednesday, December 1, 2010

The normal curve can be used to determine the chance that the sum of draws from a box will lie in a certain range of values. Next time we will complete the example in the slides and also look more closely at why the normal curve can be used in this way. The mathematics behind this rather deep fact is beyond our reach, so we will look at some graphs to convince ourselves of it.
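As a rough sketch of the calculation, the chance can be found by converting the end points of the range to standard units and reading off the area under the normal curve. The box (a 0 and a 1, as in counting heads), the range, and the function name below are assumptions chosen for illustration, and no continuity correction is made.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

def chance_sum_between(box, n_draws, low, high):
    """Normal-curve estimate of the chance that the sum lies between low and high."""
    expected = n_draws * mean(box)
    std_error = sqrt(n_draws) * pstdev(box)
    z_low = (low - expected) / std_error      # convert to standard units
    z_high = (high - expected) / std_error
    standard_normal = NormalDist()            # average 0, SD 1
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Hypothetical example: 100 draws from a box with tickets 0 and 1.
chance = chance_sum_between([0, 1], 100, 45, 55)
print(f"chance of a sum between 45 and 55: {chance:.1%}")
```

Here the expected value is 50 and the standard error is 5, so the range runs from 1 standard error below to 1 standard error above the expected value.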

Things to do

  • Finish reading chapter 17.
  • Take the online quiz for chapter 17 by Monday 12/6 at 8 am.

Other Stuff

Lecture 23. The expected value and the standard error.

Monday, November 29, 2010

Today we looked at the expected value and the standard error for the sum of draws from a box. This box model is very flexible. The basic facts we learn about standard error and expected value will play a central role in sampling and surveys.
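The two formulas can be written out in a few lines. This is a sketch with a made-up box and made-up function names: the expected value of the sum is the number of draws times the average of the box, and the standard error is the square root of the number of draws times the SD of the box.

```python
from math import sqrt
from statistics import mean, pstdev

def expected_value_of_sum(box, n_draws):
    return n_draws * mean(box)

def standard_error_of_sum(box, n_draws):
    # pstdev divides by the number of tickets, which is the SD used in this course
    return sqrt(n_draws) * pstdev(box)

box = [0, 2, 3, 4, 6]   # hypothetical box: average 3, SD 2
ev = expected_value_of_sum(box, 25)
se = standard_error_of_sum(box, 25)
print(f"the sum of 25 draws will be about {ev}, give or take {se} or so")
```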

Other stuff

  • Slides.
  • Hand-out containing results of the experiment of draws from a box.

Lecture 22. The Law of Averages.

Wednesday, November 17, 2010

Today we looked at the law of averages and what it says. Essentially, the law of averages tells us that if we make repeated draws, with replacement, from a box, then the sum that we expect to see is equal to the number of draws times the average of the box. This is called the expected value.

From now on we will consider the behavior of this sum. We will compute the average (called the expected value), the standard deviation (called the standard error), and the histogram. We will look at more examples of this in the next few weeks. Although the example of a box and tickets may seem a little contrived, we are laying the foundation for sampling theory. This is the subject of 127B.

Things to do

  • Finish reading chapter 16.

Other Stuff

Lecture 21. The binomial formula and model.

Monday, November 15, 2010

The binomial model is used for chance processes in which a certain experiment is repeated a fixed number of times. In each case we care about some specific event (sometimes called a success), say a six, or a head. The experiments or trials must be independent and the probability of the event must be a fixed number that remains the same during all the experiments.

The binomial formula gives us a way to compute the probability that a certain number (k) of events occurs in n repetitions of the experiment.
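As a sketch, the formula multiplies the number of ways to choose which k of the n trials are successes by the chance of any one such pattern. The function name and the examples are chosen here for illustration.

```python
from math import comb

def binomial_probability(n, k, p):
    """Chance of exactly k successes in n independent trials, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 tosses of a fair coin.
chance_two_heads = binomial_probability(4, 2, 0.5)

# Exactly 1 six in 10 rolls of a fair die.
chance_one_six = binomial_probability(10, 1, 1 / 6)

# The chances for k = 0, 1, ..., 10 account for every possibility.
total = sum(binomial_probability(10, k, 1 / 6) for k in range(11))

print(chance_two_heads, chance_one_six, total)
```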

Things to do

  • Finish reading chapter 15 and read chapter 16.1.
  • Then take the online quiz on chapter 16.1

Other Stuff

  • Slides. There are some additional examples to those done in class.

Lecture 20. Independence and mutually exclusive events.

Wednesday, November 10, 2010

Two outcomes are mutually exclusive if the occurrence of one prevents the occurrence of the other. The addition rule states that the chance of at least one of two mutually exclusive events happening is the sum of the individual chances.
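The addition rule can be checked by listing outcomes. The sketch below uses one roll of a die, with events picked for illustration: "a 1" and "an even number" are mutually exclusive, while "a 2" and "an even number" are not, so adding the chances overcounts in the second case.

```python
from fractions import Fraction

outcomes = range(1, 7)   # one roll of a fair die

def chance(event):
    """Fraction of the equally likely outcomes for which the event happens."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

is_one = lambda roll: roll == 1
is_two = lambda roll: roll == 2
is_even = lambda roll: roll % 2 == 0

# Mutually exclusive: a roll cannot be both a 1 and even.
one_or_even = chance(lambda roll: is_one(roll) or is_even(roll))
sum_exclusive = chance(is_one) + chance(is_even)

# Not mutually exclusive: a 2 is also even, so the sum counts it twice.
two_or_even = chance(lambda roll: is_two(roll) or is_even(roll))
sum_overlapping = chance(is_two) + chance(is_even)

print(one_or_even, sum_exclusive)      # these agree
print(two_or_even, sum_overlapping)    # these do not
```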

Things to do

  • Read chapter 15. It has a few more mathematical symbols than most of the other chapters. After that take the quiz.
  • Take the online quiz over Chapter 15.1 by Monday 11/15 at 8 am.

Other Stuff

Lecture 19. More Chance

Monday, November 8, 2010

Two outcomes are independent if the occurrence of one does not affect the chance of the other. For independent outcomes the chance of both outcomes is the product of the chances of each of the individual outcomes. In the next class we will discuss mutually exclusive events and look at counting problems.
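A small sketch that checks the product rule by listing all 36 outcomes for two rolls of a die. The events are chosen for illustration: the two rolls are independent of each other, whereas "the first roll is a six" and "the total is 12" are not.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely pairs

def chance(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

first_is_six = lambda r: r[0] == 6
second_is_six = lambda r: r[1] == 6
total_is_twelve = lambda r: r[0] + r[1] == 12

# Independent: the chance of both equals the product of the chances.
both_sixes = chance(lambda r: first_is_six(r) and second_is_six(r))
product_independent = chance(first_is_six) * chance(second_is_six)

# Dependent: the product of the chances gives the wrong answer.
six_and_twelve = chance(lambda r: first_is_six(r) and total_is_twelve(r))
product_dependent = chance(first_is_six) * chance(total_is_twelve)

print(both_sixes, product_independent)
print(six_and_twelve, product_dependent)
```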

Things to do

  • Read Chapter 14. The most important part of this chapter is the definition of mutually exclusive events and the distinction (section 3) between mutually exclusive and independent.
  • Take the online quiz over Chapter 14 by Wednesday 11/10 at 8 am.

Other Stuff

Lecture 18. Chance (the basics)

Wednesday, November 3, 2010

The basic law of chance that we look at is the multiplication rule. In this class we will take the frequentist approach to probability.

Things to do

  • Read the rest of chapter 13. Then answer some of the questions on the quiz.
  • Take the online quiz over the rest of Chapter 13 by Monday 11/8 at 8 am.

Other Stuff

Lecture 17. Regression round-up and Chance

Monday, November 1, 2010

The regression equation provides us with a convenient formula for making predictions based on regression. The method of least-squares is used a lot in the natural sciences to estimate unknown quantities in physical laws. In class, we looked at an example from geometry. At some point in history these problems were non-trivial.

Here is John Oliver on the Large Hadron Collider.

I also watched an excellent video on YouTube this weekend. Terry Tao (a world famous mathematician) gave a public lecture entitled "The Cosmic Distance Ladder". It's a great introduction to how discoveries are made in science and how knowledge has built up over time.

Things to do

  • Read chapter 13. Then answer some of the questions on the quiz.
  • Take the online quiz over Chapter 13 by Wednesday 11/3 at 8 am.

Other Stuff

Lecture 16. The regression line.

Wednesday, October 27, 2010

The RMS error can be viewed as the standard deviation of the residuals. For situations where regression works best we can use this as an estimate of the amount by which our prediction is "off".

Among all lines that could be drawn on the scatter diagram, the regression line minimizes the RMS error. In any prediction problem there can be other variables that affect the quantity we are measuring. Multiple regression provides a method to deal with this. Transformations on the data can also be used to "linearize" the problem.
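For the regression line, the RMS error can also be found without computing any residuals: it equals the square root of (1 - r²) times the SD of y. The sketch below checks this on a tiny made-up data set; the data and function names are for illustration only.

```python
from math import sqrt
from statistics import mean, pstdev

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

def regression_line(x, y):
    slope = correlation(x, y) * pstdev(y) / pstdev(x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept

def rms_error(x, y, slope, intercept):
    """Root-mean-square of the residuals (actual y minus predicted y)."""
    return sqrt(mean([(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]))

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
slope, intercept = regression_line(x, y)
rms_from_residuals = rms_error(x, y, slope, intercept)
rms_from_formula = sqrt(1 - r**2) * pstdev(y)
print(rms_from_residuals, rms_from_formula)
```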

Things to do

  • Read Chapter 12.
  • Take the online quiz by Monday 11/1 at 8 am.

Other stuff

Lecture 15. Residual plots and the RMS error.

Monday, October 25, 2010

In order to judge the suitability of regression for a particular situation we need to look at the residual plot. The residual is another word for the error. When looking at the plot we look for the following features:

  1. The residuals should have an average of about 0.
  2. An even spread at each x-value. The average of the residuals should be approximately 0 in each vertical strip.
  3. The spread (standard deviation) in the residuals at different x-values should be approximately the same. This is called homoscedasticity.

In essence we are looking for football-shaped data.
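The checks in the list above can also be done numerically. This is only a sketch on simulated football-shaped data (the data, the seed, and the choice of just two vertical strips are assumptions): it fits the regression line, then compares the average and the SD of the residuals in the left and right halves of the scatter diagram.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]      # simulated data
y = [0.5 * a + rng.gauss(0, 1) for a in x]

# Regression line of y on x (this slope is the same as r * SD of y / SD of x).
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Two wide vertical strips: points left and right of the average of x.
left = [e for a, e in zip(x, residuals) if a < mx]
right = [e for a, e in zip(x, residuals) if a >= mx]

print(f"overall average residual: {mean(residuals):.4f}")
print(f"left strip:  average {mean(left):.3f}, SD {pstdev(left):.3f}")
print(f"right strip: average {mean(right):.3f}, SD {pstdev(right):.3f}")
```

With real data one would use more and narrower strips, and a plot remains the better tool; the numbers are just a quick cross-check.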

Next time we will look at how the normal curve can be used inside each vertical strip (chapter 11.5). You may find chapter 11.5 a little tough at first glance. Have a look at the example before you come to class on Wednesday. We will discuss it.

Things to do

  • Read the rest of chapter 11.
  • Take the online quiz by Wednesday 10/27 at 8 am.

Other stuff

Lecture 14. The regression fallacy, and the RMS error.

Monday, October 18, 2010

We did some more problems related to the regression line and looked at the RMS error. In making predictions we can think of three straight lines that pass through the point of averages: the horizontal line with height equal to the average of y, the SD line, and the regression line. Each of these could be used to make predictions. The regression line is the straight line that minimizes the RMS error.
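To compare the three lines, one can compute the RMS error of each. The sketch below does this for a small made-up data set with positive correlation; the data and names are illustrative assumptions. All three lines pass through the point of averages and differ only in slope.

```python
from math import sqrt
from statistics import mean, pstdev

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]
mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
r = mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

def rms_error_for_slope(slope):
    """RMS error of the line through the point of averages with the given slope."""
    return sqrt(mean([(b - (my + slope * (a - mx))) ** 2 for a, b in zip(x, y)]))

rms_horizontal = rms_error_for_slope(0)            # always predict the average of y
rms_sd_line = rms_error_for_slope(sy / sx)         # SD line (r is positive here)
rms_regression = rms_error_for_slope(r * sy / sx)  # regression line

print(f"horizontal line: {rms_horizontal:.3f}")
print(f"SD line:         {rms_sd_line:.3f}")
print(f"regression line: {rms_regression:.3f}")
```

For the horizontal line the RMS error is just the SD of y, which gives a baseline for judging the other two.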

Things to do

  • Study for the exam. Chapters 7, 8, 9, 10, 11.1.

Other stuff

Lecture 13. The graph of averages, mean reversion, and predictions.

Wednesday, October 13, 2010

Looking at vertical strips can provide a lot of insight into the data. As a first step we computed the average of the data in each strip and discovered that the regression line does a good job of approximating the averages. On Monday we will look at the histogram in each of these strips and discover more.

When it comes to predicting y values we can take progressively more sophisticated steps.

  1. A first guess would be the average of y.
  2. Next, the average of y given the value of x, or the average in the vertical strip.
  3. Next, the regression estimate based on the actual x-value. This is closely related to the previous point, but there is a difference: the points on the graph of averages do not, in general, lie on a straight line.

In dealing with regression we have to distinguish between predictions about y based on x, and predictions about x based on y. The averages, standard deviations and correlation coefficients are unaffected if you interchange x and y. However, the slope of the regression line does change.
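A short sketch of this point, using made-up data: swapping x and y leaves r unchanged, but the slope for predicting y from x is r times SD of y over SD of x, while the slope for predicting x from y is r times SD of x over SD of y. The two regression lines are therefore different lines.

```python
from statistics import mean, pstdev

def correlation(x, y):
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

def regression_slope(x, y):
    """Slope of the regression line for predicting the second list from the first."""
    return correlation(x, y) * pstdev(y) / pstdev(x)

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

r_xy, r_yx = correlation(x, y), correlation(y, x)
slope_y_on_x = regression_slope(x, y)   # for predicting y from x
slope_x_on_y = regression_slope(y, x)   # for predicting x from y

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

If the two lines were the same line, the second slope would be 1 divided by the first; that happens only when r is 1 or -1.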

On Monday we will see that the regression line minimizes the error in predicted values. This is what 11.1 is about.

Things to do

  • Read chapter 10.4, 10.5, 11.1.
  • Take the online reading quiz for 10.4, 10.5, 11.1. The quiz expires at 8 am on Monday 10/18.
  • Exercises for chapter 11 are posted under exercises. You can start trying the problems from 11.1 if you like.

Other stuff

Lecture 12. The regression line as a predictor of averages.

Monday, October 11, 2010

Today we looked at non-linear association and outliers. We then computed the equation of the regression line. As with the SD line, the regression line passes through the point of averages. However, it is less steep than the SD line.

The regression line of y on x can be used to make predictions about the average value of y given a certain value of x.
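A prediction of this kind needs only five summary numbers. The sketch below uses assumed, illustrative figures for fathers' and sons' heights in inches (they are not taken from the class slides): convert x to standard units, multiply by r, and convert back to the units of y. The SD-line value is shown for comparison.

```python
def regression_prediction(x, avg_x, sd_x, avg_y, sd_y, r):
    """Predicted average of y for the given x, using the regression method."""
    z = (x - avg_x) / sd_x           # x in standard units
    return avg_y + r * z * sd_y      # move up only r standard units in y

def sd_line_value(x, avg_x, sd_x, avg_y, sd_y):
    """Height of the SD line at x (for positive r), for comparison."""
    return avg_y + ((x - avg_x) / sd_x) * sd_y

# Assumed summary statistics, for illustration only.
avg_father, sd_father = 68.0, 2.7
avg_son, sd_son = 69.0, 2.7
r = 0.5

predicted = regression_prediction(72, avg_father, sd_father, avg_son, sd_son, r)
on_sd_line = sd_line_value(72, avg_father, sd_father, avg_son, sd_son)
print(f"regression estimate: {predicted:.1f} inches")
print(f"SD line:             {on_sd_line:.1f} inches")
```

Keeping the units straight is the main thing to watch: the factor r is applied in standard units, not in inches.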

Unfortunately, in my first attempt to compute the predicted value, I messed up the units in my calculation. The slides below are corrected and contain the three examples we did. There is also an alternative calculation of the predicted value for the first problem.

Things to do

  • Read chapter 10.1 — 10.3.
  • Take the online reading quiz for 10.1 to 10.3. The quiz expires at 8 am on Wednesday 10/13.
  • Exercises for chapter 10 are posted under exercises.

Other stuff

Lecture 11. Correlation and regression.

Wednesday, October 6, 2010

Today we discussed four factors that may or may not affect the coefficient of correlation: interchanging the data, scaling the data, shifting the data, and scrambling the data. Then we looked at how correlation can appear to be lower than its true value. An example of this is physical laws where the correlation is 1, and measurement error can cause some scattering.
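The four operations can be tried on a small made-up data set. In this sketch (data and names are illustrative), interchanging x and y, multiplying x by a positive constant, and adding a constant to x all leave r unchanged, while scrambling which y goes with which x changes it.

```python
from statistics import mean, pstdev

def correlation(x, y):
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

x = [1, 2, 3, 4, 5]   # made-up data
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r_interchanged = correlation(y, x)
r_scaled = correlation([2 * a for a in x], y)    # positive scale factor
r_shifted = correlation([a + 10 for a in x], y)
r_scrambled = correlation(x, y[::-1])            # pair each x with a different y

print(r, r_interchanged, r_scaled, r_shifted, r_scrambled)
```

Multiplying by a negative constant is the one scaling that matters: it reverses the sign of r.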

After class, I was asked about confounding variables when making predictions in the father-son examples. Clearly the height of a son is influenced by both the height of his father and his mother. More advanced techniques are needed to handle this situation.

Things to do

  • Read the rest of chapter 9 and 10.1 for Monday 10/11.
  • Take the online reading quiz for 9.4, 9.5 and 10.1. The quiz expires at 8 am on Monday 10/11.
  • Exercises for chapter 9 are posted under exercises.

Other stuff

Lecture 10. Correlation and regression.

Monday, October 4, 2010

We focused on two aspects of correlation. The computation of the correlation coefficient and its basic properties.
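One way to carry out the computation, sketched here on a small illustrative data set: convert each variable to standard units, multiply the pairs, and take the average of the products.

```python
from statistics import mean, pstdev

def standard_units(values):
    """Number of SDs each value lies above (+) or below (-) the average."""
    avg, sd = mean(values), pstdev(values)
    return [(v - avg) / sd for v in values]

x = [1, 3, 4, 5, 7]    # illustrative data: average 4, SD 2
y = [5, 9, 7, 1, 13]   # average 7, SD 4

zx, zy = standard_units(x), standard_units(y)
products = [a * b for a, b in zip(zx, zy)]
r = mean(products)

print("x in standard units:", zx)
print("y in standard units:", zy)
print("products:", products)
print("r =", r)
```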

Things to do

  • Read chapter 9.1, 9.2 and 9.3 for Wednesday 10/6.
  • Take the online reading quiz for chapter 9. The link to the website is under external links. The quiz expires at 8 am on Wednesday 10/6.
  • Exercises for chapter 8 and chapter 9 are posted under exercises.

Other stuff

Lecture 9. Correlation.

Wednesday, September 29, 2010

We looked at Sir Francis Galton (cousin of Charles Darwin) who was Karl Pearson's mentor. These two men founded the modern theory of statistics and much of what we look at is based on their ideas.

Today we began studying correlation and the regression line. We looked at what properties it has, and we will compute the equation and do examples on Monday.

Galton's notebook

You can find a link to pictures of Galton's original notebook below. The notebook is rather fascinating. It records in great detail the information on families. In particular take note of the following.

  • The data is stored as records (rows), each row represents one family.
  • There is a column for the father's height, mother's height, sons and daughters. This is an indication that gender could be a confounding factor.
  • The data is sorted so that the median is easy to find.
  • The data is not recorded in inches but as the deviation from some nominal value (60 inches). This makes it easier to look at the data, in much the same way that the discrepancy in weights on the standard kilogram is quoted in micrograms (chapter 6).
  • Missing or inaccurate data is handled in many ways. Entries such as tall, medium, even idiotic, indicate that data is missing.

The data collection effort is remarkable and is very modern. The survey form and instructions are quite contemporary, and there is even a confidentiality clause.

Things to do

  • Read chapter 8 for Monday 10/4.
  • Take the online reading quiz for chapter 8. The link to the website is under external links. The quiz expires at 8 am on Monday 10/4.
  • I will post exercises for chapter 8 on Friday evening.

Other stuff

Lecture 8. Bias, chance error and straight lines.

Monday, September 27, 2010

Things to do

  • Read chapter 7. There are practice problems assigned.
  • Take the online reading quiz for chapter 7. The link to the website is under external links. The quiz expires at 8:30 am on Wednesday 9/29.
  • Exercises for practice from chapter 7: A: 1 — 3 B: 1 — 6 C: 1 D: 1 — 6 E: 1 — 6
  • Exercises to submit from chapter 6: Review exercises 1 to 5 on page 104. Due on 9/30 in recitation.

Other stuff

Exam 1.

Wednesday, September 22, 2010

Things to do

  • Read chapter 6. We will discuss this on Monday.
  • Take the online reading quiz. The link to the website is under external links.

Lecture 7. The Normal Curve.

Wednesday, September 15, 2010

We looked at how the average and standard deviation change when we transform the data (subtract the average, subtract a fixed quantity, reduce by a percentage).
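These effects are easy to check numerically. The list below is made up for illustration: subtracting a number (the average, or any fixed quantity) shifts the average and leaves the SD alone, while reducing every value by 10% multiplies both the average and the SD by 0.9.

```python
from statistics import mean, pstdev

data = [10, 20, 30, 40, 50]   # made-up data

centered = [v - mean(data) for v in data]   # subtract the average
shifted = [v - 5 for v in data]             # subtract a fixed quantity
reduced = [0.9 * v for v in data]           # reduce every value by 10%

for label, values in [("original", data), ("centered", centered),
                      ("shifted", shifted), ("reduced", reduced)]:
    print(f"{label:9s} average = {mean(values):6.2f}   SD = {pstdev(values):6.2f}")
```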

We also began computations with the normal curve. The normal curve is central to statistics and probability. It will appear again and again in this course. The HANES height data follows the normal curve. The income data does not. A common myth is that all data is basically normal. This is not true.

Things to do

  • Exercises for practice from Chapter 5. These are for today and Monday (9/20). A: 1, 2; B: 1, 3, 4, 5; C: 2; D: 2, 4; E: 1, 2, 3; F: 1
  • Homework to submit from Chapter 5. Review exercises 3, 4, 5, 9, 11. Due on 9/23 in recitation

Other stuff

Lecture 6. Standard Deviation.

Monday, September 13, 2010

We did another histogram today for the salary data. You will find the histogram and the computations (including the median) in the slides. We also defined standard deviation.
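As a sketch of the definition, the standard deviation is the root-mean-square of the deviations from the average. The list and function name below are chosen for illustration; the result is compared with `pstdev` from Python's `statistics` module, which uses the same divide-by-the-number-of-entries definition.

```python
from math import sqrt
from statistics import mean, pstdev

def standard_deviation(values):
    avg = mean(values)
    deviations = [v - avg for v in values]          # deviations from the average
    return sqrt(mean([d**2 for d in deviations]))   # root of the mean of the squares

data = [1, 3, 4, 5, 7]   # illustrative list: average 4
sd = standard_deviation(data)
print(sd, pstdev(data))
```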

Things to do

  • Read chapter 5 for Wednesday 9/15. On Wednesday we will do some problems at the beginning and then move on to the normal distribution. Be prepared for some clickers.
  • Exercises for practice from Chapter 4. C: 1, 2, 3, 6; D: 1, 2, 7, 8, 9; E: 4, 8, 11, 12.
  • Homework to submit from Chapter 4. Review exercises 4, 5, 7, 9, 12. Due on 9/16 in recitation
  • Go to the following website and sign in using your Google account. You will be asked for your VUnetID and will see a short quiz. The quiz is really just to test whether you can log in and doesn't carry a grade.

Other stuff

Lecture 5. Average and Standard Deviation.

Wednesday, September 8, 2010

We looked at variables (quantitative and qualitative). We also looked at how to draw a histogram for discrete data.

Today we defined average (mean), mode and median. You will find a scanned (black and white) copy of the slides below. Please print them only if you need to. An extra example of average, mode and median is included at the end.
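For a quick check of hand computations, Python's `statistics` module has all three. The list is made up for illustration; note how the single large value pulls the average up while the median and mode stay put.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 12]   # made-up list with one large value

average = mean(data)       # sum of the entries divided by how many there are
middle = median(data)      # middle entry once the list is sorted
most_common = mode(data)   # entry that occurs most often

print(average, middle, most_common)
```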

Things to do

  • Read chapter 4 for Monday 9/13. On Monday we will also begin chapter 5 so you can look through it, if you would like to read ahead.
  • Exercises for practice from Chapter 4. A: 1, 2, 4, 5; B: 1, 2, 3, 5, 6.
  • Sign up for a Google account. If you have a Vanderbilt email account, then I have heard that it will work. So try that first and see. If you already use Gmail, then you're fine.

Other stuff

Lecture 4. Histograms.

Monday, September 6, 2010

Today we looked at frequency plots and histograms. The difference between the two can be subtle. The key idea is that the area of a box in a histogram represents the percentage. The height of the box is called the density and is obtained from the formula density × width = percentage.

The two key consequences are:

  1. The vertical units will always be the same (percent per unit on the horizontal axis).
  2. The total area will always be the same: 100%.

We will focus almost exclusively on histograms this semester.
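A sketch of the height calculation, using made-up class intervals of unequal width: the height of each box is its percentage divided by its width, and the areas of the boxes add back up to 100%.

```python
# Each entry: (left end, right end, percent of the data in the interval).
# The intervals and percentages are made up for illustration.
class_intervals = [(0, 10, 20), (10, 20, 50), (20, 50, 30)]

densities = []
for left, right, percent in class_intervals:
    width = right - left
    density = percent / width        # height of the box, in percent per unit
    densities.append(density)
    print(f"{left:2d} to {right:2d}: {percent}% over width {width}, height {density}")

# Area of each box is density x width, so the total area is 100%.
total_area = sum(d * (right - left)
                 for d, (left, right, _) in zip(densities, class_intervals))
print("total area:", total_area)
```

The last interval holds more of the data than the first but is drawn lower, because it is three times as wide.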

Things to do

  • Read chapter 3 for Wednesday 9/8. On Wednesday we will also begin chapter 4 so you can look through it, if you would like to read ahead.
  • Exercises for practice from Chapter 3. A: 1, 2, 4, 5, 8; B: 1 — 4; C: 1, 2; D: 1, 2; E: 1,2; F: 1,2.
  • Homework to submit from Chapter 3. Review exercises 4, 5, 7, 9, 11. Due on 9/9 in recitation

Other stuff

Lecture 3. Observational Studies.

Wednesday, September 1, 2010

Today we looked at the UC Berkeley sex bias study in the book and did some example problems.

Other stuff

Lecture 2. Experiments.

Monday, August 30, 2010

Today we looked at the two main types of studies: controlled experiments and observational studies. The slides below outline the main differences. The class discussion brought up the following areas in which we might be interested in collecting statistics:

  • Medicine: drug tests, cancer research.
  • Sports: basketball (free throws) and baseball.
  • Education
  • Politics and voting

Things to do

  • Read chapter 2 for Wednesday, 9/1/2010. Look at the examples in section 3 and the sex bias study.
  • Chapter 2, Exercise Set A: 2, 3, 5, 7 (at least). This is good for practice, you do not need to turn them in. We will discuss 1, 4, 6, 8 in class.

Other stuff

Lecture 1. Introduction.

Wednesday, August 25, 2010

John Oliver on the Large Hadron Collider and Letter frequencies.

Things to do

  • Read chapter 1.