Statistics II, Math 127B, Spring 2011

Contact Information

  • Professor Mrinal Raghupathi
  • 1523 Stevenson Center
  • Email: firstname [dot] lastname [at] universityname [dot] edu
  • Office hours: Monday: 10 am — 11 am; Tuesday: 2 pm — 3 pm; Wednesday: 2 pm — 3 pm.
  • Syllabus.

Teaching Assistants

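  • Lizzie Young. Section 2. Recitation: Thursday 8:35 am, SC 1120. Office hours: Monday 10:10 – 12:10, SC 1218.
  • Audra Crosby. Section 3. Recitation: Thursday 2:10 pm, SC 1214. Office hours: Tuesday 1 – 3, SC 1218.
  • Xiaoyu Zhou. Section 4. Recitation: Thursday 12:10 pm, SC 1210. Office hours: Thursday 10 – 12, SC 1218.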

Exams

Exams will be held in class. The review for each midterm exam will be on the Monday before the exam. Here is a link to the final exam schedule as maintained by the registrar's office; please check it for the exam date. There is no alternate exam for this class.

  • Exam 1
    • February 23, 2011 in-class.
    • Topics: Chapters 19 — 23.
    • Review: February 21, 2011 in-class. Bring your questions and doubts.
    • Practice exam (Exam) (Solutions).
    • Solutions to exam 1.
  • Exam 2
    • March 30, 2011 in-class.
    • Topics: Chapters 26 — 28.
    • Review: March 28, 2011 in-class. Bring your questions and doubts.
    • Practice exam (Exam) (Solutions).
    • Solutions to exam 2.
  • Final Exam
    • April 28, 2011. Buttrick 102, at 3 pm.
    • A link to the university calendar.
    • There is no alternate exam for this class. See the instructions at the top of the calendar.
    • In-class Q&A session on Monday, April 25, 2011.
    • Practice final and solutions for students taking the R exam. If you plan to take the cumulative written final, use the practice midterms and the actual midterms for practice. The written final will be about twice as long as a midterm (7 or 8 questions).
    • CPS data. This is a large file (6 MB).
    • Another problem. This is another practice problem for the R exam. The dataset for the exam is available here.
    • List of R commands.
    • Dataset for the final exam.

      The dataset is from the Current Population Survey (CPS).

      This is a comma-separated dataset with a header row. The dataset is large (approximately 13 MB), so download the file to your computer.

Weekly summary

    • Wed 1/12. Review of the box model. Chapters 16 and 17.
    • Thu 1/13. Practice problems in recitation.
    • Wed 1/19. Chapter 19.1 — 19.7.
    • Thu 1/20. Homework 1 is due. Chapter 19 practice problems.
    • Mon 1/24. Chapter 20.1 and 20.2.
    • Wed 1/26. Chapter 20.3 — 20.5.
    • Thu 1/27. Online quiz to be done by 8 am.
    • Thu 1/27. Homework 2 is due. Quiz 1 is on chapter 19. Chapter 20 practice problems.
    • Mon 1/31. Chapter 21.1 and 21.2. Online quiz to be done by 8 am.
    • Wed 2/2. Chapter 21.3 — 21.5. Online quiz by 8 am.
    • Thu 2/3. Homework 3 is due. Quiz 2 is on chapter 20. Chapter 21 practice problems.
    • Mon 2/7. Chapter 22.1 — 22.3. Online quiz to be done by 8 am.
    • Wed 2/9. Chapter 22.4 — 22.7. Online quiz by 8 am.
    • Thu 2/10. Homework 4 is due. Quiz 3 is on chapter 21. Chapter 22 practice problems.
    • Mon 2/14. Chapter 23.1 — 23.2. Online quiz to be done by 8 am.
    • Wed 2/16. Chapter 23.3 — 23.4. Online quiz by 8 am.
    • Thu 2/17. Homework 5 is due. Quiz 4 is on chapter 22. Chapter 23 practice problems.
    • Mon 2/21. Review for exam 1. No online quiz.
    • Wed 2/23. Exam 1. No online quiz.
    • Thu 2/24. Homework 6 is due. Quiz 5 is on chapter 23.
    • Mon 2/28. Chapter 26.1 — 26.3. Online quiz at 8 am.
    • Wed 3/2. Chapter 26.4 — 26.6. Online quiz at 8 am.
    • Thu 3/3. Recitation, chapter 26 problems.
    • Mon 3/14. Chapter 27.1, 27.2.
    • Wed 3/16. Chapter 27.3 — 27.5. Online quiz at 8 am.
    • Thu 3/17. Recitation, chapter 27 problems. Homework 7 is due. Quiz 6 is on chapter 26.
    • Mon 3/21. Chapter 28.1 — 28.2.
    • Wed 3/23. Chapter 28.3 — 28.4. Online quiz at 8 am.
    • Thu 3/24. Recitation, chapter 28 problems. Homework 8 is due. Quiz 7 is on chapter 27.
    • Mon 3/28. Review for exam 2. No online quiz.
    • Wed 3/30. Exam 2. No online quiz.
    • Thu 3/31. Recitation. No quiz or homework due. Discussion of exam 2.
    • Mon 4/4. Basic R, lesson 1.
    • Wed 4/6. Basic R, lesson 2.
    • Thu 4/7. Recitation. Homework 9 is due. Quiz 8 is on chapter 28. During the recitation you will be given something to try out in R. You can work in a group, or on your own, to get the answer.
    • Mon 4/11. Basic R, lesson 3.
    • Wed 4/13. Basic R, lesson 4.
    • Thu 4/14. Recitation. During the recitation you will be given something to try out in R. You can work in a group, or on your own, to get the answer.
    • Mon 4/18. Basic R. Homework 10 is due.
    • Wed 4/20. Basic R.
    • Thu 4/21. Recitation will be an R lab for more practice.

Homework

Homework will generally be a subset of the review problems from each chapter. Each week I will post the homework below. There will often be an extra problem that you need to turn in that is not part of the review exercises.

  1. Due 1/20/2011. This homework is for practice and will count as a bonus. Submission of the homework is compulsory. Download the homework.
  2. Due 1/27/2011. Chapter 19 Review exercises: 1, 4, 5, 9, 10.
  3. Due 2/3/2011. Chapter 20 Review exercises: 1, 3, 6, 8, 11.
  4. Due 2/10/2011. Chapter 21 Review exercises: 3, 4, 6, 8, 14.
  5. Due 2/17/2011. Chapter 22 Review exercises: 2, 6, 8, 10, 12.
  6. Due 2/24/2011. Chapter 23 Review exercises: 1, 2, 6, 8, 9.
  7. Due 3/17/2011. Chapter 26 Review exercises: 2, 3, 4, 6, 9.
  8. Due 3/24/2011. Chapter 27 Review exercises: 1 – 5.
  9. Due 4/7/2011. Chapter 28 Review exercises: 1, 3, 5, 7, 9.
  10. Due 4/18/2011. R homework. Submit it by email.

Quizzes

  1. 1/27/2011. Chapter 19. (Solutions)
  2. 2/3/2011. Chapter 20. (Solutions)
  3. 2/10/2011. Chapter 21. (Solutions)
  4. 2/17/2011. Chapter 22. (Solutions)
  5. 2/24/2011. Chapter 23. (Solutions)
  6. 3/17/2011. Chapter 26. (Solutions)
  7. 3/24/2011. Chapter 27. (Solutions)
  8. 4/7/2011. Chapter 28. (Solutions)

Practice Problems

In general I recommend doing all the chapter exercises. Homework will generally be a subset of the review problems from the book. You do not need to submit the practice problems; you can discuss them in recitation.

  • Chapter 19. A: 1 — 8, also recommend doing 9 — 12.
  • Chapter 20. A: 1, 3 — 4, 6 — 8. B: 1 — 5. C: 1 — 5.
  • Chapter 21. A: even. B: all. C: even. D: all. E: all.
  • Chapter 22. A: all.
  • Chapter 23. A: 1, 3, 6, 7, 9, 10; B: 1, 3, 5, 6, 7, 8; C: 1 — 6.
  • Chapter 26. A: 1 — 5; B: 1 — 5; C: 4 — 8; D: 3 — 5; E: 2, 3, 4, 7, 8, 9; F: 6, 7.
  • Chapter 27. All except A: 1, 3, 6; B: 1, 4; C: 2; D: 2.
  • Chapter 28. A: 3 — 9; B: 1 — 3; C: 1 — 7.

Online Reading Quizzes

Here is the list of online quizzes with their due dates and times. Participation in the quizzes is mandatory, but they do not carry a grade. Online quizzes will start in week 3.

Online quizzes are available at read-the-book.appspot.com. Your Vanderbilt Gmail account will not work on this site. If you are signed into it, sign out and then log in with a regular Google account.

To sign up for a Google account, follow this link.

  • Due 1/27 at 8 am. Chapter 20.
  • Due 1/31 at 8 am. Sections 20.4, 21.1, 21.2.
  • Due 2/2 at 8 am. Sections 21.3 — 21.5.
  • Due 2/7 at 8 am. Sections 22.1 — 22.3.
  • Due 2/9 at 8 am. Sections 22.4 — 22.7.
  • Due 2/14 at 8 am. Sections 23.1, 23.2.
  • Due 2/16 at 8 am. Sections 23.3, 23.4.
  • Due 2/28 at 8 am. Sections 26.1 – 26.3.
  • Due 3/2 at 8 am. Sections 26.4 – 26.6.
  • Due 3/16 at 8 am. Chapter 27.
  • Due 3/23 at 8 am. Chapter 28.

Tests of significance in R.

Wednesday, April 20th, 2011

Today we will learn about tests of significance in R, using the chi-square test as an example. We will use the census dataset from the previous class, as well as the CPS dataset below. This is the same data that was posted for the practice exam.

Data frames in R. US Census data.

Monday, April 18th, 2011

The dataset below is from the US Census. It is a subset of the full dataset, combined with the FIPS database so that county names are listed, and it has been converted to CSV format without a header row. There are five columns: the name of the county, the two-letter state abbreviation, the population, the size of the labor force, and the number of unemployed.
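The course reads this file in R. As an illustration only, here is a minimal Python sketch that reads such a headerless five-column file with the standard `csv` module and computes an unemployment rate for each county. The field names, the two in-memory sample rows, and the assumption that the three counts are plain integers are ours, not part of the dataset description.

```python
import csv
import io

# Our own labels for the five headerless columns described above
FIELDS = ["county", "state", "population", "labor_force", "unemployed"]

def read_census(handle):
    """Read headerless census rows into a list of dicts with integer counts."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, record))
        for key in ("population", "labor_force", "unemployed"):
            row[key] = int(row[key])  # assumes counts are plain integers
        rows.append(row)
    return rows

def unemployment_rate(row):
    """Percent of the labor force that is unemployed."""
    return 100 * row["unemployed"] / row["labor_force"]

# Hypothetical rows standing in for the real file
sample = io.StringIO("Example County,TN,1000,500,40\nOther County,KY,2000,900,90\n")
rows = read_census(sample)
for row in rows:
    print(row["county"], row["state"], round(unemployment_rate(row), 1))
```

To read the downloaded file instead, open it with `open(path, newline="")`, where `path` is whatever name you saved it under, and pass the handle to `read_census`.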

Data frames in R. Indexing, slicing, max, min, sort, order, ...

Wednesday, April 13th, 2011

Today we will slice, dice, mince, sort, puree, blend, mash, mix, and shake, but not stir, data frames.
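In R these operations use indexing and functions such as `sort()`, `order()`, `max()`, and `min()`. A rough Python analogue on plain lists is sketched below. The `order` helper is our own stand-in for R's `order()` (it returns the positions that put the values in increasing order), and the county names and populations are made up. Note that Python indexes from 0 and its slices exclude the end point, whereas R indexes from 1 and includes both ends.

```python
def order(values):
    """Positions that put `values` in increasing order (a stand-in for R's order())."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Hypothetical county names and populations
names = ["A", "B", "C", "D"]
populations = [5200, 830, 41000, 12750]

idx = order(populations)
by_size = [names[i] for i in idx]      # names from smallest to largest population
print(by_size)
print(sorted(populations))             # the values themselves, sorted (like R's sort())
print(max(populations), min(populations))
print(populations[1:3])                # 2nd and 3rd entries (R: populations[2:3])
```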

Homework 10 will be posted by the end of the day. You need to turn it in on Monday. I will have office hours on Friday, in case you want to discuss the homework.

Getting data into R, making pretty plots

Monday, April 11th, 2011

The notes contain additional material that we did not discuss in class but will cover next time. The version below also contains some extra material on adding information to a plot.

Basic R. Plotting, manipulating lists.

Wednesday, April 6th, 2011

  • Notes for the lecture
  • Cigarette data
  • Here is the data for the class in a form that is easy to copy and paste.
    • tar data: 14.1, 16.0, 29.8, 8.0, 4.1, 15.0, 8.8, 12.4, 16.6, 14.9, 13.7, 15.1, 7.8, 11.4, 9.0, 1.0, 17.0, 12.8, 15.8, 4.5, 14.5, 7.3, 8.6, 15.2, 12.0
    • nicotine data: 0.86, 1.06, 2.03, 0.67, 0.40, 1.04, 0.76, 0.95, 1.12, 1.02, 1.01, 0.90, 0.57, 0.78, 0.74, 0.13, 1.26, 1.08, 0.96, 0.42, 1.01, 0.61, 0.69, 1.02, 0.82
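Once the lists are pasted in, typical operations are averages, SDs, and picking out entries that meet a condition. The course does this in R; the Python sketch below is only an illustration. It assumes the i-th tar value and the i-th nicotine value belong to the same cigarette, computes the SD by dividing by the number of entries (the convention we assume here for the SD of a list), and uses an arbitrary cutoff of 15 for the filtering example.

```python
import math

tar = [14.1, 16.0, 29.8, 8.0, 4.1, 15.0, 8.8, 12.4, 16.6, 14.9, 13.7, 15.1, 7.8,
       11.4, 9.0, 1.0, 17.0, 12.8, 15.8, 4.5, 14.5, 7.3, 8.6, 15.2, 12.0]
nicotine = [0.86, 1.06, 2.03, 0.67, 0.40, 1.04, 0.76, 0.95, 1.12, 1.02, 1.01, 0.90, 0.57,
            0.78, 0.74, 0.13, 1.26, 1.08, 0.96, 0.42, 1.01, 0.61, 0.69, 1.02, 0.82]

def average(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """SD of a list, dividing by the number of entries."""
    m = average(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# The two lists must line up entry by entry
assert len(tar) == len(nicotine)

pairs = list(zip(tar, nicotine))
print(average(tar), sd(tar))
print(average(nicotine), sd(nicotine))

# Nicotine values for the entries whose tar is above the (arbitrary) cutoff of 15
high_tar_nicotine = [n for t, n in pairs if t > 15]
print(high_tar_nicotine)
```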

Basic R. Using R as a calculator.

Monday, April 4th, 2011

Today we did some basic programming in R, using R as a calculator. Next time we will learn how to customize R for our needs and compute the SD, EV, and SE of a box. We will also learn how to save our work and write a function.
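For reference, the box computations mentioned here fit in a short function. The sketch below is in Python rather than R, and the function names are our own. It computes the average and SD of the tickets in a box, then the EV and SE for the sum of a given number of draws made at random with replacement, using the square root law.

```python
import math

def box_stats(box):
    """Average and SD of the tickets in a box (the SD divides by the number of tickets)."""
    n = len(box)
    avg = sum(box) / n
    sd = math.sqrt(sum((t - avg) ** 2 for t in box) / n)
    return avg, sd

def ev_se_sum(box, draws):
    """EV and SE for the sum of `draws` draws made at random with replacement."""
    avg, sd = box_stats(box)
    ev = draws * avg            # EV of the sum = number of draws x average of the box
    se = math.sqrt(draws) * sd  # square root law: SE of the sum = sqrt(draws) x SD of the box
    return ev, se

# Example: a box with tickets 0, 2, 3, 4, 6 and 25 draws
print(box_stats([0, 2, 3, 4, 6]))
print(ev_se_sum([0, 2, 3, 4, 6], 25))
```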

The Chi-square test for independence.

Wednesday, March 23rd, 2011

Today we looked at using the chi-square test to decide whether two factors are independent.
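The test compares each observed count with the count expected if the two factors were independent, namely (row total times column total) / grand total; for a table with r rows and c columns the degrees of freedom are (r - 1)(c - 1). A minimal Python sketch with a made-up 2 by 2 table follows. It returns only the statistic and the degrees of freedom; the P-value would then be read from a chi-square table or computed with software.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row factor and the column factor are independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are one factor, columns the other
table = [[10, 20],
         [20, 10]]
stat, df = chi_square_independence(table)
print(stat, df)
```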

The Chi-square goodness-of-fit test.

Monday, March 21st, 2011

Today we looked at a new test that helps determine whether a collection of observations fits a model.
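The statistic is the sum over the categories of (observed - expected)^2 / expected. When the expected counts come from a fully specified model (no parameters estimated from the data), the degrees of freedom are the number of categories minus one. Here is a short Python sketch; the die-roll counts are hypothetical.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic for observed vs expected counts; df = categories - 1."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1

# Hypothetical: 60 rolls of a die; a fair die predicts 10 of each face
observed = [4, 6, 17, 16, 8, 9]
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)
print(stat, df)
```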

Tests of significance. Experiments

Wednesday, March 16th, 2011

Today we showed that we can use our tests of significance when the samples come from a randomized controlled experiment.

Tests of significance. Difference of means.

Monday, March 14th, 2011

Today we looked at tests for the difference between two sample averages. When the two samples are independent, the SE for the difference is larger than the SE for either average: it is the square root of the sum of the squares of the two SEs.
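A minimal Python sketch of this computation for two independent samples: each SE is SD / sqrt(n), the two SEs are combined by the square root of the sum of their squares, and the z-statistic compares the observed difference with 0 (the difference expected if the two box averages are equal). The function names and numbers are hypothetical.

```python
import math

def se_average(sd, n):
    """SE for the average of n draws: SD / sqrt(n)."""
    return sd / math.sqrt(n)

def z_difference(avg1, sd1, n1, avg2, sd2, n2):
    """z-statistic and SE for the difference of two independent sample averages."""
    se1 = se_average(sd1, n1)
    se2 = se_average(sd2, n2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # SE for the difference
    # Null hypothesis: the box averages are equal, so the expected difference is 0
    return (avg1 - avg2) / se_diff, se_diff

# Hypothetical samples of 100 each
z, se_diff = z_difference(60, 30, 100, 50, 40, 100)
print(z, se_diff)
```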

Tests of significance. The t-test.

Wednesday, March 2nd, 2011

Today we looked at the t-test for small samples. We also considered the z-test in the context of percentages and 0-1 boxes.
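For a small sample the SD of the box is estimated with n - 1 in the denominator (often written SD+), the SE is SD+ / sqrt(n), and the P-value is read from Student's t-curve with n - 1 degrees of freedom instead of the normal curve. The Python sketch below computes the t-statistic; the data and the null average are made up, and `statistics.stdev` is used because it divides by n - 1.

```python
import math
import statistics

def t_statistic(sample, null_average):
    """t-statistic and degrees of freedom for a small sample."""
    n = len(sample)
    sd_plus = statistics.stdev(sample)  # SD with n - 1 in the denominator
    se = sd_plus / math.sqrt(n)
    t = (statistics.mean(sample) - null_average) / se
    return t, n - 1

t, df = t_statistic([1, 2, 3, 4, 5], 2)
print(t, df)
```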

Tests of significance. The z-test.

Monday, February 28th, 2011

Today we outlined the method used in all tests of significance. We also considered examples of places where a test of significance might be useful. On Wednesday we will look at the t-test and the z-test applied to percentages.
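The common recipe is: state the null hypothesis; compute the test statistic z = (observed - expected) / SE, with the expected value and SE computed as if the null hypothesis were true; and find the observed significance level (P-value) as an area under the normal curve. A minimal Python sketch using `statistics.NormalDist` from the standard library follows. The numbers are hypothetical, and it reports the upper-tail area; whether to double it for a two-sided alternative is a choice made case by case.

```python
from statistics import NormalDist

def z_test(observed, expected, se):
    """z-statistic and the normal-curve area to the right of it (one-sided P-value)."""
    z = (observed - expected) / se
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper

# Hypothetical: observed 112, expected 100 under the null, SE 5, so z = 2.4
z, p = z_test(112, 100, 5)
print(round(z, 2), round(p, 4))
```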

Review: your questions.

Monday, February 21st, 2011

Solutions to the practice exam are posted under exams. Let me know if you spot a typo or a mistake.

The sample average. A real example.

Wednesday, February 16th, 2011

Today we discussed sampling and the accuracy of averages in the context of the tuition data from U.S. News & World Report, 1995. The details of our example are in the lecture notes.

The average of draws from a box.

Monday, February 14th, 2011

We discussed the accuracy of the estimate of the average of a box, when draws are made from it with replacement. The chance error is quantified by the SE of the average of the draws.

We have seen four "different" formulas for the EV and SE: for the sum, the number, the percentage, and the average. We explored the relations among them and noted that the behavior of the sum is the foundation for the others.

From the observations in the sample we constructed a confidence interval and gave a correct interpretation of the result.
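The average of the draws is their sum divided by the number of draws, so its EV and SE are those of the sum divided by the number of draws, which gives SE = SD of the box / sqrt(n). When the SD of the box is unknown, the SD of the sample is used in its place, and the average plus or minus 2 SEs gives an approximate 95% confidence interval for a large sample. A minimal Python sketch under those assumptions (the sample values are made up):

```python
import math
import statistics

def ci_for_average(sample, multiplier=2):
    """Approximate confidence interval for the box average from draws made with replacement."""
    n = len(sample)
    avg = statistics.mean(sample)
    sd = statistics.pstdev(sample)  # SD of the sample estimates the SD of the box
    se_sum = math.sqrt(n) * sd      # square root law for the sum
    se_avg = se_sum / n             # SE for the average = SE for the sum / n = SD / sqrt(n)
    return avg - multiplier * se_avg, avg + multiplier * se_avg

# Hypothetical sample of 100 draws
low, high = ci_for_average([2, 4] * 50)
print(low, high)
```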

Next time we will look at some pictures of confidence intervals and do some sampling on a computer.

Weighting and half-samples.

Wednesday, February 9th, 2011

Today we discussed a simple method for weighting a sample. We also looked at the method of half-samples, which is used to estimate the SE for an estimate based on the sample. Next week we will be in chapter 23.

The Current Population Survey.

Monday, February 7th, 2011

Today we discussed the current population survey. This is a survey carried out by the US Census Bureau for the Bureau of Labor Statistics. A lot of information about the CPS can be found on the internet. There are two main things to keep in mind.

  1. Sampling procedures can be fairly complex. The systematic use of probability allows us to estimate the error.
  2. The box model and standard error computations from previous sections do not apply directly to the CPS.

Details about the CPS can be found at the websites below.

In the coming weeks we will get our hands dirty by doing some actual computations on data from the CPS. To do this we will use a piece of software called R. Task for this week: download R from the website and get it running on your computer.

The accuracy of percentages (examples).

Wednesday, February 2nd, 2011

Examples from chapter 21 and a few things to remember.

The accuracy of percentages.

Monday, January 31st, 2011

We began looking at inference from samples and confidence intervals. We learned about the "bootstrap method" for approximating the SD of a box from the sample percentage, and we constructed a confidence interval. An important fact to keep in mind is that the chance is in the sampling procedure, not in the parameter. On Wednesday we will look at more examples.
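In the bootstrap step the unknown fraction of 1s in the 0-1 box is replaced by the sample fraction p, giving an estimated SD of sqrt(p(1 - p)). A minimal Python sketch of the resulting interval, assuming a simple random sample from a population large enough that the correction factor is close to 1; the counts are hypothetical.

```python
import math

def ci_for_percentage(count, n, multiplier=2):
    """Approximate confidence interval (in percent) for the population percentage."""
    p = count / n
    sd_box = math.sqrt(p * (1 - p))           # bootstrap: estimate the 0-1 box SD from the sample
    se_percent = 100 * sd_box / math.sqrt(n)  # SE for the sample percentage
    return 100 * p - multiplier * se_percent, 100 * p + multiplier * se_percent

# Hypothetical: 200 of 400 sampled people answer "yes"
low, high = ci_for_percentage(200, 400)
print(low, high)
```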

The normal curve and correction factor.

Wednesday, January 26th, 2011

We can use the normal curve to compute the chance that the percentage in the sample will lie in a certain range. To do this we convert to standard units, using the SE for percentages. We pointed out that the SE for percentages depends essentially on the size of the sample and not on the size of the population. For sampling with replacement this follows directly from the formula; the correction factor allows us to justify the claim for sampling without replacement as well, as long as the sample is a small fraction of the population.
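A short Python sketch of both computations: the correction factor sqrt((N - n) / (N - 1)), which multiplies the with-replacement SE when n tickets are drawn without replacement from a box of N tickets, and a normal-curve chance computed with `statistics.NormalDist`. The function names and numbers are illustrative only.

```python
import math
from statistics import NormalDist

def correction_factor(population_size, sample_size):
    """Factor applied to the SE when drawing without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

def se_percent(p, sample_size, population_size=None):
    """SE for a sample percentage; applies the correction factor if a population size is given."""
    se = 100 * math.sqrt(p * (1 - p) / sample_size)  # with replacement
    if population_size is not None:
        se *= correction_factor(population_size, sample_size)
    return se

def chance_between(low, high, ev, se):
    """Normal approximation to the chance the sample percentage falls between low and high."""
    curve = NormalDist(ev, se)
    return curve.cdf(high) - curve.cdf(low)

# Hypothetical: 2,500 draws from a population of 100,000 in which 50% are 1s
print(correction_factor(100_000, 2_500))
se = se_percent(0.5, 2_500, population_size=100_000)
print(chance_between(48, 52, 50, se))
```

Even though the sample here is 2,500 people, the factor is about 0.99, so the population size barely changes the SE.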

Chance errors in sampling.

Monday, January 24th, 2011

Today Professor Whitehouse will be substituting for me.

The chance model will allow us to judge the accuracy of the estimates made in sampling.

Sample surveys.

Wednesday, January 19th, 2011

We considered the basic methods of sampling and the notion of a random sample. We also looked at bias in sample surveys.

  • Read chapter 19 of the textbook. It treats, in great detail, the basics of sample surveys. In particular, a great deal of attention is given to the Gallup poll and its historical connections to the presidential elections.
  • A few announcements
    • Homework 2 will be posted tonight and is due 1/27 in recitation.
    • Quiz 1 will be in recitation next week, 1/27.
    • Read chapter 20.1 and 20.2 for Monday, 1/24.
    • TAs have office hours now. They are posted on this page. Office hours are held in room 1218 Stevenson, one floor below ground level.
    • Practice problems for each lecture and chapter are posted under problems.
  • I will not post lecture notes today. The book (chapter 19) provides a careful and in-depth treatment of the material. I do, however, encourage you to look at the Gallup website.

The box model, expected value and the law of averages.

Wednesday, January 12th, 2011

Today we considered the use of the box model in sampling. We reviewed the expected value and its relation to the average value of the box. Next time we will look at the way sample surveys are conducted.