I’ve just finished an exhausting but rewarding 6 weeks teaching a summer-session course on “Experimental Design for Behavioral and Social Sciences,” CMU course 36-309. My course materials are secreted away on Blackboard, but here is my syllabus. You can also see some materials from a previous session here, including Howard Seltman’s textbook (free online).

The students were expected to have already taken an introductory statistics course. After a short review of basic concepts and t-tests, we dove into more intermediate analyses (ANOVA and regression, contrasts, chi-square tests and logistic regression, repeated measures) and into how a good study should be designed (power, internal/external validity, etc.).

I’ve taught one-off statistics workshops before, and I’ve taught once-a-week semester-long Polish language classes, but this was my first experience teaching a full-length course in statistics. Detailed notes are below.

Teaching and time management:

- The summer teaching schedule was intense! Five days a week (one lab and four lectures), 80 minutes each day. Between preparing lecture notes, homeworks, labs, and exams on such quick turnaround, there was barely room to breathe. I’m very grateful that Mattia Ciollaro could TA for me, grading the homeworks very quickly so we could give students fast feedback. Even with Mattia’s help, I was focused entirely on the course and got essentially no research done.
- On the other hand, it was great for productivity in another way. I simply wouldn’t let myself show up to class unprepared. With the 10:30am lecture time, I would come to school by 8 or 8:30, have a couple of hours of productive work immediately (no surfing Facebook for an hour before I get started), and be ready in time for class. If I managed to prepare everything for class the night before, I’d still arrive early and start working on the following lecture or homework.

This schedule worked really well for me, and now that teaching is over, I want to do the same thing with my research: set a **daily** research meeting before lunch to ensure at least a few hours of progress every morning, no matter how productive I am in the afternoon. I assume my advisor won’t have time for daily meetings, but my classmates have agreed to try it and hold each other accountable. We’ll see how it works.
- I was lucky to be working on a course that’s been taught many times before, so I could revise existing lecture notes and assessments instead of writing them from scratch. Still, I revised some of these materials heavily, wherever the structure seemed like it could be better or students found something unclear in class. Before the semester began I read the whole textbook and all the lecture notes, but I wish I had also worked through all the homeworks, labs, and exams before the first day. They were really helpful when I was deciding what to focus on in lecture and what takeaways the students should have… and they would have been even more helpful if I’d had that view of the whole course from day 1, instead of building it up a day or two in advance.

As (my undergrad classmate) Mel Chua said in a talk on EduPsych theory, educational psychologists know it’s helpful to work in this order: (1) Content (What should the students be able to do?), (2) Assessment (How will I test them to see if they can do it?), (3) Pedagogy (How will I teach the material?).

Instead, too often people work in the order (1) Content (What should we cover?), (2) Pedagogy (What will I say in lecture?), (3) Assessment (What should I ask them about what we covered?). I was guilty of this myself and hope to improve next time.
- For next time, I need a checklist of basic things that I usually-remember-to-do-but-not-always. For instance, before each lecture I should double-check: Have I marked my notes with some conceptual questions I can ask students to discuss, instead of having to make them up on the fly? Have I written up objectives for the day, in the form “Students will be able to…”? Have I prepared answers to any questions they had yesterday?

Lectures and assessments:

- I’d never realized how much work it is to come up with a good homework assignment or exam question. I didn’t want to reuse questions from past homeworks, to remove any temptation for students to cheat (I saw too many cheaters when I TA’ed before). I found a lot of good real datasets in Ramsey and Schafer’s *The Statistical Sleuth*; they are also available online as CSVs and in an R package. Several datasets on DASL, the Data and Story Library, were handy too. I also love Useful Science, which posts one-sentence summaries of the latest scientific research and links to the original papers: I usually couldn’t get their data, but at least I could fake similar data to illustrate the same analysis.
- I tried to break up my lectures with “think-pair-share” style student discussions. While introducing or reviewing each topic, I’d ask a conceptual question or pose a simple problem, give the students a minute or two to think and discuss in pairs or small groups, then ask each group to share their answers with the class. From my end, I found it a useful way to keep the students awake and engaged, to give myself a breather before moving on to the next topic, and to “debug” my lectures by catching misunderstandings quickly. Probing for understanding this way was much more effective than just asking “Are there any questions?” and getting blank stares back.

However, it can be tough to invent good questions. Besides *The Statistical Sleuth*, mentioned above, I also found good “review exercises” in Dowdy et al.’s *Statistics for Research* and on websites like “Tools for Teaching and Assessing Statistical Inference”.
- The think-pair-share approach was partly inspired by Eric Mazur’s work on teaching introductory physics, via his book *Peer Instruction*. Mazur was basically flipping the classroom before it was cool: students read the textbook outside of class (no pre-made Khan Academy online lectures in those days), motivated by a reading-check quiz at the start of each class. Then, instead of giving a traditional lecture, the instructor spends the entire class asking probing conceptual questions, having students discuss in small groups and vote on the answer, and explaining any misunderstandings that show up in the wrong answers.

Mazur’s book has a slew of conceptual physics questions for instructors to use. There don’t seem to be as many big question banks like this for statistics yet, but I did find a Statistics Concept Inventory (SCI) developed about a decade ago at the University of Oklahoma: read about the pilot study, or see Kirk Allen’s dissertation, which has an example SCI starting on page 423.

I’d like to try Mazur’s approach the next time I teach. I wasn’t happy enough with our textbook to lean on it so heavily, but it’s not like my lectures were amazing either. I also need to have my in-class examples straight before I show up: there were a few bad moments when I stumbled around trying to remember what the outcome and explanatory variables were, or forgot whether we reject or fail to reject the null…
- I heard about Mazur and similar work through Carl Wieman’s lecture here at CMU, “Taking a Scientific Approach to Science Education.” It’s worth watching the video, or at least reading his slides.
- I’d planned to have a semester-long theme of designing an experiment to bake the ideal cookie. But in practice, some topics were hard to relate back to cookies, it was tough to come up with a really good quantitative outcome to measure on cookies (taste-test ratings have their issues), and I wanted to use more psychology-related examples for these students anyway. Finally, the oven in my office building was removed midway through the summer and a replacement hasn’t been installed yet, so we couldn’t have baked cookies on campus in any case.

But we did do a little baking-related “experiment” on the first day, with measuring flour. I asked each student to use a measuring cup to measure 1 cup of flour, weigh it, and record its weight in grams. Then I demonstrated the “right” way to measure flour (spoon the flour lightly into the cup, then level it off; don’t scoop directly from the bag) and asked the students to take a new measurement. This gave us a small homemade dataset for talking about EDA (exploratory data analysis), bias, variance, experimental protocols, treatment application, etc.
- I wanted to do more in-class activities but ran out of time and energy to find or invent good ones. Gelman and Nolan’s book *Teaching Statistics: A Bag of Tricks* has some great in-class activities for the introductory stats course, but most of them were less relevant for my course. If you have ideas for Experimental Design in-class activities, please share!
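As a sketch of how the first-day flour data could be summarized, assume one weight in grams per student for each round, with both lists in the same student order (the helper names below are hypothetical, not from the course materials). The per-round mean and standard deviation speak to bias and variance, and since every student weighed flour twice, a paired t-statistic ties the activity back to the t-test review:

```python
from math import sqrt
from statistics import mean, stdev


def summarize(weights):
    """Return (mean, sample SD) of one round of flour weights in grams."""
    return mean(weights), stdev(weights)


def paired_comparison(first, second):
    """Compare each student's second weighing with their first.

    Returns the mean paired difference (second - first) and the paired
    t-statistic mean(d) / (sd(d) / sqrt(n)). Both lists must list the
    students in the same order.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 students")
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return d_bar, d_bar / se
```

One design caveat worth raising with students: the demonstrated method always came second, so any difference between rounds mixes the effect of the method with practice or order effects.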

Administrivia and logistics:

- I was lucky to have a pretty small class, so it was easy to get to know all the students and hear from everyone during each lecture. However, we had some problems with punctuality, especially near the end, and that is worse in a small class: it hardly makes sense to start on time if only 2 of the 6 students are there on time. But I still don’t want to start late and have less time to cover material or answer questions…

Other folks in the department suggested short start-of-class quizzes, so that students who show up late only hurt themselves by leaving less time for the quiz.
- Also, on days when homework was due, quite a few students would show up a few minutes late with their freshly printed assignment. It’s hard to be strict about not accepting late work: what if they really did come straight from another class that let out late? Maybe I should make the homework due online, say, 30 minutes before class, with a strict submission cutoff and no late work accepted, so that they have no incentive to show up late.
- I don’t want to be a jerk about deadlines, but I also can’t have students turning in homework late if I want to post solutions and give feedback ASAP (with such a time-condensed semester). I tried to balance this by letting them drop two homeworks, but I think that was too generous: very few people turned in the last two homeworks. Another idea is to make the homework score’s denominator smaller than the total points available, i.e. if there are 10 homeworks worth 100 points each, your final score will be out of 800 instead of 1000. That way you can effectively “drop” 2 homeworks, but it’s still worth doing them to earn extra credit. But that might allow too much extra credit.
- Also, I can’t really afford to spend hours preparing an extra exam for students who can’t attend on the exam date, unless they warn me well in advance. This was in the syllabus, but I gave in anyway. Next time I need to remind students a week in advance, “If you tell me **after today** that you have a conflict, I cannot give you a makeup exam,” and then stick to my guns.

Likewise, for an extra project, I should have set the deadlines and parameters in writing instead of relying on a verbal agreement. Lesson learned.
- I’m really glad Mattia could grade the homeworks with a one-day turnaround (submit on Tuesday, get it back with feedback on Wednesday). I hate it when homeworks are returned so late that you’ve forgotten what they covered and get no benefit from the feedback. But next time, I need to ask my TA to give the graded homeworks to me first so I can look them over for common problems, instead of returning them to the students directly.
- It’s so funny to be on the other side of the exam process. As the instructor, to me the exam seemed so easy there’s practically nothing to study… And it’s obvious to me how the whole semester ties together, so why bother even asking if pre-midterm material is on the final exam? Of course it is; this is inherently a cumulative course! But to the students it’s all mysterious, overwhelming, and potentially terrifying.
- The professor who usually teaches this course gives oral exams as alternate exams if the student can’t make the final exam date. This is supposed to intimidate the students a bit (to discourage them from requesting makeup exams without good reason), and it might be easier to invent new questions so that an early exam-taker doesn’t share the questions with their classmates. I tried this too, but in practice, I wasn’t sure how to deliver a good oral exam. I didn’t want to probe the student too deeply if they answered wrong, since if they got the right answer thanks to my followup questions, that would be unfair to those who took the written exam without such interaction. So in practice, I just made up a second written exam and basically read it out loud to the student. If this comes up again, I’ll need to think of a better way to do oral exams — or just make a second written exam and leave it at that.
- I used Blackboard for course management. It was not terrible, but certainly not the fastest or easiest way to upload and organize all my lecture notes, assignments, and datasets. A simple HTML site would be much easier to manage next time. But it was good for letting the students keep track of all their grades so far. I meant to compute their midsemester grades too, but forgot about it until much too late.
- This course normally runs for 11 weeks during the fall or spring semesters, with one lab a week, so 11 labs (one per major topic). The labs are helpful for giving the students guided SPSS practice. We had it condensed into only 6 weeks, with still just one lab a week. Next time I might suggest reserving the lab classrooms twice a week, so that we could have up to 12 labs. I can still make them short labs and lecture for part of the time, which would let me assign homeworks more flexibly. As it is, I didn’t want to assign certain homeworks too early since they required SPSS practice, but I also wanted students to do that assignment before we cover the next topic… It was tough to find a good timing.
- This course gave me my first experience of several aspects of teaching I hadn’t thought about much before, such as working with the good folks at the Office of Disability Resources, dealing with course withdrawals and other paperwork, calculating and inputting final grades, etc.
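The two ways of “dropping” homeworks described above can be compared with a few lines of arithmetic. This is a minimal sketch assuming every homework is worth the same number of points and a missing homework counts as 0; the function names and the optional `cap` on extra credit are hypothetical, not something the course used:

```python
def drop_lowest_percent(scores, n_drop=2, points_each=100):
    """Homework percent after discarding the n_drop lowest scores (missing = 0)."""
    kept = sorted(scores)[n_drop:]
    return 100 * sum(kept) / (points_each * len(kept))


def reduced_denominator_percent(scores, n_free=2, points_each=100, cap=None):
    """Homework percent when the denominator leaves out n_free assignments.

    With 10 assignments worth 100 points and n_free=2, everything is scored
    out of 800, so points beyond 800 act as extra credit. The optional cap
    (a hypothetical policy knob) limits how far above 100% a score can go.
    """
    denominator = points_each * (len(scores) - n_free)
    percent = 100 * sum(scores) / denominator
    return percent if cap is None else min(percent, cap)
```

With ten 100-point homeworks, a student who does eight at 90 points and skips two gets 90% under both schemes. A student who does all ten at 90 points still gets 90% under drop-two, but 900/800 = 112.5% under the reduced denominator, which is exactly the “too much extra credit” worry; a cap (say 105%) would rein it in.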

Other points:

- CMU’s Eberly Center seminars have been very helpful in preparing to think about teaching, syllabus design, lecture note styles, etc. I also requested a teaching observation, which provided really useful feedback. The observation was also written up as a memo that I can use in my teaching portfolio when I’m on the job hunt.

The Eberly folks also suggested that I run early course evaluations about a third of the way into the course, which I found very helpful.

I’m still waiting to see my end-of-session official course evaluations.
- Since this is a service course for Psych majors, whose faculty will expect them to know SPSS, that’s the software I had to use in teaching this course. I’d never used it much, and I was pleased to find that it strikes a fairly good balance between flexible analyses for experts and ease of use for non-experts. SPSS Syntax also provides a record of your point-and-click analysis, so there is some scope for reproducible research… but as far as I can tell, there’s no reproducible reporting like R’s knitr or Sweave.
- I can imagine teaching this course quite differently in the future. Following the same coverage as previous runs of this course, I hammered home the assumptions that ensure your ANOVA or regression p-values are valid (independent errors with constant variance, etc.), but we had very little scope to discuss what to do if the assumptions are **not** met. Furthermore, many students were confused about parametric sampling distributions and how they relate to Type I error and power, even though they should have seen this in an earlier course.

I wonder if we could kill two birds with one stone by starting our stats courses with nonparametric methods instead, like the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests. They may be easier to explain and understand than the menagerie of Normal, t, F, and Chi-square distributions. Then we could say, “Don’t worry about the details, but standard ANOVA F-tests are just a fancy mathematical approximation to Kruskal-Wallis that have even more power **if** the assumptions are met.” Instead, now we teach F-tests first, confuse everyone, and say “Oh yeah, nonparametric methods are a simpler way to do this, but we don’t have time for them.” Hmm.

I don’t know of a good text like this at the intermediate level, but Noether’s *Introduction to Statistics: A Nonparametric Approach* is this kind of textbook for intro stats.
- Statistics naming is terrible! Within-subjects or between-subjects factors, vs. MS_within and MS_between. Blocking to reduce error variance, vs. SPSS “blocks” in its regression dialog. Errors in a statistical model (deviations from the mean), vs. Type I and Type II errors. Independent errors, vs. “independent variables,” i.e. explanatory variables. “Ex-” words: explanatory variables, exploratory data analysis, experimental design. Some students even confused confidence interval vs. confidence level. I really wish we had better names for things!
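To make the nonparametric-first idea concrete: one reason Kruskal-Wallis may be easier to teach is that its p-value can be explained as “shuffle the group labels and see how often the rank statistic comes out this extreme,” with no F or Chi-square table in sight. Below is a from-scratch Python sketch (my own illustration, not code from the course; the function names are hypothetical). SciPy’s `scipy.stats.kruskal` is the usual library route, but it relies on the Chi-square approximation and applies a tie correction. This sketch skips the tie correction because, for a fixed pooled dataset, that correction is the same constant for every shuffle and so doesn’t change the permutation p-value.

```python
import random
from itertools import chain


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_g^2 / n_g) - 3(N+1), without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    r = ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(r[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_perm=10_000, seed=0):
    """Approximate p-value: share of random label shuffles with H >= observed H."""
    rng = random.Random(seed)
    observed = kruskal_wallis_h(groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance so floating-point noise doesn't hide exact ties.
        if kruskal_wallis_h(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 counts the observed labeling itself
```

For three perfectly separated groups such as `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, H works out to 7.2, and only 6 of the 1,680 possible labelings are that extreme, so the shuffled p-value should land near the exact value 6/1680 ≈ 0.0036.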

Overall, this teaching experience really helped me develop a better perspective on what a teaching career would be like at the college level. I’m very glad I did this early in my PhD, so I can (1) learn from it next time I teach, and (2) consider it as I decide how much to focus on staying in academic research, vs. shifting towards teaching, vs. moving back to government/industry.

Thanks for such a thoughtful post! I designed and taught a course on data analytics for the first time last term, and I came to many of the same conclusions about flipped classrooms and conceptual questions!

Thanks, Rachel! What did your data analytics course cover? I’m always curious how it differs from a “traditional” statistics course.

I’ve had success with homework hand-ins by being extremely harsh, but offering an out. For example, it really annoyed me that a decent proportion of students turned in homeworks that weren’t stapled together. So for the next homework I said: if you hand in unstapled homework you’ll get a 0, _unless_ you send me a photo of you holding a stapler and a box of staples, which I’ll show to the class. Only one person forgot the first week, and then never again.

Thanks, Hadley. That’s a great idea to harness public embarrassment as a force for good.