Ukraine and Poland

We have been gravely following the heartbreaking news from Ukraine.
Flag of Ukraine
I have written before about one set of my grandparents, and how they met as schoolteachers in the aftermath of WWII. Now, as I read news about evacuation trains from Ukraine to Poland, my mind keeps coming back to the reason why my grandmother’s parents settled in western Poland in the first place: Soon after the war, her father got advance warning that his family was about to be forcibly resettled to somewhere deep in the interior of Russia. Instead, they packed in a hurry and decided to travel west, west, west, as far from the USSR as possible. From formerly-northeastern-Poland they rode the slow, crowded train for several weeks. According to family lore, they stopped only when the train tracks literally ran out and they could go no further. In light of the past few weeks, it seems to have been a wise decision. She still lives in western Poland and is safe at the moment—but after seeing decades of what seemed like slow, grueling social and political change for the better, she never expected to be so near a war zone again in her 90s.

As for my grandfather, he became a history student at university but got in trouble with the Soviet police for his “critical stance towards reality” (i.e., asking questions and not toeing the party line). He was forced out without the degree he had earned and sent to a tiny rural town to teach Phys Ed., instead of history. Although it’s fortunate for me that he met my grandmother there, it took him years of waiting for a political thaw before he was allowed to finish his degree and teach his students the historical facts and contexts that he knew they needed to learn. As an educator who spent the rest of his life working to broaden the minds of his students and fellow citizens, he would be dismayed by the echo chambers that still exist in Russian state media today.

So what can we do, here and now? Out of all the many worthy causes that need urgent support, I’d like to highlight one: Helping Ukrainian people with intellectual disabilities and their families.


Living in a war zone is horrific for everyone. A group that needs particular help is folks (like one of my own children) with intellectual and mobility challenges, who can’t just get up and leave on their own even if the roads are open. Inclusion Europe and Ukraine VGO Coalition are collecting funds for direct assistance for Ukrainian families in this situation. Please keep these groups or similar causes in mind, if you are fortunate enough to be able to make charitable donations.

The other thing we can do is encourage our leaders to remain in solidarity with Ukraine, even when we start to feel the economic effects ourselves around the world. This debate is very active in Poland right now, where individuals and charities are rushing in to help Ukrainian refugees but worrying about how long they can sustain the effort. Here is (my own rushed translation of) an excerpt from an opinion piece by Katarzyna Pełczyńska-Nałęcz, former Polish ambassador to Moscow:

Can we afford gasoline at 10 zł/liter (~$9/gal)? Before we ask, let’s think about the stakes in this war. […] The first shock has passed. We are getting used to the reality of being a country on the war front. The price of gas is spiking. Food prices will rise soon too[…] We will have to share hospitals and schools with over a million refugees. We are starting to see exhaustion and anger. [Among other things,] anger at our government, which brags about how Poland has welcomed the refugees, even though actually the massive volunteer efforts of the populace are doing most of this work in the government’s place. […] And then we start to wonder if maybe this is all overblown, if there are limits to self-sacrifice, if maybe it’s not worth taking on such great costs, because we too have our own worries and debts and lives.

Yet at this moment, it’s important to remind ourselves what the stakes are.

[Because if Ukraine loses, then] another Iron Curtain will fall on our eastern border. Beyond it, the Russians will build a totalitarian state, which will root out everything that is Ukrainian and terrorize our neighbors into one “great” Russian nation. […] From Ukraine there will be not 2 million but 10-15 million refugees. And along our borders, from the Baltic Sea to the Bieszczady Mountains, the Russian military will be standing there armed to the teeth. Putin, threatening us with his nuclear button, will demand that the Americans leave Poland. Many businesses, but also everyday people, will start to wonder whether Poland is indeed a country worth investing in and living in. […]

So when the difficult moments come – and in the coming days there will come more and more of them – when we are overwhelmed with frustration and doubt, when we think that maybe our government is right and we can’t afford 10 zł/liter gasoline, then let’s simply remember what the stakes are in this war.


Update: For any academic readers, I’m also passing along a note from David Swanson, Professor Emeritus of Sociology, University of California Riverside:

For those interested in assisting our Ukrainian colleagues, a website set up and maintained by faculty at Charles University in the Czech Republic is a site where one can post offers of aid (e.g., a visiting scholar position) and where colleagues in Ukraine can access information about job offers, fellowships etc. directly at one place in the internet: https://helpline-demography.eu/

Please feel free to send any information to info@helpline-demography.eu

In memoriam: Leland Wilkinson

I am saddened to hear that Lee Wilkinson passed away a few days ago. Wilkinson created the hugely influential concept of a “Grammar of Graphics” and wrote it up in a thorough, thought-provoking book. Through his writings and his own entrepreneurial spirit (he started SYSTAT and sold it to SPSS, then worked with Tableau and H2O.ai among others), the Grammar of Graphics became a hugely influential idea, adopted in many powerful data visualization software packages—Tableau, R’s ggplot2, Python’s plotnine, JavaScript’s D3.js and Vega, the SPSS Graphics Production Language (GPL) and Visualization Designer, IBM VizJSON…

Leland Wilkinson

Wilkinson was supposed to speak at a Data Visualization New York meetup tomorrow; instead, it has become a memorial tribute session. The event is online and open to all. Meanwhile, I have seen heartfelt tributes to Wilkinson from a who’s who of the data visualization world: Hadley Wickham (developer of ggplot2), Nathan Yau (creator of FlowingData), Jessica Hullman (prolific dataviz researcher), Jon Schwabish (creator of PolicyViz), Jeff Heer (developer of D3.js and Vega)… Everyone reiterates that he was not only an influential scholar, but also a generous, kind, decent human being.

Apart from his visualization work, I loved Wilkinson’s voice in a report written mostly by him on behalf of the American Psychological Association’s 1999 Task Force on Statistical Inference. Here’s the note I wrote myself when I first ran across this report, and I still stand by it:

This is a really great, short, but fairly complete overview of major components in a statistical study...
i.e., the things you want your junior statistician colleague to know without being told...
i.e., the things we ought to teach AND MEASURE ON our stats students.

Two of my favorite quotes from that report:

“Statistical power does not corrupt.”

and

The main point of this example is that the type of “atheoretical” search for patterns that we are sometimes warned against in graduate school can save us from the humiliation of having to retract conclusions we might ultimately make on the basis of contaminated data. We are warned against fishing expeditions for understandable reasons, but blind application of models without screening our data is a far graver error.

I had the incredible good fortune of meeting Wilkinson myself at a conference, though regrettably just once. This was SDSS 2019 in Seattle—the last conference I attended in person before the pandemic. One groggy morning, I stepped away from my conference breakfast table to get a second cup of coffee. I came back to find that Wilkinson had just sat down, thinking the table was empty. We ended up having a genuinely delightful conversation. I asked how he had managed to combine so many fascinating strands of work in his career, and he told me it had been a roundabout path: if I remember correctly, he had dropped his math major in his first week of college and switched to English; then later dropped out of divinity school; then just barely finished Psychology graduate school because he couldn’t stop tinkering with computers instead; then became a statistical software entrepreneur… He also reminisced fondly about attending conferences as a young researcher, where he got to hear giants in the field get drunk at the open bar and tell their life story 😛 Wilkinson was a witty and warm conversation partner. After breakfast he invited me to keep in touch, and I deeply regret that I never followed up. Rest in peace, Leland Wilkinson.

Your sabbatical has been eaten by a grue

Nerd alert! Do you remember those old-school text adventure games, aka interactive fiction?

> GO EAST
You enter Jerzy's office. You see an accordion and some junk mail here.
> TAKE ACCORDION
Taken.
> PLAY ACCORDION
You don't know any tunes on the accordion.

…and so on? Well, I recently discovered the excellent “50 Years of Text Games” blog. It’s been fun to revisit some old memories and learn about some lost gems. Maybe you’ll enjoy it too.

Logo for the 50 Years of Text Games blog

And we’re back!

It only took me 9 or 10 months after realizing the blog was broken… but I was finally able to take a whole day to muck around in the innards of WordPress and fix it. If you were happier with the static HTML placeholder that I had up for the past few months, it’s still available here 🙂

In other news, I’m starting a pre-tenure sabbatical for the upcoming academic year. I hope to find time for blogging about my research, as well as any interesting work I come across as I catch up with recent developments.

“Concise Statistical Year-Book of Poland, 1939”

Eighty years ago this week, my grandmother and grandfather were each enthusiastic seven-year-olds, excited for September 1st — their first day of school! At the time, they lived hundreds of kilometers apart and had yet to meet. She had spent her childhood in France but was thrilled to be back in ancestral Poland, in the north-eastern city of Wilejka, where she would finally be able to study in Polish. He was a voracious reader in Poznań, the westernmost large city in Poland at the time. Still, both had laid out their best clothes and filled a satchel with notebooks and pens.

Of course, it was not to be. My grandfather’s mother woke him in the middle of the night and brought him quietly down to the cellar, in the dark, past windows blacked out with curtains and blankets, as German forces began shelling the city. In the morning his apartment still stood, but he saw the broken walls and ruined rooms of the building next door. Meanwhile, my grandmother’s long-awaited Polish school was cancelled as well, eventually replaced by a Russian school as Soviet forces occupied her city.

Somehow, they survived World War II and eventually met as teachers, committed to the critical importance of education in rebuilding their broken country. My grandfather went on to become a professor of history and a leading figure at the University of Zielona Góra, in the city where they finally settled (and where I was born). A few years ago, when he passed away, I found some of the old statistical yearbooks he must have used as research resources.

Worn cover of my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland

The yearbook from 1939 is particularly touching. As a physical artifact, it has clearly been through a lot: worn from use, spine broken, pages torn, stamped and underlined and scribbled all over.

Title page of my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland, with stamps and inked-out scribbles

But it’s the “Foreword to the 10th Edition,” written in April 1939, that really moves me with its premature optimism:

The current edition of the Year-Book closes the first ten years of its existence. Today I can emphatically assert the great utility of this publication … It remains only necessary to express a hope that the Concise Year-Book, completing currently the first decade of its existence and beginning in the near future its second decade… will continually and increasingly fulfill its mission as set out in 1930…

Once again, it was not to be. The statistical service could not continue its planned work, once the war began in September. The Polish government-in-exile in London did manage to publish a Concise Statistical Year-Book for 1939-1941, summarizing what was known about conditions in the German- and Soviet-occupied territories. But the regular annual compilation and publication of Polish statistical yearbooks did not resume until after the war, in 1947 — and even then it was interrupted again during 1951-1955 as the Soviets in charge did not want to risk revealing any state secrets.

First page of foreword to my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland
Second page of foreword to my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland

The Polish Wikipedia has a good article on these statistical yearbooks, but unfortunately it’s not yet translated into English. However, you can skim through a scanned PDF of the whole 1939 yearbook. For instance, the lovingly hand-drawn population density map reminds us that there were precursors to the (also beautiful) census dot maps based on 2010 US Census data.

Population density dot map from the 1939 Concise Statistical Year-Book of Poland

Now, on this 80th anniversary of the war, my own son is eager to start school, while I am preparing to bring the 1939 yearbook to my fall course on surveys and censuses. I am grateful that our life today is so much better than my grandparents’ was, even if it’s hard to be optimistic about the state of the world when you hear the news lately. All we can do is roll up our sleeves and get back to work, trying to leave the place better than we found it.

Another Pole, the poet Wisława Szymborska, said it well:

The End and the Beginning
After every war
someone has to clean up.
Things won’t
straighten themselves up, after all.

Someone has to push the rubble
to the side of the road,
so the corpse-filled wagons
can pass.

Someone has to get mired
in scum and ashes,
sofa springs,
splintered glass,
and bloody rags.

Someone has to drag in a girder
to prop up a wall,
Someone has to glaze a window,
rehang a door.

Photogenic it’s not,
and takes years.
All the cameras have left
for another war.

We’ll need the bridges back,
and new railway stations.
Sleeves will go ragged
from rolling them up.

Someone, broom in hand,
still recalls the way it was.
Someone else listens
and nods with unsevered head.
But already there are those nearby
starting to mill about
who will find it dull.

From out of the bushes
sometimes someone still unearths
rusted-out arguments
and carries them to the garbage pile.

Those who knew
what was going on here
must make way for
those who know little.
And less than little.
And finally as little as nothing.

In the grass that has overgrown
causes and effects,
someone must be stretched out
blade of grass in his mouth
gazing at the clouds.

After 10th semester of statistics PhD program

FINAL BOSS BATTLE! (Warning: severe nerdiness ahead.)

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, and 9th semesters of my Statistics PhD program.

Research

It’s a strange thing to finish a thesis. You came in to grad school hoping, maybe even expecting, to make a grand contribution—a deep insight into the nature of the world. But Borges puts it best:

…the tall, haughty volumes that made a golden dimness in the corner of his room were not (as his vanity had dreamed them) a mirror of the world, but just another thing added to the world’s contents.

—Jorge Luis Borges, “The Yellow Rose,” Andrew Hurley’s translation

For that matter, when do you really “finish”? The dissertation is pretty much written — then a few weeks later the slides are ready — then the defense itself takes place — then you make revisions — then there’s a graduation ceremony — and even then, there remain a few minor “suggested” revisions I haven’t done yet but should do before getting it printed & bound & archived. [That finally happened half a year later, over January break, long after the PhD was “over.”]

Not to mention, my status felt weirdly ambiguous. The faculty at Colby have already hired me and seem to treat me as basically a colleague, pending the minor formality of defending. And at CMU I’m also teaching a course and students are calling me “Professor.” But of course, everyone else in the department is used to treating me as a student, so I feel very much like one. Admittedly, during the few weeks when I was no longer a “PhD Candidate” but not yet officially “Dr Wieczorek,” I enjoyed the brief chance to introduce myself as a “Graduand” 🙂

But it’s all very anticlimactic! There’s no trumpet fanfare saying: “NOW you are done.” If anything, it felt like the first Zelda game’s end music (starting about 20 sec into the clip below):

As you’ve played the game through, you’ve died many many times along the way, and Link’s death lullaby has been playing so long in all your lowest moments that you don’t even really hear it anymore. Then here it returns, after you thought you were done. Not this again?!? Yet suddenly your ears catch a faint counterpoint that wasn’t there before. This new bass line changes the character of the music completely, getting stronger and stronger, until somehow it has become a jazzy, celebratory tune. There’s no one moment of enlightenment, but a weary transition as the world slowly starts looking better and better.

Teaching

While I didn’t finish as much of the thesis last fall as I expected, it was supposed to be my focus this spring (along with the new baby!), with only minimal grading responsibilities as a TA.

HOWEVER… a week before spring term started, a last-minute shift in staffing became necessary and I was asked to teach 36-315, Statistical Graphics and Visualization—a hundred-student undergraduate class. I really should have said No (the chair told me it’s okay to say No!), but still, the department seemed to be at a loss, and I didn’t want to let them down after all they had done for me throughout my time at CMU. So instead, I found myself focused on teaching the largest class I’ve had yet, while just scraping by on both research and parenting.

Teaching this class was a valuable experience in and of itself, and you know I LOVE teaching data visualization, but—argh!

WHO CARES / SICK OF GRAPHS

This spring I also finished the last requirement for the Eberly Center’s Future Faculty Program. It was a bit funny to have to revise my Statement of Teaching Philosophy several times, given that I’d already written one that got me a job, but it does feel good to wrap this up formally. The program made a big impact on my teaching over the past few years, and I’m glad I can contribute to their completion rate.

Other projects

We continued our Teaching Statistics research, running many more think-aloud interviews, and presented our work at eCOTS 2018 (the Electronic Conference on Teaching Statistics).

I also traveled back to DC, to the Census Bureau, to give a talk on my paper with Tommy Wright and Martin Klein. The paper is about visualizations designed to help readers make many comparisons in a statistically-appropriate way. The talk was recorded, and now I see the video has been posted as part of a new “Census Academy” resource.

Life

The baby went through a rough patch where he wouldn’t fall asleep unless someone was holding him and moving around—sitting wasn’t good enough. Somehow I discovered that I could hold him while playing Crypt of the NecroDancer, which is basically a mashup of Dance Dance Revolution and Legend of Zelda: you use the arrow keys to move your dungeon-explorer in time with the beat of this amazing techno-ish soundtrack. I would play with one hand, standing and rocking back and forth to help myself stick to the beat, and the little guy would just fall asleep contentedly while I repeatedly ~~destroyed~~ got killed by skeletons and slime monsters.

Despite everyone’s sensible advice, we went house-hunting in Maine in mid-March—when all the houses were covered in 3 feet of snow—because spring break was the only time I could get away. Surprisingly, nobody else was house-hunting there that week 🙂 and as it turned out, we found a wonderful home at a very reasonable price. Our landlord in Pittsburgh has been wonderful, but we’re excited to become homeowners.

Comics

So, at this point you might be wondering: Is a PhD degree in Statistics worthwhile?

Someday I’ll follow up with a post on that, akin to my MS degree post. Meanwhile, Matt Might has the best post on the value of a PhD in general.

But for today, I will leave you with some comics that kept me ~~crying~~ smiling during the tough times of a PhD. Of course, PHD Comics (aka Piled Higher and Deeper) and XKCD both go without saying, but here are a few others that felt relevant.

When you’re starting down this precarious road:

It's an empty journey to triumph if you don't plant the seeds of catastrophe along the way.

When you forget why you’re even doing this:

I'm not worried. I'm just dwelling on difficulties and unknowns.

Well. Stop wanting that.

A man wants something. Later he's not so sure.

When you have spent years with ~~almost no human company besides other academics~~ no human company:

Become homies with your inner demons.

Yeah! Carpe!

When you are suddenly expected to interview for jobs with bona fide humans, after spending years around aforementioned academics:

'Interestingly, I have no common sense whatsoever.' 'That's not the sort of thing you should say during a job interview.' 'I don't see why not.'

When you are neeeeeearly done, and if you could just focus on your existing responsibilities you’d be able to wrap them up:

Later, the task is not done

Look, we all got problems

When you somehow manage to buckle down and knock out a ton of research/writing like a boss:

Because she is a professional, and she behaves like one!

When you finally submit the damned dissertation:

We didn't finish. You just stopped working.

'That’s not a book! You scribbled your hypothesis on some sheets of paper and folded them.' 'Buuut?' 'But it does seem more plausible now, yes.'

When you are as ready as you’ll ever be for the thesis defense:

The mixture of boredom and drowsiness created a state where, to the human mind, ANYTHING WAS POSSIBLE.

When you look back on your time as a PhD student with a note of defiant optimism:

My unsinkable affection for the world

After 9th semester of statistics PhD program

The final year! Our 2nd baby was on the way, and the job search was imminent. Oh, and by the way there’s this little thing called “finishing your dissertation”…

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, and 8th semesters of my Statistics PhD program.

Research

This year my advisor was able to place me on half-TA’ing grant support. So I asked to have all my TA’ing pushed to the spring, with the intent of finishing the bulk of my thesis work this fall. I figured that in the spring I’d be OK to grade 10 hrs/wk on my own time while helping with the new baby (due in Dec), as long as the thesis was mostly done.

However, job hunting took a HUGE amount of time. That’ll be its own post. (Also the spring TA duties turned into something larger, but that’s a story for the next post.)

In other words, thesis research progress was… minimal, to put it mildly.

Other projects

Well, OK, so there’s another reason my thesis work was slow: I jumped onto two new short-term projects that (1) were very much in my wheelhouse, and (2) seemed like great fodder for job applications and interviews. Both of them arose thanks to Ryan Tibshirani. Unsurprisingly, it turns out that it’s GREAT for a grad student to be on good terms with a well-known professor who gets too many requests for interesting projects and occasionally passes them on to students. In both cases, it was fantastic of Ryan to think of me, and although it’s been tough to be doing it just right now, this is legitimately the kind of research I want to do later on (and probably should have done for my thesis in the first place! argh…).

Another side project

First, knowing that I’m interested in education, Ryan asked if I’d like to help with some consulting for Duolingo, the language-learning app company (also founded by CMU folks and still Pittsburgh-based). The folks there had some interesting questions about experimental design and suitable metrics for A/B testing their app. One of our contacts there was Burr Settles, who wrote a helpful book on Active Learning (the ML technique, not the pedagogy). We had some good discussions after digging into the details of their problem: challenges of running permutation tests on massive datasets, whether old-school tests like signed-rank tests would be better for their goals, how the data-over-time structure affects their analyses, etc. These chats even led me to an old Portland State classmate’s work regarding how to get confidence intervals from a permutation test, and I have some ideas for extensions on that particular problem.
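For readers who haven’t met permutation tests before, here is a minimal sketch of the basic idea in an A/B-testing setting. To be clear, this is my own toy illustration and not anything Duolingo actually runs: the function name, the choice of a difference in means as the test statistic, and the number of shuffles are all assumptions made for the example.

```python
import random


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in group means.

    Hypothetical helper for illustration only: it assumes two independent
    groups and uses the difference in means as the test statistic.
    """
    rng = random.Random(seed)
    n_control = len(control)
    observed = sum(treatment) / len(treatment) - sum(control) / n_control

    pooled = list(control) + list(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so we reshuffle the pooled data and re-split it.
        rng.shuffle(pooled)
        perm_control = pooled[:n_control]
        perm_treatment = pooled[n_control:]
        diff = (sum(perm_treatment) / len(perm_treatment)
                - sum(perm_control) / n_control)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Every shuffle touches the whole pooled dataset, so the cost grows with both the sample size and the number of shuffles. That is roughly why these tests get painful on massive datasets.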

Second, knowing that I had experience with survey data, Ryan passed on to me an email request to review some new methodology. Varun Kshirsagar, working on the Poverty Probability Index (PPI), had made some revisions to this poverty-tracking method and wanted to get feedback from someone who understood both modern statistics & machine learning tools as well as the kind of complex-survey-design datasets used to estimate the PPI models. It was a real pleasure working with Varun, and thinking about how to combine these two worlds (ML and survey data) had been a huge part of my motivation to go back to grad school in the first place and attend CMU in particular. We wrote up a short paper on the method for the NIPS 2017 workshop on ML for the Developing World, and we won the best paper award 🙂 I plan to revisit some of these ideas in future research: How do we run logistic-regression lasso or elastic net with survey-weighted data? How should we cross-validate when the survey design is not iid?
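To make the cross-validation question a bit more concrete: one natural starting point (an assumption on my part here, not a settled answer and not the method from our paper) is to keep each sampled cluster together, so that whole clusters are assigned to folds rather than individual respondents. A minimal sketch, with a hypothetical function name and made-up cluster labels:

```python
import random


def cluster_folds(cluster_ids, n_folds=5, seed=0):
    """Give each observation a fold index so that every cluster stays in one fold.

    Hypothetical helper for illustration: `cluster_ids` holds one label per
    observation (e.g. the sampled village or household it came from).
    """
    rng = random.Random(seed)
    unique_clusters = sorted(set(cluster_ids))
    rng.shuffle(unique_clusters)
    # Deal the shuffled clusters out to folds round-robin.
    fold_of_cluster = {c: i % n_folds for i, c in enumerate(unique_clusters)}
    return [fold_of_cluster[c] for c in cluster_ids]
```

This only handles clustering. Stratification and survey weights would need more care, which is part of why the questions above are still open for me.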

Teaching

Alex and I also continued running the Teach Stats mini. This semester we decided it was a bit embarrassing that Statistics, of all fields, doesn’t have a better standardized assessment of what students are learning in our intro courses. Without such a tool, it’s hard to do good pedagogy research and give strong evidence about whether your interventions / new methods have any impact.

There are already a couple of assessment instruments out there, but most were written by psychologists or engineers rather than by statisticians. Also, many of their questions are procedural, rather than about conceptual understanding. Even though these assessments have passed all the standard psychometric tests, there’s no guarantee that whatever they measure is actually the thing we’d *like* to measure.

So we started discussing what we’d like to measure instead, drafting questions, and planning out how we would validate these questions. Inspired by my own experiences in user-experience research at Olin and Ziba, and with the help of an article on writing good assessments for physics education by Adams and Wieman, we started planning think-aloud studies. The idea is to watch students work through our problems, talking out loud as they do so, so that we can see where they trip up. Do they get it wrong just because the question is poorly worded, even if they understand the concept? Do they get it right just because you can eliminate some choices immediately, even if they don’t understand the concept at all?

We ran a pilot study this fall, and at the end of the semester, I had my first chance to help draft an IRB application—hurrah for statisticians actually working on research design!

This summer and fall, I also kept up with an informal History of Stats reading group organized by fellow PhD student Lee Richardson. Again, these journal-clubs and reading-groups have been my favorite part of the PhD curriculum, and I wish more of our “real” classes had been run this way.

Life

?!? “Life”? Not much that I can recall this semester.

Next up

The 10th and final semester of my Statistics PhD program.

After 8th semester of statistics PhD program

I realize this is over 2 years late, but I found these drafts sitting around and perhaps they are still worth posting.

Apologies for the negativity! I have to admit this was one of the toughest semesters for me, psychologically. I thought about toning it down, especially since I’m delighted to be where I am now—tenure track faculty at a small liberal arts college—but I don’t want to pretend it’s been easy to get here.

I believe this was the semester I was going to CMU’s Counseling and Psychological Services (CaPS). If you’re a grad student, I recommend that you get to know such resources on your campus. Just about everyone faces the all-pervasive impostor syndrome, which compounds any other specific challenges you might have personally, and it’s helpful to be able to talk through it all with someone experienced.

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, and 7th semesters of my Statistics PhD program.

Life

The more my son grows up, the more the PhD starts to feel like an overwhelming distraction from real life. One day I got home late and my wife told me what he’d been singing:

“Old MacDonald had a daddy, E-I-E-I-O, with a work-work here and a work-work there…” 🙁

If you have a family while in grad school, you will often feel that either the PhD doesn’t come first, or your family doesn’t come first. The academic folks around you will lean towards the PhD coming first, which of course partly makes sense—if you’re not making the fullest use out of your limited time in grad school, why bother going at all?—but it’s also hugely selection bias. Many (though not all) of the faculty who land and stay at a top research university are people who did decide that work comes first.

Thankfully, nobody has ever sneered to my face that “Ah well, not everyone’s cut out for academia” (though I’ve heard of it happening in other departments). But I feel it in my head all the time. (And I overhear younger students dismissing 30+ year olds like myself as too old to do good research… And I see the offices full every evening and weekend…) At another stage in life, my reaction might have been, “Oh yeah? I’ll show you, I’ll work my butt off and prove I’m good enough to cut it!” But now, my reaction is, “I’ve got better things to do than play this petty game.” Yes, I’ll plod along and finish the PhD I’ve started—perseverance counts for a lot—but right now I am not eager to stretch out this stage of life any longer than necessary.

Research

  • Reading Thinking, Fast and Slow, I am noting the constant mentions of the author’s collaboration with Amos Tversky. I think I’ve been way too focused on self-sufficiency here. It’s true that to get the thesis, I need to demonstrate I can do the work on my own… BUT in order to actually succeed afterwards (whether in academia or not), I’ll need to collaborate with others, not just with my advisor. Plus, it’s simply more fun, spending those hours tackling a tough problem with an equally interested collaborator! So, my plan starting this summer and into next year: Add a few collaborative projects—finish FPS with Daren, finish DL with Nick and Jordan, revisit CIs/AIPE with Alex, start something (visual inference? CV with confidence?) with Justin… [Looking back, I regret that I did not follow up and make most of these planned collaborations happen while I was still a student!]
  • Jing still amazes me with his quick insight and intuition about how to tackle a proof. When I get stuck after spending hours on something, it takes him almost no time to see: if we back up a few steps to this other point, and tackle that instead, it’ll be much cleaner. This trait is taking me a long time to learn.
  • Daren argues that technical math prowess is definitely not sufficient (good ideas of *what* to study are more important), but also not necessary (even theory-wizards like Yenchi and Ale have to constantly look up things they’ve forgotten). I disagree a bit: I really think fluency in the big-picture math concepts is important—if I have memorized the definition of an eigenvalue, but not internalized its *meaning*, then I will never see big-picture ideas quickly enough, nor know how to start proving technical details, nor recall where to find those details in work I’ve seen before. It’s like when I play clarinet: I don’t need to memorize the piece I’m playing—sheet music is fine—but I *do* need to memorize the basics. I simply cannot play in real-time if I have to refer back to a fingering chart for every single note! In Joel Spolsky’s words (although see Lang’s actual article too):

    Serge Lang, a math professor at Yale, used to give his Calculus students a fairly simple algebra problem on the first day of classes, one which almost everyone could solve, but some of them solved it as quickly as they could write while others took a while, and Professor Lang claimed that all of the students who solved the problem as quickly as they could write would get an A in the Calculus course, and all the others wouldn’t. The speed with which they solved a simple algebra problem was as good a predictor of the final grade in Calculus as a whole semester of homework, tests, midterms, and a final.

    You see, if you can’t whiz through the easy stuff at 100 m.p.h., you’re never gonna get the advanced stuff.

  • It’s also hitting me how stupidly selfish I’ve been here. As much as I’d like to think so, I didn’t come here to solve practical problems and make the world a better place. If I had, I’d have started right off the bat, using (and honing) skills I *do* have, working in the Census research group to make serious progress on applied problems. Instead, I wanted to bone up on my math-theory side, just thinking about the glory of proving theorems, but without putting in the prep work. It’s true that I’ve learned a lot by working on theory problems with Jing—but I would have been so much more productive if I’d taken a few hardcore math classes first, brushing up on my weak spots *before* starting such research. (I took linear algebra over a decade before starting the PhD, and it really shows. Yes, I can multiply matrices, but the advanced stuff has been a real slog.) I’ve spent a couple of years now on work that other students could have done much faster and enjoyed more, while neglecting to make real contributions using the skills I *do* have. In other words, I wish I could go back and tell myself: *either* just take some math classes (or even get a MS in Math?), until you can do the theory on your own (or with a mentoring colleague at work), and skip the Stats PhD… *or* do the PhD in a way that builds on your strengths (and makes real contributions!), not merely papers over your weaknesses. Sadly, I probably wouldn’t have believed me. My teaching experiences and the Eberly center seminars have been wonderful, but otherwise, right now I feel I have not made good use out of my time here. (Even in my neuroscience ADA project flop, only a few logistical challenges were out of my hands—and I could have overcome most of them by gritting my teeth and learning Python well, and by sitting alongside the scientists in the lab.) Hindsight is 20/20, and everyone goes through impostor syndrome, but still…

Teaching

I was a TA for Ann Lee’s section of 36-402: Undergraduate Advanced Data Analysis, using materials developed by Cosma Shalizi.

  • The course largely followed Cosma’s (draft) textbook Advanced Data Analysis from an Elementary Point of View. It was good for me to be “forced” to read up a little on causal inference and related topics. I’m still no expert, but at least not clueless. I also liked his perspective of statistical modeling as “data compression,” and his view of regression as a linear smoother with *weird* weights.
  • Some students mentioned that having to code up cross-validation from scratch 5 times or more was a *good* part of the class. They really feel they understand it now, more so than other things which they never or rarely had to code directly—such as backfitting in GAMs. I worried that repeatedly writing CV from scratch would start to feel like busywork, but luckily not (at least for these few students). And I felt the same about taking the Convex Optimization class myself: it’s great to have repeated practice *coding up the algorithms directly* and understanding what they’re trying to do, even if it’s only practice and in reality you’d actually use a pre-canned routine that deals with subtleties such as numerical convergence issues. So, in future years, we should give more opportunities to practice coding up algorithms, not just deriving theory about them and using them for data analysis. (Not to mention the omitted issues of data collection and power calculations…)
  • By the end of the semester, so many students still didn’t understand the idea of additive vs interaction models. They assumed “additive model” specifically means “GAM with spline terms” and “interaction model” means “linear regression with interaction terms.” We should hit these points harder earlier: “additive” means *any* model that is additive in the terms; and you can certainly do interactions within a GAM by having a multiple-predictor spline term; and so on.
  • If I’m going to be strict about not accepting late HWs, I should do so from the very beginning. It’ll (hopefully) save me ages of back-and-forth emails from students with excuses over the course of the semester. Also, if the promise of no-credit-for-late-HWs only kicks in at the end of semester, all of a sudden, then some students may have already used up their free dropped-HW opportunities, so they get a much lower grade than expected even if they *do* the work (but just submit it late). That’s not *technically* unfair (the syllabus did say we’d reject late HWs)… but it *feels* unfair. Best to set up consistent and clear expectations, right?
  • Likewise, if we’re going to be serious about saying that “on time” means “at the start of class,” then a TA should pick up the HWs right at that time. We saw late students trickling in (or all showing up at the end of class) and dumping in HWs after the fact. (Maybe electronic submission, with the deadline enforced by your course-management software, is not so bad.)
  • I’m pleased that we had decent turnaround time for grading most weeks—but it was sad that so many students never bothered to pick up graded HWs. We need better incentives to figure out your mistakes and learn from them, not merely be graded on them. (Alternately, it’s tempting to say that if you *don’t* pick up X of your early HWs, then you accept “stochastic grading” for the rest—we’ll give you a random grade and save time by not grading them manually!)
  • The Blackboard discussion forums were painful to set up and navigate. We should have used Piazza instead.
  • How would I grade such a class with specs-based grading? There are so many details to demonstrate understanding of, and so many ways to lose points on current assignments. How to get around point-grubbing here?
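
To make the "code up cross-validation from scratch" exercise above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data are simulated, the model is a one-predictor least-squares line, and the function names (k_fold_indices, fit_line, cv_mse) are hypothetical names chosen for this example, not anything from the course materials.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cv_mse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample mean squared error by k-fold cross-validation."""
    sq_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then predict the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out)
    return sum(sq_errors) / len(sq_errors)

# Toy data: a noisy straight line (simulated, not real course data).
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
print(cv_mse(xs, ys, k=5))
```

In practice you would reach for a pre-canned routine, as noted above. The point of writing it out is to see that each observation is predicted by a model that never saw it.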

Other projects

  • I made no progress on the FPS paper with Daren, nor on the DL paper with Nick 🙁 At least the FPS paper was submitted to a conference… and rejected by reviewers who didn’t understand the purpose of the paper. I should have quickly revised the introduction to reframe our goals clearly and sent it somewhere else, but instead it’s been sitting on my desk.
  • This semester (or maybe near the end of last term?) I volunteered to join the GenEd committee. This is a mostly-faculty committee, revising the general education requirements in the CMU college (Dietrich College of Humanities and Social Sciences) to which our Statistics department belongs. It’s been eye-opening to see how faculty meetings go behind the scenes. (In particular, it’s fascinating that even top humanities scholars at a top department have trouble concisely defending the humanities as a GenEd requirement. There are also long digressions to quibble over a single word—“competence” is like a loaded gun, and even clearly-temporary placeholders like “Off-campus experiences” hold up the meeting interminably once someone points out that undergrad research also falls under that heading and it must be renamed right now…) But seriously, it’s been great to hear some truly remarkable educators discuss the direction of our programs, from broad goals to particular pedagogical methods. As a statistician, naturally I volunteered for the Assessment subgroup, and it has been so wonderful to work with experts like Marsha Lovett.
  • I did well in the “Three Minute Thesis” (3MT) competition! Grad students have 3 minutes and 1 slide to summarize their work for a wide audience. I was surprised and thrilled to win my preliminary round. Unfortunately the finals were the same date as an out-of-town trip I could not miss, so I did not get to compete further, but they still kindly gave me a prelim-round prize of research funds which I used for travel to USCOTS.
  • I presented my dataviz course poster at the US Conference on Teaching Statistics (USCOTS), in State College, PA. During the poster session I was pleased to meet several folks from stats departments looking to hire new faculty, and they seemed pleased to meet a grad student close to defending who is interested in both pedagogy and research. I believe this was my first contact with Chris Malone at Winona State University and KB Boomer at Bucknell University.
  • The CMU student-run Data Science Club also invited me to give a high-level talk on principles of good data visualization. Here are my slides, a checklist of best practices to follow, R code, and the nhanes and obesity_education CSV files.
  • Alex Reinhart and I proposed and ran a mini (half-semester) course on Teaching Statistics, with Rebecca Nugent advising. It was run mostly as a journal club: a good chance to read and discuss interesting papers on pedagogy in general as well as (introductory) statistics education in particular.
  • Finally, I had my first R package RankingProject accepted on CRAN! Together with Tommy Wright and Martin Klein, my former colleagues back at the Census Bureau, I have been working on a paper about ways to visualize data that lead to correct inferences about comparing many different estimates. This package contains the code and data to implement techniques we recommend in the paper, “A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals” [which went to print in May 2019 in The American Statistician]. The package also was listed as one of RStudio’s top 40 new packages in March 2017, and it has had a respectable number of downloads so far. [Badges: total and monthly CRAN downloads for the RankingProject package]

Next up

The 9th and 10th semesters of my Statistics PhD program.

Statistical Science conversations, and in memoriam

The “Conversations” sections of Statistical Science are now available for open access. These interviews are valuable perspectives on the history of our field. But as I look over the list of names here, I am sad to reflect on the influential statisticians who passed away in 2016.

Earlier this year, I know we lost Peter Hall and Charles Stein, important contributors to statistical theory and practice.

This month, my department bid farewell to Steve Fienberg, a wonderful mentor, teacher, and researcher. His work on categorical data informed several of my projects back at the Census Bureau. I fondly remember the warm welcome my family received from Steve and his wife Joyce when we arrived at CMU. I regret I never took the opportunity to collaborate directly on his many fascinating projects, which included a wide range of topics like human rights, Census work, privacy & confidentiality, and forensic science.

Steve’s “Conversations” interview from 2013 contains many nuggets of wisdom on theory vs. practice, success in grad school, life in academia and beyond, etc. There was also a good interview at Statistics Views last year.
He was a pillar of the department and the broader statistical community, and we miss him dearly.

After 7th semester of statistics PhD program

I was lucky to have research grant support and minimal TAing duties this fall, so all semester I’ve felt my research was chugging along productively. Yet I have less to show for it than last semester—I went a little too far down an unrewarding rabbit-hole. Knowing when to cut your losses is an important skill to learn!

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, and 6th semesters of my Statistics PhD program.

Research

Having defended my proposal this summer, I spent a lot of time this fall attacking one main sub-problem. Though I always felt I was making reasonable progress, eventually I discovered it to be a dead-end with no practical solution. I had wondered why nobody’s solved this problem yet; it turns out that it’s just inherently difficult, even for the simplest linear-regression case! Basically I wanted to provide finite-sample advice for a method where (1) the commonly-used approach is far from optimal but (2) the asymptotically-optimal approach is useless in finite samples. I think we can salvage parts of my work and still publish something useful, but it’ll be much less satisfying than I had hoped.

Working on a different problem, it felt encouraging to find errors in another statistician’s relevant proof: I felt like a legitimate statistician who can help colleagues notice problems and suggest improvements. On the other hand, it was also disappointing, because I had hoped to apply the proof idea directly to my own problem, and now I cannot 🙂

On a third front, my advisor invited another graduate student, Daren Wang, to help us wrap up a research project I had started in 2015 and then abandoned. Daren is bright, fast, and friendly, a pleasure to collaborate with (except when I’m despairing that it only took him a week to whiz through and improve on the stuff that took me half a year). Quite quickly, we agreed there’s no more to be done to make this project a much-better paper—so let’s just package it up now and submit to a conference. It was satisfying to work on writing and submitting a paper, one of the main skills for which I came to grad school!

Finally, I was hoping to clear up some stumbling blocks in an end-of-semester meeting with several committee members. Instead, our meeting raised many fascinating new questions & possible future directions… without wrapping up any loose ends. Alas, such is research 🙂

Classes

As I’ve noted before, I audited Jordan Rodu’s Deep Learning course. I really liked the journal-club format: Read a paper or two for every class session. Write a short response before class, so the instructor can read them first. Come prepared to discuss and bring up questions of your own. I wish more of our courses were like this—compared to lecture, it seems better for the students and less laborious for the instructor.

Although it was a theory course, not hands-on, I did become intrigued enough by one of the papers to try out the ideas myself. Classmate Nicolas Kim and I are playing around with Keras on a GPU to understand some counterintuitive ideas a little better. Hopefully we’ll have something to report in a couple of weeks.

I also started to audit Kevin Kelly’s undergrad and grad-level courses on Epistemology (theory of knowing). Both were so fascinating that I had to drop them, else I would have done all the course readings at the expense of my own research 🙂 but I hope to take another stab someday. One possibly-helpful perspective I got, from my brief exposure to Epistemology, was a new-to-me (caricatured) difference between Bayesian and classical statistics.

  • Apparently most philosophy-of-science epistemologists are Bayesian. They posit that a scientist’s work goes like this: You are given a hypothesis, some data, and some prior knowledge or belief about the problem. How should we use the data to update our knowledge/belief about that hypothesis? In that case, obviously, Bayesian updating is a sensible way to go.
  • But I disagree with the premise. Often, a scientist’s work is more like this: You’re not handed a hypothesis or a dataset, but must choose them yourself. You also know your colleagues will bicker over claims of prior knowledge. If you come up with an interesting question, what data should you collect so that you’ll most likely find a strong answer? That is, an answer that most colleagues will find convincing regardless of prior belief, and that will keep you from fooling yourself? This is the classical / frequentist setting, which treats design (of a powerful, convincing experiment / survey / study) as the heart of statistics. In other words, you’re not merely evaluating “found” data—your task is to choose a design in hopes of making a convincing argument.
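
As a rough illustration of the two caricatures above, here is a small Python sketch. Everything in it is an assumption made for the example: the Beta-Binomial model for the "update your belief" view, the normal-approximation sample-size formula for the "choose a design" view, the made-up numbers, and the hypothetical function names.

```python
from math import ceil
from statistics import NormalDist

def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Bayesian caricature: the data are given; update a Beta(a, b) prior on a proportion."""
    return prior_a + successes, prior_b + failures

def sample_size_two_means(effect, sd, alpha=0.05, power=0.80):
    """Design caricature: per-group n for a two-sided z-test comparing two means.

    Uses the usual normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)

# View 1: start from a flat Beta(1, 1) prior, then observe 7 successes and 3 failures.
a, b = beta_binomial_update(1, 1, successes=7, failures=3)
print("posterior mean:", a / (a + b))

# View 2: before collecting anything, ask how much data a convincing study needs.
print("n per group:", sample_size_two_means(effect=0.5, sd=1.0))
```

The first function only makes sense once someone hands you data and a prior. The second never touches data at all: it is purely about planning a study whose answer should persuade colleagues regardless of their priors.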

Other projects

Some of my cohort-mates and I finally organized a Dissertation Writing Group, a formal setting to talk shop technically with other students whose advisors don’t already hold research-group meetings. I instigated this selfishly, wanting to have other people I can pester with theory questions or simply vent with. But my fellow students agreed it’s been useful to them too. We’re also grateful to our student government for funding coffee and snacks for these meetings.

I did not take on other new side projects this fall, but I’ve stayed in touch with former colleagues from the Census Bureau still working on assessing & visualizing uncertainty in estimate rankings. We have a couple of older reports about these ideas. We still hope to publish a revised version, and we’re working on a website to present some of the ideas interactively. Eventually, the hope is to incorporate some of this into the Census website, to help statistical-novice data users understand that estimates and rankings come with statistical uncertainty.

Finally, I heard about (but have not attended) CMU’s Web Dev Weekend. I really like the format: a grab-bag of 1- or 2-hour courses, suitable for novices, that get you up and running with a concrete project and a practical skill you can take away. Can we do something similar for statistics?

Topic ideas where a novice could learn something both interesting and useful in a 1.5h talk:

  • How not to fool yourself in A/B testing (basic experimental design and power analysis)
  • Befriend your dataset (basic graphical and numerical EDA, univariate and bivariate summaries, checking for errors and outliers)
  • Plus or minus a bit (estimating margins of error—canned methods for a few simple problems, intro to bootstrap for others)
  • Black box white belt (intro to some common data mining methods you might use as baselines in Kaggle-like prediction problems)

Many of these could be done with tools that are familiar (Excel) or novice-friendly (Tableau), instead of teaching novices to code in R at the same time as they learn statistical concepts. This would be a fun project for a spring weekend, in my copious spare time (hah!)

Life

Offline, we are starting to make some parent friends through daycare and playgrounds. I’m getting a new perspective on why parents tend to hang out with other parents: it’s nice to be around another person who really understands the rhythm of conversation when your brain is at best a quarter-present (half-occupied by watching kid, quarter-dysfunctional from lack of sleep). On the other hand, it’s sad to see some of these new friends moving away already, leaving the travails of academia behind for industry (with its own new & different travails but a higher salary).

So… I made the mistake of looking up average salaries myself. In statistics departments, average starting salaries for teaching faculty are well below starting salaries for research faculty. In turn, research faculty’s final salary (after decades of tenure) is barely up to the starting salaries I found for industry Data Scientists. Careers are certainly not all about the money, but the discrepancies were eye-opening, and they are good to know about in terms of financial planning going forward. (Of course, those are just averages, with all kinds of flaws. Particularly notable is the lack of cost-of-living adjustment, if a typical Data Scientist is hired in expensive San Francisco while typical teaching faculty are not.)

But let’s end on a high note. Responding to a question about which R / data science blogs to follow, Hadley Wickham cited this blog! If a Hadley citation can’t go on a statistician’s CV, I don’t know what can 🙂

Next up

The 8th, 9th, and 10th semesters of my Statistics PhD program.