Commandeering a map from PDF or EPS, using Inkscape and R

I love Nathan Yau’s tutorial on making choropleths from an SVG file. However, if you don’t have an SVG handy already and instead want to repurpose a map from another vector format such as PDF or EPS, there are a few extra steps that can be done in the free tool Inkscape. And while I’m at it, how could I turn down the opportunity to replicate Nathan’s Python approach in R instead?

The following was inspired by the 300-page Census Atlas of the United States, full of beautiful maps of 2000 decennial census data. I particularly liked the small multiples of state maps, which were highly generalized (i.e. the fine detail was smoothed out) but still recognizable, and DC was enlarged to be big enough to see.

I have wanted a map like this for my own purposes, when mapping a variable for all 50 states and DC. Unfortunately, I haven’t been able to track down any colleagues who know where to find the original shapefiles for this map. Fortunately, several images from the Census Atlas are available in EPS format near the bottom of this page, under “PostScript Map Files.” With access to such vector graphics, we can get started.
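
As a rough idea of the recoloring step, here is a minimal base-R sketch that assumes the map already exists as an SVG with one path per state. The tiny inline SVG, the path ids (`DC`, `MD`), the data values, and the palette are all invented for illustration; a real Inkscape export would have its own ids and would be read in with `readLines()`. Plain text substitution is used only to keep the sketch free of extra packages, and it is not necessarily how a full workflow should do it.

```r
# Toy SVG standing in for an Inkscape export: one <path> per state.
# The ids, shapes, and fill colours are hypothetical.
svg <- c(
  '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">',
  '<path id="DC" style="fill:#cccccc" d="M 10,10 L 40,10 L 40,40 L 10,40 Z"/>',
  '<path id="MD" style="fill:#cccccc" d="M 60,10 L 90,10 L 90,40 L 60,40 Z"/>',
  '</svg>'
)

# Made-up values to map, keyed by the same ids used in the SVG.
values <- c(DC = 0.9, MD = 0.2)

# Bin the values into three classes and assign one colour per class.
palette <- c("#fee8c8", "#fdbb84", "#e34a33")
bins <- cut(values, breaks = c(0, 1/3, 2/3, 1),
            labels = FALSE, include.lowest = TRUE)
fills <- setNames(palette[bins], names(values))

# For each id, find the line holding that path and swap its fill colour.
for (id in names(fills)) {
  hit <- grep(paste0('id="', id, '"'), svg, fixed = TRUE)
  svg[hit] <- sub("fill:#[0-9A-Fa-f]{6}",
                  paste0("fill:", fills[[id]]), svg[hit])
}

# Write the recoloured map to a temporary file (hypothetical file name).
out_file <- file.path(tempdir(), "choropleth.svg")
writeLines(svg, out_file)
```

Matching on ids only works if each state's path keeps a predictable id after the conversion in Inkscape, so that is worth checking in the exported file first.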

Making R graphics legible in presentation slides

I only visited a few JSM sessions today, as I’ve been focused on preparing for my own talk tomorrow morning. However, I went to several talks in a row which all had a common problem that made me cringe: graphics where the fonts (titles, axes, labels) are too small to read.

You used R's default settings when putting this graph in your slides? Too bad I won't be able to read it from anywhere but the front of the room.

Dear colleagues: if we’re going to the effort of analyzing our data carefully, and creating a lovely graph in R or otherwise to convey our results in a slideshow, let’s PLEASE save our graphs so that the text is legible on the slides! If the audience has to strain to read your graphics, they’re no easier to digest than a slide with dense equations or massive tables of numbers.

For those of us working in R, here are some very quick suggestions that would help me focus on the content of your graphics, not on how hard I’m squinting to read them.
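
One possible fix is sketched here: open the graphics device at roughly slide proportions and scale up the text, rather than accepting the defaults. The device size, `pointsize`, `cex` values, and file name are illustrative guesses of mine, not recommended constants, and the built-in `cars` data is used only so the example runs on its own.

```r
# Hypothetical output file in a temporary directory.
out_file <- file.path(tempdir(), "slide_plot.pdf")

# A small physical page with a large base font: once the figure is
# stretched to fill a slide, the text is scaled up along with it.
pdf(out_file, width = 6, height = 4.5, pointsize = 16)

# Enlarge title, axis labels, and tick labels relative to the base font,
# and widen the margins so the bigger labels still fit.
par(cex.main = 1.4, cex.lab = 1.3, cex.axis = 1.2, mar = c(5, 5, 3, 1))

plot(cars$speed, cars$dist, pch = 19,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed")

invisible(dev.off())
```

The main idea is to set the size when the file is created, not to resize a default-sized image afterwards, since shrinking or stretching an existing image does not change how large the text is relative to the plot.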

JSM 2012: Sunday

Greetings from lovely San Diego, CA, site of this year’s Joint Statistical Meetings. I can’t believe it’s already been a year since I was inspired to start blogging during the JSM in Miami!

If you’re keeping tabs on this year’s conference, there’s a fair amount of #JSM2012 activity on Twitter. Sadly, I haven’t seen any recent posts on The Statistics Forum, which blogged JSM so actively last year.

Yesterday’s Dilbert cartoon was also particularly fitting for the start of JSM, with its focus on big data 🙂

U.S. Census Bureau releases API

The Census API, which was in the works for a while, was finally made publicly available yesterday (news release).

I’ve heard the DC dating scene is tough for single women… But especially for centenarians!

So far, two datasets are accessible:

  • the 2010 Census Summary File 1, providing counts down to the tract and block levels
  • the 2006-2010 American Community Survey five-year estimates, providing estimates down to the tract and block-group levels (but not all the way down to blocks)
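
To give a feel for what a request might look like, here is a small R sketch that only builds a query URL and does not send it. The base URL, the `dec/sf1` dataset path, the `get`/`for`/`key` parameter layout, and the variable code `P001001` (total population) are assumptions on my part and should be checked against the developers page; `build_census_url` is a hypothetical helper, not part of any package.

```r
# Build (but do not send) a query URL for the Census API.
# The base URL, dataset path, and variable code used below are
# assumptions and should be checked against the developers page.
build_census_url <- function(year, dataset, variables, geography,
                             key = NULL) {
  query <- c(
    paste0("get=", paste(variables, collapse = ",")),
    paste0("for=", geography)
  )
  # An API key, if you have one, is passed as one more query parameter.
  if (!is.null(key)) query <- c(query, paste0("key=", key))
  paste0("https://api.census.gov/data/", year, "/", dataset,
         "?", paste(query, collapse = "&"))
}

# Total population (assumed variable code) for every state,
# from the 2010 Census Summary File 1.
url <- build_census_url(2010, "dec/sf1", c("NAME", "P001001"), "state:*")
print(url)
```

Keeping the URL construction in one small function makes it easy to swap in a different dataset or geography, such as tracts within a single state, once the exact parameter names are confirmed.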

The developers page provides more information and showcases a couple of the first apps to use the API, including one by Cornell’s Jan Vink (whose online poverty maps I’ve mentioned before).

For a handy list of the other government agencies with APIs and developers pages, check out the FCC’s developers page.

Polymath project: social problem-solving

Earlier this week, Argentina hosted the 53rd International Math Olympiad (IMO), a mathematical problem-solving contest for high school students from all over the world. That means it’s almost time for another “mini-polymath” project!

Edit: As of Friday morning (7/13/2012), the problem still has not been completely solved, so there’s time to chime in on the discussion thread!

For the past few years, mathematician Terry Tao has hosted and coordinated a social problem-solving event, where people around the world use a blog and wiki to work together on one of that year’s IMO problems. His 2009 post is a good introduction to the event and the spirit behind it. Personally, I had a blast trying to contribute (if only a tiny bit) to the 2010 event.

Dang, I almost had comment 42!

Tao will be hosting a fourth “mini-polymath” tonight (July 12, 2012), starting at UTC 22:00, which is 6pm EDT for us here on the US East Coast. If you read blogs like mine, I imagine you’d enjoy participating, or at least following along and watching the mathematical ideas going off like fireworks 🙂

useR 2012: main conference braindump

I knew R was versatile, but DANG, people do a lot with it:

> > … I don’t think anyone actually believes that R is designed to make *everyone* happy. For me, R does about 99% of the things I need to do, but sadly, when I need to order a pizza, I still have to pick up the telephone. —Roger Peng

> There are several chains of pizzerias in the U.S. that provide for Internet-based ordering (e.g. www.papajohnsonline.com) so, with the Internet modules in R, it’s only a matter of time before you will have a pizza-ordering function available. —Doug Bates

Indeed, the GraphApp toolkit … provides one (for use in Sydney, Australia, we presume as that is where the GraphApp author hails from). —Brian Ripley

So, heads up: the following post is super long, given how much R was covered at the conference. Much of this is a “notes-to-self” braindump of topics I’d like to follow up on. I’m writing up the invited talks, the presentation and poster sessions, and a few other notes. The conference program has links to all the abstracts, and the main website should collect most of the slides eventually.

Maps of changes in area boundaries, with R

Today a coworker needed some maps showing boundary changes. I used what I learned last week in the useR 2012 geospatial data course to make a few simple maps in R, overlaid on OpenStreetMap tiles. I’m posting my maps and my R code in case others find them useful.
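
The core of such a map is just two sets of outlines drawn on the same axes. The sketch below shows that overlay step with base graphics only: the coordinates are invented, there is no OpenStreetMap background, and real boundaries would come from Census shapefiles and not from hand-typed matrices. It is a simplified stand-in, not the code from the post.

```r
# Invented coordinates for one block group before and after a boundary
# change; real boundaries would come from Census shapefiles.
old_bg <- cbind(x = c(0, 4, 4, 0), y = c(0, 0, 3, 3))
new_bg <- cbind(x = c(0, 5, 5, 2, 0), y = c(0, 0, 2, 3, 3))

# Hypothetical output file in a temporary directory.
out_file <- file.path(tempdir(), "boundary_change.pdf")
pdf(out_file, width = 6, height = 5)

# Set up an empty plot covering both polygons, with equal axis scaling
# so the shapes are not distorted.
all_pts <- rbind(old_bg, new_bg)
plot(all_pts, type = "n", asp = 1, xlab = "", ylab = "",
     main = "Block-group boundary: 2000 vs. 2010 (toy data)")

# Old boundary as a dashed outline, new boundary as a solid one, so
# the two remain distinguishable where they overlap.
polygon(old_bg, border = "red", lty = 2, lwd = 2)
polygon(new_bg, border = "blue", lty = 1, lwd = 2)
legend("topright", legend = c("2000", "2010"),
       col = c("red", "blue"), lty = c(2, 1), lwd = 2, bty = "n")

invisible(dev.off())
```

Using different line types as well as different colours keeps the two vintages readable in grayscale printouts and where the outlines coincide.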

A change in Census block-groups from 2000 to 2010, in Mobile, AL

useR 2012: impressions, tutorials

First of all, useR 2012 (the 8th International R User Conference) was, hands down, the best-organized conference I’ve had the luck to attend. The session chairs kept everything moving on time, tactfully but sternly; the catering was delicious and varied; and Vanderbilt University’s leafy green campus and comfortable facilities were an excellent setting. Many thanks to Frank Harrell and the rest of Vanderbilt’s biostatistics department for hosting!

Plus there's a giant statue of bacon. What's not to love?

Pithy and pragmatic textbooks

I enjoy the rare statistics textbook that can take its subject with a grain of salt:

The practitioner has heard that the [random field] should be ergodic, since “this is what makes statistical inference possible,” but is not sure how to check this fact and proceeds anyway, feeling vaguely guilty of having perhaps overlooked something very important.
Geostatistics: Modeling Spatial Uncertainty, by Chilès and Delfiner.

It’s a familiar feeling!
As Chilès and Delfiner wryly suggest, we statisticians could often do a better job of writing for beginners or practitioners. We should not just state the assumptions needed by our tools, but also explain how sensitive results are to the assumptions, how to check these assumptions in practice, and what else to try if they’re not met.