I’ve seen R users swooning over the magrittr package for a while now, but I couldn’t make heads or tails of all these scary %>% symbols. Finally I had time for a closer look, and it seems potentially handy indeed. Here’s the idea and a simple toy example.
It can be confusing and messy to write (and read) nested function calls from the inside out. This is especially true when the functions take multiple arguments. Instead, magrittr lets you write (and read) them from left to right.
Say you need to compute the LogSumExp function, and you’d like your code to specify the logarithm base explicitly.
In base R, you might write
log(sum(exp(MyData)), exp(1))
But this is a bit of a mess to read. It takes a lot of parentheses-matching to see that the exp(1) is an argument to log and not to one of the other functions.
Instead, with magrittr, you program from left to right:
MyData %>% exp %>% sum %>% log(exp(1))
The pipe operator %>% takes the output from the left and uses it as the first argument of the function call on the right. Now it’s very clear that the exp(1) is an argument to log.
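To check that the two styles really compute the same thing, here is a minimal sketch. It assumes the magrittr package is installed, and the MyData vector is made up purely for illustration.

```r
library(magrittr)

# Hypothetical toy data; any numeric vector would do
MyData <- c(1, 2, 3)

# Base R: nested calls, read from the inside out
nested <- log(sum(exp(MyData)), exp(1))

# magrittr: the left-hand side becomes the first argument of each call,
# so the last step is equivalent to log(<sum>, exp(1))
piped <- MyData %>% exp %>% sum %>% log(exp(1))

# Both versions should agree
same_result <- isTRUE(all.equal(nested, piped))
print(same_result)
```

Note that a bare function name such as exp or sum works on the right of the pipe, while log needs parentheses here only because it takes the extra base argument.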
There’s a lot more you can do with magrittr, but code with fewer nested parentheses is already a good selling point for me.
Apart from cleaning up your nested functions, this approach to programming might be helpful if you write a lot of JavaScript code, for example if you make D3.js visualizations. R’s magrittr pipe is similar in spirit to JavaScript’s method chaining, so it might make context-switching a little easier.
> “It takes a lot of parentheses-matching to see that the exp(1) is an argument to log and not to one of the other functions.”
This is why you should explicitly name function arguments when they are ambiguous. E.g.
log(sum(exp(MyData)), base = exp(1))
Agreed. I usually do, and that’d help a lot in this toy example.
Still, I can imagine cases with more nested functions, with similarly- or ambiguously-named 2nd arguments (y, n, etc) where magrittr style would make a bigger difference to readability.
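The two suggestions are not mutually exclusive. As a rough sketch (assuming magrittr is installed, and with a made-up MyData vector), the base can be named inside the pipe as well:

```r
library(magrittr)

# Hypothetical toy data for illustration
MyData <- c(1, 2, 3)

# Nested call with the base named explicitly
nested_named <- log(sum(exp(MyData)), base = exp(1))

# Piped call with the base named explicitly; the piped value
# still fills the first (unnamed) argument of log
piped_named <- MyData %>% exp %>% sum %>% log(base = exp(1))

named_match <- isTRUE(all.equal(nested_named, piped_named))
print(named_match)
```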
I find that the %% syntax also helps reduce a lot of redundant typing and makes things easier to follow, too. It pipes the left side as the first argument to the right, then reassigns the result to the variable on the left. So
some_string_date_var %% as.Date(format="%Y%m%d")
and now some_string_date_var is a Date variable, with no need to repeat the variable name and risk extra typos 🙂
The parser ate my pipes! It’s %<>%
Oh, that is very handy indeed. I’ll have to try it out.
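For anyone who wants to try the compound assignment pipe from this thread, here is a minimal sketch. It assumes magrittr is installed; the date strings are invented for illustration.

```r
library(magrittr)

# Hypothetical character vector of dates in YYYYMMDD form
some_string_date_var <- c("20150101", "20150215")

# %<>% pipes the left side into as.Date() as its first argument,
# then assigns the result back to the same variable
some_string_date_var %<>% as.Date(format = "%Y%m%d")

# The variable has been overwritten in place with Date values
print(class(some_string_date_var))
```

Without the pipe, the same step would be written as some_string_date_var <- as.Date(some_string_date_var, format = "%Y%m%d"), which repeats the variable name.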
Nothing has convinced me about this pipe yet, but the world pushes me to use it… 🙁 I suppose I have to.
Who is pushing you to use it? Are you collaborating on code with others who use it? Or do you work somewhere with a strict style guide?
I’m just curious how coding standards change, since I haven’t had this happen to me much.
Hello civilstat: Indeed, no one is pushing me, but it is some kind of trend, you know. More and more people are using it and we have to adapt to it. Now it is not so nice to write log(sum(exp(MyData)), exp(1)); one writes MyData %>% exp %>% sum %>% log(exp(1)) instead.
For those who would like to pipe but aren’t completely happy with magrittr, there’s a competing package, “pipeR,” which I find makes it a bit easier to pass multiple outputs from one stage to the next.