This is a newbie experiment with R. My experience with statistics is from way back, via Fortran and SAS. Some months back I had a quick play with, and taster of, Incanter (‘R like’, Clojure based, i.e. Lisp’y). Someone recently said to me, ‘we use R’. And that is enough reason for me to experiment.
Actually the experiment taught me a little more in the broader sense. R is the big thing in open source stats. Within stats alone, its facilities are immense, and various language interfaces are available. Specifically, there is also JRI, a Java/R Interface, which provides scope for Groovy + R on Java SE and EE. I shall try that another time. I was also suitably impressed by an alternative option (not R): the NCAR Command Language (NCL), along with PyNGL / PyNIO, which provide Python interfaces. Some histograms here. A fuller 2D scientific visualisation gallery is here.
For now, this experiment just happens to be R via Python. Rpy2 to be exact.
My base example is this c/o Rowntree’s Statistics without Tears, page 47 to be exact.
And this is what I created: a faithful reproduction using Rpy2, i.e. Python’s newer interface to R (a step up from Rpy).
Doing this was easy, I think. The general steps:
- Downloaded R (2.11.0)
- Had an interactive play, but not familiar with the R syntax
- Read about Rpy, and liked the higher level Python abstractions via Rpy2
- I already had Python 2.6.3, so I grabbed the bits I needed:
  - Rpy2 (2.0.8)
  - Numpy 1.5.0b2
  - Pywin32 (so I could play on Windows)
There is a GOTCHA. If, like me, you only occasionally Python, then you probably use Idle (Python’s equivalent to Groovy’s groovyconsole). DON’T. I didn’t grasp at first that Pythonwin (the Python IDE and GUI Framework for Windows) is needed for an R graphics device to actually run, rather than just start and then lock up. I almost gave up, but I knew it was just ignorance on my part.
Here’s the code, and like the Incanter / Clojure code I wrote, so terse!
import rpy2.robjects as robjects
# hate commas (Clojure +1)
x = [89, 68, 92, 74, 76, 65, 77, 83, 75, 87, 85, 64, 79, 77, 96, 80, 70,
     85, 80, 80, 82, 81, 86, 71, 90, 87, 71, 72, 62, 78, 77, 90, 83, 81,
     73, 80, 78, 81, 81, 75, 82, 88, 79, 79, 94, 82, 66, 78, 74, 72]
assert len(x) == 50
# at >>> command prompt to see the data
#x.sort()
#x
xI = robjects.IntVector(x) # pass to R
# Breaks could also be passed like this, but how to pass 2nd param into 'hist' string? - my Python failed!
# bins = robjects.r.seq(55,100,by=5)
# include.lowest takes an R logical (TRUE), not the string "TRUE"
robjects.r('hist(%s, main="Pulse rates of 50 students", xlab="Pulse rate (beats per minute)", \
ylab="Frequency (number of students in each grp)", include.lowest=TRUE, \
col="blue", breaks=seq(60,100,by=4.9))' % xI.r_repr())
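On the question in the comment above about getting a second parameter into `hist`: one possible answer, sketched here rather than taken from the original experiment, is to skip the string altogether and call R's `hist` as a Python function through `robjects.r`, passing the breaks vector as a named argument. The helper names (`hist_kwargs`, `plot_pulse_histogram`, `class_counts`) and the `png_path` parameter are hypothetical, chosen only for illustration. The optional PNG output assumes that a file device does not need the GUI event loop that tripped up Idle, which has not been checked here. The last helper is plain Python and simply tabulates the 5-wide classes so the bars can be cross-checked by eye.

```python
# Pulse rates of the 50 students, as in the listing above.
x = [89, 68, 92, 74, 76, 65, 77, 83, 75, 87, 85, 64, 79, 77, 96, 80, 70,
     85, 80, 80, 82, 81, 86, 71, 90, 87, 71, 72, 62, 78, 77, 90, 83, 81,
     73, 80, 78, 81, 81, 75, 82, 88, 79, 79, 94, 82, 66, 78, 74, 72]


def hist_kwargs(breaks):
    """Named arguments for R's hist() (hypothetical helper).

    R argument names containing a dot (include.lowest) are not valid
    Python keywords, so they travel in a dict that is unpacked with **.
    """
    return {
        "main": "Pulse rates of 50 students",
        "xlab": "Pulse rate (beats per minute)",
        "ylab": "Frequency (number of students in each grp)",
        "col": "blue",
        "breaks": breaks,
        "include.lowest": True,  # a Python bool, not the string "TRUE"
    }


def plot_pulse_histogram(values, png_path=None):
    """Call R's hist() as a function instead of formatting an R string."""
    import rpy2.robjects as robjects  # lazy import: needs R and rpy2 installed

    # The "second parameter": an R vector built by R's own seq().
    bins = robjects.r.seq(60, 100, by=4.9)
    if png_path is not None:
        robjects.r.png(png_path)  # assumption: draw to a file, not a window
    try:
        robjects.r.hist(robjects.IntVector(values), **hist_kwargs(bins))
    finally:
        if png_path is not None:
            robjects.r["dev.off"]()  # close the file device


def class_counts(values, start=60, stop=100, width=5):
    """Count integer values in classes 60-64, 65-69, ... (plain Python)."""
    counts = [0] * ((stop - start) // width)
    for v in values:
        counts[(v - start) // width] += 1
    return counts
```

With R and rpy2 installed, `plot_pulse_histogram(x)` would draw to the current graphics device and `plot_pulse_histogram(x, "pulse_hist.png")` would write a file instead. `class_counts(x)` needs neither, and for these integer pulse rates the `by=4.9` breaks fall into the same classes (60-64, 65-69, and so on).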
Conclusions: I think Clojure is the better language! But R itself is where the meat is. Incanter will have some serious work ahead of it to make inroads here. It seems like a worthy cause though. The Python API by comparison feels a little like an inferior ‘text’ passing exercise! But it gives access to all that R offers from a fully fledged programming language, one that offers much of the best in scientific visualisation and more, with fast and easy coding. That also makes it a winner for those who are Science bods first and IT geeks second. Me, I’m the other way around: my Earth Science background plays second fiddle to my Computing Science, but I love both :-)




