In my quest to replace Excel with R I’ve been spending the past week trying to do everything in R. It hasn’t been that easy with many things taking longer due to me having to reference the R docs but one thing that’s been great so far is being able to quickly run a query on Amazon’s RDS and pull data into a data frame for quick analysis. Being able to wrap this into a reusable function makes things even better. The one thing that makes it tricky was not being able to connect to RDS directly but having to tunnel through an EC2 instance. Below are the steps to replicate the setup.

In your shell, run the following command to set up the SSH tunnel:

ssh -L <local port>:<rds host>:<rds port><ec2 user>@<ec2 instance>

Now in R:

install.packages(RMySQL) # Install the R MySQL library
library(RMySQL) # Load the library
m<-dbDriver("MySQL") # Load the driver
con<-dbConnect(m,user='username',password='pass',host='127.0.0.1',dbname='db',port=3307) # Connect to the local instance
res<-dbSendQuery(con, 'select * from table') # Execute the query
data <- fetch(res, n = -1) # Load the retrieved data into a data frame
dbClearResult(dbListResults(con)[[1]]) # Use this to free the connection

Read more!