This post will demonstrate how to connect R to Microsoft SQL Server, so that data can be extracted directly from a database by using SQL-statements. The approach described in this post is supported by both SQL Server 2012 and the upcoming SQL Server “14”. You can connect to SQL Server using different techniques – one of them is by using ODBC. This post will use ODBC.
The first time you need to connect to a database, you need to perform some one-time tasks, which are:
- Create a ODBC DSN data source
- Install necessary R-packages from CRAN
The screenshot below shows a table containing the well-known data set ‘weather.nomnial’. The table is part of a database named ‘MiningDataSets’. The goal of this tutorial is to load all this data into R.
- Open “Administrative Tools” and “ODBC Data Sources (32 bit)”
- On the tab “User DSN” press “Add”
- Select “SQL Server” in the provider list
- Now give the data source a name and a description. Remember this name – you will need it later on in the process. Last provide the name of the server where the SQL Server is installed. Press “Next”.
- Select the way in which you will authenticate against the SQL Server. In this case we use Integrated Security. Press “Next”.
- You now have the possibility to select the default database for the data source. Here we choose to point at the ‘MiningDataSets’. Press “Next” and the “Finish”.
- On the last page remember to press the “Test Connection” button and ensure that the connection could be established.
- The User DSN is now created and active.
Install and load RODBC
Support for SQL Server is not possible using native R – therefore we have to install the RODBC package from CRAN.
- Open up R and in the console window type: install.packages(“RODBC”)
- The RODBC packages is now downloaded and installed on your system. Next step is to load the package into R so the functions of the package can be used. In the console window type: library(“RODBC”)
- The RODBC package is now loaded and ready
Connect with R
Now it is time to connect to the SQL Server database from R and retrieve the nominal weather dataset.
- When calling the database you first have to open a connection to the database. And after you have performed you operations, you have to close the connection again. This is done by using the commands odbcConnect() and odbcClose(). The name specified as the parameter is the name of the ODBC user data source.
- First we want to read an entire table into R – this can be done by using the sqlFetch command.
- If you want to load the data into a data.frame, this can easily be done
- If you want to execute a specific SQL-query, this can be done using the sqlQuery-command.
You can use many more SQL-statements against the database by using different RODBC-functions. You can get more help about the RODBC-package by typing RShowDoc(“RODBC”, package=”RODBC”)
I hope this little tutorial helped you in connecting R to SQL Server 2012 / “14”.
If you are interested in using the data mining workbench WEKA together with SQL Server, the see my tutorial on that here.