In our first R blog post, Everything You Need to Know About R, we explained what R is and why it is widely used in advanced analytical practices. Indeed, R was ranked the #1 software for data science according to Rexer Analytics’ survey of analytics tools, the industry benchmark for the state of analytics. This blog post continues exploring R by showing how to run sample code written in R.

Environment

Selecting the right tool for the job is an important decision. R is available for download from the R Project homepage and comes with a simple console where you type commands at the prompt. This environment is much faster and more versatile than a graphical user interface (GUI), but for heavy-duty programming, a more sophisticated development environment is highly recommended.

Fortunately, there is a wide range of Integrated Development Environments (IDEs) that either support R or were built for it. Some free and open source IDEs, such as RStudio, StatET, and ESS (Emacs Speaks Statistics), are popular in the R community. Another option is the recently announced Microsoft R Open, an open source enhanced distribution of R. It offers faster performance through multi-threading (useful for highly mathematical operations) and support for reproducibility.

Packages

R functions and datasets are stored as packages. The basic functions that allow R to work, analyze datasets, and apply standard statistical and graphical functions are all included in the R environment out of the box. Additionally, there are over 7,000 freely available packages created and shared on the Comprehensive R Archive Network (CRAN). These community-developed external libraries provide tens of thousands of functions that make it possible to accomplish significant analysis with minimal code authoring.
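
To see what is already available in a given installation, the package library can be inspected from the console. The following is a minimal sketch using only base R; the variable names are illustrative, and the counts you see will depend on your own setup.

```r
# Packages that ship with R itself (priority "base"), e.g. stats and graphics
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Every package installed in this R library, including add-ons from CRAN
all_pkgs <- rownames(installed.packages())
cat("Installed packages:", length(all_pkgs), "\n")

# Check whether a specific add-on package is installed
# (caTools is the package used later in this post)
"caTools" %in% all_pkgs
```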

Sample Code

For this blog post, I will use sample code available from the R Programming Language Wikipedia page. The code calculates the Mandelbrot set, one of the most famous fractal patterns, through the first 20 iterations of the equation z = z^2 + c (starting with z = 0), plotted for different complex constants (c). The code uses the caTools package and creates an animated image (Mandelbrot.gif) displaying how the Mandelbrot set evolves from one iteration to the next.

Piraeus Consulting - R Code
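
The core of that calculation can also be sketched in base R alone. This is not the Wikipedia listing: it draws a single static image rather than an animated GIF, it does not use caTools, and the function name `mandelbrot_mask`, the grid size, the plotted range of c, and the escape radius of 2 are all assumptions chosen for illustration.

```r
# Return a logical matrix: TRUE where z stays bounded (|z| <= 2) through all
# iterations of z = z^2 + c, starting from z = 0
mandelbrot_mask <- function(n = 200, iterations = 20) {
  re <- seq(-2.2, 1.0, length.out = n)   # real parts of c (columns)
  im <- seq(-1.2, 1.2, length.out = n)   # imaginary parts of c (rows)
  C  <- outer(im, re, function(y, x) complex(real = x, imaginary = y))
  Z  <- matrix(0 + 0i, n, n)             # z starts at 0 for every c
  inside <- matrix(TRUE, n, n)
  for (k in seq_len(iterations)) {
    Z[inside] <- Z[inside]^2 + C[inside] # update only points still bounded
    inside <- inside & (Mod(Z) <= 2)     # drop points that have escaped
  }
  inside
}

mask <- mandelbrot_mask(n = 200, iterations = 20)

# Draw the result; t() puts the real axis on x and the imaginary axis on y
image(t(mask) * 1, col = c("white", "black"), axes = FALSE)
```

Points that escape are frozen rather than iterated further, which keeps the values from overflowing; raising `iterations` sharpens the boundary at the cost of more computation.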

Running the Code

I will use the RStudio IDE to run the Mandelbrot set code. Note that RStudio requires an existing installation of R in order to work. On initial startup, the RStudio IDE is divided into four distinct areas or panes, illustrated in the screenshot below:

Piraeus Consulting - Running R

1. Console – This is where you interactively run R commands and see the output.

2. Source Editor – Place for writing R scripts.

3. Environment – Workspace to view objects in the global environment – includes data viewer for inspecting datasets.

– History – Searchable command history.

4. Files – Pane for interacting with files.

– Plots – Dedicated plots pane.

– Packages – Dedicated R package manager.

– Help – Integrated R help and documentation.

– Viewer – Pane for displaying local web content.

When RStudio initially opens, it connects to a default (working) directory. This is important because the code saves its output, the Mandelbrot.gif file, in that directory. You can check the directory before running the code by typing getwd() into the console and pressing Enter.
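
As a small sketch of that check, the lines below print the working directory and build the path where Mandelbrot.gif would end up. The variable names are illustrative, and the directory in the commented-out `setwd()` call is a hypothetical example.

```r
wd <- getwd()                     # current working directory
print(wd)

# Full path where Mandelbrot.gif would be written
out_file <- file.path(wd, "Mandelbrot.gif")
print(out_file)

# TRUE only after the code has run and written the file
file.exists(out_file)

# To save somewhere else, change the directory first (hypothetical path):
# setwd("~/r-examples")
```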

With RStudio open, all you need to do is copy the Mandelbrot set code into the Console, make sure that the cursor is positioned at the end of the last line of code, and press the Enter key to execute it. Note that the sample code uses the caTools package, which is installed and activated by the first two lines of the code:

Piraeus Consulting - R code 1
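
If you expect to run the script more than once, the installation step can be guarded so that it only downloads a package that is missing. The helper below is a hypothetical convenience function, not part of the sample code; it relies only on the base functions `requireNamespace()` and `install.packages()`.

```r
# Install a package only if it is missing; returns TRUE when it is available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)          # downloads from CRAN; needs network access
  }
  requireNamespace(pkg, quietly = TRUE)
}

ensure_package("stats")   # ships with R, so nothing is downloaded
```

For the sample code, you would call ensure_package("caTools") and then library(caTools) to activate the package. Outside RStudio, install.packages() may first ask you to choose a CRAN mirror.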

The output of the code is written to the Mandelbrot.gif file, which you can open by double-clicking it in the Files pane.

Piraeus Consulting - Files Pane

Piraeus Consulting - GIF

I hope you liked the result. If you decide to adjust any parameters (e.g., change delay to 100), you can modify the code and execute it again in the same session. If you stay in the same session, you should skip the first two lines that install and activate the caTools package, but if you close the console window or the IDE, you will need to activate any previously installed external packages again.
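
One way to tell whether a package is still active in the current session is to look at R's search path. The helper name `is_attached` below is an assumption made for this sketch; `search()` itself is a base R function.

```r
# TRUE if the package is attached (activated) in the current session
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

is_attached("base")      # always TRUE
is_attached("caTools")   # TRUE only after library(caTools) in this session
```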

R is a powerful tool that includes virtually every data manipulation, statistical model, and chart that the modern data scientist could ever need. You can easily find, download, and use cutting-edge, community-reviewed methods in statistics and predictive modeling from leading researchers in data science, all free of charge! In addition, R’s graphics capabilities are very sophisticated in representing complex data and have been featured in many of the infographics seen in the New York Times, The Economist, and the FlowingData blog.

I hope this blog post has inspired you to start exploring R and that you will check back to read our future posts about R packages, R visualization, and R integration with SQL Server 2016.

By Aga Przysucha | Senior Consultant, Development