Monday, February 20, 2017


Visual Studio versus RStudio


SQL Server 2016 introduced support for the R language and an integrated R server.  For the first time, many Microsoft developers are looking at R programming.  One of the first questions raised is “which integrated development environment (IDE) should I use?”  Many will be relieved to learn that there is a Visual Studio extension called R Tools for Visual Studio (RTVS) that supports the R language.  However, there is another popular IDE called RStudio.  Both products are free and open source.  This leaves the question: which one should I use?  Which is better?  Better is a relative term, since it depends on what you want to do.  In this article, we will discuss the strengths and weaknesses of each.  Where possible, the evaluation is objective, but I feel I would be remiss if I did not offer my own assessment as well.  Bear in mind that both platforms are changing constantly, so the information here will become dated.

R Tools for Visual Studio


The screen shot below shows an R project open in R Tools for Visual Studio (RTVS).  We can see that this is a robust tool with many features.  Note: Integrated visualizations are a new concept to Visual Studio.  The left panel shows an R script in the editor.  Multiple scripts can be edited simultaneously.  The upper middle window shows a new feature, the SQL query editor, where queries can be created and executed.  Below this is the output window, which shows the SQL query results.  To the right, we see the plot window.  The plot scrolling list on the right margin supports browsing plot history.  Note:  There are more optional tabs and windows that are not displayed.

R Tools for Visual Studio Screen















RStudio

The screen shot below shows a typical RStudio session.  The upper left quadrant is where scripts are edited.  Each tab is a separate script.  Below this is the R console, where the scripts execute and output is displayed.  The upper right quadrant shows the workspace, which lists all objects in memory.  The hidden History tab stores a list of all the R statements executed in the session.  Below this is the Plots tab, which shows the most recent visualization.  The Files tab (hidden) shows a list of files in the current folder.  The Packages tab (hidden) shows a list of all installed packages.  The Help window provides some quick links to documentation and is where requests for help are displayed.  Note:  All the features shown are also available in RTVS, but the settings used for that screen shot did not display them all.

RStudio Screen





 

Decision Summary


Visual Studio with RTVS and RStudio are competitive in terms of features thanks to Microsoft’s quick enhancements in recent months.  As of this writing, however, RTVS is still not considered a production release, so the decision of which IDE to use needs to be made with an eye towards the future.  Microsoft’s first preview of RTVS was in March 2016, and by October of the same year there had been five releases, each with substantial improvements.  With that in mind, the differences tend to reflect the strengths of each platform: RTVS has extra features specific to Windows, .Net, and multiple languages, while RStudio supports features that favor RStudio products, such as integrated deployment of Shiny apps to the shinyapps.io server and of R Markdown documents to RPubs, both of which are owned by RStudio.

If you are a developer familiar with Visual Studio who uses the Microsoft stack or works with multiple languages, RTVS is probably a good choice.  Microsoft is likely to continue adding functionality to integrate RTVS with tools like SQL Server, .Net, Office, SharePoint, and Azure.  Visual Studio’s powerful development and deployment features, together with continuously improving R support, make this a good choice overall.

However, if you are a data scientist or a developer focused solely on R, RStudio may be the better choice.  RStudio’s interface is simpler and more intuitive, partly because everything is focused on R.  RStudio has outstanding support for document publishing to virtually any output format.  Creating a slide presentation, an article, a book, or even a web application is simple, and the result is immediately viewable within RStudio.  Many, if not most, books on the R language were completely written and published with RStudio.  With the click of a button, you can deploy your files to a free web server as long as the resource demands fall within limits.
If you are still not sure which IDE is best for you, read on to get more details that may help you make your decision.

 Features


RStudio and RTVS are both robust products with many features.  To discuss these, we’ll start out with a list of features that each product supports.  Below is a table listing key product features.  

 


RTVS feels like a professional development platform, as one would expect.  The extensive tools for performance testing, environment analysis, deployment, etc. betray the fact that this is a development platform first and a data science tool second.  For example, debugging is well supported and Microsoft has committed to making debugging a first-class feature.  Many languages, including Python, C, C++, C#, and Java, are supported, making Visual Studio a powerful yet complex tool.

RStudio comes from the data science world, where publishing your work is critical.  It has superb and easy-to-use publishing features that developers often overlook.  With a few mouse clicks (and a bit of text formatting), your R script becomes a slide show.  The company RStudio has authored many popular R packages including dplyr, ggplot2, and shiny.  RStudio quickly implements support for new data science related features such as R Notebooks.  It seems likely that integrated support for RStudio packages will come first to RStudio.  Overall, RStudio does a better job supporting the iterative data exploration required by data science.


Features Specific to RStudio

Feature
The manipulate package, which supports interactive widgets like drop-downs and sliders to manipulate plots interactively.
Excellent and extensive documentation and training materials.  RTVS still has minimal documentation, which made even doing this comparison a challenge.
Is in production release status.  RTVS is not at the time of this writing.
Easy click to deploy R Markdown documents or Shiny apps to a free web server.
Support for R Presentations, which tie several publishing features together for ease of use.
Wizard to import data from competing formats such as SAS and SPSS.
Wizard to locate and install new packages.
32 and 64 bit versions of R supported.  RTVS only supports 64 bit.
Support for multiple platforms: Linux, Windows, and Apple OS X.  RTVS is available only where Visual Studio can run, which admittedly is expanding.
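
To give a feel for the manipulate feature listed above, here is a minimal sketch.  It assumes the manipulate package is installed and that the code is run inside RStudio (the package does not work elsewhere); the helper name plot_cars is made up for illustration, and cars is a data set that ships with R.

```r
# Hypothetical helper: plot the built-in `cars` data with an adjustable x-axis limit.
plot_cars <- function(x_max) {
  plot(cars, xlim = c(0, x_max), main = "Stopping distance vs. speed")
  invisible(x_max)
}

# manipulate() only works inside RStudio, so guard the call.
# Assumption: RStudio sets the RSTUDIO environment variable to "1".
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
if (interactive() && in_rstudio && requireNamespace("manipulate", quietly = TRUE)) {
  manipulate::manipulate(
    plot_cars(x_max),
    x_max = manipulate::slider(15, 25, initial = 25)  # slider shown next to the plot
  )
}
```

Inside RStudio, moving the slider re-runs plot_cars() with the new value; outside RStudio the guarded block is simply skipped.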


Pros of RTVS

In general, RStudio has more features than RTVS.  Microsoft admits this on their RTVS download site at: http://microsoft.github.io/RTVS-docs/.  Specifically, they said “RStudio is a fantastic and mature IDE for R that’s been under development for years. RTVS is a long way from RStudio, because we’ve only been developing it since July, 2015. We hope to have all the critical features that you need to be successful this summer.”  It’s a fair point.  Considering the short time in which they developed RTVS, it is amazing that it supports so many features.  No doubt Microsoft will continue to improve RTVS, but RStudio is also improving their product.

Beyond raw features, there are some good reasons to prefer Visual Studio.  Some of these are:
  • Already trained in Visual Studio.
  • One IDE for multiple platforms and languages.
  • Expectation of ongoing improvements in support for SQL Server R Integration features.
  • Helpful if you need to integrate R programs with other languages such as Python or C#.
  • Commercial product support.
Developers are not always aware of the difference between data science and programming.  Data science is an iterative trial-and-error process of analysis, research, data wrangling, and experimentation.  Programming is the process of taking a set of requirements and automating them.  RStudio has better support for the former while Visual Studio is designed for the latter.  In other words, the data scientist will analyze data to build a model which has business value but is not necessarily scalable or generalized.  For the enterprise to get the value from this work, it needs to be operationalized, i.e. cleaned up, redesigned for performance, perhaps moved into SQL Server, and automated, perhaps via a job scheduler.  In the long term, Visual Studio may be the better choice for this operationalization.  Currently, I think both IDEs can be used with SQL Server/R integration quite well.

Pros of RStudio

RStudio has all the features of RTVS in terms of the R language but is limited to the R language only.  If you want to program in multiple languages in the same IDE, RTVS is the only choice.  However, the focus on R allows RStudio to fully support the language in a very intuitive way.  In fact, the intangible aspect of intuitiveness is the best reason to choose RStudio.  This is where my opinion comes into play, so others may disagree.  I find RStudio’s features are right where you expect them to be, out on top and easy to find.  Just click the Preview button to view your Markdown document as a slide show and then publish to the server with a single button click.  Within minutes of installing RStudio I could fully use the IDE, whereas I found RTVS less intuitive.  Many of its features are not obvious, it took time to figure out how to do things, and there is scant documentation.  RStudio was designed to do exactly what it does: support data science.  Visual Studio was not designed for this; the tools were added in.  An example of this came when I tried to use a document type called an R Notebook.  This is a special interactive format where output is rendered inline, right in the editor.  It was inspired by the Jupyter Notebook project.  To get an idea of what I mean, see the screen shot below.

R Notebook


The above script is in a special format called R Markdown, hence the Rmd file extension.  The code between the ```{r} and ``` is called a code chunk and its output is rendered within the editor.  This creates a documentation flow that is very useful to data scientists and teachers, i.e. they can show their work in real time.  The code chunk can be edited and re-executed, causing the plot to refresh.  RTVS does not support R Notebooks.
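
For readers who have not seen the format, the following sketch writes a very small R Markdown file from R.  The title, text, and chunk label are made up for illustration and are not the contents of the screen shot; cars is a data set that ships with R.

```r
# Build a minimal R Markdown document in a temporary file.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"Cars\"",        # illustrative title
  "output: html_notebook",  # asks RStudio to treat the file as an R Notebook
  "---",
  "",
  "Stopping distance rises with speed:",
  "",
  paste0(fence, "{r cars-plot}"),  # chunk header: language plus an optional label
  "plot(cars)",                    # the R code whose output appears in the document
  fence                            # closing fence ends the chunk
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")  # print the file to see the layout
```

Opening such a file in RStudio shows the plot directly under the chunk when the chunk is run.  Assuming the rmarkdown package and pandoc are installed, calling rmarkdown::render(rmd_file) should convert it to the output format named in the header.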

Beyond the notebook functionality, RStudio has amazing publication support.  An R Markdown script can be presented as a slide show, paginated, or as flowing text, and converted into many formats including Word, PDF, and HTML.  Many books on the R language were completely written and published from RStudio.  To give an idea of how extensive publishing support is in RStudio, consider the screen shot below.  We can see that many types of new files are supported, including R Presentation, R HTML, R Markdown, and R Sweave, which are all aimed at publishing.




Publishing
If we select R Markdown from the above list, we are presented with several options.  We can choose HTML, PDF, or Word (which RTVS also supports), but we can also choose Presentation and Shiny, which give us more options.


If we choose R Presentation, we can further choose among several options as we can see in the screen shot below.  This is very handy when you need to create a slide show that incorporates dynamic code execution. 


RStudio even supports creating a file from a template as shown below. 


The support for document publishing is important but, I think, less appreciated by developers than by data scientists.  For teaching, it is invaluable.  I no longer need to create a PowerPoint slide show from my R code.  Being able to maintain it all in one file is a godsend to a speaker.  This is also useful for preparing a presentation to managers and colleagues explaining how you arrived at your data analysis and conclusions.  For research, it means you can send your R project directly to professional journals; in fact, most expect this format.

Shiny

Programmers like shiny things, hence the name Shiny for RStudio’s interactive web application support for R.  Think of it as server-side R scripting pages, much as C# supports Active Server Pages (ASP) and Java supports Java Server Pages (JSP).  When you run code that includes the Shiny package, RStudio launches an R web service to support interactive R applications.  Shiny is an extensive framework with many functions to support interactive widgets.  You can develop and test your Shiny application on your machine and deploy it to a cloud-based Shiny server to make it available for others to use.  For a modest Shiny application, the Shiny server is free, but you can pay for commercial scale support if desired.
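
As a rough illustration of what a Shiny application looks like in code, here is a minimal sketch.  It assumes the shiny package is installed; the input and output names (bins, hist) are made up for illustration, and faithful is a data set that ships with R.

```r
# Assumes the shiny package is installed; the block is skipped otherwise.
if (requireNamespace("shiny", quietly = TRUE)) {
  # User interface: one slider widget and a placeholder for a plot.
  ui <- shiny::fluidPage(
    shiny::sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
    shiny::plotOutput("hist")
  )

  # Server logic: redraw the histogram whenever the slider value changes.
  server <- function(input, output) {
    output$hist <- shiny::renderPlot({
      hist(faithful$waiting, breaks = input$bins,
           main = "Waiting time between eruptions")
    })
  }

  app <- shiny::shinyApp(ui = ui, server = server)

  # Only start the local web service in an interactive session.
  if (interactive()) shiny::runApp(app)
}
```

The ui object describes the page and the server function supplies the R code behind it, which is the split between page and server-side script described above.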
Both RTVS and RStudio support Shiny (R web pages), but I think RStudio has better support.  First, it has better integration.  Second, it has wizards to easily deploy your app to a free web server.  Third, RStudio can render Shiny apps within the IDE.  The screen shot below is an example of a Shiny app.

Conclusion

RStudio and RTVS are moving targets and you will need to monitor the progress of each as time goes on.  This article covers highlights of the tools but is not exhaustive.  RTVS is rapidly expanding its features, but I think the main point to consider is the direction of those features.  Clearly, integration with the Microsoft stack and Azure will be a high priority.  Features to support operationalizing R programs are on the horizon, and Microsoft is at the forefront in identifying and supporting this need.  However, if you are not on board with the Microsoft ecosystem, support for other tools such as Amazon Web Services, shinyapps.io, Shiny Server, Oracle, Jupyter Notebooks (versus Azure ML notebooks), etc. may be slow in coming.  Visual Studio does support open source products such as MySQL, PostgreSQL, Hadoop, Spark, and Python.  RStudio is likely to continue focusing on data science features, as it has been doing since its founding.  Better support for interactive visualizations, dynamic code, and data wrangling can be expected.
