Wednesday, December 23, 2015

Cool things about the R Programming Language

As a BI Consultant, I've been hearing about the R programming language used in analytics but put off actually learning it.  Then I saw a book on R and decided the time had come to finally delve in.  Here are a couple of thoughts I would like to share.

Ah, It's a Scripting Language - Whew!

The first thing I discovered is that R is a scripting language like PowerShell, Bash, Perl, and Python, to name a few.  This was a relief for me as I have done a lot of work with PowerShell, so the feel of R would be familiar.  Actually, as I learned about R, I became convinced that Microsoft got some of its PowerShell ideas from R.  For example, R has a command line interpreter (CLI) and, with RStudio, an integrated scripting environment (ISE) just as PowerShell does. In fact, I found RStudio, which is a separate download from base R, to be very similar to the PowerShell ISE.  As a scripting language, R is designed to be very interactive, so working one line at a time can be very effective for some tasks, whereas scripts are suited to repeatable automated work.

You can get the base R language CLI for Windows at https://cran.r-project.org/bin/windows/base/
RStudio for Windows can be downloaded from https://www.rstudio.com/products/rstudio/download/

 

All Variables are Arrays

The thing I find the most interesting about R is that all variables are arrays.  OK, to be more correct, I should say object collections.  However, R distinguishes different types of collections.  A single-dimensional array of one data type is called a vector.  Even if only one value is stored, it is a one-dimensional array.  A two-dimensional array of the same data type is called a matrix, i.e. just a grid.  What R calls an array can have any number of dimensions, but the term is mostly used for three or more, since a matrix is simply the two-dimensional case.  It is important to bear in mind that every one of these objects has a class, and we can check the class name using the class function as shown below.

> myvect = 1:10
> 
> class(myvect)
[1] "integer"
 
The first line above creates a vector (single-dimension array) named myvect and initializes it with element values 1 through 10. The second line asks for the class of the variable, which for a plain vector is the type of its elements. Note: Like PowerShell, any variable on a line by itself will be displayed as shown below.

 
> myvect
 [1]  1  2  3  4  5  6  7  8  9 10
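
To make the vector, matrix, and array distinction described earlier more concrete, here is a minimal sketch. The variable names single, grid, and cube are made up for illustration and are not from any package.

```r
# A single value is still a vector of length one.
single <- 5
is.vector(single)   # TRUE
length(single)      # 1

# A matrix is a two-dimensional grid holding one data type.
grid <- matrix(1:6, nrow = 2, ncol = 3)
dim(grid)           # 2 3

# An array extends the same idea to more dimensions.
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)           # 2 3 4
is.matrix(cube)     # FALSE, because it has three dimensions
```

The dim function is used here rather than class because the class reported for a matrix differs between R versions.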
 

 

For analysis, we need complex types and R has them. A list is basically a vector that can hold multiple data types. A data frame is a list of equal-length columns, each of which is a vector, i.e. much like a record set or query result set. There are more powerful classes such as data.table available through R extensions called packages. More on that in another post.
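
As a rough sketch of the two complex types just mentioned, the code below builds a small list and a small data frame. The names person and sales and all of the values are invented for illustration.

```r
# A list can mix data types in a single collection.
person <- list(name = "Ann", age = 42L, scores = c(90, 85))
person$name          # "Ann"
length(person)       # 3

# A data frame holds one column per field, all of the same length.
sales <- data.frame(
  region = c("East", "West", "North"),
  units  = c(10, 25, 7),
  stringsAsFactors = FALSE
)
nrow(sales)          # 3
is.list(sales)       # TRUE, a data frame is built on a list
sales$units * 2      # 20 50 14
```

Each column of the data frame is itself a vector, so the element-wise behavior of vectors carries over to columns.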



Operations Work Automatically on the Elements of the Arrays

A fascinating feature and probably one of the reasons R is such a powerful statistical analysis tool is that operations you perform on array variables automatically get applied to the array elements.  For example...

> myvect * 2
 [1]  2  4  6  8 10 12 14 16 18 20
 
 
Above, by multiplying the vector variable by 2, every element is multiplied by 2. The same happens with other operations, and we can use a function called lapply to have a custom function applied to each element of the vector, with the results returned as a list, as shown below.

> myfunct <- function (x)  { x / 2 }
> 
> lapply(myvect, myfunct)
[[1]]
[1] 0.5

[[2]]
[1] 1

[[3]]
[1] 1.5

[[4]]
[1] 2

[[5]]
[1] 2.5
 
 
Above is a partial listing.  Note:  Creating a function looks like assigning the function code to a variable.  Above, the function myfunct is being called iteratively for each element in the vector myvect.  As you learn more about R, you realize that array processing, or again more correctly, collection processing, is at the heart of the language.
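
Since lapply always returns a list, the output is long. As a small sketch of two alternatives, the code below repeats the myvect and myfunct definitions so it can run on its own, then uses sapply and a direct call.

```r
myvect <- 1:10
myfunct <- function(x) { x / 2 }

# sapply simplifies the result to a vector instead of a list.
halves <- sapply(myvect, myfunct)
halves

# Division is already element-wise, so calling the function
# on the whole vector produces the same values.
direct <- myfunct(myvect)
direct

# Element-wise operations also work between two vectors of equal length.
myvect + myvect
```

When the function body uses only operations that already work element by element, the direct call is usually the simplest choice.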
 

Summary


This was just to give you a flavor of R with a couple of key takeaways.  One, if you are familiar with a scripting language, you have a jump on R programming.  Two, R is designed to work with object collections.  Cubes conceptually work with data in N-dimensional arrays.  I think that conceptual approach serves well for R too.

Sunday, December 13, 2015

In the Beginning:  An Introduction to Programming with PowerShell


This video is a basic introduction to computer programming using the PowerShell language.  I hope it is helpful.




Tuesday, May 19, 2015

Developing Workflows in PowerShell


Windows Workflow Foundation is a .NET framework for defining complex processes that can span a long period of time, can be stopped and restarted where they left off, support parallel execution of tasks, and can be packaged into reusable components.  Until PowerShell 3.0, workflows could only be created in Visual Studio using C# or VB.NET.  PowerShell 3.0 added support for the creation and execution of workflows. PowerShell workflows can be executed locally or on remote machines, so work can run in parallel across multiple machines, which can improve performance and scalability.  In Azure Automation, PowerShell workflows can be run as runbooks and used for a variety of tasks.

Note:  See the bottom of this blog for information on a free presentation I am doing tomorrow night!



PowerShell makes implementing workflows intuitive, but the syntax can be misleading.  The workflow keyword is used to define a workflow much as the function keyword defines a function, but a workflow is actually quite different.  The reason is that the cmdlets in a workflow are translated into the Workflow Foundation language and submitted for processing.  PowerShell is not running the code, Workflow Foundation is.

To try the code samples below, you will need to start the PowerShell ISE which can be done as follows.  Click the Start Menu, select All Programs, open the Accessories folder, then the Windows PowerShell folder and run either Windows PowerShell ISE or Windows PowerShell ISE (x86).  The x86 indicates this is the 32 bit version whereas the other is the 64 bit version.


Once in the ISE, you can paste the code samples into the script window and click on the execute button (the green arrow in the toolbar) to execute the samples.



Let's try a simple example of a workflow. Consider the code below which defines a workflow called simple.


workflow simple ([string] $myparm)
{
   "Parameter is $myparm"

   Get-Date

   "Some activity"

   "Third activity"
}

The code above defines a workflow but to call it, we need to execute the workflow as shown below.

simple 'test'

We should see the output to the console below.  



Parameter is test



Sunday, May 17, 2015 4:03:07 PM

Some activity

Third activity

Notice the first keyword in the definition, workflow, which is where we would normally see the word function.  That's all it takes to create a workflow in PowerShell.

However, behind the scenes there is more going on.  In an effort to make it easy for PowerShell developers to migrate to workflows, Microsoft added a cmdlet translator to PowerShell which is invoked via the workflow keyword.  Anything contained in the workflow code block is submitted to a translator which converts the code into the workflow language and submits it to the workflow engine to be processed.  This means there are some differences between what is supported in functions versus workflows.  

Let’s try some code that breaks the workflow engine to see this.  Consider the code below.

workflow simplebroken ([string] $myparm)
{
   Write-Host "Parameter is $myparm"

   $object = New-Object PSObject

   $object | Add-Member -MemberType NoteProperty -Name MyProperty -Value 'something'

   $object.MyProperty
}

simplebroken 'test' # Runs the workflow

When we run the code above, we get a number of error messages.  Why?  The answer is that only a subset of PowerShell cmdlets are mapped to workflow equivalents and Write-Host is not one of them.  Another problem is that each statement in a workflow runs as a separate unit of work, called an activity, so we cannot define a complex object in one line and access it from another.  Interestingly, simple variables such as strings can be accessed by multiple lines.  In fact, the workflow can be suspended and resumed and the variable values will be restored when the workflow restarts.  Complex objects cause problems because they cannot readily be serialized to disk and restored.

There is a workaround for the lack of complex object support across activities.  We can use the workflow command inlinescript which will run a scriptblock as one activity.  Therefore, the lines in the script block can see any objects created in the script.  This is shown in the code below.

workflow simpleinline ([string] $myparm)

{
  inlinescript
  {
     Write-Verbose "Parameter is $Using:myparm"
  
     $object = New-Object PSObject

     $object | Add-Member -MemberType NoteProperty -Name MyProperty -Value 'something'

     $object.MyProperty
  }
}

simpleinline 'test' -Verbose  # Calls the workflow

Above, an object is created and a property added to it which is displayed.  Since it is all in the scriptblock, there is no problem.  Also notice, Write-Verbose is supported by workflows.

Now that we have a feel for workflows, let's look at code that uses the three workflow specific commands other than inlinescript. These are sequence, parallel, and foreach -parallel. 

workflow paralleltest {

  sequence
  {
    "1"
    "2"
    "3"
  }

  parallel
  {
    for($i=1; $i -le 100; $i++){ "a" }

    for($i=1; $i -le 100; $i++){ "b" }

    for($i=1; $i -le 100; $i++){ "c" }
  }

  $collection = "one","two","three"

  foreach -parallel ($item in $collection)
  {
    "Length is: " + $item.Length
  }

}

paralleltest  # This statement calls the workflow

The first workflow command above is sequence which instructs the workflow engine to run the statements in between the braces sequentially.  This is followed by the parallel statement that tells the workflow engine to run each statement in between the braces in parallel, i.e. concurrently.  Lastly, we see the foreach -parallel statement which iterates over a collection running the code in between the braces in parallel.  This would most likely be used to iterate over a list of machine names submitting code to run remotely on those machines.  

The possibilities with workflows are endless, and in a future post we will discuss how to suspend a workflow for any period of time and resume it so that it picks up where it left off.  Yes, workflows can retain state, and we'll learn all about that next time.

Tomorrow night, Wednesday, May 20, at 6:30 PM, I will be presenting a free in-depth presentation on PowerShell Workflows entitled Workflows for the Rest of Us.  You can reserve a seat at the link below.
http://www.meetup.com/The-RI-Microsoft-BIUG/events/222490566/

Just click on the 'Join Us' link and RSVP.  The presentation is free as is pizza and soda and there will be a number of items and swag given out.  Our group also does a lot of free webinars so even if you can't make this meeting, please join so you will be kept up to date.

Thanks,

Bryan