Data Science 101: Agile Data Science with R, Shiny and a little magic from the R Forecast package

This is the first tutorial in the Data Science 101 series. In this tutorial, we will focus on delivering agile data science solutions.

As a data scientist, you need to be agile. You need to explore the data you are given, interact with it and design predictive models for it. More importantly, you need to let your clients explore and interact with the models you are building. The faster you can get your models into the hands of your clients, the better: they need to provide feedback, ask questions, become comfortable with the models and gain the confidence to make decisions using them. To do this effectively, we need to provide our clients with a user-friendly interface.

R is great. The R syntax is simple and intuitive, but what really makes R great are its numerous packages for data science. R has packages for most of the prominent machine learning and data mining algorithms, everything from support vector machines to gradient boosted trees. R code can be run from the command line or from within an IDE. This, however, is not appropriate for clients (our end users). It is not practical to hand the R source code to users and have them change values and inspect the output manually. Anybody who has programmed knows how easy it is to break the code.

This is where Shiny comes to the rescue. Shiny makes it easy to build a web or mobile user interface for your predictive or exploratory models, and it allows R developers to build that interface without knowing any HTML. Shiny renders the UI in HTML with Twitter Bootstrap styles. The interface controls (e.g. sliders and text input fields) are bound to variables in R code that runs on an R server. Whenever the user changes the value of a UI control, the R code that depends on it is re-executed and any output bound to the UI (e.g. a graph) is re-rendered. This is known as reactive programming, and it removes the need to handle events and data binding manually.
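To make the reactive idea concrete, here is a minimal sketch of a single-file Shiny app. It is not part of the forecasting application; it assumes a recent Shiny release that provides fluidPage() and shinyApp(), and the input id n and output id hist are hypothetical names chosen for illustration. Note that no event handler is written by hand: the plot depends on input$n, so Shiny re-runs it whenever the slider moves.

```r
library(shiny)

# UI: one slider (input id "n") and one plot placeholder (output id "hist").
# Both ids are hypothetical names used only in this sketch.
ui <- fluidPage(
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: renderPlot() reads input$n, so Shiny re-executes this expression
# and re-draws the plot every time the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n))
  })
}

# Bundle UI and server into an app object; runApp(app) would launch it.
app <- shinyApp(ui = ui, server = server)
```

Calling runApp(app) from an interactive R session opens the app in a browser.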

I will now illustrate the power of Shiny and R with a small application. The folks at RStudio were kind enough to give me access to their beta Shiny server, and I created an interactive time series forecasting application to test it. The application uses the Forecast package to perform a little bit of magic, i.e. time series forecasting with automatic model selection for ETS (exponential smoothing) and ARIMA models, which I will cover in another blog post. In the Shiny Timeseries Forecasting application, the user can select a dataset, specify how many months ahead to forecast and view plots of the ETS and ARIMA forecasts with prediction intervals. A time series decomposition plot is also shown.

The source code is available on GitHub: https://github.com/aneesha/ShinyTimeseriesForecasting.
A demo is also available on the RStudio Shiny server: http://spark.rstudio.com/aneesha/tforecast/


I will now walk you through the code. An interactive Shiny application needs only two files: ui.R and server.R.

In the ui.R file, the interface is built from a series of nested function calls, each of which defines a layout element or a widget. In the example below, a page with a sidebar is rendered. The header contains a title. The sidebar contains the controls for selecting a dataset and specifying how many months ahead to forecast, plus a submit button that sends the new values to the server only when it is clicked. Each widget must have a name, e.g. datasetvar is the name of the dropdown widget, and we use this name to retrieve the selected value in server.R (as input$datasetvar). We can also define placeholders for text, tables and plots produced by R code. Within the main panel, tabs are added to display the ETS forecast, the ARIMA forecast and the time series decomposition, e.g. plotOutput("etsForecastPlot"); each plot is placed on its own tab.

library(shiny)

# Define UI 
shinyUI(pageWithSidebar(
  
  # Application title
  headerPanel("Timeseries Forecasting"),
  
  # Sidebar with controls
  sidebarPanel(
    selectInput("datasetvar", "Variable:",
                list("Air Passengers" = "AirPassengers", 
                     "Australian total wine sales" = "wineind",
                     "Australian monthly gas production" = "gas")),
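    # Forecast horizon in months; min = 1 sets the lower bound of the control
    numericInput("ahead", "Months to Forecast Ahead:", 12, min = 1),
    
    # With a submit button, input changes reach the server only when it is clicked
    submitButton("Update View")
  ),
  
  # Main panel: a caption plus one tab per plot
  mainPanel(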
    h3(textOutput("caption")),
    
    tabsetPanel(
      tabPanel("Exponential Smoothing (ETS) Forecast", plotOutput("etsForecastPlot")), 
      tabPanel("Arima Forecast", plotOutput("arimaForecastPlot")),
      tabPanel("Timeseries Decomposition", plotOutput("dcompPlot"))
    )
  )
))
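Newer Shiny documentation favours fluidPage() with sidebarLayout() and titlePanel() over pageWithSidebar() and headerPanel(). The following is a sketch of the same layout written that way, assuming a current Shiny release; it is an alternative to, not a copy of, the ui.R in the repository.

```r
library(shiny)

# Same widgets and output placeholders as ui.R above, laid out with
# fluidPage() + sidebarLayout() instead of pageWithSidebar().
ui <- fluidPage(
  titlePanel("Timeseries Forecasting"),
  sidebarLayout(
    sidebarPanel(
      selectInput("datasetvar", "Variable:",
                  list("Air Passengers" = "AirPassengers",
                       "Australian total wine sales" = "wineind",
                       "Australian monthly gas production" = "gas")),
      numericInput("ahead", "Months to Forecast Ahead:", 12, min = 1),
      submitButton("Update View")
    ),
    mainPanel(
      h3(textOutput("caption")),
      tabsetPanel(
        tabPanel("Exponential Smoothing (ETS) Forecast", plotOutput("etsForecastPlot")),
        tabPanel("Arima Forecast", plotOutput("arimaForecastPlot")),
        tabPanel("Timeseries Decomposition", plotOutput("dcompPlot"))
      )
    )
  )
)
```

Because the input and output ids are unchanged, the server code should work with this UI without modification.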

In the server.R file, we use Shiny's reactive programming model to read the values of the widgets defined in ui.R (through the input object) and to update the outputs produced by R code (through the output object). We can render a new plot or output tabular data. In the example below, the Forecast package is used. It can automatically select an ETS model (ets()) and an ARIMA model (auto.arima()) for a given time series; by default, both keep the candidate model with the lowest AICc. We use this functionality to find the best-fitting models and then produce a forecast plot for the specified future time frame (in months).

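library(shiny)
library(datasets)  # AirPassengers
library(forecast)  # ets(), auto.arima(), forecast(); also provides the gas and wineind datasets

shinyServer(function(input, output) {
  
  # Reactive expression returning the time series chosen in the dropdown;
  # it is re-evaluated only when input$datasetvar changes
  getDataset <- reactive({
    switch(input$datasetvar,
           AirPassengers = AirPassengers,
           gas = gas,
           wineind)
  })
  
  # Caption shown above the tabs
  output$caption <- renderText({
    paste("Dataset:", input$datasetvar)
  })
  
  # Classical decomposition into trend, seasonal and remainder components;
  # decompose() needs a seasonal series (frequency 12 for monthly data)
  output$dcompPlot <- renderPlot({
    ds_ts <- ts(getDataset(), frequency = 12)
    f <- decompose(ds_ts)
    plot(f)
  })
  
  # auto.arima() searches over ARIMA orders and keeps the lowest-AICc model
  output$arimaForecastPlot <- renderPlot({
    req(input$ahead > 0)  # skip rendering until a valid horizon is entered
    fit <- auto.arima(getDataset())
    plot(forecast(fit, h = input$ahead))
  })
  
  # ets() selects the exponential smoothing model automatically
  output$etsForecastPlot <- renderPlot({
    req(input$ahead > 0)
    fit <- ets(getDataset())
    plot(forecast(fit, h = input$ahead))
  })
  
})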
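While iterating on a model, it can be quicker to check the Forecast package calls in a plain R session before wiring them into Shiny. The script below is a sketch of that workflow; get_dataset() is a hypothetical helper that mirrors the app's getDataset(), and the 12-month horizon matches the app's default.

```r
library(forecast)  # ets(), auto.arima(), forecast(); provides the gas and wineind data

# Hypothetical helper mirroring the app's getDataset() outside Shiny
get_dataset <- function(name) {
  switch(name,
         AirPassengers = datasets::AirPassengers,
         gas = forecast::gas,
         wineind = forecast::wineind,
         stop("unknown dataset: ", name))
}

ds <- get_dataset("AirPassengers")
h <- 12  # months to forecast ahead, as in the app's default

# Automatic model selection, as in server.R
ets_fit <- ets(ds)
arima_fit <- auto.arima(ds)

ets_fc <- forecast(ets_fit, h = h)
arima_fc <- forecast(arima_fit, h = h)

# Inspect which models were chosen
print(ets_fit$method)
print(arima_fit)

# Point forecasts with 80% and 95% prediction intervals (forecast() defaults)
print(ets_fc)

# Classical decomposition, as shown on the third tab
dcomp <- decompose(ds)
```

Once the output looks sensible here, the same calls can be moved into renderPlot() blocks unchanged.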

I hope this example has illustrated the value of Shiny as a tool for becoming a more agile data scientist. Shiny requires no knowledge of HTML. In a future post, I will extend the application to allow datasets to be uploaded and forecast models to be explored, and I will cover the Forecast package in more detail.
