# Why SQL Server R Services?

* This will cover the traditional data scientist/DBA impedance mismatch and how the product solves it.  

* History of R, SQL Server, RevoR, and integration

* Security 

* Performance

2.	Basic Data Science terms/overview

a.	Supervised/unsupervised

b.	Features, labels, feature engineering, overfitting, etc.

3.	Installation recipes

a.	Installation is a bit tricky, so let’s have a chapter that reviews it. 

b.	Let’s do some pre-installation planning.  You probably don’t want to install this on your production reporting server or OLTP instance.

c.	Some quick query recipes to ensure everything is working.  

4.	Basic querying recipes

a.	We show the template recipe that the DBA needs to use

b.	We show the data scientist what the data contract recipes need to be

c.	We show recipes for using inputs/outputs, R-specific code, and data type conversion

5.	Software Development Lifecycle Recipes

a.	Recipe for how a data scientist should begin their data science journey

b.	R code that needs to change under RevoR

c.	Recipe for how to begin integrating directly with a stored procedure

d.	Recipe for how to continue development iteratively using CRISP-DM

6.	Performance

a.	When to use R and when to use SQL

b.	DBA features around performance engineering

7.	Real-world recipes

a.	Taking a business problem and working it from start to finish using R Client/RStudio, then R Server/RStudio, then finally SQL Server R Services.  

b.	We’ll monitor performance at each step of the way

#  More Detail
Chapter 1: Why SQL Server R Services - 10 pages
This first chapter introduces the history of SQL and R integration and the reasons why you should understand it.  We’ll discuss some of the problems this solves.  
Topics covered
1.	What is R
2.	The impedance mismatch -- data scientists and data professionals. 
3.	History of RevoR and SQL Server
4.	Why this may initially scare a DBA, and why it should NOT (security, performance)
5.	What if you don’t have a data scientist?
Skills learned
This is an overview chapter.  It should whet the appetite of both audiences to learn more.  
Chapter 2: Basic Data Science Terminology - 10 pages
The second chapter will mostly interest data professionals (DPs).  Data scientists can skip it if they want (perhaps after a short up-front quiz).  We want to cover the terms that non-data scientists find scary or misunderstand: supervised/unsupervised learning, features, labels, over/underfitting, etc.  The treatment will be deliberately over-generalized.  We don’t want to scare off the DPs; we just want them to understand the jargon a little better.  
Topics covered
1.	Different types of models/algorithms and when to choose each – (un)supervised
2.	Labels, features/factors
3.	Other terms
Skills learned
Basic terminology and level setting.  This can be thought of as another intro chapter with no specific skills being learned.  
Chapter 3: Installing SQL Server R Services - 20 pages
We cover all of the topics and issues around preparing for, and installing, SQL Server R Services.  Finally, we’ll verify the installation by running some basic stack queries to ensure everything is working.  We’ll do this using a free Azure account to get started quickly, but it can also be done on a laptop or an existing on-premises server.  

This is geared more toward the DP than the data scientist (DS), but DSs will probably find it interesting as well, especially the last portion where we test the installation.  

We’ll also provide a file with sample data that can be loaded and used for the remaining chapters and their recipes.  Finally, we’ll install RStudio so that we have a working R environment familiar to data scientists.  
Topics covered
1.	Prerequisites and licensing
2.	Security and Performance Considerations
3.	Installation using an existing server (we’ll demo Azure)
4.	Some quick queries to test everything is working
5.	Install a set of files and data for future recipes.  
6.	Install RStudio
Skills learned
Installing the product and running basic queries to ensure everything is working correctly.  
Chapter 4: Traditional Data Access with R – 30 pages
The objective is to show how a data scientist traditionally accesses and works with data.  We’ll start by loading data from a CSV file into a data frame, which is the most common use case, and briefly discuss the scalability limits of doing this.  Next, we’ll connect to R Server and change the compute context to demonstrate how the ScaleR components improve scalability.  Finally, we’ll load our sample data into a SQL table and show how we can achieve even better scalability by running R where the data lives, without marshalling it to a CSV file first.  

This chapter is equally relevant to both DSs and DPs.  Both will find it illuminating.  
Topics covered
1.	Use RStudio to analyze a CSV file
2.	Change the compute context to use the R Server libraries for remote execution
3.	Demonstrate RODBC, which is the traditional method to marshal data back to R
4.	Load and query the data using SQL and R natively.  
Skills learned
R data connectivity using various methods
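
As a rough, language-neutral illustration of the trade-off this chapter explores, the sketch below contrasts pulling every row into client memory with letting the database engine aggregate in place. It is an analogy only: it uses Python's standard library and an in-memory SQLite database rather than R, ScaleR, or SQL Server, and the `loans` table, its columns, and the sample values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; the column names and values are illustrative only.
CSV_TEXT = """loan_id,amount
1,1000
2,2500
3,4000
"""


def mean_in_memory(csv_text):
    """Pull every row into the client, then compute (the CSV/data frame pattern)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    amounts = [float(row["amount"]) for row in rows]
    return sum(amounts) / len(amounts)


def mean_in_database(csv_text):
    """Load the rows into a table once, then let the engine do the aggregation."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE loans (loan_id INTEGER, amount REAL)")
        reader = csv.DictReader(io.StringIO(csv_text))
        conn.executemany(
            "INSERT INTO loans (loan_id, amount) VALUES (?, ?)",
            ((int(row["loan_id"]), float(row["amount"])) for row in reader),
        )
        # Only the single aggregated value crosses back to the client.
        (result,) = conn.execute("SELECT AVG(amount) FROM loans").fetchone()
        return result
    finally:
        conn.close()


if __name__ == "__main__":
    print(mean_in_memory(CSV_TEXT))
    print(mean_in_database(CSV_TEXT))
```

Both functions return the same mean; what differs is where the rows are processed, which is the point the chapter's recipes make with R and SQL Server.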
Chapter 5: Iterative Solution Development – Predicting Loan Charge-offs – 40 pages
In this chapter we’ll put together everything we’ve learned so far.  We’ll start with a real-world business problem and real data, querying it locally using RStudio.  Once the data and R code are in reasonably good shape, we’ll migrate the model to SQL Server R Services and finally create a stored procedure that can do either batch classification or a single prediction.  

This chapter is equally relevant to both DSs and DPs.  
Topics covered
1.	Use RStudio to analyze a CSV file and develop basic R code
2.	Load the CSV file into SQL and have a DP create a basic view to perform data wrangling
3.	The DS uses the new view to access the data and continue model development
4.	When the model is developed, the DS works with the DP to operationalize the code in two stored procedures.  
Skills learned
The data science process, iterative development, solution lifecycle management.