A Beginner’s Guide to Setting Up Your Data Science Environment
Welcome to the world of data science! This guide will walk you through the process of setting up your data science environment using R and RStudio. By the end of this tutorial, you’ll have a fully functional setup ready for your data science journey.
R is the programming language we’ll be using for data analysis. Let’s start by installing it on your system.
For macOS, download the .pkg file appropriate for your macOS version, then open the .pkg file and follow the installation instructions.

RStudio is an Integrated Development Environment (IDE) that makes working with R much easier and more efficient.
Exercise 1: Open RStudio. In the console pane, type version. What version of R did you install?
Exercise 2: In the console pane (usually at the bottom-left), type 1 + 1 and press Enter. What result do you get?
Let’s set up some basic configurations in RStudio to enhance your workflow.
Exercise 3: Create a new R script (File > New File > R Script). Type print("Hello, Data Science!") and run the code. What output do you see in the console?
Pacman is a convenient package manager for R. Let’s install it and learn how to use it.
In the RStudio console, type:
install.packages("pacman")
Once installed, you can load pacman via the library() function and use it to install and load other packages:
library(pacman)
p_load(dplyr, ggplot2)
This installs (if necessary) and loads the dplyr and ggplot2 packages.
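To see what p_load() is doing for you, here is a rough base-R sketch of the same install-if-missing-then-load pattern. The helper name ensure_package is our own, for illustration only; it is not part of pacman:

```r
# Base-R sketch of what pacman::p_load() does for each package:
# install it if it is missing, then attach it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

ensure_package("stats")  # "stats" ships with R, so nothing is installed here
```

In practice p_load(dplyr, ggplot2) saves you from writing this boilerplate yourself.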
Exercise 4: Use pacman to install and load the tidyr package. Then, use p_functions() to list all functions in the tidyr package.
Setting up a proper working directory is crucial for organizing your projects.
This sets a location for all the files you create within the project.
setwd("/path/to/your/directory")        # macOS/Linux style path
# On Windows, backslashes in strings must be doubled (or use forward slashes)
setwd("\\path\\to\\your\\directory")
Exercise 5: Create a new folder on your computer called “DataScience”. Set this as your working directory in RStudio. Then, use getwd() to confirm it’s set correctly.
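A portable sketch: rather than hard-coding separators for one operating system, file.path() builds paths that work everywhere. The folder name my_project below is just an example:

```r
# file.path() joins path pieces with the correct separator
project_dir <- file.path("~", "my_project")     # "~" is your home directory
dir.create(project_dir, showWarnings = FALSE)   # create the folder if needed
setwd(project_dir)                              # make it the working directory
getwd()                                         # confirm where you are
```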
Let’s familiarize ourselves with some essential R commands and set up the main packages you’ll need for data science work.
# Creating variables
t <- 1
x <- 5
y <- 10
z <- TRUE

# Basic arithmetic
(z <- x + y)  # wrapping in parentheses assigns and prints (z is now 15)

# Creating vectors
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie")

# Creating a data frame
df <- data.frame(
  name = names,
  age = c(25, 30, 35)
)

# Viewing data
View(df)
head(df)
str(df)
summary(df)

# Indexing
numbers[2]  # Second element
df$name     # Name column

# Basic functions
mean(numbers)
sum(numbers)
length(numbers)

# Logical operators
x > y
x == y
x != y

# Control structures
if (x > y) {
  print("x is greater than y")
} else {
  print("x is not greater than y")
}

# Loops
for (i in 1:5) {
  print(i^2)
}

z <- TRUE  # reset z to a logical before using it as a loop condition
while (z) {
  print(names[t])
  if (t == 3) {
    z <- FALSE
  }
  t <- t + 1
}

# Creating a function
square <- function(x) {
  return(x^2)
}
square(4)
# Getting help
?mean
Exercise 7: Create a variable containing a vector of 10 random numbers between 1 and 100 using the sample() function. Then, use the max() and min() functions to find the highest and lowest numbers in your vector.
Let’s install and load some of the most commonly used packages in data science:
# Install and load essential packages
p_load(
  tidyverse,   # for data manipulation and visualization
  readxl,      # for reading Excel files
  lubridate,   # for working with dates
  ggplot2,     # for creating graphs
  caret,       # for machine learning
  rmarkdown,   # for creating dynamic documents
  shiny,       # for building interactive web apps
  plotly,      # for creating interactive plots
  knitr        # for dynamic report generation
)
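To double-check what actually got attached, pacman can report that too. To the best of our knowledge, p_loaded() with no arguments lists the currently loaded packages, and passing a package name returns TRUE or FALSE:

```r
library(pacman)
p_load(ggplot2)

p_loaded()         # names of all currently loaded packages
p_loaded(ggplot2)  # TRUE if ggplot2 is loaded
```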
Learning to read and write data is crucial for any data science project:
# A small example data frame to write out
data <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)

# Writing data to CSV
write.csv(data, "employee_data.csv", row.names = FALSE)

# Reading data from CSV
read_data <- read.csv("employee_data.csv")

# Writing data to Excel (requires writexl package)
p_load(writexl)
write_xlsx(data, "employee_data.xlsx")

# Reading data from Excel (read_excel comes from readxl, loaded earlier)
excel_data <- read_excel("employee_data.xlsx")

# Writing R objects to RDS (R's native format)
saveRDS(data, "employee_data.rds")

# Reading RDS files
rds_data <- readRDS("employee_data.rds")
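Since the tidyverse was loaded earlier, you can also use readr's read_csv() and write_csv() (note the underscores, not dots); they are generally faster and never write row names. This sketch assumes an employee data frame like the one above:

```r
library(readr)

data <- data.frame(name = c("Alice", "Bob", "Charlie"), age = c(25, 30, 35))

write_csv(data, "employee_data.csv")        # no row.names argument needed
read_data <- read_csv("employee_data.csv")  # returns a tibble
read_data
```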
Now that you have a solid foundation in R and have set up your environment with essential packages, you're ready to start your data science journey!
Remember, the key to mastering R and data science is consistent practice and curiosity. Don’t hesitate to explore the vast resources available online, including R documentation, tutorials, and community forums.
Programming may feel daunting at first, but there are a few principles that can help you break down problems and inform your coding decisions.
- KISS (Keep It Simple, Stupid)
- DRY (Don't Repeat Yourself)
- YAGNI (You Aren't Gonna Need It)
- Single Responsibility Principle (SRP)
- Open/Closed Principle
- Liskov Substitution Principle (LSP)
- Interface Segregation Principle (ISP)
- Dependency Inversion Principle (DIP)
- Separation of Concerns (SOC): generally a combination of the previous principles.
- Avoid Premature Optimization
- Law of Demeter: each unit should have only limited knowledge about other units; only talk to your immediate friends.
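To make DRY concrete in R: if you catch yourself copying the same computation for several variables, factor it into one function. The helper my_mean below is purely illustrative (base R already provides mean()):

```r
# Repetitive (violates DRY): the same formula written out twice
mean_age    <- sum(c(25, 30, 35)) / 3
mean_height <- sum(c(160, 175, 180)) / 3

# DRY: write the logic once, reuse it everywhere
my_mean <- function(x) sum(x) / length(x)
mean_age    <- my_mean(c(25, 30, 35))
mean_height <- my_mean(c(160, 175, 180))
```

Now a fix (say, handling missing values) only needs to happen in one place.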
AI can be a very useful tool for speeding up your programming, but if you are unsure of your programming skills it may be better to use AI sparingly. If you are interested in AI, however, a good place to start is GitHub Copilot, which can be added to RStudio.
Once Copilot is set up, please disable it. You can use it for the project and exercises if you get stuck, but try to avoid it during class.
Congratulations! You’ve now set up your data science environment with R and RStudio, learned essential R commands, and gotten familiar with some of the most important packages in the R ecosystem. This foundation will serve you well as you continue your data science journey. Keep practicing, stay curious, and happy data sciencing!