Packages are for everyone!

The beautiful thing about packages is that anyone can create one. Creating packages with functions that you use on a daily basis can help streamline your work to make it more efficient. Today we will review the basic components of a package, how to build one, and how to upload it to a Github repository. There are more advanced things you can do with packages (e.g. publishing to CRAN, testing, installing dependencies), but they are beyond the scope of this presentation. If you are interested in learning more, we’ve included links below that contain information on more advanced topics.

Creating a Package

What packages will you need?

  • devtools
  • roxygen2

The devtools::create() function builds the skeleton of a package. The rest is for you to fill in!

#setwd("week9")
#devtools::create("lurr")

Package contents

As you can see in your Files, this created a new directory with the given name (e.g. “lurr”). The contents within this new directory are the pieces needed to create a functioning package. We’ll go through each one…

  • .Rproj
  • R directory
  • DESCRIPTION
  • NAMESPACE

Projects

When you create your package, you’ll notice the .Rproj file. Go ahead and open that file. This will open a new environment.

What are projects?

Projects are a great way to organize your workflow (not just for packages). When you create a project, it is saved within a directory (either new or existing) that helps organize related files and scripts. Opening a project sets the working directory to that designated space, making it easier to share project directories with others (no more setting your working directory!). There are additional benefits, like setting preferences for that environment, saving history specific to that project, auto-saves within that directory, etc.

R directory

The R directory is where you write and save your functions in R scripts (.R).

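corr_cv <- function(x, y) {
  # one correlation per left-out observation
  corval <- numeric(length(x))
  
  # seq_along() also behaves correctly when x is empty
  for (i in seq_along(x)) {
    corval[i] <- stats::cor(x[-i], y[-i], use = "pairwise.complete.obs")
  }
  
  mean_corr <- mean(corval)
  return(mean_corr)
}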
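As a quick check of what corr_cv computes, the sketch below runs it on two columns of the built-in mtcars data set. The choice of mtcars columns is only an illustration and is not part of the original example; the function is repeated so the snippet runs on its own.

```r
# Leave-one-out correlation, repeated here so the snippet is self-contained
corr_cv <- function(x, y) {
  corval <- numeric(length(x))
  for (i in seq_along(x)) {
    # correlation with the i-th observation left out
    corval[i] <- stats::cor(x[-i], y[-i], use = "pairwise.complete.obs")
  }
  mean(corval)
}

# Illustrative data (an assumption): fuel efficiency vs. weight from mtcars
loo_r <- corr_cv(mtcars$mpg, mtcars$wt)
full_r <- stats::cor(mtcars$mpg, mtcars$wt)

# The leave-one-out average should sit close to the full-sample correlation
print(c(leave_one_out = loo_r, full_sample = full_r))
```

Comparing the two numbers is a cheap sanity check: if they differ a lot, a single observation is probably driving the correlation.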

Considerations for organizing and formatting functions:

  • Names of functions should be clear and meaningful. Make sure you aren’t overwriting existing functions.
    • e.g. fit_models.R (good) vs. foo.r (bad)
    • Variables as nouns
    • Functions as verbs
  • Avoid capital letters in file names so code can be shared across operating systems (some file systems are not case-sensitive).
  • Don’t put all functions into one file and don’t put each function into its own file.
Hadley's rule of thumb: "If I can't remember the name of the file where a function lives, I need to either separate the functions into more files or give the file a better name."
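One way to check whether a candidate function name is already in use is to ask R whether a function by that name is visible in the current session. The helper name_is_taken below is hypothetical and only a rough sketch: it sees base R and whatever packages are currently attached, not every package on CRAN.

```r
# TRUE if a function with this name is already visible in the current session
name_is_taken <- function(name) {
  exists(name, mode = "function")
}

# cor() already exists in the stats package, so this name is taken
name_is_taken("cor")

# A made-up name that should not exist in a fresh session
name_is_taken("corr_cv_not_defined_anywhere")
```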

Description

The DESCRIPTION file contains meta-data regarding the package (e.g. title, authors, etc.). This file is for you to edit and fill in relevant information. For a personal package, much of this information will not be used and is for your own reference. One of the things you want to make sure you fill out is the Imports section. This section includes the dependencies of your package, i.e. the packages your functions use.

Default Documentation:

Package: [PACKAGE NAME]
Title: What the Package Does (one line, title case)
Version: 0.0.0.9000
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
Depends: R (>= 3.5.0)
License: Who can use your package?
Encoding: UTF-8
LazyData: true
Imports:
  dplyr (can include required versions here), 
  ggplot2
Suggests:
  ggthemes
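Because DESCRIPTION is a plain-text file of "Field: value" pairs, base R can read it with read.dcf(). The sketch below writes a small stand-in file to a temporary location (the field values are placeholders, not a real package) and reads it back.

```r
# Write a small DESCRIPTION to a temporary file (values are placeholders)
desc_file <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: lurr",
  "Title: What the Package Does",
  "Version: 0.0.0.9000",
  "Imports:",
  "    dplyr,",
  "    ggplot2"
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)

colnames(desc)        # the fields that were found
desc[1, "Package"]    # the package name
desc[1, "Imports"]    # the dependency list as a single string
```

The same call works on the DESCRIPTION of any package directory, which is handy for checking that a field you edited was saved the way you intended.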

Important Notes:

  • Imports: is where you list packages that are required to run certain functions in your code. These packages are automatically installed with your package. However, these dependencies are not attached when your package is loaded, so their functions cannot be called by bare name. We won’t be covering how to import dependencies in this tutorial. Instead, Hadley suggests using pkg::fun() to call functions from installed dependencies. Though this may make your functions run slightly slower, it helps you keep track of your code down the road. If there are functions that you call repeatedly, you can import packages (or specific functions) in the NAMESPACE.

  • Suggests is where you list packages that are suggested but not required to run functions. These are not automatically installed with your package.
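The two calling styles described above can be sketched as follows. The function names se_mean and has_optional_theme are hypothetical; stats stands in for a package listed under Imports, and ggthemes for one listed under Suggests.

```r
# Imports-style call: name the package explicitly with pkg::fun()
se_mean <- function(x) {
  stats::sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

# Suggests-style call: check at run time whether the optional package is
# installed before relying on it
has_optional_theme <- function() {
  requireNamespace("ggthemes", quietly = TRUE)
}

se_mean(c(1, 2, 3, NA))
has_optional_theme()
```

Checking with requireNamespace() lets a function fall back to a default, or stop with a helpful message, when a suggested package is missing.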

Namespace

The NAMESPACE file records which functions your package exports (makes available to users) and which functions it imports from other packages. This file should never be edited manually. Instead, we can use the roxygen2 package to generate it automatically, along with the documentation for each function. To do so, we must include comments with specific information directly above each function (see below). Each comment line must begin with #'. The first lines give the title and description, and the remaining lines use tags of the form @[descriptor]. The roxygen2 package, along with the devtools::document function, will automatically process those comments to update the NAMESPACE and create documentation for each function. The documentation gets stored in the man/ directory.

Documentation comments

  • @param is where you define the formal arguments for your function. You must first state the argument name and then a description of it.
  • @return describes what the function returns
  • @export is required to export the function
  • @examples is where you can provide example uses of your function
  • @import (or @importFrom for specific functions) is where you can import dependencies that are required for your function. The package must also be listed under Imports in the DESCRIPTION file.
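A minimal sketch of how these tags fit together, including @importFrom, is shown below. The function iqr_width is hypothetical and not part of this tutorial's package. In a real package, stats would also need to be listed under Imports in DESCRIPTION.

```r
#' Interquartile Width
#'
#' Hypothetical example function, used only to illustrate the tags.
#' @param x A numeric vector.
#' @return The distance between the 25th and 75th percentiles of x.
#' @importFrom stats quantile
#' @export
#' @examples
#' iqr_width(1:5)
iqr_width <- function(x) {
  # inside a package, quantile() can be called without stats:: because of
  # the @importFrom tag above
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  unname(q[2] - q[1])
}

iqr_width(1:5)
```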

Example of comments associated with each function in the R script:

#' Cross-Validated Correlation 
#'
#' This function allows you to run a correlation using leave-one-out cross-validation.
#' @param x A single vector.
#' @param y A single vector.
#' @keywords cross-validation correlation
#' @export
#' @examples
#' corr_cv(got$`Book Intro Chapter`, got$`Death Chapter`)

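corr_cv <- function(x, y) {
  # one correlation per left-out observation
  corval <- numeric(length(x))
  # seq_along() also behaves correctly when x is empty
  for (i in seq_along(x)) {
    corval[i] <- stats::cor(x[-i], y[-i], use = "pairwise.complete.obs")
  }
  mean_corr <- mean(corval)
  return(mean_corr)
}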

Once you save the R script, you can run devtools::document() in the console. This will save a new .Rd file in the man/ directory. The resulting documentation looks something like this:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stat_functions.R
\name{corr_cv}
\alias{corr_cv}
\title{Cross-Validated Correlation}
\usage{
corr_cv(x, y)
}
\arguments{
\item{x}{A single vector.}

\item{y}{A single vector.}
}
\description{
This function allows you to run a correlation using leave-one-out cross-validation.
}
\examples{
corr_cv(got$`Book Intro Chapter`, got$`Death Chapter`)
}
\keyword{correlation}
\keyword{cross-validation}

Once you install your package, you can see the result using ?[function]

Package Walkthrough

Step 0: Load Packages

#install.packages("devtools")
#install.packages("roxygen2")
library(devtools)
library(roxygen2)

Step 1: Create your package directory

devtools::create("OddsAndEnds")

If you look in your parent directory, you will now have a folder called OddsAndEnds. In it you will find an R folder, where your functions will go, and two files: one called DESCRIPTION (we will edit this in a minute) and another called NAMESPACE (do not edit; roxygen2 generates it for us).

Step 2: Add functions

Note that these functions must be saved in an R script!

#function that changes all columns that are integers to numeric

colinttonum <- function(data) {
  temp1 <- data
  #names of all integer columns (vapply also works when there is only one column)
  temp2 <- names(temp1)[vapply(temp1, is.integer, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    temp1[[temp2[i]]] <- as.numeric(temp1[[temp2[i]]])
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}

#function that changes all columns that are factors to numeric

colfacttonum <- function(data) {
  temp1 <- data
  #names of all factor columns
  temp2 <- names(temp1)[vapply(temp1, is.factor, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    #go through character first so we get the labels, not the level codes
    temp1[[temp2[i]]] <- as.numeric(as.character(temp1[[temp2[i]]]))
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}
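The reason colfacttonum converts to character before converting to numeric is that calling as.numeric() directly on a factor returns the internal level codes rather than the values the labels represent. A small illustration with made-up values:

```r
# A factor whose labels happen to look like numbers
f <- factor(c("10", "20", "10"))

# Direct conversion returns the internal level codes
codes <- as.numeric(f)

# Converting through character returns the values the labels represent
values <- as.numeric(as.character(f))

print(codes)
print(values)
```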

#function that changes all columns that are characters to numeric

colchartonum <- function(data) {
  temp1 <- data
  #names of all character columns
  temp2 <- names(temp1)[vapply(temp1, is.character, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    temp1[[temp2[i]]] <- as.numeric(temp1[[temp2[i]]])
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}

#Let's also create a function that provides the most essential descriptives for each variable. I noticed some missing data in the datafile, so let's particularly try to obtain info on the number of valid/missing cases for each variable.

descriptives <- function(data,si.digits=3) {
  #skew() and kurtosi() come from the psych package, so list psych under Imports in DESCRIPTION
  temp1 <- data
  output <- matrix(NA_real_, nrow = ncol(temp1), ncol = 7)
  for (i in seq_len(ncol(temp1))) {
    x <- temp1[[i]]
    output[i,1] <- sum(!is.na(x))
    output[i,2] <- sum(is.na(x))
    output[i,3] <- round(mean(x, na.rm = TRUE),si.digits)
    output[i,4] <- round(stats::sd(x, na.rm = TRUE),si.digits)
    output[i,5] <- round(stats::sd(x, na.rm = TRUE)/(sqrt(output[i,1])),si.digits)
    output[i,6] <- round(psych::skew(x, na.rm = TRUE),si.digits)
    output[i,7] <- round(psych::kurtosi(x, na.rm = TRUE),si.digits)
  }
  output <- data.frame(output)
  colnames(output) <- c("n", "nmiss", "mean", "std", "se", "skew", "kurtosis")
  rownames(output) <- colnames(temp1)
  return(output)
}
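The valid and missing counts in the first two columns of descriptives can be cross-checked with base R alone. The sketch below uses the built-in airquality data set, which is an assumption made for illustration (it is not the class data file), because it ships with R and has missing values in some columns.

```r
# airquality ships with R and contains missing values in some columns
n_valid <- colSums(!is.na(airquality))
n_missing <- colSums(is.na(airquality))

# One row per variable, matching the "n" and "nmiss" columns of descriptives()
counts <- data.frame(n = n_valid, nmiss = n_missing)
print(counts)
```

For every variable, n plus nmiss should equal the number of rows in the data, which makes a simple consistency check for the package function.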

Step 3: Add documentation in R script

The package roxygen2 makes this step amazingly simple. The way it works is that you add special comments to the beginning of each function, which will later be compiled into the correct format for package documentation. The details can be found in the roxygen2 documentation; I will provide an example here for each of my functions.

Typical arguments:

  • @param: Describe the formal arguments. State the argument name and then describe it.

  • @return: What does the function return (e.g., A tibble with descriptive data)

  • @examples: Provide examples of the use of your function. (The singular @example instead takes the path to a file that contains example code.)

  • @export: Export your function

If you don’t include @export, your function will be internal, meaning others can’t access it easily.
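The difference between exported and internal functions can be seen in any installed package. The sketch below uses the stats package that ships with R: sd() is exported, while the print method for lm objects is internal and can only be reached directly with three colons.

```r
# Names that the stats package makes available to users
exported <- getNamespaceExports("stats")

# sd() is exported, so users can call it directly
sd_is_exported <- "sd" %in% exported

# print.lm() is internal: it is not in the export list
print_lm_is_exported <- "print.lm" %in% exported

# Internal functions can still be reached with pkg:::fun()
internal_fun <- stats:::print.lm

print(c(sd = sd_is_exported, print.lm = print_lm_is_exported))
```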

# Save this as descriptives.R in your R directory
# (depending on your devtools version, a file named OddsAndEnds-package.r may have been auto-generated when you created the package)

# The comments you need to add at the beginning of the descriptives function are, for example, as follows:

#' A Descriptives Function
#'
#' This function allows you to examine your favorite descriptive statistics.
#' @param data The dataframe. Note that all variables must be numeric.
#' @param si.digits The number of digits to round to. Defaults to 3 decimal places.
#' @return A data frame with one row per variable and the columns n, nmiss, mean, std, se, skew, and kurtosis.
#' @keywords descriptive stats, valid cases, missing cases, mean, standard deviation, standard error, skew, kurtosis
#' @export
#' @examples
#' descriptives(mtcars)
descriptives <- function(data,si.digits=3) {
  #skew() and kurtosi() come from the psych package, so list psych under Imports in DESCRIPTION
  temp1 <- data
  output <- matrix(NA_real_, nrow = ncol(temp1), ncol = 7)
  for (i in seq_len(ncol(temp1))) {
    x <- temp1[[i]]
    output[i,1] <- sum(!is.na(x))
    output[i,2] <- sum(is.na(x))
    output[i,3] <- round(mean(x, na.rm = TRUE),si.digits)
    output[i,4] <- round(stats::sd(x, na.rm = TRUE),si.digits)
    output[i,5] <- round(stats::sd(x, na.rm = TRUE)/(sqrt(output[i,1])),si.digits)
    output[i,6] <- round(psych::skew(x, na.rm = TRUE),si.digits)
    output[i,7] <- round(psych::kurtosi(x, na.rm = TRUE),si.digits)
  }
  output <- data.frame(output)
  colnames(output) <- c("n", "nmiss", "mean", "std", "se", "skew", "kurtosis")
  rownames(output) <- colnames(temp1)
  return(output)
}

# colinttonum documentation

#' Colinttonum Function
#'
#' This function changes all columns that are integers to numeric.
#' @param data The dataframe.
#' @return The dataframe with all integer columns converted to numeric.
#' @keywords data structure, data class, integer, numeric
#' @export
#' @examples
#' colinttonum(data.frame(id = 1:3, score = c(2.5, 3.5, 4.5)))
colinttonum <- function(data) {
  temp1 <- data
  #names of all integer columns (vapply also works when there is only one column)
  temp2 <- names(temp1)[vapply(temp1, is.integer, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    temp1[[temp2[i]]] <- as.numeric(temp1[[temp2[i]]])
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}

# colfacttonum documentation

#' Colfacttonum Function
#'
#' This function changes all columns that are factors to numeric.
#' @param data The dataframe.
#' @return The dataframe with all factor columns converted to numeric.
#' @keywords data structure, data class, factors, numeric
#' @export
#' @examples
#' colfacttonum(data.frame(group = factor(c("1", "2", "1")), score = c(2.5, 3.5, 4.5)))
colfacttonum <- function(data) {
  temp1 <- data
  #names of all factor columns
  temp2 <- names(temp1)[vapply(temp1, is.factor, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    #go through character first so we get the labels, not the level codes
    temp1[[temp2[i]]] <- as.numeric(as.character(temp1[[temp2[i]]]))
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}

# colchartonum documentation

#' Colchartonum Function
#'
#' This function changes all columns that are characters to numeric.
#' @param data The dataframe.
#' @return The dataframe with all character columns converted to numeric.
#' @keywords data structure, data class, character, numeric
#' @export
#' @examples
#' colchartonum(data.frame(id = c("101", "102"), score = c(2.5, 3.5), stringsAsFactors = FALSE))
colchartonum <- function(data) {
  temp1 <- data
  #names of all character columns
  temp2 <- names(temp1)[vapply(temp1, is.character, logical(1))]
  #seq_along() skips the loop when no columns match
  for (i in seq_along(temp2)) {
    temp1[[temp2[i]]] <- as.numeric(temp1[[temp2[i]]])
  }
  warning(paste(length(temp2)," of ",ncol(temp1)," total columns were converted to numeric.",sep = ""))
  return(temp1)
}

# I created a new file for each function, but if you’d rather you can simply create new functions sequentially in one file — just make sure to add the documentation comments before each function.

Step 4: Process your documentation

Now you need to create the documentation from your annotations earlier. You’ve already done the “hard” work in Step 3.

setwd("./OddsAndEnds") 
document()

You must highlight and run the two lines above together. This automatically adds the .Rd files to the man directory and updates the NAMESPACE file in the main directory. You can read up more about these, but in terms of steps you need to take, you really don’t have to do anything further.

Note that each time you add new documentation to your R function, you need to run the above two lines again (or devtools::document()) to re-generate the .Rd files.

Step 5: Edit DESCRIPTION file

Personally, I think here is a good point to edit the DESCRIPTION file. Let’s do that now.

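The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory; all other fields are optional. (If you fill in Authors@R, the ‘Author’ and ‘Maintainer’ fields can be omitted because they are generated from it.) Some optional fields include Imports and Suggests, URL (e.g. where the code is posted on Github), and LazyData. Common choices for License are MIT, GPL-2, GPL-3, or CC0.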

With respect to the author field, you need to include a three-letter code specifying the role. There are four important roles:

cre: the creator or maintainer, the person you should bother if you have problems.

aut: authors, those who have made significant contributions to the package.

ctb: contributors, those who have made smaller contributions, like patches.

cph: copyright holder. This is used if the copyright is held by someone other than the author, typically a company (i.e. the author’s employer).
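The Authors@R field is ordinary R code built with the person() function from the utils package, so several people with different roles can be combined with c(). In the sketch below the second person is hypothetical and only there to show a contributor role.

```r
# Two entries as they could appear after "Authors@R:" in DESCRIPTION
authors <- c(
  person("First", "Last", email = "first.last@example.com",
         role = c("aut", "cre")),
  person("Second", "Contributor", role = "ctb")  # hypothetical second person
)

# Inspect the roles assigned to each person
authors[1]$role
authors[2]$role

# A compact text summary of names and roles
format(authors, include = c("given", "family", "role"))
```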

Step 6: Install!

Now it is as simple as installing the package! You need to run this from the parent working directory that contains the package folder.

setwd("..")
install("OddsAndEnds")
#If you end up updating your DESCRIPTION file later on, you need to re-install your package to load the updated info

library(OddsAndEnds)

#You do not need to load data to create a package! I am merely including data so that we can test out my functions.. speaking of, always test your functions! 
data <- read.csv("ClassData.csv")

data$WM_group <- as.factor(data$WM_group)
data$Subject_DID <- as.character(data$Subject_DID)

str(data) #I want all of my variables to be numeric, but many variables are not.. let's see if my package's functions streamline this process

data <- colinttonum(data)
data <- colfacttonum(data)
data <- colchartonum(data)

str(data)

descriptives(data)

# Note that if you try searching for your functions in the help without restarting your R session, you may get an error (Error Retrieving Help--R code execution error). If this happens, just save your work and restart your session.

Publish your package on Github

Step 1: Open and Login to Github Desktop

Step 2: File –> Add Local Repository –> Set the Local Path to your package directory. A warning will appear that the directory is not a Git repository. Click create a repository here. The name of your package as well as the local path will automatically fill in, however, you can add a brief description of the repository. Finally, make sure to check off Initialize this repository with a README. This will create a README file that you can manually edit to share information about your package. When ready, click Create Repository

Step 3: To upload your package to Github, click Publish Repository on the top toolbar. Fill in the name of your package, and a description if you would like. Un-check Keep this code private so that others can install your package. (Github now offers private repositories on free accounts, but installing from a private repository requires an access token.)

Step 3.5: Since your package is already created, you can immediately publish to your Github account. However, if you want to make changes to your package later, you can edit the files directly in the local repository. Once edited, you need to write a summary of your changes and then click Commit to master. After committing your changes, you can “push” the repository to Github by clicking Push repository on the top toolbar.

Voila! Your package is now published to an online repository that you can share with others. To install the package from any computer, you can use the command devtools::install_github([username/repository name]).

For example:

install_github("lfrank14/OddsAndEnds")
## Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/America/Los_Angeles'
## Downloading GitHub repo lfrank14/OddsAndEnds@master
## from URL https://api.github.com/repos/lfrank14/OddsAndEnds/zipball/master
## Installing OddsAndEnds
## '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
##   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
##   '/private/var/folders/sy/f3znpm_d7f702pwh7bw18cb00000gq/T/Rtmpf6j6JY/devtools955026b27156/lfrank14-OddsAndEnds-139551a'  \
##   --library='/Library/Frameworks/R.framework/Versions/3.3/Resources/library'  \
##   --install-tests
## 

Additional package components

  • .gitignore: this is where you define files to ignore when you commit your package to a Git repository
  • Testing and debugging your functions
  • Installing dependencies through the NAMESPACE
  • Publishing your package on CRAN (requires additional formatting)
  • Saving data into your package
  • Vignettes

Resources

http://r-pkgs.had.co.nz/ –> Hadley’s book on R packages

https://cran.r-project.org/doc/manuals/r-release/R-exts.html –> CRAN’s guidelines for published packages

https://www.rstudio.com/wp-content/uploads/2015/03/devtools-cheatsheet.pdf –> Because who doesn’t love cheat sheets?

https://github.com/brainhack-eugene/open_neuroscience_workshop/tree/master/git_tutorial –> Shout-out to Brainhack Eugene 2018 for this awesome Git tutorial!!

https://awesome-r.com/ –> Get lost in packages.