A Data Science Package: Part [2]

KARTHIKEYAN S
Statistical Breakdown
8 min read · Jan 1, 2021


Photo by olia danilevich via Pexels

Reading A Data Science Package: Part [1] should have given you some insight into data science. Now let's look at RStudio, an interface for the R programming language, which will help you build a practical understanding.

It's recommended to install R and RStudio so that you can explore them alongside the contents below.

R

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

RStudio’s main interface

You may be missing the upper left quadrant and instead see just one region, “Console”, on the left side of the screen. If this is the case, go to File > New File > R Script, and the window should then show all four quadrants.

The various quadrants

RStudio can be roughly divided into four quadrants, each with specific and varied functions, plus a main menu bar. When you first open RStudio, you should see a window laid out roughly this way.

The menu bar

The menu bar runs across the top of your screen and should have two rows. The first row should be a fairly standard menu, starting with “File” and “Edit.” Below that, there is a row of icons that are shortcuts for functions that you’ll frequently use.

File menu

Here you can open new or saved files and new or saved projects. If you mouse over “New File”, a new menu will appear that lists the various file formats available to you. R Script and R Markdown files are the most common file types, but you can also generate R notebooks, web apps, websites, or slide presentations.

Session menu

The Session menu has some R-specific functions that let you restart, interrupt, or terminate R. These can be helpful if R isn’t behaving or is stuck and you want to stop what it is doing and start from scratch.

Tools menu

The Tools menu is a huge collection of functions for you to explore.

The console

To execute your first command, type 1 + 2 at the > prompt and press Enter. You should see the output [1] 3 below your command.
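As a small sketch of what a first console session might look like, the lines below repeat that command and then store the result in a variable. The variable name x is just an illustration.

```r
# Type each line at the > prompt and press Enter.
1 + 2        # prints [1] 3

# Store the result in a variable so it can be reused.
x <- 1 + 2
x * 10       # prints [1] 30
```

Once x has been assigned, it also shows up in the environment quadrant.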

The environment/history

The environment quadrant

RStudio also tells you some information about each object in the environment, such as whether it is a list or a data frame, and whether it contains numbers, integers, or characters. This is very helpful because some functions only work with certain classes of data, so knowing what kind of data you have is the first step to choosing the right function.
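The same information can be requested from the console. The sketch below creates a few example objects (the names are made up for illustration) and asks R what kind of data each one holds.

```r
# Create a few objects of different kinds; the names are illustrative.
measurements <- c(1.5, 2, 3.25)        # decimal numbers
counts <- 1:3                          # whole numbers
labels <- c("low", "mid", "high")      # text
results <- data.frame(id = counts, value = measurements, level = labels)

# Ask R for the class of each object.
class(measurements)   # "numeric"
class(counts)         # "integer"
class(labels)         # "character"
class(results)        # "data.frame"

# str() gives a compact summary, similar to what the environment tab shows.
str(results)

# ls() lists the names of the objects currently in the environment.
ls()
```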

The history tab

Here you will see the commands that we have run in this session of R. If you click on any one of them, you can click “To Console” or “To Source” and this will either rerun the command in the console, or will move the command to the source, respectively.

The source/The script editor panel

The Source panel is where you will be spending most of your time in RStudio. This is where you store the R commands that you want to save for later, either as a record of what you did or as a way to rerun code.

Files, Plots, Packages, Help, and Viewer.

The files tab

In Files, you can see all of the files in your current working directory. If this isn’t where you want to save or retrieve files from, you can also change the current working directory in this tab and change it to the desired directory.
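The working directory can also be inspected and changed from the console. This is a minimal sketch that uses R's temporary directory so that it does not touch your own files; the file name example.txt is only an illustration.

```r
# Show the current working directory.
getwd()

# Switch to a temporary directory; setwd() returns the previous directory,
# which is kept here so it can be restored afterwards.
old_dir <- setwd(tempdir())

# Create a small file and list the contents of the new working directory.
writeLines("hello", "example.txt")
list.files()

# Go back to where we started.
setwd(old_dir)
```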

The plots tab

Plots you generate will appear here, and you can use the arrows to navigate to previously generated plots. The magnifying icon will open the plot in a new window that is much larger than the quadrant. Export is how you save the plot, either as an image or as a PDF. The broom icon clears all plots from memory.
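Saving a plot can also be done in code rather than through the Export button. The sketch below draws a scatter plot of R's built-in cars dataset straight into a PDF file in the temporary directory; the file name scatter.pdf is an arbitrary choice.

```r
# Choose where to save the plot (a temporary location for this example).
out_file <- file.path(tempdir(), "scatter.pdf")

# Open a PDF device, draw the plot, then close the device to write the file.
pdf(out_file, width = 6, height = 4)
plot(cars$speed, cars$dist,
     xlab = "Speed", ylab = "Stopping distance",
     main = "Built-in cars dataset")
dev.off()

# Confirm that the file was written.
file.exists(out_file)
```

Running only the plot() call, without pdf() and dev.off(), sends the plot to the Plots tab instead.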

The packages tab

The Packages tab will be explored more in depth in the next lesson on R packages. Here you can see all the packages you have installed, load and unload these packages, and update them.

The help tab

The Help tab is where you find the documentation for your R packages and various functions. In the upper right of this panel there is a search function for when you have a specific function or package in question.
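The same documentation can be reached from the console. In this sketch, median is just an example function; the help call is wrapped in interactive() so that it only opens the Help tab when you run it yourself.

```r
# Open the documentation for a function you already know by name.
# The shortcut ?median does the same thing.
if (interactive()) help("median")

# If you only remember part of a name, apropos() lists matching functions.
matches <- apropos("median")
matches
```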

What is a package?

A package is a collection of functions, data, and code conveniently provided in a nice, complete format for you. Over 14,000 packages are available to download, each with its own specialized functions and code. A package is not to be confused with a library (these two terms are often conflated in colloquial speech about R). A library is the place where the package is located on your computer.
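To see the difference between a package and a library on your own machine, you can ask R where its libraries are. A short sketch, using the stats package that ships with R as the example:

```r
# The folders R searches for installed packages: these are the libraries.
.libPaths()

# The folder where one particular package lives inside a library.
stats_dir <- find.package("stats")
stats_dir
```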

Packages are what make R so unique. These packages greatly expand its functionality. Each package is developed and published by the R community at large and deposited in repositories.

What are repositories?

A repository is a central location where many developed packages are located and available for download.

There are three big repositories:
1. CRAN (Comprehensive R Archive Network): R’s main repository
2. Bioconductor: A repository mainly for bioinformatics-focused packages
3. GitHub: A very popular platform for hosting open source projects, including R packages

CRAN groups packages by their functionality/topic into several dozen “themes.” It calls these its “Task Views.”

RDocumentation is a search engine for packages and functions from CRAN, Bioconductor, and GitHub.

In order to find a package for any specific task the best way to begin is to Google the task followed by “R package”. From there, looking at tutorials, vignettes, and forums for people already doing what you want to do is a great way to find relevant packages.

How do you install packages?

Installing from CRAN
If you are installing from the CRAN repository, use the install.packages() function, with the name of the package you want to install in quotes between the parentheses. For example, if you want to install the package “devtools”, you would use: install.packages("devtools")

If you want to install multiple packages at once, you can do so by using a character vector, like: install.packages(c("ggplot2","lme4"))
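One way to avoid reinstalling packages you already have is to check first. The sketch below is only an illustration: is_installed and missing_packages are hypothetical helper names, not functions from R itself, and the install step is guarded so that it only runs in an interactive session.

```r
# Hypothetical helper: TRUE if a package can be found in any library.
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

# Hypothetical helper: the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, is_installed, logical(1))
  pkgs[!found]
}

wanted <- c("ggplot2", "lme4")
to_install <- missing_packages(wanted)
to_install

# Install only what is missing, and only when you are running this yourself.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```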

If you want to use RStudio’s graphical interface to install packages, go to the Tools menu, and the first option should be “Install packages…”

Installing from Bioconductor
The Bioconductor repository uses its own method to install packages. Older tutorials use source("https://bioconductor.org/biocLite.R") followed by biocLite(), but that method has been retired. In current releases you first install the BiocManager package from CRAN: install.packages("BiocManager")

This makes the main install function of Bioconductor, BiocManager::install(), available to you. You then pass the package you want to install, in quotes, between the parentheses of that function, like so: BiocManager::install("GenomicFeatures")

Installing from GitHub
This is a more specific case that you probably won’t run into too often. In the event you want to do this, you first must find the package you want on GitHub and take note of both the package name AND the author of the package. A general workflow is as follows:

  1. install.packages("devtools") - only run this if you don’t already have devtools installed. If you’ve been following along with this lesson, you may have installed it when we were practicing installations using the R console.
  2. library(devtools) - more on what this command is doing immediately below this
  3. install_github("author/package") replacing “author” and “package” with their GitHub username and the name of the package.
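
The steps above can be wrapped in a small helper. This is only a sketch: github_spec and install_from_github are hypothetical names, not part of devtools, and the final call is left commented out because “author” and “package” are placeholders.

```r
# Hypothetical helper: build the "author/package" string for GitHub.
github_spec <- function(author, package) {
  paste(author, package, sep = "/")
}

# Hypothetical helper: check that devtools is available, then install.
install_from_github <- function(author, package) {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("Install devtools first with install.packages(\"devtools\")")
  }
  devtools::install_github(github_spec(author, package))
}

# Replace the placeholders with a real GitHub username and package name:
# install_from_github("author", "package")
```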

Loading packages

Installing a package does not make its functions immediately available to you. First you must load the package into R; to do so, use the library() function. Think of this like any other software you install on your computer. Just because you’ve installed a program doesn’t mean it’s automatically running; you have to open the program. The same is true in R. [ Unlike install.packages(), which requires the package name in quotes, library() accepts the name either with or without quotes, so library(devtools) and library("devtools") both work. ]
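As a quick sketch of loading, the lines below use tools and parallel, two packages that ship with R, so nothing needs to be installed first. search() lists everything that is currently loaded and attached.

```r
# Load a package; the name works with or without quotes.
library(tools)
library("parallel")

# Loaded packages appear in the search path with a "package:" prefix.
search()
"package:tools" %in% search()      # TRUE
"package:parallel" %in% search()   # TRUE
```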

If you want to load a package using the RStudio interface, in the lower right quadrant there is a tab called “Packages” that lists all of the packages you have installed, along with a brief description and the version number of each. To load a package, just click the checkbox beside its name.

Updating, removing, unloading packages

Checking what packages you have installed
If you aren’t sure whether you’ve already installed a package, or want to check what packages are installed, you can use either installed.packages() or library() with nothing between the parentheses.
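
The output of installed.packages() is a large table, so it can help to pull out just the names. A minimal sketch, using the stats package that ships with R as the example:

```r
# installed.packages() returns one row per package; the row names are
# the package names.
installed <- rownames(installed.packages())
head(installed)

# Check for one specific package.
"stats" %in% installed   # TRUE, since stats ships with R
```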

Updating packages

You can check what packages need an update with a call to the function old.packages(). This will identify all packages that have been updated since you installed or last updated them.

To update all packages, use update.packages(). If you only want to update a specific package, just call install.packages("nameofthepackage") again.

Within the RStudio interface, still in that Packages tab, you can click “Update,” which will list all of the packages that are not up to date. It gives you the option to update all of your packages, or allows you to select specific packages.

Sometimes an update can change the functionality of certain functions, so if you re-run some old code, the command may be changed or perhaps even outright gone and you will need to update your code too!
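
When old code depends on a particular version of a package, it can help to check which version is installed before running it. This is a sketch only: stats is used because it ships with R, and the minimum version "3.0.0" is an arbitrary value chosen for illustration.

```r
# Look up the installed version of a package.
installed_version <- packageVersion("stats")
installed_version

# Compare it against the version your code was written for.
minimum_version <- "3.0.0"   # illustrative value, not a real requirement
if (installed_version < minimum_version) {
  warning("stats is older than the version this code was written for")
}
```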

Unloading packages

Situations occur where the package you have loaded may not play nicely with another package you want to use.

To unload a given package you can use the detach() function. For example, detach("package:devtools", unload=TRUE) would unload the devtools package (that we loaded earlier). Within the RStudio interface, in the Packages tab, you can simply unload a package by unchecking the box beside the package name.
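
Here is the same idea as a sketch you can run without installing anything, using splines, a package that ships with R, in place of devtools.

```r
# Load a package, then confirm it is attached.
library(splines)
"package:splines" %in% search()   # TRUE

# Unload it again; note the "package:" prefix and the quotes.
detach("package:splines", unload = TRUE)
"package:splines" %in% search()   # FALSE
```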

Uninstalling packages

If you no longer want to have a package installed, you can simply uninstall it using the function remove.packages(). For example, remove.packages("devtools")

Within RStudio, in the Packages tab, clicking on the “X” at the end of a package’s row will uninstall that package.

How do you know what version of R you have?

Sometimes, when you are looking at a package that you might want to install, you will see that it requires a certain version of R to run.

The R version is shown when you first open R/RStudio: the first thing printed in the console tells you what version of R is currently running. You can also type version into the console to output information on the R version you are running. Another helpful command is sessionInfo(), which tells you what version of R you are running along with a listing of all of the packages you have loaded.
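
If a package states a minimum R version, you can compare it against your own in code. A short sketch; the required version "3.5.0" is a made-up example, not taken from any particular package.

```r
# A one-line description of the running R version.
R.version.string

# The version as a comparable value.
current <- getRversion()
current

# Compare against the version a package asks for (illustrative value).
required <- "3.5.0"
current >= required

# Full details, including the packages currently loaded.
sessionInfo()
```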

Using the functions in a package

Try help(package = "devtools") and you will see all of the many functions that devtools provides. Within the RStudio interface, you can access the help files through the Packages tab (again) - clicking on any package name should open up the associated help files in the “Help” tab, found in that same quadrant, beside the Packages tab.

If you are still unsure which functions a package provides and whether they are useful to you, the browseVignettes() function will provide more information. Vignettes are extended help files that include an overview of the package and its functions, and they often go the extra mile with detailed, plain-language examples that you can follow along with to see how to use the package.
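
The same exploration can be done from the console. The sketch below uses stats and grid, two packages that ship with R, as stand-ins for whichever package you are curious about; the calls that open help pages are guarded so they only run in an interactive session.

```r
# List the functions and objects that an attached package provides.
stats_functions <- ls("package:stats")
head(stats_functions)

# Look up which vignettes a package offers, without opening them.
grid_vignettes <- vignette(package = "grid")
grid_vignettes$results[, c("Item", "Title"), drop = FALSE]

# Open the package help index and the vignette browser when run by hand.
if (interactive()) {
  help(package = "stats")
  browseVignettes("grid")
}
```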
