Chapter 3 Getting Data into R
Data lie in different places and in different formats. Therefore, getting them into R
is not always the same. Some data are stored in local directory in our computer, while other data are available online. Moreover, sometimes we need extract data from databases. To know how to extract different forms of data from different sources is important because the explosion of social media and similar platforms has given birth to tremendous amounts of data in different formats and in different places. Though accountants and auditors are good at dealing with structured data, they should also know how to deal with unstructured and non-conventional data (Richins et al. 2017).
3.1 Packages That Could Be Used to Import Data into R
.
There are many packages that can be used to import data into R
from many different sources. Of those packages, readr
, readxl
, and purrr
will be discussed here. Additionally, data can be imported into R
using RStudio's Import Wizard
. To import data using RStudio's Import Wizard
, one needs to go to the Environment tab and select Import Dataset; then, select appropriate type of data one wants to import and finally browse the data one wants to import. Please see Figure 3.1 to learn about how to use RStudio's Import Wizard
.

FIGURE3.1: Import Data Using RStudio’s Import Wizard
3.2 Knowing & Setting Your Current (Working) Directory
When one wants to import the data from Personal Computer (PC), then knowing where the data reside (which is also called “Path”) is necessary as this will help import the data easily. The function getwd()
will help to know the current working directory. Unless a specific directory is set as a current (working) directory, R
will always look for a file in the current working directory. The function setwd()
helps to set any directory (folder) as current working directory. If you are working in an R
project (The discussion about R
project is in Appendix B.1), then the project folder (directory) is the current directory. Figure 3.2 is an example of the address of a directory (folder).

FIGURE3.2: Directory
If you click on the address bar of the directory (highlighted in Red), it will look like - C:\Documents\Project Docs
. If you want to set it as your current working directory, then you have to write the code setwd("C:/Documents/Project Docs")
. Note that though the address has back slash (\
), in setwd()
function, we use forward slash (/
) as R
cannot deal with back slash. Once you set the working directory, running getwd()
will show you your current working directory, which is the default directory for importing and exporting data into R
(unless you specifically mention a different path).
3.3 Importing Data into R
from Excel
To import data from excel, readxl
package can be excellent. You can install readxl
package by install.packages("readxl")
or it will be installed when tidyverse
package is installed. readxl
package can deal with both xls
and xlsx
files. There are some built-in datasets with readxl
package. The function readxl_example
generates the names of the built in datasets in readxl
package.
library(readxl)
readxl_example() # These are the example files from readxl package.
## [1] "clippy.xls" "clippy.xlsx" "datasets.xls"
## [4] "datasets.xlsx" "deaths.xls" "deaths.xlsx"
## [7] "geometry.xls" "geometry.xlsx" "type-me.xls"
## [10] "type-me.xlsx"
The read_excel
function will read the data from excel. For example - if we want to import an excel file such as climate change data from World Bank (https://datacatalog.worldbank.org/dataset/climate-change-data), we can use read_excel
function.
<- read_excel("data/climate_change_download_0.xls")
climate_change # To know the names of the variables of the dataset
names(climate_change)
## [1] "Country code" "Country name" "Series code"
## [4] "Series name" "SCALE" "Decimals"
## [7] "1990" "1991" "1992"
## [10] "1993" "1994" "1995"
## [13] "1996" "1997" "1998"
## [16] "1999" "2000" "2001"
## [19] "2002" "2003" "2004"
## [22] "2005" "2006" "2007"
## [25] "2008" "2009" "2010"
## [28] "2011"
# OR
colnames(climate_change)
## [1] "Country code" "Country name" "Series code"
## [4] "Series name" "SCALE" "Decimals"
## [7] "1990" "1991" "1992"
## [10] "1993" "1994" "1995"
## [13] "1996" "1997" "1998"
## [16] "1999" "2000" "2001"
## [19] "2002" "2003" "2004"
## [22] "2005" "2006" "2007"
## [25] "2008" "2009" "2010"
## [28] "2011"
If the excel file climate_change_download_0
is opened in excel, we can see that there are 3 worksheets in the files, but read_excel
function in climate_change
dataset only imports the first worksheet called Data
. We can use sheet
argument in read_excel
function to specify which worksheet one wants to import. In sheet
argument the position of the worksheet can be specified as well rather than the name of the worksheet. Also, excel_sheets
functions can be used to know the names of all worksheets in a given dataset. The function excel_format
can be used to know the format of the excel files (xls
or xlsx
).
excel_sheets("data/climate_change_download_0.xls")
## [1] "Data" "Country" "Series"
excel_format("data/climate_change_download_0.xls")
## [1] "xls"
<- read_excel("data/climate_change_download_0.xls",
climate_change_country sheet = "Country")
# OR
<- read_excel("data/climate_change_download_0.xls",
climate_change_country sheet = 2)
Other arguments in read_excel
function such as range
can be used to import a subset of the excel file. The argument na
are used for missing values (NA). In our dataset climate_change
there are missing values represented by ..
. This can be replaced by na
argument.
<- read_excel("data/climate_change_download_0.xls",
climate_change2 na = "..")
Also, we can use writexl
package to export (save) a dataset from R in excel format. The function write_xlsx
is usually used to save the data in excel format in desired directory.
library(writexl)
write_xlsx(climate_change_country, "data/climatechange_country.xlsx")
Alternatively, one can use file.choose()
function within read_excel
function to manually import an excel file into R
.
<- read_excel(file.choose()) data
Also, using read.table
function allows one to copy and paste an excel file in R
.
<- read.table(file = "clipboard",
df sep = "\t", header=TRUE)
3.4 Importing Data into R
from Local Directory of Computer
Importing data from local directory of computer can be done in couple of ways. For example, we can specify the path in which the data is stored. Alternatively, we can set the working directory first; then import the data by specifying the name of the data files.
3.5 Importing Data into R
from Internet
3.6 Importing Data into R
from WRDS
Database
3.7 Some Other Sources
Very good website for the chapter - https://rc2e.com/inputandoutput#recipe-id245