This brief post is a “how to” load multiple csv
/ xlsx
files into single data frame. Say, we have a couple of *.xlsx files in one folder (some of them in sub-directories) and we would like to bind them all into one. Quite common task. Let´s do it..
library(tidyverse)
library(readxl)
Step 1
list.files
function with argument full.names = TRUE
(preserving complete path to all files) and argument recursive = TRUE
is handy in case your files are in any level of sub-directories (represented with “next_level” folder in my case).
x <- list.files(path = "../../static/data/multiple", full.names = TRUE, recursive = TRUE, pattern = "*.xlsx")
x
## [1] "../../static/data/multiple/file_1.xlsx"
## [2] "../../static/data/multiple/file_2.xlsx"
## [3] "../../static/data/multiple/file_3.xlsx"
## [4] "../../static/data/multiple/next_level/file_4.xlsx"
Now I have the x
vector with all full paths to all "*.xlsx" files.
Step 2
map
the vector of paths to function read_excel
. Use argument sheets
to specify the name of the sheet.
lst <- map(x, read_excel, sheet = "Value Assessment")
head(lst[[1]])
## # A tibble: 6 x 10
## Value Count Pattern Length Form Case Content Spacing Category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 0 126 N 1 Nume~ Not ~ Single Valid Questio~
## 2 0.01 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 3 0.33 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 4 0.36 3 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 5 0.39 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 6 0.44 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## # ... with 1 more variable: `Assessment Note` <chr>
These are first 5 records of first file.
Step 3
Use bind_rows
to bind the list into one tibble (data frame). Argument .id
allows you to create a column with the name of source file.
df <- lst %>%
# set names to list items first
set_names(x %>% basename) %>%
bind_rows(.id = "filename")
head(df)
## # A tibble: 6 x 11
## filename Value Count Pattern Length Form Case Content Spacing Category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 file_1.~ 0 126 N 1 Nume~ Not ~ Single Valid Questio~
## 2 file_1.~ 0.01 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 3 file_1.~ 0.33 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 4 file_1.~ 0.36 3 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 5 file_1.~ 0.39 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 6 file_1.~ 0.44 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## # ... with 1 more variable: `Assessment Note` <chr>
Pipe pipe pipe
Let´s pipe all these steps into single chunk of code.
x <- list.files(path = "../../static/data/multiple", full.names = TRUE, recursive = TRUE, pattern = "*.xlsx")
x %>%
map(read_excel, sheet = "Value Assessment") %>%
set_names(x %>% basename) %>%
bind_rows(.id = "filename") %>%
head
## # A tibble: 6 x 11
## filename Value Count Pattern Length Form Case Content Spacing Category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 file_1.~ 0 126 N 1 Nume~ Not ~ Single Valid Questio~
## 2 file_1.~ 0.01 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 3 file_1.~ 0.33 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 4 file_1.~ 0.36 3 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 5 file_1.~ 0.39 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 6 file_1.~ 0.44 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## # ... with 1 more variable: `Assessment Note` <chr>
Actually, you can reduce it to two lines using map_df
..Note that without setting the names.
list.files(path = "../../static/data/multiple", full.names = TRUE, recursive = TRUE, pattern = "*.xlsx") %>%
map_df(~read_excel(., sheet = "Value Assessment"), .id = "filename") %>%
head
## # A tibble: 6 x 11
## filename Value Count Pattern Length Form Case Content Spacing Category
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 0 126 N 1 Nume~ Not ~ Single Valid Questio~
## 2 1 0.01 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 3 1 0.33 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 4 1 0.36 3 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 5 1 0.39 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## 6 1 0.44 1 NPNN 4 Nume~ Not ~ Single Valid Appears~
## # ... with 1 more variable: `Assessment Note` <chr>
Pipes are cool, makes your life easier. Two lines of code, but very powerfull.