Merges two or more data files by adding the content of other input files as columns to the first input file and outputs them as files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org)
Source: R/merge_cols_omv.R
merge_cols_omv.Rd
Arguments
- dtaInp
Either a data frame (with the attribute "fleInp" containing the files to merge) or vector with the names of the input files (including the path, if required; "FILENAME.ext"; default: NULL); files can be of any supported file type, see Details below
- fleOut
Name of the data file to be written (including the path, if required; "FILE_OUT.omv"; default: ""); if empty, the resulting data frame is returned instead
- typMrg
Type of merging operation: "outer" (default), "inner", "left" or "right"; see Details below
- varBy
Name of the variable(s) by which the data sets are matched; can either be a string, a character vector or a list (see Details below; default: list())
- varSrt
Variable(s) that are used to sort the data frame (see Details; if empty, the order after merging is kept; default: c())
- psvAnl
Whether analyses that are contained in the input file shall be transferred to the output file (TRUE / FALSE; default: FALSE)
- usePkg
Name of the package: "foreign" or "haven" that shall be used to read SPSS, Stata and SAS files; "foreign" is the default (it comes with base R), but "haven" is newer and more comprehensive
- selSet
Name of the data set that is to be selected from the workspace (only applies when reading .RData-files)
- ...
Additional arguments passed on to methods; see Details below
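The four values of typMrg can be illustrated with base R's merge(), which the Details section names as the function used for adding columns. The following is only a minimal sketch: the data frames dtaA and dtaB and the helper mrgTwo are made up for illustration and are not part of jmvReadWrite, and the mapping onto all.x / all.y is an assumption based on the description in Details.

```r
# Two small, made-up data sets sharing the key column "ID"
dtaA <- data.frame(ID = c(1, 2, 3), A1 = c(10, 20, 30))
dtaB <- data.frame(ID = c(2, 3, 4), B1 = c("x", "y", "z"))

# Hypothetical helper (not part of jmvReadWrite): translate a typMrg-style
# string into the all.x / all.y arguments of base::merge()
mrgTwo <- function(dtaX, dtaY, varBy = "ID", typMrg = "outer") {
    typMrg <- match.arg(typMrg, c("outer", "inner", "left", "right"))
    merge(dtaX, dtaY, by = varBy,
          all.x = typMrg %in% c("outer", "left"),
          all.y = typMrg %in% c("outer", "right"))
}

# Run all four types of merging operation on the same pair of data sets
typAll <- c(outer = "outer", inner = "inner", left = "left", right = "right")
resMrg <- lapply(typAll, function(typ) mrgTwo(dtaA, dtaB, typMrg = typ))

# IDs that are kept by each type of merging operation
print(lapply(resMrg, function(dta) dta$ID))
```

With these data, "outer" keeps the IDs 1 to 4 (with missing values where a data set had no matching row), "inner" keeps 2 and 3, "left" keeps 1 to 3, and "right" keeps 2 to 4.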
Value
a data frame (only returned if fleOut is empty) where the columns of all input data sets (given in the dtaInp-argument) are concatenated
Details
Using data frames with the input parameter dtaInp is primarily intended for calling merge_cols_omv from the jamovi modules jTransform and Rj. For use in R, it is strongly recommended to pass a character vector with the file names instead.

There are four different types of merging operations (defined via typMrg): "outer" keeps all cases (but columns in the resulting data set may contain empty cells / missing values if some input data sets did not have a row containing the matching variable, defined in varBy). "inner" keeps only those cases where all data sets contain the same value in the matching variable. For "left", all cases from the first data set in dtaInp are kept (whereas cases that are only contained in the second or any later input data set are dropped). For "right", all cases from the second (or any later) data set in dtaInp are kept. The behaviour of "left" and "right" may be somewhat difficult to predict when merging several data sets; "outer" might therefore be a safer choice in that case.

The variable that is used for matching (varBy) can either be a string (if all data sets contain a matching variable with the same name), a character vector (containing more than one matching variable, with the same names in all data sets) or a list with the same length as dtaInp. In such a list, each cell can again contain either a string (one matching variable for each data set in dtaInp) or a character vector (several matching variables for each data set in dtaInp; NB: all character vectors in the cells of the list must have the same length, as it is necessary to always use the same number of matching variables when merging).

varSrt can be either a string or a character vector (with one or more variables, respectively). The sorting order for a particular variable can be inverted by preceding the variable name with "-". Please note that this doesn't make sense for certain variable types (e.g., factors) and hence throws a warning.

The ellipsis-parameter (...) can be used to submit arguments / parameters to the functions that are used for transforming or reading the data. By clicking on the respective function under “See also”, you can get a more detailed overview of which parameters each of those functions takes.

Adding columns uses merge. typMrg is implemented by setting all.x and all.y in merge to TRUE or FALSE; varBy corresponds to by.x and by.y. The help for merge can be accessed by clicking on the link under “See also”.

The functions for reading and writing the data are: read_omv and write_omv (for jamovi-files), read.table (for CSV / TSV files; using similar defaults as read.csv for CSV and read.delim for TSV, which both are based upon read.table), load (for .RData-files), readRDS (for .rds-files), read_sav (needs R-package haven) or read.spss (needs R-package foreign) for SPSS-files, read_dta (haven) / read.dta (foreign) for Stata-files, read_sas (haven) for SAS-data-files, and read_xpt (haven) / read.xport (foreign) for SAS-transport-files. If you would like to use haven, you may need to install it using install.packages("haven", dep = TRUE).
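The list form of varBy and the "-" prefix in varSrt can be sketched with base R alone. The following is an illustration under stated assumptions, not the package's implementation: the data frames are made up, the helper srtDta is hypothetical, and how merge_cols_omv handles these parameters internally may differ.

```r
# Made-up data: the matching variable has a different name in each data set
dtaX <- data.frame(ID = c(3, 1, 2), age = c(31, 25, 47))
dtaY <- data.frame(code = c(1, 2, 3), score = c(0.5, 0.9, 0.1))

# varBy given as a list with one entry per data set -> by.x / by.y in merge()
varBy <- list("ID", "code")
dtaMrg <- merge(dtaX, dtaY, by.x = varBy[[1]], by.y = varBy[[2]], all = TRUE)

# Hypothetical helper (an assumption, not the package's code): sort by the
# variables in varSrt, where a leading "-" requests descending order
srtDta <- function(dta, varSrt = c()) {
    if (length(varSrt) < 1) return(dta)
    srtKey <- lapply(varSrt, function(nme) {
        if (startsWith(nme, "-")) -dta[[substring(nme, 2)]] else dta[[nme]]
    })
    dta[do.call(order, srtKey), , drop = FALSE]
}

# Sort the merged data by age in descending order
print(srtDta(dtaMrg, "-age"))
```

The merged key column takes its name from by.x ("ID"). The sketch inverts the order by negating the column, which only works for numeric variables; negating a factor in R produces a warning and missing values, which is in line with the note on factors above.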
See also
merge_cols_omv internally uses the following functions: Adding columns uses merge(). For reading and writing data files in different formats: read_omv() and write_omv() for jamovi-files, utils::read.table() for CSV / TSV files, load() for reading .RData-files, readRDS() for .rds-files, haven::read_sav() or foreign::read.spss() for SPSS-files, haven::read_dta() or foreign::read.dta() for Stata-files, haven::read_sas() for SAS-data-files, and haven::read_xpt() or foreign::read.xport() for SAS-transport-files.
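Since CSV files are read with utils::read.table(), it may help to see which settings make read.table() behave like read.csv(). The sketch below uses a made-up temporary file and the default arguments of read.csv() in base R; the exact defaults that merge_cols_omv passes on are described only as "similar" in this help page, so treat the argument list as an assumption.

```r
# Write a small, made-up CSV file to a temporary location
nmeCSV <- tempfile(fileext = ".csv")
writeLines(c("ID,name,score", "1,\"Smith, J.\",0.5", "2,Doe,0.9"), nmeCSV)

# read.table() with the settings that read.csv() uses by default
dtaTbl <- read.table(nmeCSV, header = TRUE, sep = ",", quote = "\"", dec = ".",
                     fill = TRUE, comment.char = "")
# read.csv() is a thin wrapper around read.table() with those defaults
dtaCSV <- read.csv(nmeCSV)

# check whether both calls return the same data frame
print(identical(dtaTbl, dtaCSV))
unlink(nmeCSV)
```

Further arguments of this kind (e.g., a different sep or dec) are the sort of parameters that could be handed over via the ellipsis-parameter described in Details.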
Examples
if (FALSE) { # \dontrun{
dtaInp <- jmvReadWrite::bfi_sample2
nmeInp <- paste0(tempfile(), "_", 1:3, ".rds")
nmeOut <- tempfile(fileext = ".omv")
for (i in seq_along(nmeInp)) {
saveRDS(stats::setNames(dtaInp, c("ID", paste0(names(dtaInp)[-1], "_", i))), nmeInp[i])
}
# save dtaInp three times (i.e., the length of nmeInp), adding "_" + 1 ... 3 as index
# to the data variables (A1 ... O5, gender, age → A1_1, ...)
jmvReadWrite::merge_cols_omv(dtaInp = nmeInp, fleOut = nmeOut, varBy = "ID")
cat(file.info(nmeOut)$size)
# -> 17731 (size may differ on different OSes)
dtaOut <- jmvReadWrite::read_omv(nmeOut, sveAtt = FALSE)
# read the data set where the three original datasets were added as columns and show
# the variable names
cat(names(dtaOut))
cat(names(dtaInp))
# compared to the input data set, we have the same names (except for "ID", which was
# used for matching, and that each variable has an index appended that indicates from
# which data set it came)
cat(dim(dtaInp), dim(dtaOut))
# the first dimension of the data sets (rows) stayed the same (250), whereas the
# second dimension is now approx. three times as large (28 -> 82):
# (28 - 1 (for "ID")) * 3 + 1 (for "ID") = 82
cat(colMeans(dtaInp[2:11]))
cat(colMeans(dtaOut[2:11]))
# it's therefore not much surprise that the values of the column means for the first
# 10 variables of dtaInp and dtaOut are the same too
unlink(nmeInp)
unlink(nmeOut)
} # }