% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/chi_squared_test.R
\name{chi_squared_test}
\alias{chi_squared_test}
\title{Chi-Squared test}
\usage{
chi_squared_test(
  data,
  select = NULL,
  by = NULL,
  probabilities = NULL,
  weights = NULL,
  paired = FALSE,
  ...
)
}
\arguments{
\item{data}{A data frame.}

\item{select}{Name(s) of the continuous variable(s) (as character vector)
to be used as samples for the test. \code{select} can be one of the following:
\itemize{
\item \code{select} can be used in combination with \code{by}, in which case \code{select} is
the name of the continuous variable (and \code{by} indicates a grouping factor).
\item \code{select} can also be a character vector of length two or more (more than
two names only apply to \code{kruskal_wallis_test()}), in which case the two
continuous variables are treated as samples to be compared. \code{by} must be
\code{NULL} in this case.
\item If \code{select} is of length \strong{two} and \code{paired = TRUE}, the two samples
are considered as \emph{dependent} and a paired test is carried out.
\item If \code{select} specifies \strong{one} variable and \code{by = NULL}, a one-sample test
is carried out (only applicable for \code{t_test()} and \code{wilcoxon_test()}).
\item For \code{chi_squared_test()}, if \code{select} specifies \strong{one} variable and
both \code{by} and \code{probabilities} are \code{NULL}, a one-sample test against given
probabilities is automatically conducted, with equal probabilities for
each level of \code{select}.
}}

\item{by}{Name of the variable indicating the groups. Required if \code{select}
specifies only one variable that contains all samples to be compared in the
test. If \code{by} is not a factor, it will be coerced to a factor. For
\code{chi_squared_test()}, if \code{probabilities} is provided, \code{by} must be \code{NULL}.}

\item{probabilities}{A numeric vector of probabilities for each cell in the
contingency table. The length of the vector must match the number of cells
in the table, i.e. the number of unique levels of the variable specified
in \code{select}. If \code{probabilities} is provided, a chi-squared test for given
probabilities is conducted. Furthermore, if \code{probabilities} is given, \code{by}
must be \code{NULL}. The probabilities must sum to 1.}

\item{weights}{Name of an (optional) weighting variable to be used for the test.}

\item{paired}{Logical, if \code{TRUE}, a McNemar test is conducted instead of a
chi-squared test. This option is only available for 2x2 tables.}

\item{...}{Additional arguments passed down to \code{\link[=chisq.test]{chisq.test()}}.}
}
\value{
A data frame with test results. The returned effect sizes are
Cramer's V for tables with more than two rows or columns, Phi (\eqn{\phi})
for 2x2 tables, and Fei (\ifelse{latex}{\eqn{Fei}}{פ}) for tests against
given probabilities.
}
\description{
This function performs a \eqn{\chi^2} test for contingency
tables or tests for given probabilities. The returned effect sizes are
Cramer's V for tables with more than two rows or columns, Phi (\eqn{\phi})
for 2x2 tables, and Fei (\ifelse{latex}{\eqn{Fei}}{פ}) for tests against
given probabilities (see \emph{Ben-Shachar et al. 2023}).
}
\details{
The function is a wrapper around \code{\link[=chisq.test]{chisq.test()}} and
\code{\link[=fisher.test]{fisher.test()}} (for small expected values) for contingency tables, and
\code{chisq.test()} for given probabilities. When \code{probabilities} are provided,
these are rescaled to sum to 1 (i.e. \code{rescale.p = TRUE}). When \code{fisher.test()}
is called, simulated p-values are returned (i.e. \code{simulate.p.value = TRUE},
see \code{?fisher.test}). If \code{paired = TRUE} and a 2x2 table is provided,
a McNemar test (see \code{\link[=mcnemar.test]{mcnemar.test()}}) is conducted.
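
As a rough base R illustration of the calls named above (a sketch with made-up
counts, not the internal code of \code{chi_squared_test()}):
\preformatted{# hypothetical 2x2 table of counts, for illustration only
tab <- matrix(c(12, 5, 7, 9), nrow = 2)

# contingency table: chi-squared test, or Fisher's exact test
chisq.test(tab)
fisher.test(tab, simulate.p.value = TRUE)

# test against given probabilities; c(3, 7) is rescaled to c(0.3, 0.7)
given <- chisq.test(c(20, 30), p = c(3, 7), rescale.p = TRUE)
given

# paired 2x2 table: McNemar test
mcnemar.test(tab)
}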

The weighted version of the chi-squared test is based on a weighted
table, using \code{\link[=xtabs]{xtabs()}} as input for \code{chisq.test()}.
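
The idea can be sketched in base R as follows. The data frame \code{d} and its
column names are made up for illustration, and this is not necessarily how
\code{chi_squared_test()} builds the table internally:
\preformatted{# hypothetical data with a weighting variable "w"
d <- data.frame(
  x = c("a", "a", "b", "b", "a", "b"),
  g = c("u", "v", "u", "v", "v", "u"),
  w = c(1.2, 0.8, 1.0, 1.5, 0.5, 1.0)
)

# weighted table: the weights are summed within each cell
tab <- xtabs(w ~ x + g, data = d)
tab

# chi-squared test on the weighted table; with such small made-up
# counts, R warns that the chi-squared approximation may be incorrect
suppressWarnings(chisq.test(tab))
}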

Interpretation of effect sizes is based on rules described in
\code{\link[effectsize:interpret_r]{effectsize::interpret_phi()}}, \code{\link[effectsize:interpret_r]{effectsize::interpret_cramers_v()}},
and \code{\link[effectsize:interpret_r]{effectsize::interpret_fei()}}. Use these functions directly to get other
interpretations, by providing the returned effect size as argument, e.g.
\code{interpret_phi(0.35, rules = "gignac2016")}.
}
\section{Which test to use}{

The following table provides an overview of which test to use for different
types of data. The choice of test depends on the scale of the outcome
variable and the number of samples to compare.\tabular{lll}{
   \strong{Samples} \tab \strong{Scale of Outcome} \tab \strong{Significance Test} \cr
   1 \tab binary / nominal \tab \code{chi_squared_test()} \cr
   1 \tab continuous, not normal \tab \code{wilcoxon_test()} \cr
   1 \tab continuous, normal \tab \code{t_test()} \cr
   2, independent \tab binary / nominal \tab \code{chi_squared_test()} \cr
   2, independent \tab continuous, not normal \tab \code{mann_whitney_test()} \cr
   2, independent \tab continuous, normal \tab \code{t_test()} \cr
   2, dependent \tab binary (only 2x2) \tab \code{chi_squared_test(paired=TRUE)} \cr
   2, dependent \tab continuous, not normal \tab \code{wilcoxon_test()} \cr
   2, dependent \tab continuous, normal \tab \code{t_test(paired=TRUE)} \cr
   >2, independent \tab continuous, not normal \tab \code{kruskal_wallis_test()} \cr
   >2, independent \tab continuous, normal \tab \code{datawizard::means_by_group()} \cr
   >2, dependent \tab continuous, not normal \tab \emph{not yet implemented} (1) \cr
   >2, dependent \tab continuous, normal \tab \emph{not yet implemented} (2) \cr
}


(1) More than two dependent samples are considered as \emph{repeated measurements}.
For ordinal or non-normally distributed outcomes, these samples are
usually tested using a \code{\link[=friedman.test]{friedman.test()}}, which requires the samples
in one variable, the groups to compare in another variable, and a third
variable indicating the repeated measurements (subject IDs).
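
A minimal base R sketch of that data layout, using made-up values and
hypothetical column names (\code{score}, \code{time}, \code{id}):
\preformatted{# hypothetical long-format data: 5 subjects, 3 repeated measurements
long <- data.frame(
  id    = rep(1:5, times = 3),
  time  = rep(c("t1", "t2", "t3"), each = 5),
  score = c(5, 6, 4, 7, 5, 6, 8, 5, 9, 6, 8, 9, 7, 10, 8)
)

# outcome ~ groups | subject IDs
friedman.test(score ~ time | id, data = long)
}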

(2) More than two dependent samples are considered as \emph{repeated measurements}.
For normally distributed outcomes, these samples are usually tested using
an ANOVA for repeated measurements. A more sophisticated approach would
be using a linear mixed model.
}

\examples{
\dontshow{if (requireNamespace("effectsize") && requireNamespace("MASS")) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
data(efc)
efc$weight <- abs(rnorm(nrow(efc), 1, 0.3))

# Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex")

# weighted Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex", weights = "weight")

# Chi-squared test for given probabilities
chi_squared_test(efc, "c161sex", probabilities = c(0.3, 0.7))
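
# One-sample test with equal probabilities for each level of "c161sex",
# as described for `select`: neither `by` nor `probabilities` is given
chi_squared_test(efc, "c161sex")

# McNemar test for a 2x2 table. This only illustrates the syntax: the two
# variables are not repeated measurements of the same characteristic, so
# the result has no substantive interpretation here
chi_squared_test(efc, "c161sex", by = "e16sex", paired = TRUE)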
\dontshow{\}) # examplesIf}
}
\references{
\itemize{
\item Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M.,
Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data
That Use the Chi‑Squared Statistic. Mathematics, 11, 1982.
\doi{10.3390/math11091982}
\item Bender, R., Lange, S., Ziegler, A. Wichtige Signifikanztests.
Dtsch Med Wochenschr 2007; 132: e24–e25
\item du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. Auswahl statistischer
Testverfahren. Dtsch Arztebl Int 2010; 107(19): 343–8
}
}
\seealso{
\itemize{
\item \code{\link[=t_test]{t_test()}} for parametric t-tests of dependent and independent samples.
\item \code{\link[=mann_whitney_test]{mann_whitney_test()}} for non-parametric tests of unpaired (independent)
samples.
\item \code{\link[=wilcoxon_test]{wilcoxon_test()}} for Wilcoxon signed-rank tests, i.e. non-parametric tests
of paired (dependent) samples.
\item \code{\link[=kruskal_wallis_test]{kruskal_wallis_test()}} for non-parametric tests with more than two
independent samples.
\item \code{\link[=chi_squared_test]{chi_squared_test()}} for chi-squared tests (two categorical variables,
dependent and independent).
}
}
