Perform principal component analysis (PCA) on specified variables and create
visualizations. The default "classic" mode preserves the original
psych::principal() workflow for compatibility with existing
SciDataReportR analyses. The optional "omics" mode is designed for
high-dimensional data such as proteomics, metabolomics, and other settings
where the number of variables is much larger than the number of participants.
Usage
CreatePCAObject(
Data,
VarsToReduce,
VariableCategories = NULL,
minThresh = 0.85,
scale = TRUE,
center = TRUE,
Ordinal = FALSE,
numComponents = NULL,
Mode = c("classic", "omics"),
backend = c("psych", "prcomp", "irlba"),
rotate = c("varimax", "none"),
maxComponents = 20,
maxScreeComponents = 20,
VarianceFilter = NULL,
VarianceFilterMethod = c("top_n", "variance_quantile"),
MissingnessWarningThreshold = 0.2,
ParticipantMissingnessWarningThreshold = 0.2,
imputeMethod = c("missRanger", "median"),
SuppressWarnings = FALSE
)Arguments
- Data
The dataset containing the variables for PCA.
- VarsToReduce
A character vector specifying the variables to include in the PCA.
- VariableCategories
Optional categorical vector, used to color the lollipop plot. Must be the same length and order as
VarsToReduceif provided.- minThresh
The minimum threshold for cumulative proportion of variance. Default is
0.85. In"omics"mode, this threshold is evaluated only across the capped scree components.- scale
Logical, indicating whether to scale the data by standard deviation. Default is
TRUE.- center
Logical, indicating whether to center the data by subtracting the mean. Default is
TRUE.- Ordinal
Logical, indicating whether ordinal variables should be handled. Currently not used inside this function.
- numComponents
Number of principal components to compute. If
NULL, chosen byminThreshin"classic"mode and by capped scree results in"omics"mode.- Mode
Character. Either
"classic"or"omics"."classic"preserves the originalpsych::principal()behavior."omics"uses capped PCA logic for high-dimensional data.- backend
Character. PCA backend to use. Options are
"psych","prcomp", and"irlba". Default is"psych"to preserve backward compatibility. In"omics"mode,"irlba"is recommended for speed.- rotate
Character. Rotation method. Options are
"varimax"and"none". Default is"varimax"to preserve interpretability and existing loading-table behavior.- maxComponents
Maximum number of final components to retain in
"omics"mode whennumComponentsisNULL. Default is20.- maxScreeComponents
Maximum number of components used to estimate and plot the scree curve in
"omics"mode. Default is20.- VarianceFilter
Optional numeric value for variance filtering before PCA in
"omics"mode. IfVarianceFilterMethod = "top_n", this keeps the topVarianceFiltermost variable variables. IfVarianceFilterMethod = "variance_quantile", this keeps variables with variance at or above the specified quantile.- VarianceFilterMethod
Character. Either
"top_n"or"variance_quantile".- MissingnessWarningThreshold
Numeric threshold for warning about high variable-level missingness. Default is
0.20.- ParticipantMissingnessWarningThreshold
Numeric threshold for warning about high participant-level missingness. Default is
0.20.- imputeMethod
Character. Missing-data imputation method. Options are
"missRanger"and"median". Default is"missRanger"to preserve the original behavior.- SuppressWarnings
Logical. If
TRUE, suppresses PCA-specific warning messages from this function. Default isFALSE.
Value
A list containing PCA results and visualizations:
- p_scree
Scree and variance plot as a ggplot object.
- pcaresults
PCA result object. In
"classic"mode this is thepsych::principal()result afterpsych::fa.sort(). In"omics"mode this is a harmonized list with rotated and sorted loadings, scores, and variance information.- LoadingTable
Data frame of rotated loadings with variable names and labels.
- Scores
Matrix or data frame of component scores.
- CombinedData
Original data with component scores appended.
- Lollipop
Lollipop-style loading plot as a ggplot object.
- ScaleParams
List with means, standard deviations, center, and scale.
- VarsUsed
Character vector of variables actually used in PCA.
- VarianceTable
Table of variance explained by component.
- Mode
PCA mode used.
- Backend
PCA backend used.
- Center
Logical flag indicating if centering was used.
- Scale
Logical flag indicating if scaling was used.
Details
In "omics" mode, the function avoids computing one component per input
variable. Instead, it computes a capped number of components using a faster
PCA backend, applies optional variance filtering, and then applies rotation
and sorting so loadings remain interpretable. This is useful when PCA is being
used for exploratory structure, visualization, latent biological signal, or
clustering rather than exhaustive factor decomposition.
