Multivariate Analysis using SAS

Last Updated on February 23, 2022 by Editorial Team

Author(s): Dr. Marc Jacobs

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company.


The difference between univariate, multivariate, and multivariable is often overlooked. As a result, multivariate and multivariable are used interchangeably, although they mean completely different things.

There is no such thing as a univariate, multivariate model, but you can have:

  1. Univariate, multivariable
  2. Multivariate, multivariable
  3. Univariate, univariable
  4. Multivariate, univariable

Multivariate means multiple dependent variables (Y's); multivariable means multiple independent variables (X's). The difference between multivariable and univariable is probably known to most, since the majority of the models that you run have more than one independent variable. This means you have a single outcome and multiple predictors.

The difference between univariate and multivariate will have a steeper learning curve since multivariate analysis often leads to a reduction or reframing of the original data to handle the multiple outcomes you are trying to model.

A multivariable model can be thought of as a model in which multiple independent variables are found on the right side of the model equation. This type of statistical model can be used to attempt to assess the relationship between a number of variables; one can assess independent relationships while adjusting for potential confounders.

Multivariate Modeling refers to the modeling of data that are often derived from longitudinal studies, wherein an outcome is measured for the same individual at multiple time points (repeated measures), or the modeling of nested/clustered data, wherein there are multiple individuals in each cluster.

A multivariate linear regression model is a model where the relationships between multiple dependent variables (i.e., Ys - measures of multiple outcomes) and a single set of predictor variables (i.e., Xs) are assessed.

Multivariate analysis refers to a broad category of statistical methods used when more than one dependent variable at a time is analyzed for a subject.

Although many physical and virtual systems studied in scientific and business research are multivariate, most analyses are univariate in practice. What often happens is that these relationships are merged into a new variable (e.g., Feed Conversion Rate). Often, some dimension reduction is possible, enabling you to see patterns in complex data using graphical techniques.

In univariate statistics, performing separate analyses on each variable provides only a limited view of what is happening in the data as means and standard deviations are computed only one variable at a time.

If a model has more than one dependent variable, analyzing each dependent variable separately also increases the probability of type-I error in the set of analyses (which is normally set at 5% or 0.05).
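
For example, with five separate outcomes each tested at α = 0.05, the family-wise chance of at least one false positive is roughly 1 - (1 - 0.05)^5 ≈ 0.23, i.e., about 23% rather than 5%.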

And, if you did not realize it yet, longitudinal data can be analyzed both in a univariate and a multivariate way - it depends on where you want to place the variance-covariance matrix.

Examples of multivariate analysis are:

  1. Factor Analysis can examine complex intercorrelations among many variables to identify a small number of latent factors.
  2. A Discriminant Function Analysis maximizes the separation among groups on a set of correlated predictor variables, and classifies observations based on their similarity to the overall group means.
  3. Canonical Correlation Analysis examines associations among two sets of variables and maximizes the between-set correlation in a small number of canonical variables (to be discussed later on).

So, start thinking about unseen or latent variables.

Univariate & univariable
Univariate & multivariable
Multivariate & multivariable

Multivariate analyses are amongst the most dangerous analyses you can conduct as they are capable of dealing with situations that are rare:

  1. Large column N - datasets that have >100 columns
  2. N < P - datasets that have fewer rows (N) than columns (P)
  3. Multicollinearity - datasets containing highly correlated data

However, before just applying multivariate analysis to any dataset you see, you must make sure that you know your data first. The model could not care less whether it makes sense biologically.

Just look at the datasets below and see if you can spot issues that will make analyzing the data difficult. I promise you there are definitely issues to be found.

Selecting before analyzing, screening for abnormalities, or looking for potential convergence errors is especially important when dealing with multivariate data. Once a variable is included and the model runs, you are often unable to trace back what you put into the model because of the dimension reduction.

In SAS there are many tools you can use to explore the data, summarize it, tabulate it, and create associations.

Never forget that to conduct meaningful analysis, you need to spend a considerable amount of time to wrangle your data.
Bubble plots are the equivalent of scatterplots, but for four variables.
Heatmaps
Scatterplot and scatterplot matrix. The scatterplot matrix is a great way of looking at the relationship between a large variety of variables, and between groups. Each cell in the matrix shows the relationship between two variables. The panels above and below the diagonal are mirrored. The diagonal shows the variable names.
This plot shows that outliers influence scatterplot matrices.
Correlation matrix - a combination of a heatmap and a scatterplot matrix.

A good first step is to create several correlational matrices to identify the largest correlations. Heat maps will help as well. Once you have identified places for zooming in, use scatter matrices and bubble plots to look closer.

So, yes, you will create a lot of graphs, so it is best to think about which graphs you would like to make before actually making them so you do not get lost! Since graphing your data is the first step to gaining insights, a multitude of variables means that you need to graph smart. Start by graphing biologically connected data. Then look at the more unknown data.

Paradox: many of the analytical methods that will be introduced actually require interconnected data. So, if you find a lot of multicollinearity, do not worry. Actually, embrace it! This is where multivariate models are at their best.

So, let's start with correlations first, since they are the de facto measurement of association. Remember, we WANT to find the correlation. However, in this post, I will discuss more than just good old Pearson correlations. I will also discuss:

  1. Good old Pearson correlations - Can variable 1 predict variable 2?
  2. Canonical correlations - Can set 1 predict set 2?
  3. Discriminant analysis - Can a combination of variables be used to predict group membership?
Does variable 1 have a connection to variable 2?
Does a set of variables have a connection to another set of variables?
Can a combination of variables be used to predict group membership?
Correlations are the easiest way of looking for potential relationships between two variables. Look for absolute values > 0.7.
And the scatterplot matrix. You are looking for clouds, preferably very diagonal clouds. Anything that is not a diagonal cloud is not really worth your time. The direction matters, but first find a diagonal one.
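
As a rough sketch of how this exploration could be set up in SAS (the dataset and variable names below are hypothetical placeholders, not the ones used in this post), PROC CORR gives the correlation table, PROC SGSCATTER the scatterplot matrix, and PROC SGPLOT the bubble plot:

  /* Correlation table for a set of hypothetical variables x1-x6 */
  proc corr data=mydata pearson nomiss;
    var x1-x6;
  run;

  /* Scatterplot matrix with histograms on the diagonal, colored by a grouping variable */
  proc sgscatter data=mydata;
    matrix x1-x6 / group=treatment diagonal=(histogram);
  run;

  /* Bubble plot: a scatterplot that carries two extra variables (size and color) */
  proc sgplot data=mydata;
    bubble x=x1 y=x2 size=x3 / group=treatment;
  run;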

When dealing with correlations, you will also have to deal with outliers since outliers can make a correlation's life very difficult. Outliers add noise to the signal.

There are two possible ways to deal with outliers:

  1. Winsorizing - replacing a value with another value based on the range of values in the dataset. A value at the 4th percentile is replaced by the value at the 5th percentile; a value at the 97th percentile is replaced by the value at the 95th percentile.
  2. Trimming - deleting the values outside of the boundary.

Both winsorizing and trimming are done per variable.
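
A minimal sketch of winsorizing a single variable in SAS (the 5th/95th percentile bounds and all names are assumptions for illustration): PROC UNIVARIATE computes the percentiles, and a DATA step caps the values; trimming would set them to missing instead.

  /* Get the 5th and 95th percentiles of a hypothetical variable x1 */
  proc univariate data=mydata noprint;
    var x1;
    output out=bounds pctlpts=5 95 pctlpre=p_;
  run;

  /* Winsorize: cap x1 at those percentiles (p_5 and p_95 come from the step above) */
  data winsorized;
    if _n_ = 1 then set bounds;
    set mydata;
    x1_w = min(max(x1, p_5), p_95);
  run;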

You can immediately see the difference, but it is not as great as you might expect. The clouds to the right only show a little more pattern.
Raw data vs. winsorized data. You must always be careful when transforming the data. Sometimes you introduce bias where you would like to free a signal.

In summary, correlation analysis is probably not new to you and despite its drawbacks, it offers a good start to explore large quantities of variables. Especially to see if:

  1. relationships exist
  2. known relationships are confirmed
  3. clusters exist
  4. there are unknown relationships

Next up is Canonical Correlation Analysis (CCA), which is used to identify and measure the associations between two sets of variables. These sets are defined by the analyst. No strings attached.

Canonical correlation is especially appropriate when there is a high level of multicollinearity. This is because CCA determines a set of canonical variates, which are orthogonal linear combinations of the variables within each set that best explain the variability, both within and between sets.

In short, Canonical Correlations allow you to:

  1. interpret how the predictors are related to the responses.
  2. interpret how the responses are related to the predictors.
  3. examine how many dimensions the variable sets share in common.
Canonical variates colored by a grouping factor.
These plots show how each of the observations in the dataset load on the two sets of variables, and how these two sets are related.
PROC CANCORR is the go-to procedure for CCA.
These statistics test the null hypothesis that all canonical correlations are zero. The small p-values for these tests (< 0.0001) are evidence for rejecting that null hypothesis, so a CCA is warranted. There is enough shared variance!
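
A minimal PROC CANCORR sketch that would produce this kind of output (all dataset and variable names are hypothetical); the two sets are given in the VAR and WITH statements, and the redundancy option requests the redundancy statistics discussed further down:

  /* Canonical correlation between two analyst-defined sets of variables */
  proc cancorr data=mydata all redundancy
               vprefix=SetA wprefix=SetB out=ccscores;
    var a1-a5;    /* first set  */
    with b1-b6;   /* second set */
  run;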

Coefficient interpretation can be tricky:

  1. Standardized coefficients address the scaling issue, but they do not address the problem of dependencies among variables.
  2. Coefficients are useful for scoring but not for interpretation - the analysis method is aimed at prediction!
These are correlation tables. Look for relationships that exceed an absolute value of 0.7.

The CCA created 11 canonical dimensions because 11 variables are included. Canonical variates are similar to the latent variables found in factor analysis, except that canonical variates also maximize the correlation between the two sets of variables. They are linear functions of the variables included, and thus as many canonical variates are automatically created as there are variables included.

In this graph you can see if all 11 are worth the effort.

However, a more useful way to interpret the canonical correlation in terms of the input variables is to look at simple correlation statistics. For each pair of variates, look at the canonical structure tables.

Below is the correlation between each variable and its own canonical variate.

Below is the correlation between each variable and the canonical variate for the other set of variables.

We can even go further and apply the canonical redundancy statistics which indicate the amount of shared variance explained by each canonical variate. It provides you with:

  1. the proportion of variance in each variable explained by the variable's own variates.
  2. the proportion of variance in each variable explained by the other variables' variates.
  3. R² for predicting each variable from the first M variates in the other set.
Each of the variables is better explained by its own canonical variates than by the others, but it is not a landslide.

The output for redundancy analysis enables you to investigate the variance in each variable explained by the canonical variates. In this way, you can determine not only whether highly correlated linear combinations of the variables exist, but whether those linear combinations are actually explaining a sizable portion of the variance in the original variables. This is not the case here!

You can also perform Canonical Regression Analysis by which one set regresses on a second set. Together with the Redundancy Statistics, Regression Analysis will provide you with more insight into the predictive ability of the sets specified.

The regression results (average R²) do not hint at a strong relationship between the Ileum and the Jejunum. Do not look too much at p-values; rather, look at how each variable contributes to R². You do not have to be a rocket scientist to figure out that the Jejunum will also not be very predictive of the Ileum.

In summary, Canonical Correlation Analysis is a descriptive method trying to relate one set of variables to another set of variables by using:

  1. correlation
  2. regression
  3. redundancy analysis

As a first method, it gives you a good idea about the level of multicollinearity involved and how much the two specified sets relate to themselves and each other. Do not forgetβ€Šβ€”β€ŠCCA is mainly used for prediction, not interpretation.

And the last of the trio is Discriminant Function Analysis (DFA), which is used to answer the question: can a combination of variables be used to predict group membership? Because, if a set of variables predicts group membership, it is also connected to that group.

DFA is a dimension-reduction technique related to Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA).

DFA in action - the big rings are the predicted center points for each of the groups we are trying to predict, based on the two canonical variates.

In SAS, there are three procedures for conducting Discriminant Function Analysis:

  1. PROC CANDISC - canonical discriminant analysis.
  2. PROC DISCRIM - develops a discriminant criterion to classify each observation into one of the groups.
  3. PROC STEPDISC - performs a stepwise discriminant analysis to select a subset of the quantitative variables for use in discriminating among the classes.

PROC STEPDISC and PROC DISCRIM can be used together to enable selection methods on top of Discriminant Analysis.

An example of PROC CANDISC. The class variable is important here - you are trying to predict that group.
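
A sketch of what such a call could look like (the class and predictor variable names are placeholders); the CLASS statement holds the group you are trying to predict, and out= saves the canonical scores for plotting:

  /* Canonical discriminant analysis for a hypothetical grouping variable */
  proc candisc data=mydata out=canscores ncan=2;
    class treatment;
    var x1-x8;
  run;
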
The results clearly show that the Canonical Variables created from the dataset will not provide a powerful set to predict group attribution.
Two ways to plot the same data. Not really impressive results as the ellipses overlap.
A one standard deviation increase in the IFA_Y variable results in a 0.089 standard deviation decrease in the predicted values on discriminant function 1. As you can see, the relationships are not impressive, which was already clear from the previous graphs. This means that the canonical variates do not really represent the set of variables!
A much, much better split due to the canonical variates.
The way to get into PROC DISCRIM.
The difference between the left and right plot is because of mathematical differences AND because of some preliminary steps I took in DISCRIM. Let's look closer!
PROC DISCRIM in combination with cross-validation.
Easy to include cross-validation in SAS.
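
A minimal sketch of such a call, with hypothetical class and predictor variables; crossvalidate requests leave-one-out cross-validated error rates, and pool=test lets SAS test whether a linear or quadratic discriminant function is more appropriate:

  /* Discriminant analysis with cross-validated classification results */
  proc discrim data=mydata method=normal pool=test crossvalidate;
    class treatment;
    var x1-x8;
  run;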

The option 'Validation' seems to be a small request, but it is not. It is actually an introduction to the topic of overfitting. Overfitting occurs when the model is 'overfitted' on the data that it is using to come to a solution. It mistakes noise for signal. An overfitted model will predict nicely on the training dataset but horribly on new test data. Hence, as a prediction model, it is limited. REMEMBER: DFA is a prediction method. Hence, safeguarding against overfitting makes a lot of sense!

And the inclusion of selection methods.

You can augment PROC DISCRIM by first using PROC STEPDISC which includes algorithms for variable selection. These are mostly traditional methods:

  1. Backward Elimination: Begins with the full model and at each step deletes the effect that shows the smallest contribution to the model.
  2. Forward Selection: Begins with just the intercept and at each step adds the effect that shows the largest contribution to the model.
  3. Stepwise Selection: Modification of the forward selection technique that differs in that effects already in the model do not necessarily stay there.

I am asking SAS to include variables that meet a certain threshold for adding or retaining a variable.
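
A sketch of that request in PROC STEPDISC (the thresholds and variable names are illustrative assumptions): slentry= sets the significance level for adding a variable and slstay= the level for retaining one.

  /* Stepwise selection of discriminating variables */
  proc stepdisc data=mydata method=stepwise slentry=0.15 slstay=0.15;
    class treatment;
    var x1-x20;
  run;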

Save your output for graphs.
And the complete code to include selection methods and cross-validation for a discriminant analysis.
Canonical 1 is defined by IL10_1. Canonical 2 is defined by IFG_Y.
Classification matrices showing model performance. NOT THAT GOOD!
The big circles indicate the groups and their mean loadings on both canonical variables. The smaller circles show the individual animals and the group they were assigned to IN THE DATABASE (no prediction). If you see a plot like this, then you know that the canonical variates have no real discriminant ability.

Hence, what this graph shows is how much these canonical variables are able to predict group assignment based on the models included. Good predictive power would show that animals in a certain group would cluster at the group mean canonical loading. This is not the case.

A better way to show the discriminant power of the model is to create a new dataset containing all the variables included and add ranges to them so you can do a grid search. You can then ask each DFA model, using different algorithms, to show you how it performs.

Discriminant analysis on test data.
Different classification and discriminant functions can be combined, leading to six different algorithms. Let's see if it also leads to six different models.
And the results. As you can see, the functions provide different models with different separation lines. What sticks out is that the model is not able to separate the groups and that the separation line actually runs right through the clusters of data points.

As you can see above, there is no universal best method. It all depends on the data, since the non-parametric methods estimate the distribution and the parametric methods assume normality. The most important distinction is the use of a linear or quadratic discriminant function. This clearly changes the prediction model and thus the classification matrices. As with many things in life, try different flavors, but never forget to check your assumptions, the model, and its performance.

In summary, Discriminant Function Analysis is usually used to predict membership in naturally occurring groups. It answers the question: "Can a combination of variables be used to predict group membership?" In SAS, there are three procedures for conducting Discriminant Function Analysis:

  1. PROC STEPDISC - select a subset of variables for discriminating among the classes.
  2. PROC CANDISC - perform canonical discriminant analysis.
  3. PROC DISCRIM - develop a discriminant criterion to classify each observation into one of the groups.

Let's venture further into the world of dimension reduction and ask ourselves the following questions:

  1. Can I reduce what I have seen to variables that are invisible?
  2. Can I establish an underlying taxonomy?
Examples of output that you can obtain from SAS when running dimension reduction techniques.

There are various PROCs available in SAS to conduct dimension reduction. Of course, the examples shown earlier in this post are also examples of dimension reduction.

  1. PROC PRINCOMP performs principal component analysis on continuous data and outputs standardized or unstandardized principal component scores.
  2. PROC FACTOR performs principal component analysis and various forms of exploratory factor analyses with rotation and outputs estimates of common factor scores (or principal component scores).
  3. PROC PRINQUAL performs principal component analysis of qualitative data and multidimensional preference analysis. This procedure performs one of three transformation methods for nominal, ordinal, interval, or ratio scale data. It can also be used for missing data estimation with and without constraints.
  4. PROC CORRESP performs simple and multiple correspondence analyses, using a contingency table, Burt table, binary table, or raw categorical data as input. Correspondence analysis is a weighted form of principal component analysis that is appropriate for frequency data.
  5. PROC PLS fits models using any one of a number of linear predictive methods, including partial least squares (PLS). Although it is used for a much broader variety of analyses, PROC PLS can also perform principal components regression; the regression output is intended for prediction and does not include inferential hypothesis-testing information.

Probably the most widely known dimension reduction technique is Principal Component Analysis (PCA), and as you saw before, it is heavily related to Canonical Correlation Analysis. A PCA tries to answer a practical question: "How can I reduce a set of many correlated variables to a more manageable number of uncorrelated variables?"

PCA is a dimension reduction technique that creates new variables that are weighted linear combinations of a set of correlated variables → the principal components. It does not assume an underlying latent factor structure.

PCA displayed.

PCAs work with components, which are orthogonal regression lines created to minimize the errors.

The third component is constructed in the same manner, and each subsequent component accounts for less and less of the total variability in the data. Typically, a relatively small number of created variables, or components, can account for most of the total variability in the data.

PCA creates as many components as there are input variables by performing an eigenvalue decomposition of a correlation or covariance matrix. It creates components that consolidate more of the explained variance into the first few PCs than in any variable in the original data. They are mutually orthogonal and therefore mutually independent. They are generated so that the first component accounts for the most variation in the variables, followed by the second component, and so on.

As with many multivariate techniques, PCA is typically a preliminary step in a larger data analytics plan. For example, PCA could be used to:

  1. explore data and detect patterns among observations.
  2. find multivariate outliers.
  3. determine the overall extent of collinearity in a data set.

Partial Least Squares also uses PCA as an underlying engine.

SAS Studio code to run PROC PRINCOMP.
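
Since the original code screenshot is not reproduced here, a minimal PROC PRINCOMP sketch with hypothetical variables could look like this; n= limits the number of retained components, out= stores the component scores, and the plots= request asks for the scree and component pattern profile graphs:

  /* Principal component analysis on a set of hypothetical variables */
  ods graphics on;
  proc princomp data=mydata out=pcscores n=8 plots=(scree patternprofile);
    var x1-x20;
  run;
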
The first plot you are going to look at - how many components will give you a good split between variance explained and variance left to be explained.

The Scree plot shows how many principal components you need to reach a decent level of the variance explained. The trick is to look at the Scree plot and see when the drop levels off. Here, this is after eight components. Let's plot those eight components.

This Component Pattern Profiles plot shows how each component is loading on the variables included. Hence, it shows what each component represents. What you can immediately see is that it is quite a mess. Many variables load on many components to some degree. So you must, for interpretation's sake, limit the number of components to use.

Three components. You can clearly see that components 1 and 2 are represented by several distinct variables.

Hence, observations that load highly on those components are probably also distinctly different on those variables. Component 3 looks like a bit of a garbage component.

These plots show how the variables load on each component. Some clusters stick out, but it is also clear that a lot of variables do not really load on a component. This is reflected in the low percentage of variance explained.
The difference between PCA and DFA analysis. As you can see, the DFA does a much better job, but it was already made to separate the data based on the groups.
This plot shows how observations load on all three components using both the x and y axis, and color. As you can see, component 1 (mostly blue) and 2 (everything around 0) are quite informative. Component 3 adds some dimension, but not as clearly as 1 and 2.

In summary, PCA is a dimension reduction technique that creates new variables that are weighted linear combinations of a set of correlated variables → the principal components. PCA tries to answer a practical question: "How can I reduce a set of many correlated variables to a more manageable number of uncorrelated variables?" PCA is typically a preliminary step in a larger data analytics plan, and a part of many regression techniques to ease analysis.

From PCA, it is quite straightforward to move further towards Principal Factor Analysis (PFA). The difference is that, in PCA, the components are uncorrelated with each other; they are linear combinations of the data. In PFA, the unique factors are uncorrelated with the common (latent) factors; they are estimates of latent variables that are partially measured by the data.

Component analysis is actually restructuring the same data. Exploratory factor analysis is modeling.
PROC FACTOR is the de facto procedure for factor analysis.

Factor Analysis is used when you suspect that the variables that you observe (manifest variables) are functions of variables that you cannot observe directly (latent variables). Hence, factor analysis is used to:

  1. Identify the latent variables to learn something interesting about the behavior of your population.
  2. Identify relationships between different latent variables.
  3. Show that a small number of latent variables underlies the process or behavior that you have measured to simplify your theory.
  4. Explain inter-correlations among observed variables.
The difference between Factor Analysis and Principal Component Analysis.
Look for absolute factor loadings >0.5. PBMC clearly loads very high on Factor 1. CO and Y seem to load high on Factor 2.

The initial factor loadings are just the first step, and rotation methods need to be used to interpret the results, as they will help you a lot in understanding the results coming from factor analysis.

There are two general classifications of rotation methods:

  1. Assume orthogonal factors.
  2. Relax orthogonality assumption.
As you can see there are a lot of options available to do rotation of factor models.

Orthogonal rotation maintains mutually uncorrelated factors that fall on perpendicular axes. For this, the Varimax-Orthogonal method is often used, which maximizes the variance of the columns of the factor pattern matrix. The axes are rotated, but they remain orthogonal to each other.

Then we also have oblique rotation, which allows factors to be correlated with each other. Because factors might be theoretically correlated, using an oblique rotation method can make it much easier to interpret the factors. For this, the Promax-Oblique method is often used, which:

  1. performs a varimax rotation
  2. relaxes the orthogonality constraints and rotates further
  3. rotates axes that can converge/diverge
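
A hedged sketch of a PROC FACTOR call with a rotation (the dataset, variables, and the choice of four factors are illustrative assumptions); switching rotate=varimax to rotate=promax is the move from an orthogonal to an oblique rotation:

  /* Exploratory factor analysis with an oblique (promax) rotation */
  proc factor data=mydata method=principal priors=smc
              nfactors=4 rotate=promax scree;
    var x1-x20;
  run;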

But as you can see below, there are a lot of combinations possible. The major part is in relaxing or not relaxing the orthogonal assumption, meaning factors can have a covariance matrix or not.

Original results EFA. Rotation of the axes. Rotated results EFA.
The difference between the initial and rotated factor pattern is clear to see. There are now three clusters instead of two. The variance loadings on the factors changed a bit, because the distance between the grid and the observations also differs due to the rotation.

Having 18 factors will make this PFA quite the challenge. It also tells you that these variables will not be so easy to load on an underlying latent variable. To counteract this a bit, we could also limit the number of factors.

The results will be the same; I just cut it off at four.

To check for orthogonality, the factors should not correlate highly with each other. The loading table to the right clearly shows what the factors stand for, at least factor 1 (PBMC) and factor 2 (CO). Then, it becomes blurry.

You would like to see a high number on the diagonal and not so much in the other cells.
Rotation has led to three nice clusters.

In the previous example, I just decided out of the blue to downsize the number of factors from 18 to 4. Selecting the number of factors can be done more elegantly using parallel analysis, which is a form of simulation.

Parallel analysis requesting 10000 simulations to see how many factors I need to establish a decent factor analysis.
Parallel analysis shows that 6 factors should be retained.
Exploratory Factor Analysis can also provide you with path diagrams, which are a visual representation of the model. If you find yourself getting a model like the graph on the left, something is wrong and the model is not really able to identify the latent variables.

In summary, exploratory factor analysis (EFA) is a variable identification technique. Factor analytic methods are used when an underlying factor structure is presumed to exist but cannot be represented easily with a single (observed) variable. In EFA, the unique factors are uncorrelated with the latent factors. They are estimates of latent variables that are partially measured by the data. Hence, not 100% of the variance is explained in EFA.

EFA is an exploratory step towards full-fledged causal modeling.

Clustering is all about measuring distances: between variables, between observations, and between the clusters that are made. A lot of methods for clustering data are available in SAS. The various clustering methods differ in how the distance between two clusters is computed.

In general, clustering works like this:

  1. Each observation begins in a cluster by itself.
  2. The two closest clusters are merged to form a new cluster that replaces the two old clusters.
  3. The merging of the two closest clusters is repeated until only one cluster is left.
Examples of using clustering to find patterns in the data.

SAS offers a variety of procedures to help you cluster data:

  1. PROC CLUSTER performs hierarchical clustering of observations.
  2. PROC VARCLUS performs clustering of variables and divides a set of variables by hierarchical clustering.
  3. PROC TREE draws tree diagrams using output from the CLUSTER or VARCLUS procedures.
  4. PROC FASTCLUS performs k-means clustering on the basis of distances computed from one or more variables.
  5. PROC DISTANCE computes various measures of distance, dissimilarity, or similarity between the rows (observations).
  6. PROC ACECLUS is useful for processing data prior to the actual cluster analysis by estimating the pooled within-cluster covariance matrix.
  7. PROC MODECLUS performs clustering by implementing several clustering methods instead of one.
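
As a small sketch of the hierarchical route from this list (the method choice and all names are assumptions), PROC CLUSTER builds the tree and PROC TREE cuts it into a chosen number of clusters:

  /* Hierarchical clustering of standardized observations with Ward's method */
  proc cluster data=mydata method=ward std outtree=tree;
    var x1-x10;
    id animal_id;
  run;

  /* Cut the dendrogram into three clusters and save the assignments */
  proc tree data=tree nclusters=3 out=clusout noprint;
  run;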

Although I have listed a lot of possibilities for clustering methods, the most powerful procedure is by far the Partial Least Squares (PLS) method - a multivariate, multivariable algorithm. PLS strikes a balance between principal components regression (explain total variance) and canonical correlation analysis (explain shared variance). It extracts components from both the dependent and independent variables, and searches for explained variance within sets and shared variance between sets.

PLS is a great regression technique when N < P as it extracts factors / components / latent vectors to:

  1. explain response variation.
  2. explain predictor variation.

Hence, partial least squares balances two objectives:

  1. seeking factors that explain response variation.
  2. seeking factors that explain predictor variation.

The PLS procedure is used to fit models and to account for any variation in the dependent variables. The techniques used by the Partial Least Squares Procedure are:

  1. Principal component regression (PCR), in which factors are extracted to explain the variation of the predictor sample.
  2. Reduced rank regression (RRR), in which factors are extracted to explain response variation.
  3. Partial least squares (PLS) regression, where both response variation and predictor variation are accounted for.
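
In PROC PLS these three techniques map onto the method= option; a minimal sketch with hypothetical outcomes and predictors (a cross-validated version follows further down):

  /* Same model statement, three possible extraction techniques */
  proc pls data=mydata method=pls;   /* or method=pcr, or method=rrr */
    model y1 y2 = x1-x30;
  run;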

PCR, RRR, and PLS regression are all examples of biased regression techniques. This means using information from k variables but reducing them to <k dimensions in the regression model, making the error DF larger than would be the case if Ordinary Least Squares (OLS) regression were used on all the variables. PLS is commonly confused with PCR and RRR, although there are the following key differences:

  1. PCR and RRR only consider the proportion of variance within a set explained by PCs. The linear combinations are formed without regard to association among the predictors and responses.
  2. PLS seeks to maximize association between the sets while considering the explained variance in each set of variables.

However, PLS does not fit the sample data better than OLS - it can only fit as well or worse. As the number of extracted factors increases, PLS approaches OLS. However, OLS overfits the sample data, whereas PLS with fewer factors often performs better than OLS in predicting future data. PLS uses cross-validation to determine how many factors should be retained to prevent overfitting.

SAS Studio task to begin using the PLS regression.
You are able to create some wild models.
Many of the options I leave at their defaults to see if a model can be run at all. PLS can become very complex very fast, and to safeguard against pouring variables in without consideration, it is good that you first understand the default results.
And the code - straightforward. I have many outcomes and many predictors: the hallmark of a multivariate, multivariable model.
This model went absolutely nowhere!

So let's start a bit simpler:

  1. One dependent variable
  2. More independent variables
  3. No cross-validation, to ensure I have all the data to train a model.
Univariate multivariable model.
You can see the number of extracted factors and how R² behaves. Note that two R² values are provided. This is because PLS is building two models at the same time: one that can explain the predictors and one that can explain the outcome(s) included.

At least we have a result. However, it is not a result you would like to have - 15 factors explain 87% of the variance of the independent variables and 16% of the dependent variables. Preferably, you would like a small number of extracted factors that are able to predict at a solid level. R² is not the best method for assessing this, though.

Below are some slides on how to interpret the most interesting plot ever provided - the correlation loading plot. It is quite a handful to look at but wonderfully simple once you get it.

Through cross-validation, the model selects 2 factors. Always check if the model makes biological sense. Remember, statistical models could not care less what they include. They don't know.
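
A sketch of how that cross-validation could be requested in PROC PLS (cv=one asks for leave-one-out cross-validation; cvtest and all names are illustrative assumptions):

  /* Let cross-validation pick the number of extracted factors */
  proc pls data=mydata method=pls cv=one cvtest;
    model y1 = x1-x30;
  run;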

PLS provides you with a long list of results and plots if you want them. Some of the most important ones you will find here. First and foremost, look at the fit diagnostics of the model. Then look at the variable importance plots. Do they make biological sense? Additionally, you can use the results to create diagnostic plots of your own.

Here, we have another correlation loading plot. They are quite heavy to digest. First, look at the factors and the percentage of variance explained in the model (X R²) and the prediction (Y R²). The outcome variable variance is 75% explained by the two factors. As you can see, many variables load very highly (in the range of 75% to 100%). All in all, the observations show 3 clusters, which indicates that a classification variable might help explain even more variance.

This plot shows the ability of each factor to distinguish the granularity of the response variable. If you look at Factor 1, you see that the complete range of the response variable is included. The straighter the diagonal line, from bottom left to upper right, the better. This is clearly shown for Factor 2 - much more diverging. Numbers indicate observations.

In summary, the PLS procedure is a statistically advanced, output-heavy procedure. Know what you are doing before you start! PLS strikes a balance between principal components regression and canonical correlation analysis. It extracts components from the predictors/responses that account for explained variance within the sets and shared variation between sets. PLS is a great regression technique when N < P, unlike multivariate multiple regression or predictive regression methods.

I hope you enjoyed this post. Let me know if something is amiss!

