PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.866, 0.5) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix, scaled by the square root of the corresponding eigenvalue and shifted so their tails are at the mean.

I have drawn a box at the boundary of the plots to indicate a bounding box within which the data exists.

Case A: High Linear Dependence between x1 and x2

Here, x1 and x2 have high linear dependence. In this case, it means they have large correlation. (2 variables have high linear dependence when they have high correlation. When 3 or more variables have high linear dependence, correlation is not always a reliable measure of the linear dependence, because correlation only calculates the linear dependence between 2 variables at a time.) We see that the data is distributed very close to a straight line (in red), such that the spread of the data along the line is maximum and the spread of the data perpendicular to the line is minimum. Thus, by remembering the spread of the data along the diagonal, we retain most of the information. Actually, this is what happens when we perform dimensionality reduction using PCA: we represent data using the directions along which it varies most. The majority of the bounding box is empty; the data occupies a very small portion of the box, close to the diagonal. The remaining part of the box contains no data (in yellow in the figure).
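To make Case A concrete, here is a minimal sketch, assuming synthetic data (the seed, ranges, and noise level are my own choices, not the article's) and scikit-learn's PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Case A: x2 is (noisily) a linear function of x1, so the points
# hug the diagonal of their bounding box.
x1 = rng.uniform(0, 10, 500)
x2 = x1 + rng.normal(0, 0.5, 500)
case_a = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(case_a)
print(pca.explained_variance_ratio_)
# The first direction (along the diagonal) should carry nearly all
# of the variance, e.g. something like [0.99, 0.01].
```

Keeping only the first coordinate of the transformed data would then preserve most of the spread, which is exactly the dimensionality reduction described above.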
Case B: No Linear Dependence between x1 and x2

Here, x1 and x2 have no linear dependence; in this case, it means that they have no correlation. The data DOES NOT lie along a line, unlike Case A, where the data stayed close to a line. Hence, there is no special direction along which the data varies "more": all directions are equally likely to contain data. Most of the area in the box contains data (unlike Case A, where we had large empty regions with no data). Hence, Case B shows no promise of dimensionality reduction.

Linear dependence among variables causes data to lie along lower dimensional subspaces, or hyperplanes, as in Case A. In fact, the stronger the linear relation, the larger the unoccupied space will be. Dependent variables (including when the relation between the variables is non-linear) in general cannot occupy all the space available to them in a bounding box, because they vary together along certain directions (hence the dependence). Variable dependence restricts the regions where data can lie. We say dependent variables exist primarily along lower dimensional manifolds. For PCA, we will only be interested in linear manifolds.

PCA takes a matrix of samples and features as input and returns a new matrix whose features are linear combinations of the features in the original matrix. These new features generated by PCA are orthogonal (at right angles) to each other. The new features are sorted in order of decreasing variance. The first PC (Principal Component) explains the most variance. In Case A, the first PC would lie along the diagonal and the second PC would lie perpendicular to the diagonal (as mentioned in Point 1).
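The same kind of sketch illustrates Case B and the two properties just listed (orthogonal components, sorted by decreasing variance); the uniform data is again an assumption for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Case B: x1 and x2 are independent and uniform, so no direction
# is special and both components explain similar variance.
case_b = rng.uniform(0, 10, (500, 2))
pca = PCA(n_components=2).fit(case_b)
print(pca.explained_variance_ratio_)   # roughly [0.5, 0.5]

# The rows of components_ are the new directions: orthogonal unit
# vectors, ordered so that explained variance never increases.
print(np.dot(pca.components_[0], pca.components_[1]))  # ~ 0.0
print(pca.explained_variance_)         # sorted in decreasing order
```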
Data Preparation and interpretation of PCA:

It is important to standardize data before using PCA. PCA measures the variance of data along orthogonal directions. If a feature A assumes values in the range 0–10000 with a standard deviation of, say, 200, and another feature B assumes values in the range 0–100 with a standard deviation of, say, 20, feature A would naturally contribute more in deciding the direction of maximum variance, simply due to its larger variance. For example, a change of 100 units causes only a 1% change in feature A (1% of the range of feature A), while it causes a 100% change in feature B (100% of the range).
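A minimal sketch of that effect, with invented means, standard deviations, and sample size chosen to match the ranges above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Feature A: centred at 5000 with sd ~200 (range on the order of 0-10000).
# Feature B: centred at 50 with sd ~20 (range on the order of 0-100).
A = rng.normal(5000, 200, 1000)
B = rng.normal(50, 20, 1000)
X = np.column_stack([A, B])

pca_raw = PCA(n_components=2).fit(X)
print(pca_raw.components_[0])             # ~[1, 0] (up to sign)
print(pca_raw.explained_variance_ratio_)  # ~[0.99, 0.01]
# Unscaled, the first PC points almost entirely along A, purely
# because A's variance is larger.

pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print(pca_std.explained_variance_ratio_)  # ~[0.5, 0.5]
# After standardization, neither feature dominates the variance.
```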
Let us briefly see how to interpret the components produced by PCA. We fit PCA to 3 features chosen from the Boston housing dataset: LSTAT, RM, and AGE. Since the original data has 3 columns (or coordinates), our transformed data (in the Principal Component space) will also have 3 columns (or coordinates). The table of loadings shows us the importance/contribution of each feature in forming each coordinate of the transformed data: LSTAT has a weightage of 0.6564, RM has a weightage of -0.5365, and AGE has a weightage of 0.5304 in the calculation of the first coordinate of the transformed data. The first coordinate corresponds to the first Principal Component. This can be used as a basis for feature selection, as mentioned in Applied Predictive Analytics: we can retain the feature which has the highest loading in each Principal Component.
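A sketch of how such a loadings table can be reproduced. Note that load_boston has been removed from recent scikit-learn releases, so this assumes the OpenML copy of the dataset is available (the name and version below are assumptions), and the component signs may come out flipped relative to the numbers quoted above:

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# load_boston is gone from recent scikit-learn; the OpenML copy
# (name and version are assumptions) is one way to get the data.
boston = fetch_openml(name="boston", version=1, as_frame=True).frame
X = boston[["LSTAT", "RM", "AGE"]]

pca = PCA(n_components=3)
pca.fit(StandardScaler().fit_transform(X))

# One row per Principal Component, one column per original feature.
loadings = pd.DataFrame(pca.components_, columns=X.columns,
                        index=["PC1", "PC2", "PC3"])
print(loadings)
# The PC1 row should be close to [0.6564, -0.5365, 0.5304]
# (possibly with all signs flipped -- PC signs are arbitrary).

# Feature-selection heuristic from Applied Predictive Analytics:
# keep the feature with the largest absolute loading in each PC.
print(loadings.abs().idxmax(axis=1))
```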