Inductive Solutions, Inc. - RUNPCA Principal Component Analysis

Inductive Solutions, Inc.

380 Rector Place, Suite 4A, New York, New York 10280

Telephone: +1 (212) 945.0630

RunPCA

RunPCA is an information discovery ("data mining") tool based on Principal Component Analysis, a statistical method that transforms a set of data inputs into a new, smaller set of uncorrelated inputs ordered by information content. RunPCA requires a 64-bit implementation of Windows. It is built on very fast C/C++ code, and the size of the data sets it can process is limited by available dynamic memory.

RunPCA Features

Computes means, variances, covariances, and correlations of large data sets

Computes and ranks principal components and their variances

Automatically transforms data sets
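
The statistics named in the feature list can be illustrated with a minimal Python sketch. This is not the RunPCA C/C++ library or its interface: the function name `column_stats`, the sample data, and the choice of the sample (n - 1) denominator are all assumptions made for illustration.

```python
import math

def column_stats(rows):
    """Means, variances, covariance matrix and correlation matrix of the columns.

    Assumption: sample statistics with an (n - 1) denominator.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    variances = [cov[j][j] for j in range(k)]
    corr = [[cov[i][j] / math.sqrt(variances[i] * variances[j])
             for j in range(k)] for i in range(k)]
    return means, variances, cov, corr

# Hypothetical data: 4 rows, 3 columns. The second column is twice the first.
rows = [[1.0, 2.0, 1.0],
        [2.0, 4.0, 0.0],
        [3.0, 6.0, 1.0],
        [4.0, 8.0, 0.0]]

means, variances, cov, corr = column_stats(rows)
print("means:", means)
print("variances:", variances)
print("correlation of columns 1 and 2:", corr[0][1])
```

Because the second column here is an exact multiple of the first, their correlation is 1 up to rounding. Strongly correlated columns of this kind are what Principal Component Analysis can merge into fewer factors.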

Benefits

Easy-to-Learn and Easy-to-Use Excel Spreadsheet User Interface

Computation is very fast

The RunPCA C/C++ Library is available for further customization

License

The standard single user license is for Microsoft Windows. Other licensing plans for other platforms are also available. Contact us about versions for other operating systems (such as Linux or Solaris), about site licenses, or about academic discounts.

Example

Suppose we have a table of 1000 rows and 3 columns (or "factors") and we want to discover relationships between the columns. The following table shows how the variance of the data is distributed across the columns:

Variance	Fraction	Accumulated
0.381745	66.57	66.57
0.095436	16.64	83.21
0.096271	16.79	100

The Fraction and Accumulated columns are percentages of the total variance. The most information (highest variance) is contained in the first column: almost two-thirds of the information, as indicated in the first row of the table. The remaining information is split almost evenly between the other two factors, as indicated in the next two rows.
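
The Fraction and Accumulated columns follow from the column variances by simple arithmetic. The short Python sketch below recomputes them; the variable names are illustrative, and only the three variance values come from the example.

```python
# Column variances taken from the example table.
variances = [0.381745, 0.095436, 0.096271]

total = sum(variances)

# Share of the total variance held by each column, in percent.
fractions = [100.0 * v / total for v in variances]

# Running sum of the shares, in percent.
accumulated = []
running = 0.0
for f in fractions:
    running += f
    accumulated.append(running)

for v, f, a in zip(variances, fractions, accumulated):
    print(f"{v:.6f}\t{f:.2f}\t{a:.2f}")
```

For these variances the fractions round to 66.57, 16.64 and 16.79 percent, and the accumulated values round to 66.57, 83.21 and 100 percent.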

After processing by RunPCA, the three original columns are transformed to "principal factors." Now the variance of the transformed data (consisting of the three principal factors) is distributed as follows:

Variance	Fraction	Accumulated
0.494189	86.18	86.18
0.079263	13.82	100
0	0	100

Now most of the information (highest variance) is contained in the first principal factor (86% of the information), and the remaining information is contained entirely in the second principal factor. Because the third principal factor has zero variance, it carries no information and can be dropped, which reduces the dimension of the data from three columns to two (a 33% reduction). If we also have observed responses (or target outputs), we can therefore perform regression (or train a neural network) using only two columns of the transformed data rather than the three columns of the original data. This can improve the speed and accuracy of the training or regression.
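
The transformation described in this example can be sketched in plain Python. This is only an illustration under stated assumptions, not RunPCA's implementation: the data are synthetic (the third column is built from the first two so that one principal factor ends up with essentially zero variance), the eigenvalues of the covariance matrix are found with the classical Jacobi rotation method as a stand-in for whatever RunPCA uses internally, and all function and variable names are hypothetical.

```python
import math
import random

def covariance_matrix(rows):
    """Column means and sample covariance matrix (assumed n - 1 denominator)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    return means, cov

def jacobi_eigen(sym, sweeps=50, tol=1e-22):
    """Eigenvalues and eigenvectors of a symmetric matrix by cyclic Jacobi rotations."""
    k = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the off-diagonal entry a[p][q].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for m in (a, v):  # rotate columns p and q of A, then of V
                    for i in range(k):
                        mp, mq = m[i][p], m[i][q]
                        m[i][p], m[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(k):  # rotate rows p and q of A
                    ap, aq = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * ap - s * aq, s * ap + c * aq
    # Rank by variance, largest first; one eigenvector per eigenvalue.
    order = sorted(range(k), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[r][i] for r in range(k)] for i in order]
    return values, vectors

# Synthetic table: 1000 rows, 3 columns, third column redundant.
rng = random.Random(0)  # fixed seed so the run is repeatable
rows = []
for _ in range(1000):
    x1 = rng.gauss(0.0, 0.6)
    x2 = rng.gauss(0.0, 0.3)
    rows.append([x1, x2, 0.5 * (x1 + x2)])

means, cov = covariance_matrix(rows)
values, vectors = jacobi_eigen(cov)
values = [max(val, 0.0) for val in values]  # rounding can leave a tiny negative value

total = sum(values)
fractions = [100.0 * val / total for val in values]

# Transformed data: each centered row projected onto each principal direction.
scores = [[sum((r[j] - means[j]) * vec[j] for j in range(3)) for vec in vectors]
          for r in rows]

# Keep only the principal factors that carry a non-negligible share of variance.
kept = sum(1 for f in fractions if f > 1e-6)
reduced = [row[:kept] for row in scores]

for val, f in zip(values, fractions):
    print(f"{val:.6f}\t{f:.2f}")
print("columns kept:", kept)
```

The `reduced` table holds the two retained principal factors for each row and could be passed to a regression or neural network in place of the original three columns. The total variance is unchanged by the transformation; it is only redistributed among the principal factors.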