RootProf: Qualitative analysis

QUALITATIVE ANALYSIS

This page describes the commands for qualitative analysis, which is performed by Principal Component Analysis (PCA) followed by a hierarchical cluster analysis. The PCA is based on the TPrincipal class of ROOT. The cluster analysis can be performed with different methods. The commands are:

Commands for PCA:

Commands for clustering analysis

Defines the criterion for choosing the number of principal components to be used in the analysis.

[0,1[ threshold value for the sum of the normalized eigenvalues (default value 0.9)
2 implements the Guttman-Kaiser criterion
<0 sets the number of principal components to be used
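
The three kinds of value can be read as a small selection rule. The following is a minimal Python sketch, not RootProf code. The function name n_components is hypothetical, and the sketch assumes that the threshold is met as soon as the cumulative eigenvalue fraction reaches it, that the Guttman-Kaiser criterion keeps the eigenvalues larger than the mean eigenvalue (which equals 1 for a normalized covariance matrix), and that a negative value -n requests n components.

```python
def n_components(eigenvalues, criterion=0.9):
    """Number of principal components to keep (illustrative sketch only)."""
    vals = sorted(eigenvalues, reverse=True)
    if criterion < 0:
        # Assumption: a value of -n asks for exactly n components.
        return min(int(round(-criterion)), len(vals))
    if criterion == 2:
        # Guttman-Kaiser: keep eigenvalues above the mean eigenvalue.
        mean = sum(vals) / len(vals)
        return sum(1 for v in vals if v > mean)
    if 0 <= criterion < 1:
        # Smallest k whose cumulative eigenvalue sum reaches the threshold
        # fraction of the total.
        target = criterion * sum(vals)
        running = 0.0
        for k, v in enumerate(vals, start=1):
            running += v
            if running >= target:
                return k
        return len(vals)
    raise ValueError("criterion must be in [0,1[, equal to 2, or negative")


eigenvalues = [5.0, 3.0, 1.5, 0.5]
print(n_components(eigenvalues))        # default threshold 0.9
print(n_components(eigenvalues, 2))     # Guttman-Kaiser
print(n_components(eigenvalues, -1))    # fixed number of components
```

With these illustrative eigenvalues the cumulative fractions are 0.5, 0.8, 0.95 and 1, so the default threshold keeps three components, while the Guttman-Kaiser rule keeps the two eigenvalues above the mean of 2.5.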

equalpca

Switch for equalization of the input data before PCA. Equalization means that the input data are divided by their standard deviation, calculated column-wise, i.e. for the different samples. As a consequence, the covariance matrix is normalized. equalpca is automatically set to 0 when background subtraction is performed.

0 equalization is not done (default)
1 equalization is done
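
The effect of equalization can be illustrated with a minimal Python sketch. It is not RootProf code: the function name equalize is hypothetical, the data layout (a list of rows whose columns are scaled one by one) is an assumption, and the sample standard deviation (N-1 denominator) is used although the page does not state which definition RootProf applies.

```python
import statistics


def equalize(rows):
    """Divide every column of a row-major table by its standard deviation."""
    columns = list(zip(*rows))
    sds = [statistics.stdev(col) for col in columns]
    # A constant column has zero standard deviation and is left unscaled.
    sds = [sd if sd > 0 else 1.0 for sd in sds]
    return [[x / sd for x, sd in zip(row, sds)] for row in rows]


data = [[1.0, 10.0],
        [2.0, 30.0],
        [3.0, 50.0]]
scaled = equalize(data)
for row in scaled:
    print(row)
```

After scaling, every non-constant column has unit standard deviation, so columns recorded on very different scales contribute comparably to the covariance matrix.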

biplot

Option to combine the Scores and Loadings plots in a single graph, called a biplot. A biplot is generated for each pair of principal components, considering the pairs formed by up to four components.

0 Scores and Loadings plots are done separately (default)
1 biplots are generated
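
The set of pairs can be enumerated with a minimal Python sketch. It is not RootProf code: the function name biplot_pairs is hypothetical, and the sketch assumes that every unordered pair among the first four selected components gets one biplot.

```python
from itertools import combinations


def biplot_pairs(n_selected, max_components=4):
    """Pairs of principal component numbers (1-based) that get a biplot."""
    top = min(n_selected, max_components)
    return list(combinations(range(1, top + 1), 2))


print(biplot_pairs(3))
print(len(biplot_pairs(6)))
```

Under this assumption, at most six biplots are produced, however many components are selected.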

writeloadings

Writes data files containing the loading values for each selected principal component. The files are named "LoadingsPC#", where # is the number of the principal component. Each file contains two columns: the variable and the loading value. For single-crystal data (datatype 3) an additional file with extension .hkl is created, containing the loading values as a function of the Miller indices (h,k,l).

0 loadings are not written (default)
1 loadings are written on output files
2 loadings are written on output files, with sign changed
3 loadings are written on output files, flipping negative values to positive ones
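
The difference between options 1, 2 and 3 can be summarized with a minimal Python sketch. It is not RootProf code: the function name transform_loadings is hypothetical and the sketch covers only the values, not the file layout.

```python
def transform_loadings(loadings, mode):
    """Return the loading values as they would be written for a given option."""
    if mode == 1:
        return list(loadings)            # written as computed
    if mode == 2:
        return [-x for x in loadings]    # sign changed
    if mode == 3:
        return [abs(x) for x in loadings]  # negative values flipped to positive
    raise ValueError("mode must be 1, 2 or 3")


loadings = [0.6, -0.8, 0.1]
for mode in (1, 2, 3):
    print(mode, transform_loadings(loadings, mode))
```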

writescores

Writes data files containing the score values for each selected principal component. The files are named "ScoresPC#", where # is the number of the principal component. Each file contains two columns: the data number and the score value.

0 scores are not written (default)
1 scores are written on output files
2 scores are written on output files, with sign changed
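
The sign of a principal component is arbitrary: changing the sign of both the scores and the loadings leaves their product, i.e. the contribution of that component to the data, unchanged. The following minimal Python sketch illustrates this property with made-up numbers. The function name rank_one_term is hypothetical, and using option 2 of writescores together with option 2 of writeloadings for this purpose is only one possible reading of the options.

```python
def rank_one_term(scores, loadings):
    """Contribution of one principal component: outer product scores x loadings."""
    return [[s * l for l in loadings] for s in scores]


scores = [1.5, -0.5, 2.0]     # illustrative values, one per data set
loadings = [0.6, -0.8]        # illustrative values, one per variable

original = rank_one_term(scores, loadings)
flipped = rank_one_term([-s for s in scores], [-l for l in loadings])
print(original == flipped)
```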

clusmethod

Defines the method used for hierarchical cluster analysis.

1 single link
2 complete link
3 group average (default)
4 centroid
5 median
6 minimum variance
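
The six methods differ in how the distance between a newly merged cluster and the remaining clusters is updated. A common textbook formulation is the Lance-Williams recurrence, sketched below in Python with the standard coefficients for each method. This is not RootProf code: the function name lance_williams is hypothetical, the method numbers simply follow the list above, and whether RootProf uses this recurrence internally is not stated on this page.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i=1, n_j=1, n_k=1):
    """Distance from cluster k to the cluster obtained by merging i and j.

    d_ki, d_kj, d_ij are the current inter-cluster distances and
    n_i, n_j, n_k the cluster sizes. Methods 4 to 6 are conventionally
    applied to squared Euclidean distances.
    """
    n_ij = n_i + n_j
    if method == 1:      # single link
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == 2:    # complete link
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == 3:    # group average
        a_i, a_j, beta, gamma = n_i / n_ij, n_j / n_ij, 0.0, 0.0
    elif method == 4:    # centroid
        a_i, a_j = n_i / n_ij, n_j / n_ij
        beta, gamma = -n_i * n_j / n_ij ** 2, 0.0
    elif method == 5:    # median
        a_i, a_j, beta, gamma = 0.5, 0.5, -0.25, 0.0
    elif method == 6:    # minimum variance (Ward)
        n = n_ij + n_k
        a_i, a_j = (n_i + n_k) / n, (n_j + n_k) / n
        beta, gamma = -n_k / n, 0.0
    else:
        raise ValueError("method must be between 1 and 6")
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)


for method in range(1, 7):
    print(method, lance_williams(method, 2.0, 5.0, 3.0))
```

Single link reduces to the smaller of the two distances and complete link to the larger one, which is a quick check of the coefficients.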

clusterswitch

Master switch for cluster analysis on representative points in PCA space.

0 no clustering
1 clustering is performed (default)
2 points are gathered into user-defined groups (see myclust)

myclust

Defines the data groups when clusterswitch is set to 2. It specifies the group number for each input profile. In the command file it is placed after the file command for the corresponding profile.

Example:

clusterswitch 2
file /home/rocco/prot/compmod/alpha/paper/PADv3/0.dat
myclust 0
file /home/rocco/prot/compmod/alpha/paper/PADv3/1.dat
myclust 1
file /home/rocco/prot/compmod/alpha/paper/PADv3/2.dat
myclust 0
file /home/rocco/prot/compmod/alpha/paper/PADv3/3.dat
myclust 1
file /home/rocco/prot/compmod/alpha/paper/PADv3/4.dat
myclust 0
file /home/rocco/prot/compmod/alpha/paper/PADv3/5.dat
myclust 0
file /home/rocco/prot/compmod/alpha/paper/PADv3/6.dat
myclust 0
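
The pairing between file and myclust lines in such a command file can be checked with a minimal Python sketch. It is not part of RootProf: the function name read_groups is hypothetical, the sketch assumes one "keyword value" command per line as in the example above, and the file names used below are placeholders.

```python
def read_groups(lines):
    """Map each profile path to the group number given by the next myclust."""
    groups = {}
    current = None
    for raw in lines:
        parts = raw.split(None, 1)   # keyword, then the rest of the line
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1].strip()
        if key == "file":
            current = value
        elif key == "myclust" and current is not None:
            groups[current] = int(value)
    return groups


command_lines = [
    "clusterswitch 2",
    "file a.dat",      # placeholder file names
    "myclust 0",
    "file b.dat",
    "myclust 1",
]
print(read_groups(command_lines))
```

A profile that has no myclust line after its file command does not appear in the result, which makes missing assignments easy to spot.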

sogdiff

Defines the threshold for hierarchical clustering.

=0 the threshold is automatically determined (default)
<0 threshold value, which can be set by looking at the histogram of the cluster size distribution or at the dendrogram
>0 normalized threshold value, which can be set by looking at the histogram of the normalized cluster size distribution
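
The general role of a threshold in hierarchical clustering can be illustrated with a minimal Python sketch: cutting the dendrogram at a given height keeps only the merges that occur at or below it. This is a generic illustration, not RootProf code. The function name count_clusters is hypothetical, the merge heights are made up, and neither the sign convention of sogdiff nor its normalization or automatic determination is reproduced here.

```python
def count_clusters(merge_heights, threshold):
    """Number of clusters left when a dendrogram is cut at `threshold`.

    merge_heights lists the height of each merge; n points give n-1 merges.
    The count is valid when the heights never decrease along the tree,
    which holds for single, complete and average linkage but not
    necessarily for the centroid and median methods.
    """
    n_points = len(merge_heights) + 1
    merges_done = sum(1 for h in merge_heights if h <= threshold)
    return n_points - merges_done


heights = [0.2, 0.3, 1.5, 4.0]    # illustrative dendrogram of 5 points
for threshold in (0.1, 1.0, 5.0):
    print(threshold, count_clusters(heights, threshold))
```

A large gap between consecutive merge heights, here between 0.3 and 1.5, is the usual visual cue in a dendrogram for choosing the cut.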