Harshman, R. A., Hong, S., & Lundy, M. E. (2003). Shifted factor analysis--Part I: Models and properties. J. Chemometrics, 17, 363-378.
View .pdf file (235 KB, 16 pp)
Summary
The factor model is modified to deal with the problem of factor shifts.
This problem arises with sequential data (e.g., time series, spectra,
digitized images) if the profiles of the latent factors shift position
up or down the sequence of measurements: such shifts disturb
multilinearity and so standard factor/component models no longer apply.
To deal with this, we modify the model(s) to include explicit
mathematical representation of any factor shifts present in a dataset;
in this way, the model can both adjust for the shifts and
describe/recover their patterns. Shifted-factor versions of both two-
and three (or higher)-way factor models are developed. The results of
applying them to synthetic data support the theoretical argument that
these models have stronger uniqueness properties; they can provide
unique solutions in both two-way and three-way cases where equivalent
non-shifted versions are under-identified. For uniqueness to hold,
however, the factors must shift independently; two or more factors that
show the same pattern of shifts will not be uniquely resolved if not
already uniquely determined. Another important restriction is that the
models, in their current form, do not work well when the shifts are
accompanied by substantial changes in factor profile shape. Three-way
factor models such as Parafac, and shifted-factor models such as
described here, may be just two of many ways that factor analysis can
incorporate additional information to make the parameters identifiable.
Keywords: Factor shifts; sequential data; unique solutions;
time series; spectral shifts; three-way or three-mode analysis;
Parafac; Parafac2; Tucker T3 and T2
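To make the two-way shifted factor model in the summary above concrete, the following minimal NumPy sketch generates synthetic data in which each factor profile is shifted by its own integer lag in each column. The function names, the restriction to integer lags, and the zero fill at the ends of the sequence are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def shift_profile(profile, lag):
    """Move a 1-D profile `lag` positions down the sequence (up if negative).

    Assumption: positions shifted in from outside the sequence are zero-filled.
    """
    n = len(profile)
    out = np.zeros_like(profile)
    if abs(lag) >= n:
        return out
    if lag >= 0:
        out[lag:] = profile[:n - lag]
    else:
        out[:n + lag] = profile[-lag:]
    return out

def shifted_factor_data(A, B, S):
    """Build X (I x J) from profiles A (I x R), weights B (J x R), lags S (J x R).

    Column j is the weighted sum of the R profiles, each shifted by S[j, r].
    """
    I, R = A.shape
    J = B.shape[0]
    X = np.zeros((I, J))
    for j in range(J):
        for r in range(R):
            X[:, j] += B[j, r] * shift_profile(A[:, r], int(S[j, r]))
    return X

# Two Gaussian-shaped profiles on a 50-point sequence (hypothetical example data)
pos = np.arange(50)
A = np.column_stack([np.exp(-0.5 * ((pos - 15) / 2.0) ** 2),
                     np.exp(-0.5 * ((pos - 30) / 3.0) ** 2)])
B = np.array([[1.0, 0.5], [0.8, 1.2], [0.3, 0.9]])
S = np.array([[0, 0], [2, -1], [-3, 4]])  # the two factors shift independently
X = shifted_factor_data(A, B, S)
```

With all lags set to zero the function reduces to the ordinary bilinear model `A @ B.T`, which is one way to see that the shifts are what break multilinearity.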
Hong, S., & Harshman, R. A. (2003). Shifted factor analysis--Part II: Algorithms. J. Chemometrics, 17, 379-388.
View .pdf file (152 KB, 10 pp)
Summary
We previously proposed a family of models that deal with the problem of
factor position shift in sequential data. We conjectured that the added
information provided by fitting the shifts would make the model
parameters identifiable, even for two-way data. We now derive methods
of parameter estimation and give the results of experiments with
synthetic data. The Alternating Least Squares (ALS) approach is not
fully suitable for estimation because factor position shifts destroy
multilinearity of the latent structure. Therefore, an alternative
"quasi-ALS" approach is developed; some of its practical and
theoretical properties are discussed, and several versions of the
quasi-ALS algorithm are described in detail. These procedures are quite
computation-intensive, but analysis of synthetic data demonstrates that
the algorithms can recover shifting latent factor structure and, in the
situations tested, are robust against high error levels. The results of
these experiments also provide strong empirical support for our
conjecture that the two-way shifted factor model has unique solutions
in at least some circumstances.
Keywords: Shifted factor analysis; latent position shift;
lag; multilinearity; principal component analysis; bilinear model;
quasi-ALS; uniqueness; identifiability
Hong, S., & Harshman, R. A. (2003). Shifted factor analysis--Part III: N-way generalization and application. J. Chemometrics, 17, 389-399.
View .pdf file (190 KB, 11 pp)
Summary
The "Quasi-ALS" algorithm for Shifted Factor estimation is generalized
to three-way and n-way models. We consider the case in which Mode A is
the only shifted sequential mode, Mode B determines shifts, and modes
above B simply reweight the factors. The algorithm is studied using
error-free and fallible synthetic data. In addition, a four-way
chromatographic dataset previously analyzed by Bro, Andersson and Kiers
[1] is reanalyzed, and (two or) three out of four factors are
recovered. The reason for the incomplete success may be factor shape
changes combined with the lack of distinct shift patterns for two of
the factors. The shifted factor model is compared with Parafac2 from
both theoretical and practical points of view.
Keywords: Shifted factor analysis; latent position shift;
lag; multilinearity; Parafac1; Parafac2; multiway analysis; quasi-ALS;
chromatography
Swartzman, L. C.,
Harshman, R. A., Burkell, J., & Lundy, M. E. (2002). What accounts
for the appeal of complementary/alternative medicine, and what makes
complementary/alternative medicine "alternative"? Medical Decision Making, 22, 431-450.
View .pdf file (205 KB, 20 pp)
Summary
The goal of this study was to elucidate the basis for the appeal of
complementary/alternative medicine (CAM) and the basis upon which
people distinguish between CAM and conventional medicine.
Undergraduates (N=173) rated 19 approaches to the treatment of chronic
back pain on 16 rating scales. Data were analyzed via 3-mode factor
analysis, which extracted conceptual dimensions common to both the
scales and the treatments. A 5-factor solution was judged to give the
best description of the raters' perceptions. One of these 5 factors
clearly reflected the distinction between conventional versus CAM
approaches, and a 2nd factor clearly referred to treatment appeal. The
other 3 factors were invasiveness, health care professional versus
patient effort, and "druglikeness". To the extent that treatment was
seen as a CAM treatment (as opposed to a conventional treatment), it
was seen to be more appealing, less invasive, and less druglike. Simple
and partial correlations of the dimension weights indicated that both
the appeal of CAM and the distinction between CAM and conventional
medicine were largely driven by the view that CAM is less invasive than
conventional medicine.
Keywords:
alternative medicine; complementary (natural) versus conventional (or
traditional, biomedical) treatment; intractable pain; factor analysis;
attitude toward health; psychological models; lay beliefs; PARAFAC;
3-way; multimode
Thomas, C. G., Harshman, R. A., & Menon, R. S. (2002). Noise reduction in BOLD-based fMRI using component analysis. NeuroImage, 17, 1521-1537.
Summary
Principal Component Analysis (PCA) and Independent Component Analysis
(ICA) were used to decompose the fMRI time series signal and separate
the BOLD signal change from the structured and random noise. Rather
than using component analysis to identify spatial patterns of
activation and noise, the approach we took was to identify PCA or ICA
components contributing primarily to the noise. These noise components
were identified using an unsupervised algorithm that examines the
Fourier decomposition of each component time series. Noise components
were then removed before subsequent reconstruction of the time series
data. The BOLD contrast sensitivity (CS_BOLD),
defined as the ability to detect a BOLD signal change in the presence
of physiological and scanner noise, was then calculated for all voxels.
There was an increase in CS_BOLD values of activated voxels
after noise reduction as a result of decreased image-to-image
variability in the time series of each voxel. A comparison of PCA and
ICA revealed significant differences in their treatment of both
structured and random noise. ICA proved better for isolation and
removal of structured noise, while PCA was superior for isolation and
removal of random noise. This provides a framework for using and
evaluating component analysis techniques for noise reduction in fMRI.
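The general workflow in the summary above (decompose, flag noise components from their spectra, reconstruct without them) can be sketched for the PCA case as follows. The summary does not state the selection rule used by the authors' unsupervised algorithm, so the criterion here (fraction of spectral power at or above a cutoff bin) and the names `high_freq_fraction`, `denoise_pca`, `cutoff_bin`, and `max_fraction` are hypothetical stand-ins.

```python
import numpy as np

def high_freq_fraction(ts, cutoff_bin):
    """Fraction of a time series' spectral power at or above `cutoff_bin`."""
    power = np.abs(np.fft.rfft(ts - ts.mean())) ** 2
    total = power.sum()
    return float(power[cutoff_bin:].sum() / total) if total > 0 else 0.0

def denoise_pca(Y, cutoff_bin, max_fraction=0.5):
    """Remove 'noisy' principal components from Y (time x voxels).

    Hypothetical rule: a component counts as noise when more than
    `max_fraction` of its time-course power lies at or above `cutoff_bin`.
    Returns the reconstructed data and a boolean mask of kept components.
    """
    mean = Y.mean(axis=0, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    keep = np.array([high_freq_fraction(U[:, r], cutoff_bin) <= max_fraction
                     for r in range(len(s))])
    # Rebuild the time series from the retained components only
    cleaned = mean + (U[:, keep] * s[keep]) @ Vt[keep]
    return cleaned, keep
```

A call such as `denoise_pca(Y, cutoff_bin=8)` returns the cleaned time series together with the mask, so the discarded components can be inspected before the result is trusted.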
Harshman, R. A., &
Hong, S. (2002). 'Stretch' vs. 'slice' methods for representing
three-way structure via matrix notation. J. Chemometrics, 16, 198-205.
View .pdf file (150 KB, 8 pp)
Summary
A three-way array must be represented in two-way form if its structure
is to be described and manipulated by means of matrix notation.
Historically, two methods, here called 'array stretching' and 'array
slicing', have been used. More recently, however, array slicing has
often been overlooked, resulting in loss of mathematical flexibility.
Stretching involves unfolding (matricizing) the three-way array and
applying one's mathematical operations to the resulting two-way matrix;
this results in expressions that are often quite useful for parameter
estimation but which are relatively long and require practice to
interpret properly. 'Slicing' involves taking a representative two-way
subarray and applying operations to it; this often gives compact and
easily understood expressions, but requires the introduction of extra
matrix names and becomes awkward if the array is not
'slicewise-regular'. The advantages of each approach are demonstrated
and compared by applying them to a set of models from the Tucker and
Parafac families. In addition, we show how slice-wise representation
can be improved by using: (i) angle brackets to eliminate the need for
extra diagonal matrices, and (ii) 'encapsulated summation' notation to
allow representation of array structure that is orderly but not
slicewise regular.
Keywords: diagonalization, sliced-array, stretched-array,
matricized, unfolding, multilinear and quasi-multilinear models,
Tucker, T1, T2, T3, Parafac/Candecomp, Parafac1, Parafac2, Paratuck2.
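As a small illustration of the two representations contrasted in the summary above, the sketch below writes the same Parafac-structured array both slice-wise and in stretched (unfolded) form. The variable names and the particular unfolding order are choices made for this example only; the paper's own notation, including the angle-bracket and encapsulated-summation devices, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# Parafac-structured array: x_ijk = sum_r a_ir * b_jr * c_kr
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Slicing: each two-way slice is X_k = A diag(c_k) B'
slices = [A @ np.diag(C[k]) @ B.T for k in range(K)]

# Stretching: unfold X to an I x (J*K) matrix (column index j*K + k) and
# pair it with the column-wise Kronecker product of B and C
X_stretched = X.reshape(I, J * K)
BC = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
X_from_stretched_model = A @ BC.T
```

The slice form needs one small expression per `k`, while the stretched form is a single matrix equation, which is the trade-off between readability and convenience for estimation that the summary describes.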
Harshman, R. A. (2001). An index formalism that generalizes the capabilities of matrix notation and algebra to n-way arrays. J. Chemometrics, 15, 689-714.
View published .pdf file (2657 KB, 26 pp)
View .pdf file (269 KB, 42 pp, revised submitted ms)
Summary
The capabilities of matrix notation
and the rules of matrix algebra are generalized to n-way arrays. The
resulting language seems easy to use; all the capabilities of matrix
notation are retained and most carry over naturally to the n-way
context. For example, one can multiply a three-way array times a
four-way array to obtain a three-way product. Many of the language's
key characteristics are based on the rules of tensor notation and
algebra. The most important example of this is probably the
incorporation of subscript/index related information into both the
names of array objects and the rules used to operate on them. Some
topics that emerge are relatively unexplored, such as inverses of n-way
arrays; these might prove interesting for future theoretical study.
Keywords: Linear and multilinear algebra; tensors; array notation; three-way models; n-way arrays; Tucker; T2; T3; Parafac / Candecomp
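The summary's example of multiplying a three-way array by a four-way array to obtain a three-way product can be imitated with NumPy's `einsum`, which likewise uses index labels to decide which modes are summed. This is only a loose analogue offered for illustration; it is not the notation developed in the paper, and the choice of which two indices are shared is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 4))        # three-way array with indices i, j, k
Y = rng.normal(size=(3, 4, 5, 6))     # four-way array with indices j, k, l, m

# Indices j and k appear in both operands and are summed over,
# leaving a three-way product with indices i, l, m.
P = np.einsum('ijk,jklm->ilm', X, Y)
```

Ordinary matrix multiplication is the special case `np.einsum('ij,jk->ik', M, N)`, where a single shared index is summed.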
Do, T., McIntyre, N. S.,
Harshman, R. A., Lundy, M. E., & Splinter, S. J. (1999).
Application of Parallel Factor Analysis and X-ray photoelectron
spectroscopy to the initial stages in oxidation of aluminium. Surface and Interface Analysis, 27, 618-628.
Summary
Three-way parallel factor analysis (PARAFAC) was used to decompose a
set of Al 2p X-ray photoelectron spectroscopy (XPS) spectra that
resulted during the oxidation of clean aluminium surfaces, measured as
a function of exposure time and water vapour pressure. In addition to
the expected fine peak structure of the XPS Al 2p spectrum, the PARAFAC
solution provided new information on elemental processes that occur
during the initial stages of oxidation. As expected, the water
vapour-aluminium reaction (a) attenuated the metallic peak at binding
energy (BE) 72.87 ± 0.05 eV, and (b) increased the oxidic peak at BE
75.80 ± 0.05 eV. The new information came from another factor that
suggested the formation of an interfacial metal hydride at BE 72.4(4)
eV, as well as a concomitant oxide peak at 75.4(3) eV. Both are
hypothesized to be products of the hydrolysis of adsorbed water
molecules at the aluminium interface. At pressures above and below 1.3
× 10^-5 Pa this factor was diminished; in the case of higher pressure, this is
explained as an increase in the recombination of atomic hydrogen.
Keywords: aluminium oxide; aluminium hydride; oxidation; parallel factor analysis; XPS
Hopke, P. K., Paatero, P.,
Jia, H., Ross, R. T., & Harshman, R. A. (1998). Three-way (PARAFAC)
factor analysis: Examination and comparison of alternative
computational methods as applied to ill-conditioned data. Chemometrics and Intelligent Laboratory Systems, 43, 25-42.
Summary
Four different approaches to three-way factor analysis are applied to
three sets of ill-conditioned data, and the results are compared. Also,
a numerical index is introduced to characterize the ill-conditioning of
n-way arrays (n>2).
The four approaches (computer programs) are HL-PARAFAC, a simple
alternating least squares (ALS) algorithm with minimal extrapolation;
TPALS, an ALS algorithm with sophisticated extrapolation; PMF3, a
non-linear curve fitting procedure; and DTDMR, a non-iterative
closed-form approximation method. They were applied to two 'difficult'
synthetic data sets and to one set of ill-conditioned real data, a set
of fluorescence spectroscopy measurements taken from an amino acid in
aqueous solution. Upon convergence, all results except those from DTDMR
agree, but there are large differences in speed of convergence. DTDMR
is the fastest, but solves only half the problems. Of the others, PMF3
is faster than TPALS by a factor of ten, and TPALS is faster than
HL-PARAFAC, again by a factor of ten.
Keywords: Factor analysis; Trilinear; PARAFAC; Fluorescence spectroscopy; PMF3; TPALS; DTDMR
Mulaik, S. A., Raju, N.
S., & Harshman, R. A. (1997). There is a time and place for
significance testing. In L. L. Harlow, S. A. Mulaik & J. H. Steiger
(Eds.), What if there were no significance tests? Multivariate applications: Vol. 1 (pp. 65-115). Mahwah, N.J.: Lawrence Erlbaum Associates.
Summary
While null hypothesis significance testing is all too frequently
misused and important complementary information such as confidence
intervals omitted, this calls for better education of researchers (and
of some who teach research methods) rather than abandonment of this
essential method. We argue that criticisms recently leveled against the
basic logic of null hypothesis significance testing are misguided and confused.
Issues addressed include the "corrupt
scientific method", the "nil" hypothesis, when it seems most
appropriate to test the null hypothesis of no effect, the proper
interpretation of a confidence interval, the purpose of a significance
test, whether meta-analysis should replace significance testing, and
eight misinterpretations of significance testing (e.g., "the
probability of rejecting H0 is α"; "a statistically
significant result is a scientifically significant result", etc.).
Significance tests must be used in the proper circumstances, with a
correct understanding of what conclusions may be drawn from them.
Power of a significance test is defined,
and issues related to sample size are discussed. Rather than avoiding
significance testing altogether, one must consider its power to detect
a given size effect with the sample at hand (i.e., significance testing
with small samples is meaningful if one is looking for large effects).
Contrary to those who believe that power is relevant only in the
context of significance testing, the authors take the position that
power remains an issue in meta-analysis as well, and discuss this in
some detail.
The notion that physicists do not perform
formal significance tests is also addressed. [Not mentioned, however,
is the fact that in some areas, such as particle physics, significance
testing plays an important role, though the terminology used is
somewhat different. In the chapter, attention is focused on those areas
of physics where significance testing is less common.] One reason is
that physicists are often trying to improve their estimates of physical
constants, rather than testing hypotheses. The argument is also made
that many of their studies are equivalent to meta-analyses in the
social sciences and that their statistics resemble those used in meta-analysis.
The meaning of objectivity and its
relevance to hypothesis testing is also covered. A hypothesis must be
formed independently of the data used to evaluate it, so that objective
judgments about the world can be made as a result of the hypothesis
test. It is suggested that nonzero null hypotheses should be used, and
the null values modified to reflect new knowledge; this avoids the
nil-hypothesis criticism. Finally, the authors argue that hypothesis
testing should not be viewed as a true-false decision but rather as
something that affects one's degree of belief in the hypothesis. They
sum up their position by saying that "significance testing is a
procedure contributing to the (provisional) judgment about the
objective validity of a substantive proposition".
An appendix is provided which summarizes
the differing positions of the developers of significance testing
methods, R. A. Fisher vs. J. Neyman and E. Pearson. The appendix allows
the reader to assess the criticisms of significance testing in light of
what the developers said.
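The summary's point that small samples can still support meaningful tests when the effect sought is large can be illustrated with a simple power calculation. The sketch below assumes a one-sided, one-sample z-test with known standard deviation and a standardized effect size; this setting and the function name `power_one_sided_z` are chosen for illustration and are not taken from the chapter.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided one-sample z-test (assumed setting, known SD).

    effect_size: true mean shift in standard-deviation units
    n: sample size
    alpha: significance level
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)
    # Under the alternative, the test statistic is shifted by effect_size * sqrt(n)
    return 1 - z.cdf(critical - effect_size * sqrt(n))

small_sample_large_effect = power_one_sided_z(effect_size=1.0, n=10)
small_sample_small_effect = power_one_sided_z(effect_size=0.2, n=10)
```

Comparing the two values shows how strongly power depends on the size of the effect when the sample is held at ten observations.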
Kiers, H. A. L., &
Harshman, R. A. (1997). Relating two proposed methods for speedup of
algorithms for fitting two- and three-way principal component and
related multilinear models. Chemometrics and Intelligent Laboratory Systems, 36, 31-40.
Summary
Two- and three-way principal component analysis (PCA) or other
multilinear analyses of very large data sets can require so much
computational time as to be infeasible. This paper first discusses two
previously described speed-up approaches: (a) Alsberg and Kvalheim's
'postponed basis matrix multiplication' method simplifies the data and
uses some new algorithms; and (b) Carroll, Pruzansky and Kruskal's more
general CANDELINC procedure can apply any three-mode PCA algorithm,
including those of Alsberg and Kvalheim, to a small three-way array
derived from the original large data set. Then this paper shows that
(a) it is easier and often more efficient to apply standard three-mode
PCA algorithms rather than those of Alsberg and Kvalheim to the small
array; and (b) other three-way models/analysis methods (e.g.,
Parafac/Candecomp and constrained 3-mode PCA) can also be successfully
applied to the small array.
Keywords: principal component analysis; multilinear methods; two- and three-way principal component model
Harshman, R. A., &
Lundy, M. E. (1996). Uniqueness proof for a family of models sharing
features of Tucker's three-mode factor analysis and PARAFAC/CANDECOMP.
Psychometrika, 61, 133-154.
Summary
It has been proven that some three-way factor analysis and MDS models
based on Cattell's 'principle of parallel proportional profiles'
determine the one best-fitting axis orientation (i.e., it is unique),
given certain conditions. These models have not allowed for factor
interactions, however. This paper presents a more general model which
incorporates such interactions, along with a proof that its
best-fitting axis orientation is also unique. This proof requires no
assumptions of symmetry in either the data or the interactions. A
second proof is presented for the symmetrically weighted case.
Together, these proofs imply that psychometrically interesting special
cases such as Parafac2 and 3-way DEDICOM also have unique solutions.
Keywords: parallel proportional profiles;
intrinsic axes; DEDICOM; Parafac2; trilinear models; factor rotation
problem; multidimensional scaling; principal components
View .pdf file (1297 KB, 22 pp)
Harshman, R. A., and Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39-72.
View .pdf file (3834 KB, 34 pp)
Summary
The theory underlying the parallel factor analysis method for three-way
data, called Parafac, is reviewed and an application to real data is
demonstrated. Parafac simultaneously fits 'slices' of the three-way
array, using a common set of factors with each slice weighted
differently. Given adequately distinct patterns of three-way variation
in the factors, their orientation in the best-fitting solution is
unique. Parafac can be applied to three-way observations ('direct'
fit), or to a set of covariance matrices ('indirect' fit). Parafac
direct fit is demonstrated here using data from a study of right vs.
left cerebral hemispheric control of the hands while performing various
tasks. The two Parafac factors appear to correspond to the causal
influences that were manipulated in the study. Several more general
versions of the parallel factor analysis model are also mentioned.
Keywords: Three-way exploratory factor analysis; unique axes;
parallel proportional profiles; factor rotation problem; three-way data
preprocessing; three mode principal components; trilinear
decomposition; trilinear model; multidimensional scaling; longitudinal
factor analysis; factor analysis of spectra; interpretation of factors;
'Real' or causal or explanatory factors; L. R. Tucker; R. B. Cattell
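The idea that Parafac fits all slices at once with a common set of factors can be sketched as a bare-bones alternating least squares loop for direct fitting. This is a generic textbook-style illustration, not the authors' program: the function names, the random start, and the fixed iteration count are assumptions, and practical issues such as convergence checks, multiple starts, and degenerate solutions are ignored.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product; row p*Q.shape[0] + q holds P[p] * Q[q]."""
    return np.einsum('pr,qr->pqr', P, Q).reshape(-1, P.shape[1])

def parafac_als(X, R, n_iter=200, seed=0):
    """Fit x_ijk ~ sum_r a_ir * b_jr * c_kr by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
    X0 = X.reshape(I, J * K)                     # rows i, columns (j, k)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)  # rows j, columns (i, k)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)  # rows k, columns (i, j)
    errors = []
    for _ in range(n_iter):
        # Update each factor matrix in turn, holding the other two fixed
        A = np.linalg.lstsq(khatri_rao(B, C), X0.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), X1.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X2.T, rcond=None)[0].T
        resid = X - np.einsum('ir,jr,kr->ijk', A, B, C)
        errors.append(float(np.sum(resid ** 2)))
    return A, B, C, errors

# Hypothetical noise-free two-factor data
rng = np.random.default_rng(1)
A0, B0, C0 = rng.normal(size=(6, 2)), rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C, errors = parafac_als(X, R=2)
```

Each update solves an ordinary least squares problem, so the error sum of squares stored in `errors` cannot increase from one sweep to the next; when the uniqueness conditions hold, recovered factors should match the generating ones only up to column order and scaling.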
Harshman, R. A. (1994).
Substituting statistical for physical decomposition: Are there
applications for parallel factor analysis (PARAFAC) in non-destructive
evaluation? In X. P. V. Maldague (Ed.), Advances in signal
processing for nondestructive evaluation of materials (pp. 469-483).
NATO ASI Series E: Applied Sciences, Vol. 262. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Summary
The possible application to Nondestructive Evaluation (NDE) of the
three-way parallel factor analysis method, called Parafac, is
discussed. Parafac uses the distinct patterns of variation in magnitude
across varying measurement conditions in the data to identify, for
example, functionally distinct parts in a mixture. It has been used to
decompose fluorescence spectroscopy measurements of complex samples
into the individual spectra of the constituent chemical compounds
comprising the mixture. Similarly, in NDE it is suggested that Parafac
may be used to separate the mixture of signals from a test object into
each causally/physically distinct component. This would be possible
when there are several different test conditions or time slices across
which the signal magnitude of each component varies in a way that is
distinct from all the other components. Each factor extracted by the
Parafac analysis would thus identify various normal and anomalous
signals; anomalous ones might then be traced to their physical causes.
Harshman, R. A., &
Lundy, M. E. (1990). Multidimensional analysis of preference
structures. In A. deFontenay, M. H. Shugard, & D. S. Sibley (Eds.), Contributions to economic analysis: Vol. 187. Telecommunications demand modeling: An integrated view (pp. 185-204). Amsterdam: Elsevier.
View .pdf file (1793 KB, 11 pp)
Summary
A new approach to exploratory analysis of paired-comparison preference
data is introduced. Called DEDICOM (for DEcomposition into DIrectional
COMponents), it may be applied to real-valued matrices that are either
symmetric, asymmetric or skew-symmetric. Here it is demonstrated with
two skew-symmetric matrices, one an 8x8 set of preference strength
ratings among foods and the other a 9x9 set of preference choice
frequencies among celebrities. Interpretation of DEDICOM solutions is
discussed, including an explanation of 'bimensions' for skew-symmetric
data (cf. dimensions for other types of data) and rotation issues.
Also, DEDICOM is compared to psychological-model-based approaches, and
other potential applications (e.g., telecommunications problems) are
described.
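For readers unfamiliar with DEDICOM, the sketch below shows the structural part of the single-domain model, X ≈ A R A', in which the same loading matrix A appears on both sides and a small matrix R carries the directional relations among the components. The data are random placeholders, and the helper names are hypothetical; no fitting procedure is shown.

```python
import numpy as np

def dedicom_reconstruct(A, R):
    """Structural part of the single-domain DEDICOM model: A R A'."""
    return A @ R @ A.T

def split_symmetric_skew(M):
    """Split a square matrix into its symmetric and skew-symmetric parts."""
    return (M + M.T) / 2, (M - M.T) / 2

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 2))                    # 8 objects, 2 components
R_skew = np.array([[0.0, 1.5], [-1.5, 0.0]])   # skew-symmetric relations
X = dedicom_reconstruct(A, R_skew)             # 8 x 8 and skew-symmetric

sym_part, skew_part = split_symmetric_skew(rng.normal(size=(8, 8)))
```

A skew-symmetric R always yields a skew-symmetric reconstruction, which matches the kind of preference matrices analyzed in the chapter, where entry (i, j) is the negative of entry (j, i).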
Deerwester, S., Dumais,
S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. A. (1990).
Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391-407.
Summary
A new method for automatic indexing and retrieval is presented. It uses
singular-value decomposition (SVD) to detect the implicit higher-order
structure in the association of terms or keywords with documents
(called 'semantic structure'); this structure can then be used to
improve the retrieval of relevant documents on the basis of query
terms. Here, a large term by document matrix is decomposed into around
100 orthogonal factors; these factors are used to approximate the
original matrix. Documents are then represented by 100-item vectors of
factor weights. Queries are represented as pseudo-document vectors
formed by weighted combinations of terms; documents with cosine values
above a given threshold are returned. Initial tests are encouraging.
View .pdf file (3061 KB, 17 pp)
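The retrieval scheme in the summary above can be sketched on a toy scale: truncate the SVD of a term-by-document matrix, fold a query into the reduced space as a pseudo-document, and return documents whose cosine with the query exceeds a threshold. The tiny matrix, the choice of two factors instead of around 100, the threshold value, and the function names are all illustrative assumptions; whether document vectors should also be scaled by the singular values before comparison is a design choice not settled by the summary.

```python
import numpy as np

# Toy term-by-document count matrix (rows: terms, columns: documents)
terms = ["human", "computer", "interface", "graph", "tree"]
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

k = 2  # number of factors retained (the summary mentions around 100)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T   # rows of Vk are k-item document vectors

def fold_in(term_vector):
    """Represent a weighted combination of terms as a pseudo-document."""
    return term_vector @ Uk / sk

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def retrieve(term_vector, threshold=0.9):
    """Return indices of documents whose cosine with the query exceeds threshold."""
    q = fold_in(term_vector)
    return [j for j in range(Vk.shape[0]) if cosine(q, Vk[j]) > threshold]

query = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # "human interface"
hits = retrieve(query)
```

Folding in a column of `X` itself reproduces that document's row of `Vk`, so a query identical to a document is always retrieved along with whatever else lies close to it in the reduced space.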
Kruskal, J. B., Harshman,
R. A., & Lundy, M. E. (1989). How 3-MFA can cause degenerate
PARAFAC solutions, among other relationships. In R. Coppi & S.
Bolasco (Eds.), Multiway data analysis (pp. 115-121). Amsterdam: Elsevier.
Summary
The relationships among three models, (1) three-mode factor analysis,
(2) PARAFAC-CANDECOMP and (3) CANDELINC, are discussed. Two theorems and
some associated corollaries are also presented. Theorem 1 and its
corollary show that data satisfying model (1) can cause degenerate
solutions when fit by model (2). Theorem 2 and its corollaries connect
all three models at once.
View .pdf file (645 KB, 8 pp)
Lundy, M. E., Harshman,
R. A., & Kruskal, J. B. (1989). A two-stage procedure incorporating
good features of both trilinear and quadrilinear methods. In R. Coppi
& S. Bolasco (Eds.), Multiway data analysis (pp. 123-130). Amsterdam: Elsevier.
View .pdf file (1202 KB, 8 pp)
Summary
The strengths of two models, the trilinear Parafac-Candecomp factor
analysis model and the quadrilinear Tucker T3 model, are combined here
in a procedure called PFCORE to find a solution for difficult data. The
Tucker T2 and T3 models can fit more complex variance than the more
constrained Parafac model can, but unlike Parafac, the solution is
subject to an axis indeterminacy. Using Parafac to fit data that
satisfies the Tucker models sometimes produces uninterpretable
"degenerate" solutions, however, in which two or more factors are
highly negatively correlated. An application of PFCORE to real data
demonstrates how it provides unique meaningful axes (from Parafac) with
a core matrix (from T3) to give useful insights into the data
complexities that caused the Parafac degeneracy. More general models
are also discussed.
Dawson, M. R., &
Harshman, R. A. (1986). The multidimensional analysis of asymmetries in
alphabetic confusion matrices: Evidence for global-to-local and
local-to-global processing. Perception and Psychophysics, 40, 370-383.
View .pdf file (2848 KB, 14 pp)
Summary
A multidimensional scaling program for asymmetric data, called DEDICOM
(DEcomposition into DIrectional COMponents), is applied to two stimulus
confusion data sets to test the competing local-to-global vs.
global-to-local theories about letter-perception processes. Analysis of
the first set revealed an additive hierarchy of asymmetry that is very
consistent with global-to-local processing, but additional structure
and reliable anomalies suggest the need for more refinement of the
theory. In contrast, analysis of the second set revealed five distinct
patterns, each consisting of transformations attributable to the
failure to detect specific local features; that is, the solution
supported the local-to-global theory. Possible reasons for this
apparent inconsistency in results are discussed, including stimulus
differences and exposure durations. Despite their differences, however,
both solutions demonstrate how useful DEDICOM is in revealing structure
in asymmetries.
DeSarbo, W. S., & Harshman, R. A. (1985). Celebrity-brand congruence analysis. Current Issues and Research in Advertising, 1, 17-52.
Summary
While a commercial spokesman for a product directly influences the audience through his/her message, s/he also indirectly
influences it by how s/he is perceived (called the "source" effect). A
lot of research has been directed towards finding factors that might
influence the source effect, but very little towards the problem of
finding the "best" spokesperson for a given product. This paper uses
the PARAFAC method to uncover the cognitive-perceptual overtones of
both a product and possible spokespeople, and then tries to establish a
basis for optimizing potentially desirable source effects. PARAFAC is
a three-way factor analysis procedure which is here applied to 34
individuals' associative judgments (measured on 39 semantic
differential scales) concerning 12 automobile makes and 12 celebrities
(commercial spokespeople). The three dimensions of overtones that were
extracted, labeled "Flashy", "Mature/Conservative", and
"Feminine/Soft/Smooth", are related in distinctive ways to different
automobiles and particular celebrities. They provide a basis for
deciding which celebrity might be suited to sponsor a specific
automobile.
Harshman, R. A., &
Lundy, M. E. (1984). The PARAFAC model for three-way factor analysis
and multidimensional scaling. In H. G. Law, C. W. Snyder, Jr., J.
Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 122-215). New York: Praeger.
View .pdf file (11,506 KB, 94 pp)
Summary
This is a comprehensive discussion of the different forms of the
PARAFAC(-CANDECOMP) three-way factor analysis model, its "uniqueness"
properties, and its relationship with other two- and three-way factor
analysis models. First, the PARAFAC1 model for raw score or profile
data ("direct fit") is derived as a generalization of the two-way
factor model; issues of scaling and interpretation of the factor
loading matrices are discussed (pp. 128-139; 192-203); and "system" vs.
"object" variation as they relate to the model are examined (pp.
125-133). Next, the PARAFAC1 model for covariances ("indirect fit") is
derived, first from the raw score model and then from more general
assumptions. The first derivation shows PARAFAC1 to be a special case
of PARAFAC2 (pp. 136-7). Indirect and direct fitting are compared and
the advantages and disadvantages of the orthogonality assumption
required by the indirect fit model are discussed (pp. 137-8), along
with why correlations are inappropriate (p. 141), and whether the
principal components model differs appreciably from the common factors
model in the three-way indirect fit context (pp. 141-3). Finally, the
relationship between factor analysis (PARAFAC) and metric MDS (INDSCAL,
IDIOSCAL) is examined (pp. 144-7).
The next major section in the chapter
deals with the uniqueness properties of the PARAFAC model (pp.
147-169). "Uniqueness" or "intrinsic axes" means that given certain
assumptions, the solution determined from the data by PARAFAC has no
alternative, equal-fitting form (i.e., any other rotation would reduce
its fit to the data). Why this is important (pp. 147-150; 163-9) and
the minimum conditions for uniqueness (pp. 161-2) are explained. The
value of empirically confirming a PARAFAC solution by split-half,
bootstrapping, and/or jackknifing procedures is also discussed.
The final section compares PARAFAC with
other models, most notably Tucker's T3 (pp. 169-182), Corballis'
three-way model (pp. 184-5), and Sands and Young's ALSCOMP (pp.
188-190). PARAFAC1 is described as a special case of the T3 model and
vice versa, and a diagram on p. 175 shows how Carroll transforms a T3
representation into the corresponding PARAFAC one (two other methods of
embedding T3 in PARAFAC are also discussed on pp. 176-178 and pp.
203-7). A family of related models for three-way profile data, arranged
from most general to most restricted, is presented on p. 184. PARAFAC3
is listed between T3 and PARAFAC1 and is discussed on pp. 185-6.
PARAFAC2 and DEDICOM, not in the table, are also discussed in relation
to PARAFAC3 and T3 (p. 187).
Harshman, R. A., & Lundy, M. E. (1984). Data preprocessing and the extended PARAFAC model. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 216-284). New York: Praeger.
View .pdf file (8402 KB, 69 pp)
Summary
The data preprocessing discussed here is restricted to additive
adjustments (centering) and multiplicative adjustments (rescaling or
normalization) that may be applied to three-way profile data, for
example, before direct fitting of the PARAFAC model. It does not
include conversion of profile data to covariances or cross-products,
nor does it include conversion of proximity data to scalar products.
Eight reasons for preprocessing are given, the most important of which
is "to make the data appropriate for the PARAFAC model" (p. 218), which
is accomplished by centering. It is shown algebraically that
"fiber"-centering and "slab"-reweighting are the only appropriate ways
to preprocess data for the trilinear PARAFAC model (see p. 231 for a
diagram of fibers and slabs). Also shown is the effect the proper
preprocessing has on the original factor loadings (e.g., centering Mode
A of the data centers the Mode A factors; rescaling Mode B similarly
reweights the rows of the Mode B factor matrix). Practical guidelines
for deciding how to preprocess data are given on pp. 257-259.
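The two effects described in parentheses above can be checked numerically. The following is a minimal sketch, assuming a noise-free trilinear array and NumPy; the names `A`, `B`, `C`, the helper `parafac_array`, and the weights `w` are illustrative choices, not notation taken from the chapter.

```python
import numpy as np

def parafac_array(A, B, C):
    """Build x[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r] (hypothetical helper)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))  # Mode A factor matrix (assumed sizes)
B = rng.normal(size=(4, 2))  # Mode B factor matrix
C = rng.normal(size=(3, 2))  # Mode C factor matrix
X = parafac_array(A, B, C)

# Fiber-centering across Mode A: for every (j, k) pair, subtract the mean over i.
X_centered = X - X.mean(axis=0, keepdims=True)
A_centered = A - A.mean(axis=0, keepdims=True)
centering_ok = np.allclose(X_centered, parafac_array(A_centered, B, C))

# Slab-reweighting of Mode B: multiply every level j of Mode B by a weight w[j].
w = np.array([0.5, 1.0, 2.0, 4.0])
X_rescaled = X * w[None, :, None]
B_rescaled = B * w[:, None]  # the same weights applied to the rows of B
rescaling_ok = np.allclose(X_rescaled, parafac_array(A, B_rescaled, C))

print(centering_ok, rescaling_ok)
```

In this sketch the centered and reweighted arrays are still exactly trilinear, with only the Mode A or Mode B factor matrix changed, which is the sense in which these operations are "appropriate" for the model.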
Preprocessing is also discussed from the perspective of extending the
PARAFAC model to more general data (instead of making the data
appropriate for a restricted model, as above). In this context,
"degenerate" solutions and their possible causes are described.
Degenerate solutions are characterized by two or more factors whose
loadings are highly correlated across all three modes, with a negative
triple-product. Appropriate centering can sometimes correct this
problem, but not always. Other times an orthogonally constrained
PARAFAC procedure can block the high correlations and yield an
interpretable solution. In these instances, it seems that while more
general Tucker-type structure is present in the data, the constraints
allow a subset of Tucker variations to be meaningfully expressed via
orthogonal PARAFAC factors.
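One way to screen a pair of factors for this pattern is to multiply their similarities across the three modes. The sketch below is only an illustration: it uses uncentered cosines in place of correlations (an assumption), the function name `triple_cosine_product` is hypothetical, and the loading matrices are made-up numbers rather than results from any analysis.

```python
import numpy as np

def cosine(u, v):
    """Uncentered cosine between two loading vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def triple_cosine_product(A, B, C, r, s):
    """Product of the Mode A, B, and C cosines between factors r and s."""
    return (cosine(A[:, r], A[:, s])
            * cosine(B[:, r], B[:, s])
            * cosine(C[:, r], C[:, s]))

# Made-up loadings: the two factors are nearly opposite in Mode A
# and nearly identical in Modes B and C.
A = np.array([[1.0, -1.0], [2.0, -2.0], [3.0, -3.1]])
B = np.array([[1.0, 1.0], [0.0, 0.05], [1.0, 1.0]])
C = np.array([[2.0, 2.0], [1.0, 1.05]])

tcp = triple_cosine_product(A, B, C, 0, 1)
print(tcp)  # a value close to -1 flags the pair as a candidate degeneracy
```

What counts as "close to -1" is a judgment call; no threshold is implied by the summary above.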
Harshman, R. A., & DeSarbo, W. S. (1984). An application of PARAFAC to a small sample problem, demonstrating preprocessing, orthogonality constraints, and split-half diagnostic techniques. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 602-642). New York: Praeger.
View .pdf file (3406 KB, 41 pp)
Summary
This paper presents a detailed account of an application of the PARAFAC
three-way factor analysis procedure to a small set of marketing data,
ratings of 25 stimuli (names of automobiles and celebrities) made by 34
raters using 39 bipolar scales. The preferred three-dimensional
solution, with factors labeled as "Flashy", "Mature/Conservative" and
"Feminine-Soft-Smooth", is given along with other information such as
diagnostic checks. The extra information is instructive because it
explains considerations that arise and how decisions are made in the
course of an analysis as the analyst tries to understand the data.
Preprocessing of the data (removal of means, standardization of mean
squares) is discussed first. To check how many factors can be supported by
the preprocessed data, diagnostic assessment of the fit values for the
different dimensional solutions and the within-solution correlations
among dimensions is done. Split-half analyses are also performed as a
check of the factor reliability in this sample. Because highly
correlated within-solution factors arise when more than three factors
are extracted, orthogonality constraints are imposed to prevent them,
and the analyses are repeated. Diagnostic checks are rerun for these new
solutions, and the orthogonal and unconstrained solutions are also
compared. Finally, a detailed discussion of the unconstrained
three-factor solution is presented. Another diagnostic check is done
for this solution, this time an error analysis.
Harshman, R. A. (1984). "How can I know if it's real?" A catalogue of diagnostics for use with three-mode factor analysis and multidimensional scaling. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 566-591). New York: Praeger.
View .pdf file (2806 KB, 26 pp)
Summary
This paper suggests that multivariate analysis procedures should
include diagnostic evaluation at various stages of the analysis to
assess the appropriateness of the model, the computational adequacy of
the fitting procedure, the statistical reliability of the solution, and
the generalizability and explanatory validity of any resulting
interpretations. The list of diagnostics presented is organized
according to when the check would be done and the type of information
needed (e.g., the data itself; factor loadings from a single analysis;
loadings from several different analyses) and which aspects of the
solution they focus on (e.g., the factor loadings, the residuals, the
fit values). A brief description of what to do and why is provided for
each diagnostic. Of all the checks mentioned, the most important are
probably those for evaluating the reliability of any characteristics of
the solution (e.g., split-half analyses, bootstrapping, jackknifing).
These can be used to estimate the maximum dimensionality and determine
which aspects of the solution are stable enough to warrant
interpretation.
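A split-half check ultimately comes down to comparing the loadings obtained from two halves of the data. The following is a minimal sketch of that comparison step only, assuming the two loading matrices are already available; here they are synthetic stand-ins (one is a permuted, slightly noisy copy of the other), and `congruence_matrix` is a hypothetical helper name.

```python
import numpy as np

def congruence_matrix(L1, L2):
    """Cosines between every column of L1 and every column of L2."""
    L1n = L1 / np.linalg.norm(L1, axis=0, keepdims=True)
    L2n = L2 / np.linalg.norm(L2, axis=0, keepdims=True)
    return L1n.T @ L2n

rng = np.random.default_rng(1)
half_a = rng.normal(size=(20, 3))  # stand-in loadings from the first half
perm = [2, 0, 1]                   # factors come out in a different order
half_b = half_a[:, perm] + 0.05 * rng.normal(size=(20, 3))  # second half

phi = congruence_matrix(half_a, half_b)
best_match = np.abs(phi).argmax(axis=1)        # best partner for each factor
matched = np.abs(phi)[np.arange(3), best_match]
print(best_match, matched)
```

Factors whose best match stays high across halves are the ones a split-half check would treat as stable; factor order and sign are allowed to differ, which is why the comparison uses absolute values and a matching step.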
Harshman, R. A., Green, P. E., Wind, Y., & Lundy, M. E. (1982). A model for the analysis of asymmetric data in marketing research. Marketing Science, 1, 205-242.
View .pdf file (3675 KB, 38 pp)
Summary
Marketing researchers have applied numerous methods for the
multidimensional scaling (MDS) of perceptions and preferences, but
there has been a lack of models for analyzing inherently asymmetric
data relationships. However, Harshman has proposed the DEDICOM
(DEcomposition into DIrectional COMponents) family of models for such
data. This article describes the single-domain DEDICOM model, applies
it to two matrices of asymmetric relationships--shampoo word
associations and car brand-switching--and compares the DEDICOM
solutions to those obtained from a symmetric (factor analysis) model.
Besides demonstrating that DEDICOM solutions can be more easily
interpreted and have significantly better fit than solutions obtained
by factor analysis, the analyses also show the potential marketing
value of additional information provided by DEDICOM--a description of
the asymmetry among the dimensions. For example, two DEDICOM dimensions
(labeled "Thickness" and "Vigor") are extracted from the shampoo word
associations data; words loading high on the Vigor dimension evoke
those high on the Thickness dimension much more often than vice versa.
This might be useful in writing advertising copy. For the car-switching
data, the four DEDICOM dimensions extracted (labeled "Plain Large
Midsize", "Specialty", "Fancy Large", and "Small") reflect a much
larger "flow" from Plain Large Midsize cars to Small than vice versa, a
result with direct implications for the automobile manufacturing
industry.
Keywords: Multidimensional scaling; factor analyses
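The single-domain DEDICOM model is commonly written as X = A R A', where A holds the loadings of the items on the dimensions and R is a small, generally asymmetric matrix of relations among the dimensions. The sketch below illustrates that structure with made-up, noise-free numbers; it treats A as known and recovers R by pseudoinverse, which is a simplification and not the fitting procedure used in the article.

```python
import numpy as np

# Made-up loadings of 4 items on 2 dimensions.
A = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.1, 0.9],
              [0.0, 1.0]])

# R[p, q]: strength of the directed relation from dimension p to dimension q.
R = np.array([[5.0, 1.0],
              [4.0, 3.0]])

X = A @ R @ A.T  # asymmetric item-by-item data implied by the model

# With A known and of full column rank, R is recovered exactly.
A_pinv = np.linalg.pinv(A)
R_hat = A_pinv @ X @ A_pinv.T

asymmetry = R_hat - R_hat.T  # nonzero off-diagonals describe the imbalance
print(R_hat)
print(asymmetry)
```

The off-diagonal entries of `R_hat` play the role of the "asymmetry among the dimensions" described above: here the relation from the second dimension to the first is stronger than the reverse.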
Dunn, T., & Harshman, R. A. (1982). A multidimensional scaling model of the size-weight illusion. Psychometrika, 47, 25-45.
Summary
The weighted Euclidean model for multidimensional scaling (e.g.,
INDSCAL) is much more restricted in the kinds of individual differences
in perceptions it permits than either Tucker's Three-mode
Multidimensional Scaling model or Carroll's Idiosyncratic Scaling
(IDIOSCAL) model. Investigators have nonetheless been reluctant to use
these more general models because they are subject to transformational
indeterminacies which complicate interpretation. This article shows how
these indeterminacies can be removed by constructing specific models of
the phenomenon under investigation. To demonstrate this approach, a
model of the size-weight illusion is developed and applied to data from two
experiments. The same data were also analyzed using INDSCAL. Of the two
solutions, only the size-weight one permits examination of individual
differences in the strength of the illusion. In this sample, however,
individual differences in illusion strength are minor. Thus the INDSCAL
solution is easily interpretable, even though it is less informative
than the size-weight one.
Keywords: Individual differences, multidimensional scaling, three-mode factor, INDSCAL, size-weight illusion
Harshman, R. A., & Berenbaum, S. A. (1981). Basic concepts underlying the PARAFAC-CANDECOMP three-way factor analysis model and its application to longitudinal data. In D. H. Eichorn, J. A. Clausen, N. Haan, M. P. Honzik, & P. H. Mussen (Eds.), Present and past in middle life (pp. 435-459). New York: Academic Press.
View .pdf file (2542 KB, 25 pp)
Summary
This is a comprehensive discussion of the three-way factor analysis
model, PARAFAC-CANDECOMP, with specific reference to considerations
that must be addressed if the data to be analyzed are longitudinal
(i.e., measurements are repeated several different times). The model
for fitting (raw) score data is presented, along with the model for
fitting a set of covariances. General assumptions discussed are that
(a) system variation is being fit; (b) within each factor, proportional
patterns of variation occur across levels of the third way of the data
(e.g., occasions); and (c) between
any two factors, distinct (nonproportional) patterns in variation occur
across levels of the third way of the data. The special property of
rotational uniqueness of the PARAFAC solution (compared to the
rotational indeterminacy of two-way factor analysis solutions) is
explained. Finally, special assumptions underlying PARAFAC analyses of
longitudinal data are addressed, including (a) factor loading
invariance, (b) the nature of factor score changes, (c) orthogonal vs.
oblique factors, (d) error terms and (e) linear independence of factor
variations across time.
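Assumptions (b) and (c) can be made concrete with a small numerical sketch of the raw-score model. The matrices below are made-up values, and the mode labels (persons, variables, occasions) and the helper `factor_part` are illustrative assumptions rather than notation from the chapter.

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3], [0.4, 0.9]])  # persons x factors
B = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])              # variables x factors
C = np.array([[1.0, 1.0], [2.0, 0.5]])                          # occasions x factors

# Raw-score model: x[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum("ir,jr,kr->ijk", A, B, C)

def factor_part(r, k):
    """Contribution of factor r to the data slab at occasion k."""
    return C[k, r] * np.outer(A[:, r], B[:, r])

# Each occasion's slab is the sum of the factor contributions.
slab_ok = np.allclose(X[:, :, 1], factor_part(0, 1) + factor_part(1, 1))

# (b) Within a factor, the contribution changes proportionally across occasions.
ratio_f1 = C[1, 0] / C[0, 0]
ratio_f2 = C[1, 1] / C[0, 1]
proportional = np.allclose(factor_part(0, 1), ratio_f1 * factor_part(0, 0))

# (c) Between factors, the pattern of change differs.
distinct = not np.isclose(ratio_f1, ratio_f2)
print(slab_ok, proportional, distinct)
```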
Gandour, J., & Harshman, R. A. (1978). Cross-language differences in tone perception: A multidimensional scaling investigation. Language and Speech, 21, 1-33.
Summary
The three-way factor analysis procedure, PARAFAC, is used for a
multidimensional scaling (MDS) analysis of tone perception data. The
analysis is done to determine what dimensions underlie the perception
of linguistic tone, and to what extent an individual's language
background influences his/her perception. Paired-comparison judgements
of 13 different pitch patterns superimposed on a synthetic speech-like
syllable were obtained from 140 subjects (101 Thai, 15 Yoruba and 24
American English) and then transformed to scalar products to make the
data suitable for the PARAFAC analysis. Five dimensions, labeled
"Average Pitch", "Direction", "Length", "Extreme Endpoint", and "Slope"
were extracted. Discriminant analysis showed that most speakers of a
tonal language (Thai, Yoruba) have patterns of perceptual saliency on
these dimensions that are distinct from those of speakers of a nontonal
language (English). Also, regression analysis suggested that the
Direction and Slope dimensions closely correspond to distinctive
features of tone that have been postulated to be binary.
Ladefoged, P., Harshman, R., Goldstein, L., & Rice, L. (1978). Generating vocal tract shapes from formant frequencies. Journal of the Acoustical Society of America, 64, 1027-1035.
View .pdf file (1609 KB, 10 pp)
Summary
The purpose of this research is to devise an algorithm that generates
appropriate vocal tract shapes as seen on midsagittal X-ray diagrams of
most English vowels. Only the first three formant frequencies are used,
and the shape of the tongue is specified in terms of the sum of two
factors--a front raising component and a back raising
component--obtained from PARAFAC, a factor analysis procedure for
three-way data. Stepwise multiple regression techniques show that the
proportions of the two tongue shape components and of a third parameter
corresponding to the distance between the lips are highly correlated
with the formant frequencies in 50 vowels. The recovery algorithm
developed from these correlations is tested on a number of published
sets of tracings from X-ray diagrams, and appears to be generalizable
to other speakers.
See also:
- Factor analysis of tongue shapes. (next summary)
Harshman, R. A., Ladefoged, P., & Goldstein, L. (1977). Factor analysis of tongue shapes. Journal of the Acoustical Society of America, 62, 693-707.
View .pdf file (2772 KB, 15 pp)
Summary
A factor analysis procedure for three-way data, called PARAFAC, is
applied to a description of the shape of the tongue during the
pronunciation of English vowels, to see if it can be explained in terms
of a few underlying factors. The data analyzed consists of 13 measures
of tongue displacement during pronunciation of 10 vowels by 5 speakers.
Two factors, which account for more than 92% of the variance in the
data, were extracted. One factor generates a forward movement of the
root of the tongue accompanied by an upward movement of the front of
the tongue. Movements from front to back vowels involve decreasing
amounts of this factor. The second factor generates an upward and
backward movement of the tongue. Movements from high to low vowels
involve decreasing amounts of this factor. The two factors are used to
different degrees by different speakers, depending on their individual
anatomy.
Harshman, R. A. (1972). PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics, 22, 30-47. (University Microfilms, Ann Arbor, No. 10,085).
View .pdf file (124 KB, 18 pp)
Summary
The mathematical model of PARAFAC1 is reviewed, and its application to
cross-product matrices (e.g., covariances, scalar products) is
examined. A similar examination is made of the INDSCAL-PARAFAC1 model
for three-mode multidimensional scaling. Then a three-mode model called
PARAFAC2 is developed to deal specifically with sets of cross-product
matrices. It can determine orthogonal or oblique factors, whichever
best fit the data. Another important advantage is its greater
generality, since it is not restricted to analysis of system-variation
data. The PARAFAC2 model is derived in two ways, one invoking the
restricted system-variation model and one using a more general model.
The interpretation of PARAFAC2 is discussed, and a precise definition
is given of the type of uniqueness it provides. No formal proof of
uniqueness is given; so far, there is only empirical evidence for the
uniqueness of PARAFAC2 solutions. Current work on computer algorithms
for fitting the PARAFAC2 model is described, as is a method for
circumventing the "communalities" problem. Also, the difference between
covariance and correlation matrices is explained in the context of the
model, and its use in conjunction with more general models such as
IDIOSCAL is discussed.
Harshman, R. A. (1972). Determination and proof of minimum uniqueness conditions for PARAFAC1. UCLA Working Papers in Phonetics, 22, 111-117. (University Microfilms, Ann Arbor, No. 10,085).
View .pdf file (52 KB, 7 pp)
Summary
The first mathematical uniqueness proof of PARAFAC1, discovered by
Robert Jennrich of the UCLA Biomathematics Department, along with the
results of empirical studies of the model's uniqueness, was reported
in Harshman (1970). The empirical tests suggested that the conditions
required by Jennrich's theorem were stronger than necessary for
uniqueness, but the minimal conditions could not be determined
empirically. The proof given here
shows that two occasions can be sufficient to uniquely determine any
number of factors, provided that (a) the factors change size from the
first to the second occasion and (b) the percent change of each factor
is different from that of the other factors. It is shown how this proof
applies to Carroll and Chang's INDSCAL model and to the orthogonal
factor case of PARAFAC2.
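A small numerical illustration shows why two occasions with distinct factor changes can pin down the factors. This is a sketch under simplifying assumptions (noise-free data, square invertible slabs, made-up matrices `A`, `B` and change factors `d`), using an eigendecomposition argument for illustration; it is not the proof given in the paper.

```python
import numpy as np

A = np.array([[1.0, 0.2, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.4, 1.0]])   # made-up Mode A factor matrix
B = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.5, 0.0, 1.0]])   # made-up Mode B factor matrix
d = np.array([0.5, 1.5, 3.0])     # each factor changes size by a different amount

X1 = A @ B.T                      # occasion 1
X2 = A @ np.diag(d) @ B.T         # occasion 2: factor r is rescaled by d[r]

# X2 X1^{-1} = A diag(d) A^{-1}: its eigenvalues are the changes d and its
# eigenvectors are the columns of A, up to scale and sign.
M = X2 @ np.linalg.inv(X1)
eigvals, eigvecs = np.linalg.eig(M)
order = np.argsort(eigvals.real)
recovered_d = eigvals.real[order]
recovered_A = eigvecs.real[:, order]

A_unit = A / np.linalg.norm(A, axis=0)
cosines = np.abs(np.sum(A_unit * recovered_A, axis=0))
print(recovered_d, cosines)
```

If two entries of `d` were equal, the corresponding eigenvectors would no longer be determined individually, which mirrors condition (b) above.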
Lindau, M., Harshman, R., & Ladefoged, P. (1971). Factor analysis of formant frequencies of vowels. UCLA Working Papers in Phonetics, 19, 17-25. (University Microfilms, Ann Arbor, No. 10,085).
Summary
This is a progress report on several sets of three-way factor analyses
that were done using the PARAFAC procedure. The purpose is to isolate
phonetic vowel quality from the whole spectrum, and to discover
acoustic dimensions that will provide an objective definition and
physical explanation for phonetic vowel quality. The first set of
analyses is of four formant frequencies (converted into pitch values of
mels) for eight cardinal vowels, as spoken by eleven phoneticians
(previously reported by Harshman, 1970). One interpretation of the
three-factor solution suggests a one-to-one correspondence between
factors and the traditional dimensions of "Vowel Height",
"Front-Backness" and "Lip Rounding". Because of some ambiguity between
the "front-back" and "rounding" factors, however, sets of Swedish
vowels in which "backness" and "rounding" are independent are analyzed
next. Preliminary results suggest three factors, just as for the
cardinal vowels. Several hypotheses based on these results are
introduced, including one that the "Vowel Height" factor may be the
most basic property of vowels.
Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 84 pp. (University Microfilms, Ann Arbor, No. 10,085).
View .pdf file (386 KB, 84 pp)
Summary
Simple structure and other common principles of factor rotation do not
in general provide strong grounds for attributing explanatory
significance to the factors which they select. In contrast, it is shown
that an extension of Cattell's principle of rotation to Proportional
Profiles (PP) offers a basis for determining explanatory factors for
three-way or higher order multi-mode data. Conceptual models are
developed for two basic patterns of multi-mode data variation, system-
and object-variation, and PP analysis is found to apply in the
system-variation case.
Although PP was originally formulated as a
principle of rotation to be used with classic two-way factor analysis,
it is shown to embody a latent three-way factor model, which is here
made explicit and generalized from two to N "parallel occasions". As
originally formulated, PP rotation was restricted to orthogonal
factors. The generalized PP model is demonstrated to give unique
"correct" solutions with oblique, non-simple structure, and even
non-linear factor structures.
A series of tests, conducted with
synthetic data of known factor composition, demonstrate the
capabilities of linear and non-linear versions of the model, provide
data on the minimal necessary conditions of uniqueness, and reveal the
properties of the analysis procedures when these minimal conditions are
not fulfilled. In addition, a mathematical proof is presented for the
uniqueness of the solution given certain conditions on the data.
Three-mode PP factor analysis is applied
to a three-way set of real data consisting of the fundamental and first
three formant frequencies of 11 persons saying 8 vowels. A unique
solution is extracted, consisting of three factors which are highly
meaningful and consistent with prior knowledge and theory concerning
vowel quality.
The relationships between the three-mode
PP model and Tucker's multi-modal model, McDonald's non-linear model
and Carroll and Chang's multi-dimensional scaling model are explored.