R. Harshman's Abstracts

Harshman, R. A., Hong, S., & Lundy, M. E. (2003). Shifted factor analysis--Part I: Models and properties. J. Chemometrics, 17, 363-378.

View .pdf file (235 KB, 16 pp)
Back to home page

Summary The factor model is modified to deal with the problem of factor shifts. This problem arises with sequential data (e.g., time series, spectra, digitized images) if the profiles of the latent factors shift position up or down the sequence of measurements: such shifts disturb multilinearity and so standard factor/component models no longer apply. To deal with this, we modify the model(s) to include explicit mathematical representation of any factor shifts present in a dataset; in this way, the model can both adjust for the shifts and describe/recover their patterns. Shifted-factor versions of both two- and three (or higher)-way factor models are developed. The results of applying them to synthetic data support the theoretical argument that these models have stronger uniqueness properties; they can provide unique solutions in both two-way and three-way cases where equivalent non-shifted versions are under-identified. For uniqueness to hold, however, the factors must shift independently; two or more factors that show the same pattern of shifts will not be uniquely resolved if not already uniquely determined. Another important restriction is that the models, in their current form, do not work well when the shifts are accompanied by substantial changes in factor profile shape. Three-way factor models such as Parafac, and shifted-factor models such as described here, may be just two of many ways that factor analysis can incorporate additional information to make the parameters identifiable.

Keywords: Factor shifts; sequential data; unique solutions; time series; spectral shifts; three-way or three-mode analysis; Parafac; Parafac2; Tucker T3 and T2
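The claim that factor shifts disturb multilinearity can be illustrated with a small NumPy sketch. This is not the authors' model or algorithm: the Gaussian profiles, the shift values, and the names `gaussian_profile` and `build_two_way` are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_profile(n_points, center, width=3.0):
    """Hypothetical latent factor profile: a Gaussian peak along the sequence."""
    x = np.arange(n_points)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def build_two_way(n_points, centers, weights, shifts=None):
    """Each column is one sample: a weighted sum of factor profiles.
    shifts[f, j] moves factor f by that many positions in sample j."""
    n_factors, n_samples = weights.shape
    if shifts is None:
        shifts = np.zeros((n_factors, n_samples))
    data = np.zeros((n_points, n_samples))
    for j in range(n_samples):
        for f in range(n_factors):
            profile = gaussian_profile(n_points, centers[f] + shifts[f, j])
            data[:, j] += weights[f, j] * profile
    return data

rng = np.random.default_rng(0)
weights = rng.uniform(0.5, 1.5, size=(2, 6))      # 2 factors, 6 samples
shifts = np.array([[0, 1, 2, 3, 4, 5],            # factor 1 drifts steadily
                   [0, -2, 1, -1, 3, 2]])         # factor 2 shifts independently

unshifted = build_two_way(60, [20, 40], weights)
shifted = build_two_way(60, [20, 40], weights, shifts)

# Without shifts the data are exactly bilinear, so two factors give rank 2.
rank_unshifted = np.linalg.matrix_rank(unshifted, tol=1e-8)
# With shifts, the same two factors no longer fit a rank-2 bilinear model.
rank_shifted = np.linalg.matrix_rank(shifted, tol=1e-8)
print(rank_unshifted, rank_shifted)
```

In this toy case a standard two-factor model would fit the unshifted matrix exactly but not the shifted one, which is the situation the shifted-factor models are designed to handle.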

Back to publications
Back to home page

Hong, S., & Harshman, R. A. (2003). Shifted factor analysis--Part II: Algorithms. J. Chemometrics, 17, 379-388.

View .pdf file (152 KB, 10 pp)

Summary We previously proposed a family of models that deal with the problem of factor position shift in sequential data. We conjectured that the added information provided by fitting the shifts would make the model parameters identifiable, even for two-way data. We now derive methods of parameter estimation and give the results of experiments with synthetic data. The Alternating Least Squares (ALS) approach is not fully suitable for estimation because factor position shifts destroy multilinearity of the latent structure. Therefore, an alternative "quasi-ALS" approach is developed; some of its practical and theoretical properties are examined, and several versions of the quasi-ALS algorithm are described in detail. These procedures are quite computation-intensive, but analysis of synthetic data demonstrates that the algorithms can recover shifting latent factor structure and, in the situations tested, are robust against high error levels. The results of these experiments also provide strong empirical support for our conjecture that the two-way shifted factor model has unique solutions in at least some circumstances.

Keywords: Shifted factor analysis; latent position shift; lag; multilinearity; principal component analysis; bilinear model; quasi-ALS; uniqueness; identifiability

Hong, S., & Harshman, R. A. (2003). Shifted factor analysis--Part III: N-way generalization and application. J. Chemometrics, 17, 389-399.

View .pdf file (190 KB, 11 pp)

Summary The "Quasi-ALS" algorithm for Shifted Factor estimation is generalized to three-way and n-way models. We consider the case in which Mode A is the only shifted sequential mode, Mode B determines shifts, and modes above B simply reweight the factors. The algorithm is studied using error-free and fallible synthetic data. In addition, a four-way chromatographic dataset previously analyzed by Bro, Andersson and Kiers [1] is reanalyzed, and (two or) three out of four factors are recovered. The reason for the incomplete success may be factor shape changes combined with the lack of distinct shift patterns for two of the factors. The shifted factor model is compared with Parafac2 from both theoretical and practical points of view.

Keywords: Shifted factor analysis; latent position shift; lag; multilinearity; Parafac1; Parafac2; multiway analysis; quasi-ALS; chromatography

Swartzman, L. C., Harshman, R. A., Burkell, J., & Lundy, M. E. (2002). What accounts for the appeal of complementary/alternative medicine, and what makes complementary/alternative medicine "alternative"? Medical Decision Making, 22, 431-450.

View .pdf file (205 KB, 20 pp)

Summary The goal of this study was to elucidate the basis for the appeal of complementary/alternative medicine (CAM) and the basis upon which people distinguish between CAM and conventional medicine. Undergraduates (N=173) rated 19 approaches to the treatment of chronic back pain on 16 rating scales. Data were analyzed via 3-mode factor analysis, which extracted conceptual dimensions common to both the scales and the treatments. A 5-factor solution was judged to give the best description of the raters' perceptions. One of these 5 factors clearly reflected the distinction between conventional versus CAM approaches, and a 2nd factor clearly referred to treatment appeal. The other 3 factors were invasiveness, health care professional versus patient effort, and "druglikeness". To the extent that treatment was seen as a CAM treatment (as opposed to a conventional treatment), it was seen to be more appealing, less invasive, and less druglike. Simple and partial correlations of the dimension weights indicated that both the appeal of CAM and the distinction between CAM and conventional medicine were largely driven by the view that CAM is less invasive than conventional medicine.

Keywords: alternative medicine; complementary (natural) versus conventional (or traditional, biomedical) treatment; intractable pain; factor analysis; attitude toward health; psychological models; lay beliefs; PARAFAC; 3-way; multimode

Thomas, C. G., Harshman, R. A., & Menon, R. S. (2002). Noise reduction in BOLD-based fMRI using component analysis. NeuroImage, 17, 1521-1537.

Summary Principal Component Analysis (PCA) and Independent Component Analysis (ICA) were used to decompose the fMRI time series signal and separate the BOLD signal change from the structured and random noise. Rather than using component analysis to identify spatial patterns of activation and noise, the approach we took was to identify PCA or ICA components contributing primarily to the noise. These noise components were identified using an unsupervised algorithm that examines the Fourier decomposition of each component time series. Noise components were then removed before subsequent reconstruction of the time series data. The BOLD contrast sensitivity (CS_BOLD), defined as the ability to detect a BOLD signal change in the presence of physiological and scanner noise, was then calculated for all voxels. There was an increase in CS_BOLD values of activated voxels after noise reduction as a result of decreased image-to-image variability in the time series of each voxel. A comparison of PCA and ICA revealed significant differences in their treatment of both structured and random noise. ICA proved better for isolation and removal of structured noise, while PCA was superior for isolation and removal of random noise. This provides a framework for using and evaluating component analysis techniques for noise reduction in fMRI.
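The general idea of removing components flagged by their Fourier content can be sketched as follows. This is only a toy illustration under strong assumptions: the synthetic data, the 0.1 cycles-per-sample cutoff, and the "more than half the power above the cutoff" rule are hypothetical stand-ins, not the selection criterion used in the paper, and PCA is done here by a plain SVD on data that are already mean-free.

```python
import numpy as np

n_time, n_voxels = 100, 30
t = np.arange(n_time)

# Hypothetical time courses: a slow "activation" and a fast structured noise.
slow = np.sin(2 * np.pi * 2 * t / n_time)     # 2 cycles over the run
fast = np.sin(2 * np.pi * 20 * t / n_time)    # 20 cycles over the run

# Hypothetical spatial maps with disjoint support, so PCA separates them cleanly.
signal_map = np.zeros(n_voxels)
signal_map[:10] = 1.0
noise_map = np.zeros(n_voxels)
noise_map[10:] = 0.5

signal = np.outer(slow, signal_map)
data = signal + np.outer(fast, noise_map)     # time x voxels

# PCA via SVD: columns of U are the component time series.
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Fraction of each component's spectral power above the assumed cutoff.
freqs = np.fft.rfftfreq(n_time)               # cycles per sample
power = np.abs(np.fft.rfft(U, axis=0)) ** 2
high_fraction = power[freqs > 0.1].sum(axis=0) / power.sum(axis=0)

# Keep components dominated by low frequencies and rebuild the time series.
keep = high_fraction < 0.5
cleaned = (U[:, keep] * s[keep]) @ Vt[keep]
```

Real fMRI data would not separate this cleanly; the sketch only shows the decompose, classify, remove, reconstruct sequence described in the summary.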

Harshman, R. A., & Hong, S. (2002). 'Stretch' vs. 'slice' methods for representing three-way structure via matrix notation. J. Chemometrics, 16, 198-205.

View .pdf file (150 KB, 8 pp)

Summary A three-way array must be represented in two-way form if its structure is to be described and manipulated by means of matrix notation. Historically, two methods here called 'array stretching' and 'array slicing' have been used. More recently, however, array slicing has often been overlooked, resulting in loss of mathematical flexibility. Stretching involves unfolding (matricizing) the three-way array and applying one's mathematical operations to the resulting two-way matrix; this results in expressions that are often quite useful for parameter estimation but which are relatively long and require practice to interpret properly. 'Slicing' involves taking a representative two-way subarray and applying operations to it; this often gives compact and easily understood expressions, but requires the introduction of extra matrix names and becomes awkward if the array is not 'slicewise-regular'. The advantages of each approach are demonstrated and compared by applying them to a set of models from the Tucker and Parafac families. In addition, we show how slice-wise representation can be improved by using: (i) angle brackets to eliminate the need for extra diagonal matrices, and (ii) 'encapsulated summation' notation to allow representation of array structure that is orderly but not slicewise regular.

Keywords: diagonalization, sliced-array, stretched-array, matricized, unfolding, multilinear and quasi-multilinear models, Tucker, T1, T2, T3, Parafac/Candecomp, Parafac1, Parafac2, Paratuck2.
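The two representations can be compared on a small trilinear (Parafac-type) array. The sketch below uses NumPy; the array sizes, the variable names, and the row-major unfolding order are assumptions made for illustration and do not reproduce the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 3, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# Trilinear array: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# 'Slicing': each frontal slice is A * diag(k-th row of C) * B'.
slices_match = all(
    np.allclose(X[:, :, k], A @ np.diag(C[k]) @ B.T) for k in range(K)
)

# 'Stretching': unfold X into an I x (J*K) matrix and express it with a
# column-wise Kronecker (Khatri-Rao) product of B and C.
khatri_rao = (B[:, None, :] * C[None, :, :]).reshape(J * K, R)
stretched = X.reshape(I, J * K)               # column index is j * K + k
stretch_matches = np.allclose(stretched, A @ khatri_rao.T)

print(slices_match, stretch_matches)
```

The slice form needs an extra diagonal matrix per slice, while the stretched form is a single matrix equation of the kind that is convenient for least-squares estimation.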

Harshman, R. A. (2001). An index formalism that generalizes the capabilities of matrix notation and algebra to n-way arrays. J. Chemometrics, 15, 689-714.

View published .pdf file (2657 KB, 26 pp)
View .pdf file (269 KB, 42 pp, revised submitted ms)

Summary The capabilities of matrix notation and the rules of matrix algebra are generalized to n-way arrays. The resulting language seems easy to use; all the capabilities of matrix notation are retained and most carry over naturally to the n-way context. For example, one can multiply a three-way array times a four-way array to obtain a three-way product. Many of the language's key characteristics are based on the rules of tensor notation and algebra. The most important example of this is probably the incorporation of subscript/index related information into both the names of array objects and the rules used to operate on them. Some topics that emerge are relatively unexplored, such as inverses of n-way arrays; these might prove interesting for future theoretical study.

Keywords: Linear and multilinear algebra; tensors; array notation; three-way models; n-way arrays; Tucker; T2; T3; Parafac / Candecomp
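The three-way times four-way example can be mimicked with index-based contraction in NumPy. `numpy.einsum` is used here only as a stand-in for index-driven array multiplication; it is not the notation proposed in the paper, and the array sizes and the choice of which two indices are shared are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
three_way = rng.normal(size=(2, 3, 4))        # indices i, j, k
four_way = rng.normal(size=(3, 4, 5, 6))      # indices j, k, l, m

# Summing over the two shared indices (j, k) leaves a three-way product (i, l, m).
product = np.einsum('ijk,jklm->ilm', three_way, four_way)

# The same result via ordinary matrix multiplication on unfolded arrays.
as_matrices = three_way.reshape(2, 12) @ four_way.reshape(12, 30)
same_result = np.allclose(product, as_matrices.reshape(2, 5, 6))

print(product.shape, same_result)
```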

Do, T., McIntyre, N. S., Harshman, R. A., Lundy, M. E., & Splinter, S. J. (1999). Application of Parallel Factor Analysis and X-ray photoelectron spectroscopy to the initial stages in oxidation of aluminium. Surface and Interface Analysis, 27, 618-628.

Summary Three-way parallel factor analysis (PARAFAC) was used to decompose a set of Al 2p X-ray photoelectron spectroscopy (XPS) spectra that resulted during the oxidation of clean aluminium surfaces, measured as a function of exposure time and water vapour pressure. In addition to the expected fine peak structure of the XPS Al 2p spectrum, the PARAFAC solution provided new information on elemental processes that occur during the initial stages of oxidation. As expected, the water vapour-aluminium reaction (a) attenuated the metallic peak at binding energy (BE) 72.87 ± 0.05 eV, and (b) increased the oxidic peak at BE 75.80 ± 0.05 eV. The new information came from another factor that suggested the formation of an interfacial metal hydride at BE 72.4(4) eV, as well as a concomitant oxide peak at 75.4(3) eV. Both are hypothesized to be products of the hydrolysis of adsorbed water molecules at the aluminium interface. At pressures above and below 1.3 x 10^-5 Pa this factor was diminished; in the case of higher pressure, this is explained as an increase in the recombination of atomic hydrogen.

Keywords: aluminium oxide; aluminium hydride; oxidation; parallel factor analysis; XPS

Hopke, P. K., Paatero, P., Jia, H., Ross, R. T., & Harshman, R. A. (1998). Three-way (PARAFAC) factor analysis: Examination and comparison of alternative computational methods as applied to ill-conditioned data. Chemometrics and Intelligent Laboratory Systems, 43, 25-42.

Summary Four different approaches to three-way factor analysis are applied to three sets of ill-conditioned data, and the results are compared. Also, a numerical index is introduced to characterize the ill-conditioning of n-way arrays (n>2). The four approaches (computer programs) are HL-PARAFAC, a simple alternating least squares (ALS) algorithm with minimal extrapolation; TPALS, an ALS algorithm with sophisticated extrapolation; PMF3, a non-linear curve fitting procedure; and DTDMR, a non-iterative closed-form approximation method. They were applied to two 'difficult' synthetic data sets and to one set of ill-conditioned real data, a set of fluorescence spectroscopy measurements taken from an amino acid in aqueous solution. Upon convergence, all results except those from DTDMR agree, but there are large differences in speed of convergence. DTDMR is the fastest, but solves only half the problems. Of the others, PMF3 is faster than TPALS by a factor of ten, and TPALS is faster than PARAFAC, again by a factor of ten.

Keywords: Factor analysis; Trilinear; PARAFAC; Fluorescence spectroscopy; PMF3; TPALS; DTDMR

Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and place for significance testing. In L. L. Harlow, S. A. Mulaik & J. H. Steiger (Eds.), What if there were no significance tests? Multivariate applications: Vol. 1 (pp. 65-115). Mahwah, N.J.: Lawrence Erlbaum Associates.

Summary While null hypothesis significance testing is all too frequently misused and important complementary information such as confidence intervals omitted, this calls for better education of researchers (and of some who teach research methods) rather than abandonment of this essential method. We argue that criticisms recently leveled against the basic logic of null hypothesis significance testing are misguided and confused.
    Issues addressed include the "corrupt scientific method", the "nil" hypothesis, when it seems most appropriate to test the null hypothesis of no effect, the proper interpretation of a confidence interval, the purpose of a significance test, whether meta-analysis should replace significance testing, and eight misinterpretations of significance testing (e.g., "the probability of rejecting H₀ is α"; "a statistically significant result is a scientifically significant result", etc.). Significance tests must be used in the proper circumstances, with a correct understanding of what conclusions may be drawn from them.
    Power of a significance test is defined, and issues related to sample size are discussed. Rather than avoiding significance testing altogether, one must consider its power to detect a given size effect with the sample at hand (i.e., significance testing with small samples is meaningful if one is looking for large effects). Contrary to those who believe that power is relevant only in the context of significance testing, the authors take the position that power remains an issue in meta-analysis as well, and discuss this in some detail.
    The notion that physicists do not perform formal significance tests is also addressed. [Not mentioned, however, is the fact that in some areas, such as particle physics, significance testing plays an important role, though the terminology used is somewhat different. In the chapter, attention is focused on those areas of physics where significance testing is less common.] One reason is that physicists are often trying to improve their estimates of physical constants, rather than testing hypotheses. The argument is also made that many of their studies are equivalent to meta-analyses in the social sciences and their statistics like those used in meta-analysis.
    The meaning of objectivity and its relevance to hypothesis testing is also covered. A hypothesis must be formed independently of the data used to evaluate it, so that objective judgments about the world can be made as a result of the hypothesis test. It is suggested that nonzero null hypotheses should be used, and the null values modified to reflect new knowledge; this avoids the nil-hypothesis criticism. Finally, the authors argue that hypothesis testing should not be viewed as a true-false decision but rather as something that affects one's degree of belief in the hypothesis. They sum up their position by saying that "significance testing is a procedure contributing to the (provisional) judgment about the objective validity of a substantive proposition".
    An appendix is provided which summarizes the differing positions of the developers of significance testing methods, R. A. Fisher vs. J. Neyman and E. Pearson. The appendix allows the reader to assess the criticisms of significance testing in light of what the developers said.

Kiers, H. A. L., & Harshman, R. A. (1997). Relating two proposed methods for speedup of algorithms for fitting two- and three-way principal component and related multilinear models. Chemometrics and Intelligent Laboratory Systems, 36, 31-40.

Summary Two- and three-way principal component analysis (PCA) or other multilinear analyses of very large data sets can require so much computational time as to be infeasible. This paper first discusses two previously described speed-up approaches: (a) Alsberg and Kvalheim's 'postponed basis matrix multiplication' method simplifies the data and uses some new algorithms; and (b) Carroll, Pruzansky and Kruskal's more general CANDELINC procedure can apply any three-mode PCA algorithm, including those of Alsberg and Kvalheim, to a small three-way array derived from the original large data set. Then this paper shows that (a) it is easier and often more efficient to apply standard three-mode PCA algorithms rather than those of Alsberg and Kvalheim to the small array; and (b) other three-way models/analysis methods (e.g., Parafac/Candecomp and constrained 3-mode PCA) can also be successfully applied to the small array.

Keywords: principal component analysis; multilinear methods; two- and three-way principal component model

Harshman, R. A., & Lundy, M. E. (1996). Uniqueness proof for a family of models sharing features of Tucker's three-mode factor analysis and PARAFAC/CANDECOMP. Psychometrika, 61, 133-154.

Summary It has been proven that some three-way factor analysis and MDS models based on Cattell's 'principle of parallel proportional profiles' determine the one best-fitting axis orientation (i.e., it is unique), given certain conditions. These models have not allowed for factor interactions, however. This paper presents a more general model which incorporates such interactions, along with a proof that its best-fitting axis orientation is also unique. This proof requires no assumptions of symmetry in either the data or the interactions. A second proof is presented for the symmetrically weighted case. Together, these proofs imply that psychometrically interesting special cases such as Parafac2 and 3-way DEDICOM also have unique solutions.

Keywords: parallel proportional profiles; intrinsic axes; DEDICOM; Parafac2; trilinear models; factor rotation problem; multidimensional scaling; principal components

View .pdf file (1297 KB, 22 pp)

Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39-72.

View .pdf file (3834 KB, 34 pp)

Summary The theory underlying the parallel factor analysis method for three-way data, called Parafac, is reviewed and an application to real data is demonstrated. Parafac simultaneously fits 'slices' of the three-way array, using a common set of factors with each slice weighted differently. Given adequately distinct patterns of three-way variation in the factors, their orientation in the best-fitting solution is unique. Parafac can be applied to three-way observations ('direct' fit), or to a set of covariance matrices ('indirect' fit). Parafac direct fit is demonstrated here using data from a study of right vs. left cerebral hemispheric control of the hands while performing various tasks. The two Parafac factors appear to correspond to the causal influences that were manipulated in the study. Several more general versions of the parallel factor analysis model are also mentioned.

Keywords: Three-way exploratory factor analysis; unique axes; parallel proportional profiles; factor rotation problem; three-way data preprocessing; three mode principal components; trilinear decomposition; trilinear model; multidimensional scaling; longitudinal factor analysis; factor analysis of spectra; interpretation of factors; 'Real' or causal or explanatory factors; L. R. Tucker; R. B. Cattell
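The description of Parafac as fitting all slices with a common set of factors, each slice weighted differently, corresponds to the standard trilinear form below. The symbols are notational assumptions: a, b and c are the loadings for the three modes, R is the number of factors, and e is the residual.

```latex
x_{ijk} = \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} + e_{ijk},
\qquad
\mathbf{X}_k = \mathbf{A}\,\mathbf{D}_k\,\mathbf{B}^{\mathsf{T}} + \mathbf{E}_k,
\quad
\mathbf{D}_k = \operatorname{diag}(c_{k1}, \dots, c_{kR}).
```

Here the k-th slice shares the factor matrices A and B with every other slice and differs only through the diagonal weights in D_k.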

Harshman, R. A. (1994). Substituting statistical for physical decomposition: Are there applications for parallel factor analysis (PARAFAC) in non-destructive evaluation? In X. P. V. Maldague (Ed.), Advances in signal processing for nondestructive evaluation of materials (pp. 469-483). NATO ASI Series E: Applied Sciences, Vol. 262. Dordrecht, The Netherlands: Kluwer Academic Publishers.

Summary The possible application to Nondestructive Evaluation (NDE) of the three-way parallel factor analysis method, called Parafac, is discussed. Parafac uses the distinct patterns of variation in magnitude across varying measurement conditions in the data to identify, for example, functionally distinct parts in a mixture. It has been used to decompose fluorescence spectroscopy measurements of complex samples into the individual spectra of the constituent chemical compounds comprising the mixture. Similarly, in NDE it is suggested that Parafac may be used to separate the mixture of signals from a test object into each causally/physically distinct component. This would be possible when there are several different test conditions or time slices across which the signal magnitude of each component varies in a way that is distinct from all the other components. Each factor extracted by the Parafac analysis would thus identify various normal and anomalous signals; anomalous ones might then be traced to their physical causes.

Harshman, R. A., & Lundy, M. E. (1990). Multidimensional analysis of preference structures. In A. deFontenay, M. H. Shugard, & D. S. Sibley (Eds.), Contributions to economic analysis: Vol. 187. Telecommunications demand modeling: An integrated view (pp. 185-204). Amsterdam: Elsevier.

View .pdf file (1793 KB, 11 pp)

Summary A new approach to exploratory analysis of paired-comparison preference data is introduced. Called DEDICOM (for DEcomposition into DIrectional COMponents), it may be applied to real-valued matrices that are either symmetric, asymmetric or skew-symmetric. Here it is demonstrated with two skew-symmetric matrices, one an 8x8 set of preference strength ratings among food and the other a 9x9 set of preference choice frequencies among celebrities. Interpretation of DEDICOM solutions is discussed, including an explanation of 'bimensions' for skew-symmetric data (cf. dimensions for other types of data) and rotation issues. Also, DEDICOM is compared to psychological-model-based approaches, and other potential applications (e.g., telecommunications problems) are described.

Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391-407.

Summary A new method for automatic indexing and retrieval is presented. It uses singular-value decomposition (SVD) to detect the implicit higher-order structure in the association of terms or keywords with documents (called 'semantic structure'); this structure can then be used to improve the retrieval of relevant documents on the basis of query terms. Here, a large term by document matrix is decomposed into around 100 orthogonal factors; these factors are used to approximate the original matrix. Documents are then represented by 100-item vectors of factor weights. Queries are represented as pseudo-document vectors formed by weighted combinations of terms; documents with cosine values above a given threshold are returned. Initial tests are encouraging.
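The retrieval steps in this summary can be sketched on a toy term-by-document matrix. Everything specific here is an assumption made for illustration: the five terms, the four documents, the use of 2 factors instead of around 100, the 0.5 cosine threshold, and the fold-in convention (query counts projected onto the term vectors and divided by the singular values).

```python
import numpy as np

terms = ["computer", "user", "interface", "graph", "tree"]
# Rows are terms, columns are documents 0..3 (hypothetical counts).
term_doc = np.array([
    [1, 1, 0, 0],   # computer
    [0, 1, 0, 0],   # user
    [1, 1, 0, 0],   # interface
    [0, 0, 1, 1],   # graph
    [0, 0, 1, 2],   # tree
], dtype=float)

# Truncated SVD: keep the k largest orthogonal factors.
k = 2
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vectors = Vt[:k].T                 # one k-item vector per document

def fold_in_query(query_terms):
    """Represent a query as a pseudo-document in the factor space."""
    q = np.array([1.0 if term in query_terms else 0.0 for term in terms])
    return (q @ U_k) / s_k

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query_vec = fold_in_query({"user"})
cosines = [cosine(query_vec, d) for d in doc_vectors]
retrieved = [j for j, c in enumerate(cosines) if c > 0.5]
print(retrieved)
```

In this toy case document 0 is returned even though it does not contain the query term "user", because it shares a factor with document 1, which does.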

View .pdf file (3061 KB, 17 pp)

Kruskal, J. B., Harshman, R. A., & Lundy, M. E. (1989). How 3-MFA can cause degenerate PARAFAC solutions, among other relationships. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 115-121). Amsterdam: Elsevier.

Summary The relationships among three models, (1) three-mode factor analysis, (2) PARAFAC-CANDECOMP and (3) CANDELINC, are discussed. Two theorems and some associated corollaries are also presented. Theorem 1 and its corollary show that data satisfying model (1) can cause degenerate solutions when fit by model (2). Theorem 2 and its corollaries connect all three models at once.

View .pdf file (645 KB, 8 pp)

Lundy, M. E., Harshman, R. A., & Kruskal, J. B. (1989). A two-stage procedure incorporating good features of both trilinear and quadrilinear methods. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 123-130). Amsterdam: Elsevier.

View .pdf file (1202 KB, 8 pp)

Summary The strengths of two models, the trilinear Parafac-Candecomp factor analysis model and the quadrilinear Tucker T3 model, are combined here in a procedure called PFCORE to find a solution for difficult data. The Tucker T2 and T3 models can fit more complex variance than the more constrained Parafac model can, but unlike Parafac, the solution is subject to an axis indeterminacy. Using Parafac to fit data that satisfies the Tucker models sometimes produces uninterpretable "degenerate" solutions, however, in which two or more factors are highly negatively correlated. An application of PFCORE to real data demonstrates how it provides unique meaningful axes (from Parafac) with a core matrix (from T3) to give useful insights into the data complexities that caused the Parafac degeneracy. More general models are also discussed.
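A simple way to screen a fitted solution for the kind of degeneracy described above is to multiply the cosines between two factors across the three modes. The sketch below is an assumed diagnostic, not part of PFCORE; the function name `triple_cosine_product` and the loading values are hypothetical.

```python
import numpy as np

def triple_cosine_product(A, B, C, r, s):
    """Product over the three modes of the cosine between factors r and s."""
    def cos(M):
        u, v = M[:, r], M[:, s]
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return cos(A) * cos(B) * cos(C)

# Hypothetical loadings: factors nearly identical in Modes A and B,
# nearly opposite in Mode C.
A = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9]])
B = np.array([[1.0, 1.0], [0.0, 0.1], [1.0, 1.0]])
C = np.array([[1.0, -1.0], [2.0, -2.1]])

value = triple_cosine_product(A, B, C, 0, 1)
print(value)   # a value close to -1 flags a possibly degenerate pair
```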

Dawson, M. R., & Harshman, R. A. (1986). The multidimensional analysis of asymmetries in alphabetic confusion matrices: Evidence for global-to-local and local-to-global processing. Perception and Psychophysics, 40, 370-383.

View .pdf file (2848 KB, 14 pp)

Summary A multidimensional scaling program for asymmetric data, called DEDICOM (DEcomposition into DIrectional COMponents), is applied to two stimulus confusion data sets to test the competing local-to-global vs. global-to-local theories about letter-perception processes. Analysis of the first set revealed an additive hierarchy of asymmetry that is very consistent with global-to-local processing, but additional structure and reliable anomalies suggest the need for more refinement of the theory. In contrast, analysis of the second set revealed five distinct patterns, each consisting of transformations attributable to the failure to detect specific local features; that is, the solution supported the local-to-global theory. Possible reasons for this apparent inconsistency in results are discussed, including stimulus differences and exposure durations. Despite their differences, however, both solutions demonstrate how useful DEDICOM is in revealing structure in asymmetries.

DeSarbo, W. S., & Harshman, R. A. (1985). Celebrity-brand congruence analysis. Current Issues and Research in Advertising, 1, 17-52.

Summary While a commercial spokesman for a product directly influences the audience through his/her message, s/he also indirectly influences it by how s/he is perceived (called the "source" effect). A lot of research has been directed towards finding factors that might influence the source effect, but very little towards the problem of finding the "best" spokesperson for a given product. This paper uses the PARAFAC method to uncover the cognitive-perceptual overtones of both a product and possible spokespeople, and then tries to establish a basis for optimizing potentially desirable source effects. PARAFAC is a three-way factor analysis procedure which is here applied to 34 individuals' associative judgments (measured on 39 semantic differential scales) concerning 12 automobile makes and 12 celebrities (commercial spokespeople). The three dimensions of overtones that were extracted, labeled "Flashy", "Mature/Conservative", and "Feminine/Soft/Smooth", are related in distinctive ways to different automobiles and particular celebrities. They provide a basis for deciding which celebrity might be suited to sponsor a specific automobile.

An application of PARAFAC to a small sample problem...

Harshman, R. A., & Lundy, M. E. (1984). The PARAFAC model for three-way factor analysis and multidimensional scaling. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 122-215). New York: Praeger.

View .pdf file (11,506 KB, 94 pp)

Summary This is a comprehensive discussion of the different forms of the PARAFAC(-CANDECOMP) three-way factor analysis model, its "uniqueness" properties, and its relationship with other two- and three-way factor analysis models. First, the PARAFAC1 model for raw score or profile data ("direct fit") is derived as a generalization of the two-way factor model, issues of scaling and interpretation of the factor loading matrices are discussed (pp. 128-139; 192-203), and "system" vs. "object" variation as they relate to the model are examined (pp. 125-133). Next, the PARAFAC1 model for covariances ("indirect fit") is derived, first from the raw score model and then from more general assumptions. The first derivation shows PARAFAC1 to be a special case of PARAFAC2 (pp. 136-7). Indirect and direct fitting are compared and the advantages and disadvantages of the orthogonality assumption required by the indirect fit model are discussed (pp. 137-8), along with why correlations are inappropriate (p. 141), and whether the principal components model differs appreciably from the common factors model in the three-way indirect fit context (pp. 141-3). Finally, the relationship between factor analysis (PARAFAC) and metric MDS (INDSCAL, IDIOSCAL) is examined (pp. 144-7).
The next major section in the chapter deals with the uniqueness properties of the PARAFAC model (pp. 147-169). "Uniqueness" or "intrinsic axes" means that given certain assumptions, the solution determined from the data by PARAFAC has no alternative, equal-fitting form (i.e., any other rotation would reduce its fit to the data). Why this is important (pp. 147-150; 163-9) and the minimum conditions for uniqueness (pp. 161-2) are explained. The value of empirically confirming a PARAFAC solution by split-half, bootstrapping, and/or jackknifing procedures is also discussed.
The final section compares PARAFAC with other models, most notably Tucker's T3 (pp. 169-182), Corballis' three-way model (pp. 184-5), and Sands and Young's ALSCOMP (pp. 188-190). PARAFAC1 is described as a special case of the T3 model and vice versa, and a diagram on p. 175 shows how Carroll transforms a T3 representation into the corresponding PARAFAC one (two other methods of embedding T3 in PARAFAC are also discussed on pp. 176-178 and pp. 203-7). A family of related models for three-way profile data, arranged from most general to most restricted, is presented on p. 184. PARAFAC3 is listed between T3 and PARAFAC1 and is discussed on pp. 185-6. PARAFAC2 and DEDICOM, not in the table, are also discussed in relation to PARAFAC3 and T3 (p. 187).

Harshman, R. A., & Lundy, M. E. (1984). Data preprocessing and the extended PARAFAC model. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 216-284). New York: Praeger.

View .pdf file (8402 KB, 69 pp)

Summary The data preprocessing discussed here is restricted to additive adjustments (centering) and multiplicative adjustments (rescaling or normalization) that may be applied to three-way profile data, for example, before direct fitting of the PARAFAC model. It does not include conversion of profile data to covariances or cross-products, nor does it include conversion of proximity data to scalar products. Eight reasons for preprocessing are given, the most important of which is "to make the data appropriate for the PARAFAC model" (p. 218), which is accomplished by centering. It is shown algebraically that "fiber"-centering and "slab"-reweighting are the only appropriate ways to preprocess data for the trilinear PARAFAC model (see p. 231 for a diagram of fibers and slabs). Also shown is the effect the proper preprocessing has on the original factor loadings (e.g., centering Mode A of the data centers the Mode A factors; rescaling Mode B similarly reweights the rows of the Mode B factor matrix). Practical guidelines for deciding how to preprocess data are given on pp. 257-259. Preprocessing is also discussed from the perspective of extending the PARAFAC model to more general data (instead of making the data appropriate for a restricted model, as above). In this context, "degenerate" solutions and their possible causes are described. Degenerate solutions are characterized by two or more factors whose loadings are highly correlated across all three modes, with a negative triple-product. Appropriate centering can sometimes correct this problem, but not always. Other times an orthogonally constrained PARAFAC procedure can block the high correlations and yield an interpretable solution. In these instances, it seems that while more general Tucker-type structure is present in the data, the constraints allow a subset of Tucker variations to be meaningfully expressed via orthogonal PARAFAC factors.
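The effect of proper preprocessing on the factor loadings can be checked numerically. The following is a minimal sketch, not taken from the chapter: it assumes an error-free trilinear array built from random loading matrices (the names A, B, C, the array sizes, the weights w, and the helper trilinear are all hypothetical). It verifies that centering the data across Mode A is equivalent to centering the columns of the Mode A loading matrix, and that reweighting the Mode B slabs is equivalent to reweighting the rows of the Mode B loading matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, n_c, n_factors = 6, 5, 4, 2   # hypothetical sizes

# Hypothetical loading matrices for Modes A, B, and C
A = rng.normal(size=(n_a, n_factors))
B = rng.normal(size=(n_b, n_factors))
C = rng.normal(size=(n_c, n_factors))


def trilinear(A, B, C):
    """Error-free trilinear array: x[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


X = trilinear(A, B, C)

# Fiber-centering across Mode A: from every Mode A fiber (fixed j and k),
# subtract that fiber's mean over i.
X_centered = X - X.mean(axis=0, keepdims=True)

# The same array is obtained by column-centering the Mode A loadings.
A_centered = A - A.mean(axis=0, keepdims=True)
centering_matches = np.allclose(X_centered, trilinear(A_centered, B, C))

# Slab-reweighting of Mode B: multiply each Mode B slab (fixed j) by a weight w[j].
w = rng.uniform(0.5, 2.0, size=n_b)
X_rescaled = X * w[None, :, None]

# The same array is obtained by reweighting the rows of the Mode B loadings.
rescaling_matches = np.allclose(X_rescaled, trilinear(A, w[:, None] * B, C))

print(centering_matches, rescaling_matches)
```

In both cases the adjusted array is still exactly trilinear, which is the sense in which these adjustments keep the data appropriate for the model in this idealized, noise-free setting.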

Harshman, R. A., & DeSarbo, W. S. (1984). An application of PARAFAC to a small sample problem, demonstrating preprocessing, orthogonality constraints, and split-half diagnostic techniques. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 602-642). New York: Praeger.

View .pdf file (3406 KB, 41 pp)

Summary This paper presents a detailed account of an application of the PARAFAC three-way factor analysis procedure to a small set of marketing data: ratings of 25 stimuli (names of automobiles and celebrities) made by 34 raters using 39 bipolar scales. The preferred three-dimensional solution, with factors labeled "Flashy", "Mature/Conservative" and "Feminine-Soft-Smooth", is given along with other information such as diagnostic checks. The extra information is instructive because it explains the considerations that arise, and how decisions are made, in the course of an analysis as the analyst tries to understand the data. Preprocessing of the data (removal of means, standardization of mean squares) is discussed first. To check how many factors the preprocessed data can support, the fit values for solutions of different dimensionality and the within-solution correlations among dimensions are assessed. Split-half analyses are also performed to check factor reliability in this sample. Because highly correlated within-solution factors arise when more than three factors are extracted, orthogonality constraints are imposed to prevent them, and the analyses are repeated. The diagnostic checks are repeated for these new solutions, and the orthogonal and unconstrained solutions are also compared. Finally, a detailed discussion of the unconstrained three-factor solution is presented, together with one further diagnostic check for this solution, an error analysis.

Celebrity-brand congruence analysis.
"How can I know if it's real?" (next summary)

Harshman, R. A. (1984). "How can I know if it's real?" A catalogue of diagnostics for use with three-mode factor analysis and multidimensional scaling. In H. G. Law, C. W. Snyder, Jr., J. Hattie, & R. P. McDonald (Eds.), Research methods for multimode data analysis (pp. 566-591). New York: Praeger.

View .pdf file (2806 KB, 26 pp)

Summary This paper suggests that multivariate analysis procedures should include diagnostic evaluation at various stages of the analysis to assess the appropriateness of the model, the computational adequacy of the fitting procedure, the statistical reliability of the solution, and the generalizability and explanatory validity of any resulting interpretations. The list of diagnostics presented is organized according to when the check would be done and the type of information needed (e.g., the data itself; factor loadings from a single analysis; loadings from several different analyses) and which aspects of the solution they focus on (e.g., the factor loadings, the residuals, the fit values). A brief description of what to do and why is provided for each diagnostic. Of all the checks mentioned, the most important are probably those for evaluating the reliability of any characteristics of the solution (e.g., split-half analyses, bootstrapping, jackknifing). These can be used to estimate the maximum dimensionality and determine which aspects of the solution are stable enough to warrant interpretation.

Harshman, R. A., Green, P. E., Wind, Y., & Lundy, M. E. (1982). A model for the analysis of asymmetric data in marketing research. Marketing Science, 1, 205-242.

View .pdf file (3675 KB, 38 pp)

Summary Marketing researchers have applied numerous methods for the multidimensional scaling (MDS) of perceptions and preferences, but there has been a lack of models for analyzing inherently asymmetric data relationships. However, Harshman has proposed the DEDICOM (DEcomposition into DIrectional COMponents) family of models for such data. This article describes the single-domain DEDICOM model, applies it to two matrices of asymmetric relationships--shampoo word associations and car brand-switching--and compares the DEDICOM solutions to those obtained from a symmetric (factor analysis) model. Besides demonstrating that DEDICOM solutions can be more easily interpreted and have significantly better fit than solutions obtained by factor analysis, the analyses also show the potential marketing value of additional information provided by DEDICOM--a description of the asymmetry among the dimensions. For example, two DEDICOM dimensions (labeled "Thickness" and "Vigor") are extracted from the shampoo word associations data; words loading high on the Vigor dimension evoke those high on the Thickness dimension much more often than vice versa. This might be useful in writing advertising copy. For the car-switching data, the four DEDICOM dimensions extracted (labeled "Plain Large Midsize", "Specialty", "Fancy Large", and "Small") reflect a much larger "flow" from Plain Large Midsize cars to Small than vice versa, a result with direct implications for the automobile manufacturing industry.

Keywords: Multidimensional scaling; factor analyses
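The structure of the single-domain DEDICOM model described in the summary above can be sketched numerically. The sketch assumes the usual way of writing the model, X = A R A' plus error, where A holds the loadings of the objects on a few dimensions and R is a small, generally asymmetric matrix of relationships among the dimensions. The numbers below are invented for illustration and are not taken from the shampoo or car-switching data; reading R[p, q] as the strength of the directed relation from dimension p to dimension q is a convention adopted only for this example.

```python
import numpy as np

# Hypothetical loadings of 4 objects on 2 dimensions
A = np.array([[1.0, 0.0],
              [0.8, 0.1],
              [0.1, 0.9],
              [0.0, 1.0]])

# Hypothetical asymmetric relationships among the dimensions:
# R[1, 0] is much larger than R[0, 1], so the relation from
# dimension 1 to dimension 0 is stronger than the reverse.
R = np.array([[5.0, 1.0],
              [4.0, 6.0]])

# Error-free single-domain DEDICOM structure
X = A @ R @ A.T

# Split both matrices into symmetric and skew-symmetric parts
sym_X, skew_X = (X + X.T) / 2, (X - X.T) / 2
sym_R, skew_R = (R + R.T) / 2, (R - R.T) / 2

# The asymmetry in the data is carried entirely by the asymmetry in R
symmetric_part_matches = np.allclose(sym_X, A @ sym_R @ A.T)
skew_part_matches = np.allclose(skew_X, A @ skew_R @ A.T)

print(np.round(X, 2))
print(symmetric_part_matches, skew_part_matches)
```

A symmetric model of the form A S A' with symmetric S can only reproduce sym_X; the description of asymmetry among the dimensions comes from the skew-symmetric part of R.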

Dunn, T., & Harshman, R. A. (1982). A multidimensional scaling model of the size-weight illusion. Psychometrika, 47, 25-45.

Summary The weighted Euclidean model for multidimensional scaling (e.g., INDSCAL) is much more restricted in the kinds of individual differences in perceptions it permits than either Tucker's Three-mode Multidimensional Scaling model or Carroll's Idiosyncratic Scaling (IDIOSCAL) model. Investigators have nonetheless been reluctant to use these more general models because they are subject to transformational indeterminacies which complicate interpretation. This article shows how these indeterminacies can be removed by constructing specific models of the phenomenon under investigation. To demonstrate this approach, a model of the size-weight illusion is developed and applied to data from two experiments. The same data were also analyzed using INDSCAL. Of the two solutions, only the size-weight one permits examination of individual differences in the strength of the illusion. In this sample, however, individual differences in illusion strength are minor. Thus the INDSCAL solution is easily interpretable, even though it is less informative than the size-weight one.

Keywords: Individual differences, multidimensional scaling, three-mode factor, INDSCAL, size-weight illusion

Harshman, R. A., & Berenbaum, S. A. (1981). Basic concepts underlying the PARAFAC-CANDECOMP three-way factor analysis model and its application to longitudinal data. In D. H. Eichorn, J. A. Clausen, N. Haan, M. P. Honzik, & P. H. Mussen (Eds.), Present and past in middle life (pp. 435-459). New York: Academic Press.

View .pdf file (2542 KB, 25 pp)

Summary This is a comprehensive discussion of the three-way factor analysis model, PARAFAC-CANDECOMP, with specific reference to considerations that must be addressed if the data to be analyzed are longitudinal (i.e., measurements are repeated several different times). The model for fitting (raw) score data is presented, along with the model for fitting a set of covariances. General assumptions discussed are that (a) system variation is being fit; (b) within each factor, proportional patterns of variation occur across levels of the third way of the data (e.g., occasions); and (c) between any two factors, distinct (nonproportional) patterns in variation occur across levels of the third way of the data. The special property of rotational uniqueness of the PARAFAC solution (compared to the rotational indeterminacy of two-way factor analysis solutions) is explained. Finally, special assumptions underlying PARAFAC analyses of longitudinal data are addressed, including (a) factor loading invariance, (b) the nature of factor score changes, (c) orthogonal vs. oblique factors, (d) error terms and (e) linear independence of factor variations across time.
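The contrast between the rotational indeterminacy of a two-way solution and the rotational uniqueness of the three-way solution can be illustrated with a small numerical sketch. It is not taken from the chapter: the loading matrices, the occasion weights C, and the rotation angle are hypothetical, and the data are assumed error-free. Each occasion k is modeled as X_k = A diag(C[k]) B', with the columns of C chosen to be nonproportional, as in assumption (c) above.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(7, 2))          # hypothetical loadings, first mode
B = rng.normal(size=(5, 2))          # hypothetical loadings, second mode
C = np.array([[1.0, 1.0],            # hypothetical occasion weights;
              [2.0, 0.5],            # the two columns are not proportional
              [0.5, 3.0]])

# An arbitrary rotation of the factor axes
theta = 0.6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A_rot = A @ T
B_rot = B @ np.linalg.inv(T).T

# Two-way case: the rotated loadings fit a single matrix equally well.
two_way_unchanged = np.allclose(A @ B.T, A_rot @ B_rot.T)

# Three-way case: to reproduce occasion k with the rotated loadings, the
# middle matrix must be inv(T) diag(C[k]) T, which is no longer diagonal
# once the factors change by different proportions across occasions.
middle = [np.linalg.inv(T) @ np.diag(C[k]) @ T for k in range(3)]
off_diagonal = [abs(M[0, 1]) for M in middle]

# With diagonal occasion weights, the rotated loadings no longer fit.
X = np.einsum("ir,jr,kr->ijk", A, B, C)
X_rot = np.einsum("ir,jr,kr->ijk", A_rot, B_rot, C)
three_way_unchanged = np.allclose(X, X_rot)

print(two_way_unchanged, three_way_unchanged, np.round(off_diagonal, 3))
```

On the first occasion both factors have weight 1, so the rotation is harmless there; it is the differing proportional changes on the other occasions that rule the rotated axes out.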

Gandour, J., & Harshman, R. A. (1978). Cross-language differences in tone perception: A multidimensional scaling investigation. Language and Speech, 21, 1-33.

Summary The three-way factor analysis procedure, PARAFAC, is used for a multidimensional scaling (MDS) analysis of tone perception data. The analysis is done to determine what dimensions underlie the perception of linguistic tone, and to what extent an individual's language background influences his/her perception. Paired-comparison judgements of 13 different pitch patterns superimposed on a synthetic speech-like syllable were obtained from 140 subjects (101 Thai, 15 Yoruba and 24 American English) and then transformed to scalar products to make the data suitable for the PARAFAC analysis. Five dimensions, labeled "Average Pitch", "Direction", "Length", "Extreme Endpoint", and "Slope" were extracted. Discriminant analysis showed that most speakers of a tonal language (Thai, Yoruba) have patterns of perceptual saliency on these dimensions that are distinct from those of speakers of a nontonal language (English). Also, regression analysis suggested that the Direction and Slope dimensions closely correspond to distinctive features of tone that have been postulated to be binary.

Ladefoged, P., Harshman, R., Goldstein, L., & Rice, L. (1978). Generating vocal tract shapes from formant frequencies. Journal of the Acoustical Society of America, 64, 1027-1035.

View .pdf file (1609 KB, 10 pp)

Summary The purpose of this research is to devise an algorithm that generates appropriate vocal tract shapes as seen on midsagittal X-ray diagrams of most English vowels. Only the first three formant frequencies are used, and the shape of the tongue is specified in terms of the sum of two factors--a front raising component and a back raising component--obtained from PARAFAC, a factor analysis procedure for three-way data. Stepwise multiple regression techniques show that the proportions of the two tongue shape components and of a third parameter corresponding to the distance between the lips are highly correlated with the formant frequencies in 50 vowels. The recovery algorithm developed from these correlations is tested on a number of published sets of tracings from X-ray diagrams, and appears to be generalizable to other speakers.

Factor analysis of tongue shapes. (next summary)

Harshman, R. A., Ladefoged, P., & Goldstein, L. (1977). Factor analysis of tongue shapes. Journal of the Acoustical Society of America, 62, 693-707.

View .pdf file (2772 KB, 15 pp)

Summary A factor analysis procedure for three-way data, called PARAFAC, is applied to a description of the shape of the tongue during the pronunciation of English vowels, to see if it can be explained in terms of a few underlying factors. The data analyzed consists of 13 measures of tongue displacement during pronunciation of 10 vowels by 5 speakers. Two factors, which account for more than 92% of the variance in the data, were extracted. One factor generates a forward movement of the root of the tongue accompanied by an upward movement of the front of the tongue. Movements from front to back vowels involve decreasing amounts of this factor. The second factor generates an upward and backward movement of the tongue. Movements from high to low vowels involve decreasing amounts of this factor. The two factors are used to different degrees by different speakers, depending on their individual anatomy.

Harshman, R. A. (1972). PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics, 22, 30-47. (University Microfilms, Ann Arbor, No. 10,085).

View .pdf file (124 KB, 18 pp)

Summary The mathematical model of PARAFAC1 is reviewed, and its application to cross-product matrices (e.g., covariances, scalar products) is examined. A similar examination is made of the INDSCAL-PARAFAC1 model for three-mode multidimensional scaling. Then a three-mode model called PARAFAC2 is developed to deal specifically with sets of cross-product matrices. It can determine orthogonal or oblique factors, whichever best fit the data. Another important advantage is its greater generality, since it is not restricted to analysis of system-variation data. The PARAFAC2 model is derived in two ways, one invoking the restricted system-variation model and one using a more general model. The interpretation of PARAFAC2 is discussed, and a precise definition is given of the type of uniqueness it provides. No formal proof of uniqueness is given; so far, there is only empirical evidence for the uniqueness of PARAFAC2 solutions. Current work on computer algorithms for fitting the PARAFAC2 model is described, as is a method for circumventing the "communalities" problem. Also, the difference between covariance and correlation matrices is explained in the context of the model, and its use in conjunction with more general models such as IDIOSCAL is discussed.
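The relationship between the PARAFAC2 and PARAFAC1 models for cross-product matrices can be sketched as follows. The sketch assumes the form in which the PARAFAC2 cross-product model is commonly written, C_k = A D_k Phi D_k A', where D_k is a diagonal matrix of occasion weights and Phi is a factor cross-product matrix held constant across occasions; the notation, sizes, and numbers are illustrative assumptions and not those of the working paper. With Phi equal to the identity (orthogonal factors), the expression reduces to A D_k^2 A', the PARAFAC1/INDSCAL form.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vars, n_factors, n_occasions = 6, 2, 3          # hypothetical sizes

A = rng.normal(size=(n_vars, n_factors))          # hypothetical loadings
W = rng.uniform(0.5, 2.0, size=(n_occasions, n_factors))  # occasion weights
Phi_oblique = np.array([[1.0, 0.4],               # hypothetical correlated factors
                        [0.4, 1.0]])


def parafac2_crossproducts(A, W, Phi):
    """Stack of error-free matrices C_k = A diag(W[k]) Phi diag(W[k]) A'."""
    return np.stack([A @ np.diag(w) @ Phi @ np.diag(w) @ A.T for w in W])


C_oblique = parafac2_crossproducts(A, W, Phi_oblique)
C_orthogonal = parafac2_crossproducts(A, W, np.eye(n_factors))

# PARAFAC1 / INDSCAL form for cross-products: C_k = A diag(W[k]**2) A'
C_parafac1 = np.einsum("ir,kr,jr->kij", A, W**2, A)

orthogonal_case_reduces = np.allclose(C_orthogonal, C_parafac1)
oblique_case_differs = not np.allclose(C_oblique, C_parafac1)

print(orthogonal_case_reduces, oblique_case_differs)
```

The oblique case shows why a model restricted to orthogonal factors cannot in general reproduce cross-products generated by correlated factors, which is the gap the more general model is meant to cover.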

Uniqueness proof for a family of models...
Determination and proof of minimum uniqueness conditions for PARAFAC1. (next summary)

Harshman, R. A. (1972). Determination and proof of minimum uniqueness conditions for PARAFAC1. UCLA Working Papers in Phonetics, 22, 111-117. (University Microfilms, Ann Arbor, No. 10,085).

View .pdf file (52 KB, 7 pp)

Summary The first mathematical uniqueness proof of PARAFAC1, discovered by Robert Jennrich of the UCLA Biomathematics Department, and the results of empirical studies of the model's uniqueness were reported in Harshman (1970). The empirical tests suggested that the conditions required by Jennrich's theorem were stronger than necessary for uniqueness, but the minimal conditions could not be determined empirically. The proof given here shows that two occasions can be sufficient to uniquely determine any number of factors, provided that (a) the factors change size from the first occasion to the second and (b) the percent change of each factor differs from that of every other factor. It is shown how this proof applies to Carroll and Chang's INDSCAL model and to the orthogonal factor case of PARAFAC2.
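Why two occasions can suffice may be seen in a small numerical sketch. It is not the proof given in the paper; it uses a standard eigenvector argument and assumes error-free data, loading matrices of full column rank, and nonzero first-occasion weights (all names, sizes, and numbers are hypothetical). If X1 = A diag(c1) B' and X2 = A diag(c2) B', then X2 times the pseudoinverse of X1 equals A diag(c2/c1) times the pseudoinverse of A, so the columns of A are its eigenvectors and the proportional changes c2/c1 are the corresponding eigenvalues. The eigenvectors are determined (up to scale and order) only when those ratios are all different, which corresponds to condition (b).

```python
import numpy as np

rng = np.random.default_rng(3)
n_rows, n_cols, n_factors = 6, 5, 3              # hypothetical sizes

A = rng.normal(size=(n_rows, n_factors))         # hypothetical loadings
B = rng.normal(size=(n_cols, n_factors))
c1 = np.array([1.0, 1.0, 1.0])                   # factor sizes, occasion 1
c2 = np.array([2.0, 0.5, 1.3])                   # factor sizes, occasion 2

X1 = A @ np.diag(c1) @ B.T
X2 = A @ np.diag(c2) @ B.T

# M = A diag(c2 / c1) pinv(A); rcond guards against inverting the
# numerically zero singular values of the rank-3 matrix X1.
M = X2 @ np.linalg.pinv(X1, rcond=1e-10)

eigvals, eigvecs = np.linalg.eig(M)
keep = np.argsort(-np.abs(eigvals))[:n_factors]  # the nonzero eigenvalues
ratios = np.real(eigvals[keep])
A_hat = np.real(eigvecs[:, keep])


def unit_columns(mat):
    """Scale each column to unit length."""
    return mat / np.linalg.norm(mat, axis=0, keepdims=True)


# Absolute cosines between true and recovered loading columns
congruence = np.abs(unit_columns(A).T @ unit_columns(A_hat))

print(np.round(np.sort(ratios), 3))
print(np.round(congruence, 3))
```

Each row of the congruence matrix should contain one entry close to 1, meaning every true factor is recovered up to sign, scale, and order. If two factors had the same ratio c2/c1, any mixture of their loading columns would also be an eigenvector and the two factors would not be separated.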

Foundations of the PARAFAC procedure...
PARAFAC2: Mathematical and technical notes.

Lindau, M., Harshman, R., & Ladefoged, P. (1971). Factor analysis of formant frequencies of vowels. UCLA Working Papers in Phonetics, 19, 17-25. (University Microfilms, Ann Arbor, No. 10,085).

Summary This is a progress report on several sets of three-way factor analyses that were done using the PARAFAC procedure. The purpose is to isolate phonetic vowel quality from the whole spectrum, and to discover acoustic dimensions that will provide an objective definition and physical explanation for phonetic vowel quality. The first set of analyses is of four formant frequencies (converted into pitch values in mels) for eight cardinal vowels, as spoken by eleven phoneticians (previously reported by Harshman, 1970). One interpretation of the three-factor solution suggests a one-to-one correspondence between factors and the traditional dimensions of "Vowel Height", "Front-Backness" and "Lip Rounding". Because of some ambiguity between the "front-back" and "rounding" factors, however, sets of Swedish vowels in which "backness" and "rounding" are independent are analyzed next. Preliminary results suggest three factors, just as for the cardinal vowels. Several hypotheses based on these results are introduced, including one that the "Vowel Height" factor may be the most basic property of vowels.

Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 84 pp. (University Microfilms, Ann Arbor, No. 10,085).

View .pdf file (386 KB, 84 pp)

Summary Simple structure and other common principles of factor rotation do not in general provide strong grounds for attributing explanatory significance to the factors which they select. In contrast, it is shown that an extension of Cattell's principle of rotation to Proportional Profiles (PP) offers a basis for determining explanatory factors for three-way or higher order multi-mode data. Conceptual models are developed for two basic patterns of multi-mode data variation, system- and object-variation, and PP analysis is found to apply in the system-variation case.
    Although PP was originally formulated as a principle of rotation to be used with classic two-way factor analysis, it is shown to embody a latent three-way factor model, which is here made explicit and generalized from two to N "parallel occasions". As originally formulated, PP rotation was restricted to orthogonal factors. The generalized PP model is demonstrated to give unique "correct" solutions with oblique, non-simple structure, and even non-linear factor structures.
    A series of tests, conducted with synthetic data of known factor composition, demonstrate the capabilities of linear and non-linear versions of the model, provide data on the minimal necessary conditions of uniqueness, and reveal the properties of the analysis procedures when these minimal conditions are not fulfilled. In addition, a mathematical proof is presented for the uniqueness of the solution given certain conditions on the data.
    Three-mode PP factor analysis is applied to a three-way set of real data consisting of the fundamental and first three formant frequencies of 11 persons saying 8 vowels. A unique solution is extracted, consisting of three factors which are highly meaningful and consistent with prior knowledge and theory concerning vowel quality.
    The relationships between the three-mode PP model and Tucker's multi-modal model, McDonald's non-linear model and Carroll and Chang's multi-dimensional scaling model are explored.
