Remarkable advances in computation and data storage and the ready availability of huge data sets have been the keys to the growth of the new disciplines of data mining and machine learning, while the enormous success of the Human Genome Project has opened up the field of bioinformatics.
These exciting developments, which led to the introduction of many innovative statistical tools for high-dimensional data analysis, are described here in detail. The author takes a broad perspective; for the first time in a book on multivariate analysis, nonlinear methods are discussed in detail as well as linear methods. Techniques covered range from traditional multivariate methods, such as multiple regression, principal components, canonical variates, linear discriminant analysis, factor analysis, clustering, multidimensional scaling, and correspondence analysis, to the newer methods of density estimation, projection pursuit, neural networks, multivariate reduced-rank regression, nonlinear manifold learning, bagging, boosting, random forests, independent component analysis, support vector machines, and classification and regression trees. Another unique feature of this book is the discussion of database management systems.
This book is appropriate for advanced undergraduate students, graduate students, and researchers in statistics, computer science, artificial intelligence, psychology, cognitive sciences, business, medicine, bioinformatics, and engineering. Familiarity with multivariable calculus, linear algebra, and probability and statistics is required. The book presents a carefully-integrated mixture of theory and applications, and of classical and modern multivariate statistical techniques, including Bayesian methods. There are over 60 interesting data sets used as examples in the book, over 200 exercises, and many color illustrations and photographs.