Principles of Data Mining

Hardcover
$85.00 US
On sale Aug 17, 2001 | 578 Pages | 9780262082907

The first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics.

The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This text brings those perspectives together.

The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

