Skip to Main Content

CSC4008 Techniques for Data Mining: Home

Course Description

This course builds theoretical understanding of data mining for the students, while provide the opportunity to learn skills and content to practice and engage in data mining and pattern discovery methods on massive data. The course is composed of the theoretical part and the practical part. The specific topics mainly include data preprocessing, finding similar items, data stream mining, link analysis, association rule mining, clustering approaches, recommender system, graph mining, Map Reduce, and so on.

Recommended Books

Introduction to Data Mining

Introduction to Data Mining, 2nd Edition , gives a comprehensive overview of the background and general themes of data mining and is designed to be useful to students, instructors, researchers, and professionals. Presented in a clear and accessible way, the book outlines fundamental concepts and algorithms for each topic, thus providing the reader with the necessary background for the application of data mining to real problems. The text helps readers understand the nuances of the subject, and includes important sections on classification, association analysis, and cluster analysis. This edition improves on the first iteration of the book, published over a decade ago, by addressing the significant changes in the industry as a result of advanced technology and data growth.

Data Mining: Concepts and Techniques

Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.

Mining of Massive Data Sets

This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problems of finding frequent itemsets, and clustering. This third edition includes new and extended coverage on decision trees, deep learning, and mining social-network graphs.

Recommended Databases

Librarian