Major Issues in Data Mining
Mining Methodology, User Interaction, Efficiency and Scalability, Diversity
of Database Types, Data Mining and Society
1. Mining Methodology
Researchers have been developing new data mining methodologies.
This involves investigating the following issues:
Mining various and new kinds of knowledge: Data mining covers a wide
spectrum of data analysis and knowledge discovery tasks. These tasks may
use the same database in different ways and require the development of
numerous data mining techniques.
Mining knowledge in multidimensional space: When searching for
knowledge in large data sets, we can explore the data in multidimensional
space. That is, we can search for interesting patterns among combinations of
dimensions (attributes) at varying levels of abstraction. Such mining is
known as multidimensional data mining.
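The idea of searching combinations of dimensions at varying levels of abstraction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the dimension names (city, item, quarter), and the city-to-country hierarchy are all hypothetical and do not come from any particular system.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records; each has three dimensions and one measure (sales).
RECORDS = [
    {"city": "Vancouver", "item": "laptop", "quarter": "Q1", "sales": 10},
    {"city": "Toronto", "item": "laptop", "quarter": "Q1", "sales": 7},
    {"city": "Chicago", "item": "phone", "quarter": "Q1", "sales": 5},
    {"city": "Vancouver", "item": "phone", "quarter": "Q2", "sales": 3},
]

# Hypothetical concept hierarchy: city -> country (a higher abstraction level).
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up(record):
    """Return a copy of the record with 'city' generalized to 'country'."""
    out = {k: v for k, v in record.items() if k != "city"}
    out["country"] = CITY_TO_COUNTRY[record["city"]]
    return out


def aggregate(records, dims):
    """Sum sales for each combination of values of the given dimensions."""
    totals = defaultdict(int)
    for r in records:
        key = tuple(r[d] for d in dims)
        totals[key] += r["sales"]
    return dict(totals)


def all_groupings(records, dims):
    """Aggregate over every non-empty subset of the given dimensions."""
    result = {}
    for k in range(1, len(dims) + 1):
        for subset in combinations(dims, k):
            result[subset] = aggregate(records, subset)
    return result
```

Calling aggregate(RECORDS, ("city", "item")) explores one combination of dimensions at the city level. Applying roll_up to every record first and then grouping by ("country", "item") examines the same data at a coarser level of abstraction.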
Data mining – an interdisciplinary effort: The power of data mining can be
enhanced by integrating new methods from multiple disciplines.
Boosting the power of discovery in a networked environment: Most data
objects reside in a linked or interconnected environment. Knowledge derived
in one set of objects can be used to boost the discovery of knowledge in a
“related” or semantically linked set of objects.
Handling uncertainty, noise, or incompleteness of data: Data often contain
noise, errors, exceptions, or uncertainty or are incomplete. Noise and errors
may confuse the data mining process, leading to the derivation of erroneous
patterns. Data cleaning, data preprocessing, outlier detection and removal,
and uncertainty reasoning are examples of techniques that need to be
integrated with the data mining process.
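As a rough illustration of the cleaning and outlier removal steps mentioned above, the sketch below fills missing values with the median and then drops values outside the usual interquartile-range fences. The function name clean, the use of None for missing values, and the factor k = 1.5 are assumptions chosen for the example; real preprocessing pipelines are usually more involved.

```python
import statistics


def clean(values, k=1.5):
    """Impute missing values (None) with the median, then drop IQR outliers.

    A value counts as an outlier here if it lies more than k * IQR below
    the first quartile or above the third quartile (an assumed, common rule).
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]

    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in filled if low <= v <= high]
```

For the hypothetical input [10, 12, None, 11, 13, 12, 500], the missing entry is replaced by the median 12 and the value 500 is discarded as an outlier.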
Pattern evaluation and pattern- or constraint-guided mining: Techniques
are needed to assess the interestingness of discovered patterns based on
subjective measures. These estimate the value of patterns with respect to a
given user class, based on user beliefs or expectations. Interestingness
measures or user-specified constraints can also guide the discovery process
and reduce the search space.
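One simple way to picture a subjective measure is to score each pattern by how far it departs from what the user expected. The sketch below does this for rules with an observed confidence. The scoring function, the rule strings, and the default expectation of 0.5 are all hypothetical choices for illustration, not a standard measure taken from these notes.

```python
def unexpectedness(observed_conf, expected_conf):
    """Score a pattern by how far its confidence departs from the user's belief."""
    return abs(observed_conf - expected_conf)


def rank_patterns(patterns, beliefs, default_expectation=0.5):
    """Rank patterns from most to least surprising for one user.

    patterns: dict mapping a rule (string) to its observed confidence.
    beliefs:  dict mapping a rule to the confidence the user expects;
              rules the user has no belief about get default_expectation.
    """
    scored = []
    for rule, conf in patterns.items():
        expected = beliefs.get(rule, default_expectation)
        scored.append((rule, unexpectedness(conf, expected)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this scoring, a rule the user already believes in ranks low even if its confidence is high, while a rule that contradicts the user's expectations ranks first.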
2. User Interaction
The user plays an important role in the data mining process. Research areas
include how to interact with a data mining system, how to incorporate a
user’s background knowledge in mining, and how to visualize and
comprehend data mining results.
Interactive mining: The data mining process should be highly interactive. Thus, it
is important to build flexible user interfaces and an exploratory mining
environment facilitating the user’s interaction with the system.
Incorporation of background knowledge: Background knowledge,
constraints, rules, and other information regarding the domain under study
should be incorporated into the knowledge discovery process. Such
knowledge can be used for pattern evaluation as well as to guide the search
toward interesting patterns.
Ad hoc data mining and data mining query languages: High-level data
mining query languages or other high-level flexible user interfaces will give
users the freedom to define ad hoc data mining tasks. This should facilitate
specification of the relevant sets of data for analysis, the domain knowledge,
the kinds of knowledge to be mined, and so on. Optimization of the processing of
such flexible mining requests is another promising area of study.
Presentation and visualization of data mining results: A data mining system
must adopt expressive knowledge representations, user-friendly interfaces,
and visualization techniques so that the discovered knowledge can be easily
understood and directly used by humans.
3. Efficiency and Scalability
Efficiency and scalability are always considered when comparing data
mining algorithms.
Efficiency and scalability of data mining algorithms: Data mining
algorithms must be efficient and scalable in order to effectively extract
information from huge amounts of data in many data repositories. Efficiency,
scalability, performance, optimization, and the ability to execute in real time
are key criteria that drive the development of many data mining algorithms.
Parallel, distributed, and incremental mining algorithms: Parallel and
distributed algorithms first partition the data into pieces. Each piece is
processed, in parallel, by searching for patterns. The parallel processes may
interact with one another. The patterns from each partition are eventually
merged. Incremental algorithms incorporate new data updates without mining
the entire data set again from scratch.
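The partition, process, and merge steps can be sketched as follows. This is a minimal illustration, assuming that "searching for patterns" means counting how many transactions contain each item. The function names are hypothetical, and a thread pool stands in for what would normally be separate processes or machines (threads in CPython do not give true CPU parallelism). The workers in this sketch do not interact with one another.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(transactions, n_parts):
    """Split the transactions into n_parts roughly equal pieces."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def count_items(part):
    """Local step: count, per item, the transactions in this piece containing it."""
    counts = Counter()
    for transaction in part:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def mine_frequent_items(transactions, min_support, n_parts=2):
    """Count items per partition concurrently, merge, and keep frequent ones."""
    parts = partition(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        local_counts = list(pool.map(count_items, parts))

    # Merge step: add up the local counts from every partition.
    merged = Counter()
    for counts in local_counts:
        merged.update(counts)
    return {item: n for item, n in merged.items() if n >= min_support}
```

Because the merge step only adds counts, the same structure also hints at an incremental variant: counts from a newly arrived batch could be added to the stored totals instead of recounting all earlier data.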
Cloud computing and cluster computing: These use computers in a
distributed and collaborative way to tackle very large-scale computational
tasks. This is also an active research area.
4. Diversity of Database Types
The wide variety of database types brings challenges to data mining. These
include:
Handling complex types of data: Diverse applications generate a wide
spectrum of new data types. Domain- or application-dedicated data mining
systems are being constructed for in-depth mining of specific kinds of data.
The construction of effective and efficient data mining tools for diverse
applications remains a challenging and active area of research.
Mining dynamic, networked, and global data repositories: Multiple sources
of data are connected by the Internet and various kinds of networks, forming
gigantic, distributed, and heterogeneous global information systems and
networks. Mining such gigantic, interconnected information networks may
help disclose many more patterns and knowledge in heterogeneous data sets
than can be discovered from a small set of isolated data repositories.
5. Data Mining and Society
Social impacts of data mining: The improper disclosure or use of data and
potential violation of individual privacy and data protection rights are areas
of concern that need to be addressed.
Privacy preserving data mining: Data mining will help scientific discovery,
business management, economic recovery, and security protection. However,
it poses the risk of disclosing an individual’s personal information. The
philosophy is to observe data sensitivity and preserve people’s privacy while
performing successful data mining.
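One concrete way to check data sensitivity before release is k-anonymity, a technique that these notes do not name and that is used here only as an example. A table is k-anonymous if every combination of quasi-identifier values is shared by at least k rows. The sketch below assumes that rows are dictionaries and that the caller knows which columns are quasi-identifiers; the function name and column names are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())
```

With generalized values such as an age range "30-39" and a masked postal code "941**", a row that is the only one in its group fails the check for k = 2, which signals that the person could be re-identified from those columns.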
Invisible data mining: We cannot expect everyone in society to learn and
master data mining techniques. More and more systems should have data
mining functions built within so that people can perform data mining or use
data mining results without any knowledge of data mining algorithms.