Lps Week 16 Iatb

Data mining, also known as Knowledge Discovery in Database (KDD), is a process used to extract valuable information from large datasets through various steps including data cleaning, integration, and pattern evaluation. It has applications across multiple sectors such as healthcare, finance, and marketing, enabling organizations to make data-driven decisions and identify trends. However, challenges such as data privacy, complexity, and the need for advanced tools can complicate its implementation.

Uploaded by

mphaolyn

LPS Week 16 – IATB

Data Mining

Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and individuals extract
valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Databases (KDD). The
knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data
mining, pattern evaluation, and knowledge presentation.

Data mining is the process of extracting information from huge sets of data to identify patterns, trends, and useful
data that allow a business to make data-driven decisions.

In other words, data mining is the process of investigating hidden patterns in information from various perspectives
and categorizing them into useful data. This data is collected and assembled in areas such as data warehouses, where
efficient analysis and data mining algorithms support decision making and other data requirements, ultimately
cutting costs and generating revenue.

Data mining is the act of automatically searching large stores of information to find trends and patterns that go
beyond simple analysis procedures. Data mining uses complex mathematical algorithms to segment the data and
evaluate the probability of future events.

Data mining is a process used by organizations to extract specific data from huge databases to solve business
problems. It primarily turns raw data into useful information.

STEPS IN DATA MINING


The data mining process involves several steps:
Defining the problem.
Building the database.
Examining the data.
Preparing a model to be used to probe the data.
Testing the model.
Using the model.
Putting the results into action.
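
The steps above can be sketched in a few lines of Python. This is only an illustration: the records, the function
names (build_database, examine, accuracy, prepare_model), and the single-threshold "model" are assumptions made up
for the example and do not come from any particular data mining tool.

```python
# Step 1 (defining the problem): predict whether a customer buys a
# premium plan from the number of monthly store visits.
# Hypothetical records: (monthly_visits, bought_premium)
RECORDS = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1),
           (2, 0), (8, 1), (None, 1)]


def build_database(rows):
    """Step 2: keep only complete rows."""
    return [(v, y) for v, y in rows if v is not None and y in (0, 1)]


def examine(rows):
    """Step 3: basic summary of the data."""
    visits = [v for v, _ in rows]
    return {"rows": len(rows), "min_visits": min(visits), "max_visits": max(visits)}


def accuracy(threshold, rows):
    """Share of rows where 'visits >= threshold' matches the label."""
    return sum((v >= threshold) == bool(y) for v, y in rows) / len(rows)


def prepare_model(train):
    """Step 4: choose the visit threshold that best fits the training rows."""
    return max(sorted({v for v, _ in train}), key=lambda t: accuracy(t, train))


data = build_database(RECORDS)
train, test = data[:8], data[8:]           # simple hold-out split
threshold = prepare_model(train)           # Step 4: prepare the model
test_accuracy = accuracy(threshold, test)  # Step 5: test the model
will_buy = 7 >= threshold                  # Step 6: use the model on a new customer

print(examine(data), threshold, test_accuracy, will_buy)
```

Step 7, putting the results into action, is a business decision rather than code, for example sending an offer only
to the customers the model flags.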

Types of Data Mining


Data mining can be performed on the following types of data:
Relational Database:
A relational database is a collection of multiple data sets formally organized into tables, records, and columns, from
which data can be accessed in various ways without having to reorganize the database tables. Tables convey and
share information, which facilitates data searchability, reporting, and organization.

Data warehouses:
A Data Warehouse is the technology that collects the data from various sources within the organization to provide
meaningful business insights. The huge amount of data comes from multiple places such as Marketing and Finance.
The extracted data is utilized for analytical purposes and helps in decision making for a business organization. The data
warehouse is designed for the analysis of data rather than transaction processing.
Data Repositories:
A data repository generally refers to a destination for data storage. However, many IT professionals use the
term more specifically to refer to a particular kind of setup within an IT structure, for example, a group of
databases in which an organization keeps various kinds of information.
Object-Relational Database:
A combination of an object-oriented database model and a relational database model is called an object-relational
model. It supports classes, objects, inheritance, etc.
Transactional Database:
A transactional database refers to a database management system (DBMS) that can undo (roll back) a database
transaction if it does not complete properly.
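
The "undo" behavior can be tried with Python's built-in sqlite3 module. The sketch below is a minimal illustration;
the accounts table, the transfer function, and the balances are hypothetical. A connection used in a "with" block
commits the transaction if the block succeeds and rolls it back if an exception is raised.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; undo both updates if the result is invalid."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # valid: committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw: rolled back
print(ok, failed, balances(conn))
```

After the failed transfer, both accounts keep the balances they had before it started, which is the property the
definition above describes.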

Advantages of Data Mining


The Data Mining technique enables organizations to obtain knowledge-based data.
Data Mining helps the decision-making process of an organization.
It facilitates the automated discovery of hidden patterns as well as the prediction of trends and behaviors.
It can be introduced into new systems as well as existing platforms.
It is a quick process that makes it easy for new users to analyze enormous amounts of data in a short time.

Disadvantages of Data Mining


There is a risk that organizations may sell customers' data to other organizations for money.
Much data mining analytics software is difficult to operate and requires advanced training to use.
Different data mining tools operate in distinct ways because of the different algorithms used in their design.
Therefore, selecting the right data mining tool is a very challenging task.
Data mining techniques are not perfectly precise, so they may lead to severe consequences under certain conditions.

Data Mining Applications


Data mining is primarily used by organizations with intense consumer demands, such as retail, communication,
financial, and marketing companies, to determine prices, consumer preferences, and product positioning, and their
impact on sales, customer satisfaction, and corporate profits. Data mining enables a retailer to use point-of-sale
records of customer purchases to develop products and promotions that help the organization attract customers.
Application areas include:

Data Mining in Healthcare:


Data mining in healthcare has excellent potential to improve the health system. It uses data and analytics for better
insights and to identify best practices that will enhance health care services and reduce costs.

Data Mining in Market Basket Analysis:


Market basket analysis is a modeling method based on the hypothesis that if you buy a specific group of products,
you are more likely to buy another group of products.
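
The hypothesis can be checked on transaction data with two standard measures: support (how often a group of
products appears) and confidence (how often the second group appears when the first one does). The baskets and
function names below are made up for illustration.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)


def support(transactions, items):
    """Fraction of transactions that contain all of `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    base = count(transactions, antecedent)
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical point-of-sale baskets
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

print(support(baskets, {"bread", "butter"}))       # 3 of 5 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 3 of the 4 bread baskets
```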

Data mining in Education:


Educational data mining (EDM) is a newly emerging field concerned with developing techniques that explore knowledge
from the data generated in educational environments. EDM objectives include predicting students' future
learning behavior, studying the impact of educational support, and promoting learning science.

Data Mining in Manufacturing Engineering:
Knowledge is the best asset possessed by a manufacturing company. Data mining tools can help find
patterns in a complex manufacturing process. Data mining can be used in system-level design to uncover the
relationships between product architecture, product portfolio, and the data needs of customers. It can also be used
to forecast the product development period, cost, and expectations, among other tasks.

Data Mining in CRM (Customer Relationship Management):


Customer Relationship Management (CRM) is all about acquiring and retaining customers, enhancing customer
loyalty, and implementing customer-oriented strategies. To maintain a good relationship with its customers, a business
organization needs to collect data and analyze it. With data mining technologies, the collected data can be used
for analytics.

Data Mining in Fraud detection:


Billions of dollars are lost to fraud. Traditional methods of fraud detection are time consuming
and complex. Data mining finds meaningful patterns and turns data into information.

Data Mining in Lie Detection:
Apprehending a suspect is often the easier part; bringing out the truth is a far more challenging task. Law
enforcement may use data mining techniques to investigate offenses, monitor suspected terrorist communications,
and so on.

Data Mining in Financial Banking:


The digitalization of the banking system generates an enormous amount of data with every new
transaction. Data mining techniques can help bankers solve business-related problems in banking and finance
by identifying trends, causalities, and correlations in business information and market costs that are not instantly
evident to managers or executives, because the data volume is too large or the data is produced too rapidly for
experts to screen.

Challenges of Implementation in Data mining

Incomplete and noisy data:


Data mining is the process of extracting useful data from large volumes of data. Real-world data is
heterogeneous, incomplete, and noisy, and data in huge quantities is often inaccurate or unreliable.
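
One simple way to cope with incomplete and noisy values is to replace them before mining. The sketch below assumes
missing readings are stored as None and that a plausible value range is known; the clean function, the age data,
and the choice of the median as a fill value are illustrative assumptions, and other strategies (dropping rows,
model-based imputation) are equally valid.

```python
from statistics import median


def clean(values, low, high):
    """Replace missing (None) or out-of-range values with the median of the valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    if not valid:
        raise ValueError("no valid values to compute a fill value from")
    fill = median(valid)
    return [v if v is not None and low <= v <= high else fill for v in values]


# Hypothetical customer ages: one missing entry and one obvious typing error
ages = [34, None, 29, 420, 41, 38]
print(clean(ages, low=0, high=120))
```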

Data Distribution:
Real-world data is usually stored on various platforms in a distributed computing environment. It might be in
databases, individual systems, or even on the internet. In practice, it is quite difficult to bring all the data into a
centralized data repository, mainly because of organizational and technical concerns.

Complex Data:
Real-world data is heterogeneous, and it could be multimedia data, including audio and video, images, complex data,
spatial data, time series, and so on.

Performance:
The data mining system's performance relies primarily on the efficiency of algorithms and techniques used. If the
designed algorithm and techniques are not up to the mark, then the efficiency of the data mining process will be
affected adversely.

Data Privacy and Security:


Data mining can lead to serious issues in terms of data security, governance, and privacy. For example, if a
retailer analyzes the details of purchased items, it reveals data about the buying habits and preferences of
its customers without their permission.

Data Visualization:
In data mining, data visualization is a very important process because it is the primary method that shows the output to
the user in a presentable way. The extracted data should convey the exact meaning of what it intends to express.

Techniques For Data Mining

Cluster Analysis.
Cluster analysis is a data reduction technique that groups together either variables or cases based on similar data
characteristics. This technique is useful for finding customer segments based on characteristics such as demographic
and financial information or purchase behavior. For example, a bank might use it to find segments of customers
based on the types of accounts they open.
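
A common clustering algorithm is k-means, used here as one concrete example (the text above does not name a
specific algorithm). The sketch assumes each customer is described by two made-up numbers, a balance in thousands
and a number of accounts, and that the starting centroids are chosen by hand.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the points in the cluster (unchanged if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (balance in thousands, number of accounts)
customers = [(1, 1), (2, 1), (1.5, 2), (20, 4), (22, 5), (21, 3)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
print(centroids)
print(clusters)
```

With this data the customers fall into a low-balance segment and a high-balance segment. In practice the features
would normally be scaled first so that one large-valued column does not dominate the distance.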

Linear Regression.
Linear regression is a method that fits a straight line through data. If the line is upward sloping, it means that an
independent variable such as the size of a sales force has a positive effect on a dependent variable such as revenue. If
the line is downward sloping, there is a negative effect. The steeper the slope, the more effect the independent
variable has on the dependent variable.
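
The straight line can be computed with the ordinary least-squares formulas: slope = Sxy / Sxx and
intercept = mean(y) - slope * mean(x). The sketch below applies them to made-up figures for sales force size and
revenue; the fit_line function name and the numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: sales force size vs. revenue (in thousands)
sales_force = [2, 4, 6, 8]
revenue = [50, 90, 130, 170]

slope, intercept = fit_line(sales_force, revenue)
print(slope, intercept)  # a positive slope means the line is upward sloping
```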

Correlation.
Correlation is a measure of the relationship between two variables. For example, a high correlation between purchases
of certain products such as cheese and crackers indicates that these products are likely to be purchased together.
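
A widely used measure is the Pearson correlation coefficient, which ranges from -1 (perfect negative relationship)
to +1 (perfect positive relationship). The sketch below computes it for made-up weekly purchase counts; the data and
the pearson function name are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly purchase counts for one store
cheese = [1, 2, 3, 4, 5]
crackers = [2, 4, 5, 4, 5]

r = pearson(cheese, crackers)
print(round(r, 2))  # about 0.77, a fairly strong positive correlation
```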

Factor Analysis.
Factor analysis is a data reduction technique. This technique detects underlying factors, also called "latent variables,"
and provides models for these factors based on variables in the data. For example, given a market research survey
that asks about the importance of nine product attributes, factor analysis can look for a smaller number of
underlying factors that explain the responses.

Decision Trees.
Decision trees separate data into sets of rules that are likely to have different effects on a target variable. For example,
we might want to find the characteristics of a person likely to respond to a direct mail piece. These characteristics can
be translated into a set of rules.
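
A decision tree is built by repeatedly choosing the split that best separates the target variable. The sketch below
shows a single such split, scored with Gini impurity, on made-up direct mail data. The field name "age", the
records, and the function names are assumptions; a full tree would apply the same search recursively to each side.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, feature):
    """Return (threshold, weighted_gini) of the best 'feature <= threshold' split."""
    values = sorted({features[feature] for features, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for f, y in rows if f[feature] <= threshold]
        right = [y for f, y in rows if f[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical mailing results: (person, responded to the direct mail piece?)
rows = [({"age": 25}, 0), ({"age": 30}, 0), ({"age": 35}, 0),
        ({"age": 45}, 1), ({"age": 50}, 1), ({"age": 60}, 1)]

threshold, impurity = best_split(rows, "age")
print(f"Rule: if age > {threshold}, the person is likely to respond")
```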

Neural Networks.
Neural networks mimic the human brain and can "learn" from examples to find patterns in data or to classify data. The
advantage is that it is not necessary to have any specific model in mind when running the analysis. Also, neural
networks can find interaction effects (such as effects from the combination of age and gender) which must be explicitly
specified in regression.
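
The idea of "learning from examples" can be shown with a single artificial neuron (a perceptron), the simplest
building block of a neural network. This is a deliberately small sketch with made-up data and function names. A
single neuron can only learn linearly separable patterns; finding interaction effects of the kind mentioned above
requires networks with hidden layers, which are not shown here.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


def train_perceptron(samples, epochs=20, rate=1):
    """Adjust weights and bias whenever the neuron's prediction is wrong."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical examples: (is_member, bought_recently) -> responds to offer
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(predictions)
```

No rule was written by hand: the weights were found only from the labeled examples.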

Association Models.
Association models examine the extent to which values of one field depend on, or are predicted by, values of another
field. Association discovery finds rules about items that appear together in an event such as a purchase transaction.
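
One way to judge whether two items really "appear together" is lift: how much more often the pair occurs than would
be expected if the items were bought independently. A lift above 1 suggests a positive association. The sketch
below scans all item pairs in made-up transactions; the function name, the data, and the 0.4 minimum support are
assumptions for illustration.

```python
from itertools import combinations


def pair_lift(transactions, min_support=0.4):
    """Return {(item_a, item_b): lift} for pairs that appear together often enough."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    item_count = {item: sum(item in t for t in transactions) for item in items}
    results = {}
    for a, b in combinations(items, 2):
        both = sum(a in t and b in t for t in transactions)
        if both / n >= min_support:
            # lift = P(a and b) / (P(a) * P(b))
            results[(a, b)] = (both * n) / (item_count[a] * item_count[b])
    return results


# Hypothetical purchase transactions
transactions = [
    {"cheese", "crackers"},
    {"cheese", "crackers", "wine"},
    {"wine"},
    {"cheese", "crackers"},
    {"bread"},
]

print(pair_lift(transactions))
```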

Data Mining Tools/Software


1. Orange Data Mining:

Orange is a machine learning and data mining software suite. It supports visualization and is
component-based software written in the Python programming language, developed at the
Bioinformatics Laboratory of the Faculty of Computer and Information Science, University of
Ljubljana, Slovenia.

The tool has machine learning components, add-ons for bioinformatics and text mining,
and it is packed with features for data analytics. It can also be used as a Python library.

Learners can also be diversified by altering their parameter sets. In Orange, ensembles are simply
wrappers around learners. They act like any other learner. Based on the data, they return models
that can predict the results of any data instance.

2. SAS Data Mining:


SAS stands for Statistical Analysis System. It is a product of the SAS Institute created for analytics
and data management. SAS can mine data, change it, manage information from various sources,
and analyze statistics. It offers a graphical UI for non-technical users.

SAS data miner allows users to analyze big data and provides accurate insight for timely
decision-making. SAS has a distributed memory processing architecture that is highly scalable. It is
suitable for data mining, optimization, and text mining purposes.

3. DataMelt Data Mining:


DataMelt is a computation and visualization environment which offers an interactive structure for
data analysis and visualization. It is primarily designed for students, engineers, and scientists. It is
also known as DMelt.

DMelt is a multi-platform utility written in Java. It can run on any operating system that is
compatible with the JVM (Java Virtual Machine). It includes science and mathematics libraries.

DMelt can be used for the analysis of large volumes of data, data mining, and statistical
analysis. It is extensively used in natural sciences, financial markets, and engineering.

4. Rattle:
Rattle is a GUI-based data mining tool. It uses the R statistical programming language. Rattle exposes
the statistical power of R by offering significant data mining features. Rattle has a
comprehensive and well-developed user interface, and it also has an integrated log tab that generates
the equivalent code for any GUI operation.

The data set produced by Rattle can be viewed and edited. Rattle also lets users review
the code, use it for many purposes, and extend it without any restriction.

5. Rapid Miner:
Rapid Miner is one of the most popular predictive analysis systems, created by the company of
the same name. It is written in the Java programming language. It offers an
integrated environment for text mining, deep learning, machine learning, and predictive analysis.

Rapid Miner provides the server on-site as well as in public or private cloud infrastructure. It has a
client/server model as its base. Rapid Miner comes with template-based frameworks that enable
fast delivery with fewer errors (which are common when code is written manually).

The Future Of Data Mining


One of the key issues raised by data mining technology is not a business or technological one, but a social one: the
concern about individual privacy. Data mining makes it possible to analyze routine business transactions and glean a
significant amount of information about individuals' buying habits and preferences.

Another issue is that of data integrity. Clearly, data analysis can only be as good as the data that is being analyzed. A
key implementation challenge is integrating conflicting or redundant data from different sources. For example, a bank
may maintain credit card accounts on several different databases. The address (or even the name) of a single
cardholder may be different in each. Software must translate data from one system to another and select the address
most recently entered.
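
The "select the address most recently entered" rule can be sketched as follows. The record layout (card_id,
address, updated), the two source lists, and the function name are hypothetical; real systems would also need to
match cardholders whose names or identifiers differ between databases, which this sketch does not attempt.

```python
from datetime import date


def latest_addresses(*databases):
    """Return {card_id: address}, keeping the most recently updated record per card."""
    newest = {}
    for records in databases:
        for record in records:
            current = newest.get(record["card_id"])
            if current is None or record["updated"] > current["updated"]:
                newest[record["card_id"]] = record
    return {card_id: record["address"] for card_id, record in newest.items()}


# Hypothetical records for the same cardholders held in two separate databases
billing = [
    {"card_id": "C1", "address": "12 Oak St", "updated": date(2021, 3, 1)},
]
rewards = [
    {"card_id": "C1", "address": "98 Pine Ave", "updated": date(2023, 7, 15)},
    {"card_id": "C2", "address": "5 Elm Rd", "updated": date(2022, 1, 9)},
]

merged = latest_addresses(billing, rewards)
print(merged)
```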

Finally, there is the issue of cost. While system hardware costs have dropped dramatically within the past five years,
data mining and data warehousing tend to be self-reinforcing. The more powerful the data mining queries, the greater
the usefulness of the information being gleaned from the data, and the greater the pressure to increase the amount of
data being collected and maintained. The result is increased pressure for faster, more powerful data mining queries.
These more efficient data mining systems often cost more than their predecessors.

References:
https://www.javatpoint.com/data-mining
https://www.encyclopedia.com/science-and-technology/computers-and-electrical-engineering/computers-and-computing/data-mining
https://www.javatpoint.com/data-mining-tools
