Itc 601 Dmbi-Module 6 Notes
6.1 WHAT IS BI ?
Define Business Intelligence with examples
BI (Business Intelligence) is a set of processes, architectures, and technologies that convert raw
data into meaningful information that drives profitable business actions. It is a suite of software and
services to transform data into actionable intelligence and knowledge.
BI has a direct impact on an organization's strategic, tactical and operational business decisions. BI
supports fact-based decision making using historical data rather than assumptions and gut feeling.
BI tools perform data analysis and create reports, summaries, dashboards, maps, graphs, and
charts to provide users with detailed intelligence about the nature of the business.
Business intelligence is defined as a set of mathematical models and analysis methodologies that
exploit the available data to generate information and knowledge useful for complex decision-
making processes.
A business intelligence system provides decision makers with information and knowledge extracted
from data.
Why is BI important ?
1. Measurement: creating KPI (Key Performance Indicators) based on historic data.
2. Identify and set benchmarks for varied processes.
3. With BI systems organizations can identify market trends and spot business problems that
need to be addressed.
4. BI helps with data visualization, which enhances data quality and thereby the quality of decision
making.
5. BI systems can be used not just by large enterprises but also by SMEs (Small and Medium Enterprises).
Types of BI users
1. The Professional Data Analyst : The data analyst is a statistician who always needs to drill
deep down into data. BI system helps them to get fresh insights to develop unique business
strategies.
2. The IT users : The IT user also plays a dominant role in maintaining the BI infrastructure.
3. The head of the company : CEO or CXO can increase the profit of their business by improving
operational efficiency in their business.
4. The Business Users : Business intelligence users can be found from across the organization.
BI System Advantages
1. Boost productivity : With a BI program, businesses can create reports with a single click, saving
lots of time and resources. It also allows employees to be more productive in their tasks.
2. To improve visibility : BI also helps to improve the visibility of business processes and makes it
possible to identify any areas that need attention.
3. Fix Accountability : BI system assigns accountability in the organization as there must be
someone who should own accountability and ownership for the organization's performance against
its set goals.
4. It gives a bird's eye view : BI system also helps organizations as decision makers get an
overall bird's eye view through typical BI features like dashboards and scorecards.
5. It streamlines business processes : BI takes out all complexity associated with business
processes. It also automates analytics by offering predictive analysis, computer modeling, bench-
marking and other methodologies.
6. It allows for easy analytics : BI software has democratized analytics, allowing even
non-technical, non-analyst users to collect and process data quickly. This puts the
power of analytics into the hands of many people.
BI System Disadvantages
1. Cost : Business intelligence can prove costly for small as well as for medium-sized enterprises.
The use of such systems may be expensive for routine business transactions.
2. Complexity : Another drawback of BI is its complexity in implementation of data warehouse. It
can be so complex that it can make business techniques rigid to deal with.
3. Limited use : Like many new technologies, BI was first designed with the buying capacity of
large, wealthy firms in mind. Therefore, BI systems are still not affordable for many small and
medium-sized companies.
4. Time Consuming Implementation : It typically takes about one and a half years for a data
warehousing system to be completely implemented, so it is a time-consuming process.
3) Embedded BI : Embedded BI allows the integration of BI software, or some of its features, into
another business application to enhance and extend its reporting functionality.
4) Cloud Analytics : BI applications will increasingly be offered in the cloud, and more businesses will be
shifting to this technology. As per industry predictions, within a couple of years spending on cloud-
based analytics will grow 4.5 times faster than spending on on-premises solutions.
Analysis : The organization's needs for the construction of a business intelligence system should
be thoroughly identified during the first step. This preparatory step is usually carried out through a
series of interviews with knowledge workers who execute various positions and activities inside the
company. It's important to spell out the project's overall goals and priorities, as well as the costs
and benefits of developing a business intelligence system.
Design : The second phase, which is divided into two sub-phases, aims to derive a tentative plan
for the overall architecture, taking into account any future developments as well as the system's
evolution in the mid - term. First and foremost, a review of existing information infrastructures is
required. Furthermore, in order to adequately evaluate the information requirements, the primary
decision-making processes that will be supported by the business intelligence system should be
examined. Later on, the project plan will be drawn out using traditional project management
approaches, including development phases, priorities, projected execution time frames and costs,
as well as the essential roles and resources. Analysis Identification of business needs Design
Infrastructure recognition Project macro planning Planning Detailed project requirements Definition
of the mathematical models needed Identification of the data Definition of data warehouses and
data marts Development of a prototype Implementation and control Development of data
Warehouses and data marts Development of metadata Development of ETL tools Development of
applications Release and testing
Planning : A sub-phase of the planning stage is dedicated to defining and describing the functions
of the business intelligence system in greater depth. Following that, existing data, as well as data
that could be collected from outside sources, is evaluated. This enables the business intelligence
architecture's information structures to be created, which include a central data warehouse and
potentially some satellite data marts. Simultaneously with the recognition of available data, the
mathematical models to be used should be defined, ensuring the availability of the data required to
feed each model and ensuring that the efficiency of the algorithms to be used will be adequate for
the magnitude of the problems that will result. Finally, a system prototype should be built at a low
cost and with limited capabilities to discover any discrepancies between actual needs and project
specifications ahead of time.
Implementation and control : There are five major sub-phases in the last phase.
1. The data warehouse and each individual data mart must first be built. These are the information
infrastructures that will feed the business intelligence system.
2. A metadata archive should be developed to explain the meaning of the data in the data
warehouse and the transformations made to the original data in advance.
3. Furthermore, ETL procedures are designed to extract and transform data from primary sources
before loading it into the data warehouse and data marts.
4. The next step is to create the main business intelligence applications that will enable for the
execution of the planned analyses.
5. Finally, the system should be made available for testing and use.
A business intelligence architecture is the framework for the various technologies an organization
deploys to run business intelligence and analytics applications. It includes the IT systems and
software tools that are used to collect, integrate, store and analyze BI data and then present
information on business operations and trends to corporate executives and other business users.
The underlying BI architecture is a key element in the implementation of a successful business
intelligence program that uses data analysis and reporting to help an organization track business
performance, optimize business processes, identify new revenue opportunities, improve strategic
planning and make more informed decisions overall. A BI architecture can be deployed in an on-
premises data center or the cloud. In either case, it contains a set of core components that
collectively support the different stages of the BI process, from data collection, integration, storage
and analysis to data visualization, information delivery and the use of BI data in business decision-
making.
2. Data Integration and cleaning tools : To effectively analyze the data collected for a BI program,
organizations integrate and consolidate different data sets to create unified views of them. The most
widely used data integration technology for BI applications is extract, transform and load (ETL)
software, which pulls data from source systems in batch processes. A variant of ETL is extract, load
and transform (ELT), in which data is extracted and loaded as is and transformed later for specific
BI uses. Other methods include real-time data integration, such as change data capture and
streaming integration to support real-time analytics applications, and data virtualization, which
combines data from different source systems virtually. A BI architecture typically also includes data
profiling and data cleansing tools that are used to identify and fix data quality issues. They help BI
and data management teams provide clean and consistent data that's suitable for BI uses.
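As a sketch of the batch ETL pattern described above, the toy pipeline below extracts rows from a hypothetical source, cleans and deduplicates them (the transform step), and loads them into an in-memory "warehouse". The data, field names, and cleansing rules are all illustrative assumptions, not part of any real ETL tool:

```python
# Minimal ETL sketch (illustrative in-memory data, not a real BI tool).

def extract(source_rows):
    """Extract: pull raw records from a source system (here, a list of dicts)."""
    return list(source_rows)

def transform(rows):
    """Transform: clean and standardize the data before loading.
    - drop records missing a customer id (data cleansing)
    - normalize names to title case (consistency)
    - deduplicate on customer id (consolidation)"""
    seen, clean = set(), []
    for row in rows:
        cid = row.get("customer_id")
        if cid is None or cid in seen:
            continue
        seen.add(cid)
        clean.append({"customer_id": cid,
                      "name": row["name"].strip().title(),
                      "amount": float(row["amount"])})
    return clean

def load(rows, warehouse):
    """Load: write the cleaned rows into the target store (a dict keyed by id)."""
    for row in rows:
        warehouse[row["customer_id"]] = row
    return warehouse

raw = [{"customer_id": 1, "name": " alice ", "amount": "100.5"},
       {"customer_id": None, "name": "bad row", "amount": "0"},
       {"customer_id": 1, "name": "ALICE", "amount": "100.5"},   # duplicate
       {"customer_id": 2, "name": "bob", "amount": "42"}]

warehouse = load(transform(extract(raw)), {})
print(len(warehouse))            # prints 2: clean, unique records only
print(warehouse[1]["name"])      # prints Alice
```

An ELT variant would simply call `load` before `transform`, deferring the cleaning until a specific BI use needs it.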
3. Analytics data stores : This encompasses the various repositories where BI data is stored and
managed. The primary one is a data warehouse, which usually stores structured data in a relational,
columnar or multidimensional database and makes it available for querying and analysis. An
enterprise data warehouse can also be tied to smaller data marts set up for individual departments
and business units with data that's specific to their BI needs. In addition, BI architectures often
include an operational data store that's an interim repository for data before it goes into a data
warehouse; an operational data store (ODS) can also be used to run analytical queries against
recent transaction data. Depending on the size of a BI environment, a data warehouse, data marts
and an ODS can be deployed on a single database server or separate systems. A data lake
running on a Hadoop cluster or other big data platform can also be incorporated into a BI
architecture as a repository for raw data of various types. The data can be analyzed in the data lake
itself or filtered and loaded into a data warehouse for analysis. A well-planned architecture should
specify which of the different data stores is best suited for particular BI uses.
4. BI and data visualization tools : The tools used to analyze data and present information to
business users include a suite of technologies that can be built into a BI architecture, for example,
ad hoc query, data mining and online analytical processing, or OLAP, software. In addition, the
growing adoption of self-service BI tools enables business analysts and managers to run queries
themselves instead of relying on the members of a BI team to do that for them. BI software also
includes data visualization tools that can be used to create graphical representations of data, in the
form of charts, graphs and other types of visualizations designed to illustrate trends, patterns and
outlier elements in data sets.
5. Dashboards, portals and reports : These information delivery tools give business users
visibility into the results of BI and analytics applications, with built-in data visualizations and, often,
self-service capabilities to do additional data analysis. For example, BI dashboards and online
portals can both be designed to provide real-time data access with configurable views and the
ability to drill down into data. Reports tend to present data in a more static format.
6. Other components : Other components that are increasingly part of a BI architecture
include data preparation software used to structure and organize data for analysis and a metadata
repository, a business glossary and a data catalog, which can all help users find relevant data and
understand its lineage and meaning.
Decision Support System (DSS)
It is an interactive computer-based application that combines data and mathematical models to help
decision makers solve complex problems faced in managing the public and private enterprises and
organizations. A decision support system (DSS) is a computerized program used to support
determinations, judgments, and courses of action in an organization or a business. A DSS sifts
through and analyzes massive amounts of data, compiling comprehensive information that can be
used to solve problems and in decision-making. Typical information used by a DSS includes target
or projected revenue, sales figures or past ones from different time periods, and other inventory- or
operations-related data. A decision support system gathers and analyzes data, synthesizing it to
produce comprehensive information reports. In this way, as an informational application, a DSS
differs from an ordinary operations application, whose function is just to collect data. The DSS can
either be completely computerized or powered by humans. In some cases, it may combine both.
The ideal systems analyze information and actually make decisions for the user. At the very least,
they allow human users to make more informed decisions at a quicker pace. The DSS can be
employed by operations management and other planning departments in an organization to
compile information and data and to synthesize it into actionable intelligence. In fact, these systems
are primarily used by mid- to upper-level management. For example, a DSS may be used to project
a company's revenue over the upcoming six months based on new assumptions about product
sales. Due to the large number of factors that surround projected revenue figures, this is not a
straightforward calculation that can be done manually. However, a DSS can integrate all the
multiple variables and generate an outcome and alternate outcomes, all based on the company's
past product sales data and current variables.
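A revenue projection of this kind can be sketched as a simple what-if model. All figures and the growth assumption below are hypothetical, chosen only to show how changing one variable produces an alternate outcome:

```python
# Hedged sketch of a DSS-style what-if revenue projection (figures are invented).

def project_revenue(base_monthly_sales, unit_price, monthly_growth, months=6):
    """Project monthly revenue for `months` ahead under an assumed growth rate."""
    revenue, sales = [], base_monthly_sales
    for _ in range(months):
        revenue.append(sales * unit_price)
        sales *= (1 + monthly_growth)       # compound the assumed growth
    return revenue

# What-if analysis: compare scenarios by changing one assumption at a time.
baseline   = project_revenue(1000, 20.0, 0.02)   # 2% monthly growth assumed
optimistic = project_revenue(1000, 20.0, 0.05)   # 5% monthly growth assumed

print(round(sum(baseline), 2))
print(round(sum(optimistic), 2))
```

Each run of `project_revenue` is one scenario; a real DSS would expose the assumptions as user-editable parameters and present the alternate outcomes side by side.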
Types of DSS
1. Communication-driven DSS which enables cooperation, supporting more than one person
working on a shared task; examples include integrated tools like Google Docs or Microsoft
Groove.
2. Document-driven DSS which manages, retrieves, and manipulates unstructured information in
a variety of electronic formats.
3. Knowledge-driven DSS provides specialized problem solving expertise stored as facts, rules,
procedures, or in similar structures
4. Model-driven DSS emphasizes access to and manipulation of a statistical, financial,
optimization, or simulation model. Model-driven DSS use data and parameters provided by users
to assist decision makers in analyzing a situation; they are not necessarily data intensive.
5. Data-driven DSS (or data-oriented DSS) emphasizes access to and manipulation of a time
series of internal company data and, sometimes, external data. This is the type we will
focus on. Simple file systems accessed by query and retrieval tools provide the
most elementary level of functionality. Data warehouse systems that allow the manipulation of
data by computerized tools tailored to a specific task and setting or by more general tools and
operators provide additional functionality. Data-driven DSS with online analytical processing
(OLAP) provide the highest level of functionality.
Types of Decisions
1. Structured : A structured decision is one in which the phases of the decision-making process
(intelligence, design, and choice) have standardized procedures, clear objectives, and clearly
specified input and output. There exists a procedure for arriving at the best solution.
2. Unstructured : An unstructured decision is one where not all of the decision-making phases are
structured and human intuition plays an important role.
3. Semi-structured : A semi structured decision has some, but not all, structured phases where
standardized procedures may be used in combination with individual judgment.
2. Tactical : Tactical decisions affect only parts of an enterprise and are usually restricted to a
single department. The time span is limited to a medium-term horizon, typically up to a year.
Made by middle managers.
3. Strategic : Decisions are strategic when they affect the entire organization or at least a
substantial part of it for a long period of time. They strongly influence the general objectives and
policies of an enterprise. Taken at a higher organizational level, usually by the company top
management.
The DSS database: It is a collection of data from a number of applications or groups. The DSS
database may be a small database residing on a PC or a large data warehouse.
The DSS software system : Contains the software tools that are used for analyzing the data,
including OLAP tools, data mining tools, or a collection of mathematical or analytical models.
The user interface : Controls the interaction between the users of the system and the DSS
software tools.
(i) Organizational information : You may want to use virtually any information available in the
organization for your Decision Support System. What you use, of course, depends on what you
need and whether it is available. You can design your Decision Support System to access this
information directly from your company's database and data warehouse.
(ii) External information : Some decisions require input from external sources of information.
Various branches of the federal government, and the internet, to mention just a few, can provide
additional information for use with a Decision Support System.
(iii) Personal information : You can incorporate your own insights and experience, that is, your
personal information, into your Decision Support System. You can design your Decision Support System
so that you enter this personal information only as needed, or you can keep the information in
a personal database that is accessible by the Decision Support System.
The model management component consists of both the Decision Support System models and the
Decision Support System model management system. A model is a representation of some event,
fact, or situation. As it is not always practical, or wise, to experiment with reality, people build
models and use them for experimentation. Models can take various forms.
(i) Businesses use models to represent variables and their relationships. For example, you
would use a statistical model called analysis of variance to determine whether newspaper, TV, and
billboard advertising are equally effective in increasing sales.
(ii) Decision Support Systems help in various decision making situations by utilizing models
that allow you to analyze information in many different ways. The models you use in a Decision
Support System depend on the decision you are making and, consequently, the kind of analysis
you require. For example, you would use what-if analysis to see what effect the change of one or
more variables will have on other variables, or optimization to find the most profitable solution given
operating restrictions and limited resources. Spreadsheet software such as Excel can be used as a
Decision Support System for what-if analysis.
(iii) The model management system stores and maintains the Decision Support System's
models. Its function of managing models is similar to that of a database management system. The
model management component cannot select the best model for a particular problem, as that
requires your expertise, but it can help you create and manipulate models quickly and easily.
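As one illustration of the optimization models mentioned above, the sketch below brute-forces the most profitable mix of two products sharing limited machine hours. The products, profit margins, and hours constraint are invented for illustration; a real DSS model would use a proper solver:

```python
# Illustrative optimization model for a DSS (hypothetical products/constraints):
# choose how many units of products A and B to make, limited by shared machine
# hours, to find "the most profitable solution given operating restrictions".

def best_mix(hours_available=100, hours_a=2, hours_b=5, profit_a=30, profit_b=80):
    best = (0, 0, 0)  # (profit, units_a, units_b)
    for a in range(hours_available // hours_a + 1):
        remaining = hours_available - a * hours_a
        b = remaining // hours_b            # fill leftover hours with product B
        profit = a * profit_a + b * profit_b
        if profit > best[0]:
            best = (profit, a, b)
    return best

profit, units_a, units_b = best_mix()
print(profit, units_a, units_b)   # prints 1600 0 20: B alone is most profitable
```

Changing any parameter (a what-if on the constraint or margins) immediately yields a new recommended mix, which is exactly the kind of experimentation models enable.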
(Phase 2) Analysis : Define detailed functions of DSS to be developed. Gather responses to the
questions like What should the DSS accomplish, and who will use it, when and how?
(Phase 3) Design : In this phase, entire architecture of the system is considered. The various
factors like hardware, network structure, software tools, technology, database and interaction tool
are also taken into consideration.
(Phase 4) Implementation : This phase includes the actual implementation of a DSS and its
installation. A DSS is also tested for any errors or bugs. Any changes can be backtracked using a
feedback mechanism and project management tools. An agile methodology can also be used to
speed up the implementation process.
(1) Integration : The design and development of a DSS necessitates the collaboration of a large
variety of approaches, tools, models, persons, and organizational processes.
(2) Involvement : During the design and development of a DSS, it is a common mistake to exclude
from the project team the knowledge workers who will actually use the system once it is deployed,
leaving them feeling isolated from it.
(3) Uncertainty : While the cost of implementing a DSS can be estimated fairly well, the benefit of
making more effective decisions is uncertain and hard to quantify in advance.
The telecommunications industry has expanded dramatically in the last few years with the
development of affordable mobile phone technology.
Fraud is an adaptive crime, so it needs special methods of intelligent data analysis to detect and
prevent it.
There are many different types of telecommunications fraud and these can occur at various levels.
The two most common types of fraud are subscription fraud and superimposed fraud.
In subscription fraud, fraudsters obtain an account without any intention to pay the bill. The fraud
thus occurs at the level of a phone number: all transactions from this number will be fraudulent. In
such cases abnormal usage occurs throughout the active period of the account. The account is
usually used for call selling or intensive self-usage.
In superimposed fraud, fraudsters take over a legitimate account. In such cases the abnormal
usage is superimposed upon the normal usage of the legitimate customers. There are several ways
to carry out superimposed fraud, including mobile phone cloning and obtaining calling card
authorization details. Examples of such cases include cellular cloning, calling card theft and cellular
handset theft. Superimposed fraud will generally occur at the level of individual calls; the fraudulent
calls will be mixed in with the justified ones.
Other types of telecommunications fraud include ghosting (technology that tricks the network in
order to obtain free calls) and insider fraud, where telecommunication company employees sell
information to criminals that can be exploited for fraudulent gain.
These methods exist in the areas of Knowledge Discovery in Databases (KDD), Data Mining,
Machine Learning and Statistics. They offer applicable and successful solutions in different areas of
fraud crimes.
At a low level, simple rule-based detection systems use rules such as the apparent use of the same
phone in two very distant geographical locations in quick succession, calls which appear to overlap
in time and very high value and very long calls.
At a higher level, statistical summaries of call distributions (often called profiles or signature at the
user level) are compared with thresholds determined either by experts or by application of
supervised learning methods to known fraud/non-fraud cases.
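The low-level rules above can be sketched roughly as follows. The thresholds, call records, and tuple format are illustrative assumptions, not values from any real fraud system:

```python
# Sketch of simple rule-based fraud checks on one phone's call records.
# Rules (from the text): very long calls, calls that overlap in time, and
# calls from distant locations in quick succession.

def flag_calls(calls, max_duration=4 * 3600, min_travel_gap=1800):
    """Each call: (start_sec, end_sec, city), sorted by start time.
    Returns the indices of calls flagged as suspicious."""
    flagged = set()
    for i, (start, end, city) in enumerate(calls):
        if end - start > max_duration:              # very long call
            flagged.add(i)
        if i > 0:
            prev_start, prev_end, prev_city = calls[i - 1]
            if start < prev_end:                    # calls overlap in time
                flagged.update({i - 1, i})
            elif city != prev_city and start - prev_end < min_travel_gap:
                flagged.update({i - 1, i})          # distant places, quick succession
    return sorted(flagged)

calls = [(0, 600, "Mumbai"),
         (300, 900, "Delhi"),        # overlaps the first call, different city
         (1000, 20000, "Mumbai")]    # over 4 hours long, soon after Delhi call
print(flag_calls(calls))             # prints [0, 1, 2]
```

Higher-level detection would replace these fixed thresholds with per-user profiles or thresholds learned from labeled fraud/non-fraud cases, as described above.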
Some forensic accountants specialize in forensic analytics which is the procurement and analysis
of electronic data to reconstruct, detect, and otherwise support a claim of financial fraud. The main
steps in forensic analytics are data collection, data preparation, data analysis, and reporting.
For example, forensic analytics may be used to review an employee's purchasing card activity to
assess whether any of the purchases were diverted or divertible for personal use.
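A minimal sketch of such a purchasing-card review follows, walking the collection, preparation, analysis, and reporting steps. The policy limit, blocked merchant categories, and transactions are all invented for illustration:

```python
# Hypothetical purchasing-card review (forensic analytics sketch).

BLOCKED = {"jewelry", "casino"}   # assumed categories suggesting personal use
LIMIT = 500.0                     # assumed single-purchase policy limit

def review(transactions):
    # data preparation: normalize category text;
    # data analysis: apply the policy rules to each transaction
    findings = []
    for t in transactions:
        if t["category"].lower() in BLOCKED or t["amount"] > LIMIT:
            findings.append(t["id"])
    # reporting: return the transaction ids an investigator should examine
    return findings

txns = [{"id": "T1", "category": "Office Supplies", "amount": 80.0},
        {"id": "T2", "category": "Jewelry", "amount": 120.0},
        {"id": "T3", "category": "Travel", "amount": 900.0}]
print(review(txns))   # prints ['T2', 'T3']
```

A real review would also apply statistical tests over the full transaction history rather than per-row rules alone.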
Techniques used for fraud detection fall into two primary classes: Statistical techniques and
Artificial intelligence.
1. Data pre-processing techniques for detection, validation, error correction, and filling up of
missing or incorrect data.
5. Clustering and classification to find patterns and association among groups of data.
1. Data mining to classify, cluster, and segment the data and automatically find associations
and rules in the data that may signify interesting patterns, including those related to fraud.
2. Expert systems to encode expertise for detecting fraud in the form of rules.
Recommendation System
A recommendation system is a business intelligence system used to deliver knowledge to the active
user for better decision making. Recommendation systems apply data mining techniques to the
problem of making personalized recommendations for information. The growth in the amount of
information and the number of users in recent years poses challenges for recommender systems.
Collaborative, content-based, demographic and knowledge-based are four different types of
recommendation systems.
This system works in three phases namely preprocessing, modeling and obtaining intelligence.
First, the users are filtered based on the user's profile and knowledge, such as needs and
preferences defined in the form of rules. This involves feature selection and data reduction on the
dataset.
Second, these filtered users are then clustered using k-means clustering algorithm as a modelling
phase.
Third, it identifies the nearest neighbours of the active user and generates recommendations by
finding the most frequent items from the identified cluster of users. This algorithm can be
experimentally tested with an e-commerce application for better decision making by recommending
the top n products to active users.
1. Identifying the dataset : To maintain the data systematically and efficiently, database and data
warehouse technologies are used. The data warehouse not only deals with the business activities
but also contains the information about the user that deals with the business.
2. Choose the columns consideration /features : Once the dataset D has been identified, the next
step of the system is to choose the consideration column or filtering columns/features. That is, from
the whole dataset, the columns/subset of features to be considered for our work are chosen. This
includes the elimination of irrelevant columns in the dataset. An irrelevant column/feature may
be one which provides little information about the dataset.
3. Filtering objects by defining rules: From the consideration dataset, the objects can be grouped
under stated conditions that are defined in terms of rules. That is, for each column that is
considered, specify the rule to extract the necessary domain from the original dataset. This rule is
considered to be the threshold value T. The domain can be chosen by identifying the frequent items
from the dataset.
4. Identifying frequent items : The frequent items can be identified by analyzing the repeated value
in the consideration column satisfying the support count and the confidence threshold. This will
create a new dataset D'.
5. Cluster objects using k-means clustering : Upon forming the new dataset D', the objects in D' are
clustered based on similarity of objects using k-means clustering. k-means clustering is a method of
classifying or grouping objects into k clusters (where k is the number of clusters). The clustering is
performed by minimizing the sum of squared distances between the objects and the corresponding
centroid. The result consists of cluster of objects with their labels/classes.
6. Find nearest neighbour of active user : In order to find the nearest neighbours of the active user,
the similarity between the active user and each cluster centroid is calculated based on a distance
measure. Then the cluster with the highest similarity is selected.
7. Generate recommendation dataset for active user : Recommendations are generated for the
active user from the most frequent items purchased by users in the selected cluster, subject to the
specified threshold T. This gives intelligence to users and the business for better decision making.
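Steps 5-7 above can be sketched compactly. The purchase histories, the choice of k, and the binary item vectors are illustrative assumptions; a production system would use a library k-means and real transaction data:

```python
# Sketch of steps 5-7: cluster users with a minimal k-means, find the active
# user's nearest cluster, recommend that cluster's most frequent items.
from collections import Counter

users = {  # hypothetical purchase histories (steps 1-4 assumed already done)
    "u1": {"milk", "bread", "eggs"},
    "u2": {"milk", "bread"},
    "u3": {"laptop", "mouse"},
    "u4": {"laptop", "mouse", "keyboard"},
}
items = sorted(set().union(*users.values()))

def vec(basket):
    """Binary item vector for one user's basket."""
    return tuple(1 if it in basket else 0 for it in items)

def dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Minimal k-means (step 5): naive initialization, fixed iteration count."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            c = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[c].append(idx)
        centroids = [tuple(sum(points[i][d] for i in g) / len(g)
                           for d in range(len(points[0]))) if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids, groups

names = list(users)
points = [vec(users[n]) for n in names]
centroids, groups = kmeans(points, k=2)

# Step 6: nearest cluster for the active user; step 7: recommend the cluster's
# most frequent items, excluding what the active user already bought.
active = {"milk"}
best = min(range(2), key=lambda j: dist(vec(active), centroids[j]))
counts = Counter(it for i in groups[best] for it in users[names[i]])
recs = [it for it, _ in counts.most_common() if it not in active]
print(recs[:2])   # prints ['bread', 'eggs']
```

The frequency threshold T from step 4 would simply filter `counts` before ranking; it is omitted here to keep the sketch short.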
Clickstream Mining
Clickstream mining analyzes the record of a user's activity on the internet: every website and every
page of every website that the user visits, how long the user stays on a page or site, the order in
which the pages were visited, any newsgroups the user participates in, and even the email
addresses of mail the user sends and receives.
Both ISPs and individual websites are capable of tracking a user's clickstream. Clickstream data is
becoming increasingly valuable to internet marketers and advertisers. Be aware of the large amount
of data a clickstream generates.
These 'footprints' that visitors leave at a site have grown wildly; large businesses may gather a
terabyte of them every day. But the ability to analyze such data hasn't kept pace with the ability to capture it.
The next frontier of web data analysis is better integration of clickstream data with other customer
information such as purchase history and even demographic profiles, to form what's often called a
"360-degree view" of a site visitor. .
Clickstream analysis can be seen as a four-stage process of collection, storage, analysis, and
reporting. The first two concentrate on gathering and formatting information, and the latter two on
making sense of it.
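The four stages can be sketched on a toy log. The log format, pages, and timestamps are illustrative assumptions:

```python
# Tiny sketch of the collection -> storage -> analysis -> reporting stages
# for clickstream data (log format and timestamps are invented).

raw_log = [          # collection: one (user, page, unix_time) event per click
    ("u1", "/home", 100), ("u1", "/products", 160), ("u1", "/checkout", 400),
    ("u2", "/home", 110), ("u2", "/home", 115),
]

# storage: group events per user, ordered by time
sessions = {}
for user, page, ts in sorted(raw_log, key=lambda e: e[2]):
    sessions.setdefault(user, []).append((page, ts))

# analysis: time spent on each page (gap to the next click in the session)
def time_on_pages(session):
    return [(page, nxt_ts - ts)
            for (page, ts), (_, nxt_ts) in zip(session, session[1:])]

# reporting: simple per-user summary
for user, session in sorted(sessions.items()):
    print(user, time_on_pages(session))
```

Integrating these per-user paths with purchase history and demographic data is what builds the "360-degree view" described above.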
(b) E-business feedback : The e-business analysis cycle is more sophisticated. This process
combines website activity with data from other sources, such as visitor profile information, sales
databases, and campaigns that include links to the website. It provides higher-level information,
more focused answers and information that can be used to enhance ecommerce activities across
the business as well as improving the website.
Market Segmentation
Market segmentation is a marketing concept which divides the complete market into smaller
subsets of consumers with similar tastes, demands and preferences. A market segment is a small
unit within a large market comprising like-minded individuals. One market segment is totally distinct
from another. A market segment comprises individuals who think along the same lines and have
similar interests. Individuals from the same segment respond in a similar way to fluctuations in the
market.
(b) Behaviouralistic Segmentation : The loyalties of customers towards a particular brand help
marketers classify them into smaller groups, each comprising individuals loyal to a particular
brand.
(c) Geographic Segmentation : Geographic segmentation refers to the classification of the market
into various geographical areas. A marketer can't have similar strategies for individuals living in
different places. Nestle promotes Nescafe all through the year in cold states of the country as
compared to places which have well-defined summer and winter season. McDonald's in India does
not sell beef products as it is strictly against the religious beliefs of the countrymen, whereas
McDonald's in USA freely sells and promotes beef products.
Retail Industry
Retail organizations thrive by providing quality products to customers in a convenient, timely, and
cost effective manner. Understanding emerging customer shopping patterns can assist retailers in
organizing their products, inventory, store layout, and web presence in order to delight their
customers, thereby increasing revenue and profits. Retailers generate a lot of transaction and
logistics data that can be used to solve problems.
Optimize inventory levels at different locations : Retailers must carefully manage their
inventories. Carrying too much inventory incurs carrying costs, whereas carrying too little inventory
can result in stockouts and missed sales opportunities. Dynamic sales trend prediction can assist
retailers in moving inventory to where it is most in demand. Online retailers can provide their
suppliers with real-time information about their items' sales, allowing the suppliers to deliver their
product to the right locations and reduce stock-outs.
Improve store layout and sales promotions : Using a market basket analysis, you can create
predictive models of which products frequently sell together. This understanding of product affinities
can assist retailers in co-locating those products. Alternatively, those affinity products could be
placed further apart in order to force the customer to walk the length and breadth of the store,
exposing them to other products. Promotional discounted product bundles can be created to
promote a non-selling item and also a group of products that sell well together.
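The product-affinity counting behind a market basket analysis can be sketched in a few lines; the
basket data below is purely illustrative:

```python
from collections import Counter
from itertools import combinations

def pair_counts(transactions):
    """Count how often each unordered pair of products is bought together."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each unordered pair a single canonical form
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
top = pair_counts(baskets).most_common(1)[0]
print(top)  # (('bread', 'butter'), 3)
```

High-count pairs are candidates for co-location (or deliberate separation) and for promotional
bundles; a full analysis would also compute support, confidence, and lift for each pair.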
Optimize logistics for seasonal effects : Seasonal products provide extremely profitable short-
term sales opportunities, but they also pose the risk of unsold inventories at the end of the season.
Understanding which products are in season in which markets can assist retailers in dynamically
managing prices to ensure inventory is sold during the season. If it is raining in a specific area,
inventory of umbrellas and ponchos could be quickly moved there from non-rainy areas to help
increase sales.
Reduce losses due to limited shelf life : Perishable goods present difficulties in disposing of
inventory on time. Tracking sales trends allows perishable products that are at risk of not selling
before their sell-by date to be appropriately discounted and promoted.
Telecommunication Industry
BI in telecom can help with churn management, marketing/customer profiling, network failure, and
fraud detection.
(1) Management of churn : Telecom customers have shown a tendency to switch providers in
search of better deals. Telecom companies typically respond with incentives and discounts in order
to retain customers. However, they must determine which customers are truly at
risk of switching and which are simply bargaining for a better deal. The level of risk should be
considered when determining the type of deals and discounts to be offered. Every month, millions
of such customer calls are made. Telecom companies must provide a consistent and data-driven
method for predicting the risk of customer switching and then making an operational decision in real
time while the customer call is in progress. A decision-tree or a neural network based system can
be used to guide the customer-service call operator to make the right decisions for the company, in
a consistent manner.
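A real decision tree would be trained on historical churn data; as a minimal sketch, the function
below hand-codes a decision-tree-style policy, with feature names and thresholds invented for
illustration:

```python
def churn_risk(months_on_plan, complaints_last_90d, competitor_offer_mentioned):
    """Toy decision-tree-style rules for scoring churn risk during a call.

    The branching mirrors how a trained tree would be applied in real time,
    but these thresholds are illustrative, not derived from real data.
    """
    if competitor_offer_mentioned:
        # Callers citing competitor deals are bargaining or genuinely leaving
        return "high" if complaints_last_90d >= 2 else "medium"
    if months_on_plan < 6:
        # New customers have weaker loyalty
        return "medium"
    return "low"

print(churn_risk(24, 3, True))   # high
print(churn_risk(24, 0, False))  # low
```

The risk label can then drive which deal, if any, the call operator is allowed to offer, giving the
consistent data-driven behaviour described above.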
(2) Marketing and product creation : In addition to customer data, telecom companies also store
call detail records (CDRs), which precisely describe the calling behavior of each customer. This
unique data can be used to profile customers and then to create new product/service bundles for
marketing purposes. The American telecom company MCI created a
program called Friends & Family that allowed calls with one's friends and family on that network to
be totally free and thus, effectively locked many people into their network.
(3) Network failure management : The failure of telecommunications networks due to technical
failures or malicious attacks can have disastrous consequences for people, businesses, and
society. Some equipment in telecom infrastructure will most likely fail with a certain mean time
between failures. Modeling the failure pattern of various network components can aid in preventive
maintenance and capacity planning.
(4) Fraud control : There are numerous types of fraud in consumer transactions. When a customer
opens an account with the intent of never paying for the services, this is referred to as subscription
fraud. Superimposition fraud is defined as unauthorised activity by someone other than the
legitimate account holder. Decision rules can be developed to analyze each CDR in real time to
identify chances of fraud and take effective action.
Banking
Banks make loans and offer credit cards to millions of customers. They are most concerned with
improving loan quality and reducing bad debts. They also want to keep more of their current
customers and sell them more services.
(1) Automate the loan application process : Decision models that predict the likelihood of a
loan's success can be generated from historical data. These models can be integrated into
business processes to automate the loan application process.
(2) Detect fraudulent transactions : Every day, billions of financial transactions take place around
the world. Exception-seeking models detect fraudulent transaction patterns. For example, if money
is transferred for the first time to an unrelated account, it could be a fraudulent transaction.
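The first-time-transfer rule described above can be sketched as follows (account IDs and amounts
are invented for illustration):

```python
def flag_first_time_transfers(transfers):
    """Flag transfers to a counterparty the account has never paid before.

    An exception-seeking rule like this would be one of many signals in a
    real fraud engine, combined with amount and velocity checks.
    """
    seen = {}       # account -> set of counterparties already paid
    flagged = []
    for account, counterparty, amount in transfers:
        known = seen.setdefault(account, set())
        if counterparty not in known:
            flagged.append((account, counterparty, amount))
            known.add(counterparty)
    return flagged

transfer_log = [
    ("A1", "B7", 100.0),   # first transfer A1 -> B7: flagged
    ("A1", "B7", 250.0),   # repeat counterparty: not flagged
    ("A1", "C2", 900.0),   # new counterparty: flagged
]
print(flag_first_time_transfers(transfer_log))
```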
(3) Increase customer value (cross-selling, upselling) : Selling more products and services to
existing customers is frequently the simplest way to increase revenue. A checking account
customer in good standing may be offered better terms on home, auto, or educational loans than
other customers, thus, increasing the value generated by that customer.
(4) Optimize cash reserves through forecasting : Banks must maintain a certain level of liquidity
in order to meet the needs of depositors who may wish to withdraw funds. Banks can forecast how
much to keep and invest the rest to earn interest by using historical data and trend analysis.
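As a minimal sketch of such forecasting, a moving average of recent daily withdrawals, padded with
an assumed safety buffer, might look like this (real banks would use far richer trend and
seasonality models):

```python
def forecast_withdrawals(daily_withdrawals, window=3):
    """Naive moving-average forecast of tomorrow's withdrawal demand."""
    recent = daily_withdrawals[-window:]
    return sum(recent) / len(recent)

history = [120.0, 130.0, 110.0, 140.0, 150.0]   # illustrative daily totals
reserve = forecast_withdrawals(history) * 1.2    # 20% safety buffer (assumed)
print(round(reserve, 1))  # 160.0
```

Funds above the forecast reserve level can then be invested to earn interest.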
Finance
Stock brokerages make extensive use of Business Intelligence (BI) systems. Access to accurate
and timely information can mean the difference between making or losing a fortune.
Predict changes in bond and stock prices : Forecasting the price of stocks and bonds is a
favorite pastime of financial experts as well as lay people. Stock transaction data from the past,
along with other variables, can be used to predict future price patterns. This can help traders
develop long-term trading strategies.
Assess the impact of events on market movements : Decision trees can be used to create
decision models that assess the impact of events on changes in market volume and prices.
Monetary policy changes (such as a change in the Federal Reserve interest rate) or geopolitical
changes (such as a war in a particular region of the world) can be factored into the predictive model
to help take action with greater confidence and less risk.
Identify and prevent fraudulent activities in trading : There have unfortunately been many
cases of insider trading, leading to many prominent financial industry stalwarts going to jail. Fraud
detection models can identify and flag fraudulent activity patterns.
Customer Relationship Management (CRM)
(1) Maximize the return on marketing campaigns : Data-driven analysis of customer pain points
can ensure that marketing messages are fine-tuned to better resonate with customers.
(2) Improve customer retention (churn analysis) : Winning new customers is more difficult and
expensive than retaining existing customers. Scoring each customer based on their likelihood to
quit can assist businesses in developing effective interventions, such as discounts or free services,
to retain profitable customers in a cost-effective manner.
(3) Maximize customer value (cross-selling, upselling) : Every interaction with the customer
should be viewed as an opportunity to assess their current needs. Offering new products and
solutions to customers based on their presumed needs can help increase revenue per customer.
Even a customer complaint can be viewed as a chance to impress the customer. Using the
knowledge of the customer's history and value, the business can choose to sell a premium service
to the customer.
(4) Identify and delight highly valued customers : The best customers can be identified by
segmenting the customers. They can be proactively contacted and delighted with enhanced
attention and service. Loyalty programs can be more effectively managed.
(5) Manage brand image : A company can set up a listening post to monitor social media
conversations about itself. It can then perform sentiment analysis on the text in order to understand
the nature of the comments and respond appropriately to prospects and customers.
Fake News Detection
The major objective of watching or reading the news is to stay informed about what is happening
around us. In the modern era there are several social media platforms, such as Facebook and
Twitter, on which millions of users rely for day-to-day news. Then came fake news, which spreads
among people as fast as real news can. Fake news is a piece of fabricated or falsified information,
often aimed at misleading people or at damaging the reputation of a person or an entity.
As humans, when we read an article, we can understand its context by interpreting its words. Given
today's volume of news, computers can be taught to read and to distinguish real from fake news
using NLP techniques. All that is needed are appropriate machine learning algorithms and a
dataset.
(1) Data Collection : The process of gathering information from all possible sources regarding a
particular research problem. This information is stored in a file as the dataset and is subject to
various techniques such as testing, evaluation, etc.
(2) Data Cleaning : Identification and removal of any errors in the gathered information. This
process is carried out mainly to improve the dataset's quality, make it reliable, and support
accurate decision-making.
(3) Data Exploration Analysis : Various visualization techniques are carried out here to
understand the dataset in terms of its characteristics namely, size, quantity, etc. This process is
essential to better understand the nature of the dataset and get insights faster.
(4) Data Modelling : The process of training one or more ML algorithms on the dataset, tuning them
to the business need, and using them to predict or validate accordingly.
(5) Data Validation : The method of tuning the model's hyperparameters before final testing. A
held-out validation set provides an evaluation of the model fit obtained on the training dataset.
(6) Deployment : Integrating an ML model into an existing environment to make more practical
business decisions based on the dataset.
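Steps (4) and (5) assume the labelled dataset has been split into training, validation, and test
portions; a simple random split, with illustrative proportions and a toy dataset, can be sketched as:

```python
import random

def split_dataset(records, train=0.7, val=0.15, seed=42):
    """Shuffle labelled records and split them into train/validation/test sets.

    The 70/15/15 proportions are a common convention, assumed here for
    illustration; the seed makes the split reproducible.
    """
    rng = random.Random(seed)
    data = records[:]          # copy so the caller's list is untouched
    rng.shuffle(data)
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

records = [(f"article {i}", i % 2) for i in range(20)]  # (text, label) pairs
tr, va, te = split_dataset(records)
print(len(tr), len(va), len(te))  # 14 3 3
```

The model is fitted on the training set, its hyperparameters tuned on the validation set, and its
final quality measured once on the untouched test set.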
Cyberbullying
The use of social networking sites (SNS) has increased rapidly in recent years, providing a platform
for people all over the world to connect and share their interests. However, social networking sites
also provide opportunities for cyberbullying. Cyberbullying is harassing or insulting a person by
sending messages of a hurtful or threatening nature using electronic communication, and it poses a
significant threat to the physical and mental health of victims. Detection of
cyberbullying and the provision of subsequent preventive measures are the main courses of action
to combat cyberbullying. The detection method can identify the presence of cyberbullying terms
and classify cyberbullying activities in social networks, such as Flaming, Harassment, Racism and
Terrorism, using fuzzy logic and genetic algorithms.
The input dataset contains text, images, audio and video collected from social networks. The input
is sent to data pre-processing, which improves its quality. Social network datasets contain a great
deal of noisy and unwanted data, so pre-processing is applied to improve the accuracy of the input
data. This includes removing stop words and symbols. Stop words are words like "a", "as", "have",
"is", "the", "or", etc.; they consume memory space and slow down processing. After
pre-processing, the data is sent to the cyberbully detection module, which detects cyberbullying
content. The cyberbully detection techniques are explained below:
(a) Image Cyberbully detection : Cyberbullying using images is now widespread and has a large
effect on society, as such images spread through social networks very rapidly. Anti-social elements
can create further distress by spreading communal hatred through images. Cyberbully images can
be detected using computer vision techniques, including image similarity and Optical Character
Recognition (OCR).
(b) Video Cyberbully detection : Video cyberbullying also causes emotional and psychological
harm. Cyberbully videos are detected using shot boundary detection. Here, the video is broken into
scenes, shots and frames; a shot is a sequence of frames captured by a single camera in a single
continuous action. The content of the video is then analysed using shot boundary detection
algorithms such as pixel-based, histogram-based, and block-based shot boundary detection.
(c) Audio Cyberbully detection : Audio is another area where a great deal of cyberbullying occurs.
Here, the audio is converted into text using the CMU Sphinx tool, and cyberbullying in the
converted text is detected using a trained dataset.
Finally, the cyberbully content is classified into Physical bullying, Social bullying and Verbal bullying
using a Naïve Bayes classifier. The Naïve Bayes classifier is based on Bayes' theorem with the
assumption of independence between predictors.
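A minimal from-scratch sketch of such a Naïve Bayes text classifier, with Laplace smoothing and
invented training examples and labels, might look like this:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal multinomial Naive Bayes, illustrating the assumption of
    independence between word features."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.class_counts = Counter(labels)      # label -> document count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        total = sum(self.class_counts.values())
        for label in self.class_counts:
            # log prior + sum of per-word log likelihoods (Laplace smoothed)
            lp = math.log(self.class_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in doc.split():
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

docs = ["you are stupid", "nobody likes you", "great game today", "nice work"]
labels = ["bully", "bully", "clean", "clean"]
model = TinyNaiveBayes().fit(docs, labels)
print(model.predict("you are stupid"))  # bully
```

A production classifier would be trained on a large labelled corpus and would output one of the
three bullying categories rather than this toy two-class distinction.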
(a) Social bullying : Social bullying involves spreading rumours about a person or purposely
embarrassing a person in public with the intention of hurting his or her feelings. Another form of
bullying in this category involves encouraging others to avoid a certain person or group. Social
bullying affects a person and their ability to relate to their environment as well as other people in a
social setting. Not only does it have a direct impact on a person's mental and emotional state, it can
also adversely affect their reputation in both personal and professional circles.
(b) Verbal bullying : Verbal bullying is one of the most common forms of bullying. Criticizing and
making fun of others are all forms of verbal bullying; the bully's main weapon is their voice. Verbal
bullying consists of negative statements made to or about the victim that belittle the target or treat
them as worthless. If the abuser does not promptly apologize and withdraw such statements, the
relationship is considered verbally abusive. Verbal bullying can create psychological disorders that
plague victims into and throughout adulthood.
(c) Physical bullying : Physical bullying harms a person's body or damages their personal
possessions. Widespread forms of physical bullying include stealing, shoving, hitting, pushing,
slapping, spitting and destroying property. Physical bullying is rarely the first form of bullying a
victim experiences; bullying frequently begins in another form and progresses to physical violence.
In physical bullying, the bully's foremost weapon is their body.
Sentiment Analysis
Sentiment analysis in business, also known as opinion mining, is the process of identifying and
cataloguing a piece of text according to the tone it conveys. The text can be tweets, comments,
feedback, or even random rants, each carrying a positive, negative or neutral sentiment. Automated
sentiment analysis is valuable for every business. Sentiment analysis involves steps such as
preprocessing, feature extraction and sentiment classification.
Preprocessing : We are interested in features of an object. For this, input data are preprocessed
using the following steps:
(a) Tokenization : White spaces, special characters and symbols are removed; the remaining words
are called tokens.
(b) Removal of Stop Words : Articles and common words like "a", "an", "the", "this", "that", "am",
"is", etc. are removed.
(c) Stemming : Reduces tokens or words to their root form.
(d) Case Normalization : Converts the whole document to either lowercase or uppercase letters.
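The four preprocessing steps above can be sketched in plain Python; the stop-word list and the
suffix-stripping "stemmer" below are deliberately crude stand-ins for real tools such as a Porter
stemmer:

```python
import re

STOP_WORDS = {"a", "an", "the", "this", "that", "am", "is", "are", "or", "as", "have"}
SUFFIXES = ("ing", "ly", "ed", "s")   # crude suffix stripping, not a real stemmer

def preprocess(text):
    """Apply the four steps above: case-normalize, tokenize, drop stop words, stem."""
    text = text.lower()                                  # (d) case normalization
    tokens = re.findall(r"[a-z]+", text)                 # (a) tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # (b) stop-word removal
    stemmed = []
    for t in tokens:                                     # (c) crude stemming
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) > len(suf) + 2:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("The batteries are draining quickly!"))
```

Note how the toy stemmer leaves "batterie" rather than "battery"; that kind of over-stripping is
why real pipelines use dedicated stemmers or lemmatizers.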
Feature Extraction : Features for sentiment classification are handled in four stages:
(a) Feature Types : Deals with the types of features used for sentiment analysis, viz. term
frequency, term co-occurrence, sentiment words, negation, and syntactic dependency.
(b) Feature Selection : Deals with finding good features for sentiment classification, viz.
information gain, odds ratio, document frequency, and mutual information.
(c) Feature Weighting Mechanism : Calculates weights for ranking the features using term
frequency and inverse document frequency (TF-IDF).
(d) Feature Reduction : The dimensionality of features is reduced for better performance.
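The TF-IDF weighting mechanism in (c) can be sketched as follows, using one common form of the
formula (tf normalized by document length, idf as log of N over document frequency) on a toy
corpus:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term by term frequency x inverse document frequency."""
    n = len(docs)
    df = Counter()                         # term -> number of docs containing it
    tokenized = [doc.split() for doc in docs]
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({w: (tf[w] / len(tokens)) * math.log(n / df[w])
                        for w in tf})
    return weights

docs = ["good camera good price", "bad battery", "good battery life"]
w = tf_idf(docs)
print(round(w[0]["good"], 3))  # 0.203
```

Terms that appear in every document get weight zero, while terms frequent in one document but
rare overall rank highest, which is exactly what feature ranking needs.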
Sentiment Classification : A posted opinion is classified as positive, negative or neutral. The three
levels of sentiment analysis are as follows.
(a) Document level : The whole document is considered for classifying the opinion as positive,
negative or neutral. The opinion about an object may be expressed without using any opinion word.
In this case natural language processing plays a vital role to mine the correct sentiments. The main
challenge is to extract subjective text for inferring the overall sentiment of the whole document.
(b) Sentence level : The documents in the collection are divided into sentences, and each
sentence is then classified as having positive, negative or neutral polarity. A document is a combination
of subjective and objective sentences. First the subjective sentences are determined and then the
opinion in those subjective sentences will be calculated. The sentence level polarity identification
can be done in either of the two ways: a grammatical syntactic approach or a semantic approach.
The grammatical syntactic approach takes grammatical structure of the sentence into account by
considering parts of speech tags.
(c) Word or phrase level : When a product feature is considered for sentiment analysis, it is
word- or phrase-level sentiment analysis. It uses adjectives and adverbs as features. Word-level
sentiment can be attained by a 'Dictionary Based Approach' or a 'Corpus Based Approach'.
(i) Dictionary based approach : Sometimes the opinion is not expressed by a popular
keyword; jargon may be used to express the sentiment. Here, WordNet, which contains
synonyms and antonyms, is used to find the polarity of a word.
(ii) Corpus based approach : In this method, the co-occurrence of a word with other words
whose polarity is known is taken into account. Adjectives joined by 'and' share the same
polarity, while adjectives joined by 'but' have opposite polarity.
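The 'and'/'but' conjunction rule can be sketched as a small polarity-propagation routine; the
adjective pairs and the seed word below are invented for illustration:

```python
def propagate_polarity(pairs, seed_polarity):
    """Corpus-based rule: adjectives joined by 'and' share polarity,
    those joined by 'but' take the opposite polarity (+1 / -1)."""
    polarity = dict(seed_polarity)
    changed = True
    while changed:                      # keep propagating until stable
        changed = False
        for a, conj, b in pairs:
            for known, unknown in ((a, b), (b, a)):
                if known in polarity and unknown not in polarity:
                    flip = 1 if conj == "and" else -1
                    polarity[unknown] = flip * polarity[known]
                    changed = True
    return polarity

# Conjunctions observed in a corpus, plus one seed word of known polarity.
pairs = [("reliable", "and", "fast"), ("fast", "but", "noisy")]
print(propagate_polarity(pairs, {"reliable": +1}))
```

Starting from a few seed words, repeated passes over corpus conjunctions label a growing set of
adjectives, which is the essence of the corpus-based approach.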
Finally, the sentiments are classified using machine learning approaches such as SVM, Naïve
Bayes, Decision Tree and Rule Based Classifiers, and lexicon-based approaches such as the
dictionary-based and corpus-based approaches.