
BA - Unit-1

The document outlines the roles and skills of business analysts and business analytics professionals, emphasizing their focus on process optimization and data analysis, respectively. It details various big data technologies, including Hadoop, MongoDB, and Apache Spark, and discusses the importance of data quality and privacy in business analytics. Additionally, it covers the infrastructure needed for business analytics, including database management systems, data warehouses, and the significance of data governance and outsourcing in implementing analytics programs.

Uploaded by

ramkumar632004

Business Analytics

Personnel
Business Analyst

• A business analyst has less to do with data and instead focuses on analyzing and optimizing the processes and functions that make up a business.
• They analyze
• what a business needs to function optimally
• what it needs to improve
• then work to implement solutions.
• This may include
• improving processes,
• changing policies or
• introducing new technology.
Business Analyst

• A business analyst may work with both the client, who has a particular requirement in their business, and the development team, which builds a product or delivers a service to fulfill that requirement.
• Coordinate between these two parties to make sure the solutions
created by development meet the client’s requirements
• Adapt solutions as these needs change.
• Also act as a technical project manager and collaborate with
stakeholders to design and implement the service or product and
ensure it’s solving the client’s problem.
Business Analytics Professional

• Business analytics focuses on data, statistical analysis and reporting to help investigate and analyze business performance, provide insights, and drive recommendations to improve performance.

• They may also work with internal or external clients, but their focus is to
improve the product, marketing or customer experience by using
insights from data, rather than analyzing processes and functions.
Core Business Analytics Skills

• A good communicator

• Inquisitive

• A problem solver

• A critical thinker

• A visualizer

• Both detail-oriented and a big picture thinker


Technical Skills for Business Analytics

• SQL
• SQL is the coding language of databases and one of the most important tools in an analytics
professional’s toolkit.
• Professionals write SQL queries to extract and analyze data from the transactions database
and develop visualizations to present to stakeholders.
• Statistical languages
• The two most common programming languages in analytics
• R, for statistical analysis
• Python, for general programming
• Statistical software
• Apart from the above languages, statistical software such as SPSS, SAS, Sage, Mathematica,
and even Excel can be used when managing and analyzing data.
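To make the SQL bullet above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are made up for illustration; a real transactions database would have its own schema.

```python
import sqlite3

def revenue_by_region(rows):
    """Load hypothetical transactions into an in-memory database and
    total the amount per region with a SQL GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    query = (
        "SELECT region, SUM(amount) AS total "
        "FROM transactions GROUP BY region ORDER BY total DESC"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical sample data: (region, amount)
sample_rows = [("North", 120.0), ("South", 80.0), ("North", 30.0)]
totals = revenue_by_region(sample_rows)
```

The returned list of (region, total) pairs is the kind of result an analyst would then chart for stakeholders.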
Big Data Technology
• A software tool to analyze, process and interpret the massive amount of structured
and unstructured data that could not be processed manually or traditionally is
called Big Data Technology.
• This helps in forming conclusions and forecasts about the future so that many
risks could be avoided.
• The types of big data technologies are operational and analytical.
• Operational technology deals with daily activities such as online transactions, social media
interactions and so on.
• Analytical technology deals with the stock market, weather forecast, scientific computations
and so on. Big data technologies are found in data storage and mining, visualization and
analytics.
Types of Big Data Technologies

They can mainly be classified into 4 domains.

1.Data storage

2.Analytics

3.Data mining

4.Visualization
1. Hadoop

• Hadoop is the first technology that comes into play.
• It is based on the MapReduce architecture and is used mainly for batch processing jobs.
• Map: as the name suggests, it maps the input data into key-value pairs.
• Reduce: combines the values that share a key, following the reducer logic written by the developer.
• Designed to store and process data in a distributed data processing environment.
• Developed by the Apache Software Foundation and written in Java; it was first released in 2006, with version 1.0 following in 2011.
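The map and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration of the idea on a single machine, not Hadoop's API; the function names are chosen for the example, and the word-count task is an assumed use case.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) key-value pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key (here, by summing them)."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big analytics"])
```

In a real cluster the map and reduce calls run in parallel on different machines; the sketch keeps only the data flow.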
2. MongoDB

•What is MongoDB?

•MongoDB is a database used to store and manage information.

•It is a type of NoSQL database, meaning it does not store data in traditional rows and columns like a
relational database (e.g., MySQL).

•How Does MongoDB Store Data?

•Data is stored in documents using a format called BSON (similar to JSON).

•These documents are grouped into collections (like tables in a relational database).
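The document model can be pictured with plain Python dictionaries. The sketch below does not use MongoDB or its driver; the `find` helper and the sample customer records are hypothetical and only show how documents in one collection can have different fields.

```python
import json

# A "collection" sketched as a list of documents (plain dicts).
# Unlike rows in a relational table, documents need not share the same fields.
customers = [
    {"_id": 1, "name": "Asha", "orders": [{"item": "pen", "qty": 2}]},
    {"_id": 2, "name": "Ravi", "email": "ravi@example.com"},
]

def find(collection, **criteria):
    """Hypothetical helper: return documents whose fields equal the criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

# Documents serialize naturally to JSON text; BSON is a binary form of the same idea.
as_json = json.dumps(find(customers, name="Ravi")[0])
```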
3. Hunk

• It is useful for accessing data in remote Hadoop clusters.
• It was developed by Splunk in 2013 and is written in Java.
• Hunk is a tool that helps analyze large amounts of data stored in Hadoop (a
system for managing big data).
• It allows users to search, explore, and visualize big data easily.
• It is an extension of Splunk, a popular software for analyzing machine data.
4. Cassandra

•What is Cassandra?

•Cassandra is a database system that stores and manages large amounts of data across many computers.

•It is open-source, meaning anyone can use and improve it for free.

•Why is Cassandra Important for Big Data?

•Big Data refers to extremely large and complex data that traditional databases cannot handle.

•Cassandra is designed to manage huge volumes of data quickly and efficiently.


5. Presto

•What is Presto?

•Presto is a fast, open-source tool used to analyze large amounts of data (Big Data).

•It helps to ask questions (queries) and get answers quickly from huge data sets.

•Purpose of Presto:

•It allows users to search, filter, and analyze data across multiple sources (databases, data lakes, etc.).

•Presto is used when there is too much data for regular computers to handle efficiently.
6. ElasticSearch
• A very important tool today when it comes to searching.

• It is written in Java and is developed by Elastic, a company founded in 2012.

• The names of a few companies which make use of elasticsearch are: LinkedIn,
StackOverflow, Netflix, Facebook, Google, Accenture, etc.
7. Apache Kafka

• Known for its publish-subscribe (pub-sub) messaging model.

• Performs data processing on real-time streaming data.

• Originally developed at LinkedIn, it was open-sourced in 2011 and is now an Apache Software Foundation project written in Scala and Java.
• The companies which are making use of this technology include Twitter, Spotify,
Netflix, Linkedin, Yahoo, etc.
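The publish-subscribe idea can be sketched with a toy in-memory broker. This is not Kafka's API; the `Broker` class, the topic names, and the messages are all invented for the example, and a real system would add persistence, partitions, and network transport.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker: only the pub-sub idea, not Kafka's API."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be called for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

received = []
broker = Broker()
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append(m.upper()))
broker.publish("orders", "order-1")
broker.publish("payments", "pay-1")  # no subscribers, so nothing is delivered
```

The publisher never needs to know who the subscribers are, which is the main design point of the pattern.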
8. Splunk

• It was developed by Splunk in Python, XML, Ajax.

• Splunk is a software tool that helps collect, analyze, and understand large amounts of data.

• It turns raw data (from websites, apps, or machines) into useful information.

How Does Splunk Work?

• Collects Data: Gathers data from different sources (e.g., websites, sensors, machines).

• Indexes Data: Organizes and stores the data for quick searching.

• Searches Data: Allows users to find and analyze specific information.

• Visualizes Data: Displays the data using charts, graphs, and reports.
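The collect, index, and search steps can be illustrated with a small inverted index in Python. This is a sketch of the general technique under simplifying assumptions (whitespace tokens, exact matches), not a description of how Splunk is implemented; the log lines are made up.

```python
from collections import defaultdict

def build_index(events):
    """Index step: map each lowercase token to the positions of events containing it."""
    index = defaultdict(set)
    for position, event in enumerate(events):
        for token in event.lower().split():
            index[token].add(position)
    return index

def search(index, events, *terms):
    """Search step: return events that contain every one of the given terms."""
    if not terms:
        return []
    matches = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [events[i] for i in sorted(matches)]

# Collect step: hypothetical log lines gathered from different machines
events = [
    "ERROR disk full on host1",
    "INFO login ok on host2",
    "ERROR login failed on host2",
]
index = build_index(events)
hits = search(index, events, "error", "host2")
```

Because the index is built once, each search only intersects small sets instead of rescanning every event.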
9. Apache Spark:

• It is maintained by the Apache Software Foundation and is written primarily in Scala.
• Apache Spark is an open-source tool used to process large amounts of data
quickly.
• It helps to analyze "Big Data," which means very large collections of information.
• It can process data much faster than traditional methods.
• It handles different types of data, like text, images, and videos.
• Useful for companies, scientists, and researchers to make sense of large
datasets.
10. R language

• R is a programming language and a free software environment used for statistical computing and graphics.

• Used by data scientists, data miners and data practitioners for developing
statistical software for data analytics.
Technologies related to
Data Visualization
Tableau

• It is developed by Tableau Software, a company founded in 2003, and is written in Python, C++, Java and C.

• Comparable business intelligence tools include Qlik, Oracle Hyperion, Cognos, etc.

• Tableau is a software tool used to visualize and analyze data.

• It helps turn complex data into easy-to-understand charts, graphs, and dashboards.
Plotly

• Plotly is mainly used for making graphs.

• It can be used with MATLAB, Python, R, Arduino, Julia, etc.

• In Python, it can be used interactively in Jupyter Notebook and PyCharm.

• It was first developed in 2012 and is written in JavaScript.

• A few of the companies using Plotly are paladins, bitbank, etc.
UNIT-2 BA

MANAGING
RESOURCES FOR
BUSINESS ANALYTICS
Business Analytics Personnel

Certification in BA

• INFORMS (www.informs.org/Certification-Continuing-Ed/Analytics-Certification)

• A major academic and professional organization, it announced the start of a Certified Analytics Professional (CAP) program in 2013.
Cognizure
(www.cognizure.com/index.aspx)
• It offers a variety of service products, including business analytic
services.

• It offers a general certification Business Analytics Professional (BAP) exam that measures existing skill sets in BA staff and identifies areas needing improvement.

• This is a tool to validate technical proficiency, expertise, and professional standards in BA.
Select types of BA skills or Competency
Requirements
Business Analytics Data

• Structured data
• Unstructured data

Sources of Data
• Internal Sources
• External Sources
Typical Internal Sources of Data
Typical External Sources of Data
External data sources (examples)

• The US Census and the International Monetary Fund (IMF) are useful data sources at the macroeconomic level for model building.

• Audience and survey data sources might include Nielsen (www.nielsen.com/us/en.html)

• Psychographic or demographic data sourced from Claritas (www.claritas.com)

• Financial data from Equifax (www.equifax.com), Dun & Bradstreet (www.dnb.com), and so forth
Data Issues
Data quality

• It should be used for the purpose for which it is collected.

• Data can differ for different applications, but the quality of data is important.

• High quality data helps to ensure competitiveness, aids customer service, and improves profitability.

• When data is of poor quality, it can provide information that is


Data privacy
• It refers to the protection of shared data such that access is
permitted only to those intended or permitted users.
• Awareness of the risks in sharing too much.
• For example,
• Competitors can steal a firm’s customers by accessing addresses.
• Data leaks on product quality failures can damage brand image,
• Customers can become distrustful of a firm that shares information given in
confidence.
• To avoid these issues, customer privacy and data privacy are
important.
Business Analytics Technology
Infrastructure
Firms need an information technology (IT) infrastructure that
supports personnel in the conduct of their daily business
operations.

1. Hardware

2. Software

3. Networking and Telecommunications Technology

4. Data Management Technology


General IT Infrastructure
Database management systems (DBMS)
• Data management technology software that permits firms to:
• centralize data
• manage it efficiently
• provide access to stored data by application programs.

• Serves as an interface between application programs and the physical data files of structured
data.

• Makes the task of understanding where and how the data is actually stored more efficient.

• Other DBMS systems such as OODBMS can handle unstructured data.

• Object-oriented DBMS systems are able to store and retrieve unstructured data, like drawings,
images, photographs, and voice data.

• Necessary to handle the load of big data that most firms currently collect
DBMS
• A DBMS (Database Management System) is software that manages and organizes data in a database.
• Functions:
• Storing,
• retrieving,
• updating, and
• deleting data.
• Examples: MySQL, Oracle, MS Access.
• Advantages: Data security, easy access, reduces redundancy.
• Example : School or College records, library systems…
Data warehouses
• A data warehouse stores large amounts of organized data from
different sources.

• It helps in analyzing and reporting information easily.

• Data is stored in a structured way (table format) for quick access.

• Used by companies to track trends, make decisions, and improve performance.
Data marts
• Subsets or smaller groupings within a data warehouse.

• Decentralized data warehouses used for a limited portion of the organization’s data that is placed in a separate database for a specific population of users.

• For example, a firm might develop a smaller database on just product quality to focus efforts on quality customer and product issues.
• Data access can be made more quickly and at lower cost.
• Once data has been captured and placed into database
management systems, it is available for analysis with
BA tools, including:

• online analytical processing, as well as

• Data, Text, and Web mining technologies


Online Analytical Processing (OLAP)

• Software that allows users to view data in multiple dimensions.

• For example, employees can be viewed in terms of their age, gender, geographic location, and so on.

• It allows users to obtain online answers to ad-hoc questions quickly, even when the data is stored in very large databases.
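Viewing the same records along different dimensions can be sketched in Python as counting by any chosen combination of attributes. The employee records and the `slice_counts` helper are hypothetical; real OLAP tools precompute and store such aggregates rather than counting on demand.

```python
from collections import Counter

# Hypothetical employee records with three dimensions
employees = [
    {"age_band": "20-29", "gender": "F", "location": "Chennai"},
    {"age_band": "20-29", "gender": "M", "location": "Chennai"},
    {"age_band": "30-39", "gender": "F", "location": "Mumbai"},
    {"age_band": "30-39", "gender": "F", "location": "Chennai"},
]

def slice_counts(records, *dimensions):
    """Count records for every combination of the chosen dimensions."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

by_gender = slice_counts(employees, "gender")
by_location_and_gender = slice_counts(employees, "location", "gender")
```

Changing the arguments to `slice_counts` is the analogue of an ad-hoc question: the data stays the same and only the view changes.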
Data mining
• Data mining is the process of finding patterns and useful information from
large sets of data.
• Uses:
• Helps in making decisions,
• predicting trends, and
• understanding data.
• Examples: Online shopping recommendations, fraud detection, and analyzing student
performance.
• Techniques: Classification, clustering, and association.

• It is an ideal predictive analytics tool used in the BA process.
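As one example of the association technique named above, the sketch below computes the support and confidence of an "if bread then butter" rule over a handful of shopping baskets. The basket data is invented, and this is a minimal illustration rather than a full association-rule mining algorithm.

```python
def support(transactions, items):
    """Fraction of transactions that contain all of the given items."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

A rule with high support and high confidence is the kind of pattern behind online shopping recommendations.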


Types of Information Obtained from Data Mining
Text mining

• A software application used to extract key elements from unstructured datasets, discover patterns and relationships in the text materials, and summarize the information.

• The majority of the information stored in businesses is in the form of unstructured data (emails, pictures, memos, transcripts, survey responses, business receipts, and so on).
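A very simple form of extracting key elements from text is counting the most frequent meaningful words. The sketch below does this for a few made-up survey responses; the stop-word list is a small assumed sample, and real text mining tools use far richer language processing.

```python
import re
from collections import Counter

# Small assumed list of words to ignore; real tools ship much longer lists
STOP_WORDS = {"the", "a", "is", "was", "and", "but", "to", "of", "very"}

def key_terms(documents, top_n=3):
    """Return the top_n most frequent non-stop-words across the documents."""
    words = []
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(w for w in tokens if w not in STOP_WORDS)
    return Counter(words).most_common(top_n)

# Hypothetical survey responses
responses = [
    "Delivery was late and the packaging was damaged",
    "Late delivery but great product",
    "Great product, very fast delivery",
]
top_term = key_terms(responses, 1)
```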
Web mining
• Seeks to find patterns, trends, and insights into customer behavior
from users of the Web.

Marketers, for example, use BA services like:

• Google Trends (www.google.com/trends/) and

• Google Insights for Search (http://google.about.com/od/i/g/google-insights-for-search.htm)

to track the popularity of various words and phrases to learn what consumers are interested in and what they are buying.
Software applications
• Microsoft Excel
• spreadsheet systems have add-in applications specifically used for BA
analysis. These add-in applications broaden the use of Excel into areas
of BA.
• Analysis ToolPak is an Excel add-in that contains a variety of statistical
tools (for example, graphics and multiple regressions) for the descriptive
and predictive BA process steps.
• Another Excel add-in, Solver, contains operations research optimization
tools (for example, linear programming) used in the prescriptive step of
the BA process.
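To show what a regression tool of the kind mentioned above computes, here is a minimal least-squares fit with a single predictor written in Python. It is not Excel's implementation and it handles only one input variable, whereas multiple regression uses several; the advertising and sales figures are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend and resulting sales
ad_spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]
slope, intercept = fit_line(ad_spend, sales)

# Predictive step: estimate sales for a spend level not in the data
predicted = slope * 5 + intercept
```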
SAS® Analytics Pro (www.sas.com/)
• Software provides a desktop statistical toolset allowing users to
access, manipulate, analyze, and present information in visual
formats.

• It is designed for use by analysts, researchers, statisticians, engineers, and scientists who need to explore, examine, and present data in an easily understandable way and distribute findings in a variety of formats.

• It is a statistical package chiefly useful in the descriptive and


IBM’s SPSS software
• Offers users a wide range of statistical and decision-making tools.

• These tools include methodologies for data collection, statistical manipulation, modelling trends in structured and unstructured data, and optimizing analytics.

• Depending on the statistical packages acquired, the software can cover all three steps in the BA process.
LINGO
• LINGO is a software tool used to solve mathematical
optimization problems.

• It handles linear, nonlinear, and integer programming models.

• Widely used in business, engineering, and research for decision-making.

• Provides easy model building with built-in functions.


Organization Structures Aligning
Business Analytics
Functional organization structure
Matrix organization
Centralized BA Department, Project or
Team Organization Structure
Establishing an Information Policy
• The information policy specifies organizational rules for sharing,
disseminating, acquiring, standardizing, classifying, and inventorying
all types of information and data.

• It defines the specific procedures and accountabilities that:


• identify which users and organizational units can share information,
• where the information can be distributed
• who is responsible for updating and maintaining the information.
Establishing an Information Policy

• In small firms, business owners might establish the information policy.

• For larger firms, data administration may be responsible for the specific policies and procedures for data management.

• Responsibilities could include:


• developing the information policy,
• planning data collection and storage,
• overseeing database design,
• developing the data dictionary,
• as well as monitoring how information systems specialists and end user groups use
Data governance

• Data governance includes establishing policies and processes for managing the availability, usability, integrity, and security of the data employed in businesses.

• It is focused on promoting data privacy, data security, data quality, and compliance with government regulations.

• Such information policy, data administration, and data governance must be in place to guard data and ensure it is managed for the betterment of the entire organization.
Outsourcing Business Analytics

• Outsourcing business operations is a strategy that an organization can use to implement a BA program, run BA projects, and operate BA teams.

• Any business activity can be outsourced, including BA.

• BA is a staff function that is easier to outsource than other line management tasks, such as running a warehouse.
Advantages of Outsourcing BA
Disadvantages of Outsourcing BA
Ensuring Data Quality
• Data quality refers to accuracy, precision, and completeness of data.

• High-quality data is considered to correctly reflect the real world from which it is extracted.

• Poor-quality data caused by data entry errors, poorly maintained databases, out-of-date data, and incomplete data usually leads to bad decisions and undermines BA within a firm.

• The database management systems personnel are managerially responsible for ensuring data quality.
Ensuring Data Quality
• An organization needs to identify and correct faulty data and
establish routines and procedures for editing data in the database.

• The analysis of data quality can begin with a data quality audit.

• This audit may cover the entire database, just a sample of files, or a survey of end users for their perceptions of the data quality.

• If the data quality audit finds files that have errors, a process called data cleansing or data scrubbing is undertaken to eliminate or repair the faulty data.
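The audit and cleansing steps can be sketched as two small Python functions: one that reports problems and one that repairs or removes faulty records. The checks shown (missing required fields, duplicate ids, stray whitespace) and the sample records are assumptions chosen for illustration; a real audit would use rules specific to the organization's data.

```python
def audit(records, required_fields):
    """Return (record_index, problem) pairs for records that fail basic checks."""
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        for field in required_fields:
            if rec.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        if rec.get("id") in seen_ids:
            problems.append((i, "duplicate id"))
        seen_ids.add(rec.get("id"))
    return problems

def cleanse(records, required_fields):
    """Repair what can be repaired (trim whitespace), then drop faulty records."""
    trimmed = [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]
    bad = {i for i, _ in audit(trimmed, required_fields)}
    return [r for i, r in enumerate(trimmed) if i not in bad]

# Hypothetical customer records with typical data entry errors
records = [
    {"id": 1, "name": " Asha ", "city": "Chennai"},
    {"id": 2, "name": "", "city": "Mumbai"},
    {"id": 1, "name": "Asha", "city": "Chennai"},
]
found = audit(records, ["name", "city"])
clean = cleanse(records, ["name", "city"])
```

Keeping the audit separate from the cleansing step means the audit report can be reviewed before any record is changed or removed.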
Managing Change
• Two key considerations when changing organizations are organizational culture and the use of change management.
• Organizational culture is how an organization supports cooperation,
coordination, and empowerment of employees.
• Change management is defined as an approach for transitioning
the organization (individuals, teams, projects, departments) to a
changed and desired future state.
• Change management is a means of implementing change in an
organization, such as adding a BA department.
• Changes in an organization can be either planned (a result of specific and planned efforts at change with direction by a change leader) or unplanned (spontaneous changes without direction of a change leader).
Change Management Targets
Change Management Best Practices
Data Quality
• Data quality is an integral part of data governance that ensures that
your organization’s data is fit for purpose.

• It refers to the overall utility of a dataset and its ability to be easily processed and analyzed for other uses.

• Managing data quality dimensions such as completeness, conformity, consistency, accuracy, and integrity helps your data governance, analytics, and AI/ML initiatives deliver reliably trustworthy results.
Benefits of data quality

• Increased revenues

• Reduced costs

• Less time spent reconciling data

• Greater confidence in analytical systems

• Increased customer satisfaction


The Foundational Components of Data Quality
• Support all use cases

• Accelerate and scale

• Deliver a flexible user experience

• Automate critical tasks


Dimensions of Data Quality

• Accuracy: The data reflects the real-world objects and/or events it is intended
to model. Accuracy is often measured by how the values agree with an
information source that is known to be correct.

• Completeness: The data makes all required records and values available.

• Consistency: Data values drawn from multiple locations do not conflict with
each other, either across a record or message, or along all values of a single
attribute. Note that consistent data is not necessarily accurate or complete.
Dimensions of Data Quality

• Timeliness: Data is updated as frequently as necessary, including in real time, to ensure that it meets user requirements for accuracy, accessibility and availability.

• Validity: The data conforms to defined business rules and falls within
allowable parameters when those rules are applied.

• Uniqueness: No record exists more than once within the data set, even if it
exists in multiple locations. Every record can be uniquely identified and
accessed within the data set and across applications.
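Some of these dimensions can be turned into simple scores between 0 and 1. The sketch below measures completeness, uniqueness, and validity on a few made-up patient records; the field names, the age rule, and the scoring formulas are assumptions for illustration, not a standard definition.

```python
def completeness(records, fields):
    """Share of required field values that are actually filled in."""
    total = len(records) * len(fields)
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total

def uniqueness(records, key):
    """Share of distinct key values; 1.0 means no record appears twice."""
    values = [r[key] for r in records]
    return len(set(values)) / len(values)

def validity(records, field, rule):
    """Share of records whose field value passes the business rule."""
    return sum(1 for r in records if rule(r.get(field))) / len(records)

# Hypothetical patient records with deliberate quality problems
patients = [
    {"patient_id": "P1", "age": 34, "phone": "555-0101"},
    {"patient_id": "P2", "age": 210, "phone": ""},
    {"patient_id": "P2", "age": 58, "phone": "555-0103"},
    {"patient_id": "P4", "age": 41, "phone": None},
]

def plausible_age(value):
    """Assumed business rule: age is a whole number from 0 to 120."""
    return isinstance(value, int) and 0 <= value <= 120

scores = {
    "completeness": completeness(patients, ["age", "phone"]),
    "uniqueness": uniqueness(patients, "patient_id"),
    "validity": validity(patients, "age", plausible_age),
}
```

Accuracy and timeliness are harder to score this way because they need an outside reference: a trusted source of correct values or a record of when each value was last updated.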
Examples of Data Quality Metrics

Healthcare data quality metrics

• Healthcare organizations need complete, correct, unique patient records to drive proper treatment, fast and accurate billing, risk management, and more effective product pricing and sales.

Public sector data quality metrics

• Public sector agencies need complete, consistent, accurate data about constituents, proposed initiatives, and current projects to understand how well they’re meeting their goals.
Examples of Data Quality Metrics
Financial services data quality metrics

• Financial services firms must identify and protect sensitive data, automate
reporting processes, and monitor and remediate regulatory compliance.

Manufacturing data quality metrics

• Manufacturers need to maintain accurate customer and vendor records, be notified in a timely way of QA issues and maintenance needs, and track overall supplier spend for opportunities to reduce operational costs.
Common reasons for Data Quality issues
1) Poor Organization

2) Too Much Data

3) Inconsistent Data

4) Poor Data Security

5) Poor Data Definition

6) Incorrect Data

7) Poor Data Recovery


HOW TO IMPROVE THE DATA QUALITY?
DISCOVER
• Look at the status of your data right now:
• what you have,
• where it’s kept,
• its sensitivity level,
• data connections, and
• any quality concerns it has.

DEFINE RULES
• Define the data quality measures (rules) you will use.
• The measures should match what was found in the discovery phase.
HOW TO IMPROVE THE DATA QUALITY?
APPLY RULES
• Businesses must integrate their data quality tools across all data sources and targets to
remediate data quality throughout the company.

MONITOR AND MANAGE


• Data quality is a long-term commitment.

• Track and report on all data quality processes, both in-house and in the cloud, using dashboards, scorecards, and visualizations.
What is an information management policy?

• An information management policy helps staff create, capture and manage information assets to satisfy business, legal and stakeholder requirements.

• An information management policy should be consistent with the principles, environment and strategic directions described in your organisation's information governance framework.


• Support the management of information as an organisational asset

• Explains the benefits of good information management

• Outlines roles and responsibilities

• Proves commitment to meeting business, legislative and regulatory requirements

• Contributes to an environment that values the integrity and accessibility of information assets
An effective information management policy will usually include the
following:
• details of organisationally endorsed processes, practices and
procedures
• identification of endorsed systems for managing information
assets
• Include normal administrative practice (NAP)
• an outline of the roles, responsibilities and expectations of all
staff in managing information assets
Interaction with other policies and
procedures
• It will take more than one policy document to provide guidance
across all processes.

• IM policy statements should be embedded into a broad range of organisational policies and procedures to assist the stakeholders.

• Dividing policy statements across several documents can also enhance readability.
Key aspects of an information
management policy
• Title, date and version number
• Follow your agency’s naming and versioning conventions and identify
how frequently the policy will be reviewed.
• Purpose
• Explain why an information management policy is needed and
the benefits of good practice.
• Scope
• The scope should identify both who and what is covered by the
policy, to support the holistic management of all an agency’s
information assets.
Policy statement
• Provide a brief statement of your agency's commitment to good
information management practices.

• Factors that influence information management within the agency.

For example:

[The agency] recognises its information assets as valuable corporate assets and is committed to achieving appropriate and ongoing management of these assets to advance [the agency’s] strategic priorities and meet client needs.
Other guidance that could be covered in the policy includes:
• Endorsed systems used to maintain information: establish clearly
which locations are endorsed for the capture and storage of
information and which should not be used. For example, corporate
information assets must not be maintained in email folders, shared
folders, personal drives or external storage media.
• Requirements for storage and preservation: for information in
digital and physical formats, including security protocols and
preservation requirements. This may include referencing preferred file
formats.
• Access to information:
• Provide a statement supporting the staff having ready access to corporate
information.
• Describe situations when it is appropriate to restrict this access.
• Document the public’s right of access to information under legislation including
the Freedom of Information Act 1982 and the Archives Act 1983.
• Retention and destruction:

• Policies for retaining and destroying the organisation's information assets.

• Provide staff with the information they need to comply with accountable
and authorised destruction of information assets.

• This includes the correct use of normal administrative practice (NAP).

• Transfer:
• Explain situations when information may be required to be transferred.

• Note that information may be transferred to other agencies as a result of a machinery of government change.
