
Research Methodology Class 4

The document discusses research methodology and data collection methods. It covers topics like formulating hypotheses, data collection techniques, sampling designs, exploratory data analysis and dealing with outliers. Common data collection methods like questionnaires, interviews and observations are described along with sampling techniques like simple random sampling and stratified sampling.

Uploaded by

Moriwam


Research Methodology

Course Code: CSE-RM, Fall 1200

SECTION 1: (F) 10AM-12PM

Presented by
Dr. Rubaiyat Islam
Associate Professor, WUB
Adjunct Faculty, IUB.
Omdena Bangladesh Chapter Lead
Crypto-economist Consultant
Sifchain Finance, USA.
COURSE CONTENTS

1. Understanding the nature of the problem; idea generation
2. Reviewing the literature
3. Proposing/developing/experimenting/collecting data
4. Analyzing
5. Drawing conclusions
6. Generalizing the findings/results
7. Paper/journal/report writing
8. Plagiarism; selecting journals
9. Major/minor reviews
AFTER LITERATURE REVIEW ….
FORMULATE THE HYPOTHESIS

§ A tentative assumption.
§ A statement of expectation or prediction that will be
tested by research.
§ Guides the researcher by delimiting the area of research
and keeping the work on the right track.
§ Determines which tests must be conducted in the analysis
of data and how, and indirectly the quality of data
required for the analysis.
LEARNING OBJECTIVES OF THIS
PRESENTATION

§ Six major methods of data collection
§ How to choose the right data collection method
§ Strengths and weaknesses of the different data collection
methods
§ Using the strongest method in our research
DATA COLLECTION

§ Sources of data:
Statistical data may be obtained from two sources, namely primary
and secondary.
1. Primary data:
Data measured or collected by the investigator or user directly from
the source. Primary sources are sources that can supply first-hand
information for immediate use.
2. Secondary data:
When an investigator uses data that have already been collected by
others, such data are called secondary data. They are gathered or
compiled from published and unpublished sources.
COMMON COLLECTION METHODS

§ Tests – Participants fill out an instrument that measures their
ability or degree of skill.
§ Questionnaires – Participants fill out self-report instruments.
§ Interviews – Researchers talk to participants in person or
over the telephone.
§ Focus groups – Researchers collect data from participants in a
small-group setting.
§ Observations – Researchers observe behavior in natural or
structured environments.
§ Constructed, secondary and existing data – Researchers use data
collected at an earlier time.
SAMPLING DESIGNS

Methods by which a representative sample can be chosen from a population.

Four sampling designs in common use:

1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
SAMPLING DESIGNS

Simple Random Sampling

Putting all students’ names together,
thoroughly mixing them and then
drawing names one at a time is an
example of simple random sampling:
every student has the same chance
of being selected.
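The name-drawing example can be mimicked with Python's standard `random.sample`, which draws without replacement so that every unit is equally likely to be chosen. This is a minimal sketch; the student list, the helper name `simple_random_sample` and the seed are hypothetical.

```python
import random

# Hypothetical population: 20 student names
population = [f"Student {i}" for i in range(1, 21)]

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; each unit is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(units, n)

sample = simple_random_sample(population, 5, seed=42)
print(sample)
```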
SAMPLING DESIGNS

Systematic Sampling: In this sampling
design, every k-th unit (or item) is
selected from a population until the
sample size is reached, where

$k = \dfrac{\text{size of population}}{\text{size of sample}}$
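A sketch of the k-th unit rule. It assumes k is the population size divided by the sample size, rounded down, and that the starting point is chosen at random among the first k units, which is a common convention the slide does not spell out. With 100 units and a sample of 10, k = 100 / 10 = 10. The helper name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Select every k-th unit after a random start, where k = N // n."""
    N = len(units)
    k = N // n                                 # sampling interval: population size / sample size
    start = random.Random(seed).randrange(k)   # random start in [0, k)
    return [units[start + i * k] for i in range(n)]

population = list(range(1, 101))  # hypothetical population of 100 numbered units
sample = systematic_sample(population, 10, seed=7)
print(sample)
```

Rounding k down guarantees that the last index, start + (n - 1) * k, never runs past the end of the population.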
SAMPLING DESIGNS

Stratified Sampling
In this sampling design, the entire
population is divided into several groups,
called strata, and a subsample is selected
from each group. All subsamples are
then combined to form the sample. This
sampling design is used when a
population is not homogeneous.
SAMPLING DESIGNS

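Stratified sampling can be either
proportionate or disproportionate,
depending on the number of units
selected from each group. In
proportionate sampling, each stratum
contributes units in proportion to its
share of the population; in
disproportionate sampling, it does not.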
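Both variants can be sketched with pandas' `groupby(...).sample(...)` (available in pandas 1.1 and later), assuming the strata are stored in a column. The data, the column names `student_id` and `level`, and the 20% and 10-unit figures are hypothetical.

```python
import pandas as pd

# Hypothetical population: 60 undergraduates (UG) and 40 postgraduates (PG)
population = pd.DataFrame({
    "student_id": range(100),
    "level": ["UG"] * 60 + ["PG"] * 40,
})

# Proportionate: the same fraction (20%) from every stratum -> 12 UG, 8 PG
proportionate = population.groupby("level").sample(frac=0.2, random_state=1)

# Disproportionate: a fixed number (10) from every stratum, regardless of its size
disproportionate = population.groupby("level").sample(n=10, random_state=1)

print(proportionate["level"].value_counts())
print(disproportionate["level"].value_counts())
```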
SAMPLING DESIGNS

Cluster Sampling
This sampling design involves selecting
a few groups, called clusters, at random
from a population, and then selecting
units from each cluster. Cluster sampling
is used when a population is large,
fairly homogeneous and scattered
over a large geographical area.
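A two-stage sketch of the design described above: clusters are drawn at random first, then units are drawn at random within each chosen cluster. Sub-sampling at random inside each cluster is an assumption (the slide only says units are selected from each cluster); surveying every unit in the chosen clusters is also common. The villages, households and the helper `cluster_sample` are hypothetical.

```python
import random

# Hypothetical population: 10 villages (clusters), each with 30 households
clusters = {f"Village {c}": [f"V{c}-H{h}" for h in range(1, 31)] for c in range(1, 11)}

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random; stage 2: pick units at random inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

sample = cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=3)
for village, households in sample.items():
    print(village, households)
```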
DATA ORGANIZATION/PRE-PROCESSING

The process of selecting a sample from
a population amounts to data
collection. Once the data have been
collected, they must be organized to make
them meaningful. Unorganized data do
not convey any meaningful information.
Raw Data
A set of unorganized data.
Data Pre-processing for Exploratory
Data Analysis:
1. Handling missing values
2. Creating a frequency distribution
table
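A frequency distribution table can be built in pandas by binning raw values with `pd.cut` and counting each class with `value_counts`. This is a minimal sketch; the exam scores and the class intervals of width 10 are hypothetical.

```python
import pandas as pd

# Hypothetical raw data: exam scores of 12 students
scores = pd.Series([45, 67, 72, 58, 91, 83, 67, 49, 75, 88, 62, 70])

# Group scores into class intervals (40, 50], (50, 60], ..., (90, 100]
bins = [40, 50, 60, 70, 80, 90, 100]
classes = pd.cut(scores, bins=bins)

# Count each class, keeping interval order instead of sorting by count
freq_table = classes.value_counts(sort=False).rename("frequency").to_frame()
freq_table["relative_frequency"] = freq_table["frequency"] / len(scores)
print(freq_table)
```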
WHAT IS EXPLORATORY DATA ANALYSIS

EDA is an approach to data analysis that uses a variety
of techniques to gain insights about the data.

Basic steps in any exploratory data analysis:
• Cleaning and preprocessing
• Statistical analysis
• Visualization for trend analysis, anomaly detection, and outlier
detection (and removal)
IMPORTANCE OF EDA

Improve understanding of variables by extracting summary values
such as the mean, minimum and maximum.

Discover errors, outliers, and missing values in the data.

Identify patterns by visualizing data in graphs such as
bar graphs, scatter plots, heatmaps and histograms.
EDA USING PANDAS

Import data into the workspace (Jupyter Notebook, Google Colab, Python IDE)

Descriptive statistics

Removal of nulls

Visualization
1. PACKAGES AND DATA IMPORT

• Step 1: Import pandas into the workspace.
• “import pandas” (conventionally written as “import pandas as pd”)
• Step 2: Read the data/dataset into a pandas DataFrame. Readers for
different input formats include:
• Excel: read_excel
• CSV: read_csv
• JSON: read_json
• HTML (read_html) and many more
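A minimal sketch of Step 1 and Step 2. To keep it self-contained, the CSV and JSON text are hypothetical and passed through `io.StringIO`; in practice a file path such as `"survey.csv"` (hypothetical) would be passed instead. Reading `.xlsx` files with `read_excel` typically needs an optional engine such as openpyxl installed.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally: df = pd.read_csv("survey.csv")
csv_text = """respondent,age,score
1,21,78
2,23,85
3,22,91
"""
df = pd.read_csv(io.StringIO(csv_text))

# The same kind of table as a JSON list of records, read with read_json
json_text = '[{"respondent": 4, "age": 24, "score": 66}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df.head())   # first rows of the imported DataFrame
print(df_json)
```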

DESCRIPTIVE STATS (PANDAS)

• Used to make preliminary assessments about the population distribution of
the variable.

• Commonly used statistics:

1. Central tendency:
• Mean – The average value of all the data points: dataframe.mean()
• Median – The middle value when all the data points are put in an
ordered list: dataframe.median()
• Mode – The data point that occurs most often in the dataset:
dataframe.mode()
2. Spread: A measure of how far the data points lie from the
mean or median.
• Variance – The mean of the squares of the individual deviations
from the mean: dataframe.var() (by default pandas divides by n - 1,
giving the sample variance; dataframe.var(ddof=0) gives the mean of
the squared deviations)
• Standard deviation – The square root of the variance: dataframe.std()
3. Skewness: A measure of the asymmetry of the distribution: dataframe.skew()
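The statistics above can be tried on a small, hypothetical column of exam scores. The sketch calls them on a single column (a Series); calling them on a whole DataFrame returns one value per numeric column. It also shows the difference between pandas' default sample variance and the population variance.

```python
import pandas as pd

# Hypothetical sample of 8 exam scores, with one unusually high value
df = pd.DataFrame({"score": [60, 70, 70, 80, 85, 90, 95, 150]})
s = df["score"]

print("mean:", s.mean())                # 87.5
print("median:", s.median())            # 82.5
print("mode:", s.mode().tolist())       # [70]
print("variance (n - 1):", s.var())     # pandas default, ddof=1
print("variance (n):", s.var(ddof=0))   # mean of squared deviations: 675.0
print("std:", s.std())                  # square root of the ddof=1 variance
print("skewness:", s.skew())            # positive: long right tail caused by 150
```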
DESCRIPTIVE STATS (CONTD.)
Other methods to get a quick look at the data:
• describe(): Summarizes the central tendency,
dispersion and shape of a dataset’s distribution,
excluding NaN values.
• Syntax: dataframe.describe()
• info(): Prints a concise summary of the
dataframe, including the index dtype and
columns, non-null values and memory usage.
• Syntax: dataframe.info()
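A quick sketch of both methods on a hypothetical survey table with one missing age, showing that `describe()` leaves out the NaN when counting and averaging.

```python
import pandas as pd

# Hypothetical survey data with one missing age
df = pd.DataFrame({
    "age": [21, 23, None, 22, 25],
    "score": [78, 85, 91, 66, 74],
})

summary = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max per numeric column
print(summary)           # age count is 4: the missing value is excluded

df.info()                # index range, column dtypes, non-null counts, memory usage
```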
3. NULL VALUES

Detecting null values:
• isnull(): An alias for dataframe.isna(). It returns a dataframe
of the same shape containing boolean values, True where a value
is missing.
• Syntax: dataframe.isnull()

Handling null values:
• Dropping the rows with null values: the dropna() function deletes
rows (or columns) that contain null values.
• Replacing missing values: the fillna() function fills the missing
values with a chosen value, such as the mean or median.
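A minimal sketch of detecting and handling nulls on hypothetical data. Filling `age` with its mean and `score` with its median is only an illustrative choice; `fillna` accepts a dictionary that maps column names to fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in both columns
df = pd.DataFrame({
    "age": [21, np.nan, 22, 25, np.nan],
    "score": [78, 85, np.nan, 66, 74],
})

print(df.isnull())        # boolean DataFrame: True where a value is missing
print(df.isnull().sum())  # number of missing values per column

dropped = df.dropna()     # keep only rows with no missing values

# Replace missing values column by column: mean for age, median for score
filled = df.fillna({"age": df["age"].mean(), "score": df["score"].median()})
print(filled)
```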
4. VISUALIZATION

• Univariate: Looking at one variable/column at a time
• Bar graphs
• Histograms
• Boxplots
• Pie plots
• Multivariate: Looking at the relationship between two or more
variables
• Scatter plots
• Heatmaps (seaborn)
BAR-GRAPH, HISTOGRAM AND
BOXPLOT

• Bar graph: A bar plot presents data as
rectangular bars with lengths proportional
to the values that they represent.
• Boxplot: Depicts numerical data graphically
through their quartiles. The box extends
from the first quartile (Q1) to the third
quartile (Q3) of the data, with a line at
the median (Q2).
• Histogram: Represents the distribution of
numerical data by dividing the range of
values into bins and drawing a bar for the
number of values that fall into each bin.
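A sketch of the three univariate plots using pandas' plotting accessor, which draws with matplotlib. The department and score data are hypothetical, and the non-interactive "Agg" backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "department": ["CSE", "EEE", "BBA", "CSE", "BBA", "CSE"],
    "score": [78, 85, 91, 66, 74, 88],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Bar graph: bar length = number of students per department
df["department"].value_counts().plot.bar(ax=axes[0], title="Students per department")

# Histogram: number of scores falling into each of 5 equal-width bins
df["score"].plot.hist(bins=5, ax=axes[1], title="Score distribution")

# Boxplot: box from Q1 to Q3, line at the median
df["score"].plot.box(ax=axes[2], title="Score boxplot")

fig.tight_layout()
# fig.savefig("univariate_plots.png") would write the figure to a file
```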
SCATTERPLOT, PIEPLOT

• Scatterplot: Shows the data as a collection of points, one per row, with one column on each axis.

• Syntax: dataframe.plot.scatter(x='x_column_name', y='y_column_name')

• Pie plot: Proportional representation of the numerical data in a column.

• Syntax: dataframe.plot.pie(y='column_name')
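The two syntax lines above can be tried on a small, hypothetical table of study hours and exam scores. For the pie plot, the slice labels come from the DataFrame index. The "Agg" backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in a notebook instead
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: study hours vs exam score for five students
df = pd.DataFrame(
    {"hours": [2, 4, 5, 7, 9], "score": [55, 62, 70, 78, 90]},
    index=["A", "B", "C", "D", "E"],
)

# Scatterplot: one point per row, hours on the x-axis and score on the y-axis
ax_scatter = df.plot.scatter(x="hours", y="score", title="Hours vs score")

# Pie plot: each slice is one row's share of the "hours" column total
ax_pie = df.plot.pie(y="hours", legend=False)
```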
OUTLIER DETECTION

• An outlier is a data point, or set of data points, that lies far from the rest of the
values in the dataset.
• Outliers are easily identified by visualizing the data.
• For example:
• In a boxplot, the data points that lie beyond the whiskers (outside the lower and
upper bounds) can be considered outliers.
• In a scatterplot, the data points that lie away from the groups of data points can be
considered outliers.
OUTLIER REMOVAL

• Use the interquartile range (IQR) as follows:
1. Calculate the first and third quartiles (Q1 and Q3).
2. Calculate the interquartile range, IQR = Q3 - Q1.
3. Find the lower bound, Q1 - 1.5 × IQR.
4. Find the upper bound, Q3 + 1.5 × IQR.
5. Treat the data points that lie outside this range as outliers and replace them.
6. They can be replaced by the mean or median.
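A sketch of the IQR steps in pandas, using hypothetical scores that contain one extreme value. Replacing the outliers with the median of the whole series is one of the two options listed above. The median is chosen here because, unlike the mean, it is barely affected by the outlier itself.

```python
import pandas as pd

# Hypothetical scores with one extreme value
s = pd.Series([52, 55, 58, 60, 61, 63, 65, 68, 70, 140])

q1, q3 = s.quantile(0.25), s.quantile(0.75)  # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (s < lower) | (s > upper)
print("bounds:", lower, upper)
print("outliers:", s[is_outlier].tolist())

# Replace the outliers with the median; all other values are kept unchanged
cleaned = s.mask(is_outlier, s.median())
print(cleaned.tolist())
```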
REFERENCES

• More information on EDA tools and pandas can be found
at the links below:
• https://pandas.pydata.org/docs/user_guide/index.html
• https://pandas.pydata.org/docs/user_guide/missing_data.html
• https://pandas.pydata.org/docs/user_guide/visualization.html
PRACTICAL DEMONSTRATION

• Creating research-findings questionnaires for EDA

• Data storytelling and EDA
