Chapter: 04
Organization of Data
Organization of data refers to the systematic arrangement of collected figures (raw data) so
that the data becomes easy to understand and more convenient for further statistical
treatment.
1. What is Classification?
o Classification refers to the process of arranging data into groups or categories
based on shared characteristics.
o It helps to simplify, organize, and make sense of large volumes of raw data.
2. Objectives of Classification
1. Brief and Simple: Presents a large mass of data in a compact form.
2. Utility: Makes data easy to analyze, interpret, and compare.
3. Distinctiveness: Clearly distinguishes between different groups or classes.
4. Comparability: Facilitates comparison among groups or over time.
5. Scientific Arrangement: Ensures logical, systematic data structuring.
6. Attractive and Effective: Well-arranged data is easy to understand and visually
appealing.
3. Characteristics of a Good Classification
1. Comprehensiveness: Covers all data without omission.
2. Clarity: Categories must be well-defined and understandable.
3. Homogeneity: Each class should contain similar items.
4. Suitability: Must match the purpose of the study.
5. Stability: Remains valid over time and for repeated use.
6. Elasticity: Should allow for modifications if required.
4. Basis of Classification
1. Geographical/Spatial Classification: Based on location (e.g., state-wise population).
➢ The data are classified with reference to geographical location/place such as
countries, states, cities, districts, blocks, etc.
2. Chronological Classification: Based on time (e.g., year-wise rainfall).
➢ In such a classification, data are arranged in either ascending or descending order
with reference to time, such as years, quarters, months, weeks, etc.
3. Qualitative Classification: Based on descriptive characteristics (attributes).
➢ Data are classified with reference to descriptive characteristics like sex, caste,
religion, literacy, etc.
o a. Simple: One attribute (e.g., gender: male/female).
o b. Manifold: More than one attribute (e.g., gender + education level).
4. Quantitative Classification: Based on measurable quantities (e.g., income, age).
➢ Data are classified on the basis of some measurable characteristics such as height,
age, weight, income, and marks of students.
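The difference between simple and manifold qualitative classification can be sketched in Python as below. The records are hypothetical, invented only for illustration; `Counter` from the standard library does the grouping.

```python
from collections import Counter

# Hypothetical records (not from the chapter): (gender, education level).
records = [
    ("male", "graduate"),
    ("female", "graduate"),
    ("female", "school"),
    ("male", "school"),
    ("female", "graduate"),
    ("male", "graduate"),
]

# Simple classification: group by one attribute (gender).
by_gender = Counter(gender for gender, _ in records)

# Manifold classification: group by two attributes together
# (gender + education level).
by_gender_and_education = Counter(records)

print(dict(by_gender))
print(dict(by_gender_and_education))
```

Each record falls into exactly one group in both cases; the manifold version simply splits every gender group further by education level.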
5. Concept of a Variable
o A variable is a measurable characteristic that can take on different values.
i. Discrete Variable:
Takes specific, separate values (e.g., number of children: 1, 2, 3).
ii. Continuous Variable:
Can take any value within a range (e.g., height: 150.2 cm, 151.5 cm).
6. Raw Data
o A mass of data in its original form is called raw data.
o It is unorganized and unprocessed, exactly as it was collected.
o Example: Marks of students before grouping or tabulation.
7. Conversion of Raw Data into Statistical Series
Organizing raw data into a structured format for analysis.
Types of Statistical Series:
1. Individual Series: Data listed for each item separately (e.g., daily wages of 6 workers).
   Sr. no. of workers    Daily wages (in Rs.)
   1                     25
   2                     50
   3                     35
   4                     40
   5                     20
   6                     45
2. Discrete Frequency Distribution or frequency array:
o Data with specific values and their frequencies.
o Example: Number of books read by students.
Frequency array of the size of households
   Size of the household    Number of households (Frequency)
   1                        5
   2                        15
   3                        25
   4                        35
   5                        10
   6                        5
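A minimal sketch of how a frequency array can be built from raw data. The list of household sizes is hypothetical (12 made-up observations, not the data behind the table above), and `frequency_array` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw data: size of each of 12 surveyed households.
household_sizes = [3, 4, 2, 4, 1, 3, 4, 5, 3, 2, 4, 6]


def frequency_array(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)        # tally how often each value occurs
    return sorted(counts.items())   # order the rows by the variable's value


print("Size  Frequency")
for size, frequency in frequency_array(household_sizes):
    print(f"{size:>4}  {frequency:>9}")
```

The frequencies always add up to the number of observations, which is a quick check that no item was missed.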
3. Continuous Frequency Distribution:
o Grouped data into class intervals with frequencies.
o Example: Age group 10–20, 20–30, etc.
Frequency distribution or continuous series
   Marks    Number of students (Frequency)
   10–20    4
   20–30    5
   30–40    8
   40–50    5
   50–60    4
   60–70    3
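Grouping raw values into class intervals can be sketched as follows. The marks are hypothetical, and the helper `grouped_frequency` is an illustrative name. It assumes classes of equal width in which the upper limit of a class belongs to the next class, so a mark of 20 is counted in 20–30.

```python
def grouped_frequency(values, lower, width, n_classes):
    """Count values into equal-width classes [lower, lower + width), ...

    The upper limit of each class is excluded from it. Values outside
    the covered range are ignored in this sketch.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)  # which class the value falls in
        if 0 <= index < n_classes:
            counts[index] += 1
    return [
        ((lower + i * width, lower + (i + 1) * width), counts[i])
        for i in range(n_classes)
    ]


# Hypothetical raw marks of 12 students.
marks = [12, 18, 20, 25, 33, 35, 39, 41, 47, 52, 58, 65]
table = grouped_frequency(marks, lower=10, width=10, n_classes=6)

for (low, high), frequency in table:
    print(f"{low}-{high}: {frequency}")
```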
8. Types of Frequency Distribution
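1. Exclusive Series: The upper limit of a class is excluded from that class and counted
in the next one (e.g., 10–20, 20–30: a value of 20 falls in 20–30).
2. Inclusive Series: Both the lower and the upper limit are included in the class
(e.g., 10–19, 20–29).
3. Open-end Series: The lower limit of the first class, the upper limit of the last
class, or both are not given (e.g., below 10, above 70).
4. Cumulative Frequency Series: Shows the running total of frequencies up to each class.
5. Mid-value Frequency Series: Uses midpoints of classes with frequencies.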
9. Construction of Frequency Distribution Table
Key components:
• Class Intervals and Class Limits: Ranges into which data is grouped.
• Class Boundaries: Adjusted limits to remove gaps (especially in inclusive method).
• Mid-points (Class Marks): Central value of the class.
Mid-point = (Upper Limit + Lower Limit) / 2
• Frequency Column: Shows the number of observations per class.
• Cumulative Frequency:
o Less than type: Adding frequencies cumulatively from top to bottom.
o More than type: Adding frequencies cumulatively from bottom to top.
10. Class Related Terms
• Class Limits: Lowest and highest value in a class.
• Class Width or Size: Difference between the upper and lower limits of a class
(for an inclusive series, use the class boundaries).
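The class terms above can be worked out for an inclusive series as in the sketch below. The three classes are taken from the inclusive example (10–19, 20–29) with one more class added for illustration; the function names are illustrative. It assumes the usual adjustment of half the gap between consecutive classes to obtain class boundaries.

```python
def class_boundaries(lower, upper, gap):
    """Widen inclusive limits by half the gap between consecutive classes."""
    half = gap / 2
    return lower - half, upper + half


def mid_point(lower, upper):
    """Mid-point (class mark) = (upper limit + lower limit) / 2."""
    return (upper + lower) / 2


def class_width(lower, upper):
    """Difference between the upper and lower value of a class."""
    return upper - lower


inclusive_classes = [(10, 19), (20, 29), (30, 39)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = inclusive_classes[1][0] - inclusive_classes[0][1]

for lower, upper in inclusive_classes:
    lb, ub = class_boundaries(lower, upper, gap)
    print(f"limits {lower}-{upper}  boundaries {lb}-{ub}  "
          f"mid-point {mid_point(lower, upper)}  width {class_width(lb, ub)}")
```

The mid-point is the same whether it is computed from the class limits or from the class boundaries, but the width of an inclusive class comes out correctly only from the boundaries.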
11. Cumulative Frequency Distribution
• Less Than Cumulative Frequency:
Add frequencies from top to bottom.
(Used for the less-than ogive and for median calculation.)
• More Than Cumulative Frequency:
Add frequencies from bottom to top.
(Used for the more-than ogive.)
Use: Important for drawing ogives and calculating median graphically.
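Both cumulative columns can be computed from the marks table given earlier in this chapter (frequencies 4, 5, 8, 5, 4, 3). This is a minimal sketch using `itertools.accumulate` from the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [4, 5, 8, 5, 4, 3]

# Less-than type: running total from the top row to the bottom row.
less_than = list(accumulate(frequencies))

# More-than type: running total from the bottom row to the top row,
# then reversed so each total lines up with its own class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

print("Class   f   Less than (upper limit)   More than (lower limit)")
for label, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{label:<7} {f:<3} {lt:<25} {mt}")
```

The last less-than value and the first more-than value both equal the total number of students (29), which is a useful check on the addition.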
12. Guidelines for Forming a Frequency Distribution Table
• Use about 6 to 15 classes.
• Keep class widths equal wherever possible.
• Classes must not overlap: each observation should fall into exactly one class.
• Class intervals must be clear and meaningful.
• Provide appropriate headings and labels.