To understand the basic components of R programming and to explore
various types of data visualizations used in analytics.
# 1. Basic R Components
# ---------------------
# Variable declaration
a <- 10 # Numeric
b <- "R-Lab" # Character
c <- TRUE # Logical
d <- c(1, 2, 3, 4, 5) # Vector
# Display variable types
print(class(a)) # numeric
print(class(b)) # character
print(class(c)) # logical
print(class(d)) # numeric (vector)
# 2. Data Types
# -------------
# Numeric
num_var <- 12.5
print(num_var)
# Integer
int_var <- as.integer(10)
print(int_var)
# Logical
log_var <- FALSE
print(log_var)
# Character
char_var <- "Visualization"
print(char_var)
# Factor (categorical variable)
fact_var <- factor(c("Male", "Female", "Female", "Male"))
print(fact_var)
# Matrix
mat <- matrix(1:9, nrow=3)
print(mat)
# Data Frame
df <- data.frame(Name=c("A", "B"), Marks=c(90, 85))
print(df)
# 3. Overview of Visualization
# ----------------------------
# Visuals help us to see data trends, outliers, patterns
# We'll demonstrate:
# - Line Plot
# - Bar Plot
# - Histogram
# - Pie Chart
# - Boxplot
# - Scatter Plot
# 4. Basic Plotting Graphs
# ------------------------
# Line Plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis",
     main="Line Plot")
# Bar Plot
subjects <- c("Math", "Science", "English")
marks <- c(88, 75, 90)
barplot(marks, names.arg=subjects, col="skyblue",
        main="Bar Plot: Subject Marks")
# Histogram
data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)
hist(data, col="green", main="Histogram of Data", xlab="Values")
# Pie Chart
slices <- c(10, 20, 30, 40)
labels <- c("A", "B", "C", "D")
pie(slices, labels=labels, main="Pie Chart", col=rainbow(length(slices)))
# Boxplot
score <- c(65, 70, 75, 80, 90, 100, 50, 60)
boxplot(score, main="Boxplot of Scores", col="orange")
# Scatter Plot
x <- c(5, 10, 15, 20)
y <- c(2, 4, 8, 16)
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y", col="red", pch=19)
# 5. Types of Graphs in Analytics
# -------------------------------
# - Univariate plots (histogram, barplot, pie)
# - Bivariate plots (scatterplot, boxplot)
# - Multivariate plots (pairs, grouped barplot)
# - Time-series plots (line charts)
# - Distribution plots (density, histogram)
# Example: Multiple lines in a plot (time series)
time <- 1:5
sales_2023 <- c(100, 120, 130, 115, 140)
sales_2024 <- c(90, 110, 125, 135, 150)
plot(time, sales_2023, type="o", col="blue", ylim=c(80, 160),
xlab="Quarter", ylab="Sales", main="Sales Comparison")
lines(time, sales_2024, type="o", col="red")
legend("topleft", legend=c("2023", "2024"), col=c("blue", "red"), lty=1,
pch=1)
# Try using built-in datasets like:
data(mtcars)
head(mtcars)
# Then you can apply:
plot(mtcars$wt, mtcars$mpg, col="blue", main="Weight vs MPG")
What is R?
R is a powerful programming language and software environment
used for statistical computing and graphics.
It is open-source and widely used in data science, machine
learning, and academic research.
🔹 Why Use R for Data Visualization?
- Built-in support for graphics and plotting.
- Libraries like ggplot2, lattice, plotly, and shiny enhance visual capabilities.
- Supports interactive dashboards and visual storytelling.
Variables and Assignment
x <- 10 # assigns 10 to x
name <- "R Language"
Data Types in R
Type        Example               Description
Numeric     x <- 5.5              Decimal values
Integer     x <- as.integer(10)   Whole numbers
Character   name <- "Data"        Text data
Logical     flag <- TRUE          TRUE/FALSE values
Factor      factor(c("M", "F"))   Categorical values
Vector      c(1,2,3)              Sequence of elements
Matrix      matrix(1:6, 2, 3)     2D array of data
Data Frame  data.frame()          Table-like structure (rows/cols)
Overview of Data Visualization
🔹 What is Data Visualization?
The graphical representation of data and information using visual
elements like charts, graphs, and maps.
🔹 Importance:
- Simplifies complex data
- Highlights patterns and trends
- Aids in effective communication and decision making
Basic Graphs in R (Using Base Functions)
Graph Type    Purpose                                      Function
Line Plot     Show trends over time                        plot()
Bar Chart     Compare categories                           barplot()
Pie Chart     Show proportions                             pie()
Histogram     View frequency distribution                  hist()
Box Plot      View data distribution and outliers          boxplot()
Scatter Plot  Relationship between two numeric variables   plot()
Example: Line Plot
x <- 1:5
y <- c(5, 10, 15, 20, 25)
plot(x, y, type="o", col="blue", main="Line Plot", xlab="X", ylab="Y")
Graph Types in Analytics
✅ Univariate Graphs
Analyze a single variable
- Histogram, Pie Chart, Box Plot
✅ Bivariate Graphs
Compare two variables
- Scatter Plot, Bar Graph, Line Graph
✅ Multivariate Graphs
Analyze three or more variables
- Grouped Bar Plot, Bubble Chart, Faceted Plots
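As a rough sketch of the multivariate case, a grouped bar plot can be drawn in base R by passing a two-way table to barplot() with beside = TRUE. The choice of mtcars columns (cyl, gear, mpg, wt, hp), the colors, and the titles are assumptions made only for illustration.

```r
# Count cars for every combination of cylinders (rows) and gears (columns)
counts <- table(mtcars$cyl, mtcars$gear)

# beside = TRUE draws one bar per cylinder value inside each gear group
barplot(counts, beside = TRUE,
        col = c("skyblue", "orange", "lightgreen"),
        legend.text = rownames(counts),
        xlab = "Number of Gears", ylab = "Number of Cars",
        main = "Grouped Bar Plot: Cylinders within Gears")

# A scatter plot matrix is another quick multivariate view
pairs(mtcars[, c("mpg", "wt", "hp")], main = "Pairs Plot: mpg, wt, hp")
```

Without beside = TRUE the same table would be drawn as stacked bars, which makes it harder to compare the cylinder groups directly.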
Key Functions in Base R
Function   Use
plot()     General plotting
hist()     Histogram
barplot()  Bar chart
boxplot()  Boxplot
pie()      Pie chart
lines()    Add lines to a plot
legend()   Add legends to a plot
# Variable declaration
num <- 10 # Numeric
name <- "R Programming" # Character
flag <- TRUE # Logical
# Display values
print(num)
print(name)
print(flag)
Data Types in R
# Numeric
a <- 23.5
print(class(a)) # "numeric"
# Integer
b <- as.integer(23)
print(class(b)) # "integer"
# Character
c <- "Hello R"
print(class(c)) # "character"
# Logical
d <- FALSE
print(class(d)) # "logical"
# Vector
v <- c(1, 2, 3, 4, 5)
print(v)
print(class(v)) # "numeric"
# Factor
gender <- factor(c("Male", "Female", "Female", "Male"))
print(gender)
print(class(gender)) # "factor"
# Matrix
mat <- matrix(1:9, nrow=3)
print(mat)
# Data Frame
df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))
print(df)
Visualization = transforming raw data into visual insights
# This example shows how to use built-in plotting to understand trends
data <- c(12, 15, 20, 18, 25)
barplot(data, main="Sample Bar Chart", col="steelblue")
Basics of Plotting Graphs in R
# Line Plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis",
     main="Line Plot")
# Bar Plot
subjects <- c("Math", "Science", "English")
scores <- c(80, 90, 70)
barplot(scores, names.arg=subjects, col="green",
        main="Bar Chart of Subjects")
# Histogram
data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)
hist(data, col="purple", main="Histogram of Values", xlab="Value")
# Pie Chart
slices <- c(25, 35, 40)
labels <- c("A", "B", "C")
pie(slices, labels=labels, col=rainbow(length(slices)), main="Pie Chart")
# Boxplot
marks <- c(55, 60, 65, 70, 90, 85, 45, 77)
boxplot(marks, main="Boxplot of Marks", col="orange")
# Scatter Plot
wt <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
Different Types of Graphs in Analytics (Summary with Examples)
# Univariate: Histogram (distribution of a single variable)
hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")
# Bivariate: Boxplot (continuous vs categorical)
boxplot(mpg ~ cyl, data=mtcars, main="Bivariate - MPG vs Cylinders",
col="yellow")
# Bivariate: Scatter plot (2 continuous variables)
plot(mtcars$hp, mtcars$mpg, main="HP vs MPG", xlab="Horsepower",
ylab="MPG", col="blue", pch=16)
# Multivariate: Colored scatter plot with shape by gear
library(ggplot2)
# Each "+" must end a line; a line that starts with "+" is parsed as a new expression
ggplot(mtcars, aes(x=wt, y=mpg, color=factor(cyl), shape=factor(gear))) +
  geom_point(size=3) +
  ggtitle("Multivariate: Weight vs MPG by Cylinders and Gears")
Description of the Command
🔹 Function: hist()
This function is used to create a histogram — a graphical
representation that organizes a group of data points into user-
specified ranges (bins).
It helps you visualize the frequency distribution of a continuous
numeric variable.
Component-by-Component Explanation
Component                              Meaning
mtcars$mpg                             Refers to the mpg (Miles Per Gallon) column in the built-in mtcars dataset.
hist(mtcars$mpg)                       Plots a histogram for the mpg values.
main="Univariate - MPG Distribution"   Adds a title to the plot.
col="skyblue"                          Fills the bars of the histogram with the color "skyblue".
hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")
What This Graph Shows
X-axis: Represents MPG intervals (e.g., 10–15, 15–20, etc.).
Y-axis: Represents the frequency — how many cars fall within each
MPG range.
You can visually see:
o Whether the data is skewed or symmetric
o Where most values are concentrated (central tendency)
o Any possible outliers
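The intervals and frequencies described above can also be read off numerically. The sketch below assumes the default binning chosen by hist(); the variable names h and busiest are illustrative only.

```r
# plot = FALSE returns the histogram object without drawing it
h <- hist(mtcars$mpg, plot = FALSE)

print(h$breaks)  # bin boundaries shown on the X-axis
print(h$counts)  # number of cars in each bin (bar heights on the Y-axis)

# The bin with the largest count is where most values are concentrated
busiest <- which.max(h$counts)
cat("Most cars fall between", h$breaks[busiest], "and",
    h$breaks[busiest + 1], "MPG\n")
```

Passing a different breaks argument to hist() changes the bins, so the counts depend on that choice.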
wt <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
This code creates a scatter plot showing the relationship between the
weight of vehicles (wt) and their corresponding miles per gallon
(mpg).
Line-by-Line Explanation
🔹 wt <- c(2, 2.5, 3, 3.5, 4)
Creates a numeric vector wt with 5 values.
These could represent weights of cars in tons (or another unit).
🔹 mpg <- c(35, 30, 28, 22, 20)
Creates another numeric vector mpg with corresponding Miles Per
Gallon values.
Each value in mpg is related to the respective value in wt.
Weight (wt)   MPG
2.0           35
2.5           30
3.0           28
3.5           22
4.0           20
plot(wt, mpg, ...)
This command plots the values:
Parameter   Description
wt, mpg     x = wt, y = mpg → plots weight on X-axis, MPG on Y-axis.
main        "Scatter Plot: Weight vs MPG" → adds a title.
xlab        "Weight" → label for the X-axis.
ylab        "MPG" → label for the Y-axis.
col="red"   Plots points in red.
pch=19      Uses a solid circle for plotting points.
What the Scatter Plot Shows
As weight increases, MPG decreases.
This represents a negative correlation between vehicle weight
and fuel efficiency.
Heavier cars tend to be less fuel-efficient.
Interpretation
This kind of visualization is important in automobile analytics, where we
assess how one feature (e.g., weight) affects another (e.g., fuel
efficiency).
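The strength of the negative relationship can be expressed as a single number. A minimal sketch, using the same five illustrative data points and the Pearson correlation computed by cor():

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)

# Pearson correlation: values near -1 mean a strong negative linear relationship
r <- cor(wt, mpg)
print(round(r, 3))  # about -0.988
```

With only five hand-picked points, this number illustrates the idea rather than supporting any conclusion about real vehicles.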
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
abline(lm(mpg ~ wt), col="blue", lwd=2)
What This Script Does
This code creates a scatter plot of Weight vs MPG and adds a
regression (trend) line that represents the best linear fit for the data.
Parameter    Description
wt           X-axis values: car weights
mpg          Y-axis values: miles per gallon
main         Title of the graph
xlab, ylab   Axis labels
col="red"    Points are colored red
pch=19       Points are plotted as solid circles
abline(lm(mpg ~ wt), col="blue", lwd=2)
Adds a regression (trend) line to the plot.
Component      Description
lm(mpg ~ wt)   Fits a linear regression model where mpg is predicted using wt.
abline(...)    Adds the regression line to the scatter plot.
col="blue"     The line is colored blue.
lwd=2          Line width is set to 2 (thicker line for better visibility).
Purpose: Helps visualize the trend. In this case, it shows how MPG decreases
as weight increases.
Output Interpretation
If the trend line is downward-sloping:
Conclusion: Heavier vehicles typically have lower fuel efficiency.
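Whether the line slopes downward can be checked from the fitted coefficients. The sketch below refits the same model on the five illustrative points; the names fit and the prediction at wt = 3 are added here only as an example.

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)

# Fit mpg as a linear function of wt
fit <- lm(mpg ~ wt)

# First value is the intercept, second is the slope of the trend line
print(coef(fit))  # intercept 49.8, slope -7.6

# A negative slope means the line is downward-sloping
print(coef(fit)[["wt"]] < 0)  # TRUE

# Predicted MPG for a weight of 3 (same units as wt)
print(predict(fit, newdata = data.frame(wt = 3)))  # 27
```

The slope says that, for this toy data, each additional unit of weight is associated with about 7.6 fewer miles per gallon.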
plot(mtcars$mpg, mtcars$wt, col='steelblue',
main='Scatterplot', xlab='mpg', ylab='wt', pch=19)
This line of code creates a scatter plot to visualize the relationship
between:
mpg (Miles Per Gallon) — fuel efficiency
wt (Weight of the car in 1000 lbs)
using the built-in mtcars dataset in R.
Component            Description
plot()               The R function used to create scatter plots or other graphs
mtcars$mpg           Values for the X-axis (Miles Per Gallon)
mtcars$wt            Values for the Y-axis (Weight in 1000 lbs)
col='steelblue'      Sets the color of the plotted points to "steelblue"
main='Scatterplot'   Sets the main title of the graph
xlab='mpg'           Labels the X-axis as "mpg"
ylab='wt'            Labels the Y-axis as "wt"
pch=19               Plots the points using solid filled circles
The scatter plot consists of points where:
X-axis = mpg (fuel efficiency)
Y-axis = wt (vehicle weight)
It helps visualize the relationship between a car's weight and its
fuel efficiency.
Scatter plots are ideal to:
Detect correlation between variables.
Spot outliers.
Visualize linear/nonlinear patterns.
abline(lm(mtcars$wt ~ mtcars$mpg), col='red', lwd=2)
This draws a line showing the overall trend between mpg and wt.
Specifically, it shows the linear relationship between mpg (miles per
gallon) and wt (weight of the vehicle) from the mtcars dataset.
Component                    Description
lm(mtcars$wt ~ mtcars$mpg)   Fits a linear regression model where wt is predicted by mpg.
abline(...)                  Draws a straight line using the coefficients of the regression model.
col='red'                    Colors the line red.
lwd=2                        Sets the line width to 2 (makes it thicker for visibility).
What the Regression Line Represents
The regression line is the best-fit straight line through the data
points.
It shows the trend or direction of the relationship between mpg
and wt.
Interpretation:
As mpg increases, wt decreases.
This indicates a negative linear relationship: cars that are more
fuel efficient tend to weigh less.
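The same fit can be written with the data argument instead of repeating mtcars$, and the sign of the relationship can be checked numerically. This is a sketch under the assumption that the unmodified built-in mtcars dataset is used; the names r, fit, and slope are illustrative.

```r
# Correlation between fuel efficiency and weight (negative if they move in opposite directions)
r <- cor(mtcars$mpg, mtcars$wt)
print(r)

# Equivalent model using the data argument
fit <- lm(wt ~ mpg, data = mtcars)
slope <- coef(fit)[["mpg"]]
print(slope)  # change in weight (1000 lbs) per additional mile per gallon

# Scatter plot and regression line from the same formula
plot(wt ~ mpg, data = mtcars, col = "steelblue", pch = 19,
     main = "Scatterplot", xlab = "mpg", ylab = "wt")
abline(fit, col = "red", lwd = 2)
```

Using the formula with data = mtcars keeps the plot and the model in sync, since both refer to the same columns.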
What is the Use of a Boxplot in R (and Data Analysis)?
A boxplot (also known as a box-and-whisker plot) is a graphical
summary of the distribution of a dataset. It visually displays a five-
number summary:
1. Minimum
2. First Quartile (Q1)
3. Median (Q2)
4. Third Quartile (Q3)
5. Maximum
🎯 Main Uses of a Boxplot
1. Visualizing Distribution
Shows how the data is spread out: whether it is symmetric,
skewed left/right, or has outliers.
2. Identifying the Median
The line inside the box represents the median, which is the
central value of the dataset.
3. Detecting Skewness
If the median is closer to the bottom or top of the box, the data
is skewed.
o Closer to the bottom = right-skewed (longer upper tail)
o Closer to the top = left-skewed (longer lower tail)
4. Spotting Outliers
Data points that fall outside the whiskers are considered outliers
and are shown as individual dots.
5. Comparing Groups
When plotting multiple boxplots side-by-side, you can compare
distributions across categories (e.g., scores of boys vs girls,
sales in different regions, etc.).
# Create sample data
scores <- c(55, 60, 65, 70, 90, 85, 45, 77)
# Plot boxplot
boxplot(scores, main="Boxplot of Scores", col="lightblue", ylab="Score")
This will show:
The central 50% of scores (interquartile range)
The median score
Any outliers in the data
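The numbers behind the picture can be inspected with boxplot.stats(), which returns the values the boxplot is drawn from. A minimal sketch on the same sample scores; the variable name s is illustrative.

```r
scores <- c(55, 60, 65, 70, 90, 85, 45, 77)

s <- boxplot.stats(scores)

# Lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
print(s$stats)  # 45.0 57.5 67.5 81.0 90.0

# Points beyond the whiskers (more than 1.5 * IQR from the box by default)
print(s$out)    # numeric(0): this sample has no outliers
```

Because no score lies more than 1.5 times the interquartile range beyond the box, the whiskers here coincide with the minimum and maximum.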
Why Boxplots are Important in Analytics
Quickly detect variability, outliers, and data symmetry.
Preferred in exploratory data analysis (EDA).
Useful when comparing large datasets or categories
✅ R Script: Side-by-Side Boxplots
# Load built-in dataset
data(mtcars)
# Convert 'cyl' to a categorical variable
mtcars$cyl <- as.factor(mtcars$cyl)
# Create side-by-side boxplots
boxplot(mpg ~ cyl, data = mtcars,
main = "MPG Distribution by Number of Cylinders",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon (MPG)",
col = c("skyblue", "orange", "lightgreen"))
Component          Description
mpg ~ cyl          Formula format: Compare mpg across different cyl values (4, 6, 8)
data = mtcars      Use the mtcars dataset
boxplot(...)       Draws the grouped boxplots
col = ...          Adds different colors to each box
main, xlab, ylab   Add labels and title for readability
What the Boxplot Shows
Each box shows the distribution of MPG for a specific cylinder category.
You can compare:
Median fuel efficiency
Variation in MPG
Presence of outliers
You'll likely see:
4-cylinder cars have the highest median MPG, but also the widest spread.
8-cylinder cars have the lowest MPG, clustered in a narrower range.
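One way to check these visual impressions is to compute a summary per cylinder group with tapply(). This sketch assumes the built-in mtcars dataset; med and spread are illustrative names.

```r
# Median MPG for each cylinder group (names are "4", "6", "8")
med <- tapply(mtcars$mpg, mtcars$cyl, median)
print(med)

# Standard deviation of MPG for each group, as a measure of spread
spread <- tapply(mtcars$mpg, mtcars$cyl, sd)
print(round(spread, 2))
```

Comparing med across groups corresponds to comparing the lines inside the boxes, while spread relates to how tall each box and its whiskers are.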
The operator ~ used in the expression mpg ~ cyl is called the tilde
operator and it has a special meaning in R, especially in formulas for
statistical modeling and plotting.
Operator: ~ (Tilde)
✅ Used for:
Creating a formula object that defines a relationship between variables.
In Context: mpg ~ cyl
🔹 Meaning:
"mpg is modeled as a function of cyl"
mpg is the dependent variable (Y-axis)
cyl is the independent variable or grouping factor (X-axis)
📘 In the boxplot function:
boxplot(mpg ~ cyl, data = mtcars)
This tells R to:
Group the mpg values based on each unique value of cyl
Then create a separate boxplot for each cylinder group (4, 6, 8)
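The grouping performed by the formula can be made explicit. The sketch below builds the formula object, then reproduces the grouping by hand with split(); the names f and groups are illustrative.

```r
# A formula is an ordinary R object that records variable names
f <- mpg ~ cyl
print(class(f))     # "formula"
print(all.vars(f))  # "mpg" "cyl"

# Manual equivalent of the grouping: one vector of mpg values per cyl value
groups <- split(mtcars$mpg, mtcars$cyl)
print(sapply(groups, length))  # 11 cars with 4 cylinders, 7 with 6, 14 with 8

# Passing the list to boxplot() draws one box per group
boxplot(groups, main = "MPG by Cylinders (manual grouping)",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon (MPG)")
```

The formula version is shorter and also labels the axes from the variable names, which is why it is the usual choice.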
data(mtcars)
boxplot(disp ~ gear, data = mtcars,
main = "Displacement by Gear",
xlab = "Gear",
ylab = "Displacement")
This code creates a boxplot that shows the distribution of engine
displacement (disp) grouped by gear categories (gear) in the mtcars
dataset.
Line-by-Line Explanation
🔹 data(mtcars)
Loads the built-in mtcars dataset.
It contains information about various car attributes such as:
disp = engine displacement (in cubic inches)
gear = number of forward gears (3, 4, or 5)
This creates side-by-side boxplots of the disp variable for each unique
value in gear.
Element Description
disp ~ gear Formula: Plot displacement (disp) grouped by gear
data = mtcars Use the mtcars dataset
main Title of the boxplot
xlab, ylab Axis labels for clarity
What the Boxplot Shows
X-axis (gear): Different gear groups — typically 3, 4, and 5 gears.
Y-axis (disp): The engine displacement (how big the engine is).
Each boxplot summarizes the distribution of displacement for each gear
group.
Gear = 3 → Highest Displacement
The boxplot shows that cars with 3 forward gears have a higher median
displacement, and the range (box + whiskers) is spread over larger engine
sizes.
This suggests that older or heavier vehicles, such as classic muscle cars or
luxury sedans, often come with:
Fewer gears
Larger engines
Less fuel efficiency
These cars are designed more for power and torque than for speed or
agility.
Gear = 5 → Lower Displacement
Cars with 5 gears show a lower median displacement than 3-gear cars, but
the spread is wide because this small group mixes small and large engines.
This typically represents:
Modern, performance-tuned vehicles
Compact sports cars
Fuel-efficient or economy-class cars
These engines may be smaller in size, but paired with more gears for
better performance, speed, and efficiency.
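The reading of the boxplot can be cross-checked against per-group summaries. This sketch assumes the built-in mtcars dataset; med, rng, and n are illustrative names.

```r
# Median displacement for each gear group (names are "3", "4", "5")
med <- tapply(mtcars$disp, mtcars$gear, median)
print(med)

# Smallest and largest displacement in each group, to judge the spread
rng <- tapply(mtcars$disp, mtcars$gear, range)
print(rng)

# Number of cars per group: small groups give less reliable boxplots
n <- table(mtcars$gear)
print(n)
```

The group sizes matter when interpreting the plot, since a box built from only a handful of cars can change shape a lot with one unusual vehicle.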