Week 1 Basics

Uploaded by

Mr.Dracula1989

To understand the basic components of R programming and to explore various types of data visualizations used in analytics.

# 1. Basic R Components

# ---------------------

# Variable declaration

a <- 10 # Numeric

b <- "R-Lab" # Character

c <- TRUE # Logical

d <- c(1, 2, 3, 4, 5) # Vector

# Display variable types

print(class(a)) # numeric

print(class(b)) # character

print(class(c)) # logical

print(class(d)) # numeric (vector)

# 2. Data Types

# -------------

# Numeric

num_var <- 12.5

print(num_var)

# Integer

int_var <- as.integer(10)

print(int_var)
# Logical

log_var <- FALSE

print(log_var)

# Character

char_var <- "Visualization"

print(char_var)

# Factor (categorical variable)

fact_var <- factor(c("Male", "Female", "Female", "Male"))

print(fact_var)

# Matrix

mat <- matrix(1:9, nrow=3)

print(mat)

# Data Frame

df <- data.frame(Name=c("A", "B"), Marks=c(90, 85))

print(df)
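
The data types above can be checked in one pass with class(). The following is a minimal sketch; the list name values and the use of the 10L integer literal are illustrative assumptions, not part of the original script.

```r
# Collect one example of each basic type in a named list (names are illustrative)
values <- list(
  numeric   = 12.5,
  integer   = 10L,              # the L suffix creates an integer directly
  logical   = FALSE,
  character = "Visualization",
  factor    = factor(c("Male", "Female", "Female", "Male"))
)

# class() reports the type of each element
classes <- sapply(values, class)
print(classes)

# Matrices and data frames also carry dimensions
mat <- matrix(1:9, nrow = 3)
print(dim(mat))                 # rows, then columns

df <- data.frame(Name = c("A", "B"), Marks = c(90, 85))
str(df)                         # compact summary of each column's type
```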

# 3. Overview of Visualization

# ----------------------------

# Visuals help us to see data trends, outliers, patterns

# We'll demonstrate:

# - Line Plot

# - Bar Plot

# - Histogram
# - Pie Chart

# - Boxplot

# - Scatter Plot

# 4. Basic Plotting Graphs

# ------------------------

# Line Plot

x <- c(1, 2, 3, 4, 5)

y <- c(2, 4, 6, 8, 10)

plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis", main="Line Plot")

# Bar Plot

subjects <- c("Math", "Science", "English")

marks <- c(88, 75, 90)

barplot(marks, names.arg=subjects, col="skyblue", main="Bar Plot: Subject Marks")

# Histogram

data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)

hist(data, col="green", main="Histogram of Data", xlab="Values")

# Pie Chart

slices <- c(10, 20, 30, 40)

labels <- c("A", "B", "C", "D")

pie(slices, labels=labels, main="Pie Chart", col=rainbow(length(slices)))

# Boxplot

score <- c(65, 70, 75, 80, 90, 100, 50, 60)
boxplot(score, main="Boxplot of Scores", col="orange")

# Scatter Plot

x <- c(5, 10, 15, 20)

y <- c(2, 4, 8, 16)

plot(x, y, main="Scatter Plot", xlab="X", ylab="Y", col="red", pch=19)

# 5. Types of Graphs in Analytics

# -------------------------------

# - Univariate plots (histogram, barplot, pie)

# - Bivariate plots (scatterplot, boxplot)

# - Multivariate plots (pairs, grouped barplot)

# - Time-series plots (line charts)

# - Distribution plots (density, histogram)

# Example: Multiple lines in a plot (time series)

time <- 1:5

sales_2023 <- c(100, 120, 130, 115, 140)

sales_2024 <- c(90, 110, 125, 135, 150)

plot(time, sales_2023, type="o", col="blue", ylim=c(80, 160),

xlab="Quarter", ylab="Sales", main="Sales Comparison")

lines(time, sales_2024, type="o", col="red")

legend("topleft", legend=c("2023", "2024"), col=c("blue", "red"), lty=1, pch=1)
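
The list of graph types above also names density plots and pairs plots, which the examples so far do not show. The following is a minimal sketch using the built-in mtcars dataset; the choice of the mpg, wt, and hp columns is an assumption made for illustration.

```r
# Distribution plot: kernel density estimate of MPG
mpg_density <- density(mtcars$mpg)
plot(mpg_density, main = "Density of MPG", xlab = "MPG")

# Multivariate plot: scatter plot matrix of three numeric columns
pairs(mtcars[, c("mpg", "wt", "hp")], main = "Pairs Plot: mpg, wt, hp")
```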

Try using built-in datasets like:


data(mtcars)

head(mtcars)

Then you can apply:

plot(mtcars$wt, mtcars$mpg, col="blue", main="Weight vs MPG")

What is R?

• R is a powerful programming language and software environment used for statistical computing and graphics.

• It is open-source and widely used in data science, machine learning, and academic research.

🔹 Why Use R for Data Visualization?

• Built-in support for graphics and plotting.

• Libraries like ggplot2, lattice, plotly, and shiny enhance visual capabilities.

• Supports interactive dashboards and visual storytelling.
Variables and Assignment

x <- 10 # assigns 10 to x

name <- "R Language"

Data Types in R

Type         Example                 Description
Numeric      x <- 5.5                Decimal values
Integer      x <- as.integer(10)     Whole numbers
Character    name <- "Data"          Text data
Logical      flag <- TRUE            TRUE/FALSE values
Factor       factor(c("M", "F"))     Categorical values
Vector       c(1,2,3)                Sequence of elements
Matrix       matrix(1:6, 2, 3)       2D array of data
Data Frame   data.frame()            Table-like structure (rows/cols)

Overview of Data Visualization

🔹 What is Data Visualization?

• The graphical representation of data and information using visual elements like charts, graphs, and maps.

🔹 Importance:

• Simplifies complex data

• Highlights patterns and trends

• Aids in effective communication and decision making

Basic Graphs in R (Using Base Functions)

Graph Type     Purpose                                       Function
Line Plot      Show trends over time                         plot()
Bar Chart      Compare categories                            barplot()
Pie Chart      Show proportions                              pie()
Histogram      View frequency distribution                   hist()
Box Plot       View data distribution and outliers           boxplot()
Scatter Plot   Relationship between two numeric variables    plot()

Example: Line Plot

x <- 1:5

y <- c(5, 10, 15, 20, 25)

plot(x, y, type="o", col="blue", main="Line Plot", xlab="X", ylab="Y")

Graph Types in Analytics

✅ Univariate Graphs

• Analyze a single variable

o Histogram, Pie Chart, Box Plot

✅ Bivariate Graphs

• Compare two variables

o Scatter Plot, Bar Graph, Line Graph

✅ Multivariate Graphs

• Analyze three or more variables

o Grouped Bar Plot, Bubble Chart, Faceted Plots
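
A grouped bar plot is named above but not demonstrated. The following is a minimal sketch using base R and the built-in mtcars dataset; counting cars by cylinders and gears is an assumed example, chosen only because both columns are categorical.

```r
# Cross-tabulate cylinders (rows) against gears (columns)
counts <- table(mtcars$cyl, mtcars$gear)
print(counts)

# beside = TRUE draws one bar per cylinder group within each gear category
barplot(counts, beside = TRUE,
        col = c("skyblue", "orange", "lightgreen"),
        legend.text = paste(rownames(counts), "cyl"),
        xlab = "Number of Gears", ylab = "Number of Cars",
        main = "Cars by Gears and Cylinders")
```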

Key Functions in Base R

Function    Use
plot()      General plotting
hist()      Histogram
barplot()   Bar chart
boxplot()   Boxplot
pie()       Pie chart
lines()     Add lines to a plot
legend()    Add legends to a plot

# Variable declaration

num <- 10 # Numeric

name <- "R Programming" # Character

flag <- TRUE # Logical

# Display values

print(num)

print(name)

print(flag)

Data Types in R

# Numeric

a <- 23.5

print(class(a)) # "numeric"

# Integer

b <- as.integer(23)

print(class(b)) # "integer"

# Character

c <- "Hello R"

print(class(c)) # "character"

# Logical

d <- FALSE
print(class(d)) # "logical"

# Vector

v <- c(1, 2, 3, 4, 5)

print(v)

print(class(v)) # "numeric"

# Factor

gender <- factor(c("Male", "Female", "Female", "Male"))

print(gender)

print(class(gender)) # "factor"

# Matrix

mat <- matrix(1:9, nrow=3)

print(mat)

# Data Frame

df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))

print(df)

Visualization = transforming raw data into visual insights

# This example shows how to use built-in plotting to understand trends

data <- c(12, 15, 20, 18, 25)

barplot(data, main="Sample Bar Chart", col="steelblue")

Basics of Plotting Graphs in R

# Line Plot

x <- c(1, 2, 3, 4, 5)

y <- c(2, 4, 6, 8, 10)


plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis", main="Line Plot")

# Bar Plot

subjects <- c("Math", "Science", "English")

scores <- c(80, 90, 70)

barplot(scores, names.arg=subjects, col="green", main="Bar Chart of Subjects")

# Histogram

data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)

hist(data, col="purple", main="Histogram of Values", xlab="Value")

# Pie Chart

slices <- c(25, 35, 40)

labels <- c("A", "B", "C")

pie(slices, labels=labels, col=rainbow(length(slices)), main="Pie Chart")

# Boxplot

marks <- c(55, 60, 65, 70, 90, 85, 45, 77)

boxplot(marks, main="Boxplot of Marks", col="orange")

# Scatter Plot

wt <- c(2, 2.5, 3, 3.5, 4)

mpg <- c(35, 30, 28, 22, 20)

plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
     ylab="MPG", col="red", pch=19)

Different Types of Graphs in Analytics (Summary with Examples)

# Univariate: Histogram (distribution of a single variable)

hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")


# Bivariate: Boxplot (continuous vs categorical)

boxplot(mpg ~ cyl, data=mtcars, main="Bivariate - MPG vs Cylinders",
        col="yellow")

# Bivariate: Scatter plot (2 continuous variables)

plot(mtcars$hp, mtcars$mpg, main="HP vs MPG", xlab="Horsepower",
     ylab="MPG", col="blue", pch=16)

# Multivariate: Colored scatter plot with shape by gear

library(ggplot2)

# Each + must end a line; a line that starts with + would not be attached to the plot
ggplot(mtcars, aes(x=wt, y=mpg, color=factor(cyl), shape=factor(gear))) +
  geom_point(size=3) +
  ggtitle("Multivariate: Weight vs MPG by Cylinders and Gears")

Description of the Command

🔹 Function: hist()

• This function is used to create a histogram: a graphical representation that organizes a group of data points into user-specified ranges (bins).

• It helps you visualize the frequency distribution of a continuous numeric variable.

Component-by-Component Explanation

Component                               Meaning
mtcars$mpg                              Refers to the mpg (Miles Per Gallon) column in the built-in mtcars dataset.
hist(mtcars$mpg)                        Plots a histogram for the mpg values.
main="Univariate - MPG Distribution"    Adds a title to the plot.
col="skyblue"                           Fills the bars of the histogram with the color "skyblue".

hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")

What This Graph Shows

• X-axis: Represents MPG intervals (e.g., 10–15, 15–20, etc.).

• Y-axis: Represents the frequency, that is, how many cars fall within each MPG range.

• You can visually see:

o Whether the data is skewed or symmetric

o Where most values are concentrated (central tendency)

o Any possible outliers
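
The bins and frequencies behind the histogram can also be inspected as numbers. The following is a minimal sketch; it relies on the plot = FALSE argument of hist(), and the variable name h is an illustrative assumption.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, plot = FALSE)

print(h$breaks)   # bin boundaries along the X-axis
print(h$counts)   # number of cars in each bin (the bar heights)

# Comparing mean and median gives a rough numeric hint about skewness:
# a mean above the median suggests a longer right tail
print(mean(mtcars$mpg))
print(median(mtcars$mpg))
```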

wt <- c(2, 2.5, 3, 3.5, 4)

mpg <- c(35, 30, 28, 22, 20)

plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
     ylab="MPG", col="red", pch=19)

This code creates a scatter plot showing the relationship between the
weight of vehicles (wt) and their corresponding miles per gallon
(mpg).

Line-by-Line Explanation

🔹 wt <- c(2, 2.5, 3, 3.5, 4)

• Creates a numeric vector wt with 5 values.

• These could represent weights of cars in tons (or another unit).

🔹 mpg <- c(35, 30, 28, 22, 20)

• Creates another numeric vector mpg with the corresponding Miles Per Gallon values.

• Each value in mpg is paired with the value at the same position in wt.

Weight (wt)   MPG
2.0           35
2.5           30
3.0           28
3.5           22
4.0           20

plot(wt, mpg, ...)

This command plots the values:

Parameter   Description
wt, mpg     x = wt, y = mpg → plots weight on the X-axis, MPG on the Y-axis.
main        "Scatter Plot: Weight vs MPG" → adds a title.
xlab        "Weight" → label for the X-axis.
ylab        "MPG" → label for the Y-axis.
col="red"   Plots points in red.
pch=19      Uses a solid circle for plotting points.

What the Scatter Plot Shows

• As weight increases, MPG decreases.

• This represents a negative correlation between vehicle weight and fuel efficiency.

• Heavier cars tend to be less fuel-efficient.
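
The negative correlation described above can be confirmed numerically with cor(). The following is a minimal sketch on the same five sample points; the variable name r is an illustrative assumption.

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)

# Pearson correlation coefficient: ranges from -1 to 1
r <- cor(wt, mpg)
print(r)   # negative and close to -1 for this sample
```

A value near -1 indicates a strong negative linear relationship, a value near 0 indicates little linear relationship, and a value near 1 indicates a strong positive one.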


Interpretation

This kind of visualization is important in automobile analytics, where we assess how one feature (e.g., weight) affects another (e.g., fuel efficiency).

plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
     ylab="MPG", col="red", pch=19)

abline(lm(mpg ~ wt), col="blue", lwd=2)

What This Script Does

This code creates a scatter plot of Weight vs MPG and adds a regression (trend) line that represents the best linear fit for the data.

Parameter    Description
wt           X-axis values: car weights
mpg          Y-axis values: miles per gallon
main         Title of the graph
xlab, ylab   Axis labels
col="red"    Points are colored red
pch=19       Points are plotted as solid circles

abline(lm(mpg ~ wt), col="blue", lwd=2)

Adds a regression (trend) line to the plot.

Component      Description
lm(mpg ~ wt)   Fits a linear regression model where mpg is predicted using wt.
abline(...)    Adds the regression line to the scatter plot.
col="blue"     The line is colored blue.
lwd=2          Line width is set to 2 (thicker line for better visibility).

Purpose: Helps visualize the trend, in this case how MPG decreases as weight increases.

Output Interpretation

If the trend line is downward-sloping:

• Conclusion: Heavier vehicles typically have lower fuel efficiency.
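
Whether the trend line slopes downward can be read directly from the fitted model instead of from the picture. The following is a minimal sketch on the five sample points used above; the names fit and new_wt and the weight value 3.2 are hypothetical, chosen only for illustration.

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)

# Fit the same model that abline() draws
fit <- lm(mpg ~ wt)

# The first coefficient is the intercept, the second is the slope;
# a negative slope means MPG falls as weight rises
print(coef(fit))

# Predict MPG for a hypothetical car weight
new_wt <- data.frame(wt = 3.2)
print(predict(fit, newdata = new_wt))
```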

plot(mtcars$mpg, mtcars$wt, col='steelblue',

main='Scatterplot', xlab='mpg', ylab='wt', pch=19)

This line of code creates a scatter plot to visualize the relationship between:

• mpg (Miles Per Gallon): fuel efficiency

• wt (Weight of the car in 1000 lbs)

using the built-in mtcars dataset in R.

Component            Description
plot()               The R function used to create scatter plots or other graphs
mtcars$mpg           Values for the X-axis (Miles Per Gallon)
mtcars$wt            Values for the Y-axis (Weight in 1000 lbs)
col='steelblue'      Sets the color of the plotted points to "steelblue"
main='Scatterplot'   Sets the main title of the graph
xlab='mpg'           Labels the X-axis as "mpg"
ylab='wt'            Labels the Y-axis as "wt"
pch=19               Plots the points using solid filled circles

• The scatter plot consists of points where:

• X-axis = mpg (fuel efficiency)

• Y-axis = wt (vehicle weight)

• It helps visualize the relationship between a car's weight and its fuel efficiency.

Scatter plots are ideal to:

• Detect correlation between variables.

• Spot outliers.

• Visualize linear/nonlinear patterns.

abline(lm(mtcars$wt ~ mtcars$mpg), col='red', lwd=2)

This draws a line showing the overall trend between mpg and wt. Specifically, it shows the linear relationship between mpg (miles per gallon) and wt (weight of the vehicle) from the mtcars dataset.

Component                    Description
lm(mtcars$wt ~ mtcars$mpg)   Fits a linear regression model where wt is predicted by mpg.
abline(...)                  Draws a straight line using the coefficients of the regression model.
col='red'                    Colors the line red.
lwd=2                        Sets the line width to 2 (makes it thicker for visibility).

What the Regression Line Represents

• The regression line is the best-fit straight line through the data points.

• It shows the trend or direction of the relationship between mpg and wt.

Interpretation:

• As mpg increases, wt decreases.

• This indicates a negative linear relationship: cars that are more fuel efficient tend to weigh less.
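
Note that this section regresses wt on mpg, while the earlier example regressed mpg on wt. The two fits are different lines, although both slopes share the sign of the correlation. The following is a minimal sketch comparing them; all variable names are illustrative assumptions.

```r
# Regress weight on MPG (as in this section) and MPG on weight (as earlier)
fit_wt_on_mpg <- lm(wt ~ mpg, data = mtcars)
fit_mpg_on_wt <- lm(mpg ~ wt, data = mtcars)

slope_a <- coef(fit_wt_on_mpg)[["mpg"]]   # change in wt per unit of mpg
slope_b <- coef(fit_mpg_on_wt)[["wt"]]    # change in mpg per unit of wt
r <- cor(mtcars$mpg, mtcars$wt)

print(c(slope_a = slope_a, slope_b = slope_b, r = r))

# For simple linear regression, the product of the two slopes equals r squared
print(slope_a * slope_b)
print(r^2)
```

The response variable goes on the left of the formula, so the choice between wt ~ mpg and mpg ~ wt should match the variable plotted on the Y-axis.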

What is the Use of a Boxplot in R (and Data Analysis)?

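A boxplot (also known as a box-and-whisker plot) is a graphical summary of the distribution of a dataset. It visually displays a five-number summary, listed below. In R's boxplot(), the whiskers by default extend only to the most extreme points within 1.5 times the interquartile range of the box, so when outliers are present the whisker ends are not the raw minimum and maximum.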

1. Minimum

2. First Quartile (Q1)

3. Median (Q2)

4. Third Quartile (Q3)

5. Maximum
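
The five numbers can be computed directly with fivenum(), which returns the same hinges that boxplot() uses to draw the box. The following is a minimal sketch on the sample scores used in this document; the variable name five is an illustrative assumption.

```r
scores <- c(55, 60, 65, 70, 90, 85, 45, 77)

# Returns: minimum, lower hinge (Q1), median, upper hinge (Q3), maximum
five <- fivenum(scores)
print(five)   # 45.0 57.5 67.5 81.0 90.0
```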

🎯 Main Uses of a Boxplot


1. Visualizing Distribution

• Shows how the data is spread out: whether it is symmetric, skewed left/right, or has outliers.

2. Identifying the Median

• The line inside the box represents the median, which is the central value of the dataset.

3. Detecting Skewness

• If the median is closer to the bottom or top of the box, the data is skewed.

o Median near the bottom = right-skewed (the upper half of the data is more spread out)

o Median near the top = left-skewed (the lower half of the data is more spread out)

4. Spotting Outliers

• Data points that fall outside the whiskers are considered outliers and are shown as individual points.

5. Comparing Groups

• When plotting multiple boxplots side-by-side, you can compare distributions across categories (e.g., scores of boys vs girls, sales in different regions, etc.).
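
The points that a boxplot would draw as outliers can be listed with boxplot.stats(). The following is a minimal sketch; the extra value 150 is a hypothetical score added to the sample data so that one outlier exists.

```r
# Sample scores plus one hypothetical extreme value
scores_with_extreme <- c(55, 60, 65, 70, 90, 85, 45, 77, 150)

stats <- boxplot.stats(scores_with_extreme)

print(stats$stats)   # lower whisker, Q1, median, Q3, upper whisker
print(stats$out)     # values beyond 1.5 times the box height from the box

boxplot(scores_with_extreme, main = "Boxplot with an Outlier", col = "lightblue")
```

Here the box spans 60 to 85, so the upper fence is 85 + 1.5 × 25 = 122.5, and 150 falls outside it.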

# Create sample data

scores <- c(55, 60, 65, 70, 90, 85, 45, 77)

# Plot boxplot

boxplot(scores, main="Boxplot of Scores", col="lightblue", ylab="Score")

This will show:

• The central 50% of scores (interquartile range)

• The median score

• Any outliers in the data

Why Boxplots are Important in Analytics

• Quickly detect variability, outliers, and data symmetry.

• Preferred in exploratory data analysis (EDA).

• Useful when comparing large datasets or categories.

✅ R Script: Side-by-Side Boxplots

# Load built-in dataset

data(mtcars)

# Convert 'cyl' to a categorical variable

mtcars$cyl <- as.factor(mtcars$cyl)

# Create side-by-side boxplots

boxplot(mpg ~ cyl, data = mtcars,

main = "MPG Distribution by Number of Cylinders",

xlab = "Number of Cylinders",

ylab = "Miles Per Gallon (MPG)",

col = c("skyblue", "orange", "lightgreen"))

Component          Description
mpg ~ cyl          Formula format: Compare mpg across different cyl values (4, 6, 8)
data = mtcars      Use the mtcars dataset
boxplot(...)       Draws the grouped boxplots
col = ...          Adds different colors to each box
main, xlab, ylab   Add labels and title for readability


What the Boxplot Shows

Each box shows the distribution of MPG for a specific cylinder category.

You can compare:

Median fuel efficiency

Variation in MPG

Presence of outliers

You'll likely see:

4-cylinder cars have the highest median MPG, but also the widest spread.

8-cylinder cars have the lowest MPG.
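
The comparison across cylinder groups can be checked numerically with tapply(). The following is a minimal sketch; the variable names med and spread, and the use of IQR() as the measure of spread, are illustrative assumptions.

```r
# Median MPG for each cylinder group
med <- tapply(mtcars$mpg, mtcars$cyl, median)
print(med)

# Interquartile range of MPG for each cylinder group (a measure of spread)
spread <- tapply(mtcars$mpg, mtcars$cyl, IQR)
print(spread)
```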

The operator ~ used in the expression mpg ~ cyl is called the tilde
operator and it has a special meaning in R, especially in formulas for
statistical modeling and plotting.

Operator: ~ (Tilde)

✅ Used for:

Creating a formula object that defines a relationship between variables.

In Context: mpg ~ cyl

🔹 Meaning:

"mpg is modeled as a function of cyl"

mpg is the dependent variable (Y-axis)


cyl is the independent variable or grouping factor (X-axis)

📘 In the boxplot function:

boxplot(mpg ~ cyl, data = mtcars)

This tells R to:

Group the mpg values based on each unique value of cyl

Then create a separate boxplot for each cylinder group (4, 6, 8)
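
A formula is an ordinary R object, and the grouping that boxplot() performs can be reproduced by hand with split(). The following is a minimal sketch; the variable names f and groups are illustrative assumptions.

```r
# A formula can be stored in a variable like any other value
f <- mpg ~ cyl
print(class(f))      # "formula"
print(all.vars(f))   # the variable names used in the formula

# Group the mpg values by each unique value of cyl, as boxplot() does internally
groups <- split(mtcars$mpg, mtcars$cyl)
print(sapply(groups, length))   # number of cars in each cylinder group

# Passing the stored formula gives the same plot as writing mpg ~ cyl inline
boxplot(f, data = mtcars)
```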

data(mtcars)

boxplot(disp ~ gear, data = mtcars,

main = "Displacement by Gear",

xlab = "Gear",

ylab = "Displacement")

This code creates a boxplot that shows the distribution of engine displacement (disp) grouped by gear categories (gear) in the mtcars dataset.

Line-by-Line Explanation

🔹 data(mtcars)

Loads the built-in mtcars dataset.

It contains information about various car attributes such as:

disp = engine displacement (in cubic inches)

gear = number of forward gears (3, 4, or 5)

🔹 boxplot(disp ~ gear, data = mtcars, ...)

This creates side-by-side boxplots of the disp variable for each unique value in gear.

Element Description

disp ~ gear Formula: Plot displacement (disp) grouped by gear

data = mtcars Use the mtcars dataset

main Title of the boxplot

xlab, ylab Axis labels for clarity

What the Boxplot Shows

X-axis (gear): Different gear groups — typically 3, 4, and 5 gears.

Y-axis (disp): The engine displacement (how big the engine is).

Each boxplot summarizes the distribution of displacement for each gear group.

Gear = 3 → Highest Displacement

The boxplot shows that cars with 3 forward gears have a higher median displacement, and the range (box + whiskers) is spread over larger engine sizes.

This suggests that older or heavier vehicles, such as classic muscle cars or luxury sedans, often come with:

Fewer gears

Larger engines

Less fuel efficiency

These cars are designed more for power and torque than for speed or agility.

Gear = 5 → Lower Displacement

Cars with 5 gears show a lower median engine displacement than cars with 3 gears. The spread is wide rather than tight, however, because this group contains only five cars, ranging from small four-cylinder sports cars to large V8s.

This typically represents:

Modern, performance-tuned vehicles

Compact sports cars

Fuel-efficient or economy-class cars

These engines may be smaller in size, but paired with more gears for better performance, speed, and efficiency.
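
The reading of the boxplot above can be checked against the numbers. The following is a minimal sketch; the variable names n, med_disp, and range_disp are illustrative assumptions.

```r
# Number of cars in each gear group (small groups give less reliable boxes)
n <- table(mtcars$gear)
print(n)

# Median displacement per gear group
med_disp <- tapply(mtcars$disp, mtcars$gear, median)
print(med_disp)

# Range (max minus min) of displacement per gear group
range_disp <- tapply(mtcars$disp, mtcars$gear, function(x) diff(range(x)))
print(range_disp)
```

Checking group sizes first is a reasonable habit, since a box drawn from only a handful of cars can look very different if one car is added or removed.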
