Data Preprocessing - Exam Guide

MODULE: Preparing Data for Analysis

Learning Objectives

1. Locate and download biomedical/medical datasets.

2. Preprocess data using R.

3. Write R scripts to:

- Replace missing values

- Normalize data

- Discretize data

- Sample data

KEY CONCEPTS & DEFINITIONS

Information

- Anything that changes the uncertainty in a system.

- Technical definition: A stored or transmitted symbol = data.

Variables, Datasets, and Databases

- Variable: Temporary container (e.g., Excel cell)

- File: Persistent data storage (e.g., .csv, .txt)

- Database: Shared collection of logically related persistent data.

Dataset Format
- Rows = Samples (individuals/patients)

- Columns = Variables/Features/Attributes

- Files are often .csv or .txt (delimited)
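The row/column layout above can be checked directly after loading a delimited file. A minimal sketch in R, assuming a small made-up file (the column names and values are invented for illustration and do not come from any real dataset):

```r
# Write a tiny delimited file so the example is self-contained
# (hypothetical columns and values, for illustration only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("patient_id,age,diagnosis",
             "P1,54,benign",
             "P2,61,malignant",
             "P3,47,benign"), csv_path)

# Read the file: each row becomes a sample, each column a variable.
patients <- read.csv(csv_path, stringsAsFactors = FALSE)

n_samples   <- nrow(patients)   # number of rows (samples)
n_variables <- ncol(patients)   # number of columns (variables)
str(patients)                   # shows the type of each column
```

For tab- or semicolon-delimited .txt files, read.delim() or read.table() with a sep argument would be the likely substitutes.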

Data Sources

- Proprietary: EMRs, clinical studies.

- Public: TCGA, ADNI, HRS, UK Biobank, UCI ML Repository, etc.

Importance of Data Preprocessing

> Garbage In, Garbage Out (GIGO)

Fix issues like:

- Missing values

- Noisy/inaccurate data

- Wrong data types

- Incomplete data

Goal: Make data accurate, precise, complete, interpretable, and correct

Data Preprocessing Tasks

1. Data Cleaning: Handling missing/erroneous data

2. Data Transformation: Changing data types, normalization, adding derived variables

3. Data Reduction: Feature selection, sampling

Missing Values

Represented as: Blank, ., n/a, ?

Replacing Missing Values

- Delete row/column

- Replace with constant/statistic/neighbor/likelihood/random
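Two of the options above (deleting rows, and replacing with a statistic) can be sketched in R as follows. This is only an illustration: the data are made up, and mean imputation is just one possible choice of statistic.

```r
# Hypothetical file that uses several of the missing-value markers.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,glucose",
             "1,54,5.1",
             "2,?,6.3",
             "3,47,n/a",
             "4,,5.8"), csv_path)

# Tell read.csv which strings mean "missing" so they become NA.
df <- read.csv(csv_path, na.strings = c("", ".", "n/a", "?"))

# Option 1: delete every row that contains at least one NA.
df_complete <- na.omit(df)

# Option 2: replace NA with a statistic (here the column mean).
df_imputed <- df
for (col in c("age", "glucose")) {
  col_mean <- mean(df[[col]], na.rm = TRUE)
  df_imputed[[col]][is.na(df_imputed[[col]])] <- col_mean
}
```

Deleting rows is simple but here keeps only one of four samples, which shows why replacement is often preferred when many rows have a gap.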

Normalization (Scaling)

Why? Mixed scales distort results

Min-Max Normalization:

val' = (val - min) / (max - min) * (new_max - new_min) + new_min
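A minimal R sketch of min-max normalization, assuming a numeric vector with at least two distinct values (the function name min_max and the sample ages are hypothetical):

```r
# Rescale x linearly so that its minimum maps to new_min
# and its maximum maps to new_max.
min_max <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)            # c(min, max) of the data
  (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min) + new_min
}

ages <- c(20, 35, 50, 80)
scaled_01 <- min_max(ages)          # 0.00 0.25 0.50 1.00
scaled_ab <- min_max(ages, -1, 1)   # -1.0 -0.5 0.0 1.0
```

A constant column has max equal to min, so the division yields NaN; such columns need separate handling.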

Z-Score Normalization:

val' = (val - mean) / std
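A short R sketch of z-score normalization. It assumes the sample standard deviation (the n - 1 version that sd() computes); the function name z_score and the sample weights are hypothetical.

```r
# Center on the mean and divide by the standard deviation.
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

weights <- c(60, 70, 80, 90)
z <- z_score(weights)

# The built-in scale() performs the same computation on a column.
z_builtin <- as.numeric(scale(weights))
```

After the transformation the values have mean 0 and standard deviation 1.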

Decimal Scaling:

val' = val / 10^n, where n is the smallest integer such that max(|val'|) < 1
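A minimal R sketch of decimal scaling, assuming the usual convention that n is the smallest integer that brings every scaled absolute value below 1 (the function name decimal_scale and the sample values are hypothetical):

```r
decimal_scale <- function(x) {
  max_abs <- max(abs(x), na.rm = TRUE)
  # Smallest n with max(|x|) / 10^n < 1; use n = 0 for all-zero data.
  n <- if (max_abs == 0) 0 else floor(log10(max_abs)) + 1
  x / 10^n
}

values <- c(-350, 12, 986)
scaled <- decimal_scale(values)   # -0.350 0.012 0.986 (n = 3)
```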

Comparison of Normalization

- Decimal: Preserves distribution

- Z-Score: Centers data at mean 0 with standard deviation 1; it does not make the data normally distributed

- Min-Max: Flexible range

Discretization

Numeric → Nominal

Discretization Methods:

- Manual

- Automatic: Equal-width, Equal-depth, Regression, Clustering

Binning Comparison

- Equal-width: Simple, sensitive to outliers

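- Equal-depth (equal-frequency): Each bin holds about the same number of values, so it follows the data distribution; bin boundaries are less intuitive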
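The difference between the two binning methods shows up clearly when the data contain an outlier. A sketch in R using cut() and quantile(); the values and the choice of three bins are made up for illustration.

```r
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)   # 100 is an outlier

# Equal-width: 3 intervals of the same length across the range.
equal_width <- cut(values, breaks = 3, include.lowest = TRUE)

# Equal-depth (equal-frequency): breakpoints placed at quantiles.
depth_breaks <- quantile(values, probs = seq(0, 1, length.out = 4))
equal_depth <- cut(values, breaks = depth_breaks, include.lowest = TRUE)

table(equal_width)   # counts per bin: 8, 0, 1
table(equal_depth)   # counts per bin: 3, 3, 3
```

With equal-width bins the outlier stretches the range so that almost all values fall into the first bin, while equal-depth bins stay balanced.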

Data Reduction

1. Feature Selection: Dimensionality reduction (e.g., keeping only a subset of genes)

2. Sampling: Representative subset

- Simple random (with/without replacement)

- Stratified sampling
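The sampling schemes above can be sketched in base R with sample(). The data frame, the group labels, and the 50% sampling fraction are assumptions made for illustration.

```r
set.seed(42)  # make the random draws reproducible

df <- data.frame(id = 1:10,
                 group = rep(c("case", "control"), times = c(4, 6)))

# Simple random sampling of 5 rows, without and with replacement.
srs_without <- df[sample(nrow(df), size = 5, replace = FALSE), ]
srs_with    <- df[sample(nrow(df), size = 5, replace = TRUE), ]

# Stratified sampling: draw half of the rows inside each group,
# so the case/control proportions of the full data are preserved.
strata <- split(df, df$group)
stratified <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), size = nrow(s) / 2), , drop = FALSE]
}))
```

Sampling with replacement may return the same row more than once; sampling without replacement never does.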

Introduction to R Language

- R: Statistical computing, graphics, open-source

- Functions, packages, interpreter, scripts

R Setup & Tools

- Download: https://cran.r-project.org

- Tools: Rgui, RStudio, Notepad++, Jupyter

Using R

- ls(), rm(), q(), summary(), class()

- install.packages(), library()
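A brief sketch of how these functions might be used in an interactive session. The object name glucose and the package name ggplot2 are chosen only as examples; the lines that install a package or quit R are commented out so the snippet can be run safely.

```r
glucose <- c(4.2, 5.1, 6.3)   # create an object in the workspace
ls()                          # list objects in the current environment
x_class <- class(glucose)     # "numeric"
summary(glucose)              # min, quartiles, mean, max
rm(glucose)                   # remove the object from the workspace

# install.packages("ggplot2")  # download a package from CRAN (run once)
# library(ggplot2)             # load an installed package (each session)
# q()                          # quit the R session
```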

R Distributions

- Bioconductor, Anaconda (with Jupyter support)

R Preprocessing

- Video tutorial & Jupyter link (from course)
