Uci Dataset

The document provides an overview of various medical conditions, datasets, and their characteristics, including hepatitis, breast cancer, and lung cancer, as well as multiple datasets related to health, marketing, and environmental factors. Each dataset is described with its size, features, and target variables, covering topics from bike sharing to student performance. Additionally, it highlights the importance of these datasets for machine learning and data analysis.

Uploaded by

Himanshu Harsh

UCI

Hepatitis
Hepatitis means inflammation of the liver. The liver is a vital organ
that processes nutrients, filters the blood, and fights infections.
When the liver is inflamed or damaged, its function can be affected.
Heavy alcohol use, toxins, some medications, and certain medical
conditions can cause hepatitis. However, hepatitis is often caused by
a virus. In the United States, the most common types of viral
hepatitis are hepatitis A, hepatitis B, and hepatitis C.

Breast cancer
Breast cancer is cancer that forms in the tissues of the breast. The most common type of
breast cancer is ductal carcinoma, which begins in the lining of the
milk ducts (thin tubes that carry milk from the lobules of the breast
to the nipple). Another type of breast cancer is lobular carcinoma,
which begins in the lobules (milk glands) of the breast. Invasive
breast cancer is breast cancer that has spread from where it began in
the breast ducts or lobules to surrounding normal tissue. Breast
cancer occurs in both men and women, although male breast cancer
is rare.
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer

Statlog (Heart)
The Statlog (Heart) dataset is a heart disease database containing
270 instances described by 13 attributes, including age, sex, chest pain type
(4 values), resting blood pressure, serum cholesterol in mg/dL,
fasting blood sugar > 120 mg/dL, resting electrocardiographic results
(values 0, 1, and 2), and maximum heart rate achieved.
https://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29

Parkinsons
Parkinson’s disease is a brain disorder that causes unintended or
uncontrollable movements, such as shaking, stiffness, and difficulty
with balance and coordination.
https://archive.ics.uci.edu/ml/datasets/Parkinsons

Lung cancer
Lung cancer is a type of cancer that begins in the lungs. Your lungs
are two spongy organs in your chest that take in oxygen when you
inhale and release carbon dioxide when you exhale. Lung cancer is
the leading cause of cancer deaths worldwide.
https://archive.ics.uci.edu/ml/datasets/Lung+Cancer

Blood-transfusion
A blood transfusion is a common procedure in which donated
blood or blood components are given to you through an
intravenous line (IV). A blood transfusion is given to replace blood
and blood components that may be too low.

https://archive.ics.uci.edu/ml/datasets/Blood+Transfusion+Service+Center

Amazon Commerce reviews set Data Set
The dataset is derived from customers' reviews on the Amazon
Commerce website and is intended for authorship identification. Most
previous studies conducted identification experiments with two to ten
authors. In the online context, however, reviews to be identified usually
have many more potential authors, and classification algorithms are
not normally designed for a large number of target classes. To examine
the robustness of classification algorithms, the authors identified the 50
most active users (each represented by a unique ID and username) who
frequently posted reviews, and collected 30 reviews for each author
(1,500 reviews in total).

Bank Marketing Data Set

The data is related to direct marketing campaigns of a Portuguese
banking institution. The marketing campaigns were based on phone
calls. Often, more than one contact with the same client was required
to assess whether the product (bank term deposit) would be ('yes')
or would not be ('no') subscribed.

There are four datasets:

1) bank-additional-full.csv with all examples (41188) and 20 inputs,
ordered by date (from May 2008 to November 2010), very close to
the data analyzed in [Moro et al., 2014].
2) bank-additional.csv with 10% of the examples (4119), randomly
selected from 1), and 20 inputs.
3) bank-full.csv with all examples and 17 inputs, ordered by date
(older version of this dataset with fewer inputs).
4) bank.csv with 10% of the examples and 17 inputs, randomly
selected from 3) (older version of this dataset with fewer inputs).

The smaller datasets are provided to test more computationally
demanding machine learning algorithms (e.g., SVM).
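
The relationship between the full file and its 10% version can be sketched as a random draw without replacement. This is a minimal sketch, not the procedure the dataset authors used: the helper name random_subset is hypothetical, the rows are stand-in integers rather than real records, and sorting the drawn indices (to keep the date ordering) is an assumption of this example.

```python
import random


def random_subset(rows, frac=0.1, seed=0):
    """Return round(frac * len(rows)) rows drawn without replacement.

    The drawn indices are sorted so the subset keeps the source's date
    ordering (a design choice of this sketch, not a documented property).
    """
    k = round(frac * len(rows))
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in idx]


# Stand-in for the 41,188 rows of bank-additional-full.csv (row indices only).
full = list(range(41188))
subset = random_subset(full, frac=0.1, seed=42)
print(len(subset))  # 4119
```

Since 10% of 41,188 is 4,118.8, rounding gives the 4,119 examples listed for bank-additional.csv.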

Fertility Data Set

Fertility is the ability to conceive a child. The fertility rate is
the average number of children born during an individual's
lifetime and is quantified demographically.
Conversely, infertility is the difficulty or inability
to reproduce naturally. In general, infertility is defined as not
being able to conceive a child after one year (or longer)
of unprotected sex [1]. Infertility is widespread, with fertility
specialists available all over the world to assist parents and
couples who experience difficulties conceiving a baby.

Wine Dataset
Contains the results of a chemical analysis of wines grown in a
particular region in Italy. The dataset contains 178 samples,
with each sample representing one wine. Each sample
contains 13 features, including measurements of alcohol
content, acidity, and color intensity. The target variable is the
type of wine, with three possible values: class 1, class 2, and
class 3.

Car Evaluation Dataset

Contains data on cars and their features, along with
evaluations from experts. The dataset contains 1,728
samples, with each sample representing one car. Each
sample contains six features, including the price,
maintenance cost, and number of doors. The target variable
is the evaluation of the car, with four possible values: unacc
(unacceptable), acc (acceptable), good, and vgood (very
good).
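
The 1,728 samples are exactly the number of possible attribute combinations (4 x 4 x 4 x 3 x 3 x 3), which can be checked by enumerating them. The attribute names and values below are those listed in the UCI car.names file; the one_hot helper is a hypothetical illustration of how the six categorical features might be encoded for a model.

```python
from itertools import product

# Attribute names and values as listed in the UCI car.names file.
ATTRIBUTES = {
    "buying": ["vhigh", "high", "med", "low"],
    "maint": ["vhigh", "high", "med", "low"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}

# Every combination of attribute values: 4 * 4 * 4 * 3 * 3 * 3 = 1728.
all_cars = list(product(*ATTRIBUTES.values()))


def one_hot(car):
    """Encode one car (a tuple of attribute values) as a 0/1 vector."""
    vector = []
    for values, value in zip(ATTRIBUTES.values(), car):
        vector.extend(1 if v == value else 0 for v in values)
    return vector


print(len(all_cars))              # 1728
print(len(one_hot(all_cars[0])))  # 21
```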

Diabetes Dataset
Contains health metrics for patients, some of whom have diabetes. The
dataset contains 768 samples, with each sample representing one
patient. Each sample contains eight features, including age, body
mass index, and blood pressure. The target variable is whether the
patient has diabetes, with two possible values: yes or no.
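
Because the two classes are not balanced, a useful first reference point is the accuracy of always predicting the more common class. This sketch assumes the dataset is the Pima Indians Diabetes data (768 samples, 8 features), whose documentation lists 500 negative and 268 positive cases; the helper name majority_baseline is hypothetical.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Assumed class counts: 500 negative (0) and 268 positive (1).
labels = [0] * 500 + [1] * 268
label, accuracy = majority_baseline(labels)
print(label, round(accuracy, 3))  # 0 0.651
```

A classifier that does not clearly beat roughly 65% accuracy on this data has learned little beyond the class balance.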

Titanic Dataset
Contains data on passengers aboard the Titanic, including whether
they survived. The dataset contains 891 samples, with each sample
representing one passenger. Each sample contains 12 columns, including a
passenger ID, the survival label, age, sex, and ticket class. The target
variable is whether the passenger survived, encoded as 1 (survived) or 0
(did not survive).

Abalone Dataset
Contains data on the sex and physical measurements of abalone, a
type of marine snail. The dataset contains 4,177 samples, with each
sample representing one abalone. Each sample contains eight
features, including sex, length, diameter, and weight. The target
variable is the age of the abalone, which is derived from the number
of rings in its shell and is usually treated as a continuous value.
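
The UCI abalone.names file states that the ring count plus 1.5 gives the age in years. The sketch below parses one comma-separated record in the column order given there and adds the derived age. The helper names are hypothetical, and the example line is made up for illustration rather than taken from the dataset.

```python
# Column order as listed in the UCI abalone.names file.
COLUMNS = ["sex", "length", "diameter", "height", "whole_weight",
           "shucked_weight", "viscera_weight", "shell_weight", "rings"]


def rings_to_age(rings):
    """Age in years is the ring count plus 1.5."""
    return rings + 1.5


def parse_row(line):
    """Parse one comma-separated abalone record into a dict, adding an age field."""
    fields = line.strip().split(",")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {"sex": fields[0]}  # M, F, or I (infant)
    for name, value in zip(COLUMNS[1:-1], fields[1:-1]):
        record[name] = float(value)
    record["rings"] = int(fields[-1])
    record["age"] = rings_to_age(record["rings"])
    return record


# Illustrative values only, not a real record from the dataset.
example = parse_row("F,0.5,0.4,0.1,0.5,0.2,0.1,0.15,9")
print(example["age"])  # 10.5
```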

Forest Fires Dataset

Contains data on the spatial location, date, and weather conditions of
forest fires. The dataset contains 517 samples, with each sample
representing one forest fire. Each sample contains 12 features,
including the spatial coordinates, month, day, Fire Weather Index
components, temperature, relative humidity, wind speed, and rain.
The target variable is the burned area of the forest (in hectares),
which is a continuous value.
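
The burned-area target is heavily skewed towards 0, so a logarithmic transform such as ln(x + 1) is a common choice before regression, with the inverse applied to predictions. This is a minimal sketch of that transform pair; the function names are hypothetical and other transforms are possible.

```python
import math


def to_log_area(area_ha):
    """Compress the skewed burned-area target with ln(x + 1); 0 ha maps to 0."""
    if area_ha < 0:
        raise ValueError("burned area cannot be negative")
    return math.log1p(area_ha)


def from_log_area(y):
    """Invert ln(x + 1) so a prediction is reported in hectares again."""
    return math.expm1(y)


print(to_log_area(0.0))                            # 0.0
print(round(from_log_area(to_log_area(12.5)), 6))  # 12.5
```

Using log1p and expm1 rather than log(x + 1) and exp(y) - 1 keeps the values accurate for the many small areas near 0.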

Seeds Dataset
Contains data on three different varieties of wheat seeds. The
dataset contains 210 samples, with each sample representing one
wheat seed. Each sample contains seven features, including
measurements of the area, perimeter, and compactness of the seed.
The target variable is the variety of the wheat seed, with three
possible values: Kama, Rosa, and Canadian.
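
The compactness feature is computed from the other two measurements as C = 4 * pi * A / P^2, where A is the area and P the perimeter of the seed. A small sketch of that formula follows; the function name is hypothetical, and the circle and square are only sanity checks, not seed data.

```python
import math


def compactness(area, perimeter):
    """Compactness C = 4 * pi * A / P**2; equals 1 for a perfect circle."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2


# A circle of radius 1 has A = pi and P = 2 * pi, so C = 1.
print(round(compactness(math.pi, 2 * math.pi), 6))  # 1.0
# A unit square has A = 1 and P = 4, so C = pi / 4.
print(round(compactness(1.0, 4.0), 4))  # 0.7854
```

Values closer to 1 indicate a rounder kernel outline.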

Bike Sharing Dataset

Contains data on bike rentals, including various weather and
seasonal factors. The dataset contains 17,379 samples, with each
sample representing one hour of bike rentals. Each sample contains
16 features, including the temperature, humidity, and wind speed.
The target variable is the number of bike rentals, which is a
continuous value.
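
In the hourly file, the total count cnt is the sum of the casual and registered counts, so those two columns would leak the target if used as inputs. The sketch below filters them out together with the record index. The column list is the one described for hour.csv on the UCI page; the helper name is hypothetical.

```python
# Column names of hour.csv as described on the UCI page.
HOUR_COLUMNS = [
    "instant", "dteday", "season", "yr", "mnth", "hr", "holiday", "weekday",
    "workingday", "weathersit", "temp", "atemp", "hum", "windspeed",
    "casual", "registered", "cnt",
]

TARGET = "cnt"
# cnt = casual + registered, so these two would leak the target;
# instant is only a record index.
EXCLUDED = {"instant", "casual", "registered", TARGET}


def feature_columns(columns):
    """Keep only columns that are safe to use as model inputs."""
    return [c for c in columns if c not in EXCLUDED]


print(len(feature_columns(HOUR_COLUMNS)))  # 13
```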

Letter Recognition Dataset

Contains data on the recognition of capital letters. The dataset
contains 20,000 samples, with each sample representing one letter.
Each sample contains 16 features, including the width and height of
the letter's bounding box. The target variable is the letter itself,
with 26 possible values: A to Z.
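
Each record is a letter followed by 16 integer features (scaled to 0 to 15), and the UCI documentation describes training on the first 16,000 records and testing on the remaining 4,000. The sketch below shows that parsing and split; the helper names are hypothetical and the records are placeholders rather than real data.

```python
def parse_line(line):
    """Split one letter-recognition record into (letter, 16 integer features)."""
    fields = line.strip().split(",")
    letter, features = fields[0], [int(v) for v in fields[1:]]
    if len(features) != 16:
        raise ValueError(f"expected 16 features, got {len(features)}")
    return letter, features


def split_first_n(records, n_train=16000):
    """Use the first n_train records for training and the rest for testing."""
    return records[:n_train], records[n_train:]


# Placeholder records: 20,000 copies of an all-zero feature vector.
records = [parse_line("A," + ",".join(["0"] * 16)) for _ in range(20000)]
train, test = split_first_n(records)
print(len(train), len(test))  # 16000 4000
```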

Superconductivity Dataset
Contains data on the critical temperature of superconductors, based
on various material properties. The dataset contains 21,263 samples,
with each sample representing one superconductor. Each sample
contains 81 features, including measurements of the atomic mass
and electronegativity. The target variable is the critical temperature
of the superconductor, which is a continuous value.

Dermatology Dataset
Contains data on the diagnosis of various skin diseases. The dataset
contains 366 samples, with each sample representing one patient.
Each sample contains 34 features, including age, family history, and
various clinical and histopathological features of the skin lesions.
The target variable is the diagnosis of the skin disease, with six
possible values: psoriasis, seborrheic dermatitis, lichen planus,
pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris.

Gas Sensor Array Drift Dataset

Contains data on the drift behavior of gas sensor arrays, recorded
while the array was exposed to six different gases at various
concentration levels. The dataset contains 13,910 samples, with each
sample representing one gas sensor array measurement. Each sample
contains 128 features describing the response of the sensor array.
The main task is to identify which of the six gases was measured; the
concentration level of each measurement is also recorded and is a
continuous value.
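
The 128 features come from 16 sensors with 8 features each, so it can help to view a sample as a 16 x 8 grid, for example to study drift per sensor. This is a minimal sketch that assumes the features are stored sensor by sensor (all 8 features of one sensor, then the next); that ordering is an assumption to check against the dataset files, and the helper name is hypothetical.

```python
N_SENSORS = 16
FEATURES_PER_SENSOR = 8


def per_sensor(vector):
    """Group a flat 128-value feature vector into 16 rows of 8 values.

    ASSUMPTION: features are stored sensor by sensor (all 8 features of
    sensor 1, then sensor 2, and so on).
    """
    if len(vector) != N_SENSORS * FEATURES_PER_SENSOR:
        raise ValueError(f"expected {N_SENSORS * FEATURES_PER_SENSOR} values")
    return [
        vector[s * FEATURES_PER_SENSOR:(s + 1) * FEATURES_PER_SENSOR]
        for s in range(N_SENSORS)
    ]


grouped = per_sensor(list(range(128)))
print(len(grouped), grouped[1])  # 16 [8, 9, 10, 11, 12, 13, 14, 15]
```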

Gesture Recognition Dataset

Contains data on the recognition of hand gestures, captured using a
Kinect sensor. The dataset contains 8,080 samples, with each sample
representing one gesture. Each sample contains 20 features,
including measurements of the position and velocity of the hand. The
target variable is the type of gesture, with five possible values: swipe
left, swipe right, wave, clap, and arm cross.

Covertype Dataset
Contains data on predicting forest cover type based on various
cartographic variables. The dataset contains 581,012 samples, with
each sample representing one 30 m x 30 m patch of forest land. Each
sample contains 54 features, including measurements of elevation,
slope, and distance to water. The target variable is the forest cover
type, with seven possible values: spruce/fir, lodgepole pine,
ponderosa pine, cottonwood/willow, aspen, Douglas-fir, and
krummholz.
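
In the raw file, the 54 features are 10 quantitative columns followed by two one-hot blocks (4 wilderness-area columns and 40 soil-type columns), and the last column holds the cover type as an integer from 1 to 7. The sketch below decodes one row into readable parts. The helper name is hypothetical and the example row is constructed for illustration, not taken from the data.

```python
COVER_TYPES = ["Spruce/Fir", "Lodgepole Pine", "Ponderosa Pine",
               "Cottonwood/Willow", "Aspen", "Douglas-fir", "Krummholz"]


def decode_row(values):
    """Split one 55-value row (54 features + label) into readable parts."""
    if len(values) != 55:
        raise ValueError(f"expected 55 values, got {len(values)}")
    quantitative = values[:10]               # elevation, aspect, slope, ...
    wilderness = values[10:14].index(1) + 1  # one-hot -> area 1..4
    soil = values[14:54].index(1) + 1        # one-hot -> soil type 1..40
    cover = COVER_TYPES[values[54] - 1]      # label 1..7 -> name
    return quantitative, wilderness, soil, cover


# Constructed row: wilderness area 3, soil type 5, cover type 2.
soil_block = [0] * 40
soil_block[4] = 1
row = [0] * 10 + [0, 0, 1, 0] + soil_block + [2]
print(decode_row(row)[1:])  # (3, 5, 'Lodgepole Pine')
```

Collapsing the one-hot blocks this way is mainly useful for inspection; most models can consume the binary columns directly.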

Credit Approval Dataset

Contains data on credit card applications, with a focus on approving
or rejecting the applications. The dataset contains 690 samples, with
each sample representing one credit card application. Each sample
contains 15 features, a mix of continuous and categorical attributes.
All attribute names and values have been changed to meaningless
symbols to protect confidentiality, and some values are missing. The
target variable is whether or not the application was approved, with
two possible values: + (approved) or - (rejected).
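
In the comma-separated crx.data file, missing attribute values are written as '?' and the last field is the + or - label. The sketch below turns one line into a list of attributes (with None for missing values) and a 0/1 label. The helper name is hypothetical and the example line is made up, not a real record.

```python
def parse_crx_line(line):
    """Parse one credit-approval record: 15 attributes followed by '+' or '-'.

    Missing attribute values ('?') become None; the label becomes 1 (+) or 0 (-).
    """
    fields = line.strip().split(",")
    if len(fields) != 16:
        raise ValueError(f"expected 16 fields, got {len(fields)}")
    attributes = [None if f == "?" else f for f in fields[:15]]
    if fields[15] not in ("+", "-"):
        raise ValueError(f"unexpected label {fields[15]!r}")
    label = 1 if fields[15] == "+" else 0
    return attributes, label


# Illustrative line only (one missing value), not a real record.
attrs, label = parse_crx_line("a,?,1.5,u,g,w,v,0.5,t,f,0,f,g,100,0,-")
print(attrs[1], label)  # None 0
```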

Human Activity Recognition Using Smartphones Dataset
Contains data on the recognition of human activities using data from
smartphones. The dataset contains 10,299 samples, with each
sample representing one 2.56-second window of data. Each sample
contains 561 features, including measurements of the accelerometer
and gyroscope readings from the smartphone. The target variable is
the type of activity, with six possible values: walking, walking
upstairs, walking downstairs, sitting, standing, and laying.
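
The 2.56-second windows correspond to 128 readings at a 50 Hz sampling rate, taken with 50% overlap between consecutive windows. A minimal sketch of that segmentation on a single signal follows; the helper name is hypothetical and the signal is a stand-in sequence, not sensor data.

```python
SAMPLING_HZ = 50
WINDOW = 128  # 128 / 50 Hz = 2.56 s
STEP = 64     # 50% overlap between consecutive windows


def sliding_windows(signal, size=WINDOW, step=STEP):
    """Cut a 1-D signal into fixed-size windows that overlap by size - step samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


signal = list(range(640))  # stand-in for 12.8 s of one sensor axis
windows = sliding_windows(signal)
print(len(windows), WINDOW / SAMPLING_HZ)  # 9 2.56
```

Each window would then be summarized into features (means, energies, and so on) to produce vectors like the 561-feature samples described above.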

Mushroom Dataset
Contains data on classifying mushrooms as edible or poisonous,
based on various physical characteristics. The dataset contains 8,124
samples, with each sample representing one mushroom. Each
sample contains 22 categorical features, such as cap shape, cap
color, and odor. The target variable is the edibility of the
mushroom, with two possible values: edible or poisonous.

Student Performance Dataset

Contains data on predicting student performance in math and
Portuguese language classes, based on various personal, social, and
school-related factors. The data are split into two datasets, with 395
students in the math class and 649 in the Portuguese class; each
sample represents one student. Each sample contains 30 features,
including the student's age, family background, and study habits.
The target variable is the final grade in the class (G3), with values
ranging from 0 to 20.
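
Besides regression on the 0 to 20 grade, the target is often turned into a pass/fail label; a common convention on this scale is to count a final grade of at least 10 as a pass. The sketch below assumes that threshold; the helper name is hypothetical.

```python
def pass_fail(g3):
    """Binary outcome: pass (1) if the final grade G3 is at least 10 out of 20."""
    if not 0 <= g3 <= 20:
        raise ValueError("G3 must be between 0 and 20")
    return 1 if g3 >= 10 else 0


print([pass_fail(g) for g in (0, 9, 10, 20)])  # [0, 0, 1, 1]
```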

Climate Model Simulation Crashes Dataset

Contains data on predicting whether a climate model simulation
crashes, based on the values of the model's input parameters. The
dataset contains 540 samples, with each sample representing one
simulation. Each sample contains 18 features: the values of the
climate model parameters used for that run. The target variable is
the simulation outcome, with two possible values: 0 (failure) or
1 (success).

Energy Efficiency Dataset

Contains data on predicting the energy efficiency of buildings, based
on various building characteristics. The dataset contains 768
samples, with each sample representing one simulated building. Each
sample contains eight features, including relative compactness,
surface area, roof area, and overall height. There are two target
variables, heating load and cooling load, with values ranging from
6.01 to 43.1 and from 10.9 to 48.03, respectively.
