- Data Summarization. Dataset: breast-cancer-wisconsin. Code: calculates the mean, mode, skewness, standard deviation, and variance of each column; computes the Pearson correlation coefficient (PCC) for several pairs of columns; and draws histograms of two features.
- Data Normalization. Dataset: wine-quality.csv. Code: normalizes the data with min-max, z-score, and mean-subtracted normalization; then, for each of the first 10 data points, reports the nearest and farthest of the other nine points under the Manhattan, Euclidean, and cosine distances.
- Data Decomposition. Dataset: communities.data. Code: loads the crime dataset and stores it as a matrix (the data is already normalized); computes the eigenvectors and eigenvalues; reports a table of the top 20 eigenvalues; and uses those eigenvalues to decide whether the remaining dimensions can be cut off.
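The statistics listed under Data Summarization can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the function names `summarize` and `pearson` and the toy columns are hypothetical, and it assumes population (not sample) variance and the moment-based definition of skewness.

```python
import math
import statistics


def summarize(values):
    """Return mean, mode, skewness, standard deviation, and variance of one column."""
    n = len(values)
    mean = statistics.fmean(values)
    # Assumption: population variance (divide by n), not sample variance.
    variance = statistics.pvariance(values, mu=mean)
    std = math.sqrt(variance)
    # Moment-based skewness: third central moment divided by std cubed.
    third_moment = sum((v - mean) ** 3 for v in values) / n
    skew = third_moment / std ** 3 if std > 0 else 0.0
    return {
        "mean": mean,
        "mode": statistics.mode(values),
        "skew": skew,
        "std": std,
        "variance": variance,
    }


def pearson(x, y):
    """Pearson correlation coefficient of two equally long columns."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy columns standing in for two dataset features.
feature_a = [5, 5, 3, 6, 4, 8, 1, 2, 2, 4]
feature_b = [1, 4, 1, 8, 1, 10, 1, 1, 1, 2]

print(summarize(feature_a))
print(pearson(feature_a, feature_b))
```

`pearson` divides by the product of the two spreads, so it fails on a constant column; a real script would need to guard against that case.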
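For the Data Normalization item, the three normalizations and the nearest/farthest search might look like the sketch below. All function names and the toy points are hypothetical, the z-score is assumed to use the population standard deviation, and cosine distance is taken as one minus cosine similarity.

```python
import math


def min_max(col):
    """Scale a column to the range [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def z_score(col):
    """Center a column on its mean and scale by its population standard deviation."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    return [(v - mean) / std for v in col]


def mean_subtracted(col):
    """Subtract the column mean from every value."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    # Assumption: cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_and_farthest(points, dist):
    """For each point, return (index of nearest, index of farthest) among the others."""
    result = []
    for i, p in enumerate(points):
        others = [(dist(p, q), j) for j, q in enumerate(points) if j != i]
        result.append((min(others)[1], max(others)[1]))
    return result


# Hypothetical toy rows; the real script would pass the first 10 rows of the dataset.
points = [[1.0, 2.0], [2.0, 2.0], [6.0, 1.0]]
for name, dist in [("manhattan", manhattan), ("euclidean", euclidean), ("cosine", cosine_distance)]:
    print(name, nearest_and_farthest(points, dist))
```

Both `min_max` and `z_score` divide by the spread of the column, and `cosine_distance` divides by the vector norms, so constant columns and all-zero rows would need special handling.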
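The Data Decomposition item does not say which matrix is decomposed. The sketch below assumes it is the covariance matrix of the data, as in principal component analysis, and uses NumPy; `top_eigenvalues` and the toy matrix are hypothetical. It returns the largest eigenvalues together with their cumulative share of the total variance, which is one common basis for deciding where to cut off dimensions.

```python
import numpy as np


def top_eigenvalues(X, k=20):
    """Return the k largest eigenvalues of the covariance matrix of X,
    the matching eigenvectors (as columns), and the cumulative variance share."""
    # Assumption: rows are observations and columns are features.
    cov = np.cov(X, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # reorder from largest to smallest
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()
    return eigvals[:k], eigvecs[:, :k], share[:k]


# Hypothetical toy matrix; the second column is exactly twice the first,
# so a single direction carries all of the variance.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
vals, vecs, share = top_eigenvalues(X)
for rank, (val, cum) in enumerate(zip(vals, share), start=1):
    print(f"{rank:2d}  eigenvalue={val:.4f}  cumulative share={cum:.4f}")
```

If the cumulative share is already close to 1 after a few eigenvalues, the remaining dimensions contribute little variance and are candidates for removal; the threshold to use is a judgment call.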
s6lian/Python-Data-Analysis