
Data Science Topics Notes

The document outlines key topics in data science, emphasizing the importance of real-world applications and the necessary skill sets such as statistics and programming. It covers statistical inference, exploratory data analysis, machine learning algorithms, and the significance of data wrangling and feature selection. Additionally, it discusses recommendation systems, social network analysis, data visualization principles, and ethical considerations in data science.

Uploaded by

malisenrichard80

Data Science: Topics of Study - Explained Notes

Introduction
- Big Data and Data Science hype and getting past the hype: Data science is often surrounded by exaggerated expectations. It's important to focus on real-world applications and measurable outcomes.

- Why now? Datafication: The transformation of social action into quantified online data, which enables real-time tracking and predictive analysis.

- The current landscape of perspectives: Different industries have different perspectives on data science, ranging from customer analytics to operations and logistics.

- Skill sets needed: Statistics, programming (Python/R), data wrangling, machine learning, and domain knowledge.

Statistical Inference
- Populations and samples: A population includes all members of a defined group; a sample is a subset drawn from it and used to infer properties of the whole.

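- Statistical modelling, probability distributions, fitting a model: A statistical model describes relationships between variables using probability distributions. Fitting a model means estimating its parameters from data so that it can be used to explain those relationships and make predictions.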

- Python packages for data science: Common ones include NumPy, pandas, SciPy, scikit-learn, and statsmodels.
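
The population/sample distinction and the idea of fitting a model can be sketched in a few lines. This is a minimal illustration, not a recommended workflow: the "population" is synthetic (normally distributed values with an assumed mean of 170 and standard deviation of 10), and only the standard library is used so the sketch runs without NumPy or SciPy.

```python
import random
import statistics

# Hypothetical population: 10,000 synthetic values (illustrative only).
rng = random.Random(42)
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]

# Draw a simple random sample (without replacement) from the population.
sample = rng.sample(population, 200)

# "Fit" a normal model to the sample by estimating its two parameters.
mean_hat = statistics.fmean(sample)
sd_hat = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Standard error of the mean and an approximate 95% confidence interval.
std_error = sd_hat / len(sample) ** 0.5
ci_low = mean_hat - 1.96 * std_error
ci_high = mean_hat + 1.96 * std_error

print(f"population mean: {statistics.fmean(population):.2f}")
print(f"sample estimate: {mean_hat:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The sample of 200 values stands in for the 10,000 in the population; the confidence interval expresses how far the sample estimate may plausibly sit from the population mean.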

Exploratory Data Analysis and the Data Science Process


- Basic tools of EDA: Includes histograms, boxplots, scatterplots, and summary statistics.

- Philosophy of EDA: Emphasizes understanding data patterns before applying models.

- The Data Science Process: Steps include data collection, cleaning, EDA, modeling, interpretation, and deployment.
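
As a rough sketch of the summary statistics behind a boxplot, the following computes a five-number summary and flags outliers with the common 1.5 × IQR rule. The data values and the function names `five_number_summary` and `iqr_outliers` are made up for illustration; in practice pandas (`DataFrame.describe`) would usually do this work.

```python
import statistics

# Hypothetical measurements; 95 is deliberately unusual.
values = [12, 15, 14, 10, 18, 20, 13, 16, 14, 95]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot is drawn from."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def iqr_outliers(data):
    """Flag points more than 1.5 * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

print("mean:  ", statistics.fmean(values))    # pulled upward by the outlier
print("median:", statistics.median(values))   # robust to the outlier
print("summary:", five_number_summary(values))
print("outliers:", iqr_outliers(values))
```

Comparing the mean with the median before modeling is a small instance of the EDA philosophy: the gap between them signals that one extreme value dominates the average.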

Three Basic Machine Learning Algorithms


- Linear Regression: A method to model the relationship between a dependent variable and one or more independent variables.

- k-Nearest Neighbors (k-NN): A non-parametric method for classification and regression that predicts a point's label or value from the k closest training points under a chosen distance metric.

- k-means: An unsupervised learning algorithm that partitions data into k clusters by assigning each point to its nearest cluster centroid.
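
The three algorithms above can each be sketched in a few lines of plain Python. These are teaching sketches under simplifying assumptions: the regression handles a single predictor, k-NN uses Euclidean distance with majority voting, and k-means seeds its centroids with the first k points so the result is deterministic (library implementations such as scikit-learn use more careful seeding). All data and function names are hypothetical.

```python
import math
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as seeds
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labelled = [(p, "a") for p in points[:3]] + [(p, "b") for p in points[3:]]

intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("line:", intercept, slope)
print("k-NN:", knn_predict(labelled, (2, 2)), knn_predict(labelled, (7, 7)))
centroids, clusters = kmeans(points, k=2)
print("k-means clusters:", clusters)
```

Note the difference in what each function needs: `fit_line` and `knn_predict` require known targets or labels (supervised), while `kmeans` receives only the points (unsupervised).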

One More Machine Learning Algorithm and Usage in Applications


- Filtering Spam as an application: A common real-world use case of machine learning.

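- Why Linear Regression and k-NN are poor for spam filtering: Linear regression predicts an unbounded continuous value rather than a class, and with one feature per word there are often more features than emails. k-NN relies on distances, which become less informative in high-dimensional, sparse word spaces, and it must compare each new email against the whole training set.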

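- Naive Bayes: Works well for spam filtering. It applies Bayes' rule to estimate the probability that an email is spam given the words it contains, under the "naive" assumption that words occur independently of one another given the class.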

- Data Wrangling: The process of cleaning and unifying complex data sets for easy access and analysis. APIs and web scraping are often used.
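
A minimal sketch of Naive Bayes spam filtering follows. The six training messages are invented for illustration, the tokenizer is a plain whitespace split, and add-one (Laplace) smoothing is an assumption made here so that a word unseen in one class does not force the probability to zero. Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on long messages.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set (illustrative messages, not real data).
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("claim free money", "spam"),
    ("meeting at noon", "ham"),
    ("lunch meeting tomorrow", "ham"),
    ("project notes attached", "ham"),
]

def train_naive_bayes(examples):
    """Count words per class and messages per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab

def log_scores(text, model):
    """Return log P(class) + sum of log P(word | class) for each class."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one smoothing keeps every likelihood strictly positive.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return scores

def classify(text, model):
    scores = log_scores(text, model)
    return max(scores, key=scores.get)

model = train_naive_bayes(training)
print(classify("free money", model))
print(classify("meeting tomorrow", model))
```

Because the model only stores word counts per class, training and prediction stay cheap even when the vocabulary is large and each email contains only a few of its words.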

Feature Generation and Feature Selection


- Motivating application: Customer retention, where the goal is to identify the factors that best predict whether a customer stays.

- Feature Generation: Creating new features based on domain knowledge or data transformations.

- Feature Selection: Reducing the number of input variables using filter methods, wrapper methods, and models that rank features as part of training, such as decision trees and random forests.
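
As one hedged illustration of feature generation followed by a filter method, the sketch below derives a new feature from two existing ones and then ranks all features by the absolute value of their Pearson correlation with the target. The retention data, the feature names, and the choice of correlation as the filter score are all assumptions made for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical customer data: one list entry per customer.
features = {
    "sessions_per_week": [1, 2, 3, 5, 6, 8],
    "days_since_signup": [30, 10, 50, 20, 40, 25],
    "support_tickets": [4, 3, 3, 1, 1, 0],
}
retained = [0, 0, 0, 1, 1, 1]  # target: 1 if the customer stayed

# Feature generation: a derived ratio built from two raw columns.
features["sessions_per_ticket"] = [
    s / (t + 1)
    for s, t in zip(features["sessions_per_week"], features["support_tickets"])
]

# Feature selection (filter method): score each feature independently of any model.
scores = {name: abs(pearson(col, retained)) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
selected = ranked[:2]

for name in ranked:
    print(f"{name:22s} |r| = {scores[name]:.3f}")
print("selected:", selected)
```

A filter scores each feature on its own, so it is fast but blind to interactions between features; wrapper methods address that by evaluating subsets of features with an actual model, at higher computational cost.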

Recommendation Systems
- Algorithmic ingredients: Collaborative filtering, content-based filtering, and hybrid methods that combine the two.

- Dimensionality Reduction: Helps reduce data complexity, e.g., using PCA or SVD.

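- Singular Value Decomposition (SVD): A technique that factorizes a matrix into the product U Σ V^T, where Σ holds the singular values. In recommendation engines, keeping only the largest singular values of the user-item matrix yields a small set of latent factors.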

- Principal Component Analysis (PCA): A method that projects data onto the orthogonal directions of greatest variance, emphasizing variation and bringing out strong patterns in a dataset.
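
Of the ingredients listed above, collaborative filtering is the easiest to sketch without a linear algebra library. The following is a minimal user-based version: similarity between users is the cosine of their rating vectors (unrated items treated as zero), and a predicted rating is the similarity-weighted average of other users' ratings. The users, items, and ratings are invented, and SVD or PCA is not implemented here.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 4, "B": 5, "D": 2},
    "cara": {"A": 1, "C": 5, "D": 4},
    "dev": {"A": 5, "B": 5},
}

def cosine(u, v):
    """Cosine similarity of two sparse rating vectors (missing items count as 0)."""
    dot = sum(u[item] * v[item] for item in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of the ratings other users gave `item`."""
    weighted, total = 0.0, 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        weighted += sim * their_ratings[item]
        total += sim
    return weighted / total if total > 0 else None

def recommend(user):
    """Rank the items the user has not rated by predicted rating, best first."""
    seen = set(ratings[user])
    all_items = {item for r in ratings.values() for item in r}
    preds = {item: predict(user, item) for item in all_items - seen}
    preds = {item: p for item, p in preds.items() if p is not None}
    return sorted(preds, key=preds.get, reverse=True)

print(recommend("dev"))
print({item: round(predict("dev", item), 2) for item in ("C", "D")})
```

Here "dev" rates like "ana" and "ben", so their opinions dominate the prediction while "cara", whose tastes differ, contributes little.
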

Mining Social-Network Graphs
- Social networks as graphs: Representing individuals as nodes and relationships as edges.

- Clustering of graphs: Grouping nodes with similar properties.

- Community discovery: Detecting communities directly within networks.

- Partitioning of graphs: Dividing graphs into parts to simplify analysis.

- Neighbourhood properties: Analyzing a node's local connections.
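
The graph view of a social network can be sketched with an adjacency structure built from an edge list. The example below computes one neighbourhood property (the local clustering coefficient) and the crudest possible partition (connected components, found by breadth-first search). The names and edges are hypothetical, and connected components are only a stand-in: real community discovery uses more refined methods, for which a library such as NetworkX would normally be used.

```python
from collections import deque

# Hypothetical friendship network as undirected edges.
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan"), ("eve", "fay")]

def build_graph(edge_list):
    """Nodes are individuals; each node maps to the set of its neighbours."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = sorted(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))

def connected_components(graph):
    """Partition the nodes into groups with no edges between groups (BFS)."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components

graph = build_graph(edges)
print("neighbours of cat:", sorted(graph["cat"]))
print("clustering(cat):", clustering_coefficient(graph, "cat"))
print("components:", connected_components(graph))
```

A high clustering coefficient means a node's friends tend to know each other, which is one local signal that the node sits inside a tightly knit community.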

Data Visualization
- Principles and tools: Includes clarity, accuracy, and use of visualization libraries like Matplotlib, Seaborn, and Plotly.

- Examples of inspiring projects: Dashboards, storytelling with data, and visual analytics used in industries.

Data Science and Ethical Issues


- Privacy, security, ethics: Involves protecting data and using it responsibly.

- A look back at Data Science: Reflecting on its evolution and impact.

- Next-generation data scientists: Professionals who are technically strong and ethically aware.
