tunlusoy/Data-Science-Books

Data Science Books

Variety of curated data science books

Books for Machine Learning and Data Science

6 Free Must-Read Books for Machine Learning and Data Science

  • Probabilistic Programming & Bayesian Methods for Hackers: An intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view. The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, and only then gets to what Bayesian inference is. Unfortunately, due to the mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples. This can leave the user with a so-what feeling about Bayesian inference. In fact, this was the author's own prior opinion.

  • Understanding Machine Learning: From Theory to Algorithms: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides a theoretical account of the fundamentals underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics, the book covers a wide array of central topics unaddressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds.

  • An Introduction to Statistical Learning with Applications in R: This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.

  • Foundations of Data Science: While traditional areas of computer science remain highly important, increasingly researchers of the future will be involved with using computers to understand and extract usable information from massive data arising in applications, not just how to make computers useful on specific well-defined problems. With this in mind we have written this book to cover the theory likely to be useful in the next 40 years, just as an understanding of automata theory, algorithms, and related topics gave students an advantage in the last 40 years.

  • A Programmer's Guide to Data Mining: The Ancient Art of the Numerati: This guide follows a learn-by-doing approach. Instead of passively reading the book, I encourage you to work through the exercises and experiment with the Python code I provide. I hope you will be actively involved in trying out and programming data mining techniques. The textbook is laid out as a series of small steps that build on each other until, by the time you complete the book, you have laid the foundation for understanding data mining techniques.

  • Forecasting: Principles and Practice (2nd Edition): This textbook is intended to provide a comprehensive introduction to forecasting methods using R and to present enough information about each method for readers to be able to use them sensibly. We don’t attempt to give a thorough discussion of the theoretical details behind each method, although the references at the end of each chapter will fill in many of those details.

Top 10 Essential Books for the Data Enthusiast

  1. Data Science

    • Top Paid Recommendation: Data Science for Business

      When trying to learn about a new field, one of the most common difficulties is to find books (and other materials) that have the right "depth". All too often one ends up with either a friendly but largely useless book that oversimplifies or a heavy academic tome that, though authoritative and comprehensive, is condemned to sit gathering dust on one's shelves. "Data Science for Business" gets it just right.

    • Top Free Recommendation: The Art of Data Science

      This book describes the process of analyzing data in simple and general terms. The authors have extensive experience both managing data analysts and conducting their own data analyses, and this book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science.

  2. Big Data

    • Top Paid Recommendation: Big Data: Principles and Best Practices of Scalable Realtime Data Systems

      I have rarely seen a thorough discussion of the importance of data modelling, data layers, data processing requirements analysis, and data architecture and storage implementation issues (along with other "traditional" database concepts) in the context of big data. This book delivers a refreshing comprehensive solution to that deficiency. - Kirk D. Borne, Amazon Review

    • Top Free Recommendation: Big Data Now: 2016 Edition

      In the four years that O’Reilly has produced its annual Big Data Now report, the data field has grown from infancy into young adulthood. Data is now a leader in some fields and a driver of innovation in others, and companies that use data and analytics to drive decision-making are outperforming their peers.

  3. Apache Hadoop

    • Top Paid Recommendation: Hadoop: The Definitive Guide

      I appreciate that this book covers high-level concepts as well as dives deep into the technical details that you will need to know for the design, implementation and day-to-day running of Hadoop and its various associated technologies. - Al Gordon, Amazon Review

    • Top Free Recommendation: Hadoop Explained

      Hadoop is one of the most important technologies in a world that is built on data. Find out how it has developed and progressed to address the continuing challenge of Big Data with this insightful guide.

  4. Apache Spark

    • Top Paid Recommendation: Learning Spark

      The information that is available on the Internet is great, but this book brings much of it together in one place. If you want to learn to think like a Spark programmer--not the same as thinking like a programmer--this is the place to begin. - Brian Castelli, Amazon Review

    • Top Free Recommendation: Mastering Apache Spark

      This collection of notes (what some may rashly call a "book") serves as the ultimate place of mine to collect all the nuts and bolts of using Apache Spark. The notes aim to help me design and develop better products with Spark.

  5. Theoretical Machine Learning

    • Top Paid Recommendation: Pattern Recognition and Machine Learning

      The author is an expert, as evidenced by the excellent insights he gives into the complex math behind the machine learning algorithms. I have worked for quite some time with neural networks and have had coursework in linear algebra, probability and regression analysis, and found some of the stuff in the book quite illuminating. - Sidhant, Amazon Review

    • Top Free Recommendation: Elements of Statistical Learning

      This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting--the first comprehensive treatment of this topic in any book. The good news is, this is pretty much the most important book you are going to read in the space. It will tie everything together for you in a way that I haven't seen any other book attempt. - Enceladus Transit, Amazon Review

  6. Practical Machine Learning

    • Top Paid Recommendation: Python Machine Learning

      This is a fantastic book, even for a relative beginner to machine learning such as myself. The first thing that comes to mind after reading this book is that it was the perfect blend (for me at least) of theory and practice, as well as breadth and depth. - Brian M. Thomas, Amazon Review

    • Top Free Recommendation: An Introduction to Statistical Learning with Applications in R

      This book provides an introduction to statistical learning methods. It is aimed at upper-level undergraduate students, master's students, and Ph.D. students in the non-mathematical sciences. The book also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.

  7. Deep Learning

    • Top Paid Recommendation:

    • Top Free Recommendation #1: Neural Networks and Deep Learning

      Neural Networks and Deep Learning is a free online book. It teaches two topics: neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data; and deep learning, a powerful set of techniques for learning in neural networks.

    • Top Free Recommendation #2: Deep Learning: The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. Likely to be the definitive deep learning book of the near future, it is written by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.

    • Top Free Recommendation #3: Deep Learning Tutorial: Developed by the LISA lab at the University of Montreal, this free and concise tutorial, presented in the form of a book, explores the basics of machine learning. The book emphasizes using the Theano library (originally developed by the university itself) for creating deep learning models in Python.

    • Top Free Recommendation #4: Deep Learning: Methods and Applications: This book provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks.

    • Top Free Recommendation #5: First Contact with TensorFlow, get started with Deep Learning Programming: This book is oriented to engineers with only some basic understanding of Machine Learning who want to expand their wisdom in the exciting world of Deep Learning with a hands-on approach that uses TensorFlow.

    • Top Free Recommendation #6: A Brief Introduction to Neural Networks: This title covers neural networks in depth. Neural networks are a bio-inspired data processing mechanism that enables computers to learn in a way technically similar to a brain, and even to generalize once solutions to enough problem instances have been taught. Available in English and German.

    • Top Free Recommendation #7: Neural Network Design (2nd edition): Neural Network Design (2nd Edition) provides a clear and detailed survey of fundamental neural network architectures and learning rules. In it, the authors emphasize a fundamental understanding of the principal neural networks and the methods for training them. The authors also discuss applications of networks to practical engineering problems in pattern recognition, clustering, signal processing, and control systems. Readability and natural flow of material are emphasized throughout the text.

    • Top Free Recommendation #8: Neural Networks and Learning Machines (3rd edition): This third edition of Simon Haykin’s book provides an up-to-date treatment of neural networks in a comprehensive, thorough and readable manner, split into three sections. The book begins by looking at the classical approach to supervised learning, before continuing on to kernel methods based on radial-basis function (RBF) networks. The final part of the book is devoted to regularization theory, which is at the core of machine learning.

  8. Data Mining

    • Top Paid Recommendation: Data Mining: Concepts and Techniques, Third Edition

      Data Mining is a comprehensive overview of the field, and I think it is best for a graduate class in data mining, or perhaps as a reference book. The book's focus is on technique (i.e., how to analyze data, including preparation), and it addresses all the major topics in the field including data storage and pre-processing. However, the book is really about classification methods, and the 2 chapters on cluster analysis are particularly strong and thorough. - Susan Katz, Amazon Review

    • Top Free Recommendation: Mining of Massive Datasets

      The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining). The book, like the course, is designed at the undergraduate computer science level with no formal prerequisites. To support deeper explorations, most of the chapters are supplemented with further reading references.

  9. SQL

    • Top Paid Recommendation: Learning SQL, Second Edition

      If you're writing any type of database driven code and you think that you don't need to understand SQL, read this book. You do need to understand it, and this book teaches it very well. - Jack D. Herrington, Amazon Review

    • Top Free Recommendation: Learn SQL The Hard Way

      This book will teach you the 80% of SQL you probably need to use it effectively, and will mix in concepts in data modeling at the same time. If you've been fumbling around building web, desktop, or mobile applications because you don't know SQL, then this book is for you. It is written for people with no prior database, programming, or SQL knowledge, but knowing at least one programming language will help.

  10. Statistics for Data Science

    • Top Paid Recommendation: Statistics in Plain English, Fourth Edition

      I work as a Data Analyst and deal with statistics on a daily basis. I am expected to know all the models and algorithms. Although statistical software does everything for me, figuring out the numbers the software chews out becomes the tricky part. I majored in Biotechnology and was alien to these statistics for the major part of my life. Long story short, I required a solid foundation guide that would help me get acclimatized to the concepts. - Shyam Goli, Amazon Review

    • Top Free Recommendation: Think Stats: Probability and Statistics for Programmers, Second Edition

      Think Stats is an introduction to Probability and Statistics for Python programmers. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions. The book presents a case study using data from the National Institutes of Health. Readers are encouraged to work on a project with real datasets.

5 eBooks to Read Before Getting into a Data Science or Big Data Career

  • Big Data: The Numbers Game Deciphered: For a crisp, concise overview of the world of Big Data, get this pithy 11-page eBook. The eBook begins by setting the context, touching upon the biggest developments in data science. In addition, you will learn all about: educational qualifications to become a data scientist; technical and non-technical skill sets for a data science role; resources to build on your learning; and much more!

  • Top Programming Languages for a Data Scientist: Programming is a core technical skill that is an absolute must-have for data scientists. Learn which programming languages to prioritize for data science work with this informative guide. Find inside: a list of the top 10 programming languages for a data science career; features of these programming languages; how to apply your learning to become a data scientist.

  • 8 Essential Concepts of Big Data and Hadoop: The centerpiece of the Big Data revolution, Hadoop is the most important technology in the Big Data family. Download this handy guide to learn all you need to know about Hadoop & its ecosystem.

  • Secret to Unlocking Tableau's Hidden Potential: Tableau makes analytics easy and accessible to not just analysts but also top execs, IT professionals, and everyone else in between. Tableau is also currently seen to be the market leader when it comes to Self-Service Business Intelligence with a high degree of execution. If you’re looking for tips, useful hacks, & secret techniques to get the most out of Tableau, this eBook will teach you what you need to know. You will find out about its hidden functionalities and explore unused features that will make you a Tableau superstar.

  • Top 25 Interview Questions and Answers: Big Data Analysis: You could be the most knowledgeable data professional in the world, but unless you make an impression in your job interview, it’s unlikely you’ll land your dream role. Get a peek into the mind of the Data Science interviewer with this compilation of the top 25 Big Data interview questions and answers.

5 eBooks to Read Before Getting into a Machine Learning Career

  • Introduction to Machine Learning: Nils J. Nilsson of Stanford put these notes together in the mid 1990s. Before you turn up your nose at the thought of learning from something from the 90s, remember that foundation is foundation, regardless of when it was written about. Sure, many important advancements have been made in machine learning since this was put together, as Nilsson himself says, but these notes cover much of what is still considered relevant elementary material in a straightforward and focused manner. There are no diversions related to advancements of the past few decades, which authors often want to cover tangentially even in introductory texts. There is, however, a lot of information about statistical learning, learning theory, classification, and a variety of algorithms to whet your appetite. At < 200 pages, this can be read rather quickly.

  • Understanding Machine Learning: From Theory to Algorithms: This book covering machine learning is written by Shai Shalev-Shwartz and Shai Ben-David. This book is newer, longer, and more advanced than the previous offering, but it is also a logical next step. It delves deeper into more algorithms and their descriptions, and provides a bridge toward practicality as well. The focus on theory should be a clue to newcomers of its importance to really understand what is powering machine learning algorithms. The Advanced Theory section covers some concepts which may be beyond the scope or desire of a newcomer, but the option exists to have a look.

  • Bayesian Reasoning and Machine Learning: This introductory text on Bayesian machine learning is one of the most well-known on the topic as far as I am aware, and happens to have a free online version available. An Amazon review from Arindam Banerjee of the University of Minnesota has this to say: The book has wide coverage of probabilistic machine learning, including discrete graphical models, Markov decision processes, latent variable models, Gaussian process, stochastic and deterministic inference, among others. The material is excellent for advanced undergraduate or introductory graduate course in graphical models, or probabilistic machine learning. The exposition throughout the book uses numerous diagrams and examples, and the book comes with an extensive software toolbox...

    It should be noted that the toolbox being referred to is implemented in MATLAB, which is no longer the default machine learning implementation language, at least not generally. The toolbox is not the book's only virtue, however. This provides a great jumping off point for those interested in probabilistic machine learning.

  • Deep Learning: This is the soon-to-be-released-in-print deep learning book by Goodfellow, Bengio and Courville, which has a freely-available final draft copy on its official website. The following 2 excerpts are from the book's website, one providing an overview of its contents, the other putting almost everyone interested in reading the book at ease: The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. The print version will be available for sale soon. One of these target audiences is university students (undergraduate or graduate) learning about machine learning, including those who are beginning a career in deep learning and artificial intelligence research. The other target audience is software engineers who do not have a machine learning or statistics background, but want to rapidly acquire one and begin using deep learning in their product or platform.

    You would be hard-pressed to find a better resource from which to learn all about deep learning.

  • Reinforcement Learning: An Introduction: Sutton and Barto's authoritative classic is getting a makeover. This is a link to the second draft, which is currently in progress (and freely-available while it is).

    Reinforcement learning is of incredible research interest these days, and for good reason. Given its recent high-profile success as part of AlphaGo, its potential in self-driving cars and similar systems, and its marriage with deep learning, there is little reason to believe that reinforcement learning, which will undoubtedly play a major role in any form of "General AI" (or anything resembling it), is going anywhere. Indeed, these are all reasons that a second draft of this book is in the works.

    You can get a sense of the importance of this book in the field of reinforcement learning given that it is referred to simply as "Sutton and Barto." This Amazon review from David Tan sums the book up nicely (and allays any fears related to "is it too complex for me to understand?"):

    The book starts with examples and intuitive introduction and definition of reinforcement learning. It follows with 3 chapters on the 3 fundamental approaches to reinforcement learning: Dynamic programming, Monte Carlo and Temporal Difference methods. Subsequent chapters build on these methods to generalize to a whole spectrum of solutions and algorithms.

    The book is very readable by average computer students. Possibly the only difficult one is chapter 8, which deals with some neural network concepts.

    Do keep in mind that the above is with regard to the first edition; it should generalize to the second, however.

    I wish you well on your quest to learn more about machine learning from free ebooks. Check the related links below for even more related ebook resources.

  • Knime Press
