Book Scope
Broadly speaking, data analysis comes in two flavors: exploratory and confirmatory. This book is primarily concerned with exploratory data analysis (and visualization).
In this day and age, when artificial intelligence dominates everyone’s agenda in the data science space, exploratory data analysis seems to be a relic from the past, a lesser form of art unfit for the elite of machine learning and deep learning specialists. At best, exploratory data analysis is regarded as a necessary yet trivial task to do before data modeling, something for which no training or deep reflection is necessary beyond common sense. Nevertheless, insightful and effective exploratory data analysis is both ubiquitous in any data science endeavor and remarkably hard to do without proper tools and experience.
Consider all the things related to data analysis before modeling:
Import data
Format data
Clean data and detect anomalies
Examine underlying assumptions for modeling
Transform data and engineer features
Create engaging and insightful visual representations of data
Form hypotheses
To determine what can and cannot be done with the dataset
To determine whether more data needs to be collected
To determine what is the best approach from a data modeling perspective
To generate insights and reveal underlying relationships in the dataset
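As a rough illustration of the first few items in this list, here is a minimal sketch that uses only Python's standard library. The toy dataset, the column names, and the helper functions (`load_rows`, `format_rows`, `drop_missing`, `iqr_outliers`) are all made up for this example, and the interquartile-range rule is just one of many possible ways to flag anomalies, not a recommendation.

```python
import csv
import io
import math
import statistics

# Hypothetical toy data standing in for a real file; all values are made up.
RAW_CSV = """respondent,age,income
a,34,42000
b,29,39000
c,41,
d,38,47000
e,33,44000
f,36,410000
"""


def load_rows(text):
    """Import: parse CSV text into a list of dicts (every value is a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def format_rows(rows):
    """Format: cast numeric columns, mapping empty strings to None."""
    return [
        {
            "respondent": row["respondent"],
            "age": int(row["age"]) if row["age"] else None,
            "income": float(row["income"]) if row["income"] else None,
        }
        for row in rows
    ]


def drop_missing(rows, column):
    """Clean: keep only the rows that have a value in the given column."""
    return [row for row in rows if row[column] is not None]


def iqr_outliers(values, k=1.5):
    """Anomaly detection: flag values more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


rows = format_rows(load_rows(RAW_CSV))
clean = drop_missing(rows, "income")
incomes = [row["income"] for row in clean]

# Candidate anomalies to inspect by hand, not to delete automatically.
outliers = iqr_outliers(incomes)

# Transform: a log scale is a common choice for skewed, positive variables.
log_incomes = [math.log(v) for v in incomes]

print(f"{len(rows)} rows imported, {len(clean)} kept after cleaning")
print("Flagged incomes:", outliers)
```

Even in a toy case like this, each step involves a judgment call: whether to drop or impute the missing income, whether the flagged value is an error or a genuine observation, and whether a log transformation suits the question at hand.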
Correlational analysis, hypothesis testing, causal inference, classical machine learning, and deep learning all depend on the correct execution of exploratory data analysis. I will go as far as to say that a poorly executed exploratory data analysis will render any data modeling process suboptimal at best and useless at worst. Exploratory data analysis lays the foundations for data modeling, and nothing good can come from weak foundations.
Explorers, hunters, artisans, and detectives
Nowadays, it is common to compare the task of data analysis to the task of scientific discovery. I dislike such a comparison, particularly when exploration and description are the goals. The certainty and precision that science demands simply do not match the level of uncertainty and imprecision that data analysis must endure. The difference is a matter of degree. I think data analysts are closer to explorers, hunters, artisans, and detectives, all of whom rely far more on a mix of intuition, experience, and ad hoc techniques, and who must endure higher levels of uncertainty and imprecision.
Data analysis and visualization of what
Like explorers, hunters, artisans, and detectives, data analysts develop areas of expertise through training and practice. A homicide detective will not be as good at investigating cybercrime, in the same way that a financial data analyst will not be as good at analyzing geological data.
My training is in the social and behavioral sciences: sociology, economics, psychology, and neuroscience are all disciplines I have been trained in at different points in my adult life. As a result, I know little about analyzing data in geology, genomics, physics, pharmacology, ecology, epidemiology, chemistry, factory production, and pretty much anything that’s not related to my training and experience. I considered renaming this book “Data Analysis and Visualization for the Social Sciences with Python”, yet I still believe that many if not all of the principles and techniques I’ll explain are useful for people in many areas of expertise. I promise I’ll do my best to make the content as general and relevant as possible for people from a wide variety of disciplines.