Book Scope

Broadly speaking, data analysis comes in two flavors: exploratory and confirmatory. This book is primarily concerned with exploratory data analysis (and visualization).

In this day and age, when artificial intelligence dominates everyone’s agenda in the data science space, exploratory data analysis can seem like a relic from the past, a lesser form of art unfit for the elite of machine learning and deep learning specialists. At best, exploratory data analysis is regarded as a necessary yet trivial task to complete before data modeling, something that requires no training or deep reflection beyond common sense. Nevertheless, insightful and effective exploratory data analysis is both ubiquitous in any data science endeavor and remarkably hard to do without proper tools and experience.

Consider everything that data analysis involves before modeling begins:

  • Importing data

  • Formatting data

  • Cleaning data and detecting anomalies

  • Examining the underlying assumptions for modeling

  • Transforming data and engineering features

  • Creating engaging and insightful visual representations of data

  • Forming hypotheses

  • Determining what can and cannot be done with the dataset

  • Determining whether more data needs to be collected

  • Determining the best approach from a data modeling perspective

  • Generating insights and revealing underlying relationships in the dataset
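To make the first few items on this list concrete, here is a minimal sketch in Python that uses only the standard library. The toy dataset, its column names, and the helper functions are all invented for illustration, and the outlier rule (Tukey fences based on the interquartile range) is just one common choice among many. A real project would more likely rely on a dedicated data analysis library.

```python
import csv
import io
import statistics

# Hypothetical toy dataset, invented purely for illustration.
RAW_CSV = """respondent,age,income
a,34,52000
b,29,48000
c,,51000
d,41,50000
e,38,950000
f,45,53000
"""


def load_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def format_rows(rows):
    """Format: cast numeric columns to float, mapping empty strings to None."""
    def to_number(value):
        return float(value) if value.strip() else None

    return [
        {
            "respondent": row["respondent"],
            "age": to_number(row["age"]),
            "income": to_number(row["income"]),
        }
        for row in rows
    ]


def drop_incomplete(rows):
    """Clean: keep only the rows that have no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def iqr_outliers(values, k=1.5):
    """Anomaly detection: return the values outside the Tukey fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


rows = drop_incomplete(format_rows(load_rows(RAW_CSV)))
incomes = [row["income"] for row in rows]
outliers = iqr_outliers(incomes)

print(f"{len(rows)} complete rows; suspicious incomes: {outliers}")
```

Even in this tiny example, each step hides a judgment call: whether to drop or impute the row with a missing age, and whether the extreme income is a data entry error or a genuine observation. Those decisions are exactly where experience matters.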

Correlational analysis, hypothesis testing, causal inference, classical machine learning, and deep learning all depend upon the correct execution of exploratory data analysis. I will go as far as to say that a poorly executed exploratory data analysis will render any data modeling process suboptimal at best and useless at worst. Exploratory data analysis lays the foundations for data modeling, and nothing good can come from weak foundations.

Explorers, hunters, artisans, and detectives

Nowadays, it is common to compare the task of data analysis to the task of scientific discovery. I dislike this comparison, particularly when exploration and description are the goals. The certainty and precision that science demands simply do not match the uncertainty and imprecision that data analysis must endure. The difference is a matter of degree. I think data analysts are closer to explorers, hunters, artisans, and detectives, all of whom rely far more on a mix of intuition, experience, and ad hoc techniques, and all of whom must endure higher levels of uncertainty and imprecision.

Data analysis and visualization of what?

Like explorers, hunters, artisans, and detectives, data analysts develop areas of expertise through training and practice. A homicide detective will not be as good at investigating cybercrime, just as a financial data analyst will not be as good at analyzing geological data.

My training is in the social and behavioral sciences: sociology, economics, psychology, and neuroscience are all disciplines I have been trained in at different points in my adult life. As a result, I know little about analyzing data in geology, genomics, physics, pharmacology, ecology, epidemiology, chemistry, factory production, and pretty much anything that’s not related to my training and experience. I considered renaming this book “Data Analysis and Visualization for the Social Sciences with Python”, yet I still believe that many if not all of the principles and techniques I’ll explain are useful to people in many areas of expertise. I promise I’ll do my best to make the content as general and relevant as possible for people from a wide variety of disciplines.