PREFACE
1 INTRODUCTION
    Data Analysis
    What's in This Book
    What's with the Workshops?
    What's with the Math?
    What You'll Need
    What's Missing
PART I Graphics: Looking at Data
2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION
    Dot and Jitter Plots
    Histograms and Kernel Density Estimates
    The Cumulative Distribution Function
    Rank-Order Plots and Lift Charts
    Only When Appropriate: Summary Statistics and Box Plots
    Workshop: NumPy
    Further Reading
3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS
    Scatter Plots
    Conquering Noise: Smoothing
    Logarithmic Plots
    Banking
    Linear Regression and All That
    Showing What's Important
    Graphical Analysis and Presentation Graphics
    Workshop: matplotlib
    Further Reading
4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS
    Examples
    The Task
    Smoothing
    Don't Overlook the Obvious!
    The Correlation Function
    Optional: Filters and Convolutions
    Workshop: scipy.signal
    Further Reading
5 MORE THAN TWO VARIABLES: GRAPHICAL MULTIVARIATE ANALYSIS
    False-Color Plots
    A Lot at a Glance: Multiplots
    Composition Problems
    Novel Plot Types
    Interactive Explorations
    Workshop: Tools for Multivariate Graphics
    Further Reading
6 INTERMEZZO: A DATA ANALYSIS SESSION
    A Data Analysis Session
    Workshop: gnuplot
    Further Reading
PART II Analytics: Modeling Data
7 GUESSTIMATION AND THE BACK OF THE ENVELOPE
    Principles of Guesstimation
    How Good Are Those Numbers?
    Optional: A Closer Look at Perturbation Theory and Error Propagation
    Workshop: The Gnu Scientific Library (GSL)
    Further Reading
8 MODELS FROM SCALING ARGUMENTS
    Models
    Arguments from Scale
    Mean-Field Approximations
    Common Time-Evolution Scenarios
    Case Study: How Many Servers Are Best?
    Why Modeling?
    Workshop: Sage
    Further Reading
9 ARGUMENTS FROM PROBABILITY MODELS
    The Binomial Distribution and Bernoulli Trials
    The Gaussian Distribution and the Central Limit Theorem
    Power-Law Distributions and Non-Normal Statistics
    Other Distributions
    Optional: Case Study--Unique Visitors over Time
    Workshop: Power-Law Distributions
    Further Reading
10 WHAT YOU REALLY NEED TO KNOW ABOUT CLASSICAL STATISTICS
    Genesis
    Statistics Defined
    Statistics Explained
    Controlled Experiments Versus Observational Studies
    Optional: Bayesian Statistics--The Other Point of View
    Workshop: R
    Further Reading
11 INTERMEZZO: MYTHBUSTING--BIGFOOT, LEAST SQUARES, AND ALL THAT
    How to Average Averages
    The Standard Deviation
    Least Squares
    Further Reading
PART III Computation: Mining Data
12 SIMULATIONS
    A Warm-Up Question
    Monte Carlo Simulations
    Resampling Methods
    Workshop: Discrete Event Simulations with SimPy
    Further Reading
13 FINDING CLUSTERS
    What Constitutes a Cluster?
    Distance and Similarity Measures
    Clustering Methods
    Pre- and Postprocessing
    Other Thoughts
    A Special Case: Market Basket Analysis
    A Word of Warning
    Workshop: Pycluster and the C Clustering Library
    Further Reading
14 SEEING THE FOREST FOR THE TREES: FINDING IMPORTANT ATTRIBUTES
    Principal Component Analysis
    Visual Techniques
    Kohonen Maps
    Workshop: PCA with R
    Further Reading
15 INTERMEZZO: WHEN MORE IS DIFFERENT
    A Horror Story
    Some Suggestions
    What About Map/Reduce?
    Workshop: Generating Permutations
    Further Reading
PART IV Applications: Using Data
16 REPORTING, BUSINESS INTELLIGENCE, AND DASHBOARDS
    Business Intelligence
    Corporate Metrics and Dashboards
    Data Quality Issues
    Workshop: Berkeley DB and SQLite
    Further Reading
17 FINANCIAL CALCULATIONS AND MODELING
    The Time Value of Money
    Uncertainty in Planning and Opportunity Costs
    Cost Concepts and Depreciation
    Should You Care?
    Is This All That Matters?
    Workshop: The Newsvendor Problem
    Further Reading
18 PREDICTIVE ANALYTICS
    Introduction
    Some Classification Terminology
    Algorithms for Classification
    The Process
    The Secret Sauce
    The Nature of Statistical Learning
    Workshop: Two Do-It-Yourself Classifiers
    Further Reading
19 EPILOGUE: FACTS ARE NOT REALITY
A PROGRAMMING ENVIRONMENTS FOR SCIENTIFIC COMPUTATION AND DATA ANALYSIS
    Software Tools
    A Catalog of Scientific Software
    Writing Your Own
    Further Reading
B RESULTS FROM CALCULUS
    Common Functions
    Calculus
    Useful Tricks
    Notation and Basic Math
    Where to Go from Here
    Further Reading
C WORKING WITH DATA
    Sources for Data
    Cleaning and Conditioning
    Sampling
    Data File Formats
    The Care and Feeding of Your Data Zoo
    Skills
    Terminology
    Further Reading
INDEX