
Using Big Data to Solve Economic and Social Problems


Professor Raj Chetty
Head Section Leader: Gregory Bruich, Ph.D.

Spring 2019
The Fading American Dream
[Figure: Percent of Children Earning More than Their Parents, by Year of Birth. Vertical axis: percent of children earning more than their parents (50 to 100); horizontal axis: child's year of birth (1940 to 1980).]
Source: Chetty, Grusky, Hell, Hendren, Manduca, Narang (Science 2017)
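The statistic plotted above is a simple share computed separately for each birth cohort. A minimal sketch of that calculation follows, assuming each record holds a child's birth year plus child and parent incomes that have already been made comparable (for example, inflation-adjusted and measured at similar ages). The function name, record layout, and numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def share_earning_more(records):
    """Return {birth_year: percent of children whose income exceeds their parents'}."""
    # birth_year -> [number of children earning more, total number of children]
    counts = defaultdict(lambda: [0, 0])
    for birth_year, child_income, parent_income in records:
        counts[birth_year][0] += child_income > parent_income
        counts[birth_year][1] += 1
    return {year: 100.0 * more / total for year, (more, total) in sorted(counts.items())}

# Made-up illustrative records: (child's birth year, child income, parent income).
toy_records = [
    (1940, 52_000, 30_000),
    (1940, 41_000, 35_000),
    (1980, 38_000, 45_000),
    (1980, 60_000, 50_000),
]
print(share_earning_more(toy_records))
```

With real data the same grouping would run over millions of parent-child pairs per cohort, which is what makes a year-by-year series possible.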
Why is the American Dream Fading?

• Central policy question: why are children’s chances of climbing the income ladder falling in America?

– And what can we do to reverse this trend…?

• Difficult to answer this question based solely on historical data on macroeconomic trends

– Numerous changes over time make it hard to test between alternative explanations

– Problem: only a handful of data points


Theoretical Social Science

• Until recently, social scientists have had limited data to study policy questions like this

• Social science has therefore been a theoretical field

– Develop mathematical models (economics) or qualitative theories (sociology)

– Use these theories to explain patterns and make policy recommendations, e.g. to improve upward mobility
Theoretical Social Science

• Problem: theories untested → five economists often have five different answers to a given question

• Leads to a politicization of questions that in principle have scientific answers

– Example: is Obamacare reducing job growth in America?


The Rise of Data and Empirical Evidence

• Today, social science is becoming a more empirical field thanks to the growing availability of data

– Test and improve theories using real-world data

– Analogous to natural sciences


Empirical (Data-Based) Articles in Leading Economics Journals, 1983-2011

Year    Percentage of Empirical Articles
1983    38.4%
1993    60.3%
2003    60.0%
2011    72.1%

Source: Hamermesh (JEL 2013)
Social Science in the Age of Big Data

• Recent availability of “big data” has accelerated this trend

– Large datasets are starting to transform social science, as they have transformed business

• Examples:

– Government data: tax records, Medicare

– Corporate data: Google, Uber, retailer data

– Unstructured data: Twitter, newspapers


Why is Big Data Transforming Social Science?

1. Greater reliability than surveys

2. Ability to measure new variables (e.g., emotions)

3. Universal coverage → can “zoom in” to subgroups

4. Large samples → can approximate scientific experiments


Why This Course?

• Companies like Amazon have succeeded in solving major private market problems using technology and big data

• Goal of this course: show how the same skills can be used to address important social problems

– We need more talent in this area given pressing challenges such as rising inequality and global warming

• To achieve this goal, provide an introduction to a broad range of topics, methods, and real-world applications

– Start from the questions to motivate the methods rather than the traditional approach of doing the reverse
Overview of Topics

1. Equality of Opportunity

2. Education

3. Racial Disparities

4. Health

5. Criminal Justice

6. Tax Policy

7. Climate Change

8. Economic Development and Institutional Change


Examples of Statistical Methods You Will Learn in this Class

1. Descriptive Data Analysis: correlation, regression, survival analysis

2. Experiments: randomization, non-compliance

3. Quasi-Experiments: regression discontinuity, difference-in-differences

4. Machine Learning: prediction, overfitting, cross-validation

5. Stata (or other) statistical programming language


Statistical Methods: Two Types of “Big Data”

• Big data can be classified into two types

– “Long” data: many observations relative to variables (e.g., tax records)

– “Wide” data: few observations relative to variables (e.g., Amazon clicks, newspapers)
Statistical Methods: Two Types of “Big Data”

• Statistics/computer science has focused on “wide” data

– Main application: prediction

– Example: predicting income to target ads

• Social science has focused on “long” data

– Main application: identifying causal effects

– Example: effects of improving schools on income


Examples of Economic Concepts You Will Learn in this Class

1. Effects of price incentives

2. Supply and demand

3. Competitive equilibrium

4. Adverse selection

5. Behavioral economics vs. rational models


Two Types of Sections

• We recognize that not everyone taking this class has the same background in statistics and economics

– Some students have taken many courses already, others are just starting

• Lectures will be structured so that everyone can follow them, with no prior knowledge assumed

• Sections will be divided into two types, based on whether students have prior coursework in statistics/econometrics

– Please respond to emails you will receive this week asking about your prior coursework and preferences
Empirical Projects

• To help students learn, we will assign four empirical projects that will get you into the data

• Projects will focus on real-world questions and involve coding, reading papers, and writing

• For example, the fourth project will be analogous to the “Netflix challenge” to predict the movies people will like

• We will have a “Social Mobility challenge” to identify predictors of mobility and neighborhood change
Discussions with Leading Experts on Real-World Applications

1. Affordable Housing: Shaun Donovan

2. College Completion: Timothy Renick

3. Food Stamps Programs: Jesse Shapiro

4. Health and Criminal Justice: Lynn Overmann

5. Poverty in Developing Countries: Esther Duflo

Important Note: Guest discussants are generously providing their time to us → attendance is mandatory and will count toward your grade
Part 1
Local Area Variation in Upward Mobility

Topic I
Equality of Opportunity
Lecture 1 Outline

1. Geographical Variation in Upward Mobility in America

2. Causal Effects of Places vs. Sorting

• Lecture 1 is based primarily on the following paper:

Chetty, Friedman, Hendren, Jones, Porter. “The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility.” NBER Working Paper, 2018
Part 1
Geographical Variation in Upward Mobility
Differences in Opportunity Across Local Areas

• How do children’s chances of moving up vary across areas in America?

– Are there some areas where kids do better than others? If so, what lessons can we learn from them?

• Recent studies have used big data to measure how upward mobility varies based on where children grow up
The Opportunity Atlas
Data Sources and Sample Definitions

• Data sources: anonymized Census data (2000, 2010, ACS) covering the U.S. population, linked to federal income tax returns from 1989-2015

• Link children to parents based on dependent claiming on tax returns

• Target sample: children in the 1978-83 birth cohorts who were born in the U.S. or are authorized immigrants who came to the U.S. in childhood

• Analysis sample: 20.5 million children, 96% coverage rate of target sample
Measuring Parents’ and Children’s Incomes in Tax Data

• Parents’ household incomes: average income reported on Form 1040 tax returns from 1994-2000

• Children’s incomes measured from tax returns in 2014-15 (ages 31-37)

• Focus on percentile ranks in national distribution:

– Rank children relative to others born in the same year and parents relative to other parents
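The ranking step can be illustrated with a short sketch. It assumes a simple list of (child id, birth year, income) tuples and assigns each child a percentile rank among children born in the same year; parents would be ranked the same way, pooled against all other parents. The function names, the tie-handling rule (ties share their average rank), and the toy incomes are assumptions made for illustration, not details from the slides.

```python
from collections import defaultdict

def percentile_ranks(values):
    """Map each value to a percentile rank in (0, 100]; ties share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to cover every observation tied with the one at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_position = (i + j) / 2 + 1  # average 1-based position within the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = 100.0 * avg_position / n
        i = j + 1
    return ranks

def rank_children_within_cohort(children):
    """children: list of (child_id, birth_year, income). Returns {child_id: rank}."""
    by_cohort = defaultdict(list)
    for child_id, birth_year, income in children:
        by_cohort[birth_year].append((child_id, income))
    result = {}
    for members in by_cohort.values():
        ranks = percentile_ranks([income for _, income in members])
        for (child_id, _), rank in zip(members, ranks):
            result[child_id] = rank
    return result

# Made-up incomes for two birth cohorts.
toy_children = [
    ("a", 1978, 20_000), ("b", 1978, 40_000), ("c", 1978, 60_000), ("d", 1978, 80_000),
    ("e", 1979, 30_000), ("f", 1979, 90_000),
]
child_ranks = rank_children_within_cohort(toy_children)
```

Ranking within birth year means a child is compared only with peers of the same age, so income differences that come purely from being older or younger in 2014-15 do not affect the rank.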
Intergenerational Income Mobility for Children Raised in Chicago
[Figure: Average Child Household Income Rank vs. Parent Household Income Rank. Vertical axis: mean child rank in national income distribution; horizontal axis: parent rank in national income distribution. Annotation: predicted value given parents at 25th percentile = 40th percentile = $30,400.]
Source: Chetty, Hendren, Kline, Saez 2014
Intergenerational Income Mobility for Children Raised in a Hypothetical Census Tract
[Figure: Average Child Household Income Rank vs. Parent Household Income Rank. Vertical axis: mean child rank in national income distribution; horizontal axis: parent rank in national income distribution. Annotation: predicted value given parents at 25th percentile = 40th percentile.]
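The predicted value in the figure can be written compactly. This is a sketch that assumes the relationship between child rank and parent rank within an area c is linear; the symbols \alpha_c (intercept) and \beta_c (slope) are notation introduced here for illustration and do not appear in the slides.

```latex
\bar{y}_c(p) = \alpha_c + \beta_c \, p,
\qquad
\bar{y}_c(25) = \alpha_c + 25\,\beta_c
```

Here p is the parent's percentile rank and \bar{y}_c(p) is the mean child rank for children raised in area c. In the hypothetical tract above, \bar{y}_c(25) equals 40.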
Estimating Children’s Average Outcomes by Census Tract

• Run a separate regression using data for children who grow up in each Census tract in America

• In practice, many children move across areas in childhood

– Weight children by fraction of childhood (up to age 23) spent in a given area
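The per-tract estimation described above can be sketched as a weighted least-squares fit of child rank on parent rank, run once per tract, with each child weighted by the fraction of childhood spent there. This is an illustration under simplifying assumptions (a linear fit, made-up data, hypothetical function names and row layout), not the study's actual estimation code.

```python
from collections import defaultdict

def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    total_w = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx  # requires at least two distinct parent ranks in the tract
    intercept = y_mean - slope * x_mean
    return intercept, slope

def predicted_rank_by_tract(rows, parent_rank=25.0):
    """rows: (tract_id, parent_rank, child_rank, fraction_of_childhood_in_tract)."""
    by_tract = defaultdict(list)
    for tract_id, p, y, frac in rows:
        by_tract[tract_id].append((p, y, frac))
    predictions = {}
    for tract_id, obs in by_tract.items():
        x, y, w = zip(*obs)
        intercept, slope = weighted_linear_fit(x, y, w)
        predictions[tract_id] = intercept + slope * parent_rank
    return predictions

# Made-up rows; a child who moved would appear once per tract lived in,
# with weights equal to the fraction of childhood spent in each tract.
toy_rows = [
    ("A", 10.0, 34.0, 1.0), ("A", 50.0, 50.0, 0.5), ("A", 90.0, 66.0, 1.0),
    ("B", 20.0, 30.0, 1.0), ("B", 60.0, 50.0, 1.0), ("B", 80.0, 60.0, 0.25),
]
preds = predicted_rank_by_tract(toy_rows)
```

In the toy data, tract A's points lie exactly on the line 30 + 0.4p and tract B's on 20 + 0.5p, so the predicted child ranks for parents at the 25th percentile are about 40 and 32.5 respectively.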
The Geography of Upward Mobility in the United States
Average Household Income for Children with Parents Earning $27,000 (25th percentile)

[Map of the United States with the following labeled areas:]
Seattle: $35.2k
Salt Lake City: $37.2k
Dubuque: $45.5k
Cleveland: $29.4k
Boston: $36.8k
New York City: $35.4k
San Francisco Bay Area: $37.2k
Washington DC: $33.9k
Los Angeles: $34.3k
Atlanta: $26.6k

Legend: > $44.8k (top), $33.7k (middle), < $26.8k (bottom)
Note: Blue = More Upward Mobility, Red = Less Upward Mobility

Source: The Opportunity Atlas. Chetty, Friedman, Hendren, Jones, Porter 2018
