A Few Datasets
Preface
I have created this webpage to provide you with a few datasets that
you might use for your course project and to explain what I hope you
will learn by conducting an econometric analysis.
...
Organizing Information, Identifying Trends
I want you to learn econometrics, and the best way to learn econometrics is to do it.
But more broadly, I hope that conducting an econometric analysis will teach you how
to organize information and identify trends.
In the specific case of an econometric analysis, each column in your spreadsheet must
represent a variable and each row must represent an observation. So your very first
task in an econometric analysis is to properly align that information in your spreadsheet.
If you carefully construct your spreadsheet from reliable sources of data
(and if you choose a good set of variables to test your null hypothesis), then
you should observe some clear trends in your data. Your next task, then, is to
describe those trends, test a series of null hypotheses and report your findings.
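To make the column/row rule concrete, here is a minimal Python sketch (you do not need Python for the course; a spreadsheet and Gretl are enough). It assumes a hypothetical "wide" layout in which one variable, share prices (SP), is spread across one column per country, and reshapes it so that each row is one observation (a country in a month) and each column is one variable. The country codes and numbers are made up for illustration.

```python
import csv
import io

# Hypothetical "wide" layout: one row per month, one column per country.
# A single variable (share prices) is spread over several columns here.
WIDE_CSV = """month,AUS,CAN
2017-01,100.0,98.5
2017-02,101.2,99.1
"""

def wide_to_long(text, value_name):
    """Reshape so that each row is one observation (country, month)
    and each column is one variable."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        month = record.pop("month")
        for country, value in record.items():
            rows.append({"country": country, "month": month, value_name: float(value)})
    return rows

long_rows = wide_to_long(WIDE_CSV, "SP")
for row in long_rows:
    print(row)
```

In the reshaped layout, adding another variable (say, an interest rate) simply means adding one more column to every row.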
Gretl, of course, will help you run regressions and calculate statistics for
your analysis. But Gretl is a tool. It is not the tool that is important.
It is the quality of your input that is important.
Your spreadsheet is what's important.
How you organize information is what's important.
In a more general case, your information might not be the numeric data that we work
with in econometrics. It might be names, addresses or whole documents and files.
Your data might not even fit into a spreadsheet at all.
But some principles, like functions and variables, will remain the same.
And, once again, what will be important is how you organize information.
If your information is well-organized, you should observe some clear trends
in your data.
To get you started, I have assembled the datasets below. Your task is to identify
the most important trends in one of those datasets, test a series of null hypotheses
and report your findings.
...
Your Assignment
For the project proposal, please submit a written description of:
- the null hypotheses that you wish to test
- the dataset that you plan to test them with
For the final project, please submit a formal paper, in which you describe:
- the null hypotheses that you tested
- the dataset that you tested them with
- summary statistics
- how you manipulated the data
- the regressions that you ran
- your conclusions: should we accept or reject the null hypothesis?
- if we accept it, why might the explanatory variable not have any effect on the dependent variable?
- if we reject it, how strong is the effect of the explanatory variable on the dependent variable?
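Gretl will compute summary statistics for you. As an illustration of what a basic summary contains, the Python sketch below computes the count, mean, sample standard deviation, minimum and maximum of one variable (one spreadsheet column); the values passed in are hypothetical.

```python
import statistics

def summarize(name, values):
    """Basic summary statistics for one variable (one spreadsheet column)."""
    return {
        "variable": name,
        "n": len(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }

# Hypothetical long-term interest rates (percent per annum)
irlt_summary = summarize("IRLT", [2.0, 2.5, 3.0, 3.5])
print(irlt_summary)
```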
...

OECD Financial Data
variable | description | units
--- | --- | ---
CCRETT01 | relative consumer price indices |
CCUS | currency exchange rates | monthly average
IR3TIB | short-term interest rates | percent per annum
IRLT | long-term interest rates | percent per annum
IRSTCI | immediate interest rates, call money, interbank rate | percent per annum
MABM | broad money (M3) | index, seasonally adjusted
MANM | narrow money (M1) | index, seasonally adjusted
SP | share prices | index
OECD Economies
Australia, Austria, Belgium, Canada, Chile, Czech Republic,
Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel,
Italy, Japan, Korea, Latvia, Luxembourg, Mexico, Netherlands, New Zealand, Norway,
Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey,
United Kingdom, United States
Non-OECD Economies
Argentina, Brazil, China (People's Republic of), Colombia,
Costa Rica, India, Indonesia, Lithuania, Russia, Saudi Arabia, South Africa
Monthly Data: Jan. 1950 to Nov. 2017
Download: CSV file | Gretl data
Description
This dataset contains financial market data from
stats.oecd.org
covering 35 OECD countries and 11 non-OECD countries.
You may use this data to explore the effect that interest rates, exchange rates,
inflation rates and the money supply have on share prices.
Specifically, you may test the null hypotheses that:
- a change in interest rates is not associated with a change in share prices
- a change in exchange rates is not associated with a change in share prices
- a change in money supply is not associated with a change in share prices
- a change in inflation rates is not associated with a change in share prices
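Each of these null hypotheses says that a slope coefficient is zero. For a single explanatory variable, the test can be sketched as follows: fit ΔSP = a + b·ΔX by ordinary least squares and compute the t-statistic b / se(b). This is a minimal Python illustration with made-up numbers (d_x and d_sp are hypothetical names); in your project, Gretl's regression output reports the same coefficient, standard error, t-ratio and p-value. You would reject the null hypothesis when |t| exceeds the critical value of the t distribution with n - 2 degrees of freedom.

```python
import math

def ols_slope_test(x, y):
    """Fit y = a + b*x by OLS; return (b, standard error of b, t-statistic for H0: b = 0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)  # estimated error variance
    se_b = math.sqrt(s2 / sxx)
    return b, se_b, b / se_b

# Hypothetical differenced data: d_x = change in a regressor, d_sp = change in share prices
d_x = [0, 1, 2, 3, 4]
d_sp = [1, 3, 2, 5, 4]
slope, se, t_stat = ols_slope_test(d_x, d_sp)
print(round(slope, 4), round(se, 4), round(t_stat, 4))  # 0.8 0.3464 2.3094
```

With n - 2 = 3 degrees of freedom, the 5% two-sided critical value is about 3.18, so this toy example would not reject the null hypothesis.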
Then focus on the variables where we rejected the null hypothesis. In those cases,
we have accepted the alternative hypothesis that there is a relationship, so now
we want to know:
- which variables have the strongest effect on share prices?
- how large is that effect?
For example, if we think that interest rates will rise one percentage point next month,
then how much will share prices fall in response to that change?
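If you work with differenced variables, the answer to that question is the estimated slope itself. A sketch, where ΔSP_t is the change in the share price index SP from month t-1 to month t and ΔIR_t is the change in an interest rate such as IR3TIB, measured in percentage points (the notation is mine, for illustration):

```latex
\begin{aligned}
\Delta SP_t &= \alpha + \beta \, \Delta IR_t + \varepsilon_t \\
\widehat{\Delta SP}\big|_{\Delta IR = 1} - \widehat{\Delta SP}\big|_{\Delta IR = 0}
  &= (\hat{\alpha} + \hat{\beta} \cdot 1) - (\hat{\alpha} + \hat{\beta} \cdot 0) = \hat{\beta}
\end{aligned}
```

So if the estimated slope is negative, share prices are predicted to fall by |β̂| index points in a month when interest rates rise by one percentage point. In a regression with several explanatory variables, β̂ is that effect holding the other variables constant.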
As you conduct your analysis, you must remember that one of the Gauss-Markov assumptions
is that your residuals ("error terms") must not be correlated with each other. Related
to this assumption is the concept of "stationary residuals": the mean and variance
of your residuals must be constant over time.
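One common way to check the no-correlation assumption is the Durbin-Watson statistic, computed from the residuals e_t of a fitted time-series regression: DW = Σ(e_t - e_{t-1})² / Σe_t². Values near 2 suggest no first-order autocorrelation; values near 0 suggest positive, and values near 4 negative, autocorrelation. A minimal Python sketch with made-up residuals:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for regression residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (negative autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
# Hypothetical residuals that drift slowly (positive autocorrelation): close to 0
print(durbin_watson([1.0, 0.9, 0.8, 0.7]))
```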
Taking the difference in value between one time period and the next will usually make a series stationary.
So if you difference each variable in your regression model, your residuals will usually be stationary as well.
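Here is a minimal Python sketch of first differencing, using a hypothetical series that rises by 2 every period. The informal check compares the mean of the first and second halves of a series: in levels the mean drifts upward, but after differencing it is constant. This is only an illustration; a formal unit-root test, such as the augmented Dickey-Fuller test that Gretl provides, is the proper check.

```python
def difference(series):
    """First difference: value in period t minus value in period t-1.
    The result has one fewer observation than the input."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def half_means(series):
    """Informal stationarity check: means of the first and second halves."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return sum(first) / len(first), sum(second) / len(second)

# Hypothetical trending series
level = [10, 12, 14, 16, 18, 20, 22, 24]
diff = difference(level)
print(half_means(level))  # (13.0, 21.0): the mean drifts upward
print(diff)               # [2, 2, 2, 2, 2, 2, 2]
print(half_means(diff))   # (2.0, 2.0): the mean is constant
```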
The alternative is to find a cointegrating relationship among your variables that makes
the residuals stationary. In practice, however, it is difficult to find such cointegrating
relationships, so I encourage you to work with the differenced variables.
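For reference, a cointegrating relationship can be written as follows, with y_t and x_t standing for any two of your variables in levels: each series is non-stationary on its own, yet some linear combination of them is stationary.

```latex
\begin{aligned}
y_t &= \alpha + \beta \, x_t + u_t, \qquad y_t,\ x_t \ \text{non-stationary in levels} \\
u_t &= y_t - \alpha - \beta \, x_t \ \text{stationary}
  \;\Longrightarrow\; y_t \ \text{and} \ x_t \ \text{are cointegrated}
\end{aligned}
```

In the Engle-Granger two-step approach, you estimate α and β by OLS in levels and then apply a unit-root test to the residuals. Because those residuals are estimated, the critical values differ from the usual Dickey-Fuller ones, which is part of what makes this route harder in practice.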
But also, from the perspective of an investor, the share price itself is not important.
What is important to the investor is the change in share price (i.e., the difference in share price).
So from an investment perspective, you want to develop a model that predicts changes in share price.
What predicts those changes?
- change in interest rate?
- change in exchange rate?
- change in inflation rate?
- change in money supply?
And how large is the effect of those changes on the change in share price?
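Answering these questions together means regressing the change in share prices on all four changes at once. Gretl does this for you; the Python sketch below only shows what the estimator computes. It solves the OLS normal equations (X'X)b = X'y by Gaussian elimination. In your model, the columns of X would be a constant plus the differenced interest rate, exchange rate, inflation rate and money supply. The data below are synthetic, built from known coefficients so that the result can be checked.

```python
def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting.
    X is a list of rows; include a column of 1s for the intercept."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (aug[i][k] - sum(aug[i][j] * b[j] for j in range(i + 1, k))) / aug[i][i]
    return b

# Synthetic check: constant plus four regressors, y built from known coefficients
true_b = [0.5, -2.0, 0.3, 1.0, 0.1]
X = [[1.0, i, i * i, i % 3, (i * i) % 5] for i in range(10)]
y = [sum(b * x for b, x in zip(true_b, row)) for row in X]
print([round(b, 6) for b in ols(X, y)])  # [0.5, -2.0, 0.3, 1.0, 0.1]
```

Each estimated coefficient is then the effect of a one-unit change in that regressor on the change in share prices, holding the other regressors constant.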
...
OECD Labor Market Data
variable | description | units
--- | --- | ---
gwagegap | gender wage gap | percentage
minwage | minimum wage in 2014 constant prices | 2014 USD PPPs
rgdpcap | GDP per head (expenditure approach) at constant prices, constant PPPs, OECD base year = 2010 | US dollars
dln_cpi | CPI inflation rate (no food, no energy) | percentage
uniondens | union density: percentage of wage and salary earners who are trade union members | percentage
lrem25fe | Employment rate, Aged 25-54, Females | percentage
lrem25ma | Employment rate, Aged 25-54, Males | percentage
lrem25tt | Employment rate, Aged 25-54, All Persons | percentage
lrem64fe | Employment rate, Aged 15-64, Females | percentage
lrem64ma | Employment rate, Aged 15-64, Males | percentage
lrem64tt | Employment rate, Aged 15-64, All Persons | percentage
lfwa25fe | Working age population, Aged 25-54, Females | percentage
lfwa25ma | Working age population, Aged 25-54, Males | percentage
lfwa25tt | Working age population, Aged 25-54, All Persons | percentage
lfwa64fe | Working age population, Aged 15-64, Females | percentage
lfwa64ma | Working age population, Aged 15-64, Males | percentage
lfwa64tt | Working age population, Aged 15-64, All Persons | percentage
eprc_v1 | employment protection: individual and collective dismissals (regular contracts), version 1 | 0 to 6
eprc_v2 | employment protection: individual and collective dismissals (regular contracts), version 2 | 0 to 6
eprc_v3 | employment protection: individual and collective dismissals (regular contracts), version 3 | 0 to 6
epr_v1 | employment protection: individual dismissals (regular contracts), version 1 | 0 to 6
epr_v3 | employment protection: individual dismissals (regular contracts), version 3 | 0 to 6
epc | employment protection: collective dismissals (additional provisions) | 0 to 6
ept_v1 | employment protection: temporary employment, version 1 | 0 to 6
ept_v3 | employment protection: temporary employment, version 3 | 0 to 6
OECD Economies
Australia, Austria, Belgium, Canada, Chile, Czech Republic,
Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel,
Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway,
Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey,
United Kingdom, United States
Annual Data: 1985 to 2014
Download: Gretl data
Description
This dataset contains labor market data from
stats.oecd.org
covering 34 OECD countries. It is the same dataset that I use in class,
with the addition of the "gender wage gap": the percentage difference
between male and female wages.
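As a worked example of that definition, assuming the OECD's usual convention of expressing the gap relative to male (median) earnings: with hypothetical male and female wages of 1,000 and 850, the gap is 100 × (1,000 - 850) / 1,000 = 15 percent. In Python:

```python
def gender_wage_gap(male_wage, female_wage):
    """Gender wage gap in percent, relative to the male wage (assumed convention)."""
    return 100.0 * (male_wage - female_wage) / male_wage

# Hypothetical wages
print(gender_wage_gap(1000.0, 850.0))  # 15.0
```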
In class, we discuss the effect of labor market regulation on male and
female employment rates. You might extend that analysis by exploring
the effect of labor market regulation on the gender wage gap.
If you do, you will need to account for the simultaneity issues that arise in the
supply and demand for male and female labor. You would then need to find an
instrumental variable to estimate the effect of labor market regulation
on male and female employment rates and on the gender wage gap.
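With one instrument z for one endogenous regressor x, the instrumental variable estimate of the effect of x on y is cov(z, y) / cov(z, x), which is what two-stage least squares reduces to in that case. The Python sketch below uses hypothetical series built as y = 3x + u, where u is correlated with x (so OLS is biased) but uncorrelated with z. Which real variable can serve as a valid instrument, one correlated with labor market regulation but affecting employment only through it, is the substantive question you would have to argue in your paper; for the actual estimation, Gretl provides a two-stage least squares estimator.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def iv_slope(z, x, y):
    """IV estimate of the effect of x on y using instrument z: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)

def ols_slope(x, y):
    """OLS slope for comparison: cov(x, y) / var(x)."""
    return covariance(x, y) / covariance(x, x)

# Hypothetical data: true effect of x on y is 3
z = [0, 1, 2, 3]
x = [2, 1, 3, 6]
u = [1, -1, -1, 1]  # correlated with x, uncorrelated with z
y = [3 * xi + ui for xi, ui in zip(x, u)]
print(round(ols_slope(x, y), 4))  # 3.2857 (biased)
print(round(iv_slope(z, x, y), 4))  # 3.0
```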
...
NYC Vision Zero
variable | description | units
--- | --- | ---
NODEID | intersection identifier |
GasPrice | average price of gas during month | US dollars
Casualties | sum of "Fatalities" and "Injuries" during month | count
Fatalities | total number of fatalities during month | count
PedFatalit | number of pedestrian fatalities during month | count
BikeFatali | number of bicyclist fatalities during month | count
MVOFatalit | number of motorist fatalities during month | count
Injuries | total number of injuries during month | count
PedInjurie | number of pedestrian injuries during month | count
BikeInjuri | number of bicyclist injuries during month | count
MVOInjurie | number of motorist injuries during month | count
CasualBefore | total "Casualties" from 2009 to 2013 | count (sum over period)
CasualAfter | total "Casualties" from 2014 to 2017 | count (sum over period)
InjurBefore | total "Injuries" from 2009 to 2013 | count (sum over period)
InjurAfter | total "Injuries" from 2014 to 2017 | count (sum over period)
FatalBefore | total "Fatalities" from 2009 to 2013 | count (sum over period)
FatalAfter | total "Fatalities" from 2014 to 2017 | count (sum over period)
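One detail matters when comparing the "Before" and "After" totals: the two periods have different lengths. Assuming "Before" covers Jan. 2009 to Dec. 2013 (60 months) and "After" covers Jan. 2014 to Sept. 2017, the last month of data (45 months), a fair comparison uses average counts per month. A minimal Python sketch with a hypothetical intersection whose raw totals fall from 12 to 9:

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Number of calendar months from start to end, counting both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1

MONTHS_BEFORE = months_inclusive(2009, 1, 2013, 12)  # 60
MONTHS_AFTER = months_inclusive(2014, 1, 2017, 9)    # 45 (assumed end of "After" period)

def monthly_rates(total_before, total_after):
    """Average counts per month, so unequal period lengths do not bias the comparison."""
    return total_before / MONTHS_BEFORE, total_after / MONTHS_AFTER

# Hypothetical intersection: 12 casualties before, 9 after
print(monthly_rates(12, 9))  # (0.2, 0.2): no change once period length is accounted for
```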
Monthly Data: Jan. 2009 to Sept. 2017
Download:
nycdot_gas_withzeroes.csv.zip (zipped CSV file)
lib_crosstabs.r (R library)
nycdot_crosstabs_v2.r (R script)
Description
This NYC DOT dataset contains information on
traffic fatalities and injuries at 30,754 New York City intersections
over almost 9 years.
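As a quick consistency check on the size of the file, assuming it has one row per intersection per month (the "withzeroes" in the file name suggests that months without any crashes are kept as rows of zeroes), the number of observations follows from the number of intersections and months:

```python
INTERSECTIONS = 30_754
# Jan. 2009 through Sept. 2017, counting both endpoints
MONTHS = (2017 - 2009) * 12 + (9 - 1) + 1  # 105 months

observations = INTERSECTIONS * MONTHS
print(f"{observations:,}")  # 3,229,170
```

This matches the 3,229,170 observations quoted in this description.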
During the most recent 3 years and 9 months of data (from Jan. 2014 to Sept. 2017),
New York City set a goal of eliminating traffic fatalities and injuries
in an initiative called "Vision Zero."
Vision Zero reduced the default speed limit throughout the city from 30 to 25 miles per hour
and changed traffic rules at many intersections.
You may use this dataset to test the null hypothesis that Vision Zero did not reduce
fatalities or injuries. And to conduct such a hypothesis test, you might use
regression analysis. But because this dataset is so large (3,229,170 observations),
we can also create cross-tabulations that directly examine the empirical distribution.
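The idea behind a cross-tabulation can be sketched in a few lines of Python. This is not the provided R script, just an illustration with hypothetical records: count intersection-months by period and by number of casualties, then divide by each period's total to get the empirical distribution of casualties per intersection-month before and after Jan. 2014.

```python
from collections import Counter

def crosstab(rows):
    """Share of intersection-months with each casualty count, within each period.
    rows is a list of (period, casualties) tuples."""
    counts = Counter(rows)                          # (period, casualties) -> count
    totals = Counter(period for period, _ in rows)  # period -> number of rows
    return {key: n / totals[key[0]] for key, n in sorted(counts.items())}

# Hypothetical intersection-month records: (period, Casualties)
records = [
    ("before", 0), ("before", 0), ("before", 1), ("before", 2),
    ("after", 0), ("after", 0), ("after", 0), ("after", 1),
]
for (period, casualties), share in crosstab(records).items():
    print(period, casualties, share)
```

Comparing the two distributions directly (for example, the share of intersection-months with zero casualties) is one way to examine the null hypothesis without imposing a regression model.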
The one requirement is that you must use a high-memory computer for this analysis.
With 3,229,170 observations, it took 5 minutes to run this code on my computer with 8 GB of RAM.