A Few Datasets

Preface

I have created this webpage to provide you with a few datasets that you might use for your course project and to explain what I hope you will learn by conducting an econometric analysis.

...

Organizing Information, Identifying Trends

I want you to learn econometrics, and the best way to learn econometrics is to do it. But more broadly, I hope that conducting an econometric analysis will teach you how to organize information and identify trends.

In the specific case of an econometric analysis, each column in your spreadsheet must represent a variable and each row must represent an observation. So your very first task in an econometric analysis is to properly arrange that information in your spreadsheet.
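As a minimal sketch of that layout, the Python snippet below reshapes a hypothetical "wide" table (one column per year) into one row per observation. The numbers are made up for illustration and the helper name `wide_to_long` is my own; only the variable name `uniondens` comes from the datasets below.

```python
# Hypothetical "wide" layout: one row per country, one column per year.
# The values are made up for illustration only.
wide = [
    {"country": "Canada", "2013": 27.0, "2014": 26.5},
    {"country": "Mexico", "2013": 13.5, "2014": 13.0},
]

def wide_to_long(rows, id_column, value_name):
    """Return one row per (id, year) observation, with one column per variable."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue  # the identifier is not a yearly measurement
            long_rows.append(
                {id_column: row[id_column], "year": int(key), value_name: value}
            )
    return long_rows

tidy = wide_to_long(wide, "country", "uniondens")
for observation in tidy:
    print(observation)
```

Each printed row is now a single country-year observation, and each key is a variable, which is the shape that regression software expects.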

If you carefully construct your spreadsheet from reliable sources of data (and if you choose a good set of variables to test your null hypothesis), then you should observe some clear trends in your data. Your next task then is to describe those trends, test a series of null hypotheses and report your findings.

Gretl, of course, will help you run regressions and calculate statistics for your analysis. But Gretl is a tool. It is not the tool that is important. It is the quality of your input that is important.

Your spreadsheet is what's important. How you organize information is what's important.

In a more general case, your information might not be the numeric data that we work with in econometrics. It might be names, addresses or whole documents and files. Your data might not even fit into a spreadsheet at all.

But some principles, like functions and variables, will remain the same. And, once again, what will be important is how you organize information. If your information is well-organized, you should observe some clear trends in your data.

To get you started, I have assembled the datasets below. Your task is to identify the most important trends in one of those datasets, test a series of null hypotheses and report your findings.


...

Your Assignment

For the project proposal, please submit a written description of:

the null hypotheses that you wish to test
the dataset that you plan to test them with

For the final project, please submit a formal paper, in which you describe:

the null hypotheses that you tested
the dataset that you tested them with
summary statistics
how you manipulated the data
the regressions that you ran
your conclusions: should we reject or fail to reject the null hypothesis?

if we fail to reject it, why might the explanatory variable not have any effect on the dependent variable?
if we reject it, how strong is the effect of the explanatory variable on the dependent variable?


...

OECD Financial Data

variable	description	units
CCRETT01	relative consumer price indices
CCUS	currency exchange rates	monthly average
IR3TIB	short-term interest rates	percent per annum
IRLT	long-term interest rates	percent per annum
IRSTCI	immediate interest rates, call money, interbank rate	percent per annum
MABM	broad money (M3)	index, seasonally adjusted
MANM	narrow money (M1)	index, seasonally adjusted
SP	share prices	index

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States

Non-OECD Economies

Argentina, Brazil, China (People's Republic of), Colombia, Costa Rica, India, Indonesia, Lithuania, Russia, Saudi Arabia, South Africa

Monthly Data

Jan. 1950 to Nov. 2017

download

CSV file

Gretl data

description

This dataset contains financial market data from stats.oecd.org covering 35 OECD countries and 11 non-OECD countries. You may use this data to explore the effect that interest rates, exchange rates, inflation rates and the money supply have on share prices.

Specifically, you may test the null hypotheses that:

a change in interest rates is not associated with a change in share prices
a change in exchange rates is not associated with a change in share prices
a change in money supply is not associated with a change in share prices
a change in inflation rates is not associated with a change in share prices

Then focus on the variables for which you rejected the null hypothesis. In those cases, the data favor the alternative hypothesis that there is a relationship, so now we want to know:

which variables have the strongest effect on share prices?
how large is that effect?

For example, if we think that interest rates will rise one percentage point next month, then how much will share prices fall in response to that change?

As you conduct your analysis, you must remember that one of the Gauss-Markov assumptions is that your residuals ("error terms") must not be correlated with each other. Related to this assumption is the concept of "stationary residuals" -- the mean and variance of your residuals must be constant over time.

Taking the difference in value between one time period and the next will usually make a series stationary. So if you difference each variable in your regression model, your residuals will usually be stationary too.
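The differencing step described above can be sketched in a few lines of Python. The function name `first_difference` and the share price values are assumptions made for illustration, not part of the dataset.

```python
def first_difference(series):
    """Return the change from each period to the next.

    The result has one fewer element than the input, because the
    first period has no earlier value to subtract.
    """
    return [current - previous for previous, current in zip(series, series[1:])]

# Made-up monthly share price index values, for illustration only.
share_price = [100.0, 102.0, 101.0, 105.0]
d_share_price = first_difference(share_price)
print(d_share_price)
```

Note that differencing costs you one observation per series, so every differenced variable in a regression must be aligned on the same shortened sample.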

The alternative is to find a co-integrating relationship among your variables that makes the residuals stationary. In practice, however, it is difficult to find such co-integrating relationships, so I encourage you to work with the differenced variables.

But also -- from the perspective of an investor -- the share price itself is not important. What is important to the investor is the change in share price (i.e. the difference in share price).

So from an investment perspective, you want to develop a model that predicts changes in share price. What predicts those changes?

change in interest rate?
change in exchange rate?
change in inflation rate?
change in money supply?
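One way to write those four questions as a single model, assuming a linear relationship in first differences, is shown below. The symbols are my own shorthand, not names from the dataset: SP is the share price index, r an interest rate, e the exchange rate, \pi the inflation rate and m the money supply.

```latex
\Delta SP_t = \beta_0 + \beta_1 \, \Delta r_t + \beta_2 \, \Delta e_t
            + \beta_3 \, \Delta \pi_t + \beta_4 \, \Delta m_t + \varepsilon_t
```

Under this sketch, each null hypothesis listed earlier corresponds to one coefficient being zero (for example, \beta_1 = 0 for interest rates), and the estimated coefficient measures the size of the effect.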

And how large is the effect of those changes on the change in share price?
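To illustrate how that question is answered, here is a minimal sketch of a one-variable regression on differenced data, written in plain Python. It is not a substitute for Gretl: the function name `ols_simple` and all of the numbers are made up, and the standard error uses the classical formula, which assumes uncorrelated residuals with constant variance.

```python
import math

def ols_simple(x, y):
    """Regress y on x with an intercept; return the slope, intercept,
    classical standard error of the slope, and the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    sigma2 = sum(e * e for e in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_stat": slope / se_slope,
    }

# Made-up monthly changes, for illustration only.
d_interest = [0.25, -0.5, 0.0, 0.5, -0.25, 0.0]   # percentage points
d_share = [-1.0, 2.5, 0.5, -2.0, 1.5, -0.5]       # index points

fit = ols_simple(d_interest, d_share)
print(f"slope = {fit['slope']:.2f}, t = {fit['t_stat']:.2f}")
```

With these made-up numbers the slope is negative, meaning a rise in the interest rate is associated with a fall in the share price index. To decide whether to reject the null hypothesis, compare the t-statistic with the critical value for n - 2 degrees of freedom, or let Gretl report the p-value.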


...

OECD Labor Market Data

variable	description	units
gwagegap	gender wage gap	percentage
minwage	minimum wage in 2014 constant prices	2014 USD PPPs
rgdpcap	GDP per head (expenditure approach) at constant prices, constant PPPs, OECD base year = 2010	US dollar
dln_cpi	CPI inflation rate (no food, no energy)	percentage
uniondens	union density -- percentage of wage and salary earners that are trade union members	percentage
lrem25fe	Employment rate, Aged 25-54, Females	percentage
lrem25ma	Employment rate, Aged 25-54, Males	percentage
lrem25tt	Employment rate, Aged 25-54, All Persons	percentage
lrem64fe	Employment rate, Aged 15-64, Females	percentage
lrem64ma	Employment rate, Aged 15-64, Males	percentage
lrem64tt	Employment rate, Aged 15-64, All Persons	percentage
lfwa25fe	Working age population, Aged 25-54, Females	percentage
lfwa25ma	Working age population, Aged 25-54, Males	percentage
lfwa25tt	Working age population, Aged 25-54, All Persons	percentage
lfwa64fe	Working age population, Aged 15-64, Females	percentage
lfwa64ma	Working age population, Aged 15-64, Males	percentage
lfwa64tt	Working age population, Aged 15-64, All Persons	percentage
eprc_v1	employment protection -- individual and collective dismissals (regular contracts), version 1	0 to 6
eprc_v2	employment protection -- individual and collective dismissals (regular contracts), version 2	0 to 6
eprc_v3	employment protection -- individual and collective dismissals (regular contracts), version 3	0 to 6
epr_v1	employment protection -- individual dismissals (regular contracts), version 1	0 to 6
epr_v3	employment protection -- individual dismissals (regular contracts), version 3	0 to 6
epc	employment protection -- collective dismissals (additional provisions)	0 to 6
ept_v1	employment protection -- temporary employment, version 1	0 to 6
ept_v3	employment protection -- temporary employment, version 3	0 to 6

OECD Economies

Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States

Annual Data

1985 to 2014

download

Gretl data

description

This dataset contains labor market data from stats.oecd.org covering 34 OECD countries. It is the same dataset that I use in class with the addition of "gender wage gap" -- the percentage difference between male and female wages.

In class, we discuss the effect of labor market regulation on male and female employment rates. You might extend that analysis by exploring the effect of labor market regulation on the gender wage gap.


...

NYC Vision Zero

variable	description	units
NODEID	intersection identifier
Casualties	sum of "Fatalities" and "Injuries" during month	count
Fatalities	total number of fatalities during month	count
PedFatalit	number of pedestrian fatalities during month	count
BikeFatali	number of bicyclist fatalities during month	count
MVOFatalit	number of motorist fatalities during month	count
Injuries	total number of injuries during month	count
PedInjurie	number of pedestrian injuries during month	count
BikeInjuri	number of bicyclist injuries during month	count
MVOInjurie	number of motorist injuries during month	count
CasualBefore	total "Casualties" from 2009 to 2013	sum
CasualAfter	total "Casualties" from 2014 to 2017	sum
InjurBefore	total "Injuries" from 2009 to 2013	sum
InjurAfter	total "Injuries" from 2014 to 2017	sum
FatalBefore	total "Fatalities" from 2009 to 2013	sum
FatalAfter	total "Fatalities" from 2014 to 2017	sum

Monthly Data

Jan. 2009 to Dec. 2017

download

nyc-dot_by-mon_with-zeroes.csv.zip (Zipped CSV file)

lib_crosstabs.r (R library)

nyc-dot_crosstabs_v3.r (R script)

description

This NYC DOT dataset contains information on traffic fatalities and injuries at 30,754 New York City intersections over 9 years.

During the most recent 4 years of data (from Jan. 2014 to Dec. 2017), New York City pursued the goal of eliminating traffic fatalities and injuries through an initiative called "Vision Zero." Vision Zero reduced the default speed limit throughout the city from 30 to 25 miles per hour and changed traffic rules at many intersections.

You may use this dataset to test the null hypothesis that Vision Zero did not reduce fatalities or injuries. You might conduct that test with regression analysis. But because this dataset is so large (3,347,028 observations), we can also create cross-tabulations that directly examine the empirical distribution.
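As a rough illustration of what a cross-tabulation looks like, the Python sketch below counts intersections by whether they had any casualties before and after Vision Zero began. It is not the R script offered for download: the rows are made up, and the function name `crosstab` is my own. The month counts follow from the variable definitions (2009 to 2013 is 60 months; 2014 to 2017 is 48 months).

```python
from collections import Counter

# Made-up rows for illustration: (NODEID, CasualBefore, CasualAfter)
rows = [
    (1, 3, 1),
    (2, 0, 0),
    (3, 2, 0),
    (4, 0, 1),
    (5, 5, 4),
]

MONTHS_BEFORE = 60  # Jan. 2009 to Dec. 2013
MONTHS_AFTER = 48   # Jan. 2014 to Dec. 2017

def crosstab(rows):
    """Count intersections by (any casualties before, any casualties after)."""
    table = Counter()
    for _node_id, before, after in rows:
        table[(before > 0, after > 0)] += 1
    return table

table = crosstab(rows)
for (had_before, had_after), count in sorted(table.items()):
    print(f"before={had_before!s:5} after={had_after!s:5} intersections={count}")

# The two periods differ in length, so compare casualties per month.
rate_before = sum(before for _, before, _ in rows) / MONTHS_BEFORE
rate_after = sum(after for _, _, after in rows) / MONTHS_AFTER
print(f"casualties per month: before={rate_before:.3f}, after={rate_after:.3f}")
```

Because the "before" period is a year longer than the "after" period, raw totals are not directly comparable; the per-month rates are.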

The one requirement is a computer with plenty of memory for this analysis. Because the dataset is so large, the cross-tabulation code took 5 minutes to run on my computer with 8 GB of RAM.


...
