OECD countries: Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Latvia, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States
Non-OECD countries: Argentina, Brazil, China (People's Republic of), Colombia, Costa Rica, India, Indonesia, Lithuania, Russia, Saudi Arabia, South Africa
Period: Jan. 1950 to Nov. 2017
This dataset contains financial market data from stats.oecd.org covering 35 OECD countries and 11 non-OECD countries. You may use this data to explore the effect that interest rates, exchange rates, inflation rates and the money supply have on share prices.
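Specifically, you may test the null hypotheses that each of these variables (interest rates, exchange rates, inflation rates and the money supply) has no effect on share prices.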
Then focus on the variables where we rejected the null hypothesis. In those cases, we have accepted the alternative hypothesis that there is a relationship, so now we want to know how large the effect is.
For example, if we think that interest rates will rise one percentage point next month, then how much will share prices fall in response to that change?
As you conduct your analysis, you must remember that one of the Gauss-Markov assumptions is that your residuals ("error terms") must not be correlated with each other. Related to this assumption is the concept of "stationary residuals" -- the mean and variance of your residuals must be constant over time.
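One common way to check whether residuals are correlated with each other is the Durbin-Watson statistic, which is not part of the original discussion and is offered here only as a minimal sketch. Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation and values near 4 suggest negative autocorrelation. The residual series below are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a list of regression residuals."""
    # Sum of squared changes between consecutive residuals
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Sum of squared residuals
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positively autocorrelated)
trending = [1.0, 0.8, 0.6, 0.4, 0.2, -0.2, -0.4, -0.6, -0.8, -1.0]
# Hypothetical residuals that flip sign every period (negatively autocorrelated)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print("trending residuals:   ", round(durbin_watson(trending), 3))
print("alternating residuals:", round(durbin_watson(alternating), 3))
```

A statistic far from 2 in either direction would be a signal to revisit the model, for example by differencing the variables.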
Taking the difference in value between one time period and the next will usually make a series stationary, so if you difference each variable in your regression model, your residuals will usually be stationary.
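As a minimal sketch of what differencing means in practice, the function below subtracts each observation from the one that follows it. The share price values are hypothetical and are not taken from the OECD dataset.

```python
def first_difference(series):
    """Return the period-to-period changes of a series (one element shorter)."""
    return [current - previous for previous, current in zip(series, series[1:])]


# Hypothetical monthly share price index (illustrative values only)
share_price = [100.0, 102.0, 105.0, 103.0, 108.0]

changes = first_difference(share_price)
print(changes)  # [2.0, 3.0, -2.0, 5.0]
```

Note that differencing costs one observation: five monthly levels yield four monthly changes.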
The alternative is to find a co-integrating relationship among your variables that makes the residuals stationary. In practice, however, it is difficult to find such co-integrating relationships, so I encourage you to work with the differenced variables.
But also -- from the perspective of an investor -- the share price itself is not important. What is important to the investor is the change in share price (i.e. the difference in share price).
So from an investment perspective, you want to develop a model that predicts changes in share price. What predicts those changes?
And how large is the effect of a change in each of those predictors on the change in share price?
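To make the question of effect size concrete, here is a minimal sketch that regresses the change in share price on the change in the interest rate using ordinary least squares with a single regressor. The two series are made up, and a real analysis would use the OECD data, include the other variables and check the statistical significance of each coefficient.

```python
def first_difference(series):
    """Return the period-to-period changes of a series."""
    return [current - previous for previous, current in zip(series, series[1:])]


def ols_slope_intercept(x, y):
    """Simple one-regressor OLS: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly levels (illustrative values only)
interest_rate = [2.0, 2.25, 2.0, 2.5, 2.5, 2.0]       # percent
share_price = [100.0, 99.0, 100.5, 98.0, 98.5, 100.5]  # index points

d_rate = first_difference(interest_rate)
d_price = first_difference(share_price)

slope, intercept = ols_slope_intercept(d_rate, d_price)
print("estimated change in share price per one-point rise in the rate:", round(slope, 2))
print("estimated average monthly change when the rate is flat:", round(intercept, 2))
```

In this made-up example, the slope is the answer to the investor's question: it estimates how many index points the share price changes when the interest rate rises by one percentage point.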
OECD countries: Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States
Period: 1985 to 2014
This dataset contains labor market data from stats.oecd.org covering 34 OECD countries. It is the same dataset that I use in class with the addition of "gender wage gap" -- the percentage difference between male and female wages.
In class, we discuss the effect of labor market regulation on male and female employment rates. You might extend that analysis by exploring the effect of labor market regulation on the gender wage gap.
If you do, you would need to account for the simultaneity issues that arise in the supply and demand for male/female labor. You would then need to find an instrumental variable to estimate the effect of labor market regulation on the male/female employment rates and on the gender wage gap.
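The logic of an instrumental variable can be sketched with simulated data. In the example below, an unobserved factor affects both the regulation measure and the wage gap, which stands in for the endogeneity that simultaneity creates, so OLS is biased. The simple IV estimator, cov(z, y) / cov(z, x), recovers the effect when the instrument z moves x but is unrelated to the unobserved factor. All variable names, the true effect of -0.5 and the instrument itself are assumptions for illustration; the dataset does not name an instrument.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(0)
TRUE_EFFECT = -0.5  # assumed effect of regulation on the wage gap (simulation only)
n = 5000

instrument = [random.gauss(0, 1) for _ in range(n)]  # hypothetical instrument z
unobserved = [random.gauss(0, 1) for _ in range(n)]  # source of endogeneity

# Regulation depends on the instrument AND on the unobserved factor
regulation = [z + u + random.gauss(0, 1) for z, u in zip(instrument, unobserved)]
# The wage gap depends on regulation and on the same unobserved factor
wage_gap = [
    TRUE_EFFECT * x + 2.0 * u + random.gauss(0, 1)
    for x, u in zip(regulation, unobserved)
]

# OLS slope: biased because regulation is correlated with the unobserved factor
ols_estimate = covariance(regulation, wage_gap) / covariance(regulation, regulation)
# IV slope: uses only the variation in regulation that the instrument explains
iv_estimate = covariance(instrument, wage_gap) / covariance(instrument, regulation)

print("true effect: ", TRUE_EFFECT)
print("OLS estimate:", round(ols_estimate, 3))
print("IV estimate: ", round(iv_estimate, 3))
```

The hard part in a real analysis is not the arithmetic but the argument that the chosen instrument affects the outcome only through labor market regulation.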
Period: Jan. 2009 to Sept. 2017
nyc-dot_gas_with-zeroes.csv.zip (Zipped CSV file)
lib_crosstabs.r (R library)
nyc-dot_crosstabs_v2.r (R script)
This NYC DOT dataset contains information on traffic fatalities and injuries at 30,754 New York City intersections over almost 9 years.
The most recent 3 years and 9 months of data (from Jan. 2014 to Sept. 2017) cover "Vision Zero," a New York City initiative with the goal of eliminating traffic fatalities and injuries. Vision Zero reduced the default speed limit throughout the city from 30 to 25 miles per hour and changed traffic rules at many intersections.
You may use this dataset to test the null hypothesis that Vision Zero did not reduce fatalities or injuries. And to conduct such a hypothesis test, you might use regression analysis. But because this dataset is so large (3,229,170 observations) we can also create cross-tabulations that directly examine the empirical distribution.
The one requirement is that you must use a high-memory computer for this analysis. Running this code on the full dataset took 5 minutes on my computer with 8 GB of RAM.
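The cross-tabulation idea can be sketched in a few lines of Python that read the file one row at a time, so that only the running totals stay in memory. This is not the R script listed above. The column names (`intersection_id`, `year`, `month`, `injuries`, `fatalities`) are assumptions and would need to be replaced with the actual headers of the CSV file. The sketch also assumes one row per intersection per month, which is consistent with 30,754 intersections over 105 months giving 3,229,170 observations.

```python
import csv
import io
from collections import defaultdict

VISION_ZERO_START = (2014, 1)  # Jan. 2014, as stated in the dataset description


def crosstab_by_period(rows):
    """Total injuries and fatalities before and during Vision Zero."""
    totals = defaultdict(lambda: {"rows": 0, "injuries": 0, "fatalities": 0})
    for row in rows:
        month_key = (int(row["year"]), int(row["month"]))
        period = "vision_zero" if month_key >= VISION_ZERO_START else "before"
        cell = totals[period]
        cell["rows"] += 1
        cell["injuries"] += int(row["injuries"])
        cell["fatalities"] += int(row["fatalities"])
    return dict(totals)


# Tiny made-up sample standing in for the real file (hypothetical columns)
sample = io.StringIO("""intersection_id,year,month,injuries,fatalities
1,2013,12,2,0
1,2014,1,1,0
2,2013,12,0,1
2,2014,1,0,0
""")

table = crosstab_by_period(csv.DictReader(sample))
for period in sorted(table):
    cell = table[period]
    rate = cell["injuries"] / cell["rows"]
    print(period, cell, "injuries per intersection-month:", rate)
```

With the real data, you would unzip the CSV file and pass `csv.DictReader(open(path, newline=""))` to the same function. Comparing rates per intersection-month, rather than raw totals, matters here because the two periods differ in length.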