1.24.2013

Quickly plotting nonparametric response functions with binned independent variables [in Stata]

Yesterday's post described how we can bin the independent variable in a regression to get a nice non-parametric response function even when we have large data sets, complex standard errors, and many control variables. Today's post provides a function to plot these kinds of results.

After calling bin_parameter.ado to discretize an independent variable (see yesterday's post), run a regression of the outcome variable on the sequence of generated dummy variables. This command can be as complicated as you like, so feel free to throw your worst semi-hemi-spatially-correlated-auto-regressive-multi-dimensional-cluster-block-bootstrap standard errors at it. Then run plot_response.ado (today's function) to plot the results of that regression, with your fancy standard errors included. It's that easy.

Here's an example. Generate some data where Y is a quadratic function of X and a linear function of Z:
set obs 1000
gen x = 10*runiform()-5
gen z = rnormal()
gen e = rnormal()
gen y = x^2+z+5*e
Then bin the parameter using yesterday's function and run a regression of your choosing, using the dummy variables output by bin_parameter:
bin_parameter x, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)
reg y _dummy* z
After the regression, call plot_response.ado to plot the results of that regression (only the component related to the binned variables). To make this easier, the arguments describing the bins use the same format as those passed to bin_parameter:
plot_response, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)
The result is a plot that clearly shows us the correct functional form, a quadratic in x.
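
As the help file notes, plot_response does not save the figure itself. A minimal sketch of saving it with Stata's built-in graph commands, assuming plot_response has just drawn the figure in the Graph window; the file names are placeholders:

```stata
* Assumes the figure from plot_response is the current graph in memory.
* File names below are placeholders; change them to suit your project.

* Write an image file (the format is inferred from the extension)
graph export "response_function.png", replace

* Keep a copy in Stata's own format so the graph can be reopened and edited
graph save "response_function.gph", replace
```
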


Note: plot_response.ado requires parmest.ado (download from SSC by typing "net install st0043.pkg" at the command line). It also calls a function, parm_bin_center.ado, that is included in the download.
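
Before running the example, it can help to confirm that Stata can find every command involved. The following is only a sketch: it assumes the SSC package is named parmest (so that "ssc install parmest" works as an alternative to the command above) and that the other three .ado files from the download have already been copied somewhere on your adopath.

```stata
* Install parmest from SSC only if Stata cannot already find it
* (assumption: the SSC package name is "parmest")
capture which parmest
if _rc {
    ssc install parmest
}

* The remaining commands come from the author's download; stop with a
* clear message if any of them is missing from the adopath
foreach cmd in parm_bin_center bin_parameter plot_response {
    capture which `cmd'
    if _rc {
        display as error "`cmd'.ado not found on the adopath"
        exit 111
    }
}
```
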

Citation note: If you use this suite of functions in a publication, please cite: Hsiang, Lobell, Roberts & Schlenker (2012): "Climate and the location of crops."

Help file below the fold.


/* ====================================================

S. HSIANG
SHSIANG@PRINCETON.EDU
9/2012

-------------------------------------------------------

SYNTAX

plot_response,  Size(real) Bottom_bin_upper_bound(real) Top_bin_lower_bound(real) DROPped_bin(real) [PREF(string) SAVE_data_file(string) PLOTCOMMAND(string) NODROP MAYBEMISSING]

-------------------------------------------------------

PLOT_RESPONSE is designed to plot the response function obtained from a regression model where a continuous variable is discretized into dummy variables representing bins of the original variable. 

PLOT_RESPONSE is meant to plot the regression results obtained after BIN_PARAMETER is used, followed by a regression command. PLOT_RESPONSE calls PARMEST (after preserving the data), labels the coefficients using the center values for X in each bin using PARM_BIN_CENTER, plots the response function, then restores the original dataset. An option for saving the dataset with the estimates is included.

PLOT_RESPONSE does not automatically save the figure. Use GRAPH EXPORT or GRAPH SAVE afterwards to save the figure.

PLOT_RESPONSE is designed to show a relationship between X and Y, controlling for other variables Z1 ... Zn. The function identifies which variables describe X through the argument PREF() and discards the estimates for all coefficients that do not have names beginning with the specified prefix. If no prefix is specified, the default "_bin" is assumed (same default as in BIN_PARAMETER).

-------------------------------------------------------

If BIN_PARAMETER was called before PLOT_RESPONSE, then the arguments

SIZE()
TOP_BIN_LOWER_BOUND()
BOTTOM_BIN_UPPER_BOUND()
DROPPED_BIN()
[NODROP]
[PREF()]

must all match the arguments used when calling BIN_PARAMETER.

If PLOT_RESPONSE is being called after OLS_SPATIAL_HAC with the DROPVAR option specified, then you must specify the MAYBEMISSING option when calling PLOT_RESPONSE.

-------------------------------------------------------

Required arguments:

Size() - width of bins

Top_bin_lower_bound() - lower cutoff for the maximum bin; all values above this number are binned together

Bottom_bin_upper_bound() - upper cutoff for the minimum bin; all values below this number are binned together

DROPped_bin() - value of X that denotes which bin is dropped. The bin that contains this value will be dropped. Example: if DROP(1) is specified and there is a bin for values of 0 < x < 3, then this bin will be assumed to be dropped.

Options:

PREF() - the prefix used to identify the dummy vars of X in the regression model; must match PREF() used when calling BIN_PARAMETER. If PREF() is not specified, then the default is used (both BIN_PARAMETER and PLOT_RESPONSE use the same default "_bin", so they will match if both are left unspecified).

NODROP - Must be specified if NODROP was used when calling BIN_PARAMETER. If NODROP is specified, then no bin will be dropped and all coefficients will be plotted as comparisons with zero, rather than with the dropped bin. In these cases, DROP() must still be specified with a real argument, but the value of the argument is irrelevant.

CILEVEL - Specify the size of the confidence interval as an integer between 1 and 99. Default is 95 (i.e. alpha = 0.05)

SAVE_data_file() - A filename/path that is used to store the parameter values that are plotted. These are the coefficients for the dummy vars which are obtained when calling PARMEST and PARM_BIN_CENTER. If nothing is specified, the data used to plot the results will not be saved.

PLOTCOMMAND() - String arguments that are appended to the final plotting command. E.g. to label the plot with the title "X vs Y" and xtitle "X", type: plotcommand("title(X vs Y) xtit(X)").

MAYBEMISSING - Option to fill in missing variable names. If Stata drops variables under usual regression conditions, then o.dummy_variable_name will appear following PARMEST and MAYBEMISSING is not necessary. However, if any dummy vars were not included in the estimation command (because they were forgotten) or they were dropped in nonstandard conditions (e.g. from the DROPVAR option using OLS_SPATIAL_HAC), then MAYBEMISSING will "repair" the dataset that is returned by PARMEST by filling in the absent variables as missing. If no variables are missing, then MAYBEMISSING will have no effect.

-------------------------------------------------------

Requires functions 

PARM_BIN_CENTER.ado - download at http://www.solomonhsiang.com/computing

PARMEST.ado - download from SSC (type "net install st0043.pkg")

-------------------------------------------------------

CITATION IF USED FOR PUBLICATION:

Hsiang, Lobell, Roberts and Schlenker, 2012: "Climate and the Location of Crops"

-------------------------------------------------------

EXAMPLE: 

set obs 100
gen x = 6*runiform()-3
gen z = rnormal()
gen e = rnormal()
gen y = x^2+z+e

bin_parameter x, s(1) t(2) b(-2) drop(-1.6) pref(_dummy)

reg y _dummy* z

plot_response, s(1) t(2) b(-2) drop(-1.6) pref(_dummy) plotcommand("tit(E[y|x,z])")

==================================================== */
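
For readers curious what the pipeline in the help file amounts to (regress on bin dummies, collect the bin coefficients, label them with bin centers, plot), here is a rough hand-rolled sketch using only built-in Stata commands. It is not the code inside plot_response.ado: it skips PARMEST and PARM_BIN_CENTER, uses ordinary OLS standard errors, and the variable names (binid, center, b, lo, hi), the seed, and the half-open bin edges are all assumptions made for illustration.

```stata
clear
set seed 12345
set obs 1000
gen x = 10*runiform() - 5
gen z = rnormal()
gen y = x^2 + z + 5*rnormal()

* Hand-rolled bins of width 1, coded 1..10; values below -4 and at or
* above 4 are pooled into the two end bins (edge convention is assumed)
gen binid = floor(x) + 6
replace binid = 1  if x < -4
replace binid = 10 if x >= 4

* Omit the bin containing -1.6 (binid 4), so every other coefficient is
* measured relative to that bin
regress y ib4.binid z

* Collect each bin's coefficient and 95% confidence interval in rows 1..10
local tcrit = invttail(e(df_r), 0.025)
gen center = .
gen b  = .
gen lo = .
gen hi = .
forvalues k = 1/10 {
    * bin k covers [k-6, k-5), so its center is k-5.5 (nominal for end bins)
    quietly replace center = `k' - 5.5 in `k'
    quietly replace b  = _b[`k'.binid] in `k'
    quietly replace lo = _b[`k'.binid] - `tcrit'*_se[`k'.binid] in `k'
    quietly replace hi = _b[`k'.binid] + `tcrit'*_se[`k'.binid] in `k'
}

* Plot point estimates with confidence bars against the bin centers
twoway (rcap lo hi center) (scatter b center), legend(off) ///
    xtitle("x (bin center)") ytitle("Effect on y relative to omitted bin")
```

The omitted bin appears as a point at zero with no interval, since its coefficient is fixed by construction.
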
