1.21.2013

Plot polynomial of any degree in Stata (with controls)

FE has been a little sluggish to recover from break. To kick start us back in gear, I'm making good on one resolution by making this FE Week-of-Code. I'll try to post something useful that I've written from the past year each day.

It always bugged me that I could easily plot a linear or quadratic fit in Stata, but if I used a third-order polynomial I could no longer plot the results easily.  Stata has built in functions like lowess, fpfitci and lpolyci that will plot very flexible functions, but those tend to be too flexible for many purposes. Sometimes I just want a function that is flexibly non-linear, but still smooth (so not lowess) and something I can easily write down analytically (so not fpfit or lpoly) and perhaps not symmetric (so not qfit).  We use high-degree polynomial's all the time, but we just don't plot them very often (I think this is because there is no built-in command do it for us).

Here's my function plot_margins.ado that does it. It takes a polynomial that you use in a regression, and plots the response function. Added bonuses: It plots the confidence interval you specify and can handle control variables (which are not plotted)

Reed Walker actually came up with this idea in an email exchange we had last year. So he deserves credit for this. I just wrote it up into an .ado file that you can call easily.

Basically, the idea is that you run any regression using Stata's factor variable notation, where you tell Stata that a variable X is continous and should be interacted with itself, eg
reg y c.x##c.x
is the notation to regress Y on X and X^2. (Check out Stata's documentation on factor variables if this isn't familiar to you.)

Reed's idea was to then use Stata 11's new margins command to evaluate the response of Y to X and X^2 at several points along the support of X, and then to use parmest to plot the result. (To download parmest.dotype "net install st0043.pkg" at the command line in Stata. plot_margins will call parmest, so you need to have it installed to run this function.)

The idea works really well, so long as you have Stata 11 or later (margins was introduced in Stata 11). 

Here's an example. First generate some false data. Y is the outcome of interest. X is the independent variable of interest. W is a relevant covariate. Z is an irrelevant covariate.
clear 
set obs 1000 
gen year = ceil(_n/100) 
gen x=5*uniform()-2 
gen z=2*uniform() 
gen w=2*uniform() 
gen e = 40*rnormal()  
gen y= w + 3*z + 4*x - x^2 + x^3 + e 
Then run a regression of Y on a polynomial of X (here, it's third degree) along with controls. The standard errors can be computed any fancy way you like. Here, I've done a block-bootstrap by the variable year.
reg y w z c.x##c.x##c.x , vce(bootstrap , reps(500) cluster(year))
Then, right after running the regression, (or when the estimates from the regression are the active in memory) call plot_margins to plot the marginal contribution of X at different values.
plot_margins x
Easy enough? I've added a few features for programming ease. Use the plot_command() option to add labels, etc to the graph
plot_margins x, plotcommand("xtit(X) ytit(Y) tit(Third order polynomial) subtit(plotted with PLOT_MARGINS) note(SE are block-bootstrapped by year and model controls for X and Z)") 
 The result:


 or specify the option "line" to have the CI plotted as lines instead of a shaded area:


There is also an option to save the function generated by parmest. Help file is below the fold. Have fun and don't forget to email Reed and tell him "thank you!"

Update: Reed points us to marginsplot in Stata 12, which basically does the same thing. Funny that the function names are all so unique...

/*

S. HSIANG
SHSIANG@PRINCETON.EDU
10/2012
-------------------------------------------------------

CREDIT

This code is based on a script originally written by W. REED WALKER

-------------------------------------------------------

SYNTAX

plot_margins varname, [BINS(integer 50) SAVE_data_file(string) plotcommand(string)]

-------------------------------------------------------

PLOT_MARGINS is designed to plot the response function ("margins") obtained from regression estimates where a continuous variable may be interacted with itself to construct polynomial or other function.

PLOT_MARGINS plots the response specified by the most recently executed estimate (or a restored active estimate).

PLOT_MARGINS is designed to show a relationship between X and Y, adjusted for other variables Z1 ... Zn that are specified as controls in the previous estimation. In order to plot a nonlinear relationship between X and Y, the estimation command must use factor variable interactions (eg. c.X##c.X) in the previous estimation command.

PLOT_MARGINS calls PARMEST (after preserving the data), plots the response function, then restores the original dataset. An option for saving the dataset with the estimates is included.

PLOT_MARGINS does not automatically save the figure. Use GRAPH EXPORT or GRAPH SAVE afterwards to save the figure.

See "help margins" for more details on the output that is being plotted. 

PLOT_MARGINS only works on STATA Version 11 or later (because MARGINS was introduced to Version 11)

-------------------------------------------------------
NOTE

PARMEST.ADO must be installed in order for PLOT_MARGINS to work.  To download PARMEST.ado from SSC (type "net install st0043.pkg")

-------------------------------------------------------

Required arguments:

varname - The independant variable that is specified with interactions in the previous regression command
Optional arguments: 

bins() - Number of "bins" that are used to plot the final response funtion. Fewer bins will make the response function look "jagged", more bins will make the response look smoother. Default is 50. 
SAVE_data_file() - String that specifies a path/file to save the dataset of estimates used to plot the response function. These are the values of the response function (with SE) at each bin of the independent variable. If nothing is specified, the data used to plot the results will not be saved.
PLOTCOMMAND() - String arguments that are appending to the final plotting command. Eg. to label the plot with the title "X vs Y" and xtitle "X" type: plot_options("title(X vs Y) xtit(X)").

CILEVEL() - Integer specifying the confidence level to be plotted. The default is cilevel(95).
//---------------------------------------------------------

EXAMPLES: 

set obs 1000
gen year = ceil(_n/100)
gen x=3*uniform()
gen z=2*uniform()
gen w=2*uniform()
gen e = 4*rnormal()
gen y= w + 3*z + 4*x - x^2 + e
reg y w z c.x##c.x, vce(bootstrap , reps(100) cluster(year))
plot_margins x, plotcommand("xtit(Independent Variable) tit(my graph) note(bootstrapped SE)")

reg y w z c.x##c.x, robust
plot_margins x, plotcommand("xtit(X) ytit(Y) note(robust SE)")

*/

2 comments:

  1. Hello there
    Thanks for sharing this. When typing "net install st0043.pkg" at STATA 11 command box, I get the following error message: "file http://www.stata.com/users/st0043.pkg not found
    server says file temporarily redirected to http://www.stata.com/error/404.html
    could not load st0043.pkg from http://www.stata.com/users/
    r(601);"

    Is the file temporary removed, and if yes, how can I get it?

    Thanks
    Despoina Filiou

    ReplyDelete
    Replies
    1. I just tried it again and it worked for me. Give it another shot. If that doesn't work, try typing "findit parmest" at the command line, and then go down to package "st0043" and click on it to install it.

      Delete