Fight Entropy: 01.13

1.31.2013

Ethos or Logos?

Last week, Bjorn Lomborg wrote an opinion piece in the WSJ preemptively attacking the climate policies that he speculates Obama will endorse. I don't usually read this kind of thing, but journalists at Climate Science Watch asked what I thought about it since they didn't believe it. Reading the article, I was surprised the WSJ had published it -- not because the citations were not 100% correct, but because the overall logic of the essay was flawed in obvious ways (regardless of your political stance). This is the kind of thing a copy-editor should have picked up on. My reply to the CSW mainly focused on this central logical flaw.

Interestingly, when CSW posted its reply to BL (here), which drew on many scientific experts, it focused entirely on discrediting individual statements that BL had made:

Displaying his trademark doublethink, Bjorn Lomborg’s latest op-ed in the Wall Street Journal switches between recognizing the risks of climate change and rejecting the need for meaningful action in the near term. Lomborg incorporates misleading and discredited scientific information to justify dangerous delays in climate action.

rather than pointing out that the giant "if-then" statement at the core of the article's architecture would obviously return an error if it were fed into any computer capable of Boolean logic.

This struck me because the dialogue (in both directions) was focused on discrediting one's opponent by demonstrating that they don't understand science (BL does this to Obama, and CSW does this to BL), rather than on finding a logical solution to the problem (or simply having a logical discussion about it). This seemed unfortunate, since in this particular case the physical science is pretty irrelevant to the actual policy discussion; the entire discussion should be focused on the economics. I blame BL for inappropriately bringing the science into the discussion, but I wish CSW had pointed out that this was the error, since I think doing so (here and elsewhere) would get us back on track toward meaningful discussion rather than escalating the scientific mudslinging.

My full reply to the CSW is below the fold (I had been in referee-mode at the time, which is probably evident).

Third Interdisciplinary Ph.D. Workshop in Sustainable Development

April 12th-13th, 2013: Columbia University in the City of New York, USA

The graduate students in sustainable development at Columbia University are convening the Third Interdisciplinary Ph.D. Workshop in Sustainable Development (IPWSD), scheduled for April 12th-13th, 2013, at Columbia University in New York City.

The IPWSD is a conference open to graduate students working on or interested in issues related to sustainable development. It is intended to provide a forum to present and discuss research in an informal setting, as well as to meet and interact with similar graduate student researchers from other institutions. In particular, we hope to facilitate a network among students pursuing in-depth research across a range of disciplines in the social and natural sciences, to generate a larger interdisciplinary discussion concerning sustainable development. If your research pertains to the field of sustainable development and the linkages between natural and social systems, we encourage you to apply regardless of disciplinary background.

For details, please see the call for papers, or visit our conference website where a detailed list of topics, conference themes and other information is available.

website: http://blogs.cuit.columbia.edu/sdds/schedule-events/ipwsd_2013/
contact: cu.sdds.ipwsd@gmail.com

Note that the submission deadline has been extended to February 15th.

1.28.2013

Implementation capacity matters when scaling up

Paul Niehaus was presenting at USF's seminar yesterday and mentioned the following working paper by Bold, Kimenyi, Mwabu, Ng'ang'a and Sandefur:

Interventions & Institutions: Experimental Evidence on Scaling up Education Reforms in Kenya

Abstract: The recent wave of randomized trials in development economics has provoked criticisms regarding external validity and the neglect of political economy. We investigate these concerns in a randomized trial designed to assess the prospects for scaling-up a contract teacher intervention in Kenya, previously shown to raise test scores for primary students in Western Kenya and various locations in India. The intervention was implemented in parallel in all eight Kenyan provinces by a non-governmental organization (NGO) and the Kenyan government. Institutional differences had large effects on contract teacher performance. We find a significant, positive effect of 0.19 standard deviations on math and English scores in schools randomly assigned to NGO implementation, and zero effect in schools receiving contract teachers from the Ministry of Education. We discuss political economy factors underlying this disparity, and suggest the need for future work on scaling up proven interventions to work within public sector institutions.

The paper is openly hosted here, and the Center for Global Development has an interview with Sandefur here. The introduction to the paper gives a particularly succinct summary of its motivation:

The recent wave of randomized trials in development economics has catalogued a number of cost-effective, small-scale interventions proven to improve learning, health, and other welfare outcomes. This methodology has also provoked a number of criticisms regarding the generalizability of experimental findings, including concerns about external validity, general equilibrium effects, and the neglect of political economy in much of the evaluation literature (Acemoglu 2010, Deaton 2010, Heckman 1991, Rodrik 2009). These criticisms are particularly relevant when randomized trials of pilot projects run by well-organized and monitored NGOs are used as the basis for policy prescriptions at the national or global level. As noted by Banerjee and Duflo (2008), "what distinguishes possible partners for randomized evaluations is competence and a willingness to implement projects as planned. These may be lost when the project scales up. [. . . ] Not enough effort has taken place so far in trying 'medium scale' evaluation of programs that have been successful on a small scale, where these implementation issues would become evident."

In this paper we employ the methodology of randomized trials to assess these substantive concerns about political and institutional constraints and measure precisely how treatment effects change when scaling up...

1.24.2013

Quickly plotting nonparametric response functions with binned independent variables [in Stata]

Yesterday's post described how we can bin the independent variable in a regression to get a nice non-parametric response function even when we have large data sets, complex standard errors, and many control variables. Today's post is a function to plot these kinds of results.

After calling bin_parameter.ado to discretize an independent variable (see yesterday's post), run a regression of the outcome variable on the sequence of generated dummy variables (this command can be as complicated as you like, so feel free to throw your worst semi-hemi-spatially-correlated-auto-regressive-multi-dimensional-cluster-block-bootstrap standard errors at it). Then run plot_response.ado (today's function) to plot the results of that regression (with your fancy standard errors included). It's that easy.

Here's an example. Generate some data where Y is a quadratic function of X and a linear function of Z:

set obs 1000
gen x = 10*runiform()-5
gen z = rnormal()
gen e = rnormal()
gen y = x^2+z+5*e

Then bin the parameter using yesterday's function and run a regression of your choosing, using the dummy variables output by bin_parameter:

bin_parameter x, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)
reg y _dummy* z

After the regression, call plot_response.ado to plot the results of that regression (only the component related to the binned variables). To make this easier, the arguments describing the bins use the same format as those of bin_parameter:

plot_response, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)

The result is a plot that clearly shows us the correct functional form:

Note: plot_response.ado requires parmest.ado (download from SSC by typing "net install st0043.pkg" at the command line). It also calls a function parm_bin_center.ado that is included in the download.

Citation note: If you use this suite of functions in publication, please cite: Hsiang, Lobell, Roberts & Schlenker (2012): "Climate and the location of crops."
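For readers curious about what goes into such a plot, here is a rough Python/NumPy analogue. It is not the code of plot_response.ado or parm_bin_center.ado. It regresses y on bin dummies plus the control z. It then builds the rows that get plotted: the bin center, the point estimate, and a 95% confidence interval, with the omitted bin pinned at zero. Two assumptions keep it short. First, it uses plain unit-width bins over the whole support of X, with no separate top and bottom bins. Second, it uses classical OLS standard errors instead of whatever standard errors your Stata regression produced.

```python
import numpy as np

# Simulated data mirroring the Stata example: y quadratic in x, linear in z.
rng = np.random.default_rng(1)
n = 1000
x = 10 * rng.uniform(size=n) - 5             # x in [-5, 5)
z = rng.normal(size=n)
y = x**2 + z + 5 * rng.normal(size=n)

# Unit-width bins [k, k + 1) for k = -5..4; bin [-2, -1) is the omitted reference.
lower = np.floor(x)
bins = [k for k in range(-5, 5) if k != -2]
D = np.column_stack([(lower == k).astype(float) for k in bins])
X = np.column_stack([np.ones(n), D, z])      # constant, bin dummies, control z

# OLS with classical standard errors.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# One row per bin: (center, estimate, lower 95% bound, upper 95% bound).
rows = [(k + 0.5, beta[i + 1], beta[i + 1] - 1.96 * se[i + 1], beta[i + 1] + 1.96 * se[i + 1])
        for i, k in enumerate(bins)]
rows.append((-1.5, 0.0, 0.0, 0.0))           # reference bin is zero by construction
rows.sort()
```

Plotting the estimates against the bin centers, with the bounds drawn as a band or as lines, gives a response function of the kind shown in this post.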

Help file below the fold.

Binning a continuous independent variable for flexible nonparametric models [in Stata]

Sometimes we want a flexible statistical model that allows for non-linearities (or to test whether an observed relationship is actually linear). It's easy to run a model containing a high-degree polynomial (or something similar), but these can become complicated to interpret if the model contains many controls, such as location-specific fixed effects. Fully non-parametric models can be nice, but they require partialling out the data, and standard errors can become awkward if the sample is large or something sophisticated (like accounting for spatial correlation) is required.

An alternative that is easy to interpret, and handles large samples and complex standard errors well, is to convert the independent variable into discrete bins and to regress the outcome variable on dummy variables that represent each bin.

For example, in a paper with Jesse we take the typhoon exposure of Filipino households (a continuous variable) and make dummy variables for each 5 m/s bin of exposure. So there is a 10_to_15_ms dummy variable that is zero for all households except for those households whose exposure was between 10 and 15 m/s, and there is a different dummy for exposure between 15 and 20 m/s, etc. When we regress our outcomes on all these dummy variables (and controls) at the same time, we recover their respective coefficients -- which together describe the nonlinear response of the outcome. In this case, the response turned out to be basically linear:

The effect of typhoon exposure on Filipino households' finances.
From Anttila-Hughes & Hsiang (2011)

This approach coarsens the data somewhat, so there is some efficiency loss, and we should be wary of Type 2 error if we compare bins to one another. But for determining the functional form of a model, this is a great approach so long as you have enough data.

I found myself rewriting Stata code to bin variables like this in many different contexts, so I wrote bin_parameter.ado to do it for me quickly. Running these models can now be done in two lines of code (one of which is the regression command). bin_parameter allows you to specify a bin width, a top bin, a bottom bin and a dropped bin (for your comparison group). It spits out a bunch of dummy variables that represent all the bins which cover the range of the specified variable. It also has options for naming the dummy variables so you can use the wildcard notation in regression commands. Here's a short example of how it can be used:

set obs 1000
gen x = 10*runiform()-5
gen y = x^2
bin_parameter x, s(1) t(4) b(-4) drop(-1.6) pref(x_dummy)
reg y x_dummy*
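Here is a minimal Python/NumPy sketch of the same kind of binning, to make the options concrete. It is not a port of bin_parameter.ado. The helper name bin_dummies and its conventions are assumptions of mine:

- Interior bins are left-closed intervals of width s running from b up to t.
- Everything below b and everything at or above t falls into two end bins.
- The bin containing the drop() value is omitted as the comparison group.

```python
import numpy as np

def bin_dummies(x, width, bottom, top, drop_value):
    """Hypothetical helper, not bin_parameter.ado itself.

    Bins: (-inf, bottom), [bottom, bottom + width), ..., [top, +inf).
    Returns the dummy matrix (one column per kept bin), the kept bin
    indices, and the interior bin edges. The bin containing `drop_value`
    is left out so it serves as the comparison group.
    """
    edges = np.arange(bottom, top + width / 2, width)   # bottom, ..., top
    # Index = number of edges <= x: 0 is the bottom bin, len(edges) the top bin.
    idx = np.searchsorted(edges, x, side="right")
    ref = int(np.searchsorted(edges, drop_value, side="right"))
    kept = [b for b in range(len(edges) + 1) if b != ref]
    D = np.column_stack([(idx == b).astype(float) for b in kept])
    return D, kept, edges

rng = np.random.default_rng(0)
x = 10 * rng.uniform(size=1000) - 5          # mirrors gen x = 10*runiform()-5
y = x**2                                     # mirrors gen y = x^2

# Rough analogue of: bin_parameter x, s(1) t(4) b(-4) drop(-1.6) ... ; reg y x_dummy*
D, kept, edges = bin_dummies(x, width=1.0, bottom=-4.0, top=4.0, drop_value=-1.6)
X = np.column_stack([np.ones_like(x), D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
effects = dict(zip(kept, beta[1:]))   # each bin's mean of y minus the dropped bin's mean
```

Because the regression includes a constant and one dummy per non-dropped bin, each coefficient is a difference in bin means. Under these assumptions the estimates trace out x^2 shifted down by the dropped bin's average (about 2.3). For example, the [2, 3) bin comes out near 4.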

Help file below the fold.

Prettier graphs with less headache: use schemes in Stata

I'm picky about graphs looking nice. So for a long time I did silly things that drove me nuts, like typing "ylabel(, angle(horizontal))" for every single Stata graph I made (since some of the default settings in Stata are not pretty). I always knew that you could set a default scheme in Stata to manage colors, but I didn't realize that it could do more or be customized. See here to learn more.

After a bit of playing around, I wrote my own scheme. The text file looks pitiful, I know. But it saves me lots of headache and makes each plot look nicer with zero marginal effort. You can make your own, or if you download mine, put the file scheme-sol.scheme in ado/personal/ (where you put other ado files) and then type

set scheme sol

at the command line. Or set this as the default on your machine permanently with

set scheme sol, perm

It will make your plots look kind of like this:

1.21.2013

Plot polynomial of any degree in Stata (with controls)

FE has been a little sluggish to recover from the break. To kick-start us back into gear, I'm making good on one resolution by making this FE Week-of-Code: each day I'll try to post something useful that I've written over the past year.

It always bugged me that I could easily plot a linear or quadratic fit in Stata, but if I used a third-order polynomial I could no longer plot the results easily. Stata has built-in functions like lowess, fpfitci and lpolyci that will plot very flexible functions, but those tend to be too flexible for many purposes. Sometimes I just want a function that is flexibly non-linear but still smooth (so not lowess), something I can easily write down analytically (so not fpfit or lpoly), and perhaps not symmetric (so not qfit). We use high-degree polynomials all the time, but we just don't plot them very often (I think this is because there is no built-in command to do it for us).

Here's my function plot_margins.ado that does it. It takes a polynomial that you use in a regression and plots the response function. Added bonuses: it plots the confidence interval you specify and can handle control variables (which are not plotted).

Reed Walker actually came up with this idea in an email exchange we had last year. So he deserves credit for this. I just wrote it up into an .ado file that you can call easily.

Basically, the idea is that you run any regression using Stata's factor variable notation, where you tell Stata that a variable X is continuous and should be interacted with itself, e.g.

reg y c.x##c.x

is the notation to regress Y on X and X^2. (Check out Stata's documentation on factor variables if this isn't familiar to you.)

Reed's idea was to then use Stata 11's new margins command to evaluate the response of Y to X and X^2 at several points along the support of X, and then to use parmest to plot the result. (To download parmest.ado, type "net install st0043.pkg" at the command line in Stata. plot_margins will call parmest, so you need to have it installed to run this function.)

The idea works really well, so long as you have Stata 11 or later (margins was introduced in Stata 11).
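The logic can also be sketched outside Stata. The Python/NumPy snippet below is only an illustration, not plot_margins.ado. It fits the cubic with controls by OLS, using classical standard errors rather than the block bootstrap in the Stata example. It then evaluates the polynomial part f(x) = b1*x + b2*x^2 + b3*x^3 on a grid. Because f is linear in the coefficients, its standard error at each grid point is sqrt(g'Vg), where g = (x, x^2, x^3) and V is the covariance matrix of (b1, b2, b3). margins reports predicted Y instead, which for a linear model differs from f only by a constant level that depends on the intercept and the controls. I am assuming that shift is irrelevant for reading the shape.

```python
import numpy as np

# Simulated data mirroring the Stata example (x in [-2, 3), controls w and z).
rng = np.random.default_rng(2)
n = 1000
x = 5 * rng.uniform(size=n) - 2
z = 2 * rng.uniform(size=n)
w = 2 * rng.uniform(size=n)
y = w + 3 * z + 4 * x - x**2 + x**3 + 40 * rng.normal(size=n)

# Design: constant, controls, then the cubic in x (like c.x##c.x##c.x).
X = np.column_stack([np.ones(n), w, z, x, x**2, x**3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
V = (resid @ resid / (n - X.shape[1])) * np.linalg.inv(X.T @ X)  # classical covariance

# Evaluate only the polynomial part on a grid over the support of x.
grid = np.linspace(x.min(), x.max(), 25)
G = np.column_stack([grid, grid**2, grid**3])      # gradient of f w.r.t. (b1, b2, b3)
f_hat = G @ beta[3:]
f_se = np.sqrt(np.einsum("ij,jk,ik->i", G, V[3:, 3:], G))  # sqrt of diag(G V G')
lo, hi = f_hat - 1.96 * f_se, f_hat + 1.96 * f_se
```

Plotting f_hat against grid, with lo and hi as a shaded band, gives the same kind of picture, minus the level shift.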

Here's an example. First generate some false data. Y is the outcome of interest. X is the independent variable of interest. W is a relevant covariate. Z is an irrelevant covariate.

clear

set obs 1000

gen year = ceil(_n/100)

gen x=5*uniform()-2

gen z=2*uniform()

gen w=2*uniform()

gen e = 40*rnormal()

gen y= w + 3*z + 4*x - x^2 + x^3 + e

Then run a regression of Y on a polynomial of X (here, it's third degree) along with controls. The standard errors can be computed any fancy way you like. Here, I've done a block-bootstrap by the variable year.

reg y w z c.x##c.x##c.x , vce(bootstrap , reps(500) cluster(year))
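The bootstrap option above resamples whole years rather than individual observations. The Python/NumPy sketch below shows that idea for OLS. It is my own illustration of a cluster ("block") bootstrap, not Stata's implementation, and the function name block_bootstrap_se is hypothetical. Resampling entire clusters keeps any correlation among observations from the same year intact in every replication.

```python
import numpy as np

def block_bootstrap_se(X, y, cluster, reps=500, seed=0):
    """Hypothetical helper: bootstrap SEs for OLS that resample whole clusters."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    rows_of = {c: np.flatnonzero(cluster == c) for c in ids}
    draws = np.empty((reps, X.shape[1]))
    for r in range(reps):
        # Draw as many clusters as exist, with replacement, keeping each
        # drawn cluster's observations together.
        picked = rng.choice(ids, size=len(ids), replace=True)
        rows = np.concatenate([rows_of[c] for c in picked])
        beta, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
        draws[r] = beta
    # Standard deviation of the replicate estimates across replications.
    return draws.std(axis=0, ddof=1)

# Data mirroring the Stata example, with 10 "years" of 100 observations each.
rng = np.random.default_rng(3)
n = 1000
year = np.arange(n) // 100 + 1               # mirrors gen year = ceil(_n/100)
x = 5 * rng.uniform(size=n) - 2
z = 2 * rng.uniform(size=n)
w = 2 * rng.uniform(size=n)
y = w + 3 * z + 4 * x - x**2 + x**3 + 40 * rng.normal(size=n)
X = np.column_stack([np.ones(n), w, z, x, x**2, x**3])
se = block_bootstrap_se(X, y, year, reps=500)
```

With only 10 clusters, as here, cluster-bootstrap standard errors can themselves be quite noisy.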

Then, right after running the regression (or whenever the estimates from that regression are the active estimates in memory), call plot_margins to plot the marginal contribution of X at different values.

plot_margins x

Easy enough? I've added a few features for programming ease. Use the plotcommand() option to add labels, etc., to the graph:

plot_margins x, plotcommand("xtit(X) ytit(Y) tit(Third order polynomial) subtit(plotted with PLOT_MARGINS) note(SE are block-bootstrapped by year and model controls for W and Z)")

The result:

Or specify the option "line" to have the CI plotted as lines instead of a shaded area:

There is also an option to save the function generated by parmest. Help file is below the fold. Have fun and don't forget to email Reed and tell him "thank you!"

Update: Reed points us to marginsplot in Stata 12, which basically does the same thing. Funny that the function names are all so unique...

Weekend Links

1) Coursera's next course on data analysis using R starts January 22nd (via Yaniv Stopnitzky)

2) Dropboxifier lets you back up / share folders via Dropbox without relocating them (via The Lazy Economist)

3) Banerjee and Duflo's "The Challenges of Global Poverty" starts Feb. 13th on edX (via Hartmut Fisher)

4) Google Ph.D. Fellowship nominations and applications are due Feb. 1st

5) The first atmospheric carbon dioxide capture plant is planned to open by end of 2014 (via Marion Dumas)

6) Easy ways to alienate yourself from a profession, Forbes lifestyle piece edition (via Ivana Ng)

7) 125 years of National Geographic photos (via Dave Pell)

8) David Enfield's ENSO FAQ is an oldie but a goodie

9) Daniel Aldrich's two pieces on the "soft approach" to countering violent extremism are now out (see under "Countering Violent Extremism", and previously on FE)

10) "Knowledge workers are bad at working." Cal Newport on the value of "deep work"

11) A mysterious patch of light shows up in the North Dakota dark (via Padraic Hughes)

1.16.2013

REDD Metrics

Marshall points us to reddmetrics.com, a neat blog/company that FE readers might appreciate:

Founded in early 2011, REDD Metrics, LLC, applies innovations in large-scale spatial data processing to questions in environmental and resource economics.

The blog is sparse but has several cool posts. They also seem to be working on some interesting long-term projects:

Large-scale implementation of change detection algorithms
Employing the power of cloud computing to detect changes in earth systems using satellite imagery over wide geographic areas.

Economic analysis of natural resource use and management
Identifying economic determinants of resource consumption. For example, how do interest rates or agricultural prices affect the rate of forest clearing in Indonesia?

Predictive modeling of spatial processes
Using massive spatial datasets and cutting-edge spatial modeling techniques to predict the flow of environmental phenomena over space and time.

Development of mobile apps to disseminate and collect environmental data
Enhancing on-the-ground natural resource management by syncing local field data with indicators from large-scale spatial datasets.

1.10.2013

Hottest Day on Record in Australia

From yesterday's Guardian:

"Australia had its hottest day on record on Monday with a nationwide average of 40.33C (104.59 F), narrowly breaking a 1972 record of 40.17C (104.31 F). Tuesday was the third hottest day at 40.11C (104.2F). Four of Australia's hottest 10 days on record have been in 2013."

Tammy Holmes (second from left) and her grandchildren, (from left) Charlotte, Esther, Liam, Matilda and Caleb, take refuge under a jetty. Photograph: Tim Holmes/AP

"The road closed behind me," [Bonnie Walker] told ABC News. "We just waited by the phone. We received a message at 3.30pm to say that mum and dad had evacuated, that they were surrounded by fire, and could we pray. So I braced myself to lose my children and my parents." She described the photo of her family holding on beneath the jetty as upsetting. "It's all of my, our, five children underneath the jetty huddled up to neck-deep seawater, which is cold. We swam the day before and it was cold. So I knew that that would be a challenge, to keep three non-swimmers above water."

Not unrelated, NOAA just announced that 2012 was the hottest year on record for the US.

(via Alkarim Jina)
