
12.17.2013

Landsat at your fingertips

The USGS has put together a slick GUI that lets you browse (as if it were Google Earth) and download Landsat data. The interface is described here.

One of my students found this and showed it to me. (Over the next few weeks, hopefully I'll be able to post much of the material and discoveries from my new course "Spatial Data and Analysis".)



6.05.2013

Souped-up Watercolor Regression

I introduced "watercolor regression" here on FE several months ago, after some helpful discussions with Andrew Gelman and our readers. Over the last few months, I've made a few upgrades that I think significantly increase the utility of this approach for people doing work similar to my own.

First, the original paper is now on SSRN and documents the watercolor approach, explaining its relationship to the more general idea of visual-weighting.
Visually-Weighted Regression 
Abstract: Uncertainty in regression can be efficiently and effectively communicated using the visual properties of statistical objects in a regression display. Altering the “visual weight” of lines and shapes to depict the quality of information represented clearly communicates statistical confidence even when readers are unfamiliar with the formal and abstract definitions of statistical uncertainty. Here we present examples where the color-saturation and contrast of regression lines and confidence intervals are parametrized by local measures of an estimate’s variance. The results are simple, visually intuitive and graphically compact displays of statistical uncertainty. This approach is generalizable to almost all forms of regression.
Second, the Matlab code I've posted to do watercolor regression is now parallelized. If you have Matlab running on multiple processors, the code automatically detects this and runs the bootstrap procedure in parallel. This is helpful because a large number of resamples (>500) is important for getting the distribution of estimates (the watercolored part of the plot) to converge, but serial resampling gets very slow for large data sets (e.g. >1M obs), especially when block-bootstrapping (see below).
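For intuition, the pattern looks roughly like the following minimal sketch (not the actual internals of watercolor_reg; it assumes a parallel pool is already open, otherwise parfor quietly runs serially):
% Parallelized bootstrap sketch: each resample is independent, so the
% iterations can be farmed out to separate workers.
x = randn(1000,1); y = 2*x + randn(1000,1);   % toy data
N = numel(x);
B = 500;                        % number of bootstrap resamples
beta = nan(B,2);
parfor b = 1:B
    idx = randi(N, N, 1);       % draw N observations with replacement
    Xb = [ones(N,1) x(idx)];
    beta(b,:) = (Xb \ y(idx))'; % OLS fit on the resampled data
end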

Third, the code now has an option to run a block bootstrap. This is important if you have data with serial or spatial autocorrelation (e.g. models of crop yields that change in response to weather). To see this at work, suppose we have some data where there is a weak dependence of Y on X, but all observations within a block (e.g. maybe obs within a single year) have a uniform level-shift induced by some unobservable process.
e = randn(1000,1);              % idiosyncratic noise
block = repmat([1:10]',100,1);  % block labels 1-10, each appearing 100 times
x = 2*randn(1000,1);            % independent variable
y = x+10*block+e;               % weak dependence on x, plus a common level-shift per block
The scatter of this data looks like:


where each one of the stripes of data is a block of obs with correlated residuals. Running watercolor_reg without block-bootstrapping
watercolor_reg(x,y,100,1.25,500)
we get an exaggerated sense of precision in the relationship between Y and X:


If we try to account for the fact that residuals within a block are not independent by using the block bootstrap
watercolor_reg(x,y,100,1.25,500,block)
we get a very different result:



Finally, the last addition to the code is a simple option to clip the watercoloring at the edge of a specified confidence interval (default is 95%), an idea suggested by Ted Miguel. The result is a watercolor plot on which we can also conduct some traditional hypothesis tests visually, without violating the principles of visual weighting. Applying this option to the example above
blue = [0 0 .3];   % a dark blue RGB triplet
watercolor_reg(x,y,100,1.25,500,block,blue,'CLIPCI')
we obtain a plot with a clear 95% CI, where the likelihoods within the CI are indicated by watercoloring:


Code is here. Enjoy!
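P.S. For the curious, the core of the block-bootstrap option described above is simple. A minimal sketch, using the x, y, and block variables from the example (the internals of watercolor_reg may differ):
% Resample whole blocks with replacement so the within-block correlation
% of the residuals is preserved in every resample.
ids = unique(block);                  % distinct block labels
K = numel(ids);
draw = ids(randi(K, K, 1));           % sample K block labels with replacement
idx = [];
for k = 1:K
    % keep every observation belonging to the k-th drawn block
    idx = [idx; find(block == draw(k))];
end
xb = x(idx); yb = y(idx);             % one block-bootstrap resample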

5.27.2013

Hurricane-induced migration [Plot of the Week]

Impulse:

Hurricane Katrina, as pictured in the Gulf of Mexico at 21:45 UTC on August 28, 2005.

Response:

This map illustrates the national scope of the dispersion of refugees from Hurricane Katrina. It shows the location by zip code of the 800,000 displaced Louisiana residents who requested federal emergency assistance. The evacuees ended up dispersed across the entire nation, illustrating the wide-ranging impacts that can flow from extreme weather events, some of which are projected to increase in frequency and/or intensity as climate continues to change. (Source: Louisiana Geographic Information Center 2005)

4.18.2013

1-800-CLOUD-GONE

Ever been sitting by a window in the space station and felt annoyed that clouds are obstructing your view? Charlie Loyd and Chris Herwig of Mapbox (covered before on FE) have a simple but clever solution: sort your data by pixel.

Their explanation is clearer than mine (pun intended). I just wanted to post the pretty pictures.

Before:


After:


I think this idea has several applications beyond clearing the skies.
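If you want to play with the idea yourself, the core operation is just a per-pixel sort. Here's a rough Matlab sketch of my reading of the approach (not Mapbox's actual pipeline, which runs on Landsat tiles at scale):
% Cloud removal by pixel-wise sorting: stack co-registered scenes, sort
% each pixel's values through time, and keep a dark quantile, since
% clouds are bright and (hopefully) transient.
stack = rand(100, 100, 12);       % placeholder: 12 co-registered scenes in [0,1]
sorted = sort(stack, 3);          % sort every pixel independently across scenes
k = max(1, round(0.25 * size(stack, 3)));
composite = sorted(:, :, k);      % ~25th-percentile pixel: clouds mostly gone
imagesc(composite); colormap gray; axis image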

h/t Young

2.18.2013

Plotting restricted cubic splines in Stata [with controls]

Michael Roberts has been trying to convince me to use restricted cubic splines to plot highly nonlinear functions, in part because they are extremely flexible and have nice properties near their edges. Unlike polynomials, information at one end of the support only weakly influences fitted values at the other end of the support. Unlike the binned non-parametric methods I posted a few weeks ago, RC-splines are differentiable (smooth). Unlike other smooth non-parametric methods, RC-splines are fast to compute and easily account for control variables (like fixed effects) because they are summarized by just a few variables in an OLS regression. They can also be used with spatially robust standard errors or clustering, so they are great for nonlinear modeling of spatially correlated processes.

In short: they have lots of advantages. The only disadvantage is that it takes a bit of effort to plot them since there's no standard Stata command to do it.
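To see why the fit reduces to "just a few variables," here is one common construction of the basis (Harrell's, in its unnormalized form), sketched in Matlab for intuition; I haven't checked that plot_rcspline.ado generates exactly this parameterization:
% Restricted cubic spline basis with knots t(1) < ... < t(k): the first
% regressor is x itself; each additional one combines truncated cubics so
% that the fitted function is linear beyond the boundary knots.
x = linspace(-4, 4, 1000)';
t = [-3 -1 1 3];                   % knot locations (chosen for illustration)
k = numel(t);
pos = @(u) max(u, 0).^3;           % truncated cubic (u)_+^3
S = x;                             % basis term 1: x itself
for j = 1:k-2
    S(:, j+1) = pos(x - t(j)) ...
        - pos(x - t(k-1)) * (t(k) - t(j)) / (t(k) - t(k-1)) ...
        + pos(x - t(k))   * (t(k-1) - t(j)) / (t(k) - t(k-1));
end
plot(x, S)   % regressing y on the columns of S (plus controls) fits the spline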

Here's my function plot_rcspline.ado, which generates the spline variables for the independent variable, fits a spline while accounting for control variables, and plots the partial effect of the specified independent variable (adjusting for the control vars) with confidence intervals (computed via the delta method). It's as easy as

plot_rcspline y x

and you get something like


where the "knots" are plotted as the vertical lines (optional).

Help file below the fold.  Enjoy!

Related non-linear plotting functions previously posted on FE:
  1. Boxplot regression
  2. Visually-weighted regression
  3. Watercolor regression
  4. Non-parametric three-dimensional regression
  5. Binned non-parametric regression
  6. Polynomial regression of any degree

1.24.2013

Quickly plotting nonparametric response functions with binned independent variables [in Stata]

Yesterday's post described how we can bin the independent variable in a regression to get a nice non-parametric response function even when we have large data sets, complex standard errors, and many control variables.  Today's post is a function to plot these kinds of results.

After calling bin_parameter.ado to discretize an independent variable (see yesterday's post), run a regression of the outcome variable on the sequence of generated dummy variables (this command can be as complicated as you like, so feel free to throw your worst semi-hemi-spatially-correlated-auto-regressive-multi-dimensional-cluster-block-bootstrap standard errors at it). Then run plot_response.ado (today's function) to plot the results of that regression (with your fancy standard errors included). It's that easy.

Here's an example. Generate some data where Y is a quadratic function of X and a linear function of Z:
set obs 1000
gen x = 10*runiform()-5
gen z = rnormal()
gen e = rnormal()
gen y = x^2+z+5*e
Then bin the parameter using yesterday's function and run a regression of your choosing, using the dummy variables output by bin_parameter:
bin_parameter x, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)
reg y _dummy* z
After the regression, call plot_response.ado to plot the results of that regression (only the component related to the binned variables). The arguments describing the bins take the same format as those used by bin_parameter, to make this easier:
plot_response, s(1) t(4) b(-4) drop(-1.6) pref(_dummy)
The result is a plot that clearly shows us the correct functional form:


Note: plot_response.ado requires parmest.ado (download from SSC  by typing "net install st0043.pkg" at the command line). It also calls a function parm_bin_center.ado that is included in the download.

Citation note: If you use this suite of functions in publication, please cite: Hsiang, Lobell, Roberts & Schlenker (2012): "Climate and the location of crops."

Help file below the fold.

1.22.2013

Prettier graphs with less headache: use schemes in Stata

I'm picky about graphs looking nice. So for a long time I did silly things that drove me nuts, like typing "ylabel(, angle(horizontal))" for every single Stata graph I made (since some of the default settings in Stata are not pretty). I always knew that you could set a default scheme in Stata to manage colors, but I didn't realize that it could do more or be customized. See here to learn more.

After a bit of playing around, I wrote my own scheme. The text file looks pitiful, I know. But it saves me lots of headache and makes each plot look nicer with zero marginal effort.  You can make your own, or if you download mine, put the file scheme-sol.scheme in ado/personal/ (where you put other ado files) and then type
set scheme sol
at the command line. Or set this as the default on your machine permanently with
set scheme sol, perm
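Inside, a scheme file is just a plain-text list of directives. A few illustrative entries (these are not from my file; see help scheme entries for the full set), the last of which retires the ylabel-angle silliness for good:
#include s2color
color background white
anglestyle vertical_tick horizontal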
It will make your plots look kind of like this:


1.21.2013

Plot polynomial of any degree in Stata (with controls)

FE has been a little sluggish to recover from break. To kick-start us back into gear, I'm making good on one resolution by making this FE Week-of-Code. I'll try to post something useful that I've written over the past year each day.

It always bugged me that I could easily plot a linear or quadratic fit in Stata, but if I used a third-order polynomial I could no longer plot the results easily. Stata has built-in functions like lowess, fpfitci and lpolyci that will plot very flexible functions, but those tend to be too flexible for many purposes. Sometimes I just want a function that is flexibly non-linear, but still smooth (so not lowess), something I can easily write down analytically (so not fpfit or lpoly), and perhaps not symmetric (so not qfit). We use high-degree polynomials all the time, but we just don't plot them very often (I think this is because there is no built-in command to do it for us).

Here's my function plot_margins.ado that does it. It takes a polynomial that you use in a regression and plots the response function. Added bonuses: it plots the confidence interval you specify and can handle control variables (which are not plotted).

Reed Walker actually came up with this idea in an email exchange we had last year. So he deserves credit for this. I just wrote it up into an .ado file that you can call easily.

Basically, the idea is that you run any regression using Stata's factor variable notation, where you tell Stata that a variable X is continuous and should be interacted with itself, e.g.
reg y c.x##c.x
is the notation to regress Y on X and X^2. (Check out Stata's documentation on factor variables if this isn't familiar to you.)

Reed's idea was to then use Stata 11's new margins command to evaluate the response of Y to X and X^2 at several points along the support of X, and then to use parmest to plot the result. (To download parmest.ado, type "net install st0043.pkg" at the command line in Stata. plot_margins will call parmest, so you need to have it installed to run this function.)

The idea works really well, so long as you have Stata 11 or later (margins was introduced in Stata 11). 
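The computation underneath is easy to sketch. In Matlab notation, just for illustration (margins and parmest do this bookkeeping in Stata, and I use the classical OLS vcov here, whereas the example below uses block-bootstrapped SEs):
% Evaluate the polynomial-in-x part of the fit on a grid and get pointwise
% standard errors from the coefficient covariance (the delta method).
n = 1000;
x = 5*rand(n,1) - 2;
y = 4*x - x.^2 + x.^3 + 40*randn(n,1);      % toy data, cubic in x
X = [ones(n,1) x x.^2 x.^3];
b = X \ y;                                  % OLS coefficients
u = y - X*b;
V = (u'*u) / (n - 4) * inv(X'*X);           % classical vcov (swap in your own)
g = linspace(min(x), max(x), 50)';
G = [ones(50,1) g g.^2 g.^3];               % design matrix on the grid
fit = G * b;
se = sqrt(sum((G * V) .* G, 2));            % diag(G*V*G') computed cheaply
plot(g, fit, 'b', g, fit + 1.96*se, 'b--', g, fit - 1.96*se, 'b--')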

Here's an example. First generate some fake data. Y is the outcome of interest. X is the independent variable of interest. W is a relevant covariate. Z is an irrelevant covariate.
clear 
set obs 1000 
gen year = ceil(_n/100) 
gen x=5*uniform()-2 
gen z=2*uniform() 
gen w=2*uniform() 
gen e = 40*rnormal()  
gen y= w + 3*z + 4*x - x^2 + x^3 + e 
Then run a regression of Y on a polynomial of X (here, it's third degree) along with controls. The standard errors can be computed any fancy way you like. Here, I've done a block-bootstrap by the variable year.
reg y w z c.x##c.x##c.x , vce(bootstrap , reps(500) cluster(year))
Then, right after running the regression (or whenever the estimates from the regression are active in memory), call plot_margins to plot the marginal contribution of X at different values.
plot_margins x
Easy enough? I've added a few features for programming ease. Use the plotcommand() option to add labels, etc. to the graph
plot_margins x, plotcommand("xtit(X) ytit(Y) tit(Third order polynomial) subtit(plotted with PLOT_MARGINS) note(SE are block-bootstrapped by year and model controls for X and Z)") 
 The result:


 or specify the option "line" to have the CI plotted as lines instead of a shaded area:


There is also an option to save the function generated by parmest. Help file is below the fold. Have fun and don't forget to email Reed and tell him "thank you!"

Update: Reed points us to marginsplot in Stata 12, which basically does the same thing. Funny that the function names are all so unique...

12.20.2012

Unleash your inner cartographer

In my work, I make a lot of maps. But they're usually just a single image and they're mediocre in terms of color-choice and design.  Mapbox is a product from a DC startup that can help us data-jockeys build svelte and scalable maps that integrate data from lots of sources. From their about page:
MapBox is a platform for designing and publishing fast and beautiful maps. We provide MapBox Streets, a complete customizable world base map, develop the powerful open source map design studio TileMill, make it easy to integrate maps into applications and websites, and support all of these tools on top of scalable, high-performance hosting. We've made MapBox developer friendly with an open API.
The development team has worked on all sorts of projects, from tracking elections to helping document hurricane damage. Their blog is also way cool.

h/t Young

11.04.2012

Median Voter Theorem: proof by data visualization

The "median voter theorem" is a game-theoretic result about democratic politics. It basically says that in equilibrium in a two-party system, both candidates will have platforms that reflect the values of the median voter (in this election, Ohio). The second prediction is that both parties get almost exactly 50% of the vote, with very small changes in voting behavior near the median voter determining who wins.

Co.Design points us to an excellent data visualization by Felix Gonda of 156 years of voting behavior in the US (for president, the Senate and the House of Representatives). Run the little scroll-bar for years and you'll see the power of the median voter [theorem].


Check it out.

10.19.2012

Various data visualizations for Stata by Nicholas Cox

I just ran into an amazing trove of plotting commands for Stata, all written by Nicholas J. Cox. The commands can all be downloaded with findit in the Stata command line and have integrated help files. See the list with examples here.

10.08.2012

Mashup: watercolor regression of reported rapes and daily temperature in US counties

I was working with Matthew Ranson's crime and temperature data recently for a review article when Andrew Gelman tossed in his two cents on replotting the main figures, so I figured I'd see how one of the plots looked if we showed the results as a watercolor regression, since that was a recent innovation that arose from discussions on FE and Gelman's blog (watercolor regression is a type of visually-weighted regression).

I found the rape-vs-temperature plot particularly striking/perplexing/upsetting/interesting (yes, county, month-by-county and year-by-county effects have been removed from the data), so I converted the number of rape cases reported each month into percentages of the mean monthly number of reported rape cases. Temperature is the monthly mean (across days) of daily maximum temperature. Dark coloration depicts the probability that the conditional mean is at a specified value, and the estimated mean is the thin white line.

Click to enlarge

This is the largest sample I've run the watercolor regression code on (N > 1.4M), but it took less than ten minutes to plot with a few hundred resamples and 300 bins in the x-variable.  As my first attempt to use the code to show real data, I think I'm pretty satisfied with how clear the depiction of uncertainty is, without distracting from the main message (an issue described here).

[I've posted a Matlab function to compute and plot watercolor regressions here and Felix Schönbrodt posted an implementation in R here.]

8.31.2012

Watercolor regression

I'm in a rush, so I will explain this better later. But Andrew Gelman posted my idea for a type of visually-weighted regression that I jokingly called "watercolor regression" without a picture, so it's a little tough to see what I was talking about. Here is the same email but with the pictures to go along with it. The code to do your own watercolor regression is here as the 'SMOOTH' option to vwregress. (Update: I've added a separate cleaner function watercolor_reg.m for folks who want to tinker with the code but don't want to wade through all the other options built into vwregress. Update 2: I've added watercolor regression as a second example in the revised paper here.)

This was the email with figures included:

Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges).

1. I realized that if we plot the confidence intervals as a solid color that fades (e.g. using the "fixed ink" scheme from before), we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line in white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer.


My posted code has been updated to do this automatically.

2. You and your readers didn't like that the edges of the filled CI were so sharp and arbitrary. But I didn't like that the contrast between the spaghetti lines and the background had so much visual weight.  So to meet in the middle, I smoothed the spaghetti plot to get a nonparametric estimate of the probability that the conditional mean is at a given value:


To do this, after generating the spaghetti through bootstrapping, I estimate a kernel density of the spaghetti in the Y dimension for each value of X.  I set the visual-weighting scheme so it still "preserves ink" along a vertical line-integral, so the distribution dims where it widens since the ink is being "stretched out". To me, it kind of looks like a watercolor painting -- maybe we should call it a "watercolor regression" or something like that.

The watercolor regression turned out to be more of a coding challenge than I expected, because the bandwidth for the kernel smoothing has to adjust to the width of the CI. And since several people seem to like R better than Matlab, I attached 2 figs to show them how I did this. Once you have the bootstrapped spaghetti plot:


I defined a new coordinate system that spanned the range of bootstrapped estimates for each value in X.


The kernel smoothing is then executed along the vertical columns of this new coordinate system.
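In Matlab, the smoothing step looks roughly like this (a stripped-down sketch of the idea, not the posted code; ksdensity needs the Statistics Toolbox, and its automatic bandwidth stands in for the adaptive bandwidth discussed above):
% Turn bootstrapped spaghetti into a watercolor: estimate a kernel density
% of the resampled conditional means in the Y dimension at each x-grid
% point, then paint the densities as columns of an image.
B = 500; G = 200;
xg = linspace(-2, 2, G);
M = repmat(xg.^2, B, 1) + cumsum(randn(B, G), 2)/20;  % placeholder spaghetti (B x G)
yg = linspace(min(M(:)), max(M(:)), 300)';
D = zeros(numel(yg), G);
for j = 1:G
    D(:, j) = ksdensity(M(:, j), yg);   % density of the spaghetti at x = xg(j)
end
% Each column integrates to ~1, so the shading "conserves ink" on its own:
% where the spaghetti spreads out vertically, the color automatically thins.
imagesc(xg, yg, D); axis xy; colormap(flipud(bone));
hold on; plot(xg, mean(M), 'w');        % conditional mean drawn in white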

I've updated the code posted online to include this new option. This Matlab code will generate a similar plot using my vwregress function:

x = randn(100,1);        % independent variable
e = randn(100,1);        % noise
y = 2*x+x.^2+4*e;        % quadratic signal

bins = 200;              % number of bins along x
color = [.5 0 0];        % base RGB color
resamples = 500;         % bootstrap resamples
bw = 0.8;                % kernel bandwidth

vwregress(x, y, bins, bw, resamples, color, 'SMOOTH');


NOTE TO R USERS: The day after my email to Andrew, Felix Schönbrodt posted a nice similar variant with code in R here.

Update: For overlaid regressions, I still prefer the simpler visually-weighted line (last two figs here) since this is what overlaid watercolor regressions look like:


It might look better if the scheme made the blue overlay fade from blue-to-clear rather than blue-to-white, but then it would be mixing (in the color sense) with the red so the overlaid region would then start looking like very dark purple. If someone wants to code that up, I'd like to see it. But I'm predicting it won't look so nice.

8.20.2012

Visually-weighted confidence intervals

Following up on my earlier post describing visually weighted regression (paper here) and some suggestions from Andrew Gelman and others, I adjusted my Matlab function (vwregress.m, posted here) and just thought I'd visually document some of the options I've tried.

All of this information is available if you type help vwregress once the program is installed, but I think looking at the pictures helps.

The basic visually weighted regression is just a conditional mean where the visual weight of the line reflects uncertainty. Personally, I like the simple non-parametric plot overlaid with the OLS regression since it's clean and helps us see whether a linear approximation is a reasonable fit or not:
vwregress(x, y, 300, .5,'OLS',[0 0 1])
 


Confidence intervals (CI) can be added, and visually-weighted according to the same scheme as the conditional mean:
vwregress(x, y, 300, .5, 200,[0 0 1]);

Since the CI band is bootstrapped, Gelman suggested that we overlay the spaghetti plot of resampled estimates, so I added the option 'SPAG' to do this. If the spaghetti are plotted using a solid color (option 'SOLID'), this ends up looking quasi-visually-weighted:
vwregress(x, y, 300, .5,'SPAG','SOLID',200,[0 0 1]);

But since it gets kind of nasty looking near the edges, where the estimates go a little haywire since the observations get thin, we can visually weight the spaghetti too to keep it from getting distracting (just omit the 'SOLID' option).
vwregress(x, y, 300, .5,'SPAG',200,[0 0 1]);

Gelman also suggested that we try approximating the spaghetti by smoothly filling in the CI band, using the original visual-weighting scheme. To do this, I added the 'FILL' option. I like the result quite a bit (even more than the spaghetti, but others may disagree). [Warning: this plotting combination may be very slow, especially with a lot of resamples.]
vwregress(x, y, 300, .5,'FILL',200,[0 0 1]);

If 'SOLID' is combined with 'FILL', only the conditional mean is plotted with solid coloring. (This differs from 'SPAG' and the simple CI bounds).
vwregress(x, y, 300, .5,'FILL','SOLID',200,[0 0 1]);

Finally, I included the 'CI' option which changes the visual-weighting scheme from using weights 1/sqrt(N) to using 1/(CI_max - CI_min), where CI_max is the upper limit (point-wise) of the CI and CI_min is the lower limit. 

I like this because if we combine this with 'FILL', then the confidence band "conserves ink" (which we equate with confidence) in the y-dimension. Imagine that we squirt out ink uniformly to draw the conditional mean and then smear the ink vertically so that it stretches from the lower confidence bound to the upper confidence bound.  In places where the CI band is narrow, this will cause very little spreading of the ink so the CI band will be dark. But in places where the CI band is wide, the ink is smeared a lot so it gets lighter. For any vertical sliver of the CI band (think dx) the amount of ink displayed (integrated along a vertical line) will be constant.
vwregress(x, y, 300, .5,'CI','FILL',200,[0 0 1]);
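In code, the 'CI' weighting amounts to something like this sketch (placeholder band limits, for illustration):
xg = linspace(-2, 2, 300);
ci_lo = xg.^2 - (1 + abs(xg));    % placeholder pointwise CI limits
ci_hi = xg.^2 + (1 + abs(xg));
w = 1 ./ (ci_hi - ci_lo);         % visual weight = 1/(CI_max - CI_min)
w = w / max(w);                   % normalize to [0,1] for use as saturation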

For Stata users, I have written vwlowess.ado (here), but unfortunately it does not yet have any of these options.

Lucas Leeman has implemented some of these ideas in R (see here), so maybe he'll make that code available.

All of the above plots were made with the random data:
x = randn(200,1);                   % independent variable
e = randn(200,1).*(1+abs(x)).^1.5;  % heteroskedastic noise, growing with |x|
y = 2*x+x.^2+4*e;                   % quadratic signal

7.30.2012

Visually-Weighted Regression

[This is the overdue earth-shattering sequel to this earlier post.]

I recently posted this working paper online. It's very short, so you should probably just read it (I was actually originally going to write it as a blog post), but I'll run through the basic idea here.  

Since I'm proposing a method, I've written functions in Matlab (vwregress.m) and Stata (vwlowess.ado) to accompany the paper. You can download them here, but I expect that other folks can do a much better job implementing this idea.

Solomon M. Hsiang
Abstract: Uncertainty in regression can be efficiently and effectively communicated using the visual properties of regression lines. Altering the "visual weight" of lines to depict the quality of information represented clearly communicates statistical confidence even when readers are unfamiliar or reckless with the formal and abstract definitions of statistical uncertainty. Here, we present an example by decreasing the color-saturation of nonparametric regression lines when the variance of estimates increases. The result is a simple, visually intuitive and graphically compact display of statistical uncertainty. This approach is generalizable to almost all forms of regression.
Here's the issue. Statistical uncertainty seems to be important for two different reasons. (1) If you have to make a decision based on data, you want to have a strong understanding of the possible outcomes that might result from your decision, which itself rests on how we interpret the data. This is the "standard" logic, I think, and it requires a precise, quantitative estimate of uncertainty. (2) Because there is noise in data, and because sampling is uneven across independent variables, a lot of data analysis techniques generate artifacts that we should mostly just ignore. We are often unnecessarily focused/intrigued by the wackier results that show up in analyses, but thinking carefully about statistical uncertainty reminds us not to focus too much on these features. Except when it doesn't.

"Visually-weighted regression" is a method for presenting regression results that tries to address issue (2), taking a normal person's psychological response to graphical displays into account. I had grown a little tired of talks and referee reports where people speculate about the cause of some strange non-linearity at the edge of a regression sample, where there was no reason to believe the non-linear structure was real.  I think this and related behaviors emerge because (i) there seems to be an intellectual predisposition to thinking that "nonlinearity" is inherently more interesting that "linearity" and (ii) the traditional method for presenting uncertainty subconsciously focuses viewers attention on features of the data that are less reliable. I can't solve issue (i) with data visualization, but we can try to fix (ii). 

The goal of visually-weighted regression is to take advantage of viewers' psychological response to images in order to focus their attention on the results that are the most informative. "Visual weight" is a concept from art and graphical design that is used to direct a viewer's focus within an image. Large, dark, high-contrast, and complex structures tend to "grab" a viewer's attention. Our brains are constantly looking for visual information and, somewhere along the evolutionary line, detailed/high-contrast structures in our field of view were probably more informative and more useful for survival, so we are programmed to give them more of our attention. Unfortunately, the traditional approaches to displaying statistical uncertainty give more visual weight to the uncertain portions of the analysis, which is exactly backwards of what we want. Ideally, a viewer will focus more of their attention on the portions of analysis that have some statistical confidence and they will mostly ignore the portions of analysis that are so uncertain that they contain little or no information.

[continued below the fold]

6.18.2012

Only the finest in Swiss data visualization

The data visualization specialists over at Chart Porn point us to Datavisualization.ch's Selected Tools page. From their blog post announcing it:
When I meet with people and talk about our work, I get asked a lot what technology we use to create interactive and dynamic data visualizations. [...] That’s why we have put together a selection of tools that we use the most and that we enjoy working with. We called it selection.datavisualization.ch. It includes libraries for plotting data on maps, frameworks for creating charts, graphs and diagrams and tools to simplify the handling of data. Even if you’re not into programming, you’ll find applications that can be used without writing one single line of code. We will keep this list as a living repository and add / remove things as technology develops. We hope this will help you find the best tool for your next job.
Guess it's time to learn a little javascript...

5.18.2012

Plotting a two-dimensional line using color to depict a third dimension (in Matlab)

I had seen other people do this in publications, but was annoyed that I couldn't find a quick function to do it in a general case. So I'm sharing my function to do this.

If you have three vectors describing data, e.g. lat, lon and windspeed of Hurricane Ivan (yes, this data is from my env. sci. labs), you could plot it in 3 dimensions, which is awkward for readers:


or using my nifty script, you can plot wind speed as a color in two dimensions:


Not earth-shattering, but useful. Next week, I will post an earth-shattering application.

[BTW, if someone sends me code to do the same thing in Stata, I will be very grateful.]
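For reference, the standard Matlab idiom for this kind of plot (a generic sketch, not necessarily what's under the hood of my function) is to draw a degenerate surface and let its edge carry the color data:
% A 2-D line colored by a third variable: duplicate the coordinates into
% 2-row matrices and render the surface's edge with interpolated color.
lon  = linspace(-88, -80, 50);               % placeholder track data
lat  = linspace(18, 30, 50);
wind = 40 + 90*sin(linspace(0, pi, 50));     % the "third dimension"
surface([lon; lon], [lat; lat], zeros(2, 50), [wind; wind], ...
    'FaceColor', 'none', 'EdgeColor', 'interp', 'LineWidth', 3);
colorbar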

Also, you can make art:


Help file below the fold.

5.11.2012

How much groundwater does Africa have?

Quantitative maps of groundwater resources in Africa
A M MacDonald, H C Bonsor, B É Ó Dochartaigh and R G Taylor

Abstract: In Africa, groundwater is the major source of drinking water and its use for irrigation is forecast to increase substantially to combat growing food insecurity. Despite this, there is little quantitative information on groundwater resources in Africa, and groundwater storage is consequently omitted from assessments of freshwater availability. Here we present the first quantitative continent-wide maps of aquifer storage and potential borehole yields in Africa based on an extensive review of available maps, publications and data. We estimate total groundwater storage in Africa to be 0.66 million km³ (0.36–1.75 million km³). Not all of this groundwater storage is available for abstraction, but the estimated volume is more than 100 times estimates of annual renewable freshwater resources on Africa. Groundwater resources are unevenly distributed: the largest groundwater volumes are found in the large sedimentary aquifers in the North African countries Libya, Algeria, Egypt and Sudan. Nevertheless, for many African countries appropriately sited and constructed boreholes can support handpump abstraction (yields of 0.1–0.3 l s⁻¹), and contain sufficient storage to sustain abstraction through inter-annual variations in recharge. The maps show further that the potential for higher yielding boreholes (>5 l s⁻¹) is much more limited. Therefore, strategies for increasing irrigation or supplying water to rapidly urbanizing cities that are predicated on the widespread drilling of high yielding boreholes are likely to be unsuccessful. As groundwater is the largest and most widely distributed store of freshwater in Africa, the quantitative maps are intended to lead to more realistic assessments of water security and water stress, and to promote a more quantitative approach to mapping of groundwater resources at national and regional level.

Click to enlarge. Copyright ERL

See related field experiment on valuing ground water protection here.

h/t Kyle

4.04.2012

US Wind Fields Visualization

Following last week's ocean currents animation, one of our readers sends us Fernanda Viégas and Martin Wattenberg's real-time Wind Map:

The wind map is a personal art project, not associated with any company. We've done our best to make this as accurate as possible, but can't make any guarantees about the correctness of the data or our software. Please do not use the map or its data to fly a plane, sail a boat, or fight wildfires...
Surface wind data comes from the National Digital Forecast Database. These are near-term forecasts, revised once per hour. So what you're seeing is a living portrait. (See the NDFD site for precise details; our timestamp shows time of download.) And for those of you chasing top wind speed, note that maximum speed may occur over lakes or just offshore. 
We'd be interested in displaying data for other areas; if you know of a source of detailed live wind data for other regions, or the entire globe, please let us know

3.29.2012

Ocean currents visualization



About the Perpetual Ocean project (seems like a badly chosen name, how about "the wind-driven ocean"?)

This visualization shows ocean surface currents around the world during the period from June 2005 through December 2007. The visualization does not include a narration or annotations; the goal was to use ocean flow data to create a simple, visceral experience.
This visualization was produced using NASA/JPL's computational model called Estimating the Circulation and Climate of the Ocean, Phase II, or ECCO2. ECCO2 is a high-resolution model of the global ocean and sea ice. ECCO2 attempts to model the oceans and sea ice to increasingly accurate resolutions that begin to resolve ocean eddies and other narrow-current systems which transport heat and carbon in the oceans. The ECCO2 model simulates ocean flows at all depths, but only surface flows are used in this visualization. The dark patterns under the ocean represent the undersea bathymetry. Topographic land exaggeration is 20x and bathymetric exaggeration is 40x.
h/t Sherman