The crux of the paper rests on a pretty simple idea: if you're running a huge number of one-off statistical tests (i.e., not testing the same hypothesis over and over), a fraction of your results proportional to the significance level of your test will be false positives (i.e., type I errors). This is a pretty straightforward concept for anyone doing applied work: if you're checking balance across treated and control populations in a randomized trial, for example, an occasional statistically significant difference between the two populations isn't a huge deal as long as the percentage of variables that turn up that way is in line with the significance level you're setting. Yes, you should follow through as a good little applied researcher and make sure something's not hiding there, but some portion of your results will always end up that way due to random variation.
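To make the point concrete, here's a minimal sketch (with made-up numbers) of that baseline rate: run many independent two-group comparisons where there is genuinely no difference, and roughly a significance-level's worth of them come up "significant" anyway.

```python
# Minimal sketch (assumed setup): many tests where the null is true,
# and roughly alpha of them come up significant purely by chance.
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05
n_tests, n_per_group = 10_000, 50

false_positives = 0
for _ in range(n_tests):
    # Both groups drawn from the same distribution: any "difference" is noise.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    # Two-sample z-statistic (variance known to be 1 in this toy setup).
    z = (a.mean() - b.mean()) / np.sqrt(2 / n_per_group)
    if abs(z) > 1.96:  # two-sided test at alpha = 0.05
        false_positives += 1

print(false_positives / n_tests)  # hovers around 0.05
```

The exact number wobbles run to run, but it stays close to the 5% you chose up front, which is exactly the "proportional to the significance level" behavior described above.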
The nice step that Ioannidis takes is to look at the entire field of medical research and apply the same logic, effectively viewing the suite of randomized trials as a game where we keep picking new potential tests for the same problems over and over again, some subset of which are guaranteed to produce false positives. To quote Alex Tabarrok's pithy wording of it in the Marginal Revolution post:
Want to avoid colon cancer? Let's see if an apple a day keeps the doctor away. No? What about a serving of bananas? Let's try vitamin C and don't forget red wine.

Moreover, since the number of things that actually, say, help avoid colon cancer is likely small, and the number of tests being run to find them is large, Ioannidis concludes that a large portion ("most") of published results are in fact false positives and thus meaningless. It's a pretty simple premise which leads to a pretty deep statement about how we go about learning about the world.
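The arithmetic behind that conclusion fits in a few lines. This is a back-of-the-envelope sketch with assumed numbers (the prior, power, and alpha here are illustrative, not Ioannidis's exact figures): when only a small fraction of tested hypotheses are actually true, even a well-powered test at a conventional significance level yields mostly false positives among its "significant" findings.

```python
# Back-of-the-envelope version of the argument, with assumed numbers:
alpha = 0.05   # significance level (false positive rate under the null)
power = 0.80   # probability of detecting a true effect
prior = 0.01   # assumed fraction of tested hypotheses that are true

true_positives = power * prior           # truly effective things flagged
false_positives = alpha * (1 - prior)    # noise flagged as effective

# Positive predictive value: of all "significant" results, how many are real?
ppv = true_positives / (true_positives + false_positives)
print(round(ppv, 3))  # ~0.139
```

Under these assumptions, roughly 86% of significant findings are false positives, which is the sense in which "most" published results can be wrong without any single test being run incorrectly.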
So the solutions to this are, of course, pretty intuitive: don't trust small-sample studies; insist on retesting hypotheses; be skeptical of results in any field where a large number of researchers are pursuing solutions to the same problem. In short, demand robustness checks on everything, and make sure that what's being shown is not just an artifact of one specific data set. Good lessons that all applied researchers should have tattooed across their proverbial chests already, but nonetheless a nice thing to be reminded of.