Do the math

Technology Review’s arXiv blog has an article, “How to find bugs in giant software programs.” It’s an overview of a paper on arXiv, a statistical study of program sizes and bug distributions in the Eclipse dataset of Java programs. TR says,
Now this is something I was very happy to believe, since it accords with my own instincts on the matter. Big programs are surely more complex internally and thus easier to make mistakes in, after all. So I sat down to write a post about how things should be kept small, big systems are dangerous and error-prone, and so forth, with dire implications for the design of the enormously complex nanosystems of the coming century. But for some reason I decided to go to the paper and see just how much more error-prone those big modules are. Here’s what the paper says:
And sure enough, that’s what the graph shows: sort the modules from largest to smallest, and the number of errors per program follows the Weibull distribution quite nicely. If each program had the same number of bugs, the plot would go straight from (0,0) to (1,1). But that would be very surprising, since there are more lines of code in large programs, and so even at a flat error density there’d be more bugs in them. The curve pretty much has to rise above a straight line; the question is, by how much? Marchesi et al. confirm that program size in the systems they study follows a lognormal distribution, so I used the excellent open-source R statistics language to model such a system assuming a uniform probability of errors per line of code:
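For readers who want to try the same kind of model, here is a minimal sketch in Python (not the R code used for this post, which is not reproduced here). It assumes lognormal module sizes and an identical, independent chance of a bug on every line. The function names and the parameter values (`mu`, `sigma`, `bug_rate`, the module count) are illustrative assumptions, not figures taken from the paper.

```python
import random


def simulate_modules(n_modules=1000, mu=5.0, sigma=1.0, bug_rate=0.01, seed=0):
    """Return (size, bugs) pairs for hypothetical modules.

    Sizes are drawn from a lognormal distribution; every line has the same
    independent probability `bug_rate` of containing a bug (flat error density).
    All parameter values are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    modules = []
    for _ in range(n_modules):
        size = max(1, round(rng.lognormvariate(mu, sigma)))
        # One Bernoulli draw per line of code.
        bugs = sum(1 for _ in range(size) if rng.random() < bug_rate)
        modules.append((size, bugs))
    return modules


def cumulative_bug_curve(modules):
    """Sort modules largest first and return (fraction of modules, fraction of bugs) points."""
    ordered = sorted(modules, key=lambda m: m[0], reverse=True)
    total_bugs = sum(bugs for _, bugs in ordered)
    if total_bugs == 0:
        raise ValueError("no bugs were generated; raise bug_rate or n_modules")
    points = [(0.0, 0.0)]
    running = 0
    for i, (_, bugs) in enumerate(ordered, start=1):
        running += bugs
        points.append((i / len(ordered), running / total_bugs))
    return points


modules = simulate_modules()
curve = cumulative_bug_curve(modules)

# Share of all bugs found in the largest fifth of the modules.
top_fifth = curve[len(modules) // 5][1]
# Overall error density, which should sit close to bug_rate by construction.
density = sum(b for _, b in modules) / sum(s for s, _ in modules)

print(f"largest 20% of modules hold {top_fifth:.0%} of the bugs")
print(f"overall bugs per line: {density:.4f}")
```

Plotting `curve` gives the model's version of the graph: it starts at (0,0), ends at (1,1), and bows above the diagonal purely because big modules have more lines, even though the error density is flat.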
Red dots are the Weibull distribution, corresponding to the purple curve above, and black dots are the model. As far as I can see in the paper, Marchesi et al. do not say anything about error density; they just point out that bigger programs have more bugs, and don’t try to quantify the relationship. The closest thing they say is
That would be true if one big program were easier to check than a bunch of small ones with the same total lines of code, e.g. if there were a per-program overhead. But the numbers show errors per line of code as essentially flat. Which means that bugs go up linearly with total system complexity, regardless of inner structure. Which is good news for complex nanosystems.

5 comments to Do the math
Copyright © 2009 the Foresight Institute - All Rights Reserved
I’ve programmed for 40 years. This study does not surprise me. Modern software practice allows complex systems to be assembled from many independent small parts that are individually tested. This study was based on systems built within the Java framework, which has decent ways to manage complexity.
In the old days, the parts tended to have too many dependencies on each other, especially dependencies on memory state that was scattered across numerous places and addressed directly. Older software systems probably did have an increase in bugs per line as complexity increased. I’d bet on it.
Don Gilmore
Complex nanosystems will involve complex software. For the nanosystems to work properly, the software must work. This will be critical for medical nanosystems, for example.
What does this have to do with nanotechnology?
Interesting! And to me, counterintuitive. But, there it is. One wonders how this is achieved, as programs grow more complex. Looking forward to more info on this over time.