It’s really amazing that Armstrong and Aldrin actually landed on the Moon. The amazing part isn’t that they survived the trip in the huge rocket, or the rigors of space travel: the radiation, the vacuum, the meteors.
It was the software.
Don Eyles, one of the programmers of the code that ran in the Lunar Module computer, has a remarkable story of the bugs in the code and why the mission managed to succeed in spite of them. It’s just about as heroic a tale as the much better-known physical part. The really crazy part of it is that the internal communications and cross-documentation inside NASA seem to have been designed by Dilbert:
We heard “Engine on”… several seconds passed… “Engine off”.
Soon we understood what had happened. A small piece of code in SERVICER called the “delta-V monitor” had concluded that the engine had failed and sent an engine-off command. But why? To give the engine time to come up to thrust, the delta-V monitor always waited some period of time after engine-on before it began to monitor the engine. But this time, at the end of the grace period the engine was still not producing enough thrust to satisfy the monitor’s thrust criterion.
Published accounts have attributed the slow DPS thrust buildup to the fact that the LM’s tanks were only partially pressurized. The author’s investigations show that the problem was elsewhere. … To prevent the possible, premature entry of hypergolic propellant into the engine (which could have had explosive consequences) the decision was made, shortly before flight, to delay arming the engine until the time of ignition.
The engine was slow to start not because the tanks were less pressurized, but because the propellant had further to travel to reach the engine. It would have been easy for us to adjust the parameter that controlled how long the delta-V monitor waited before testing the engine — but nobody told us.
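To make the failure concrete, here is a minimal Python sketch of a grace-period monitor of the kind Eyles describes. Everything in it is an assumption for illustration: the names `DeltaVMonitor`, `thrust_fraction`, and `run_ignition`, the linear thrust ramp, and all the numbers. It is not the SERVICER code, only the shape of the logic.

```python
from dataclasses import dataclass


@dataclass
class DeltaVMonitor:
    """Hypothetical stand-in for the delta-V monitor (not the flight code)."""
    grace_period_s: float   # how long to wait after engine-on before judging
    min_thrust: float       # thrust criterion, as a fraction of full thrust

    def check(self, t_since_ignition: float, thrust: float) -> str:
        if t_since_ignition < self.grace_period_s:
            return "waiting"            # still inside the grace period
        return "ok" if thrust >= self.min_thrust else "engine_off"


def thrust_fraction(t: float, fill_delay_s: float, ramp_s: float = 1.0) -> float:
    """Assumed thrust profile: nothing until propellant arrives, then a linear ramp."""
    if t <= fill_delay_s:
        return 0.0
    return min(1.0, (t - fill_delay_s) / ramp_s)


def run_ignition(monitor: DeltaVMonitor, fill_delay_s: float,
                 duration_s: float = 10.0, dt: float = 0.5):
    """Step through a burn; return ("engine_off", time) or ("burning", None)."""
    for i in range(int(duration_s / dt) + 1):
        t = i * dt
        if monitor.check(t, thrust_fraction(t, fill_delay_s)) == "engine_off":
            return ("engine_off", t)
    return ("burning", None)


if __name__ == "__main__":
    monitor = DeltaVMonitor(grace_period_s=2.0, min_thrust=0.5)
    print(run_ignition(monitor, fill_delay_s=1.0))   # propellant arrives early
    print(run_ignition(monitor, fill_delay_s=2.0))   # propellant has further to travel
    patient = DeltaVMonitor(grace_period_s=3.5, min_thrust=0.5)
    print(run_ignition(patient, fill_delay_s=2.0))   # same engine, longer grace period
```

In this toy, the only difference between a shutdown and a normal burn is one parameter, `grace_period_s`. That is the adjustment Eyles says would have been easy to make, had anyone told the programmers the arming procedure had changed.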
The physical computer was quite an amazing gadget:
Informally, the programs were called “ropes” because of the durable form of read-only memory into which they were transformed for flight, which resembled a rope of woven copper wire. For the lunar missions, 36K words of “fixed” (read-only) memory, each word consisting of 15 bits plus a parity bit, were available for the program. In addition there were 2K words of artfully timeshared “erasable” or RAM memory. …
The AGC was packaged in a sturdy, sealed, aluminum-magnesium box, anodized in a gold color, that measured about six inches, by one foot, by two feet, weighed 70 pounds and consumed about 55 watts. Its logic was made up of 5600 3-input NOR gates packaged two-each in flat-pack integrated circuits. Eldon Hall, the machine’s principal designer, has related the bold decision to use integrated circuit technology for this computer despite its immaturity in the early 1960’s.
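For a feel of the word format in the quote, here is a small Python sketch of a 15-bit word carrying one parity bit, plus the arithmetic on the quoted sizes. Two things are my assumptions rather than anything stated above: that the parity is odd, and that the parity bit sits in the lowest position of the stored word. The function names are made up for the example.

```python
DATA_BITS = 15                      # from the quote: 15 bits plus a parity bit
FIXED_WORDS = 36 * 1024             # "36K words" of rope memory
ERASABLE_WORDS = 2 * 1024           # "2K words" of erasable memory
FIXED_BITS = FIXED_WORDS * (DATA_BITS + 1)

NOR_GATES = 5600                    # from the quote
IC_PACKAGES = NOR_GATES // 2        # gates were packaged two to a flat-pack


def with_parity(word15: int) -> int:
    """Append a parity bit so the 16-bit result has an odd number of ones.

    Odd parity and the low-bit placement are assumptions for this sketch.
    """
    if not 0 <= word15 < (1 << DATA_BITS):
        raise ValueError("expected a 15-bit value")
    ones = bin(word15).count("1")
    parity = 0 if ones % 2 == 1 else 1
    return (word15 << 1) | parity


def parity_ok(word16: int) -> bool:
    """True if the stored word still has an odd number of one bits."""
    return bin(word16).count("1") % 2 == 1


if __name__ == "__main__":
    stored = with_parity(12345)
    print(parity_ok(stored))            # intact word
    print(parity_ok(stored ^ 0b100))    # same word with one bit flipped
    print(FIXED_BITS, IC_PACKAGES)
```

A single flipped bit always changes the count of ones by one, so it always fails the check; two flipped bits cancel out and slip through, which is the usual limit of a one-bit parity scheme.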
Two major bugs struck as the LM was approaching the Moon:
Then we heard the words “program alarm”. In Cambridge we looked at each other. Onboard, Aldrin saw the PROG light go on and the display switch back to Verb 06 Noun 63. He quickly keyed in Verb 90 Noun 50. Alarm code 1202 appeared on the DSKY. This was an alarm issued when the computer was overloaded — when it had more work to do than it had time for. In Cambridge the word went around, “Executive alarm, no core sets”. Then Armstrong said, with an edge, “Give us a reading on the 1202 program alarm”.
… At MIT, where we realized that something mysterious was draining time from the computer, we were barely breathing.
The heroic part of the system design (among other things) was the way it was built to withstand, with “graceful degradation”, a bug that caused a faulty radar attitude control routine to eat up 15% of the processor time when the design had only a 10% margin.
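The arithmetic of that overload is easy to play with. Below is a toy Python scheduler, a sketch under loud assumptions: each cycle offers 100 units of processor time, the regular work needs 90 of them (the 10% margin), every pending job holds one of a fixed pool of "core sets", and a restart simply throws away the backlog and carries on with fresh work. The function name `run`, the pool size of 7, and the restart policy are all mine, chosen for illustration; the real Executive was considerably subtler.

```python
def run(cycles: int, stolen_pct: int, job_pct: int = 90, core_sets: int = 7):
    """Simulate `cycles` guidance cycles; return (alarms, completed_jobs).

    stolen_pct: processor time lost each cycle to the misbehaving source.
    job_pct:    processor time one cycle's regular job needs.
    core_sets:  how many unfinished jobs can be held at once (assumed value).
    """
    queue = []        # remaining work of each unfinished job, oldest first
    alarms = 0
    completed = 0
    for _ in range(cycles):
        if len(queue) == core_sets:
            # No core set left for the new job: raise the alarm and restart,
            # dropping the stale backlog instead of grinding to a halt.
            alarms += 1
            queue.clear()
        queue.append(job_pct)

        budget = 100 - stolen_pct     # time actually available this cycle
        while queue and budget > 0:
            used = min(budget, queue[0])
            queue[0] -= used
            budget -= used
            if queue[0] == 0:
                queue.pop(0)
                completed += 1
    return alarms, completed


if __name__ == "__main__":
    print(run(200, stolen_pct=0))     # nominal load
    print(run(200, stolen_pct=5))     # inside the 10% margin
    print(run(200, stolen_pct=15))    # 15% stolen against a 10% margin
```

With nothing stolen, or with a drain that fits inside the margin, the queue never grows. With 15% stolen, each cycle falls 5 units further behind until the pool is exhausted, the alarm fires, and the backlog is dumped. The point of the toy is the last step: most of the work still gets done, because the system sheds load rather than locking up.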
The other bug was just as strange, and just as Dilbert-like:
It was only after Apollo 12 that we began to understand the other serious problem.
It started when Clint Tillman of Grumman Aerospace (the builder of the Lunar Module) noticed throttle oscillations during simulations of the final descent, on the order of 5% of the DPS thrust. This prompted Tillman to examine telemetry data from Apollo 11 and 12, where he noticed throttle oscillations during the final landing phases that were on the order of 25% peak to peak. (See Figure 12.) This was the period when the Commander was simultaneously using the ROD switch to control altitude-rate and the joystick to maneuver the vehicle. Because plots of this data resembled the battlements and turrets of a castle (or a castellated nut) this problem got to be known as “throttle castellation”.
The accelerometers in the IMU did not really measure acceleration; they merely counted velocity increments since the last reading. Because a throttle change commanded on the previous guidance pass occurred at some time between the accelerometer readings, the measured delta-V did not show the full effect of the most recent adjustment.
Throttle control had to compensate for this effect. The amount of compensation depended on when during the guidance period throttle commands were issued, and it also depended upon the rapidity with which the engine followed the throttle command. The applicable ICD stated that the throttle time lag was 0.3 seconds.
It fell to the author to program and test the throttle-control routine. … When I compensated for 0.1 second I saw that the oscillation was reduced. When I compensated for 0.2 seconds the oscillation appeared to be virtually eliminated. There the matter rested. Klumpp remembers me saying, “It’s just like medicine, don’t give it more compensation than it needs”.
Examining my own motives, I believe that the annoyance I felt toward the compensation terms for cluttering up my throttle logic may have translated into a desire to compensate no more than necessary. Be that as it may, both Apollo 11 and Apollo 12 flew with 0.2 seconds of compensation for a 0.3 second throttle delay.
But … the performance of the descent engine had been improved, but the ICD was not modified accordingly. The actual time lag for the descent engine was about 0.075 seconds. It turned out we had overcompensated. As a result the throttle was barely stable.
[I]f the software had compensated at 0.3 seconds on Apollo 11, the throttle would have been unstable. The throttle oscillations, instead of settling down, would have become greater. Following throttle-down in P63, or perhaps in P66 under the excitation of IMU bob, the DPS engine would have rapidly oscillated between minimum and maximum thrust. …
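The measurement effect behind all this can be sketched in a few lines of Python. This is a toy, not the LM's guidance equations: I assume a 2-second guidance period, a throttle change that takes effect as a clean step `lag` seconds into the period, an engine that delivers the commanded level plus a fixed unknown bias, and a controller that simply adds the perceived shortfall to the command. All function names are hypothetical.

```python
def measured_accel(old, new, change_time, period):
    """Average acceleration over one period, as a velocity-counting
    accelerometer would report it, when the level steps from `old`
    to `new` at `change_time` seconds into the period."""
    return (old * change_time + new * (period - change_time)) / period


def estimate_current_accel(measured, old_cmd, new_cmd, assumed_lag, period):
    """Compensate the measured average for the part of the period that
    was (supposedly) still spent at the old throttle setting."""
    return measured + (assumed_lag / period) * (new_cmd - old_cmd)


def simulate(assumed_lag, actual_lag, period=2.0, steps=6, target=1.0, bias=0.2):
    """Return the true thrust error at the start of each guidance pass."""
    prev_cmd = cmd = 0.5
    errors = []
    for _ in range(steps):
        errors.append(cmd + bias - target)
        measured = measured_accel(prev_cmd + bias, cmd + bias, actual_lag, period)
        estimate = estimate_current_accel(measured, prev_cmd, cmd, assumed_lag, period)
        # Naive controller: add whatever the estimate says is still missing.
        prev_cmd, cmd = cmd, cmd + (target - estimate)
    return errors


if __name__ == "__main__":
    print(simulate(assumed_lag=0.075, actual_lag=0.075))  # compensation matches the engine
    print(simulate(assumed_lag=0.2, actual_lag=0.075))    # as flown: 0.2 s against 0.075 s
    print(simulate(assumed_lag=0.3, actual_lag=0.075))    # what the ICD called for
```

When the assumed lag matches the real one, the toy loop lands on target after one correction. When the software compensates for more lag than the engine has, every throttle change is over-credited, the next command backs off too far, and the error rings back and forth with alternating sign. In this simplified loop the ringing still dies out for all three cases, so it does not reproduce the real stability margins in Eyles' account, which depended on the full guidance dynamics. It only shows the direction of the effect: more overcompensation, more ringing.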
By the way, the code for the guidance computers is now freely available, along with open-source emulators for the computers. When I clicked on the link to the code, I got this:
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
What does this have to do with AI? It’s the formalist float I talked about in this post and discussed in this Wired interview: attempts to break real-world situations down into formalized procedures usually mess up. The Dilbertization of NASA, more painfully obvious in later years, was hardly limited to NASA itself. (Note, btw, a particularly successful NASA program where it was at a minimum.)
The reason AI is so hard to do is that any program designed with standard principles is essentially pre-Dilbertized. Inside a robot’s head is nothing but a vast labyrinth of pointy-headed bosses. But the field of AI is in fact slowly learning the techniques to produce robust, adaptive programs — beginning with the graceful degradation in the LGC, and really flowering with today’s robotics, and ultimately to be realized in full-fledged, unbounded-learning autogenous systems.
Once we begin to understand the techniques of making formal systems robust, space (and the rest of society) will be affected at both the lowest and highest levels. We’ll have reliable, highly capable robots to do a lot of the laborious, dangerous, dirty jobs. But much more importantly, we’ll replace our management systems, all the way up to the government, with ones that actually work.