
Computing in space presents unique challenges from both environmental hazards and resource constraints. I’m hoping that many of you were able to attend the 2024 NonStop TBC in Monterey, California, and hear the keynote from Dr. Mark Fernandez, who heads up HPE’s ‘computing in space’ program. HPE, and Mark specifically, is working with NASA to identify and solve space computing issues. HPE has put a few systems on the International Space Station and has achieved some excellent results.
Let’s consider some of the environmental issues that must be dealt with. Space computers are exposed to high levels of cosmic radiation, which can cause significant errors in data processing. High-energy particles can flip memory bits, leading to data corruption or system failures. This necessitates the development of radiation-hardened (rad-hard) systems that can withstand these conditions. Also, the microgravity environment of space complicates standard cooling methods used on Earth: without gravity, warm air doesn’t rise, so the natural convection that passive heat sinks rely on simply doesn’t occur, and airflow must be forced everywhere. Spacecraft must endure extreme temperature fluctuations, which can affect the performance and reliability of onboard computing systems. Computers must also be designed to withstand severe shock and vibration during launch. These physical stresses can damage sensitive electronic components.
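To make the bit-flip problem concrete, here is a minimal sketch of triple modular redundancy (TMR), one classic mitigation: keep three copies of each value and take a majority vote on every read, so a single radiation-induced flip is outvoted. This is purely illustrative, not how HPE’s Spaceborne systems are actually hardened.

```python
# A minimal sketch of triple modular redundancy (TMR): store three
# copies of every value and take a bitwise majority vote on each read,
# so one radiation-induced bit flip cannot change the result.
# Illustrative only; real rad-hard designs do this in hardware.

def tmr_write(value: int) -> list[int]:
    """Store three independent copies of a value."""
    return [value, value, value]

def tmr_read(copies: list[int]) -> int:
    """Bitwise majority vote: a bit is set in the result only if it is
    set in at least two of the three copies."""
    a, b, c = copies
    return (a & b) | (a & c) | (b & c)

# Simulate a cosmic ray flipping one bit in one copy.
copies = tmr_write(0b1010_1100)
copies[1] ^= 0b0001_0000                # single-event upset in copy 1
assert tmr_read(copies) == 0b1010_1100  # the vote masks the flip
```

Real systems more often use error-correcting (ECC) memory, which gives similar protection at far less overhead, but the voting idea is the same.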
The resource issues are just as daunting. Space missions impose strict limits on power consumption and cooling. The absence of conventional cooling systems means that designs must be highly energy-efficient, which can limit computational capacity. Spacecraft also have stringent size and weight constraints, making it challenging to incorporate advanced computing hardware without exceeding those limits; this caps the complexity of the systems that can be deployed in space. Finally, the bandwidth available for communication between spacecraft and Earth is limited, complicating data transfer and real-time decision-making. As a result, onboard computers must handle more data processing autonomously rather than relying on ground control.
In his talk, Mark demonstrated the requirements and the success of edge computing. With a system on the space station, many of the analyses that previously had to be sent down to ground control can be performed right there on the station. For example, whenever a spacewalk is performed, the space suit must be examined closely for wear; any slight defect could prove fatal to the astronaut. Before the ProLiant DL360 was on the space station, detailed photos of the space suit had to be transmitted to the ground station and evaluated, which took 24-48 hours. With the DL360 processing the images locally, the job is done in minutes, quite a savings.
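The arithmetic behind that savings is easy to sketch. The figures below (photo count, image size, downlink rate, onboard analysis time) are illustrative assumptions of mine, not actual ISS mission parameters, but they show why the downlink, not the computation, is the bottleneck.

```python
# Back-of-envelope comparison of downlinking suit photos vs. analyzing
# them on board. All numbers are illustrative assumptions, not real
# ISS figures.

NUM_PHOTOS = 400             # assumed photos per suit inspection
PHOTO_MB = 25                # assumed size of one high-res image, MB
DOWNLINK_MBPS = 5            # assumed share of the downlink, megabits/s
LOCAL_SECS_PER_PHOTO = 0.5   # assumed onboard analysis time per image

# MB -> megabits (x8), divided by link rate, converted to hours.
downlink_hours = (NUM_PHOTOS * PHOTO_MB * 8) / DOWNLINK_MBPS / 3600
local_minutes = NUM_PHOTOS * LOCAL_SECS_PER_PHOTO / 60

print(f"Transmit to ground: ~{downlink_hours:.1f} hours of link time, "
      f"before queueing and human review (historically 24-48 hours)")
print(f"Process on board:   ~{local_minutes:.1f} minutes")
```

Even with generous assumptions about the link, the transmit path loses badly once scheduling of the shared downlink and ground-side review are added on top of the raw transfer time.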
At NonStop TBC 2024, many asked if we were booting NonStop on the space station. At present, we cannot, due to the lack of a fabric and a second node: there is a single DL360 and a single Edgeline 8000, and to boot NonStop we need a second system and a connecting fabric. Keith Moore and I were working with development to see if we could find a way around this, but NonStop is very strict in that regard (as we’d expect). We hope to boot a virtual NonStop on some future mission, when we might have a dual-processor Edgeline with an internal fabric. We’ll see.
NonStop seems like a perfect operating system for space. Repair time for a down system could be weeks, months, or even years. Standard fault tolerance on most platforms requires a full backup system. NonStop’s unique design is really an N+1 arrangement in which the spare components are all built into the existing system, so a NonStop would have far fewer components and take up much less of the precious space and weight budget on a spacecraft. It would also require less power and cooling than a traditional clustered solution. Hopefully, we will see NonStop in space before too long.
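A rough way to see the component-count argument is to compare a 2N design (a full standby system) with an N+1 design (one built-in spare per component type). The part counts below are illustrative assumptions, not a NonStop bill of materials.

```python
# Rough comparison of 2N (duplicate everything) vs. N+1 (one built-in
# spare per component type). Counts are illustrative assumptions only.

components = {"cpu": 4, "power_supply": 2, "fan": 6, "disk": 4}

two_n = {name: 2 * n for name, n in components.items()}
n_plus_1 = {name: n + 1 for name, n in components.items()}

print(f"{'part':<14}{'2N':>4}{'N+1':>6}")
for name in components:
    print(f"{name:<14}{two_n[name]:>4}{n_plus_1[name]:>6}")
print(f"{'total':<14}{sum(two_n.values()):>4}{sum(n_plus_1.values()):>6}")
```

With these assumed counts, the N+1 box carries 20 parts where a duplicated cluster carries 32, and on a spacecraft every extra part is launch mass, power draw, and one more thing that can fail.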