In a technical paper quietly released earlier this year, IBM detailed what it calls the IBM Neural Computer, a reconfigurable parallel processing system designed to research and develop emerging AI algorithms and computational neuroscience. This week, the company published a preprint describing the first application demonstrated on the Neural Computer: a deep “neuroevolution” system that combines a hardware implementation of an Atari 2600, image preprocessing, and AI algorithms in an optimized pipeline. The coauthors report results competitive with state-of-the-art techniques, but perhaps more significantly, they claim that the system achieves a record training speed of 1.2 million image frames per second.
The Neural Computer represents something of a shot across the bow in the AI computational arms race. According to an analysis recently released by OpenAI, from 2012 to 2018, the amount of compute used in the largest AI training runs grew more than 300,000 times with a 3.5-month doubling time, far exceeding the pace of Moore’s law. In keeping with this trend, supercomputers like Intel’s forthcoming Aurora at the Department of Energy’s Argonne National Laboratory and AMD’s Frontier at Oak Ridge National Laboratory promise in excess of an exaflop (a quintillion floating-point computations per second) of computing performance.
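As a rough sanity check on those growth figures, the short sketch below works out how many doublings a 300,000-fold increase implies and how long that takes at a 3.5-month doubling time. The 24-month doubling period used for the Moore's law comparison is an assumption added here for illustration; the article does not state one.

```python
import math

growth_factor = 300_000   # reported growth in training compute
doubling_months = 3.5     # reported doubling time

# Number of doublings needed to reach the reported growth factor.
doublings = math.log2(growth_factor)

# Time those doublings take at the reported pace.
elapsed_months = doublings * doubling_months

# ASSUMPTION: Moore's law modeled as one doubling every 24 months.
moore_doubling_months = 24
moore_growth = 2 ** (elapsed_months / moore_doubling_months)

print(f"doublings: {doublings:.1f}")
print(f"elapsed: {elapsed_months:.0f} months ({elapsed_months / 12:.1f} years)")
print(f"growth over the same period at a 24-month doubling: {moore_growth:.1f}x")
```

Under these assumptions, roughly 18 doublings fit into a little over five years, while a two-year doubling period would yield only a single-digit multiple over the same span.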
Video games are a well-established platform for AI and machine learning research. They’ve gained currency not only because of their availability and the low cost of running them at scale, but because in certain domains like reinforcement learning, where AI learns optimal behaviors by interacting with the environment in pursuit of rewards, game scores serve as direct rewards. AI algorithms developed within games have proven adaptable to more practical uses, like protein folding prediction. And if the results from IBM’s Neural Computer prove to be repeatable, the system could be used to accelerate these AI algorithms’ development.
The Neural Computer
IBM’s Neural Computer consists of 432 nodes (27 nodes across each of 16 modular cards) based on field-programmable gate arrays (FPGAs) from Xilinx, a longtime strategic collaborator of IBM’s. (FPGAs are integrated circuits designed to be configured after manufacturing.) Each node comprises a Xilinx Zynq system-on-chip, which pairs a dual-core ARM A9 processor with an FPGA on the same die, along with 1GB of dedicated RAM. The nodes are arranged in a 3D mesh topology, interconnected vertically with electrical connections called through-silicon vias that pass completely through silicon wafers or dies.
On the networking side, the FPGAs provide access to the physical communication links among cards in order to establish multiple distinct channels of communication. A single card can theoretically support transfer speeds up to 432GB per second, but the Neural Computer’s network interfaces can be adjusted and progressively optimized to best suit a given application.
“The availability of FPGA resources on every node allows application-specific processor offload, a feature that is not available on any parallel machine of this scale that we’re aware of,” wrote the coauthors of a paper detailing the Neural Computer’s architecture. “[M]ost of the performance-critical steps [are] offloaded and optimized on the FPGA, with the ARM [processor] … providing auxiliary support.”
Playing Atari games with AI
The researchers used 26 out of 27 nodes per card within the Neural Computer, carrying out experiments on a total of 416 nodes. Two instances of their Atari game-playing application ran on each of the 416 FPGAs, scaling up to 832 instances running in parallel. Each instance extracted frames from a given Atari 2600 game, performed image preprocessing, ran the images through machine learning models, and performed an action within the game.
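The node and instance counts reported for the machine follow directly from the per-card figures. The snippet below simply reproduces that arithmetic; the variable names are illustrative and not taken from IBM's paper.

```python
CARDS = 16                 # modular cards in the machine
NODES_PER_CARD = 27        # nodes on each card
NODES_USED_PER_CARD = 26   # nodes per card used in the experiments
INSTANCES_PER_NODE = 2     # game-playing instances per FPGA

total_nodes = CARDS * NODES_PER_CARD
nodes_used = CARDS * NODES_USED_PER_CARD
parallel_instances = nodes_used * INSTANCES_PER_NODE

print(total_nodes, nodes_used, parallel_instances)  # 432 416 832
```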
To obtain the highest performance, the team shied away from emulating the Atari 2600 in software, instead opting to use the FPGAs to implement the console’s functionality at higher frequencies. They tapped a framework from the open source MiSTer project, which aims to recreate consoles and arcade machines using modern hardware, and bumped the Atari 2600’s processor clock to 150 MHz, up from 3.58 MHz. This produced roughly 2,514 frames per second compared with the original 60 frames per second.
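The frame rate quoted above is consistent with scaling the console's native 60 frames per second by the clock ratio. The sketch below assumes that frame rate scales linearly with processor clock, which is an illustrative simplification rather than something the paper is quoted as stating.

```python
ORIGINAL_CLOCK_MHZ = 3.58   # stock Atari 2600 clock cited in the article
FPGA_CLOCK_MHZ = 150.0      # clock used in the FPGA implementation
ORIGINAL_FPS = 60.0         # stock frame rate

# ASSUMPTION: frames per second scale linearly with the clock frequency.
speedup = FPGA_CLOCK_MHZ / ORIGINAL_CLOCK_MHZ
scaled_fps = ORIGINAL_FPS * speedup

print(f"{speedup:.1f}x clock speedup, about {scaled_fps:.0f} frames per second")
```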
In the image preprocessing step, IBM’s application converted the frames from color to grayscale, eliminated flickering, rescaled images to a smaller resolution, and stacked the frames into groups of four. It then passed these on to an AI model that reasoned about the game environment and a submodule that selected the action for the next frames by identifying the maximum reward as predicted by the AI model.
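To make those steps concrete, here is a minimal pure-Python sketch of such a pipeline. It is not IBM's FPGA implementation. The luma weights, the per-pixel maximum used to suppress flicker, the block-averaging downscale, and the names `FramePipeline` and `select_action` are all assumptions chosen for illustration; only the four-frame stack and the pick-the-maximum action rule come from the description above.

```python
from collections import deque

STACK_SIZE = 4  # frames per stack, as described in the article


def to_grayscale(frame):
    """Convert rows of (r, g, b) pixels to luma values (assumed weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]


def remove_flicker(previous, current):
    """Assumed technique: per-pixel maximum of two consecutive frames."""
    return [[max(p, c) for p, c in zip(prow, crow)]
            for prow, crow in zip(previous, current)]


def downscale(frame, factor):
    """Shrink a grayscale frame by averaging factor x factor blocks."""
    height, width = len(frame), len(frame[0])
    out = []
    for y in range(0, height - factor + 1, factor):
        row = []
        for x in range(0, width - factor + 1, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def select_action(predicted_rewards):
    """Return the index of the action with the highest predicted reward."""
    return max(range(len(predicted_rewards)),
               key=lambda a: predicted_rewards[a])


class FramePipeline:
    """Hypothetical helper that chains the preprocessing steps."""

    def __init__(self, factor=2):
        self.factor = factor
        self.previous = None
        self.stack = deque(maxlen=STACK_SIZE)

    def push(self, rgb_frame):
        """Process one frame; return a full stack once four are available."""
        gray = to_grayscale(rgb_frame)
        merged = gray if self.previous is None else remove_flicker(self.previous, gray)
        self.previous = gray
        self.stack.append(downscale(merged, self.factor))
        return list(self.stack) if len(self.stack) == STACK_SIZE else None


# Usage: alternate white and black 4x4 frames to mimic a flickering sprite.
pipeline = FramePipeline(factor=2)
white = [[(255, 255, 255)] * 4 for _ in range(4)]
black = [[(0, 0, 0)] * 4 for _ in range(4)]
stacked = None
for frame in (white, black, white, black):
    stacked = pipeline.push(frame)

action = select_action([0.1, 0.7, 0.3])  # hypothetical model outputs
```

In this toy run the flickering input yields a stack of four steady 2x2 frames, and the action with the largest predicted reward (index 1) is chosen.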
Yet another algorithm, a genetic algorithm, ran on an external computer connected to the Neural Computer via a PCIe connection. It evaluated the performance of each instance and identified the top performers of the bunch, which it selected as “parents” of the next generation of instances.
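The selection step can be sketched in a few lines of Python. This is a generic illustration of parent selection, not the coauthors' algorithm. The Gaussian mutation, the copy of the best individual carried over unchanged, the population sizes, and the `toy_score` stand-in for a game score are all assumptions.

```python
import random


def next_generation(population, scores, num_parents, mutation_std, rng):
    """Pick the top-scoring individuals as parents and breed a new population."""
    ranked = sorted(zip(scores, range(len(population))), reverse=True)
    parents = [population[i] for _, i in ranked[:num_parents]]

    # ASSUMPTION: keep the single best individual unchanged (elitism).
    children = [list(parents[0])]

    # ASSUMPTION: each child is a parent plus Gaussian noise, with no crossover.
    while len(children) < len(population):
        parent = rng.choice(parents)
        children.append([w + rng.gauss(0.0, mutation_std) for w in parent])
    return parents, children


def toy_score(weights):
    """Hypothetical stand-in for a game score: higher when weights are near 0.5."""
    return -sum((w - 0.5) ** 2 for w in weights)


rng = random.Random(0)
population = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(16)]
initial_best = max(toy_score(ind) for ind in population)

for _ in range(20):
    scores = [toy_score(ind) for ind in population]
    parents, population = next_generation(population, scores, 4, 0.1, rng)

final_best = max(toy_score(ind) for ind in population)
print(f"best score: {initial_best:.3f} -> {final_best:.3f}")
```

Because the best individual is carried over unchanged in this sketch, the best score never decreases from one generation to the next. In the system described above, the scores would come from the game-playing instances rather than a toy function.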
Over the course of five experiments, IBM researchers ran 59 Atari 2600 games on the Neural Computer. The results imply that the approach wasn’t data-efficient compared with other reinforcement learning techniques: it required 6 billion game frames in total and failed at challenging exploration games like Montezuma’s Revenge and Pitfall. But it managed to outperform a popular baseline, the Deep Q-network architecture pioneered by DeepMind, in 30 out of 59 games after 6 minutes of training (200 million training frames) versus the Deep Q-network’s 10 days of training. With 6 billion training frames, it surpassed the Deep Q-network in 36 games while taking 2 orders of magnitude less training time (2 hours and 30 minutes).
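The "2 orders of magnitude" figure can be checked from the training times quoted above. The snippet below only restates the article's numbers and makes no further assumptions.

```python
import math

dqn_hours = 10 * 24      # Deep Q-network baseline: 10 days of training
long_run_hours = 2.5     # 6-billion-frame run: 2 hours and 30 minutes
short_run_minutes = 6    # 200-million-frame run: 6 minutes

long_run_ratio = dqn_hours / long_run_hours
short_run_ratio = (dqn_hours * 60) / short_run_minutes

print(f"long run: {long_run_ratio:.0f}x faster "
      f"({math.log10(long_run_ratio):.2f} orders of magnitude)")
print(f"short run: {short_run_ratio:.0f}x faster "
      f"({math.log10(short_run_ratio):.2f} orders of magnitude)")
```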