Mitigation of Scalability and Performance of Machine Learning with In-Network Computation

CSIRO - Information Security and Privacy

Next-generation ML/AI systems will face both legacy and emerging constraints, including but not limited to privacy, ethical concerns, performance, storage, scalability and security (vulnerabilities and adversarial capabilities), all of which must be addressed to build ML systems with desirable properties. The performance of ML systems can be affected by numerous factors, but one of them governs the overall feasibility of running AI algorithms in continually expanding domains: the rapidly increasing size of the datasets under scrutiny. This explosion of data raises numerous related issues, including data storage constraints, network bottlenecks for real-time data streaming, requirements for terabit/s data transfer rates for distributed ML, and input/output limitations in current x86 computer architectures.

Recently, the emergence of flexible networking hardware and expressive, high-level domain-specific programming languages has enabled deeply programmable networks. With this new generation of networks, algorithms can be developed and implemented directly in the switches without the need for costly and time-consuming hardware development. As a result, network elements can forward data streams at line rate (up to terabits/s) while simultaneously performing in-line computation on these streams.
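
To make the idea of in-line computation concrete, the following is a minimal software sketch in Python, not a P4 or SmartNIC program and not the project's implementation. It assumes a hypothetical packet format of (flow identifier, integer value) and shows a forwarding element that passes every packet through unchanged while updating per-flow counters, using only the additions and table lookups that a programmable switch pipeline could plausibly support.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical packet format for illustration: (flow_id, integer payload value).
Packet = Tuple[str, int]


class InlineAggregator:
    """Forwards packets unchanged while keeping per-flow running sums."""

    def __init__(self) -> None:
        # Per-flow state, analogous to register arrays in a switch pipeline.
        self.count: Dict[str, int] = defaultdict(int)
        self.total: Dict[str, int] = defaultdict(int)

    def process(self, packets: Iterable[Packet]) -> Iterator[Packet]:
        for flow_id, value in packets:
            # In-line computation: one counter update and one addition per packet.
            self.count[flow_id] += 1
            self.total[flow_id] += value
            # Forwarding: the packet leaves the element unmodified.
            yield (flow_id, value)

    def mean(self, flow_id: str) -> float:
        # The division is done off the data path, when the statistic is read out.
        return self.total[flow_id] / self.count[flow_id]


stream = [("a", 10), ("b", 4), ("a", 20)]
agg = InlineAggregator()
forwarded = list(agg.process(stream))
print(forwarded == stream, agg.mean("a"))
```

The stream is consumed once and no packet is buffered, which is the property that lets such computation keep pace with forwarding; real hardware adds constraints (limited memory, no floating point, fixed pipeline depth) that this sketch does not model.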

In this project, we aim to leverage advances in SmartNICs and programmable switches, such as Netronome NICs and Barefoot P4 switches, to mitigate the scalability issues that ML algorithms face at unprecedented data production rates. In particular, we aim to leverage Australia's unique position in space and climate observation, which will produce up to 200 TB of data per day, to develop and deploy innovative ML algorithms in an in-network computing framework.
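
For a sense of scale, the 200 TB per day figure can be converted into a sustained network rate. The short Python calculation below is a back-of-the-envelope sketch that assumes decimal terabytes (10^12 bytes) and perfectly even production over 24 hours; the helper name is ours, and real instruments are likely to be burstier, so peak rates would be higher than this average.

```python
TB = 10**12              # assumption: decimal terabyte, in bytes
SECONDS_PER_DAY = 86_400


def sustained_rate_gbps(bytes_per_day: float) -> float:
    """Average rate in Gbit/s needed to move `bytes_per_day` within one day."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 1e9


rate = sustained_rate_gbps(200 * TB)
print(f"{rate:.1f} Gbit/s")  # prints "18.5 Gbit/s"
```

Even as an average, this is roughly 18.5 Gbit/s of continuous traffic, which illustrates why processing the stream while it is in flight is attractive compared with storing everything first.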

Dr. Guillaume Jourjon

Dr. Craig Russell