Location: Hilton Salon D
Date: Sunday November 15, 2015
Time: 2:00pm - 5:30pm
Baidu Research Silicon Valley Artificial Intelligence Laboratory
During the past few years, deep learning has made incredible progress towards solving many previously difficult Artificial Intelligence (AI) tasks. Although the techniques behind deep learning have been studied for decades, they rely on large datasets and large computational resources, and so have only recently become practical for many problems. Training deep neural networks is very computationally intensive: training one model takes tens of exaflops of work, and so HPC techniques are key to creating these models. As in other fields, progress in AI is iterative, building on previous ideas. This means that the turnaround time in training models is a key bottleneck to progress in AI: the quicker an idea can be realized as a trainable model, trained on a large dataset, and tested, the quicker ways of improving the models can be found. In this talk, Catanzaro will discuss the key insights that make deep learning work for many problems, describe the training problem, and detail the use of standard HPC techniques that allow him to rapidly iterate on his models. He will explain how HPC ideas are becoming increasingly central to progress in AI and will also show several examples of how deep learning is helping solve difficult AI problems.
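To put the "tens of exaflops" figure in perspective, a rough back-of-the-envelope calculation (the throughput numbers below are illustrative assumptions, not figures from the talk) shows why turnaround time becomes the bottleneck without HPC-scale hardware:

    # Illustrative arithmetic only: how long does "tens of exaflops" of work take?
    total_work = 20e18                 # assume ~20 exaflops of floating-point work per model
    single_gpu = 5e12                  # assume ~5 Tflop/s sustained on one GPU
    cluster    = 32 * single_gpu       # 32 GPUs with idealized perfect scaling

    print(total_work / single_gpu / 86400)   # ~46 days on a single GPU
    print(total_work / cluster / 86400)      # ~1.4 days on the small cluster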
Janis Keuper and Franz-Josef Pfreundt
The implementation of a vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to provide good results, both in terms of convergence and accuracy. Recently, several parallelization approaches have been proposed in order to scale SGD to solve very large ML problems. At their core, most of these approaches follow a MapReduce scheme. This paper presents a novel parallel updating algorithm for SGD, which utilizes the asynchronous single-sided communication paradigm. Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent (ASGD) provides faster convergence, with linear scalability and stable accuracy.
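The flavor of the update scheme can be conveyed with a small shared-memory stand-in (a minimal sketch only: threads and a shared NumPy array take the place of the single-sided inter-node communication described in the paper, and grad_fn, data_shards, and the hyper-parameters are illustrative assumptions):

    import numpy as np
    from threading import Thread

    def asgd(grad_fn, data_shards, w0, lr=0.01, epochs=5):
        """Asynchronous SGD sketch: workers update a shared parameter vector
        without locks, so each gradient may be computed on slightly stale weights."""
        w = w0.copy()                      # shared parameter vector

        def worker(shard):
            nonlocal w                     # all workers update the same array in place
            for _ in range(epochs):
                for x, y in shard:
                    g = grad_fn(w, x, y)   # gradient on a possibly stale view of w
                    w -= lr * g            # lock-free, asynchronous in-place update

        threads = [Thread(target=worker, args=(s,)) for s in data_shards]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return w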
Markus Götz, Christian Bodenstein and Morris Riedel
Clustering algorithms in the field of data mining are used to aggregate similar objects into common groups. One of the best known of these algorithms is DBSCAN. Its distinct design enables the search for an a priori unknown number of arbitrarily shaped clusters while at the same time filtering out noise. Due to its sequential formulation, the parallelization of DBSCAN is challenging. In this paper we present a new parallel approach, which we call HPDBSCAN. It employs three major techniques in order to break the sequentiality, balance the workload, and speed up neighborhood searches in distributed parallel processing environments: i) a computation split heuristic for domain decomposition, ii) a data index preprocessing step, and iii) a rule-based cluster merging scheme. As a proof of concept we implemented HPDBSCAN as an OpenMP/MPI hybrid application. Using real-world data sets, such as a point cloud from the old town of Bremen, Germany, we demonstrate that our implementation is able to achieve a significant speed-up and scale-up in common HPC setups. Moreover, we compare our approach with previous attempts to parallelize DBSCAN, showing an order of magnitude improvement in terms of computation time and memory consumption.
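The rule-based merging scheme iii) can be pictured as a union-find pass over pairs of local cluster labels that neighboring processors both assign to points in their shared halo regions (a hypothetical sketch: the helper names and the keep-the-smaller-label rule are assumptions for illustration, not code from HPDBSCAN):

    def merge_local_clusters(halo_pairs):
        """halo_pairs: iterable of (label_a, label_b) pairs, where two neighboring
        blocks assigned the same halo point to clusters label_a and label_b."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]      # path halving
                x = parent[x]
            return x

        def union(a, b):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)  # rule: keep the smaller label

        for a, b in halo_pairs:
            union(a, b)
        return {label: find(label) for label in parent}   # local label -> merged label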
Brian Van Essen, Hyojin Kim, Roger Pearce, Kofi Boakye and Barry Chen
Recent successes of deep learning have been largely driven by the ability to train large models on vast amounts of data. We believe that High Performance Computing (HPC) will play an increasingly important role in helping deep learning achieve the next level of innovation, fueled by neural network models that are orders of magnitude larger and trained on commensurately more training data. We are targeting the unique capabilities of both current and upcoming HPC systems to train massive neural networks and are developing the Livermore Big Artificial Neural Network (LBANN) toolkit to exploit both model and data parallelism optimized for large-scale HPC resources. This paper presents our preliminary results in scaling the size of the models that can be trained with the LBANN toolkit.
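As a rough illustration of the data-parallel half of that design, one training step can be sketched with mpi4py: every rank computes a gradient on its own shard of the batch and the gradients are averaged with an allreduce (a sketch under stated assumptions; LBANN itself is not a Python toolkit, and grad_fn and get_local_batch are hypothetical helpers):

    import numpy as np
    from mpi4py import MPI

    def data_parallel_step(w, grad_fn, get_local_batch, lr=0.01):
        comm = MPI.COMM_WORLD
        x, y = get_local_batch(comm.Get_rank())              # each rank sees a different shard
        local_grad = grad_fn(w, x, y)
        global_grad = np.empty_like(local_grad)
        comm.Allreduce(local_grad, global_grad, op=MPI.SUM)  # sum gradients across ranks
        global_grad /= comm.Get_size()                       # average
        return w - lr * global_grad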
Steven Young, Derek Rose, Thomas Karnowski, Seung-Hwan Lim and Robert Patton
There has been a recent surge of success in utilizing Deep Learning (DL) in imaging and speech applications, owing to its relatively automatic feature generation and, particularly for convolutional neural networks (CNNs), its high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (as architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, Multi-node Evolutionary Neural Networks for Deep Learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms.
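The hyper-parameter search itself can be sketched as a small genetic algorithm (a toy sketch: the search space, the operators, and the train_and_evaluate helper, which would train a network with the given hyper-parameters and return validation accuracy, are illustrative assumptions; in a cluster setting each fitness evaluation would run on its own node):

    import random

    SEARCH_SPACE = {
        "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
        "num_filters":   [16, 32, 64, 128],
        "kernel_size":   [3, 5, 7],
    }

    def random_individual():
        return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

    def mutate(ind, rate=0.2):
        return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
                for k, v in ind.items()}

    def crossover(a, b):
        return {k: random.choice([a[k], b[k]]) for k in a}

    def evolve(train_and_evaluate, pop_size=8, generations=5):
        population = [random_individual() for _ in range(pop_size)]
        for _ in range(generations):
            scored = sorted(population, key=train_and_evaluate, reverse=True)
            parents = scored[: pop_size // 2]              # truncation selection
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(pop_size - len(parents))]
            population = parents + children
        return max(population, key=train_and_evaluate)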
Catherine Schuman, Adam Disney and John Reynolds
Dynamic Adaptive Neural Network Array (DANNA) is a neuromorphic hardware implementation. It differs from most other neuromorphic projects in that it allows for programmability of structure, and it is trained or designed using evolutionary optimization. This paper describes the DANNA structure, how DANNA is trained using evolutionary optimization, and an application of DANNA to a very simple classification task.