Deep learning (DL) technologies are proliferating in many areas. DL will transform the entire landscape of the Internet of Things, from datacenter architecture and the software stack to development processes and business models. Preferred Networks, Inc. (PFN) was founded in 2014 with the objective of applying this emerging technology to applications of significant industrial importance. This whitepaper discusses the implications of DL in the industrial sector and demonstrates PFN's strong expertise and capabilities in this area.
DL is expected to revolutionize data analytics currently based on traditional statistical modeling or conventional machine learning techniques, thanks to two distinctive features. First, DL models can easily handle extremely high-dimensional data. In traditional statistical modeling, the number of independent (input) variables is relatively small, which forces data scientists to disregard many potentially significant but seemingly irrelevant input variables. One important example of high-dimensional data is time-series data, which is prevalent in sensor data from industrial devices. DL can capture the interactions among thousands or even millions of input variables and exploit every piece of information from the complex interactions that contribute to the output, resulting in significantly higher accuracy than conventional methods.
Second, DL is model-free, meaning that it does not assume a priori knowledge of the class of probability distributions, since any probability distribution can be approximated by a sufficiently complex neural network. This frees data scientists from making too many assumptions in advance (assumptions that might be incorrect or over-simplistic) and from exploring the enormous space of possible statistical models. These two characteristics enable DL to be applied to a very wide range of application areas and to scale to large volumes of data.
Although a number of research projects are actively being pursued around the world, as of today relatively few apply DL to real-world applications. We are particularly interested in industrial sectors such as automotive and industrial robotics, where an enormous amount of sensor data is generated but only traditional statistical modeling is commonly used. In the course of collaborating with industry leaders such as Toyota and Fanuc, we have become convinced that DL technologies can truly revolutionize data analytics in this domain, have built up experience and knowledge of how DL can be applied in a variety of settings, and have developed a number of innovative ideas. We briefly review them in three areas: recognition, prediction, and control.
DL can capture patterns arising from many input variables. This capability is best illustrated by image recognition, because each pixel can be treated as a separate input variable. In fact, DL has been extensively applied to image recognition since its inception; first it was used for classifying images (i.e., determining whether an image contains a certain type of object, such as an automobile or a dog), and later the outputs became more complex, as in image segmentation (i.e., determining to which object each input pixel belongs), bounding-box detection, and image synthesis (e.g., PaintsChainer, PFN's application for auto-coloring line drawings).
We have successfully applied our DL-based image recognition in many areas, such as autonomous driving, visual inspection in manufacturing processes, and picking objects from warehouse shelves (we were awarded second prize in the 2016 Amazon Picking Challenge; see Figure 1). The recognition algorithms can be applied to any high-dimensional data; for example, we obtained an order-of-magnitude higher accuracy in detecting breast cancer from very high-dimensional biomarker data.
The biggest technical challenge in DL is preparing high-quality training data sets of sufficient volume. PFN has extensive experience in image recognition and has built a suite of tools for preparing training data sets, including tools for data annotation and data augmentation as well as techniques such as semi-supervised learning, one-shot learning, active learning, unsupervised learning, and transfer learning.
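To make the data-augmentation idea concrete, here is a minimal sketch in plain Python (an illustration, not one of PFN's actual tools): an image, represented as a list of pixel rows, is expanded into new training samples by random cropping and horizontal flipping.

```python
import random

def hflip(img):
    """Horizontal flip: reverse each row of a 2-D image (list of lists)."""
    return [list(reversed(row)) for row in img]

def random_crop(img, size, rng=random):
    """Cut out a size x size patch at a random offset."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

def augment(img, size, rng=random):
    """One augmented sample: random crop, then flip with probability 0.5."""
    out = random_crop(img, size, rng)
    if rng.random() < 0.5:
        out = hflip(out)
    return out
```

In practice such transforms are applied on the fly during training, so each epoch sees a slightly different version of every image.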
DL is also used to predict future events. One application of significant industrial importance is predicting failures of machinery. This involves detecting anomalies in the sensor data as far in advance of the actual failure as possible. Traditional ML classifiers require both positive (failure) and negative (normal) training data. However, many industrial machines have become so reliable that we cannot obtain many samples of positive (that is, anomalous) data, resulting in biased data sets that significantly lower prediction accuracy. Our patent-pending anomaly detection algorithm works with negative (normal) data only. To accomplish this, we use a variation of the deep neural network called a deep generative model.
A deep generative model is trained to approximate the joint probability distribution of the input data. If a new input datum is statistically very unlikely under the trained probability distribution, we determine that it is an anomaly, flagging a possible future failure of the machine. This method proved very effective in predicting a failure of an industrial robot. For example, Figure 2 compares an existing model-based failure detection method with our model-free, DL-based method. The traditional model-based method can detect the failure only minutes before it occurs, while our DL-based method can detect the same failure weeks in advance. This is because the model-free DL can incorporate variables that the traditional model-based method disregards, such as value histories over a longer time window, and capture the normal behaviour of their interactions.
In addition to robot failure prediction, we have successfully applied the same technique to other domains, such as misfire prediction for internal combustion engines and hard-landing prediction for airline flights.
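The idea of flagging statistically unlikely inputs can be sketched with a much simpler density model than a deep generative model. The following toy illustration (not our patent-pending algorithm) fits an independent Gaussian to normal-only data and flags any input whose log-likelihood falls below a threshold:

```python
import math

def fit_gaussian(samples):
    """Fit an independent (diagonal) Gaussian to normal-only training data.
    A simple stand-in for the deep generative model described in the text."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [max(sum((s[j] - mean[j]) ** 2 for s in samples) / n, 1e-9)
           for j in range(d)]
    return mean, var

def log_likelihood(x, mean, var):
    """Log-probability of x under the fitted Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def is_anomaly(x, mean, var, threshold):
    """Flag x as anomalous when it is statistically unlikely under the model."""
    return log_likelihood(x, mean, var) < threshold
```

The threshold can be set from the likelihoods of held-out normal data; a deep generative model replaces the Gaussian when the normal behaviour involves complex interactions among many variables.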
DL can also be used to control machines. One example is our self-driving car demonstration at the 2016 Consumer Electronics Show in Las Vegas (Figure 3, left). In the figure, each silver car is independently driven by a deep neural network, crossing the intersection without colliding with the others. The neural networks were trained with a technique called deep reinforcement learning, in which each car first moves randomly and learns from experience how to drive and how to avoid collisions. This training is done in a software simulator. The red car, driven by a human operator, is introduced afterwards without giving the silver cars any additional knowledge, yet they avoid the red car as well, demonstrating the robustness of our method.
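The "learn from random experience" loop of reinforcement learning can be illustrated at toy scale with tabular Q-learning (deep RL replaces the table with a neural network; this sketch is not the demo's actual training code). An agent advances one column per step on a two-lane road, chooses a lane, and learns to steer around an obstacle:

```python
import random

def q_learning(length=6, obstacle=(0, 3), episodes=1000,
               alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a two-lane road: the agent advances one column
    per step and chooses a lane (action 0 or 1); entering the obstacle cell
    ends the episode with a penalty, reaching the last column gives a reward.
    Hypothetical toy environment, for illustration only."""
    rng = random.Random(seed)
    q = {}  # state (lane, column) -> [Q(choose lane 0), Q(choose lane 1)]
    for _ in range(episodes):
        lane, col = 0, 0
        while col < length - 1:
            qs = q.setdefault((lane, col), [0.0, 0.0])
            # epsilon-greedy: mostly exploit, sometimes explore at random
            a = rng.randrange(2) if rng.random() < eps else int(qs[1] > qs[0])
            lane, col = a, col + 1
            if (lane, col) == obstacle:
                reward, done = -1.0, True       # collision
            elif col == length - 1:
                reward, done = 1.0, True        # safely reached the end
            else:
                reward, done = 0.0, False
            future = 0.0 if done else gamma * max(q.get((lane, col), [0.0, 0.0]))
            qs[a] += alpha * (reward + future - qs[a])
            if done:
                break
    return q
```

After training, the greedy policy reads the best action off the table; in deep RL the same update drives gradient steps on a neural network instead.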
In contrast to conventional control theory, deep reinforcement learning is model-free. Figure 3 (right) shows an autonomous drone, which is also controlled by our deep reinforcement learning technology. Flying a drone autonomously poses an additional challenge: programming a software simulator with appropriate physical modeling is almost impossible because of the dynamic and nonlinear nature of the drone's behaviour. Instead, we trained a neural network to mimic the drone's behaviour by supplying training data comprising observations of the drone's responses while flying, and then used this behaviour network to train the control network in the simulator.
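The two-stage approach can be sketched at toy scale: first fit a dynamics model to logged flight data, then use that model as the simulator in which a controller is evaluated or trained. Here the behaviour network is replaced, for brevity, by a hypothetical one-dimensional linear model fitted by least squares:

```python
def fit_dynamics(transitions):
    """Least-squares fit of a 1-D linear dynamics model x' ~ a*x + b*u from
    logged (state, action, next_state) triples, a toy stand-in for the
    neural network that mimics the drone's behaviour."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, y in transitions:
        sxx += x * x; sxu += x * u; suu += u * u
        sxy += x * y; suy += u * y
    det = sxx * suu - sxu * sxu           # normal equations, Cramer's rule
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b

def rollout(model, x0, controller, steps):
    """Use the learned model as a simulator to evaluate a controller."""
    a, b = model
    x, traj = x0, []
    for _ in range(steps):
        u = controller(x)
        x = a * x + b * u
        traj.append(x)
    return traj
```

With the real drone, the learned behaviour model captures nonlinear dynamics that cannot be written down by hand, but the role it plays in the training loop is the same.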
Currently we are applying this technique to wider domains, such as bulk bin-picking with industrial robots. We also developed and open-sourced ChainerRL, our reinforcement learning toolkit, to make this technology more accessible to industry. Reinforcement learning can potentially replace today's model-based control theories in many automated systems. We are actively involved in research on reinforcement learning to make this happen; for example, our team is working on "safe" reinforcement learning for mission-critical applications.
We have discussed innovative applications of DL in the industrial sector. We believe they are only the beginning and that many more opportunities lie ahead. PFN is determined to capture these opportunities by working with industry partners.
Next, we examine the implications of DL for computing platforms.
As we have seen, DL opens new opportunities for IoT. However, there are certain technical challenges. One is that developing DL-based applications requires a new way of thinking about programming, one that combines statistical thinking and computer science. For DL to be widely used, software tools must be in place to make DL programming easy enough. Second, DL achieves higher accuracy in exchange for more computation, especially in the learning phase. DL workloads are not suited to conventional CPUs, which are optimized for transactional workloads, so special-purpose hardware is required for DL to be practical. PFN is one of very few companies with a full stack of DL technologies.
Chainer(R) is our software framework for DL. It was originally developed in May 2015 to accelerate internal R&D activities within PFN, with an emphasis on flexibility. At that time, new types of neural networks were being proposed almost every week, most notably dynamic networks, such as recurrent neural networks, that grow as new inputs arrive. Chainer's uniqueness lies in its dynamic, "define-by-run" style of network creation: you build a neural network on the fly as it processes input data. With this flexibility, Chainer allowed us to test new ideas very quickly. In fact, Chainer was so successful that we decided to make it open source, and since then it has been gaining popularity not only among researchers but also among practitioners in various segments of industry.
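The define-by-run idea can be illustrated with a toy automatic-differentiation class (an illustration of the concept, not Chainer's API): the computation graph is recorded while ordinary Python code, including control flow, executes.

```python
class Var:
    """A scalar value that records, as code runs, the graph needed for
    backpropagation. A toy illustration of define-by-run, not Chainer."""
    def __init__(self, value, parents=()):
        self.value, self.grad, self.parents = value, 0.0, parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, grad=1.0):
        """Walk the recorded graph backwards, accumulating gradients."""
        self.grad += grad
        for parent, local_grad in self.parents:
            parent.backward(grad * local_grad)

x = Var(3.0)
y = x * x + x      # the graph is built on the fly, while this line executes
if y.value > 5:    # ordinary Python control flow can reshape the network
    y = y + x
y.backward()
print(x.grad)      # dy/dx of x*x + 2*x at x = 3 -> 8.0
```

Because the graph follows the actual execution path, networks whose shape depends on the data, such as recurrent networks over variable-length inputs, fall out naturally.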
Chainer is designed as a natural extension of the programming language Python, a preferred platform of many data scientists, with seamless integration with CUDA, NVIDIA's API for GPU computing, for efficient computation. Thus, anyone familiar with Python can quickly and easily start building DL applications with their existing skills and datasets, while enjoying the high performance of the latest GPUs. Today, Chainer is supported by every major platform, including NVIDIA, Intel, and OpenPOWER.
As deep neural networks become larger and more complex, DL computation becomes harder to carry out on a single GPU. Thus, DL infrastructure must be scalable. PFN announced its multi-node distributed-training version of Chainer in January 2017. Figure 4 shows benchmark results for different DL frameworks on a cluster with 128 GPUs. As can be seen in the graph, Chainer outperforms the other DL frameworks.
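Conceptually, data-parallel distributed training repeats a simple step: each worker computes gradients on its own shard of the mini-batch, the gradients are averaged across workers (an all-reduce), and every worker applies the same update. A minimal sketch of that step (not the distributed Chainer implementation):

```python
def allreduce_average(worker_grads):
    """Average per-worker gradient vectors, as a data-parallel all-reduce
    would; real systems overlap this communication with computation."""
    n = len(worker_grads)
    return [sum(g) / n for g in zip(*worker_grads)]

def parallel_step(weights, worker_grads, lr=0.1):
    """One synchronous SGD step: average the shard gradients, then apply
    the identical update on every worker to keep the replicas in sync."""
    avg = allreduce_average(worker_grads)
    return [w - lr * g for w, g in zip(weights, avg)]
```

Because every worker applies the same averaged gradient, the replicas stay bit-identical without any central parameter server, and throughput scales with the number of GPUs as long as the all-reduce keeps up.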
Currently, PFN operates a data center with over 1,000 GPUs just for internal R&D activities. This is one of the largest operational installations of GPU clusters in Japan. As our company grows, we expect the appetite for computing power to grow and so we are extending this infrastructure to even more GPUs in the coming months.
DL introduces many changes across the full spectrum of computing, from the programming model to computer architecture. Optimizing DL infrastructure can only be done by an organization with capabilities and experience spanning this full spectrum. PFN is one of the very few companies up to this challenge.
As more and more devices are connected to an IoT network, sending all of the corresponding sensor data to the data center for analysis will eventually congest the network. We envision that most sensor data will be processed at the edge of the network and that only the essence of the data will be sent to the datacenter. We call this concept edge-heavy computing. DL plays a key role in realizing edge-heavy computing because the "essence" of data can be extracted by applying DL to the data. As previously discussed, our DL methods can effectively estimate the joint probability distribution of multi-dimensional sensor data. The resulting model (called a pretrained model), which is far more concise than the original raw data, represents the essential statistical properties (the "essence") of the data.
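As a toy model of edge-heavy computing, an edge node might reduce each window of raw readings to a handful of summary statistics and send only those upstream. (In the text, a pretrained generative model plays this role; the plain statistics below are a simple stand-in.)

```python
def summarize(window):
    """Reduce a window of raw readings to compact statistics, the "essence"
    sent upstream instead of the raw stream."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    return {"n": n, "mean": mean, "var": var}

def edge_stream(readings, window_size):
    """Process readings at the edge in fixed-size windows; only the small
    summaries travel to the datacenter."""
    for i in range(0, len(readings) - window_size + 1, window_size):
        yield summarize(readings[i:i + window_size])
```

Whatever the summarization, the bandwidth saving is the point: a window of thousands of raw samples collapses to a few numbers, or to the parameters of a pretrained model.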
PFN is a young company but already has a successful track record. We have strategic investors such as NTT, Toyota, and Fanuc, and we have received global recognition, such as the Technology Award in the Financial Times 2017 ArcelorMittal Boldness in Business Awards and Forbes Japan's CEO of the Year 2016.
Our value proposition is to "bring the latest technologies to our customers faster than anybody else." Our engineers constantly watch the latest developments in computer science in general and DL in particular through academic publications, research communities, GitHub repositories, and so on, test new ideas by implementing them quickly (thanks to the flexibility of Chainer), and incorporate them into our offerings when appropriate. This is only possible because our engineers have both research and engineering mindsets, in addition to extensive domain expertise.
The most critical asset of our company is our employees. Many of our engineers have been, or still are, active contenders in worldwide competitions such as the International Collegiate Programming Contest, TopCoder, Kaggle, and the International Mathematical Olympiad. We are also a diverse organization, with employees from all over the world and with a variety of backgrounds. With this unmatched talent pool in the DL arena, PFN is well positioned to lead the transformation of the IoT world through innovative DL technologies.