Deep learning methods and deep neural networks. Deep learning for automatic text processing

Today the graph is one of the most widely accepted ways to describe the models created in a machine learning system. These computational graphs are composed of neuron vertices connected by synapse edges that describe the connections between the vertices.

Unlike a scalar CPU or a vector GPU, the IPU is a new type of processor designed for machine learning that lets you build such graphs. A computer designed to manipulate graphs is an ideal machine for computing the graph models created through machine learning.

One of the simplest ways to describe the workings of machine intelligence is to visualize it. The Graphcore development team has created a collection of such images of what runs on the IPU. The work is based on the Poplar software, which visualizes the operation of artificial intelligence. Researchers from this company have also examined why deep networks require so much memory and what solutions exist for the problem.

Poplar includes a graph compiler that was built from the ground up to translate standard machine learning operations into highly optimized IPU application code. It allows these graphs to be assembled on the same principle by which POPNN is assembled; that library contains a set of vertex types of various kinds for generalized primitives.

Graphs are the paradigm on which all software is based. In Poplar, graphs allow you to define a computation process, where vertices perform operations and edges describe the relationship between them. For example, if you want to add two numbers together, you can define a vertex with two inputs (the numbers you would like to add), some calculations (a function to add two numbers), and an output (the result).
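To make the example concrete, here is a minimal Python sketch of the same add-two-numbers graph. It is purely illustrative and does not use the Poplar API; the Vertex and Constant classes are invented for this example.

```python
# A minimal sketch of the vertex/edge idea (plain Python, not the Poplar API).
class Vertex:
    def __init__(self, fn, inputs):
        self.fn = fn          # the small computation this vertex performs
        self.inputs = inputs  # incoming edges: vertices whose outputs feed this one

    def evaluate(self):
        # Pull values along the incoming edges, then apply this vertex's function.
        return self.fn(*(v.evaluate() for v in self.inputs))


class Constant(Vertex):
    def __init__(self, value):
        super().__init__(fn=None, inputs=[])
        self.value = value

    def evaluate(self):
        return self.value


# Two inputs, one "add" vertex, one output -- the example from the text.
a, b = Constant(2.0), Constant(3.0)
add = Vertex(fn=lambda x, y: x + y, inputs=[a, b])
print(add.evaluate())  # 5.0
```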

Typically, operations on vertices are much more complex than in the example described above. They are often defined by small programs called codelets. The graph abstraction is attractive because it makes no assumptions about the structure of the computation and breaks the computation down into components that the IPU can use to operate.

Poplar uses this simple abstraction to build very large graphs that are represented as images. Software generation of the graph means we can tailor it to the specific calculations needed to ensure the most effective use of IPU resources.

The compiler translates the standard operations used in machine learning systems into highly optimized IPU application code. The graph compiler builds an intermediate representation of the computational graph, which is deployed to one or more IPU devices. The compiler can also display this computational graph, so an application written at the neural network framework level can show an image of the computational graph that is running on the IPU.


Graph of a full AlexNet training cycle in the forward and backward directions

The Poplar graph compiler turned the AlexNet description into a computational graph of 18.7 million vertices and 115.8 million edges. The clearly visible clustering is the result of heavy communication between processes within each layer of the network, with lighter communication between layers.

Another example is a simple fully connected network trained on MNIST, a simple computer vision dataset and a kind of “Hello, world” of machine learning. A simple network exploring this dataset helps in understanding the graphs that Poplar applications operate on. By integrating its graph libraries with frameworks such as TensorFlow, the company provides one of the simplest ways to use IPUs in machine learning applications.
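For reference, a fully connected network of the kind described here can be written in a few lines of TensorFlow/Keras. The sketch below only shows the model itself; targeting an IPU would require Graphcore's TensorFlow port, which is not shown.

```python
# A minimal fully connected MNIST model -- the "Hello, world" network mentioned above.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=32)
model.evaluate(x_test, y_test)
```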

After the graph has been constructed by the compiler, it needs to be executed. This is done using the Graph Engine. The ResNet-50 example demonstrates its operation.


ResNet-50 graph

The ResNet-50 architecture makes it possible to build deep networks from repeating sections. The processor only needs to define these sections once and can then call them again. For example, the conv4-level cluster is executed six times but is mapped to the graph only once. The image also shows the variety of shapes of the convolutional layers, since each has a graph built according to the natural form of its computation.
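The "define a section once, reuse it many times" idea can be sketched at the framework level. The block below is a deliberately simplified, hypothetical residual section, not Graphcore's actual graph mapping.

```python
# Sketch: one repeating ResNet-style section defined once and instantiated six times,
# roughly like the conv4 stage described above. (Illustrative, heavily simplified.)
import torch

class Section(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = torch.nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv(x))  # residual connection

# The same definition, repeated six times in sequence.
stage = torch.nn.Sequential(*[Section(64) for _ in range(6)])
out = stage(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```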

The engine creates and manages the execution of a machine learning model using the graph generated by the compiler. Once deployed, the Graph Engine monitors and responds to the IPU devices used by the applications.

The ResNet-50 image shows the entire model. At this level it is difficult to identify connections between individual vertices, so it is worth looking at enlarged images. Below are some examples of sections within neural network layers.

Why do deep networks need so much memory?

A large memory footprint is one of the biggest problems of deep neural networks. Researchers are trying to work around the limited bandwidth of the DRAM devices that modern systems must use to store the huge number of weights and activations in a deep neural network.

These architectures were designed using processor chips built for sequential processing and DRAM optimized for high-density storage. The interface between the two devices is a bottleneck that limits bandwidth and adds significant overhead in power consumption.

Although we do not yet fully understand the human brain and how it works, it is generally understood that it has no large separate memory store. Long-term and short-term memory in the human brain are believed to be built into the structure of the neurons and synapses themselves. Even simple organisms such as worms, whose brain-like neural structure consists of just over 300 neurons, have some degree of memory function.

Building memory into conventional processors is one way to get around the memory bottleneck, unlocking enormous bandwidth at much lower power consumption. However, on-chip memory is expensive and is not designed for the truly large amounts of memory attached to the CPUs and GPUs currently used to train and deploy deep neural networks.

So it is useful to look at how memory is used today in CPU- and GPU-based deep learning systems and ask: why do they require such large memory storage devices when the human brain appears to work just fine without them?

Neural networks need memory to store input data, weights, and activations as the input propagates through the network. During training, the activations at each layer must be kept until they can be used to compute the error gradients at the output.

For example, a 50-layer ResNet network has about 26 million weight parameters and computes about 16 million forward activations. If you use a 32-bit float to store each weight and activation, this requires about 168 MB of space. By using lower-precision values to store these weights and activations, we could halve or even quarter this storage requirement.
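A quick back-of-the-envelope check of that figure, using the numbers quoted above:

```python
# Rough storage estimate for ResNet-50 weights plus forward activations (one sample).
weights     = 26_000_000   # weight parameters (from the text)
activations = 16_000_000   # forward activations (from the text)
bytes_fp32  = 4            # one 32-bit float

total_mb = (weights + activations) * bytes_fp32 / 1e6
print(f"{total_mb:.0f} MB")  # ~168 MB
```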

A more serious memory problem arises because GPUs rely on data being laid out as dense vectors, so they can use single instruction, multiple data (SIMD) execution to achieve high compute density. CPUs use similar vector units for high-performance computing.

GPUs and CPUs have a SIMD vector width of 1,024 bits and use 32-bit floating-point data, so they often split the work into a mini-batch of 32 samples processed in parallel to create full 1,024-bit vectors. This approach to vector parallelism multiplies the number of activations held at once by 32 and raises the need for local storage to a capacity of more than 2 GB.
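The same arithmetic, extended to a mini-batch of 32 samples, shows roughly where the 2 GB figure comes from:

```python
# Activations held for a whole mini-batch of 32 samples instead of one.
activations = 16_000_000        # per-sample forward activations (from the text)
batch       = 32                # mini-batch size chosen to fill the SIMD vectors
bytes_fp32  = 4

total_gb = activations * batch * bytes_fp32 / 1e9
print(f"{total_gb:.2f} GB")     # ~2.05 GB
```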

GPUs and other machines designed for matrix algebra are also subject to memory load from the weights and activations of a neural network. GPUs cannot efficiently execute the small convolutions used in deep neural networks directly, so a transformation, often called "lowering", is used to convert these convolutions into matrix-matrix multiplications (GEMMs), which graphics accelerators can handle efficiently.
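This lowering step is commonly implemented as im2col: patches of the input are unrolled into the columns of a matrix, so the whole convolution becomes one large matrix multiplication. A minimal NumPy sketch of the idea, for a single-channel image and a single filter:

```python
import numpy as np

def im2col(x, k):
    """Unroll every k x k patch of a 2-D image into one column of a matrix."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((k * k, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                # 3x3 averaging filter

# The convolution expressed as a GEMM: (1 x 9) @ (9 x 9) -> a 3x3 feature map.
out = kernel.ravel() @ im2col(x, 3)
print(out.reshape(3, 3))
```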

Additional memory is also required to store input data, temporary values, and program instructions. Measuring memory usage when training ResNet-50 on a high-performance GPU showed that it requires more than 7.5 GB of local DRAM.

Some might think that lower computational precision would reduce the amount of memory required, but this is not the case. By switching the data values to half precision for weights and activations, you would only fill half the SIMD vector width, wasting half the available compute resources. To compensate, when you switch from full precision to half precision on a GPU, you then have to double the size of the mini-batch to force enough data parallelism to use all the available compute. So switching to lower-precision weights and activations on a GPU still requires more than 7.5 GB of dynamic random-access memory (DRAM).

With this much data to store, it is simply impossible to fit it all into the GPU. Each layer of a convolutional neural network needs to save its state to external DRAM, load the next layer of the network, and then load the data back into the system. As a result, the already bandwidth- and latency-limited external memory interface suffers the additional burden of constantly reloading the weights and of saving and retrieving the activations. This significantly slows down training and significantly increases energy consumption.

There are several ways to address this problem. First, operations such as activation functions can be performed "in place", allowing the input data to be overwritten directly with the output, so existing memory is reused. Second, memory can be reused by analyzing the data dependencies between operations in the network and allocating the same memory to operations that do not need it at the same time.
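A minimal sketch of the first idea: an activation computed "in place", so the input buffer is overwritten with the output instead of allocating a new one. Shown here with NumPy; deep learning frameworks expose similar in-place options.

```python
import numpy as np

x = np.random.randn(1024, 1024).astype(np.float32)  # pretend this is a layer's output
np.maximum(x, 0.0, out=x)                            # ReLU written back into x itself
```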

The second approach is especially effective when the entire neural network can be analyzed at compile time to create a fixed memory allocation, since the memory management overhead drops to almost zero. It turns out that the combination of these methods can reduce the memory use of a neural network by a factor of two to three.
A third significant approach was recently demonstrated by the Baidu Deep Speech team. They applied various memory-saving techniques to achieve a 16x reduction in the memory consumed by activations, which allowed them to train networks with 100 layers. Previously, with the same amount of memory, they could train networks with only nine layers.
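One widely used family of such techniques is activation recomputation, also known as gradient checkpointing: activations inside a block are not stored for the backward pass but are recomputed when needed. Below is a hedged PyTorch sketch of that idea; it is illustrative only and not necessarily the exact method the Deep Speech team used.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block whose intermediate activations we do not want to keep in memory.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
)

x = torch.randn(32, 512, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # forward pass without saving intermediates
y.sum().backward()                             # intermediates are recomputed here
```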

Combining memory and processing resources into a single device has significant potential to improve the performance and efficiency of convolutional neural networks, as well as other forms of machine learning. A trade-off can be made between memory and computing resources to achieve a balance of capabilities and performance in the system.

Neural networks, and the knowledge models of other machine learning methods, can be thought of as mathematical graphs. These graphs contain a huge amount of parallelism. A parallel processor designed to exploit parallelism in graphs does not need to rely on mini-batches and can significantly reduce the amount of local storage required.

Current research results show that all these methods can significantly improve the performance of neural networks. Modern GPUs and CPUs have very limited on-chip memory, only a few megabytes in total. New processor architectures specifically designed for machine learning balance memory and on-chip compute, delivering significant performance and efficiency improvements over today's CPUs and graphics accelerators.

— The laboratory is young: there are only five people on our team so far, and the work is an unplowed field, but we are serious about it. Our main focus is the development and study of dialogue systems: online consultants and assistants that competently answer users' questions. Many companies already have such services, but either they work poorly, constantly making mistakes, or there is a live person on the other side of the monitor who cannot be online 24/7 and who, moreover, has to be paid. We want to develop an algorithm that will let us create robots capable of full-fledged conversation. Such a robot will be able to buy you a plane ticket in a matter of minutes or advise you on any pressing issue. Systems of that level do not currently exist.

Neural networks and artificial intelligence

The idea of neural networks was born in the middle of the 20th century in the USA, along with the advent of the first computers. Neurophysiologists who studied the theoretical aspects of brain function believed that organizing a computer's work in the image and likeness of the human brain would make it possible to create the first artificial intelligence in the near future.

The difference between artificial intelligence and all algorithms of the previous generation is that a trained neural network does not act along a predefined path but independently looks for the most effective way to achieve its goal. The operation of a single computer “neuron” looks like this: for training, the program is fed objects of two types, A and B, each carrying some numerical value. Based on the training set, the program learns which ranges of this value correspond to objects of type A and which to type B, and can subsequently distinguish them on its own. In real problems the system must distinguish between many types, each of which in turn can have dozens of properties. Solving them requires a more complex structure of layers of neurons, serious computing power, and a large number of training examples. The 21st century has marked the beginning of an era in which these technologies can already be used to solve everyday problems.
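A toy version of that single-“neuron” example, assuming each object carries exactly one numeric value; the values and the threshold rule below are invented purely for illustration.

```python
import numpy as np

values = np.array([0.2, 0.4, 0.3, 1.1, 1.4, 1.2])  # training values
labels = np.array(["A", "A", "A", "B", "B", "B"])   # their known types

# "Training": pick the midpoint between the largest A value and the smallest B value.
threshold = (values[labels == "A"].max() + values[labels == "B"].min()) / 2

def predict(value):
    return "A" if value < threshold else "B"

print(predict(0.35), predict(1.3))  # A B
```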

Mikhail Burtsev, head of the laboratory:

— The concept of how neural networks work is quite simple: we give the machine a large amount of text, and it remembers how words fit together. Based on this information, it can reproduce similar texts; the machine does not need to know the rules of syntax, declension, and conjugation for this. There are already neural networks that, having learned from the works of Pushkin, try to write in his style. This is another feature of neural networks: they learn the “style” of whatever they are given to learn from. If you give them Wikipedia as material, the program will sprinkle its output with terms and use a predominantly journalistic style. Since our laboratory works on creating question-answering systems, we use ready-made dialogues to train the network. In one of the experiments we took subtitles from films and let our network study a whole saga about vampires. Having analyzed this array of data, the neural network can already hold up a conversation.
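The idea in the quote can be illustrated with a toy word-bigram model: it only remembers which words follow which in the training text, yet it can already produce text "in the same style". The training sentence below is made up for the example.

```python
import random
from collections import defaultdict

text = "the cat sat on the mat . the dog sat on the rug ."
words = text.split()

# Remember, for each word, which words were seen following it.
follows = defaultdict(list)
for prev, nxt in zip(words, words[1:]):
    follows[prev].append(nxt)

# Generate "similar" text by repeatedly sampling a plausible next word.
word, output = "the", ["the"]
for _ in range(10):
    word = random.choice(follows[word])
    output.append(word)
print(" ".join(output))
```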

Dialogues between laboratory staff and a neural network

Team: today and tomorrow

The laboratory cooperates with large research centers based at the National Research Nuclear University MEPhI and the Kurchatov Institute. Foreign experts in machine learning and neuroinformatics also take part in its activities, for example Sergey Plis from The Mind Research Network. In addition, events are regularly held to popularize the laboratory's work and to find young talent. Winning a hackathon or successfully completing a course gives you a good chance of getting into the laboratory.

Valentin Malykh, laboratory employee:

“My path to the laboratory was far from straightforward. Just four years ago I had hardly touched the topic of machine learning. Then I got involved in computational linguistics, and away we went... I changed jobs several times: I tried my hand at robotics and developed software related to computer vision; that is where I met machine learning, and I wanted to do serious research.
Along the way I managed to take part in several hackathons organized by the laboratory, perhaps the most interesting thing that happened to me in that period. Then I came to the guys and said that I wanted to work with them. They took me on.

Philosophy of DeepHack

Hackathons, despite the name, have nothing to do with breaking into software (from the English “hack”). They are team programming competitions in which participants spend several days, and sometimes weeks, struggling to solve one specific task. The theme of a hackathon is announced in advance, and usually several hundred people take part. Such events are organized not only by institutes but also by large companies looking for talented specialists. At MIPT, the Laboratory of Neural Networks and Deep Learning has already organized two hackathons: participants listened to lectures on question-answering and dialogue systems and wrote code for a week.

Vladislav Belyaev, laboratory employee:

— This year and last year we organized hackathons on machine learning. There were a lot of applications, not only from Russia and the CIS but also from Europe and the States. During the hackathons, lectures were given by scientists from Oxford and Stanford, from Google DeepMind and OpenAI, and, of course, by Russian colleagues. Now we are preparing a course on neural networks in which we will cover everything from beginning to end: from the biological concept and basic programming models to real applications and specific implementations.

Free time

There are still few employees in the laboratory, so each person has a large amount of work of different kinds: studying algorithms, writing code, and preparing scientific publications.

Mikhail Burtsev, head of the laboratory:

“You have to work a lot; I don't think I even remember what free time is anymore. No joke, there is practically no time to relax: over the past six months we have managed to go out for a barbecue as a group only once. Although, in a sense, work can be relaxation: hackathons and seminars provide an opportunity to talk with colleagues in a less formal setting and make new acquaintances. We haven't yet managed to establish traditions of spending time together after work; we are too young for that. In the summer we plan to go out into the countryside with the whole laboratory, rent a cottage, and spend two weeks together solving the most difficult and interesting problems: we will organize our own mini-hackathon. We'll see how effective this approach turns out to be. Perhaps it will become our first good tradition.

Employment

The laboratory will be expanding and is already looking for new employees. The easiest way to get a place is to complete a two-month internship, for which candidates are selected by interview. A necessary condition for passing the interview is completing part of the assignments from the Deep Learning course. During the internship, there is an opportunity to take part in paid commercial projects. Funding for the laboratory has not yet been settled, but, according to the staff, this problem will be solved in the near future. “To come to us now means getting a chance to become a ‘founding father’ of the laboratory in the most promising area of information technology,” says Mikhail Burtsev.

Images and photographs were provided by the MIPT Laboratory of Neural Networks and Deep Learning. Photographer: Evgeny Pelevin.

Like the other parts, this guide is intended for anyone who is interested in machine learning but doesn't know where to start. The content is intended for a wide audience and will be fairly superficial. But who really cares? The more people become interested in machine learning, the better.

Object recognition using deep learning

You may have already seen this famous xkcd comic. The joke is that any 3-year-old can recognize a photo of a bird, but getting a computer to do it took the best computer scientists over 50 years. In the last few years, we've finally found a good approach to object recognition using deep convolutional neural networks. This sounds like a bunch of made-up words from a William Gibson science fiction novel, but it all makes sense when we take them one by one. So let's do it - write a program that recognizes birds!

Let's start simple

Before we learn how to recognize pictures of birds, let's learn how to recognize something much simpler - the handwritten number "8".
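As a sketch of what such a recognizer can look like, here is a tiny "is this an 8?" classifier. It uses scikit-learn's small built-in digits dataset (8x8 images) rather than full MNIST, simply to keep the example short; it stands in for the approach developed in the rest of the guide.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X = digits.data                           # flattened 8x8 images
y = (digits.target == 8).astype(int)      # 1 = "this is an 8", 0 = anything else

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```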
