Classification of neural networks. What are artificial neural networks

    Step 4. We apply rule 2, whose condition is this statement, and obtain the original statement as the conclusion.

    Note that, to simplify the situation, we assumed that in both cases the facts "The sky is covered with clouds" and "The barometer is falling" are already known to the system. In reality, the system finds out the truth or falsity of a fact included in the condition of some rule by asking the user about it at the moment when it tries to apply that rule. The example was deliberately chosen to be very simple and does not reflect many of the problems associated with organizing inference in an expert system. In particular, it may give the impression that forward chaining is more efficient than backward chaining, which, generally speaking, is not the case. The effectiveness of a particular inference strategy depends on the nature of the task and the contents of the knowledge base. In diagnostic systems forward inference is used more often, while in planning systems backward inference is more effective. In some systems, inference is based on a combination of backward and limited forward chaining; this combined method is called cyclic.
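To make the contrast concrete, here is a minimal forward-chaining sketch of the umbrella example; the rule set, fact strings, and loop are purely illustrative and not taken from any particular ES shell.

```python
# Minimal forward-chaining sketch of the umbrella example (illustrative only).
rules = [
    ({"the sky is covered with clouds", "the barometer is falling"}, "it will rain soon"),  # rule 1
    ({"it will rain soon"}, "you should take an umbrella"),                                 # rule 2
]

facts = {"the sky is covered with clouds", "the barometer is falling"}

changed = True
while changed:                      # keep firing rules until nothing new can be derived
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)   # the rule fires: add its conclusion to working memory
            changed = True

print("you should take an umbrella" in facts)  # True
```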

  • 2.7 Explanation of decisions

The explanatory component of the ES explains how the system arrived at the solution of the problem (or why it did not arrive at a solution) and what knowledge it used in doing so, which makes it easier for the expert to test the system and increases the user's confidence in the result. Since knowledge-based systems are implemented on computers, the input information is initially perceived only in a form understandable to a computer, i.e. in bits and bytes. However, for an untrained user to interact with the system, it must include natural language communication tools. The vast majority of knowledge-based systems have a fairly primitive natural language interface: valid user input messages are limited to the set of concepts contained in the knowledge base. Using the example of a simple ES and its knowledge base, the dialogue between the user and the system might look as follows:

System: Do you want to know if you need to bring an umbrella with you?

User: Yes.

System: Is it true that the sky is covered with clouds?

User: Yes.

System: Is it true that the barometer is falling?

User: Yes.

System: (after some "thinking") You should take an umbrella with you.

As can be seen from this example, during the consultation the dialogue initiative belongs to the system, and the consultation with the ES itself looks the same as a consultation with a human expert: a series of questions are asked and, based on their analysis, an expert opinion is issued.

One of the most important problems specific to knowledge-based systems is the problem of knowledge representation. This is explained by the fact that the form of knowledge representation has a significant impact on the characteristics and properties of the system. In order to manipulate all kinds of knowledge from the real world with the help of a computer, it is necessary to carry out their modeling. In such cases, it is necessary to distinguish between the knowledge intended for processing by a computer and the knowledge used by a person.

When designing a knowledge representation model, uniformity of representation and ease of understanding should be taken into account. A homogeneous representation leads to a simplification of the inference management mechanism and simplification of knowledge management. Knowledge representation should be understandable to experts and users of the system. Otherwise, the acquisition of knowledge and its evaluation are difficult. However, it is rather difficult to fulfill this requirement equally for both simple and complex tasks. Usually, for simple problems, they stop at some average (compromise) representation, but for solving complex and large problems, structuring and modular representation are necessary.

Typical knowledge representation models are: production models, frame-based models, semantic networks, and logical models.

23. Neural networks. Types of neural networks. Algorithms for training neural networks. Application of neural networks for pattern recognition problems.

Artificial neural networks (ANNs) are mathematical models, as well as their software or hardware implementations, built on the principle of the organization and functioning of biological neural networks - networks of nerve cells of a living organism. The concept arose in the study of the processes occurring in the brain during thinking and in attempts to model these processes. The first such brain model was the perceptron. Subsequently, these models began to be used for practical purposes, as a rule, in forecasting problems.

ANNs are a system of connected and interacting simple processors (artificial neurons). Such processors are usually quite simple, especially when compared to the processors used in personal computers. Each processor in such a network deals only with the signals it periodically receives and the signals it periodically sends to other processors. Nevertheless, when connected in a large enough network with controlled interaction, such locally simple processors together are able to perform quite complex tasks.

From the point of view of machine learning, a neural network is a special case of pattern recognition methods, discriminant analysis, clustering methods, etc. From a mathematical point of view, neural network training is a multi-parameter non-linear optimization problem. From the point of view of cybernetics, neural networks are used in adaptive control tasks and as algorithms for robotics. From the point of view of the development of computer technology and programming, a neural network is a way of solving the problem of efficient parallelism. And from the point of view of artificial intelligence, the ANN is the basis of the philosophical current of connectionism and the main direction in the structural approach to studying the possibility of building (simulating) natural intelligence using computer algorithms.

Neural networks are not programmed in the usual sense of the word, they are trained. The ability to learn is one of the main advantages of neural networks over traditional algorithms. Technically, learning is about finding the coefficients of connections between neurons. In the learning process, the neural network is able to identify complex relationships between inputs and outputs, as well as perform generalization. This means that, in case of successful training, the network will be able to return the correct result based on the data that was missing in the training sample.
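As a hedged illustration of "learning is finding the coefficients of connections", the sketch below fits the two weights of a single sigmoid neuron to the logical OR function by gradient descent; the dataset, learning rate, and number of epochs are arbitrary choices made for the example.

```python
import numpy as np

# Toy data: learn the OR function with a single sigmoid neuron (illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0., 1., 1., 1.])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # connection weights - what "training" actually adjusts
b = 0.0                  # bias (threshold)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(2000):
    y = sigmoid(X @ w + b)                  # forward pass
    err = y - t                             # difference from desired outputs
    grad_w = X.T @ (err * y * (1 - y))      # gradient of the squared error w.r.t. weights
    grad_b = np.sum(err * y * (1 - y))
    w -= lr * grad_w                        # adjust the connection coefficients
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))      # close to [0, 1, 1, 1]
```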

Notable uses

Pattern recognition and classification. Objects of different nature can act as patterns: text symbols, images, sound samples, etc. When training the network, various samples of patterns are presented with an indication of which class they belong to. A sample is usually represented as a vector of feature values. In this case, the totality of all features should uniquely define the class to which the sample belongs. If there are not enough features, the network may associate the same sample with several classes, which is incorrect. After training, the network can be presented with previously unseen patterns and return an answer about their belonging to a certain class.

The topology of such a network is characterized by the fact that the number of neurons in the output layer is usually equal to the number of defined classes. This establishes a correspondence between the output of the neural network and the class it represents. When a network is presented with an image, one of its outputs should show a sign that the image belongs to this class. At the same time, other outputs should have a sign that the image does not belong to this class. If at two or more outputs there is a sign of belonging to a class, it is considered that the network is “not sure” of its answer.

Decision making and management. Situations are subject to classification, the characteristics of which are fed to the input of the neural network. At the output of the network, a sign of a solution should appear. In this case, various criteria for describing the state of the controlled system are used as input signals.

Clustering. Clustering is understood as the division of a set of input signals into classes, despite the fact that neither the number nor the characteristics of the classes are known in advance. After training, such a network is able to determine which class the input signal belongs to. The network can also signal that the input signal does not belong to any of the selected classes - this is a sign of new data missing from the training sample. So such a network can detect new, previously unknown signal classes. The correspondence between the classes identified by the network and the classes that exist in the subject area is established by a person. Clustering is carried out, for example, by Kohonen neural networks.

Forecasting and approximation. The ability of a neural network to forecast follows directly from its ability to generalize and identify hidden dependencies between input and output data. After training, the network is able to predict the future value of a certain sequence based on several previous values and/or some currently existing factors. It should be noted that forecasting is possible only when previous changes do, to some extent, predetermine future ones. For example, predicting stock prices based on last week's prices may or may not be successful, while predicting tomorrow's lottery results based on data from the past 50 years will almost certainly fail.
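One common way of turning a sequence into training vectors for such forecasting is a sliding window: each training pair holds several previous values as inputs and the next value as the target. A minimal sketch (the window size and the sine-wave data are assumptions made for illustration):

```python
import numpy as np

def make_windows(series, window=3):
    """Split a 1-D sequence into (previous values, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # the last `window` observed values
        y.append(series[i + window])     # the value to be predicted
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 6, 40))   # illustrative sequence
X, y = make_windows(series, window=3)
print(X.shape, y.shape)                  # (37, 3) (37,)
```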

Data compression and associative memory. The ability of neural networks to identify relationships between various parameters makes it possible to express high-dimensional data more compactly if the data are closely interconnected. The reverse process - restoring the original data set from a piece of information - is called (auto)associative memory. Associative memory also allows the original signal/image to be restored from noisy/damaged input data. Solving the problem of heteroassociative memory makes it possible to implement content-addressable memory.

Stages of problem solving

    Data collection for training;

    Data preparation and normalization;

    Choice of network topology;

    Experimental selection of network characteristics;

    Experimental selection of training parameters;

    Actual training;

    Checking the adequacy of training;

    Parameter adjustment, final training;

    Verbalization of the network for further use.

Some of these steps should be considered in more detail.

Data collection for training

The choice of data for network training and their processing is the most difficult step in solving the problem. The training dataset must meet several criteria:

Representativeness - data should illustrate the true state of affairs in the subject area;

Consistency - inconsistent data in the training sample will lead to poor network training quality;

The initial data is converted to the form in which it can be fed to the inputs of the network. Each entry in the data file is called a training pair or a training vector. The training vector contains one value for each network input and, depending on the type of training (supervised or unsupervised), one value for each network output. Training a network on a "raw" data set, as a rule, does not give high-quality results. There are a number of ways to improve the network's "perception" of the data.

Normalization is performed when data of different scales are fed to different inputs. For example, the first input of the network receives values from zero to one, and the second from one hundred to one thousand. Without normalization, the values at the second input will always have a significantly greater impact on the network output than the values at the first input. Normalization brings the ranges of all input and output data to a common scale (a minimal sketch follows this list);

Quantization is performed on continuous quantities for which a finite set of discrete values is allocated. For example, quantization is used to set the frequencies of audio signals in speech recognition;

Filtering is performed for "noisy" data.
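A minimal min-max normalization sketch illustrating the first of these preprocessing steps, bringing inputs of very different scales to a common [0, 1] range (the column ranges below are invented for illustration):

```python
import numpy as np

# Two inputs on very different scales: roughly [0, 1] and [100, 1000].
X = np.array([[0.2, 150.0],
              [0.9, 800.0],
              [0.5, 430.0]])

x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)   # both columns now lie in [0, 1]

print(X_norm)
```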

In addition, the presentation of both input and output data plays an important role. Suppose the network is trained to recognize letters in images and has one numerical output - the number of the letter in the alphabet. In this case, the network will get the false impression that the letters numbered 1 and 2 are more similar than the letters numbered 1 and 3, which is generally not true. In order to avoid such a situation, a network topology with a large number of outputs is used, when each output has its own meaning. The more outputs in the network, the greater the distance between the classes and the more difficult it is to confuse them.
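The "one output per letter" representation described above is usually implemented as one-hot target vectors; a short sketch, assuming an alphabet of 33 letters as in the example:

```python
import numpy as np

N_LETTERS = 33

def one_hot(letter_index, n=N_LETTERS):
    """Target vector with 1 at the position of the letter and 0 elsewhere."""
    t = np.zeros(n)
    t[letter_index] = 1.0
    return t

print(one_hot(0)[:5])   # letter number 1 ("A"): [1. 0. 0. 0. 0.]
print(one_hot(2)[:5])   # letter number 3:       [0. 0. 1. 0. 0.]
# With this encoding, letters 1 and 2 are no "closer" to each other than letters 1 and 3.
```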

Choice of network topology. The type of network should be chosen based on the problem statement and the available training data. Supervised learning requires an "expert" assessment for each element of the sample. Sometimes obtaining such an assessment for a large amount of data is simply impossible. In these cases, a natural choice is an unsupervised learning network, such as a self-organizing Kohonen map or a Hopfield neural network. When solving other problems, such as time series forecasting, the expert judgment is already contained in the original data and can be extracted during processing. In this case, a multilayer perceptron or a Ward network can be used.

Experimental selection of network characteristics. After choosing the general structure, it is necessary to experimentally select the network parameters. For networks like the perceptron, these will be the number of layers, the number of blocks in the hidden layers (for Ward networks), the presence or absence of bypass connections, and the transfer functions of the neurons. When choosing the number of layers and neurons in them, one should proceed from the fact that the network's ability to generalize is higher the greater the total number of connections between neurons. On the other hand, the number of connections is bounded from above by the number of records in the training data.

Experimental selection of learning parameters. After choosing a specific topology, the training parameters of the neural network need to be selected. This step is especially important for supervised networks. The correct choice of parameters determines not only how quickly the network's responses converge to the correct answers, but also whether they converge at all. For example, choosing a low learning rate will increase the convergence time, but sometimes helps avoid network paralysis. Increasing the momentum term can lead to either an increase or a decrease in the convergence time, depending on the shape of the error surface. Given the contradictory influence of these parameters, their values should be chosen experimentally, guided by the learning completion criterion (for example, minimizing the error or limiting the training time).

The actual training of the network. During the learning process, the network scans the training sample in a certain order. The order of scanning can be sequential, random, etc. Some unsupervised networks, such as Hopfield networks, scan the sample only once. Others, such as Kohonen networks and supervised networks, scan the sample many times; one complete pass over the sample is called a training epoch. When learning with a teacher, the set of initial data is divided into two parts - the actual training sample and the test data; the principle of separation can be arbitrary. The training data is fed to the network for training, and the test data is used to calculate the error of the network (the test data is never used to train the network). Thus, if the error decreases on the test data, then the network is indeed generalizing. If the error on the training data continues to decrease while the error on the test data increases, then the network has stopped generalizing and is simply "memorizing" the training data. This phenomenon is called overtraining, or overfitting. In such cases, training is usually stopped. During the training process, other problems may appear, such as paralysis or the network falling into a local minimum of the error surface. It is impossible to predict in advance which particular problem will manifest itself, or to give unambiguous recommendations for resolving it.
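The train/test split and the "stop when the test error stops improving" rule can be sketched as follows; the logistic model, split sizes, and patience value are assumptions for illustration, and only the control flow matters here.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X @ np.array([1., -2., 0.5, 0., 1.]) > 0).astype(float)

# Arbitrary split: 150 records for training, 50 held out for testing.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w = np.zeros(5)
best_test_err, patience = np.inf, 0
for epoch in range(500):
    pred = 1 / (1 + np.exp(-(X_train @ w)))
    w -= 0.1 * X_train.T @ (pred - y_train) / len(y_train)   # one training epoch

    test_pred = 1 / (1 + np.exp(-(X_test @ w)))
    test_err = np.mean((test_pred - y_test) ** 2)            # error on held-out data
    if test_err < best_test_err:
        best_test_err, patience = test_err, 0
    else:
        patience += 1
    if patience >= 10:   # test error stopped improving: the network has begun to overfit
        break

print(epoch, round(best_test_err, 4))
```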

Checking the adequacy of training. Even when training appears successful at first glance, the network does not always learn exactly what its creator wanted. There is a well-known case in which a network was trained to recognize images of tanks from photographs, but it later turned out that all the tanks had been photographed against the same background. As a result, the network "learned" to recognize this type of terrain instead of "learning" to recognize tanks. Thus, the network "understands" not what was required of it, but whatever is easiest to generalize.

Classification by type of input information

Analog neural networks (use information in the form of real numbers);

Binary neural networks (operate with information presented in binary form).

Classification by the nature of training

Supervised learning - the output decision space of the neural network is known;

Unsupervised learning - a neural network generates an output decision space only based on input actions. Such networks are called self-organizing;

Reinforcement learning is a system for assigning penalties and rewards from the environment.

Classification according to the nature of synapse tuning

Networks with fixed connections (the weight coefficients of the neural network are selected immediately, based on the conditions of the problem, so that dW/dt = 0, where W are the weight coefficients of the network);

networks with dynamic connections (for them, synaptic connections are adjusted during the learning process, that is, dW/dt ≠ 0, where W are the weight coefficients of the network).

Classification by signal transmission time

In a number of neural networks, the activating function may depend not only on the weight coefficients of the connections w_ij, but also on the time of transmission of an impulse (signal) over the communication channels τ_ij. Therefore, in general, the activating (transmitting) function of the connection c_ij from element u_i to element u_j has the form c_ij(w_ij, τ_ij). A network is called synchronous if the transmission time τ_ij of each connection is either zero or a fixed constant τ. A network is called asynchronous if the transmission time τ_ij for each connection between elements u_i and u_j is its own, but also constant.

Classification by the nature of relationships

Feedforward networks

All connections are directed strictly from input neurons to output neurons. Examples of such networks are the Rosenblatt perceptron, the multilayer perceptron, and Ward networks.

Recurrent Neural Networks

The signal from the output neurons or hidden-layer neurons is partially transmitted back to the inputs of the input-layer neurons (feedback). The Hopfield recurrent network "filters" the input data by returning to a stable state and thus makes it possible to solve problems of data compression and building associative memory. A special case of recurrent networks is bidirectional networks. In such networks, there are connections between layers both in the direction from the input layer to the output layer and in the opposite direction. A classic example is the Kosko neural network.

Radial basis functions

Artificial neural networks that use radial basis functions as activation functions (such networks are abbreviated as RBF networks). The general form of a radial basis function is

φ(x) = φ(‖x − c‖), for example the Gaussian φ(r) = exp(−r² / (2σ²)),

where x is the vector of the neuron's input signals, c is the center of the function, σ is the width of the function window, and φ(y) is a decreasing function (most often equal to zero outside a certain segment).

A radial basis network is characterized by three features:

The only hidden layer

Only neurons in the hidden layer have a non-linear activation function

The synaptic weights of the connections of the input and hidden layers are equal to one

For the training procedure, see the literature.
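A minimal sketch of a Gaussian radial basis unit with a linear output layer, consistent with the three features listed above and assuming the Gaussian form of φ given earlier; the centers, width, and output weights are illustrative values, not a trained network.

```python
import numpy as np

def gaussian_rbf(x, center, sigma):
    """phi(||x - c||) = exp(-||x - c||^2 / (2 * sigma^2)); decays away from the center."""
    r = np.linalg.norm(x - center)
    return np.exp(-r ** 2 / (2 * sigma ** 2))

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # hidden-layer centers (illustrative)
sigma = 0.5
x = np.array([0.9, 1.1])                        # input vector, passed to the hidden layer unchanged

hidden = np.array([gaussian_rbf(x, c, sigma) for c in centers])  # single non-linear hidden layer
weights = np.array([0.3, 0.7])                                   # linear output layer
print(hidden, hidden @ weights)
```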

Self-organizing maps. Such networks are competitive neural networks with unsupervised learning that perform the tasks of visualization and clustering. They are a method of projecting a multidimensional space into a space of lower dimension (most often two-dimensional) and are also used to solve problems of modeling, forecasting, etc. This is one of the versions of Kohonen's neural networks. Kohonen's self-organizing maps serve primarily for visualization and initial ("reconnaissance") data analysis.

The signal in a Kohonen network is fed to all neurons at once, the weights of the corresponding synapses are interpreted as the coordinates of the node's position, and the output signal is formed according to the "winner takes all" principle: the neuron whose synapse weights are closest to the input signal has a non-zero output signal. During the learning process, the synapse weights are adjusted in such a way that the lattice nodes "settle" in the places of local data concentration, that is, they describe the cluster structure of the data cloud; on the other hand, the connections between neurons correspond to the neighborhood relations between the corresponding clusters in the feature space.

It is convenient to think of such maps as two-dimensional grids of nodes located in a multidimensional space. Initially, a self-organizing map is a grid of nodes connected by links. Kohonen considered two options for connecting nodes - in a rectangular and in a hexagonal grid; the difference is that in a rectangular grid each node is connected to its 4 neighbors, while in a hexagonal grid it is connected to its 6 nearest nodes. For the two grids, the process of constructing a Kohonen network differs only in how the nearest neighbors of a given node are traversed.

The initial embedding of the grid in the data space is chosen arbitrarily. The author's SOM_PAK package offers an option with a random initial placement of the nodes in space and an option with placement of the nodes in a plane. After that, the nodes begin to move in space according to the following algorithm:

A data point x is selected at random.

The map node nearest to x is determined (the BMU - Best Matching Unit).

This node is moved a given step towards x. However, it does not move alone: it carries along a certain number of the nearest nodes from some neighborhood on the map. Of all the moving nodes, the central node, the one closest to the data point, moves the most, while the rest experience smaller displacements the farther they are from the BMU. There are two stages in map tuning: the rough (ordering) stage and the fine-tuning stage. At the first stage, large neighborhood values are selected and the movement of the nodes is collective in nature; as a result, the map "spreads out" and roughly reflects the data structure. At the fine-tuning stage, the neighborhood radius is 1-2 and the individual positions of the nodes are adjusted. In addition, the displacement step decays uniformly with time, that is, it is large at the beginning of each training stage and close to zero at the end.

The algorithm repeats for a certain number of epochs (it is clear that the number of steps can vary greatly depending on the task).
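A condensed sketch of the node-update loop just described; the grid size, learning-rate schedule, and neighborhood radius are illustrative choices rather than the settings of SOM_PAK.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))        # data cloud in a 2-D feature space
nodes = rng.normal(size=(10, 10, 2))    # 10x10 grid of nodes embedded in that space

grid_y, grid_x = np.mgrid[0:10, 0:10]   # node coordinates on the map itself

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)                 # step size decays towards zero
    radius = max(1.0, 5.0 * (1 - epoch / 20))   # rough ordering first, fine-tuning later
    for x in data[rng.permutation(len(data))]:
        # 1. find the Best Matching Unit (the node closest to the data point)
        d = np.linalg.norm(nodes - x, axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)
        # 2. move the BMU and its map neighbors towards x; nearer nodes move more
        grid_dist2 = (grid_y - by) ** 2 + (grid_x - bx) ** 2
        h = np.exp(-grid_dist2 / (2 * radius ** 2))
        nodes += lr * h[..., None] * (x - nodes)
```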

Known network types: Rosenblatt Perceptron; Multilayer Perceptron; Jordan Network; Elman Network; Hamming Network; Ward Network; Hopfield Network; Kohonen Network; Cognitron; Neocognitron; Chaotic Neural Network; Oscillatory Neural Network; Counterpropagation Network; Generalized regression network; Probabilistic network; Siamese neural network; Adaptive resonance networks.

Algorithms for training neural networks.

Backpropagation

Quickprop

Conjugate gradient method

Levenberg-Marquardt algorithm

Quasi-Newton algorithm

Delta-bar-delta

Kohonen's algorithm

LVQ (learning vector quantization)

Pseudo-inverse method (singular value decomposition)

K-means method

Algorithms for setting deviations

To train a neural network means to tell it what we want to get from it. This process is very similar to teaching a child the alphabet. Showing the child a picture of the letter "A", we ask: "What letter is this?" If the answer is wrong, we tell the child the answer we would like to receive: "This is the letter A." The child remembers this example along with the correct answer; that is, some changes occur in his memory in the right direction. We repeat the process of presenting the letters again and again until all 33 letters are firmly remembered. This process is called "supervised learning".

When training a neural network, we act in exactly the same way. We have a database containing examples (a set of handwritten images of letters). Presenting the image of the letter "A" to the input of the neural network, we get some answer from it, which is not necessarily correct. We also know the correct (desired) answer - in this case, we would like the signal level at the output of the neural network labeled "A" to be maximal. Usually the set (1, 0, 0, ...) is taken as the desired output in a classification problem, where 1 is at the output labeled "A" and 0 is at all other outputs. Calculating the difference between the desired response and the actual response of the network, we get 33 numbers - the error vector. The error backpropagation algorithm is a set of formulas that makes it possible to calculate the required corrections to the neural network's weights from the error vector. We can present the same letter (as well as different images of the same letter) to the neural network many times. In this sense, training is more like repeated exercise in sports.
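The error vector and the weight correction it drives can be sketched for a single sigmoid output layer; the update formula below is the usual output-layer delta rule, shown as an assumption since the text only names backpropagation, and the corrections for hidden layers are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(900)                           # a 30x30 image flattened into 900 inputs
W = rng.normal(scale=0.01, size=(33, 900))    # weights of the output layer

desired = np.zeros(33)
desired[0] = 1.0                              # desired answer: the letter "A"

y = 1 / (1 + np.exp(-(W @ x)))                # actual network response
error = desired - y                           # the 33-component error vector

lr = 0.1
W += lr * np.outer(error * y * (1 - y), x)    # correction derived from the error vector
```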

It turns out that after repeated presentation of examples, the weights of the neural network stabilize, and the neural network gives correct answers to all (or almost all) examples from the database. In this case, we say that "the neural network has learned all the examples" or "the neural network is trained". In software implementations, it can be seen that during the learning process the error value (the sum of squared errors over all outputs) gradually decreases. When the error value reaches zero or an acceptably low level, training is stopped, and the resulting neural network is considered trained and ready for use on new data.

It is important to note that all the information the neural network has about the task is contained in the set of examples. Therefore, the quality of neural network training directly depends on the number of examples in the training set, as well as on how fully these examples describe the task. For example, it makes no sense to use a neural network to predict a financial crisis if there are no crises in the training sample. It is believed that full-fledged training of a neural network requires at least a few dozen (or better, hundreds of) examples.

We repeat once again that the training of neural networks is a complex and science-intensive process. Neural network training algorithms have various parameters and settings that require an understanding of their influence to control.

Application of a neural network

Once the neural network is trained, we can apply it to solve useful problems. The most important feature of the human brain is that, once having learned a certain process, it can act correctly in those situations in which it has not been in the learning process. For example, we can read almost any handwriting, even if we see it for the first time in our lives. Similarly, a neural network, properly trained, can most likely respond correctly to new data that has not been presented to it before. For example, we can draw the letter "A" in a different handwriting and then ask our neural network to classify the new image. The weights of the trained neural network store a lot of information about the similarities and differences of letters, so you can count on the correct answer for the new version of the image.

Application of neural networks for pattern recognition problems.

Handwriting recognition task

Given: 30x30 pixel black and white raster image of a letter

Necessary: determine what letter it is (there are 33 letters in the alphabet)

Formulation for the neural network:

Given: input vector of 900 binary characters (900=30x30)

Necessary: build a neural network with 900 inputs and 33 outputs labeled with letters. If the input of the neural network is an image of the letter "A", then the maximum value of the output signal is reached at the output "A". The neural network works similarly for all 33 letters.

Let us explain why it is required to choose the output of the neural network with the maximum signal level. The fact is that the level of the output signal, as a rule, can take any value from some segment. However, in this problem, we are not interested in the analog answer, but only in the category number (number of the letter in the alphabet). Therefore, the following approach is used - each category is assigned its own output, and the response of the neural network is the category at whose output the signal level is maximum. In a certain sense, the signal level at the output "A" is the certainty that the handwritten letter "A" was fed to the input of the neural network. Tasks in which it is necessary to assign input data to one of the known categories are called classification tasks. The outlined approach is a standard way to classify using neural networks.
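This formulation maps directly onto a small two-layer perceptron with 900 inputs and 33 outputs; in the sketch below the hidden-layer size and the random (untrained) weights are placeholders, and only the shape of the computation and the argmax readout follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 900, 64, 33      # 900 = 30x30 pixels, 33 letter classes

W1 = rng.normal(scale=0.05, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.05, size=(n_out, n_hidden))

def classify(image):
    """Return the number of the letter whose output has the maximum signal level."""
    x = image.reshape(-1).astype(float)      # binary 30x30 image -> 900-component vector
    h = np.tanh(W1 @ x)                      # hidden layer
    out = W2 @ h                             # 33 output signals, one per letter
    return int(np.argmax(out))               # category = output with the largest signal

image = rng.integers(0, 2, size=(30, 30))
print(classify(image))                       # letter index between 0 and 32
```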

How to build a neural network. Now that it has become clear what exactly we want to build, we can move on to the question "how to build such a neural network". This issue is resolved in two stages:

Choosing the type (architecture) of the neural network.

Selection of weights (training) of the neural network.

The first step is to choose the following:

which neurons we want to use (number of inputs, transfer functions);

how should they be connected to each other;

what to take as inputs and outputs of the neural network.

At first glance, this task seems overwhelming, but, fortunately, we do not have to invent a neural network from scratch - there are several dozen different neural network architectures, and the effectiveness of many of them has been mathematically proven. The most popular and studied architectures are multilayer perceptron, general regression neural network, Kohonen neural networks and others.

At the second stage, we should "train" the selected neural network, that is, select such values ​​of its weights so that it works as needed. An untrained neural network is like a child - it can be taught anything. In neural networks used in practice, the number of weights can be several tens of thousands, so training is a really complex process. For many architectures, special learning algorithms have been developed that allow you to adjust the weights of the neural network in a certain way. The most popular of these algorithms is the Error Back Propagation method, used, for example, to train a perceptron.

The tasks successfully solved by neural networks at this stage of their development include:

recognition of visual and auditory images - a huge range of applications, from text recognition and targets on a radar screen to voice control systems;

associative information search and creation of associative models; speech synthesis; natural language formation;

formation of models of various non-linear and difficult-to-describe mathematical systems and forecasting the development of these systems in time:

application in production; forecasting the development of cyclones and other natural processes; forecasting changes in exchange rates and other financial processes;

predictive control and regulation systems; control of robots and other complex devices;

various finite automata: queuing and switching systems, telecommunication systems;

decision making and diagnostics in cases where logical inference is excluded, especially in areas where there are no clear mathematical models: in medicine, forensics, the financial sector;

Although there are effective mathematical methods for solving almost all of these problems, and despite the fact that neural networks lose to specialized methods for specific problems, due to their universality and promise for solving global problems, for example, building AI and modeling the thinking process, they are an important area of ​​research that requires careful study.

From points on a plane and connections between them, you can build a lot of graphic shapes called graphs. If we think of each point as a single neuron, and the connections between the points as dendrites and synapses, then we get a neural network.

But not every connection of neurons will be efficient or generally expedient. Therefore, today there are only a few working and programmatically implemented neural network architectures. I will only briefly describe their device and the classes of tasks they solve.

According to the architecture of connections, neural networks can be grouped into two classes: feedforward networks, in which the connections have no loops (Figure 2), and recurrent networks, in which feedback connections are possible (Figure 3).

Figure 2 Feedforward Neural Networks

Figure 3 Recurrent neural networks

Feedforward networks are subdivided into single-layer perceptrons (networks) and multilayer perceptrons (networks). The name of the perceptron for neural networks was invented by the American neurophysiologist F. Rosenblatt, who in 1957 invented the first neuroprocessor element (NPE), that is, a neural network. He also proved the convergence of the decision domain for the perceptron during its training. Immediately after this, vigorous research began in this area and the very first Mark I neurocomputer was created.

Multilayer networks differ in that there are several so-called hidden layers of neurons between the input and output data, which add more nonlinear connections to the model.

Consider the structure of the simplest multilayer neural network. Any neural network consists of an input layer and an output layer, to which the independent and dependent variables are presented, respectively. The input data is transformed by the neurons of the network and compared with the target output. If the deviation is greater than the specified value, the weights of the connections between neurons and the threshold values of the neurons are changed in a special way. Then the process of calculating the output value and comparing it with the reference takes place again. If the deviations become less than the specified error, the learning process is terminated.

In addition to the input and output layers in a multilayer network, there are so-called hidden layers. They are neurons that do not have direct inputs of initial data, but are connected only with the outputs of the input layer and with the input of the output layer. Thus, the hidden layers further transform the information and add non-linearities to the models. To better understand the structure of a multilayer perceptron, see Figure 4

Figure 4 Multilayer Perceptron

A single-layer neural network copes well with simple classification tasks: the output layer of neurons compares the values obtained from the previous layer with a threshold and outputs either zero (below the threshold) or one (above the threshold), for the case of a threshold internal function of the neuron. However, it is not able to solve most practical problems (as Minsky and Papert proved), whereas a multilayer perceptron with sigmoid decision functions is able to approximate any functional dependence (this was proved in the form of a theorem). At the same time, neither the required number of layers, nor the required number of hidden neurons, nor the time required to train the network is known in advance. Researchers and developers of neural networks still face these problems. Personally, it seems to me that all the enthusiasm for the use of neural networks rests precisely on the proof of this theorem. Later I will show how neurons can model various classes of functions, though I do not claim to give a complete proof.



The class of recurrent neural networks is much more extensive, and the networks themselves are more complex in their structure.

The behavior of recurrent networks is described by differential or difference equations, usually of the first order. This greatly expands the scope of neural networks and how to train them. The network is organized in such a way that each neuron receives input from other neurons, possibly both from itself and from the environment. This type of network is important because it can be used to model non-linear dynamic systems.

Recurrent networks include Hopfield networks and Kohonen networks.

Hopfield networks can be used to process unordered patterns (handwritten letters), patterns ordered in time (time series), or patterns ordered in space (graphs). A recurrent neural network of the simplest kind was introduced by Hopfield; it is built from N neurons, each connected to every other neuron but not to itself, and all neurons are output neurons. The Hopfield neural network can be used as associative memory. The Hopfield network architecture is shown in Figure 5.

Figure 5 Hopfield Network Architecture

The Kohonen network is also called a "self-organizing feature map". A network of this type is designed for independent learning during training, it is not necessary to tell it the correct answers. During the learning process, various samples are fed to the input of the network. The network captures the features of their structure and divides the samples into clusters, and the already trained network assigns each newly incoming example to one of the clusters, guided by some "proximity" criterion. The network consists of one input and one output layer. The number of elements in the output layer directly determines how many different clusters the network can recognize. Each of the output elements receives the entire input vector as input. As in any neural network, each connection is assigned a certain synaptic weight. In most cases, each output element is also connected to its neighbors. These intralayer connections play an important role in the learning process, since the weights are adjusted only in the vicinity of the element that best responds to the next input. The output elements compete with each other for the right to take action and "learn the lesson". The one whose weight vector is closest to the input vector wins.

Neural networks can be classified as follows:

The nature of learning

The classification of neural networks according to the nature of learning divides them into:

  • neural networks using supervised learning;
  • neural networks using unsupervised learning.

Let's consider this in more detail.

Neural networks using supervised learning. Supervised learning assumes that for each input vector there is a target vector representing the required output. Together they are called a training pair. Typically, the network is trained on a certain number of such training pairs. An input vector is presented, the network output is computed and compared with the corresponding target vector. Next, the weights are changed in accordance with an algorithm that seeks to minimize the error. The vectors of the training set are presented sequentially, the errors are calculated, and the weights are adjusted for each vector until the error over the entire training array reaches an acceptable level.

Neural networks using unsupervised learning. Unsupervised learning is a much more plausible learning model in terms of the biological roots of artificial neural networks. Developed by Kohonen and many others, it does not need a target vector for the outputs and therefore does not require comparison with predefined ideal responses. The training set consists of only input vectors. The learning algorithm adjusts the weights of the network so that consistent output vectors are obtained, i.e., that the presentation of sufficiently close input vectors gives the same outputs. The learning process therefore extracts the statistical properties of the training set and groups similar vectors into classes.

Scale setting

  • networks with fixed connections - the weight coefficients of the neural network are selected immediately, based on the conditions of the problem;
  • networks with dynamic connections - for them, in the learning process, synaptic weights are adjusted.

Type of input information

  • analog - input information is presented in the form of real numbers;
  • binary - all input information in such networks is represented as zeros and ones.

Applied neural network model

Feedforward networks - all connections are directed strictly from input neurons to output neurons. Such networks include, for example: the simplest perceptron (developed by Rosenblatt) and the multilayer perceptron.

Recurrent neural networks - the signal from the output neurons or neurons of the hidden layer is partially transmitted back to the inputs of the input layer neurons.

Radial basis functions are a type of neural network that has a hidden layer of radial elements and an output layer of linear elements. Networks of this type are quite compact and are trained quickly. Proposed by Broomhead and Lowe (1988) and Moody and Darken (1989). The radial basis network has the following features: one hidden layer, only the neurons of the hidden layer have a non-linear activation function, and the synaptic weights of the input and hidden layers are equal to one.

Self-organizing maps or Kohonen networks - such a class of networks, as a rule, is trained without a teacher and is successfully used in recognition problems. Networks of this class are able to detect novelty in the input data: if after training the network encounters a data set that is unlike any of the known samples, then it will not be able to classify such a set and thereby reveal its novelty. The Kohonen network has only two layers: input and output, composed of radial elements.

Lecture #4

Topology of neural networks.

Neural networks, from the point of view of the topological section, can be divided into 3 types:

1. Fully connected networks.

In a fully connected artificial neural network, each neuron transmits its output signal to the other neurons and to itself. All input signals are transmitted to all neurons. The output signals of the network can be all or several of the neurons' output signals after a certain number of cycles of network operation.

2. Multilayer networks (layered).

They consist of neurons combined into layers; a layer contains a set of neurons with common input signals. The number of layers and the number of neurons in each layer can be arbitrary and is not tied in advance to the number of neurons in other layers. However, it is limited by the resources of the PC or the specialized microcircuit on which the neural network is usually implemented.

If the network consists of Q layers, they are numbered from left to right. External inputs are applied to the inputs of the neurons of the first layer; the input layer is often numbered as the zero layer, and no summation or signal conversion is performed in it.

The outputs of the network are the output signals of the last layer. In addition to the input and output layers, a multilayer neural network has one or more intermediate layers called hidden layers.

A neural network with hidden layers makes it possible to identify global relationships in the data due to the presence of additional synaptic connections and an increased level of interaction between neurons.

3. Weakly connected networks (networks with local connections).

Multilayer neural networks are divided into the following types:

  1. Monotone neural networks.

These are neural networks that are a special case of multilayer networks with additional conditions on connections and elements. Each layer of the network, except for the output layer, is divided into two blocks: A) excitatory and B) inhibitory.

Connections between blocks are also divided into inhibitory and excitatory. If there are only excitatory connections from block A to block B, this means that any output signal of block B is a monotonically non-decreasing function of any output signal of block A; if these connections are only inhibitory, then any output signal of block B is a monotonically non-increasing function of any output signal of block A. It is important that the elements of monotone networks require a monotone dependence of the element's output signal on the parameters of the input signals.

2. Neural networks without feedback

In these networks, the neurons of the input layer, having received the input signals, transform them and transmit them to the neuron of the first hidden layer, then the first hidden layer fires, and so on, up to the Qth layer, which produces output signals.



The classic variant of multilayer networks are feed-forward networks, which are called multilayer perceptrons. Over 80% of neural network applications belong to multilayer networks without feedback.

Fig. 1

Neural networks with feedback. In these networks, information from subsequent layers is transmitted back to previous layers.

The concept of feedback is typical for dynamic networks in which the output signal of some element of the system affects the input signal of this element.

Thus, some external signals are amplified by signals circulating within the system. In fact, feedback is present in the nervous system of almost any animal. It plays an important role in the study of a special class of neural networks called recurrent. These networks are built from dynamic neurons whose behavior is described by differential or difference equations, usually of the first order.

Neural networks with feedback include, for example, Elman networks (Fig. 2) and Jordan networks (Fig. 3).

Fig. 3. Jordan network

It should be noted that the problem of synthesizing an artificial neural network is highly dependent on the problem being solved.

There is no formal algorithm for determining the required architecture.

Often the optimal version of the neural network can be obtained by intuitive selection. In practice, they often choose either a deliberately small neural network and gradually increase it, or a deliberately large one and gradually reduce it, revealing unused connections.

  1. Training of neural networks.

The neural network is an adaptive system.

Its cycle consists of 2 phases: learning (training) and network operation.

Thus, a neural network, before being used in practice to solve any problem, must be trained. The ability to learn from environmental data and, as a result of learning, to improve its performance is the most important property of neural networks. How well the network solves the problems posed to it during the operation phase depends on how well the training phase of the neural network is carried out.

Learning theory considers 3 fundamental properties associated with learning a neural network by example:

1) Capacity - it determines how many images the network can remember and what functions and decision boundaries can be formed on it.

2) The complexity of the images - it determines the number of training examples needed to achieve the ability of the neural network to generalize.

3) Computational complexity - an important characteristic is the time spent on training. As a rule, training time and training quality are inversely related. These parameters must be chosen by compromise.

There are many activities associated with the concept of learning. In this regard, it is difficult to give this process an unambiguous definition.

From the perspective of a neural network, the following definition can be used:

Learning is a process in which the free parameters of a neural network are adjusted by means of stimulation from the environment in which the network is embedded. The type of learning is determined by how these parameters are adjusted. This definition of the learning process assumes the following sequence of events:

A) The neural network receives stimuli from the external environment

B) As a result, the free parameters of the neural network change

C) After changing its internal structure, the neural network responds to excitation in a different way.

A defined set of clear rules for solving a learning problem is called a learning algorithm. There is no universal learning algorithm suitable for all neural network architectures.

Learning algorithms differ from each other in the way they adjust the synaptic weights and thresholds of neurons. A distinguishing characteristic is the way the trained neural network communicates with the outside world. In this context, one speaks of a learning paradigm associated with the model of the environment in which the given neural network operates.

The set of learning algorithms is divided into two classes: deterministic and stochastic (probabilistic). In the first, the adjustment of the synaptic weights of neurons is a rigid sequence of actions; in the second, it is based on actions that are subject to some random process.

Neural Network Training Paradigms

There are 3 training paradigms for neural networks:

1) Supervised learning (learning with a teacher)

2) Unsupervised learning (Self-learning)

3) Mixed (With and without a teacher)

Learning with a teacher

Most neural network models provide for the presence of a teacher. A teacher can be understood as a set of training data (training set) or an external observer who determines the value of the output.

Supervised neural networks are tools for extracting information about the relationships between the outputs and inputs of a neural network from a data set. The quality of the neural network depends on the set of training data presented to it during the training process, while the training data should be typical for the task the neural network is learning to solve.

The data commonly used to train a neural network usually falls into two categories: some data is used for training and the rest for testing. Therefore, the quality of network training directly depends on the number of examples in the training set and on how well these examples describe the problem being solved.

A neural network is a collection of neuron-like elements connected in a certain way to each other and to the external environment using connections determined by weight coefficients. Depending on the functions performed by neurons in the network, three types can be distinguished:

1. Input neurons, to which a vector encoding an input action or an image of the external environment is supplied, they usually do not carry out computational procedures, and information is transmitted from input to output by changing their activation;

2. Output neurons, the output values ​​of which represent the outputs of the neural network; transformations in them are carried out by expressions. They carry an important function of bringing the value of the network output to the required interval (this is done using the activation function);

3. Intermediate neurons, which form the basis of neural networks, transformations in which are also performed by expressions.

In most neural models, the type of a neuron is related to its location in the network. If a neuron has only output connections, then it is an input neuron, if vice versa, it is an output neuron. However, it is possible that the output of a topologically internal neuron is considered as part of the output of the network. During the operation of the network, the input vector is transformed into an output vector, some processing of information is carried out. The specific type of data transformation performed by the network is determined not only by the characteristics of neuron-like elements, but also by the features of its architecture, namely, the topology of interneuronal connections, the choice of certain subsets of neuron-like elements for input and output of information, the methods of network training, the presence or absence of competition between neurons, the direction and methods control and synchronization of information transfer between neurons.

From the point of view of topology, three main types of neural networks can be distinguished: 1) fully connected (Fig. 4, a); 2) multilayer or layered (Fig. 4, b); 3) weakly connected (with local connections) (Fig. 4, c).

Fig. 4. Architectures of neural networks: a - fully connected network; b - multilayer network with serial connections; c - loosely connected networks

In fully connected neural networks, each neuron transmits its output signal to other neurons, including itself. All input signals are fed to all neurons. The output signals of the network can be all or some of the output signals of neurons after several clock cycles of the network.

In multilayer neural networks, neurons are combined into layers. The layer contains a set of neurons with common input signals. The number of neurons in a layer can be any and does not depend on the number of neurons in other layers. In general, the network consists of Q layers numbered from left to right. External input signals are fed to the inputs of the neurons of the input layer, and the outputs of the network are the output signals of the last layer. In addition to the input and output layers, a multilayer neural network has one or more hidden layers. Within one layer, the same activation function is used.

In loosely coupled neural networks, neurons are located at the nodes of a rectangular or hexagonal lattice. Each neuron is connected to four (Von Neumann neighborhood), six (Golay neighborhood), or eight (Moore neighborhood) of its nearest neighbors.

1.2. The history of the emergence of artificial neural networks

As a scientific subject, artificial neural networks first announced themselves in the 1940s. In an effort to reproduce the functions of the human brain, researchers created simple hardware (and later software) models of the biological neuron and its system of connections, which were called perceptrons. As neurophysiologists gained a deeper understanding of the human nervous system, these early attempts came to be seen as very crude approximations. Nevertheless, it was on perceptrons that the first impressive results were achieved, stimulating further research that led to the creation of more sophisticated networks. The first systematic study of artificial neural networks was undertaken by McCulloch and Pitts in 1943. The simple neural model shown in the figure below was used in most of their work. Each input receives only a binary signal, i.e. either 0 or 1. The element ∑ multiplies each input X_N by the weight W_N and sums the weighted inputs. If this sum is greater than the specified threshold value, the output is equal to one; otherwise it is equal to zero (Fig. 5).



Fig. 5. Simple neural model
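The McCulloch-Pitts element just described is easy to write out directly; the weights and threshold below are arbitrary example values.

```python
import numpy as np

def mcculloch_pitts(x, w, threshold):
    """Weighted sum of binary inputs; output 1 if the sum exceeds the threshold, else 0."""
    return 1 if np.dot(x, w) > threshold else 0

x = np.array([1, 0, 1])          # binary input signals
w = np.array([0.5, 0.5, 0.5])    # weights W_N
print(mcculloch_pitts(x, w, threshold=0.7))   # 1: the weighted sum 1.0 exceeds 0.7
```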

Such systems and many others like them are called perceptrons. Perceptrons consist of a single layer (i.e., the number of layers of neurons between input X and output OUT is one) of artificial neurons connected by weights to multiple inputs (see Fig. 6).


Fig. 6. An example of a perceptron

The circle vertices on the left side of the figure serve only to distribute the input signals. They do not perform any calculations and are therefore not considered a layer. For this reason, they are indicated as a circle to distinguish them from the calculating neurons (adders), indicated by squares.

Perceptron theory is the basis for many other types of artificial neural networks, and perceptrons themselves are a logical starting point for the study of artificial neural networks.

Frank Rosenblatt proposed a scheme for a device that simulates the process of human perception and called it the "perceptron". The perceptron transmitted signals from photocells, which formed a sensory field, to blocks of electromechanical memory cells. These cells were connected to each other at random, in accordance with the principles of connectionism. In 1957, at the Cornell Aeronautical Laboratory, the simulation of the perceptron on the IBM 704 computer was successfully completed, and two years later, on June 23, 1960, the first neurocomputer, the Mark-1, which was able to recognize some letters of the English alphabet, was demonstrated at Cornell University. To "teach" the perceptron to classify images, a special iterative trial-and-error learning method was developed, reminiscent of the human learning process - the error correction method. In addition, when recognizing a particular letter, the perceptron could highlight the characteristic features of the letter that are statistically more common than the insignificant differences between individual cases. In this way, the perceptron was able to generalize letters written in various ways (handwriting) into one generalized image. However, the capabilities of the perceptron were limited: the machine could not reliably recognize partially occluded letters, or letters of a different size, shifted or rotated relative to those used at the stage of its training.

1.3 Scope of the perceptron

Currently, the principles of perceptrons are used:

· in the construction of special technical devices;

· in computer programs that simulate the operation of perceptrons in the modes of learning and recognition of visual (handwritten text, drawings, portraits), auditory and other images;

· in medical and technical diagnostics;

· in the interpretation of geophysical data and aerial photography;

· in weather forecasting;

· in the improvement of industrial robots.

1.4 Perceptron training

The perceptron is a device that is able to "remember" ("learn") which image belongs to which class. After such "training" it should, ideally, be able to correctly "recognize" other images that were not included in the training set but are sufficiently similar to them, or to report that an image does not resemble any of the training images. The degree of "sufficient similarity" depends on how successfully the set of features is chosen, while the perceptron's ability to "learn" depends on the separability of the feature sets, that is, on whether each class has a unique set of features (in other words, whether the regions bounding the classes intersect).

Among all the interesting properties of artificial neural networks, none captures the imagination as much as their ability to learn. Their training resembles the process of intellectual development of the human personality to such an extent that it may seem we have achieved a deep understanding of this process. Yet the ability to train artificial neural networks is limited, and many difficult problems remain to be solved before we can determine whether we are on the right track.

The purpose of training

The network is trained to give the desired (or at least consistent) set of outputs for some set of inputs. Each such input (or output) set is treated as a vector. Training is carried out by successively presenting input vectors while adjusting the weights according to a certain procedure. During the learning process the weights of the network gradually become such that each input vector produces the required output vector.

There are supervised, unsupervised and mixed learning algorithms.

· Supervised learning assumes that for each input vector there is a target vector representing the required output; together they are called a training pair. Typically, the network is trained on a certain number of such training pairs. An input vector is presented, the output of the network is computed and compared with the corresponding target vector, the difference (error) is fed back into the network, and the weights are modified according to an algorithm that seeks to minimize the error. The vectors of the training set are presented sequentially, the errors are calculated, and the weights are adjusted for each vector until the error over the entire training set reaches an acceptably low level (a minimal sketch of such a loop is given after this list).

· Unsupervised learning is a much more plausible learning model for a biological system; it needs no target vector for the outputs and therefore requires no comparison with predefined ideal responses. The training set consists only of input vectors. The learning algorithm adjusts the network weights so that consistent output vectors are obtained, i.e. so that the presentation of sufficiently close input vectors gives the same outputs. The learning process therefore extracts the statistical properties of the training set and groups similar vectors into classes. Presenting a vector from a given class as input will produce a certain output vector, but before learning it is impossible to predict which output will be produced by a given class of input vectors. The outputs of such a network must therefore be transformed into some comprehensible form determined by the learning process. This is not a serious problem: it is usually not difficult to identify the connection between input and output established by the network.

· In mixed learning, part of the weights is determined through supervised learning, while the other part is obtained using self-learning algorithms.
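A minimal sketch of the supervised loop described above (training pairs presented one by one, the error fed back, the weights adjusted) might look as follows in Python; the simple correction rule and the learning-rate value are our assumptions, not something fixed by the text.

def train_supervised(pairs, weights, learning_rate=0.1):
    # One epoch: present each (input vector, target) pair and nudge the weights
    # in the direction that reduces the output error.
    for inputs, target in pairs:
        output = 1 if sum(x * w for x, w in zip(inputs, weights)) > 0 else 0
        error = target - output                       # the difference fed back into the network
        for j, x in enumerate(inputs):
            weights[j] += learning_rate * error * x
    return weights

# Example: two training pairs, presented repeatedly over several epochs.
pairs = [([1, 0, 1], 1), ([0, 1, 1], 0)]
weights = [0.0, 0.0, 0.0]
for _ in range(10):
    weights = train_supervised(pairs, weights)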

2. Perceptron programming

2.1 Perceptron operation algorithm

Perceptron training consists in adjusting the weight coefficients w_i, where i = 0, 1, …, n. After training, the perceptron must divide all images offered to it for recognition into two classes: for one class the output will be zero, for the other it will be one.

Step 1. Using a random number generator, assign some small random values to all synaptic weights w_j (j = 1, …, n) and to the neuron's sensitivity threshold.

Step 2. Present the perceptron with a "cross" or a "naught".

Step 3. The neuron performs a weighted summation of the input signals, S = w_1·x_1 + w_2·x_2 + … + w_n·x_n, and produces an output signal y = 1 if S exceeds the threshold, otherwise y = 0.

Step 4a. If the output signal is correct, go to Step 2.

Step 4b. If the output signal is incorrect and equals zero, increase the weights of the active inputs: add the value of the j-th input signal to the j-th synaptic weight.

Step 4c. If the output signal is incorrect and equals one, decrease the weights of the active inputs: subtract the value of the j-th input signal from the j-th synaptic weight.

Step 5. Go to Step 2 or terminate the learning process.
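As a hedged sketch, Steps 1-5 can be written out in Python as follows (the range of the random initialization and the fixed number of presentations used as a stopping rule are our assumptions; the text does not specify them).

import random

def train_perceptron(samples, n_inputs, presentations=1000):
    # samples: list of (binary input vector, correct answer) pairs,
    # where the correct answer is 1 for a "cross" and 0 for a "naught".
    weights = [random.uniform(-0.05, 0.05) for _ in range(n_inputs)]   # Step 1
    threshold = random.uniform(-0.05, 0.05)                            # Step 1
    for _ in range(presentations):                                     # Step 5: repeat
        x, correct = random.choice(samples)                            # Step 2
        s = sum(w * xi for w, xi in zip(weights, x))                   # Step 3
        y = 1 if s > threshold else 0
        if y == correct:
            continue                                                   # Step 4a
        if y == 0:
            weights = [w + xi for w, xi in zip(weights, x)]            # Step 4b
        else:
            weights = [w - xi for w, xi in zip(weights, x)]            # Step 4c
    return weights, threshold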

This algorithm strikingly resembles the process of teaching a child or schoolchild by the "reward-punishment" method (or training an animal by the "carrot and stick" method). As with a child taught by this method, the perceptron's learning algorithm can reach the goal in a finite number of attempts: the perceptron will eventually acquire the necessary knowledge, encode it in the form of specific values of the matrix of synaptic connection strengths w_j, and thus learn to distinguish a "cross" from a "naught". Naturally, the question arises whether the perceptron learning algorithm always leads to the desired result. The answer to this question is given by the perceptron convergence theorem:

If there exists a set of weight values that provides a particular discrimination of images, then the perceptron learning algorithm eventually leads either to this set or to an equivalent set for which this discrimination of images is achieved.
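The theorem is stated here informally. One standard formal version of this result (Novikoff's bound; it is not part of the original text and assumes the two classes are linearly separable with some margin) can be written as:

% Assumptions (ours): labels y_i ∈ {−1, +1}, update w ← w + y_i x_i on every mistake.
\text{If } \|x_i\| \le R \text{ for every training vector and there exists } w^{*},\ \|w^{*}\| = 1,
\text{ such that } y_i\,(w^{*} \cdot x_i) \ge \gamma > 0 \text{ for all } i,
\text{ then the perceptron learning rule makes at most } \left(\frac{R}{\gamma}\right)^{2} \text{ weight corrections.}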

2.2 Demonstration of training on the compiled computer program

With the help of the program, the perceptron can be trained to recognize "crosses" and "naughts". The main windows of the program are shown in Fig. 7.

The training proceeds as follows. The user, who acts as a teacher, draws an image of a "cross" or a "naught" with the mouse and then presses the "recognize" button, after which a binary image is formed (a colored cell corresponds to 1, an uncolored cell to 0).

A result is generated from the inputs and the weights: if the output is 1, the perceptron is considered to have recognized a "cross"; if it is 0, a "naught". The recognition result is displayed in the second window. If the perceptron makes an error, the user presses the "No" button in the result window, thereby indicating the correct answer to the program (the opposite of the one it produced), which it remembers. If the program answers correctly, the user clicks the "Yes" button, confirming the result.


Fig. 7. Main program windows
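The source code of the program is not given in the text; purely as a hypothetical sketch (the helper names and the grid representation are ours), the recognize-and-correct interaction described above could be organized in Python like this.

def image_to_vector(grid):
    # Flatten the drawing grid: a colored cell becomes 1, an uncolored cell 0.
    return [1 if cell else 0 for row in grid for cell in row]

def recognize(x, weights, threshold):
    # Output 1 is interpreted as a "cross", output 0 as a "naught".
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0

def on_user_feedback(x, answer, user_says_yes, weights):
    # "Yes" confirms the result; "No" means the opposite answer is correct,
    # and the weights are corrected as in Steps 4b and 4c above.
    if user_says_yes:
        return weights
    if answer == 0:                                     # should have been a "cross"
        return [w + xi for w, xi in zip(weights, x)]
    return [w - xi for w, xi in zip(weights, x)]        # should have been a "naught"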

2.3 Perceptron training outcomes

After a sufficiently large number of iterations, the perceptron learned to accurately recognize the images that took part in the training. Thus, the hypothesis was confirmed that a computer built in the image and likeness of the human brain would be able to solve intellectual problems and, in particular, the problem of pattern recognition.

In addition to recognizing familiar images, i.e. those shown to it during the learning process, the perceptron successfully coped with recognizing images it "saw" for the first time. It turned out that the perceptron was able to recognize images with slight distortions.

CONCLUSION

Modern artificial neural networks are devices that use a huge number of artificial neurons and connections between them. Although the ultimate goal of neural network development, a complete simulation of the human thinking process, has not been achieved, neural networks are already being used to solve many problems: image processing, control of robots and continuous production, speech understanding and synthesis, diagnosis of human diseases and technical malfunctions in machines and devices, prediction of exchange rates, and so on.

Unlike digital microprocessor systems, which are complex combinations of processor and memory units, neuroprocessors based on neural networks contain memory distributed in the connections between very simple processors. Thus, the main burden of performing specific functions falls on the architecture of the system, whose details are in turn determined by the interneuron connections. Prototypes of neurocomputers built on such a structure provide a standard way of solving many non-standard tasks.

It should be noted that the main purpose of perceptrons is to solve classification problems. They do an excellent job of classifying linearly separable vectors, and convergence is guaranteed in a finite number of steps. The duration of training is sensitive to outliers in the lengths of individual vectors, but even in this case a solution can be constructed. A single-layer perceptron, however, can only classify linearly separable vectors. Possible ways to overcome this limitation are either preprocessing the input vectors to form a linearly separable set, or using multilayer perceptrons. Other types of neural networks can also be applied, such as linear networks or backpropagation networks, which can classify linearly inseparable input vectors.

During the course work, the following was done:

1. The concept of a perceptron and the classification of neural networks were considered.

2. The history of the emergence of artificial neural networks was studied.

3. The scope of the perceptron was considered.

4. Various methods of training the perceptron were described.

5. The algorithm of the computer program "Perceptron" was described.

6. A demonstration of the training program was presented.

Thus, it can be argued that the goal of the course work was achieved and all tasks were successfully solved.

