Big data examples. What is Big Data for?

You know this famous joke? Big Data is like sex under 18:

  • everyone thinks about it;
  • everyone is talking about it;
  • everyone thinks their friends are doing it;
  • almost nobody does it;
  • those who do it do it badly;
  • everyone thinks that next time it will turn out better;
  • no one takes security measures;
  • everyone is ashamed to admit that they do not know something;
  • if someone succeeds, it always makes a lot of noise.

But let's be honest: any hype provokes the usual curiosity. What is all the fuss about, and is there really something important behind it? In short, yes, there is. Details are below. We have selected the most striking and interesting applications of Big Data technologies. This small market survey, built on clear examples, points to a simple fact: the future is not on its way, and there is no need to "wait another n years for the magic to become reality." It has already arrived, but it is still barely perceptible, which is why the heat of the singularity is not yet scorching the labor market all that much. Let's go.

1 How Big Data technologies are applied where they originated

Big IT companies are where data science was born, so their inner workings are the most interesting in this area. Google, home of the MapReduce paradigm, has an internal effort whose sole purpose is to train its programmers in machine learning technologies. This is its competitive advantage: after gaining new knowledge, employees apply the new methods in the Google projects they already work on. Imagine the huge list of areas the company could revolutionize. One example is the use of neural networks.

Apple, too, implements machine learning in all of its products. Its advantage is a large ecosystem that includes all the digital devices used in everyday life. This allows Apple to reach a level that is out of reach for others: the company has user data on a scale no other company can match. At the same time, its privacy policy is very strict: the corporation has always boasted that it does not use customer data for advertising purposes. Accordingly, user information is encrypted so that neither Apple's lawyers nor even the FBI with a warrant can read it.

2 Big Data on 4 wheels

A modern car is an accumulator of information: it collects data about the driver, the environment, connected devices and the car itself. Soon a single connected vehicle will generate up to 25 GB of data per hour.

Transport telematics has been used by car manufacturers for many years, but more sophisticated data collection methods that take full advantage of Big Data are now emerging. This means that technology can now alert the driver to bad road conditions by automatically activating the anti-lock braking and anti-slip systems.

Other manufacturers, including BMW, use Big Data technology together with information gathered from test prototypes, the built-in "error memory" system in cars and customer complaints to identify a model's weak spots early in production. Instead of a manual evaluation of the data that takes months, a modern algorithm is now applied. Errors and troubleshooting costs are reduced, which speeds up the information analysis workflows at BMW.

According to expert estimates, by 2019 the turnover of the connected-car market will reach $130 billion. This is not surprising given the pace at which automakers are integrating technologies that have become an integral part of the vehicle.

The use of Big Data helps make the car safer and more functional. Toyota, for example, does this by embedding data communication modules (DCM) in its vehicles. Its Big Data tooling processes and analyzes the data collected by the DCMs in order to extract further value from it.

3 Application of Big Data in Medicine


The implementation of Big Data technologies in the medical field allows doctors to study a disease more thoroughly and choose an effective course of treatment for a specific case. Thanks to the analysis of information, it becomes easier for health care workers to predict relapses and take preventive measures. The result is more accurate diagnosis and improved treatment.

The new technique made it possible to look at patients' problems from a different angle, which led to the discovery of previously unknown sources of those problems. For example, some races are genetically more prone to heart disease than other ethnic groups. Now, when a patient complains about a certain disease, doctors take into account data on members of the same race who complained about the same problem. Collecting and analyzing data makes it possible to learn much more about patients: from food preferences and lifestyle to the genetic structure of DNA and the metabolites of cells, tissues and organs. For example, the Center for Pediatric Genomic Medicine in Kansas City analyzes patients' genomes for mutations in the genetic code that cause cancer. An individual approach to each patient that takes their DNA into account will raise the effectiveness of treatment to a qualitatively new level.

Understanding how Big Data is used is the first and most important change in the medical field. While a patient is undergoing treatment, a hospital or other healthcare facility can gather a lot of meaningful information about the person. The information collected is used to predict disease recurrence with a certain degree of accuracy. For example, if a patient has suffered a stroke, doctors study when the cerebrovascular accident occurred and analyze the intervals between previous episodes (if any), paying special attention to stressful situations and heavy physical activity in the patient's life. Based on this data, hospitals give the patient a clear plan of action to reduce the risk of a stroke in the future.

Wearable devices also play a role in helping to identify health problems, even if a person does not have obvious symptoms of a particular disease. Instead of assessing the patient's condition through a long course of examinations, the doctor can draw conclusions based on the information collected by the fitness tracker or smart watch.

One recent example illustrates this. While a patient was being examined for a new seizure caused by missed medication, doctors discovered that the man had a much more serious health problem: atrial fibrillation. The diagnosis was made because the department staff got access to the patient's phone, namely to the application paired with his fitness tracker. The data from the application turned out to be the key factor in the diagnosis, because at the time of the examination the man showed no cardiac abnormalities.

This is just one of several cases that show why the use of Big Data plays such a significant role in the medical field today.

4 Data analysis has already become the backbone of retail

Understanding user queries and targeting is one of the largest and most widely publicized areas of application of Big Data tools. Big Data helps you analyze customer habits in order to better understand consumer needs in the future. Companies are looking to expand the traditional dataset with social media and browser search history to create the fullest possible customer picture. Sometimes large organizations choose to create their own predictive model as a global goal.

For example, the Target store chain, with the help of deep data analysis and its own forecasting system, manages to determine with high accuracy whether a customer is pregnant. Each client is assigned an ID, which in turn is tied to a credit card, name or email. The identifier serves as a kind of shopping cart that stores information about everything the person has ever purchased. The chain's specialists found that expectant mothers actively buy unscented products before the second trimester of pregnancy, and during the first 20 weeks they stock up on calcium, zinc and magnesium supplements. Based on this data, Target sends customers coupons for baby products. The discounts on goods for children are "diluted" with coupons for other products, so that offers to buy a crib or diapers do not look too intrusive.

Even government departments have found a way to use Big Data technologies to optimize election campaigns. Some believe that Barack Obama's victory in the 2012 US presidential election was due to the excellent work of his team of analysts, who processed huge amounts of data in the right way.

5 Big Data in the service of law and order


Over the past few years, law enforcement agencies have figured out how and when to use Big Data. It is common knowledge that the National Security Agency uses Big Data technology to prevent terrorist attacks. Other agencies use the same progressive methodology to prevent smaller crimes.

The Los Angeles Police Department applies an algorithm of this kind to what is commonly referred to as proactive law enforcement. Using crime reports for a certain period of time, the algorithm identifies the areas where offenses are most likely to be committed. The system marks such areas on the city map with small red squares, and this data is immediately transmitted to the patrol cars.
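The idea of marking high-risk squares on a map can be illustrated with a very small sketch. It is not the department's actual algorithm, which the text does not describe; it simply assumes that past reports are binned into square grid cells and that the busiest cells are flagged. The coordinates, the `cell_size` value and the function name are all hypothetical.

```python
import math
from collections import Counter

def hotspot_cells(reports, cell_size=0.005, top_n=3):
    """Bin (lat, lon) reports into square grid cells and return the busiest cells.

    cell_size is in degrees; both it and the sample data are illustrative assumptions.
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in reports
    )
    return counts.most_common(top_n)

# Hypothetical historical reports: three close together, two close together, one isolated.
reports = [
    (34.0522, -118.2437), (34.0524, -118.2439), (34.0521, -118.2436),
    (34.1016, -118.3267), (34.1018, -118.3269),
    (33.9416, -118.4085),
]

hotspots = hotspot_cells(reports, top_n=2)
for cell, count in hotspots:
    print(cell, count)
```

A real system would also weight recent reports more heavily and account for time of day; counting alone is only the simplest possible baseline.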

The Chicago police use Big Data technologies in a slightly different way. Law enforcement in the Windy City has a similar algorithm, but it aims to delineate a "circle of risk": people who may become a victim of or a participant in an armed attack. According to The New York Times, this algorithm assigns a person a vulnerability score based on their criminal history (arrests, participation in shootings, gang membership). The developer of the system asserts that while it studies a person's criminal history, it does not take into account secondary factors such as a person's race, gender, ethnicity or location.

6 How Big Data technologies help cities develop


Veniam general manager João Barros shows a map that tracks the Wi-Fi routers on buses in Porto

Data analysis is also used to improve many aspects of how cities and countries function. For example, knowing exactly how and when to use Big Data technologies, you can optimize traffic flows. To do this, real-time vehicle movement is taken into account and social media and meteorological data are analyzed. Today a number of cities have set out to use data analysis to integrate transport infrastructure with other utilities into a single whole. This is the smart city concept, in which buses wait for a late train and traffic lights predict traffic volumes in order to minimize congestion.

Long Beach uses Big Data technologies to operate smart water meters that help curb illegal irrigation. Previously they were used to reduce water consumption by private households (the best result was a reduction of 80%). Saving fresh water is always a relevant issue, especially when the state is experiencing the worst drought ever recorded.

The Los Angeles Department of Transportation has joined the list of those who use Big Data. Based on data from traffic camera sensors, the authorities monitor the operation of traffic lights, which in turn makes it possible to regulate traffic. About 4,500 traffic lights throughout the city are under the control of the computerized system. According to official figures, the new algorithm helped reduce congestion by 16%.

7 The engine of progress in marketing and sales


In marketing, Big Data tools make it possible to identify which ideas are most effective at a particular stage of the sales cycle. Data analysis shows how investments can improve customer relationship management, which strategy to use to increase conversion rates, and how to optimize the customer lifecycle. In the cloud business, Big Data algorithms are used to work out how to minimize customer acquisition costs and extend the customer lifecycle.

Differentiating pricing strategies by customer tier is perhaps the main thing Big Data is used for in marketing. McKinsey found that about 75% of the average firm's revenue comes from its basic products, 30% of which are priced incorrectly. A 1% price increase translates into an 8.7% increase in operating profit, provided that sales volume does not fall.
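The arithmetic behind a figure like 8.7% is easy to reproduce. The sketch below assumes that sales volume and costs stay fixed, so the whole price increase flows into profit; the 11.5% operating margin is an assumption chosen to match the quoted number, not a value given in the source.

```python
def profit_uplift(price_increase, operating_margin):
    """Relative growth in operating profit when price rises and volume and costs stay fixed.

    With revenue R and margin m, profit is m*R; a price rise p adds p*R, so the uplift is p/m.
    """
    return price_increase / operating_margin

# Assumed operating margin of 11.5% (hypothetical, not stated in the source).
uplift = profit_uplift(price_increase=0.01, operating_margin=0.115)
print(f"Operating profit grows by {uplift:.1%}")
```

The thinner the margin, the stronger the leverage: under the same assumptions a 5% margin would turn a 1% price increase into a 20% profit increase.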

Forrester's research team determined that data analysis allows marketers to focus on how to improve customer relationships. By examining the direction of customer development, specialists can assess their level of loyalty, as well as extend the life cycle in the context of a particular company.

The optimization of sales strategies and of market entry using geo-analytics can be seen in the biopharmaceutical industry. According to McKinsey, drug companies spend an average of 20 to 30% of their profits on administration and sales. If businesses start using Big Data more actively to identify the most profitable and fastest growing markets, costs will be cut immediately.

Data analysis is a means for companies to gain a complete understanding of the key aspects of their business. Increasing revenues, lowering costs and reducing working capital are three challenges that modern businesses are trying to solve with analytical tools.

Finally, 58% of CMOs say that the impact of Big Data technologies can be seen in search engine optimization (SEO), e-mail and mobile marketing, where data analysis plays the most significant role in shaping marketing programs. Only slightly fewer respondents, 4% less, are confident that Big Data will play a significant role in all marketing strategies in the coming years.

8 Analyzing data on a planetary scale

No less curious is the use of Big Data on a planetary scale. It is possible that machine learning will ultimately be the only force capable of maintaining a delicate balance. Human influence on global warming is still a matter of much controversy, so only reliable predictive models based on the analysis of large amounts of data can give an accurate answer. Ultimately, reducing emissions will help all of us in another way too: we will spend less on energy.

Big Data is no longer an abstract concept that might find an application in a couple of years. It is a fully working set of technologies that can be useful in almost every sphere of human activity, from medicine and policing to marketing and sales. The active integration of Big Data into our daily life has only just begun, and who knows what its role will be in a few years?

Big data (or Big Data) is a collection of methods for working with huge amounts of structured or unstructured information. Big data specialists process and analyze it to get visual, human-readable results. Look At Me spoke with professionals to find out how things stand with big data processing in Russia, and where and what those who want to work in this field should study.

Alexey Ryvkin on the main directions in the field of big data, communication with customers and the world of numbers

I studied at the Moscow Institute of Electronic Technology. The main thing I took away from there was fundamental knowledge of physics and mathematics. Alongside my studies I worked in an R&D center, where I developed and implemented error-correcting coding algorithms for secure data transmission equipment. After completing my bachelor's degree, I entered the master's program in business informatics at the Higher School of Economics. After that, I wanted to work at IBS. I was lucky: at that time, because of the large number of projects, additional interns were being recruited, and after several interviews I started working for IBS, one of the largest Russian companies in this field. In three years I went from intern to enterprise solutions architect. Now I am developing Big Data expertise for client companies in the financial and telecommunications sectors.

There are two main specializations for people who want to work with big data: analysts, and IT consultants who create the technologies for working with big data. In addition, one can speak of the profession of Big Data Analyst, that is, people who work directly with the data on the customer's IT platform. Previously these were ordinary analyst-mathematicians who knew statistics and mathematics and used statistical software to solve data analysis problems. Today, in addition to statistics and mathematics, an understanding of technology and of the data life cycle is also required. This, in my opinion, is the difference between the modern Data Analyst and the analysts of the past.

My specialization is IT consulting, that is, I devise and offer customers ways to solve business problems with IT. People come to consulting with different backgrounds, but the most important qualities for this profession are the ability to understand the client's needs, the desire to help people and organizations, good communication and teamwork skills (since this always means working with a client and in a team) and good analytical skills. Internal motivation is very important: we work in a competitive environment, and the customer expects original solutions and genuine interest in the work.

Most of my time is spent talking with customers, formalizing their business needs and helping to design the most appropriate technology architecture. The selection criteria here have their own peculiarity: in addition to functionality and TCO (total cost of ownership), non-functional requirements for the system are very important, most often response time and information processing time. To convince the customer, we often use the proof of concept approach: we offer to "test" the technology for free on some task, on a narrow set of data, to make sure that it works. The solution should create a competitive advantage for the customer by bringing additional benefits (for example, x-sell, or cross-selling) or solve a business problem, for example by reducing a high level of credit fraud.

What problems do you have to face? The market is not yet ready to use big data technologies. It would be much easier if customers came with a ready-made task, but they do not yet realize that a revolutionary technology has appeared that can change the market in a couple of years. That is why we effectively work in startup mode: we do not just sell technologies, but every time also convince clients that they need to invest in these solutions. This is the position of visionaries: we show customers how they can change their business with the help of data and IT. We are creating this new market, the market of commercial IT consulting in the field of Big Data.

If a person wants to work in data analysis or IT consulting in the field of Big Data, the first thing that matters is a mathematical or technical education with solid mathematical training. It is also helpful to become familiar with specific technologies such as SAS, Hadoop, R or IBM solutions. In addition, one should take an active interest in applied Big Data tasks, for example how the technology can be used to improve credit scoring at a bank or to manage the customer lifecycle. This and other knowledge can be obtained from available sources such as Coursera and Big Data University. There is also the Customer Analytics Initiative at the Wharton School of the University of Pennsylvania, which has published a lot of interesting materials.

A serious problem for those who want to work in our field is the obvious lack of information about Big Data. You cannot go to a bookstore or a website and get, for example, an exhaustive collection of cases covering all applications of Big Data technologies in banks. There are no such reference books. Part of the information is in books, another part is gathered at conferences, and some of it we have to work out ourselves.

Another problem is that analysts are at home in the world of numbers, but they are not always comfortable in business. These people are often introverted and find communication difficult, so they struggle to present research results convincingly to clients. To develop these skills, I would recommend books such as The Pyramid Principle and Say It with Charts. They help you develop presentation skills and express your thoughts concisely and clearly.

Taking part in various case championships during my studies at the Higher School of Economics helped me a lot. Case championships are intellectual competitions in which students study business problems and propose solutions. They come in two flavors: case championships run by consulting firms such as McKinsey, BCG and Accenture, and independent case championships such as Changellenge. By taking part in them, I learned to see and solve complex problems, from identifying and structuring a problem to defending recommendations for solving it.

Oleg Mikhalskiy on the Russian market and the specifics of creating a new product in the field of big data

Before joining Acronis, I was already involved in new product launches at other companies. It is always interesting and difficult at the same time, so I was immediately interested in the possibility of working on cloud services and data storage solutions. In this area, all my previous experience in the IT industry came in handy, including my own startup project I-accelerator. Having a business education (MBA) in addition to basic engineering also helped.

In Russia, large companies such as banks and mobile operators need big data analysis, so there are prospects in our country for those who want to work in this field. True, many projects today are integration projects, that is, built on foreign developments or open source technologies. Such projects do not create fundamentally new approaches and technologies but adapt existing ones. At Acronis we went the other way: having analyzed the available alternatives, we decided to invest in our own development and ended up creating a secure storage system for big data that is not inferior in cost to, for example, Amazon S3, yet works reliably and efficiently at a significantly smaller scale as well. Large Internet companies also have their own big data developments, but they are focused more on internal needs than on the needs of external customers.

It is important to understand the trends and economic forces that affect the field of big data processing. To do this, you need to read a lot, listen to talks by authoritative experts in the IT industry and attend specialized conferences. Almost every conference now has a section on Big Data, but each one approaches it from a different angle: technology, business or marketing. You can take on project work or an internship at a company that already runs projects on this topic. And if you are confident in your abilities, it is not too late to launch a startup in the field of Big Data.

True, when you are in charge of a new product, a lot of time goes into market analysis and into talking with potential clients, partners and professional analysts who know a lot about clients and their needs. Without constant contact with the market, a new development runs the risk of going unclaimed. There are always many uncertainties: you have to understand who the first users (early adopters) will be, what you can offer that is valuable to them, and how to attract a mass audience afterwards. The second most important task is to form a clear and holistic vision of the final product and convey it to the developers, in order to motivate them to work in conditions where some requirements may still change and priorities depend on feedback from the first customers. An important task, therefore, is to manage the expectations of customers on the one hand and developers on the other, so that neither side loses interest and the project is brought to completion. After the first successful project it becomes easier, and the main challenge is to find the right growth model for the new business.

Column of HSE faculty members on myths and cases of working with big data

Konstantin Romanov and Alexander Pyatigorsky, faculty members at the HSE School of New Media (Pyatigorsky is also Beeline's director of digital transformation), wrote a column for the site about the main misconceptions about big data, with examples of how the technology and its tools are used. The authors hope that the publication will help company executives understand the concept.

Myths and misconceptions about Big Data

Big Data is not marketing

The term Big Data has become very fashionable: it is used in millions of situations and in hundreds of different interpretations, often unrelated to what it actually is. Concepts often get substituted in people's heads, and Big Data is confused with a marketing product. Moreover, in some companies Big Data is part of the marketing division. The result of big data analysis really can be a source for marketing activity, but nothing more. Let's see how it works.

If we have identified a list of those who bought goods in our store for more than three thousand rubles two months ago, and then sent these users an offer, this is typical marketing. We derive clear patterns from structured data and use them to drive sales.
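To show how little is involved in this kind of "typical marketing" selection, here is a minimal sketch. The purchase records, field names and dates are hypothetical; the only thing taken from the text is the rule itself, a purchase of more than three thousand rubles made two months ago.

```python
from datetime import date

# Hypothetical structured purchase records, e.g. exported from a CRM.
purchases = [
    {"customer": "A", "amount_rub": 4500, "date": date(2024, 3, 10)},
    {"customer": "B", "amount_rub": 1200, "date": date(2024, 3, 12)},
    {"customer": "C", "amount_rub": 3800, "date": date(2024, 5, 2)},
    {"customer": "D", "amount_rub": 9000, "date": date(2024, 3, 28)},
]

def segment(rows, min_amount, year, month):
    """Return customers who spent more than min_amount in the given calendar month."""
    return sorted({
        row["customer"]
        for row in rows
        if row["amount_rub"] > min_amount
        and (row["date"].year, row["date"].month) == (year, month)
    })

# March 2024 stands in for "two months ago" relative to an assumed campaign in May 2024.
targets = segment(purchases, min_amount=3000, year=2024, month=3)
print(targets)  # ['A', 'D']
```

The rule is explicit and the data is already structured, which is exactly why the authors do not count this as Big Data.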

However, suppose we combine CRM data with streaming information, for example from Instagram, analyze it and find a pattern: a person who reduced their activity on Wednesday evening and whose latest photo shows kittens should be made a certain offer. That is already Big Data. We found a trigger and passed it on to the marketers, and they used it for their own purposes.

It follows that the technology usually works with unstructured data, and even when the data is structured, the system still keeps looking for hidden patterns in it, which marketing does not do.

Big Data is not IT

The other extreme is that Big Data is often confused with IT. This is because in Russian companies it is, as a rule, IT specialists who drive all technologies, including big data. So if everything happens in that particular department, the company as a whole gets the impression that this is some kind of IT activity.

In fact, there is a fundamental difference here: Big Data is an activity aimed at producing a specific product that has nothing to do with IT, although the technology cannot exist without IT.

Big Data is not always the collection and analysis of information

There is another misconception about Big Data. Everyone understands that this technology is associated with large volumes of data, but what kind of data is meant is not always clear. Anyone can collect and use information; today this is possible not only in films but in any company, even a very small one. The only question is what to collect and how to use it to your advantage.

But it should be understood that collecting and analyzing just any information does not amount to Big Data. For example, if you collect data about one specific person on social networks, that is not Big Data.

What is Big Data really

Big Data consists of three elements:

  • data;
  • analytics;
  • technologies.

Big Data is not just one of these components but a combination of all three. People often substitute concepts: some think Big Data is only the data, others that it is only the technology. But in fact, no matter how much data you collect, you cannot do anything with it without the right technology and analytics. And if there is good analytics but no data, things are even worse.

As for the data, it is not only text but also all the photos posted on Instagram, and in general everything that can be analyzed and used for different purposes and tasks. In other words, data here means a huge amount of internal and external data of various structures.

You also need analytics, because the task of Big Data is to establish patterns. Analytics means identifying hidden dependencies and searching for new questions and answers based on an analysis of the entire volume of heterogeneous data. Moreover, Big Data raises questions that cannot be derived directly from the data itself.

As for images, the fact that you posted a photo of yourself in a blue T-shirt means nothing by itself. But if the photo is used in Big Data modeling, it may turn out that you should be offered a loan right now, because in your social group such behavior signals a certain pattern of actions. So "bare" data without analytics, without uncovering hidden and non-obvious dependencies, is not Big Data.

So we have big data, a huge array of it. We also have analytics. But how do we get from this raw data to a concrete solution? For that we need technologies that make it possible not only to store the data (which used to be impossible) but also to analyze it.

Simply put, if you have a lot of data, you need technologies such as Hadoop that make it possible to keep all the information in its original form for later analysis. This kind of technology emerged at the Internet giants, since they were the first to face the problem of storing large amounts of data and analyzing it for subsequent monetization.

In addition to tools for optimized and low-cost data storage, analytical tools are needed, as well as add-ons to the platform used. For example, a whole ecosystem of related projects and technologies has already formed around Hadoop. Here are some of them:

  • Pig - a high-level data-flow language for data analysis.
  • Hive - data analysis using a language close to SQL.
  • Oozie - a workflow scheduler for Hadoop jobs.
  • HBase - a non-relational database, analogous to Google Bigtable.
  • Mahout - machine learning.
  • Sqoop - data transfer from an RDBMS to Hadoop and back.
  • Flume - transferring logs to HDFS.
  • Zookeeper, MRUnit, Avro, Giraph, Ambari, Cassandra, HCatalog, Fuse-DFS and so on.

All of these tools are available to everyone for free, but there is also a set of paid add-ons.
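Hadoop's processing model is MapReduce, the paradigm mentioned earlier in connection with Google. The sketch below imitates its three stages (map, shuffle, reduce) in plain Python on a classic word-count task. It runs in a single process and does not use Hadoop or any of the tools listed above; the function names are illustrative only.

```python
from collections import defaultdict

def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

records = ["big data is big", "data is everywhere"]

# In a real cluster the map and reduce calls would run in parallel on many machines.
pairs = [pair for record in records for pair in map_phase(record)]
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(word_counts)
```

Because each map call sees only one record and each reduce call sees only one key, the same logic can be spread across many machines, which is what makes the model suitable for very large data sets.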

In addition, specialists are needed: a developer and an analyst (the so-called Data Scientist). You also need a manager who understands how to apply this analytics to a specific problem, because analytics by itself is completely meaningless unless it is embedded in business processes.

All three employees must work as a team. A manager who asks a data science specialist to find a certain pattern must understand that the specialist will not always find exactly what was asked for. In that case the manager should listen carefully to what the Data Scientist did find, since such findings often turn out to be more interesting and useful for the business. The manager's job is to apply them to the business and turn them into a product.

Despite the fact that now there are many different kinds of machines and technologies, the final decision always remains with the person. To do this, the information needs to be visualized somehow. There are quite a few tools for this.

The most illustrative example is geo-analytical reports. The Beeline company works a lot with the governments of different cities and regions. Very often, these organizations order reports of the type "Traffic congestion at a certain location."

Clearly, such a report should reach government bodies in a simple and understandable form. If we hand them a huge and completely incomprehensible table (that is, the information in the form in which we receive it), they are unlikely to buy such a report: it will be useless to them, and they will not get from it the knowledge they were looking for.

Therefore, no matter how good Data Scientists are and no matter what patterns they find, you cannot work with this data without quality visualization tools.

Data sources

The array of received data is very large, so it can be divided into several groups.

Internal company data

Although 80% of the data collected falls into this group, this source is not always used. Often this is data that, it would seem, no one needs at all, for example, logs. But if you look at them from a different angle, sometimes you can find unexpected patterns in them.

Conditionally free sources

This includes data from social networks, the Internet, and anything else that can be accessed for free. Why only conditionally free? On the one hand, this data is available to everyone. On the other hand, if you are a large company, collecting it for a subscriber base of tens of thousands, hundreds of thousands or millions of customers is no longer an easy task. Therefore, there are paid services that provide this data.

Paid sources

This includes companies that sell data for money. These can be telecoms, DMPs, Internet companies, credit bureaus and aggregators. Telecoms in Russia do not sell data. Firstly, it is economically unprofitable, and secondly, it is prohibited by law. Therefore, they sell the results of their processing, for example, geo-analytical reports.

Open data

The state meets business halfway and makes it possible to use the data it collects. This is more developed in the West, but Russia is also keeping pace. For example, there is the Open Data Portal of the Moscow Government, which publishes information on various urban infrastructure facilities.

For residents and guests of Moscow, the data is presented in tabular and cartographic form, and for developers in special machine-readable formats. The project is still working in a limited mode, but it is developing, which means it is also a data source that you can use for your business tasks.

Research

As already noted, Big Data's task is to find a pattern. Often, research conducted around the world can become a fulcrum for finding a particular pattern - you can get a specific result and try to apply a similar logic for your own purposes.

Big Data is an area where not all the laws of mathematics work. For example, "1" + "1" is not "2", but much more, because mixing data sources can greatly enhance the effect.

Examples of products

Many are familiar with Spotify's music selection service. Its appeal is that it does not ask users what mood they are in today, but works this out from the sources available to it. It always knows what you need right now, jazz or hard rock. This is the key difference that wins it a fan base and sets it apart from other services.

Such products are usually called sense products - those that feel their client.

Big Data technology is also used in the automotive industry. Tesla, for example, has an autopilot in its latest model. The company strives to create a car that will take the passenger wherever he wants by itself. This is impossible without Big Data, because if the car uses only the data it receives directly, as a person does, it will not be able to improve.

When we drive a car ourselves, we use our neurons to make decisions based on many factors that we do not even notice. For example, we may not realize why we decided not to accelerate immediately at a green light, and then it turns out that the decision was correct: a car rushed past at breakneck speed, and we avoided an accident.

Big Data is also used in sports. In 2002, Billy Beane, the general manager of the Oakland Athletics baseball team, set out to break the established paradigm of scouting by selecting and training players by the numbers.

Managers usually look at players' past success, but in this case everything was different: to get a result, the manager studied which combinations of athletes he needed, paying attention to individual characteristics. Moreover, he chose athletes who did not show much potential individually, but the team as a whole turned out to be so successful that it won twenty games in a row.

Director Bennett Miller later made a film about this story, "Moneyball" (released in Russia as "The Man Who Changed Everything"), starring Brad Pitt.

Big Data technology is useful in the financial sector as well. No single person can independently and accurately determine whether it is worth giving someone a loan. To make the decision, scoring is carried out: a probabilistic model is built that estimates whether this person will return the money or not. Scoring is then applied at all later stages as well: you can, for example, calculate that at a certain moment a person will stop paying.
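
To make the idea of scoring more concrete, here is a minimal sketch in Python of such a probabilistic model, written as a logistic function over a few applicant features. It is only an illustration: the feature names, the weights, the bias and the 0.5 threshold are all assumptions invented for this example, whereas a real bank would fit them to its own repayment history.

```python
import math

# Hypothetical weights, for illustration only. A real scoring model
# would be fitted to historical repayment data.
WEIGHTS = {
    "late_payments": 0.8,    # each past late payment raises the risk
    "debt_to_income": 2.5,   # share of income already spent on debt
    "years_employed": -0.3,  # stable employment lowers the risk
}
BIAS = -2.0


def default_probability(applicant: dict) -> float:
    """Return the estimated probability of default, between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def decide(applicant: dict, threshold: float = 0.5) -> str:
    """Approve the loan only if the estimated risk is below the threshold."""
    return "approve" if default_probability(applicant) < threshold else "decline"


reliable = {"late_payments": 0, "debt_to_income": 0.2, "years_employed": 5}
risky = {"late_payments": 4, "debt_to_income": 0.6, "years_employed": 0}

print(decide(reliable), round(default_probability(reliable), 3))
print(decide(risky), round(default_probability(risky), 3))
```

The same function can be re-run later with updated features (for example, a growing number of late payments), which is how scoring can signal that a borrower is likely to stop paying.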

Big data can not only make money, but also save it. In particular, this technology helped the German Ministry of Labor to reduce the cost of unemployment benefits by 10 billion euros, since after analyzing the information it became clear that 20% of benefits were paid undeservedly.

Big Data technologies are also used in medicine (this is especially typical for Israel). With the help of Big Data, you can get a significantly more accurate analysis than a doctor with thirty years of experience can provide.

Any doctor, when making a diagnosis, relies only on his own experience. When a machine does it, it draws on the experience of thousands of such doctors and all existing medical records. It takes into account what material the patient's house is made of, what area the patient lives in, what the smog is like there, and so on. That is, it takes into account many factors that doctors do not consider.

An example of the use of Big Data in healthcare is Project Artemis, implemented by the Toronto Children's Hospital. It is an information system that collects and analyzes data on babies in real time. The system analyzes 1260 health indicators of each child every second. The project is aimed at predicting an unstable condition in a child and preventing diseases in children.

Big Data is also beginning to be used in Russia: for example, Yandex has a big data division. The company, together with AstraZeneca and the Russian Society of Clinical Oncology (RUSSCO), launched the RAY platform for geneticists and molecular biologists. The project helps improve methods for diagnosing cancer and identifying susceptibility to it. The platform will start working in December 2016.

Moscow_Exchange May 6, 2015 at 20:38

Analytical review of the Big Data market

"Big Data" is a topic actively discussed by technology companies. Some of them have already become disillusioned with big data, while others, on the contrary, make the most of it for business. A fresh analytical review of the domestic and global Big Data market, prepared by the Moscow Exchange in cooperation with IPOboard analysts, shows which trends are most relevant on the market now. We hope the information will be interesting and useful.

WHAT IS BIG DATA?

Key features
Big Data is currently one of the key drivers of information technology development. This area, relatively new for Russian business, has become widespread in Western countries. The reason is that in the era of information technology, especially after the boom of social networks, a significant amount of information began to accumulate on each Internet user, which ultimately gave rise to Big Data as a field.

The term "Big Data" causes a lot of controversy. Many believe that it only refers to the amount of accumulated information, but the technical side should not be forgotten: the field also includes storage technologies, computing, and services.

It should be noted that this area covers the processing of amounts of information so large that they are difficult to process using traditional methods.

Below is a comparative table of traditional databases and Big Data.

The Big Data sphere is characterized by the following features:
Volume - the size of the accumulated database: an amount of information so large that it is laborious to process and store in traditional ways and requires a new approach and improved tools.
Velocity - speed. This feature covers both the increasing speed of data accumulation (90% of information has been collected over the last 2 years) and the speed of data processing; real-time data processing technologies have recently become more in demand.
Variety - diversity, i.e. the ability to process structured and unstructured information in multiple formats at the same time. The distinguishing feature of structured information is that it can be classified. An example of such information is customer transaction data.
Unstructured information includes video, audio files, free text, and information coming from social networks. Today 80% of information belongs to the unstructured group. It needs complex analysis to become useful for further processing.
Veracity - reliability of data. Users attach increasing importance to the reliability of the data available to them. For example, Internet companies face the problem of telling apart actions performed on the company's website by a robot from those performed by a person, which complicates data analysis.
Value - the value of the accumulated information. Big Data should be useful to the company and bring it some value, for example by helping to improve business processes, prepare reports or optimize costs.

If the above 5 conditions are met, the accumulated data volumes can be classified as large.

Spheres of application of Big Data

The range of applications of Big Data technologies is vast. With the help of Big Data, you can learn about customer preferences, measure the effectiveness of marketing campaigns, or conduct a risk analysis. Below are the results of a survey by the IBM Institute on the use of Big Data in companies.

As you can see from the diagram, most companies use Big Data in customer service, and the second most popular area is operational efficiency. In risk management, Big Data is currently less common.

It should also be noted that Big Data is one of the fastest growing areas of information technology: according to statistics, the total amount of received and stored data doubles every 1.2 years.
Between 2012 and 2014, the amount of data transferred monthly by mobile networks grew by 81%. According to Cisco estimates, in 2014 the volume of mobile traffic amounted to 2.5 exabytes per month (an exabyte is a unit of information equal to 10^18 bytes), and in 2019 it will reach 24.3 exabytes.
Thus, despite its relatively young age, Big Data is already a well-established technology area that has become widespread in many areas of business and plays an important role in the development of companies.
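
The growth figures quoted above are easier to compare once they are converted into annual rates. The short Python sketch below does this arithmetic; the function names are illustrative, and the only inputs are the doubling time and the Cisco traffic estimates from the text.

```python
def annual_growth_from_doubling(doubling_years: float) -> float:
    """Annual growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_years) - 1


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Average annual growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Stored data doubling every 1.2 years.
data_rate = annual_growth_from_doubling(1.2)
print(f"stored data: {data_rate:.0%} per year")

# Mobile traffic: 2.5 EB per month in 2014, 24.3 EB per month forecast for 2019.
traffic_rate = compound_annual_growth(2.5, 24.3, 2019 - 2014)
print(f"mobile traffic: {traffic_rate:.0%} per year")
```

In other words, doubling every 1.2 years corresponds to growth of roughly 78% per year, and the Cisco forecast implies mobile traffic growing by roughly 58% per year.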

Big Data Technologies
Technologies used to collect and process Big Data can be divided into 3 groups:
  • Software;
  • Equipment;
  • Services.

The most common data processing (software) approaches include:
SQL - a structured query language for working with databases. SQL can be used to create and modify data, while the dataset itself is managed by the corresponding database management system.
NoSQL - the term stands for Not Only SQL. It covers a number of approaches to implementing databases that differ from the models used in traditional relational DBMSs. They are convenient when the data structure is constantly changing, for example for collecting and storing information from social networks.
MapReduce - a distributed computing model. It is used for parallel computing over very large datasets (petabytes or more). In this programming model, it is not the data that is transferred to the program for processing, but the program that is transferred to the data, so a request is a separate program. Processing consists of two sequential steps, Map and Reduce: Map performs preliminary processing of the data, and Reduce aggregates the results.
Hadoop - used to implement search and contextual mechanisms of high-load sites such as Facebook, eBay and Amazon. Its distinctive feature is that the system is protected against the failure of any cluster node, since each block of data has at least one copy on another node.
SAP HANA - a high-performance NewSQL platform for data storage and processing. It processes queries at high speed. Another distinguishing feature is that SAP HANA simplifies the system landscape, reducing the cost of supporting analytic systems.
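
The Map and Reduce steps described above can be illustrated with the classic word-count task. The Python sketch below runs in a single process and only imitates the model: the function names are made up for this example and are not the Hadoop API, and on a real cluster the map step would run on the nodes where the data blocks are stored.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(documents: list) -> dict:
    """Run map, shuffle and reduce over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to map_phase looks at one document only, and each call to reduce_phase looks at one key only, both steps can be spread over many machines without changing the result.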

Technological equipment includes:

  • servers;
  • infrastructure equipment.
Servers include data stores.
Infrastructure equipment includes platform accelerators, uninterruptible power supplies, server console kits, etc.

Services.
Services include designing the architecture of the database system, setting up and optimizing the infrastructure, and ensuring the security of data storage.

Software, hardware, and services together form complex platforms for storing and analyzing data. Companies such as Microsoft, HP, EMC offer services for the development, deployment and management of Big Data solutions.

Application in industries
Big Data has become widespread across many industries. It is used in healthcare, telecommunications, trade, logistics, financial companies, and government.
Below are examples of Big Data applications in some of these industries.

Retail
Retailers' databases can accumulate a lot of information about customers, inventory management, and the supply of goods. This information can be useful in every area of a store's operations.

So, with the help of the accumulated information, you can manage the supply of goods, their storage and sale. Based on the accumulated information, it is possible to predict the demand and supply of goods. Also, the data processing and analysis system can solve other problems of the retailer, for example, to optimize costs or prepare reports.

Financial services
Big Data makes it possible to analyze the creditworthiness of a borrower, and it is also useful for credit scoring and underwriting. Introducing Big Data technologies reduces the time needed to consider loan applications. With the help of Big Data, a bank can analyze the transactions of a specific client and offer banking services that suit him.

Telecom
In the telecommunications industry, Big Data is widely used by cellular operators.
Mobile operators, along with financial institutions, have one of the most voluminous databases, which allows them to carry out the most in-depth analysis of the accumulated information.
The main goal of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic, and determine the social affiliation of the subscriber.

In addition to using Big Data for marketing purposes, technologies are used to prevent fraudulent financial transactions.

Mining and oil industry
Big Data is used in extraction as well as in processing and marketing. Based on the information received, enterprises can draw conclusions about the effectiveness of field development, track the schedule of overhauls and the condition of equipment, and forecast product demand and prices.

According to a Tech Pro Research survey, Big Data is most prevalent in the telecommunications industry, as well as in engineering, IT, financial and government enterprises. According to the results of this survey, Big Data is less popular in education and healthcare. The survey results are presented below:

Examples of using Big Data in companies
Today Big Data is being actively implemented in foreign companies. Companies such as Nasdaq, Facebook, Google, IBM, VISA, Mastercard, Bank of America, HSBC, AT&T, Coca-Cola, Starbucks, and Netflix are already using Big Data.

The processed information is applied in different ways depending on the industry and the tasks to be performed.
Examples of the practical application of Big Data technologies are presented below.

HSBC uses Big Data technologies to combat fraudulent payment card transactions. With the help of Big Data, the company made its security service 3 times more efficient and improved the detection of fraudulent incidents 10 times. The economic effect of introducing these technologies exceeded USD 10 million.

VISA's antifraud system automatically identifies fraudulent transactions. At present, the system helps prevent fraudulent payments worth $2 billion annually.

IBM's Watson supercomputer analyzes the flow of data on monetary transactions in real time. According to IBM, Watson increased the number of detected fraudulent transactions by 15%, reduced the system's false positives, and increased by 60% the amount of money protected from transactions of this nature.

Procter & Gamble uses Big Data to design new products and create global marketing campaigns. P&G has established dedicated Business Spheres offices where information can be viewed in real time.
Thus, the company's management can instantly test hypotheses and conduct experiments. P&G believes Big Data helps predict company performance.

The office supplies retailer OfficeMax uses Big Data technologies to analyze customer behavior. Big Data analysis allowed the company to increase B2B revenue by 13% and reduce costs by USD 400,000 per year.

According to Caterpillar, its distributors lose $9 billion to $18 billion in profit annually simply because they do not implement Big Data technologies. Big Data would allow customers to manage their fleets of machines more efficiently by analyzing information coming from sensors installed on the machines.

Today it is already possible to analyze the condition of key components, their degree of wear, manage fuel and maintenance costs.

Luxottica Group is an eyewear manufacturer whose brands include Ray-Ban, Persol and Oakley. The company uses Big Data technologies to analyze the behavior of potential customers and to run "smart" SMS marketing. As a result, Luxottica Group identified more than 100 million of its most valuable customers and increased the effectiveness of its marketing campaign by 10%.

Using Yandex Data Factory, the developers of World of Tanks analyze player behavior. Big Data technologies made it possible to analyze the behavior of 100 thousand World of Tanks players using more than 100 parameters (information about purchases, games, experience, etc.). The analysis produced a forecast of user churn. This information makes it possible to reduce churn and to work with players in a targeted manner. The resulting model turned out to be 20-30% more effective than the standard analysis tools of the gaming industry.

The German Ministry of Labor uses Big Data to analyze incoming applications for unemployment benefits. After analyzing the information, it became clear that 20% of benefits were paid undeservedly. With the help of Big Data, the Ministry of Labor cut costs by €10 billion.

The Toronto Children's Hospital implemented Project Artemis. It is an information system that collects and analyzes data on babies in real time. The system monitors 1260 indicators of each child's condition every second. Project Artemis makes it possible to predict an unstable condition in a child and to start preventing diseases in children.

GLOBAL BIG DATA MARKET OVERVIEW

Current state of the world market
In 2014, Big Data, according to Data Collective, became one of the priority areas of investment in the venture capital industry. According to the information portal Computerra, this is due to the fact that developments in this area have begun to bring significant results for their users. Over the past year, the number of companies with completed projects in the field of big data management increased by 125%, the market size grew by 45% compared to 2013.

According to Wikibon, services made up the largest part of Big Data market revenue in 2014, with a 40% share of total revenue (see the diagram below):

If we consider Big Data for 2014 by subtypes, then the market will look like this:

According to Wikibon, 36% of Big Data revenue in 2014 came from applications and analytics, 17% from computing equipment and 15% from data storage technologies. The least revenue was generated by NoSQL technologies, infrastructure equipment and corporate networking.

The most popular Big Data technologies are in-memory platforms such as SAP HANA and Oracle. A T-Systems survey showed that they were chosen by 30% of the surveyed companies. The second most popular were NoSQL platforms (18% of users). Companies also used analytical platforms from Splunk and Dell, which were chosen by 15% of companies. According to the survey results, Hadoop / MapReduce products proved the least useful for solving Big Data problems.

According to an Accenture survey, more than 50% of companies using Big Data technologies spend 21% to 30% on Big Data.
According to a follow-up analysis by Accenture, 76% of companies believe that these expenses will increase in 2015, and 24% of companies will not change their budget for Big Data technologies. This suggests that in these companies Big Data has already become an established area of IT and an integral part of the company's development.

The results of the Economist Intelligence Unit survey confirm the positive effect of implementing Big Data. 46% of companies say they have improved customer service by more than 10% using Big Data technologies, 33% of companies have optimized inventory and improved productivity of fixed assets, 32% of companies have improved planning processes.

Big Data around the world
Today, Big Data technologies are most often implemented in US companies, but other countries have already begun to show interest as well. In 2014, according to IDC, the countries of Europe, the Middle East, Asia (excluding Japan) and Africa accounted for 45% of the market for Big Data software, services and equipment.

Also, according to a CIO survey, companies from the Asia-Pacific region are rapidly adopting new solutions in Big Data analysis, secure storage and cloud technologies. Latin America is in second place in terms of investment in the development of Big Data technologies, ahead of Europe and the United States.
A description of the Big Data market in several countries, with development forecasts, is presented below.

China
The volume of information in China is 909 exabytes, which is 10% of the total amount of information in the world. By 2020 it will reach 8060 exabytes, and China's share of the global volume will also increase, reaching 18% in 5 years. China's Big Data volume has some of the fastest potential growth in the world.

Brazil
At the end of 2014, Brazil accumulated 212 exabytes of information, which is 3% of the global volume. By 2020, the volume of information will grow to 1,600 exabytes, or 4% of the world's information.

India
According to EMC, the accumulated data volume of India at the end of 2014 is 326 exabytes, which is 5% of the total volume of information. By 2020, the volume of information will grow to 2,800 exabytes, which will account for 6% of the information in the entire world.

Japan
The amount of accumulated data in Japan at the end of 2014 is 495 exabytes, which is 8% of the total amount of information. By 2020, the volume of information will grow to 2,200 exabytes, but Japan's share will decrease to 5% of the world's total information volume.
Thus, Japan's share of the world's data will shrink by more than 30%, even though the volume itself grows.

Germany
According to EMC, the volume of accumulated data in Germany at the end of 2014 is 230 exabytes, which is 4% of the total volume of information in the world. By 2020, the volume of information will grow to 1,100 exabytes, or 2%.
In the German market, according to Experton Group forecasts, a large share of revenue will be generated by the services segment, whose share will amount to 54% in 2015 and increase to 59% in 2019. The shares of software and equipment, on the contrary, will decrease.

Overall, the market will grow from 1.345 billion euros in 2015 to 3.198 billion euros in 2019, with an average growth rate of 24%.
Thus, based on the analytics of CIO and EMC, we can conclude that the developing countries of the world in the coming years will become markets for the active development of Big Data technologies.
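
The country figures above can be put side by side with a few lines of arithmetic. The Python sketch below is only a convenience for the reader: it takes the volumes and shares exactly as quoted in the country summaries and derives from them the growth multiple, the implied average annual growth rate for 2014-2020, and the relative change in each country's share. The table layout and the function name are assumptions made for this example.

```python
# (exabytes at end of 2014, exabytes forecast for 2020, share 2014, share 2020)
# All figures are the ones quoted in the country summaries above.
COUNTRIES = {
    "China":   (909, 8060, 0.10, 0.18),
    "Brazil":  (212, 1600, 0.03, 0.04),
    "India":   (326, 2800, 0.05, 0.06),
    "Japan":   (495, 2200, 0.08, 0.05),
    "Germany": (230, 1100, 0.04, 0.02),
}
YEARS = 2020 - 2014


def summarize(volume_2014, volume_2020, share_2014, share_2020):
    """Return growth multiple, implied annual growth and relative share change."""
    multiple = volume_2020 / volume_2014
    annual_rate = multiple ** (1 / YEARS) - 1
    share_change = share_2020 / share_2014 - 1
    return multiple, annual_rate, share_change


for name, figures in COUNTRIES.items():
    multiple, annual_rate, share_change = summarize(*figures)
    print(f"{name:8} x{multiple:4.1f}  {annual_rate:5.1%} per year  "
          f"share {share_change:+.0%}")
```

The output makes one point from the summaries explicit: a country's data volume can grow several times over while its share of the world total falls, as the figures for Japan and Germany show.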

Main market trends
According to IDG Enterprise, in 2015 companies' spending on Big Data will average USD 7.4 million per company. Large companies intend to spend about USD 13.8 million, and small and medium-sized companies USD 1.6 million.
Most of the investment will go into data analysis, visualization and data collection.
In line with current trends and market demand, investments in 2015 will be used to improve data quality, improve planning and forecasting, and increase data processing speed.
According to Bain & Company's Insights Analysis, companies in the financial sector will make significant investments: in 2015 they plan to spend $6.4 billion on Big Data technologies, and the average investment growth rate will be 22% until 2020. Internet companies plan to spend $2.8 billion, with an average growth rate of 26% for Big Data spending.
The Economist Intelligence Unit survey revealed the priority areas of Big Data development in 2014 and in the next 3 years. The distribution of answers is as follows:

According to IDC forecasts, market trends are as follows:

  • In the next 5 years, spending on cloud solutions for Big Data will grow 3 times faster than spending on on-premises solutions. Hybrid storage platforms will be in demand.
  • The growth of applications using advanced and predictive analytics, including machine learning, will accelerate in 2015. The market for such applications will grow 65% faster than the market for applications that do not use predictive analytics.
  • Media analytics will triple in 2015 and become a key growth driver for the Big Data technology market.
  • The adoption of solutions for analyzing continuous streams of information, which are applicable to the Internet of Things, will accelerate.
  • By 2018, 50% of users will interact with services based on cognitive computing.
Market Drivers and Limiters
IDC experts identified 3 drivers of the Big Data market in 2015:

According to an Accenture survey, data security issues are now the main barrier to the implementation of Big Data technologies: more than 51% of respondents confirmed that they are worried about ensuring data protection and confidentiality. 47% of companies reported that they could not implement Big Data because of a limited budget, and 41% of companies named the lack of qualified personnel as a problem.

Wikibon predicts that the Big Data market will grow to $38.4 billion in 2015, an increase of 36% over the previous year. In the coming years, growth rates will decline, falling to 10% in 2017. Based on these forecasts, the market size in 2020 will be USD 68.7 billion.

The distribution of the global Big Data market by business category will look like this:

As you can see from the diagram, most of the market will be occupied by technologies for improving customer service. Targeted marketing will be in second place in terms of priority among companies until 2019; in 2020, according to the Heavy Reading forecast, it will give way to solutions for improving operational efficiency.
The "customer service improvement" segment will also have the highest growth rate, increasing by 49% annually.
The market forecast for Big Data subtypes will look like this:

As can be seen from the diagram, the predominant market share is occupied by professional services. The highest growth rate will be in applications with analytics: their share will grow from the current 12% to 18% in 2020, and the volume of this segment will reach USD 12.3 billion. The share of computing equipment, on the contrary, will fall from 20% to 14% and will amount to about USD 9.3 billion in 2020. The cloud technology market will gradually increase and reach USD 6.3 billion in 2020. The market share of storage solutions, on the contrary, will decrease from 15% in 2014 to 13% in 2020 and in monetary terms will be equal to USD 8.9 billion.
According to the forecast of Bain & Company’s Insights Analysis, the distribution of the Big Data market by industry in 2020 will look like this:

  • The financial industry will spend $6.4 billion on Big Data, with an average growth rate of 22% per year;
  • Internet companies will spend $2.8 billion, with an average spending growth rate of 26% over the next 5 years;
  • Public sector spending will be comparable to that of Internet companies, but the growth rate will be lower, at 22%;
  • The telecommunications sector will grow at an average rate of 40% to reach USD 1.2 billion in 2020;
  • The energy sector (utilities) will invest a relatively small amount, USD 800 million, in these technologies, but its growth rate will be one of the highest at 54% annually.
Thus, a large share of the Big Data market in 2020 will be occupied by companies in the financial industry, and the energy sector will be the fastest growing sector.
Following analysts' forecasts, the total market volume will increase in the coming years. Market growth will be ensured by the introduction of Big Data technologies in the developing countries of the world, as can be seen from the graph below.

The projected market size will depend on how developing countries take to Big Data technologies and whether these become as popular there as in developed countries. In 2014, the developing countries of the world accounted for 40% of the accumulated information. EMC predicts that the current market structure, dominated by developed countries, will change in 2017. According to EMC analysts, in 2020 the share of developing countries will be over 60%.
According to Cisco and EMC, the developing countries of the world will work quite actively with Big Data, largely due to the availability of technologies and the accumulation of enough information to reach Big Data scale. The world map below shows the forecast increase in the volume of Big Data and its growth rate by region.

ANALYSIS OF THE RUSSIAN MARKET

Current state of the Russian market

According to a study by CNews Analytics and Oracle, the maturity of the Russian Big Data market has increased over the past year. Respondents representing 108 large enterprises from various industries showed a very high degree of awareness of these technologies, as well as an established understanding of the potential of such solutions for their business.
As of 2014, according to IDC, 155 exabytes of information have been accumulated in Russia, which is only 1.8% of the world's data. By 2020 the volume of information will reach 980 exabytes, or 2.2% of the world's data. Thus, the average growth rate of the volume of information will be 36% per year.
IDC estimates the Russian market at $340 million, of which $100 million are SAP solutions and approximately $240 million are similar solutions from Oracle, IBM, SAS, Microsoft, etc.
The growth rate of the Russian Big Data market is no less than 50% per year.
Positive dynamics are predicted to continue in this sector of the Russian IT market, even amid the general stagnation of the economy. This is because businesses still need solutions that improve operational efficiency, optimize costs, improve forecasting accuracy and minimize possible company risks.
The main providers of Big Data services in the Russian market are:
  • Oracle
  • Microsoft
  • Cloudera
  • Hortonworks
  • Teradata.
Market overview by industry and experience of using Big Data in companies
According to CNews, only 10% of companies in Russia have started using Big Data technologies, whereas the worldwide share of such companies is about 30%. Readiness for Big Data projects is growing in many sectors of the Russian economy, according to the report by CNews Analytics and Oracle. More than a third of the companies surveyed (37%) have started working with Big Data technologies: 20% already use such solutions and 17% are beginning to experiment with them. Another third of respondents are currently considering it.

In Russia, Big Data technologies are more popular in banking and telecoms, but they are also in demand in the mining industry, energy, retail, logistics companies and the public sector.
Below we will consider examples of the application of Big Data in Russian realities.

Telecom
Telecom operators hold some of the most voluminous databases, which allows them to analyze the accumulated information in great depth.
One area where they apply Big Data technology is subscriber loyalty management.
The main goal of data analysis is to retain existing customers and attract new ones. To do this, companies segment customers, analyze their traffic and determine each subscriber's social affiliation. Besides using the information for marketing purposes, telecom operators use these technologies to prevent fraudulent financial transactions.
VimpelCom is a striking example in this industry. The company uses Big Data to improve the quality of service for each subscriber, prepare reports, analyze data for network development, fight spam and personalize services.

Banks
A significant proportion of Big Data users are specialists from the financial industry. One successful experiment was carried out at the Ural Bank for Reconstruction and Development, where the information base began to be used for customer analysis and the bank began to offer tailored loans, deposits and other services. During the first year of using these technologies, the bank's retail loan portfolio grew by 55%.
Alfa-Bank analyzes information from social networks, processes loan applications, and analyzes the behavior of users of the company's website.
Sberbank has also begun processing a massive amount of data to segment customers, prevent fraudulent activities, cross-sell and manage risk. In the future, it is planned to improve the service and analyze customer actions in real time.
The All-Russian Regional Development Bank analyzes the behavior of plastic card holders. This makes it possible to identify transactions that are atypical for a particular client, thereby increasing the likelihood of detecting the theft of funds from plastic cards.
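
Detecting transactions that are atypical for a particular client can be sketched with a simple statistical rule. The example below is a minimal illustration, not the bank's actual method, which the source does not describe: it assumes a hypothetical per-client history of card payment amounts and flags new amounts that lie more than a chosen number of standard deviations from that client's mean. The function name, the threshold of 3 and the sample data are all our own assumptions.

```python
from statistics import mean, stdev


def atypical_transactions(history, new_amounts, threshold=3.0):
    """Return the new amounts that deviate from the client's historical
    mean by more than `threshold` sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: anything different is atypical.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical past card payments of one client (in rubles).
history = [1200, 950, 1100, 1300, 1050, 990, 1150]

# Only the payment far outside the client's usual range is flagged.
flagged = atypical_transactions(history, [1080, 25000, 1250])
print(flagged)
```

Real anti-fraud systems typically combine many more signals (merchant, location, time of day), but the principle of comparing a transaction with the client's own profile is the same.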

Retail
In Russia, Big Data technologies have been introduced by both online and offline trading companies. Today, according to CNews Analytics, Big Data is used by 20% of retailers. 75% of retailers believe Big Data is essential to develop a competitive marketing strategy. According to Hadoop statistics, after the implementation of Big Data technology, profits in trade organizations grow by 7-10%.
M.Video's specialists report that logistics planning improved after the implementation of SAP HANA; as a result, the preparation of annual reports was cut from 10 days to 3, and the daily data load time fell from 3 hours to 30 minutes.
Wikimart uses these technologies to generate recommendations for website visitors.
One of the first offline stores to introduce Big Data analysis in Russia was Lenta. With the help of Big Data, the retailer began to study customer information from cash register receipts. It collects this information to build behavioral models, which enables more informed decision-making at the operational and business level.

Oil and gas industry
In this industry, the scope for applying Big Data is quite wide. Big Data technologies can be applied to the extraction of minerals from the subsoil: with their help, companies can analyze the production process and the most effective extraction methods, track drilling, and analyze the quality of raw materials as well as the processing and marketing of the final product. In Russia, Transneft and Rosneft have already started using these technologies.

State bodies
Countries such as Germany, Australia, Spain, Japan, Brazil and Pakistan are using Big Data technologies to tackle national issues. These technologies help government bodies provide services to the population more effectively and deliver targeted social support.
In Russia, government bodies such as the Pension Fund, the Federal Tax Service and the Mandatory Health Insurance Fund have begun to adopt these technologies. The potential for Big Data projects is large: these technologies could help improve the quality of services and, as a result, the population's standard of living.

Logistics and transport
Big Data can also be used by transport companies. With Big Data technologies, they can track their vehicle fleet, account for fuel costs and monitor customer requests.
Russian Railways implemented Big Data technologies together with SAP. These technologies helped to cut report preparation time by a factor of 43.5 (from 14.5 hours to 20 minutes) and to improve the accuracy of cost allocation 40-fold. Big Data was also introduced into the planning and tariff regulation processes. In total, the company uses more than 300 systems based on SAP solutions, 4 data centers are involved, and the number of users is 220,000.

Main drivers and market constraints
The drivers for the development of Big Data technologies in the Russian market are:
  • Growing user interest in Big Data as a way to increase a company's competitiveness;
  • Development of methods for processing media files at the global level;
  • Transfer of servers that process personal information to Russian territory, in accordance with the adopted law on the storage and processing of personal data;
  • Implementation of the sectoral plan for software import substitution, which includes state support for domestic software manufacturers and preferences for domestic IT products in publicly funded purchases;
  • A growing preference for Russian cloud service providers over foreign ones in the new economic situation, in which the dollar exchange rate has almost doubled;
  • Creation of technoparks that contribute to the development of the information technology market, including the Big Data market;
  • A state program for the implementation of grid systems based on Big Data technologies.

The main barriers to the development of Big Data in the Russian market are:

  • Ensuring data security and confidentiality;
  • Lack of qualified personnel;
  • Insufficient accumulated information resources to reach Big Data scale in most Russian companies;
  • Difficulties in introducing new technologies into companies' established information systems;
  • The high cost of Big Data technologies, which leads to a limited number of enterprises that are able to implement these technologies;
  • Political and economic uncertainty, which led to capital outflow and freezing of investment projects in Russia;
  • The rise in prices for imported products and the surge in inflation, which according to IDC are slowing the development of the entire IT market.
Russian market forecast
Today, Big Data is not as popular in Russia as in developed countries. Most Russian companies show interest in it but hesitate to take advantage of the opportunities it offers.
Examples of large companies that have already benefited from Big Data technologies are raising awareness of what these technologies can do.
Analysts are also quite optimistic about the Russian market. IDC believes that the share of the Russian market will increase over the next 5 years, in contrast to the market in Germany and Japan.
By 2020, Russia's share of the global data volume will grow from the current 1.8% to 2.2%. According to EMC, the amount of information will grow from the current 155 exabytes to 980 exabytes in 2020.
At the moment, Russia is still accumulating information toward Big Data scale.
According to a CNews Analytics survey, 44% of surveyed companies work with no more than 100 terabytes of data, and only 13% work with volumes above 500 terabytes.

Nevertheless, the Russian market will grow, following global trends. As of 2014, IDC estimates the market size at $340 million.
The market grew by 50% per year in previous years; if growth stays at that level, the market will reach USD 1.7 billion in 2018. The Russian market's share of the world market will be about 3%, up from the current 1.2%.
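
The USD 1.7 billion figure follows from compounding the 2014 estimate at 50% per year for four years. The sketch below simply replays that arithmetic; it assumes a constant growth rate, as the text does, and the function and variable names are illustrative.

```python
def project(value: float, annual_growth: float, years: int) -> float:
    """Value after compounding `annual_growth` for `years` years."""
    return value * (1 + annual_growth) ** years


BASE_YEAR = 2014
BASE_MARKET_USD_M = 340.0  # IDC estimate for 2014, in USD million
GROWTH = 0.50              # assumed constant, as in the text

# Market size for each year from 2014 to 2018, in USD million.
projection = {
    year: project(BASE_MARKET_USD_M, GROWTH, year - BASE_YEAR)
    for year in range(BASE_YEAR, 2019)
}

for year, size in projection.items():
    print(year, f"{size:,.2f}")
```

Under this assumption the 2015 value is USD 510 million and the 2018 value is about USD 1.72 billion, in line with the rounded figures quoted in the text.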

The industries most receptive to Big Data in Russia are:

  • Retail and banks - above all, analysis of the client base and assessment of the effect of marketing campaigns;
  • Telecom - segmentation of the customer base and traffic monetization;
  • Public sector - accounting, analysis of applications from the population, etc.;
  • Oil companies - monitoring of operations and planning of production and sales;
  • Energy companies - creation of intelligent power systems, operational monitoring and forecasting.
In developed countries, Big Data has become widespread in healthcare, insurance, metallurgy, Internet companies and industrial enterprises. Most likely, Russian companies in these areas will soon also assess the effect of Big Data and adopt these technologies in their industries.
In Russia, as in the rest of the world, the near future will bring a trend toward data visualization, analysis of media files and the development of the Internet of Things.
Despite the general stagnation of the economy, analysts predict further growth of the Big Data market in the coming years, primarily because Big Data technologies give their users a competitive advantage: greater operational efficiency, additional customers, lower risks and the ability to forecast from data.
Thus, we can conclude that the Big Data segment in Russia is at the stage of formation, but the demand for these technologies is increasing every year.

Main results of market analysis

World market
At the end of 2014, the Big Data market was characterized by the following parameters:
  • market size amounted to USD 28.5 billion, an increase of 45% over the previous year;
  • the largest share of Big Data market revenue came from services, at 40% of total revenue;
  • 36% of revenue came from Big Data applications and analytics, 17% from computing equipment and 15% from data storage technologies;
  • the most popular solutions for Big Data problems were in-memory platforms from vendors such as SAP (HANA) and Oracle;
  • the number of companies with implemented Big Data management projects increased by 125%.
The market forecast for the coming years is as follows:
  • in 2015 the market volume will reach USD 38.4 billion, in 2020 - USD 68.7 billion;
  • the average growth rate will be 16% annually;
  • average company spending on Big Data technologies will amount to $ 13.8 million for large companies and $ 1.6 million for small and medium-sized businesses;
  • technologies will be most prevalent in the areas of customer service and targeted marketing;
  • in 2017, the global market structure will change towards a predominance of user companies from developing countries.
Russian market
The Russian Big Data market is still taking shape. The results for 2014 are as follows:
  • market size reached USD 340 million;
  • the average market growth rate in previous years was 50% annually;
  • the total amount of accumulated information was 155 exabytes;
  • 10% of Russian companies have started using Big Data technologies;
  • Big Data technologies were more popular in banking, telecom, Internet companies and retail.
The forecast for the Russian market for the coming years is as follows:
  • the volume of the Russian market in 2015 will reach USD 500 million, and in 2018 - USD 1.7 billion;
  • the share of the Russian market in the world will be about 3% in 2018;
  • the amount of accumulated data in 2020 will be 980 exabytes;
  • data volume will grow to 2.2% of global data volume in 2020;
  • the most popular technologies will be data visualization, media file analysis and the Internet of Things.
Based on the results of the analysis, it can be concluded that the Big Data market is still in the early stages of development, and in the near future we will observe its growth and the expansion of the capabilities of these technologies.

These days only the lazy don't talk about big data, yet few really understand what it is and how it works. Let's start with the simplest thing: terminology. Put plainly, big data is a set of tools, approaches and methods for processing both structured and unstructured data in order to use them for specific tasks and purposes.

Unstructured data is information that has no predefined structure or is not organized in a specific order.

The term "big data" is often attributed to Clifford Lynch in connection with a 2008 special issue of the journal Nature devoted to the explosive growth of the world's volumes of information, although big data itself, of course, existed before. According to experts, most data streams over 100 GB per day fall into the big data category.

Today this simple term stands for just two things: data storage and data processing.

Big data - in simple terms

In the modern world, big data is a socio-economic phenomenon that arose because new technological capabilities for analyzing huge amounts of data have appeared.

For ease of understanding, imagine a supermarket where the goods are not in the order you are used to: bread next to fruit, tomato paste next to frozen pizza, lighters in front of a rack of tampons that also holds avocados, tofu and shiitake mushrooms. Big data puts everything in its place and helps you find nut milk, learn its price and expiration date, and also find out who else buys such milk and why it is better than cow's milk.

Big data technology

Huge volumes of data are processed so that a person can get specific and necessary results for their further effective use.

In effect, big data is both a way of solving problems and an alternative to traditional data management systems.

Techniques and methods of analysis applicable to Big data according to McKinsey:

  • Crowdsourcing;
  • Data fusion and integration;
  • Machine learning;
  • Artificial neural networks;
  • Pattern recognition;
  • Predictive analytics;
  • Simulation modeling;
  • Spatial analysis;
  • Statistical analysis;
  • Analytical data visualization.

Horizontal scalability is a fundamental principle of big data processing: data is distributed across computational nodes, so that processing can proceed without performance degradation as volumes grow. McKinsey also includes relational database management systems and Business Intelligence among the applicable technologies.

Technologies:

  • NoSQL;
  • MapReduce;
  • Hadoop;
  • Hardware solutions.
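
To make the MapReduce idea more concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is only an illustration of the paradigm in plain Python, not Hadoop's API: in a real cluster the map and reduce calls would run in parallel on different nodes, and the function names and sample documents here are our own.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# Each document could live on a different node and be mapped independently.
documents = ["big data is big", "data is everywhere"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

Because each map call looks at only one document and each reduce call at only one key, the work can be spread over as many nodes as needed, which is what makes the approach horizontally scalable.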

Big data is traditionally defined by the characteristics formulated by the Meta Group back in 2001, known as the "Three Vs":

  1. Volume - the physical size of the data.
  2. Velocity - the speed at which data grows and the need to process it quickly to obtain results.
  3. Variety - the ability to process various types of data simultaneously.

Big data: applications and opportunities

Large volumes of heterogeneous, rapidly arriving digital information cannot be processed with traditional tools. Analyzing the data reveals subtle patterns that a person cannot see unaided. This makes it possible to optimize every area of our lives, from public administration to manufacturing and telecommunications.

For example, some companies began protecting their clients from fraud several years ago, since taking care of the client's money means taking care of their own.

Big data-based solutions: Sberbank, Beeline and other companies

Beeline has a huge amount of data about its subscribers, which it uses not only to work with them but also to create analytical products such as external consulting and IPTV analytics. Beeline segmented its database and protected customers from financial fraud and viruses, using HDFS and Apache Spark for storage and RapidMiner and Python for data processing.

Or recall Sberbank's older case, a system called AS SAFI. It analyzes photographs to identify bank customers and prevent fraud. Introduced back in 2014, the system uses computer vision to compare photographs taken by webcams mounted on stands against those in its database, and it is built on a biometric platform. Thanks to it, cases of fraud fell tenfold.

Big data in the world

By 2020, according to forecasts, humanity will generate 40-44 zettabytes of information, and by 2025 that figure will grow tenfold, according to the Data Age 2025 report prepared by analysts at IDC. The report notes that most of the data will be generated by businesses themselves, not consumers.

Research analysts believe that data will become a vital asset and security will become a critical foundation in life. The authors of the work are also confident that the technology will change the economic landscape, and the average user will communicate with connected devices about 4800 times a day.

Big data market in Russia

Typically, big data comes from three sources:

  • Internet (social networks, forums, blogs, media and other sites);
  • Corporate archives of documents;
  • Readings from sensors, instruments and other devices.

Big data in banks

In addition to the system described above, Sberbank's strategy for 2014-2018 stresses the importance of analyzing massive amounts of data for quality customer service, risk management and cost optimization. The bank now uses big data for risk management, combating fraud, customer segmentation and creditworthiness assessment, personnel management, forecasting queues in branches, calculating employee bonuses and other tasks.

VTB24 uses big data to segment customers and manage churn, generate financial statements, and analyze reviews on social networks and forums. To do this, the bank uses Teradata, SAS Visual Analytics and SAS Marketing Optimizer solutions.