Data volumes are growing exponentially. According to a report by the Ministry of Internal Affairs and Communications “ICT Kotozukuri Review Conference,” the total amount of digital data worldwide, which reached 1 zettabyte in 2011, was projected to reach approximately 40 zettabytes by 2020.
Such huge amounts of data are called “big data” and are attracting attention in many fields.
But what exactly is big data? If someone asked you, could you explain it? This article explains it thoroughly and clearly.
What is big data?
This section explains the definition of big data and the background to its emergence.
Definition of big data
Big data, as the name suggests, refers to huge amounts of data. However, views differ on how large data must be to count as “big.”
Today, the term is generally said to cover data volumes ranging from tens of terabytes to several petabytes. As time goes on, even larger volumes will likely be what people mean by “big data.”
In addition, volume alone does not determine whether data counts as big data.
For example, many IT glossaries apply the term to “unstructured and non-standard data that includes various types and formats” and to “data that is time-series and real-time.”
Three characteristics of big data
Analyst Doug Laney, who is said to have been the first to define big data, characterized it with three Vs.
“Volume” is the sheer amount of data, “Velocity” is the speed at which data is generated and collected, often in real time, and “Variety” is the range of data types and formats.
Big data is therefore characterized by high levels of all three Vs.
On the other hand, Laney's report only set out these three characteristics; it did not give a definition of the form “more than XX bytes is big data.”
In other words, big data is not determined by volume alone. Data judged to be “large” relative to the standards and context of the time is what gets called “big data.”
Three types of big data
Three types of data make up big data: structured data, unstructured data, and semi-structured data.
Structured data is data organized in a two-dimensional tabular format. The structure is defined in advance, and data is stored in a form that fits it.
Examples of structured data include Excel spreadsheets and CSV files.
Unstructured data is data with no fixed format or content.
Much of the data obtained via the Internet and similar sources falls into this category.
Examples include PDFs and audio files.
Unstructured data is now plentiful and easy to obtain.
However, simply collecting large amounts of unstructured data does not make it usable; in that state it has little value as data.
Fitting unstructured data into the data structure you need at the time you want to use it is called semi-structuring.
Semi-structured data is data produced by semi-structuring unstructured data.
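As a rough illustration of the semi-structuring step described above, the following Python sketch pulls fields out of free-form log lines and places them under a fixed set of keys. The log format, the field names, and the `semi_structure` helper are all assumptions made for this example; they do not come from any particular product.

```python
import json
import re

# Hypothetical free-form log lines (unstructured text).
RAW_LINES = [
    "2023-04-01 10:15 user=alice viewed /articles/big-data",
    "2023-04-01 10:17 user=bob viewed /articles/iot",
    "this line does not match the expected pattern",
]

# Assumed pattern: date, time, user name, and the page viewed.
LINE_PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}) "
    r"user=(?P<user>\w+) viewed (?P<page>\S+)"
)


def semi_structure(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


records = semi_structure(RAW_LINES)
print(json.dumps(records, indent=2))
```

Lines that do not fit the assumed pattern are simply skipped here; a real pipeline would probably log or quarantine them instead.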
Background of the emergence of big data
The emergence of big data is one of the major factors behind the development of AI technology.
This section looks at how big data came about.
- Increase in Internet devices and users
- Increase in Internet service users
Increase in Internet devices and users
Behind the emergence of big data is the growth in the number of devices connected to the Internet, such as smartphones, and in the number of people using them.
The smartphone, which makes it easy to access the Internet even on a train, gave a strong boost to the emergence of big data.
Today it is easy to carry Internet-connected devices, chiefly smartphones, but this was not possible about 20 years ago.
The rapid spread of Internet devices has thus made it possible to gather large amounts of data through the Internet.
For example, while you read this article, details such as the following are stored as data:
- Where you came to this article from
- Which page you go to next
- How long you spend reading this article
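To make the kind of record described above concrete, here is a minimal Python sketch of a single page-view entry. The `PageView` class, its field names, and the example URLs are hypothetical; actual analytics systems define their own schemas.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PageView:
    """One hypothetical page-view record."""
    referrer: Optional[str]    # where the reader came from
    next_page: Optional[str]   # page visited after this one, if any
    opened_at: datetime        # when the article was opened
    closed_at: datetime        # when the reader left the article

    @property
    def seconds_on_page(self) -> float:
        """Time spent reading, in seconds."""
        return (self.closed_at - self.opened_at).total_seconds()


view = PageView(
    referrer="https://example.com/search?q=big+data",
    next_page="/articles/iot",
    opened_at=datetime(2023, 4, 1, 10, 15, 0),
    closed_at=datetime(2023, 4, 1, 10, 18, 30),
)
print(view.seconds_on_page)
```

Multiplied across every reader and every page, small records like this one are what accumulate into large volumes of data.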
Increase in Internet service users
In addition to Internet devices, the growing number of Internet service users has contributed to the emergence of big data.
Social networking services (SNS) are a typical example.
A mountain of information piles up daily on Twitter and Facebook.
Google Search is also an Internet service.
In other words, many people generate large amounts of data in the ordinary course of using Internet services.
Why big data is hot right now
There are two reasons why big data is attracting attention: the evolution of data collection technology and the evolution of data analysis technology. Each is explained below.
Evolution of data collection technology
Methods of collecting data are evolving, and the driving factor is the Internet. All kinds of “things” are being connected to the Internet, and even more data is collected as a result. This is the so-called Internet of Things (IoT).
IoT refers to the connection of things to the Internet, or to the connected devices themselves.
For example, imagine a refrigerator connected to the Internet. It can record when and by whom it was used, and this record can be used to fine-tune the temperature. Everything from traffic lights to cameras is likewise being connected to the Internet.
The huge amount of diverse information that can be collected from all these devices becomes big data.
Evolution of data analysis technology
Methods of collecting big data continue to advance. However, no matter how much data there is, it must be analyzed to discover regularities before it becomes usable.
In his book “Data Scientists Create Our Future with Big Data” (2013), Tomoyuki Higuchi of the Institute of Statistical Mathematics, drawing on the MGI report, identifies the following three technologies as necessary for big data analysis.
The first is “big data engineering,” which deals with how to store and accumulate huge amounts of data. The second is “data visualization,” a means of presenting the information obtained from big data so that it can be used in practical settings such as business. The third is “data analysis methods,” which apply data science in a broad sense, such as statistics and data mining, to derive usable models from raw data.
In the third area, data analysis methods, technological progress has brought data to life. That technology is machine learning.
Machine learning is a method for building models by discovering rules and patterns in huge amounts of data without relying on human judgment. Deep learning, which achieved astonishing results in image recognition competitions, is one machine learning technique.
Machine learning is effective at deriving regularities from large amounts of data, but it also needs large amounts of data to produce a good model. Big data has therefore attracted attention as methods for actually “utilizing” large amounts of data have been developed.
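The idea of deriving a regularity from data can be shown at its smallest scale with an ordinary least squares line fit, one of the simplest forms of learning a model from data. The sketch below uses plain Python and made-up numbers; the `fit_line` helper is a name chosen for illustration, and real projects would normally rely on an established library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations that follow y = 2x + 1 exactly.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Use the learned rule to predict a value that was not in the data.
print(slope * 6 + intercept)
```

No one tells the program that the rule is "double it and add one"; it recovers the rule from the examples, which is the essence of the approach described above.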
Advantages of using big data
Big data is not simply collected and left alone.
It is used because it offers many advantages.
This section introduces the benefits of using big data.
- The future can be predicted
- Data can be “visualized,” making it possible to grasp the current situation
- Services can be improved
The future can be predicted
It is an exaggeration to say that big data lets us predict the future, but it can forecast future events with a much higher probability of success than human guesswork.
What do you rely on when you predict the future?
For example, when deciding which side to bet on next in baccarat, you probably look at past results to guess whether the player or the banker will win.
AI predicts the future in the same way.
Because AI bases its analysis on big data, however, it can make predictions with a higher probability of being right.
Data can be “visualized,” making it possible to grasp the current situation
Visualizing data means displaying it in a way that captures the key points, such as:
- The meaning of the data is easy to understand, so everyone shares the same understanding.
- Critical data is always visible.
- Looking at the data tells you what to do.
Big data has made this kind of visualization more accurate.
As a result, it has become possible to grasp the current situation more concretely and accurately.
Services can be improved
As noted above, visualizing data makes it possible to grasp the current situation accurately.
Grasping the current situation reveals both issues and strengths.
That tells you which services to offer next, which in turn lets you improve your services.
Disadvantages of using big data
There are two disadvantages to using big data.
- Privacy issues
- Management of huge amounts of data
Each is explained in detail below.
Privacy issues
One of the problems in using big data is privacy.
For example, consider the use of big data on SNS. Information such as which posts users view and which posts they “like” adds up to a huge amount of data.
Using this data makes SNS more convenient, but the information used in the process, such as which posts a person viewed, relates to individual users. This is where the privacy issue arises.
Management of huge amounts of data
As the three Vs described earlier show, both the value and the quantity of big data as information are far greater than those of earlier data.
As a result, managing the data itself becomes one of the problems of using big data. Specific problems include backups and a shortage of people to do the work.
How to use big data
There are three ways to utilize big data.
- Web service related
- Demand forecast
- Prediction in real time
If you do not know how big data can be used, you cannot handle it well. Understand the ways it can be used, then put it to work.
Web service related
Although the use of big data poses many problems, there are also many ways to put it to use.
One is the advertising business on web services, which includes the use of big data on SNS mentioned earlier. By predicting each user's preferences and tendencies from a huge amount of data, it is possible to deliver information that interests the user more efficiently.
Demand forecast
Big data is also used in familiar places such as convenience stores and supermarkets.
By accumulating data on which products sold and in what quantities, a store can identify trends in customer demand. Factors such as the season and best-selling items can be taken into account, and the results can be used for purchasing and for developing new products.
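One very simple way to turn accumulated sales figures into a forecast is a moving average over the most recent periods. The sketch below is only an illustration under that assumption: the `moving_average_forecast` helper and the weekly figures are invented, and real retailers would typically use richer models that account for seasons, weather, and promotions.

```python
def moving_average_forecast(sales, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(sales) < window:
        raise ValueError("not enough history for the chosen window")
    recent = sales[-window:]
    return sum(recent) / window


# Made-up weekly unit sales for one product, oldest first.
weekly_units = [120, 135, 128, 140, 150, 145]

forecast = moving_average_forecast(weekly_units, window=3)
print(forecast)
```

A purchasing decision could then compare this forecast with current stock; the window size is a tuning choice, with shorter windows reacting faster to recent changes.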
Prediction in real time
With big data, information can be arranged in chronological order and predictions can be made at high speed. A good example is the improvement of road conditions.
Automobiles called “connected cars” are equipped with communication functions and can provide data such as GPS position and wheel rotation speed. Based on the information obtained from connected cars, it is possible to predict where traffic congestion is occurring.
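As a simplified sketch of how such vehicle reports might be used, the following Python code averages the speeds reported on each road segment and flags segments where the average falls below a threshold. The segment IDs, the 20 km/h threshold, and the `congested_segments` helper are assumptions for illustration; this detects slow segments from current reports rather than performing a true prediction.

```python
from collections import defaultdict


def congested_segments(reports, threshold_kmh=20.0):
    """Return road segments whose average reported speed is below the threshold."""
    speeds = defaultdict(list)
    for segment_id, speed_kmh in reports:
        speeds[segment_id].append(speed_kmh)
    return sorted(
        segment
        for segment, values in speeds.items()
        if sum(values) / len(values) < threshold_kmh
    )


# Made-up (segment ID, speed in km/h) reports from connected cars.
reports = [
    ("A", 12.0), ("A", 18.0),
    ("B", 55.0), ("B", 60.0),
    ("C", 25.0), ("C", 10.0),
]

print(congested_segments(reports))
```

With reports arriving continuously, the same aggregation could be rerun over a short time window to keep the picture of road conditions up to date.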
Points for successful utilization of big data
Simply analyzing big data does not mean it is being used effectively.
To use it well, you need to keep a few key points in mind.
Here are three of them.
- Clarify the purpose
- Repeat trial and error without giving up
- Work with supporting companies
Clarify the purpose
The first point is to clarify the purpose.
It is important to think about what you want to achieve by using big data.
This is not limited to big data; the same can be said of almost anything.
The data you need, and the data you get out of the analysis, will change depending on the purpose.
So make your purpose clear.
Repeat trial and error without giving up
The second point is to repeat trial and error without giving up.
Even after the purpose is clear, analyzing the data will not achieve it immediately.
This is because a handful of analyses cannot cover every variable.
That is why it is important to keep analyzing the data again and again.
Work with supporting companies
The third point is to work with supporting companies.
Many companies provide consulting on how to handle big data.
If you are not confident in big data analysis, or if you are not getting results no matter how many times you try, consider using a company that supports big data analysis.
DX necessary for big data
Digital transformation (DX) is an initiative that is necessary for making use of big data.
What is DX?
- Companies respond to drastic changes in the digital environment, use data and digital technology to transform their products, services, and business models based on the needs of customers and society, and transform their corporate culture and climate to establish a competitive advantage.
When people consider using data in business, AI tends to be the only focus.
The use of AI is certainly useful for improving operational efficiency, raising productivity, and collecting data.
However, it is a costly approach in terms of technology, data, and money.
Many DX measures can be taken before introducing AI. It is important to proceed with DX in line with the issues faced on site, so that adopting technology does not become the goal in itself.
Steady progress in DX is the most reliable way to collect data, and the data accumulated this way becomes big data.
Relationship between big data and AI
Big data and AI are each attracting attention in their own right, but combining the two opens up even more possibilities.
One of the problems with big data is that management cannot keep up with the enormous amount of information. Introducing AI can help with this.
Advances in AI fields such as machine learning and deep learning have made it possible to store and analyze information that was previously unmanageable, and to pick out only the information that is needed.
Many issues
Although big data and AI have great potential, they also pose challenges. System development is one example: organizing far more data than ever before requires building up the underlying systems.
In addition, the data handled includes sensitive data, such as customer information, that must be managed safely. Thorough security is needed to prevent intrusion from outside.
Another challenge is the shortage of data scientists, who are responsible for programming AI so that big data and AI work well together. The shortage of data scientists who can handle big data is especially acute in Japan, so training more of them is an urgent task.
Examples of big data utilization
Management using data is called “data-driven management”.
Today, data-driven management is being adopted in various fields.
This section introduces data-driven management in the following four fields.
- Retail
- Information and communication service industry
- Financial industry
- Agriculture
Retail (Goodday Co., Ltd.)
In the retail industry, data-driven management based on big data is practiced in marketing and other areas.
Retail stores accumulate information such as when customers visit and what they buy at what time of year, and analyze it to work out how to increase sales.
Here we introduce Goodday Co., Ltd., which has succeeded in management based on big data.
Goodday Co., Ltd. operates home improvement stores mainly in northern Kyushu and Yamaguchi.
While many companies shortened their business hours during the COVID-19 pandemic, Goodday decided to keep its usual hours and succeeded in reducing crowding.
This decision came from combining and analyzing the POS data and hourly visitor counts that Goodday had accumulated with data on the number of infections and the mobility data published by Google. The analysis led the company to conclude that keeping normal hours would avoid crowds better than shortening them.
Information and communication service industry (Mercari, Inc.)
In the information and communication service industry, data-driven management draws on big data to decide matters such as what information to provide and how to provide it.
Companies that provide these services accumulate data on matters such as what their customers are interested in, and analyze it to work out how to increase sales.
Here we introduce Mercari, Inc., which is working on new initiatives based on big data.
Mercari, Inc. operates a flea market app for buying and selling used items.
By providing the secondary distribution data it holds to companies that sell new products, Mercari is working to help those companies find ways to reduce waste.
A company that sells new products can use data such as how much its products are worth on the second-hand market and whether they are circulating there. Because such companies may not be able to obtain this data themselves, Mercari decided to provide it.
Financial industry (Mitsui Sumitomo Insurance Co., Ltd.)
In the financial industry, data-driven management is based on big data such as customer behavior and the strategies of banks and other financial institutions.
Here we introduce Mitsui Sumitomo Insurance Co., Ltd., which launched a new service based on big data.
Together with Accenture, Mitsui Sumitomo Insurance Co., Ltd. launched a service called “RisTech” that helps companies prevent accidents and disasters and take proactive measures against corporate issues.
In recent years, more companies have suffered losses as natural disasters have increased.
The company therefore decided to combine the data held by insurance companies, such as past accident data, customer data, contract data, and call center data, with the various data accumulated by its business partners.
Agriculture (Tabechoku)
In agriculture, data-driven management is based on big data such as the causal relationships among soil, weather, and crops.
Here we introduce Tabechoku, operated by Vivid Garden Co., Ltd., which aims to build a new system based on big data.
Tabechoku is trying to realize a new agricultural mechanism that lets farmers predict their income before harvest.
To that end, it decided to collect data on soil, air, and sunlight from fields that are highly rated on Tabechoku and to build a system that lets people who are new to farming, as well as those already farming, anticipate customer ratings in advance.
In conclusion
The collection and utilization of big data are expected to keep advancing.
In this context, it is essential to collect usable data rather than to be preoccupied with sheer quantity. Data should therefore be collected on the premise that it will be used.
After all, technology is a tool. Big data is best approached with the aim of solving problems in the field.