What is a machine learning engineer? Will the job really disappear? Required skills and aptitude explained

Perhaps you are thinking of becoming a machine learning engineer but do not know what the work involves or what skills are required. A machine learning engineer is responsible for implementing and operating machine learning algorithms and for building the environments they run in. Because it is difficult to “eliminate mispredictions” in machine learning, some people say the job will disappear in some situations. This article explains the definition of a machine learning engineer, the work involved, the necessary skills and aptitude, annual income, and more.

1. What is a machine learning engineer?

What is a machine learning engineer, and what kind of work does one do?
This section covers the following:

Definition of machine learning engineer
Job description of machine learning engineer
Differences between machine learning engineers and data scientists

Let’s take a closer look at each of the above.

1.1 Definition of machine learning engineer

A machine learning engineer is responsible for implementing and operating machine learning algorithms and for building the environments they run in.
The main role is “development and implementation of algorithms and software”, but depending on why a company is introducing machine learning and on its circumstances, the engineer may be expected to do work similar to that of a data analyst or data scientist.

For more on data analysts, see the following article.

What is a data analyst? Explains the meaning, necessary aptitude, and the theory of “work to disappear?”

1.1.1 Selection / implementation / development of AI / machine learning model

Select an AI / machine learning model according to the purpose of introducing machine learning, define what the model’s output should look like, and then implement and develop it. Repeating implementation and development improves the model’s functionality and performance. The work also includes details such as developing APIs to integrate with cloud environments and writing batch programs that automate data aggregation and updates.

1.1.2 Data collection and shaping

Machine learning is a type of artificial intelligence in which a machine performs tasks such as “classification and recognition” and “prediction”, improving its accuracy through “learning” much as a person acquires knowledge.

There are three types of learning: “supervised”, “unsupervised”, and “reinforcement learning”.

Supervised
The algorithm reads data prepared in advance and learns the relationship between “input” and “output”. Because a “correct answer” is prepared in advance, it can be compared to learning with a teacher.
Unsupervised
No correct answer is prepared in advance; the algorithm extracts structure and characteristics from the input data itself.
Reinforcement learning
Humans do not prepare the data; the algorithm itself acts on an environment, gathers the data it needs, and learns from the outcome.

Data collection and shaping are required in advance for the supervised and unsupervised learning described above.
For supervised learning in particular, a phase is needed in which the data to be used is examined, tagged (annotation), and turned into teacher data.
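
As a rough illustration of the annotation phase, the following Python sketch pairs each raw input with a tag to form teacher data. The sample texts and the keyword rule inside `annotate` are hypothetical stand-ins for a human annotator and are not taken from any real project.

```python
# Hypothetical raw records; in practice these come from data collection.
raw_texts = [
    "great product, works well",
    "broke after one day",
    "works as described",
    "terrible support, broke twice",
]


def annotate(text):
    """Stand-in for a human annotator: tag a text as 'negative' or 'positive'."""
    negative_words = {"broke", "terrible"}  # illustrative rule only
    words = set(text.replace(",", " ").split())
    return "negative" if words & negative_words else "positive"


# Teacher data: each input paired with its tag (the "correct answer").
labeled = [(text, annotate(text)) for text in raw_texts]

for text, tag in labeled:
    print(f"{tag:8s} <- {text}")
```

A supervised algorithm would then learn the relationship between the text (input) and the tag (output) from pairs like these.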

1.2 Job description of machine learning engineer

The work of machine learning engineers can be divided into the following five phases.

Requirement definition
Data preparation / shaping / processing
Implementation of machine learning
Accuracy evaluation of learning model
Research and development of algorithms and models

Let’s take a closer look at each.

1.2.1 Requirements definition

The first phase is requirements definition.
Machine learning is usually considered as a way to solve an existing business problem.
In that case, consider the following three points.

Why
“Clarify the (business) problem”
What
“Clarify which indicators should be improved to solve the problem”
How
“Consider whether there is a way to improve the indicators other than machine learning”

In machine learning projects, behavior is not fully specified by explicit algorithms written by humans, so uncertainty about quality is always present. Even if the model achieves 90% accuracy, the project itself will not be valued if the remaining 10% of errors leads people to conclude that “machine learning is ineffective”. It is therefore necessary to clarify the purpose of the introduction and the KPIs, and to have the project participants agree on those KPIs.

1.2.2 Data preparation / shaping / processing

Examine the data that will actually be used for machine learning. Start by asking “Is there enough data to learn from?”, and if there is, move on to annotation. Check the following points as well.

Whether the data is structured
Whether there are any security problems
How much it costs to process the data
How much annotation costs
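
Part of this examination can be automated. The sketch below, written under the assumption that the data is already structured as a list of dictionaries, counts missing values per field and lists the records that still need annotation. The field names (`age`, `income`, `label`) and the values are hypothetical.

```python
# Hypothetical structured records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "label": "approved"},
    {"age": None, "income": 48000, "label": "rejected"},
    {"age": 29, "income": None, "label": None},
    {"age": 41, "income": 61000, "label": "approved"},
]


def missing_counts(rows):
    """Count how many rows have a missing (None) value in each field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts.setdefault(field, 0)
            if value is None:
                counts[field] += 1
    return counts


def unlabeled(rows, label_field="label"):
    """Return the rows that still need annotation."""
    return [row for row in rows if row.get(label_field) is None]


counts = missing_counts(records)
to_annotate = unlabeled(records)

print("missing values per field:", counts)
print("rows still to annotate:", len(to_annotate))
```

The number of unlabeled rows gives a first, rough input for estimating annotation cost.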

1.2.3 Implementation of machine learning

Next comes the actual machine learning.

Decide which machine learning model to adopt, train it, and check the accuracy of the resulting model. When choosing a model, it is important to be able to explain “what kind of data” you have, “what characteristics it has”, and “what you want to verify”.

When dealing with a huge amount of data, on the other hand, you may have to work with data that has few features. In that case, narrow down the candidate models, select from them according to a fixed criterion, and actually run them. The most important thing is to compare the results.
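
The idea of running the narrowed-down candidates and comparing them can be sketched as follows. This is a minimal illustration, not a recommended pipeline: the two candidate "models" (a majority-class baseline and a single-threshold rule) and the tiny data set are assumptions chosen so that the example runs with the Python standard library alone.

```python
# Hypothetical data: (feature value, class label) pairs.
train = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1), (7, 1)]
test = [(1.5, 0), (2.5, 0), (3.5, 1), (5.5, 1), (6.5, 1)]


def fit_majority(data):
    """Baseline candidate: always predict the most common training label."""
    ones = sum(y for _, y in data)
    majority = 1 if ones * 2 >= len(data) else 0
    return lambda x: majority


def fit_threshold(data):
    """Second candidate: pick the cut-off on x with the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted({x for x, _ in data}):
        acc = sum((1 if x >= cut else 0) == y for x, y in data) / len(data)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda x: 1 if x >= best_cut else 0


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Train every candidate, then compare them on held-out data with one criterion.
candidates = {"majority": fit_majority(train), "threshold": fit_threshold(train)}
scores = {name: accuracy(model, test) for name, model in candidates.items()}
best = max(scores, key=scores.get)

print(scores, "->", best)
```

The fixed criterion here is accuracy on held-out data; a real project would choose the criterion to match the agreed KPI.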

1.2.4 Accuracy evaluation of learning model

Evaluate the accuracy of the learning model. Specifically, follow the procedure below.

Evaluate the learning model
Evaluate the accuracy of the learning model you created
Use the model to judge whether it delivers results on the problem to be solved

“Accuracy” here means obtaining a reasonable level of results across every problem the model is applied to, so a model that lacks generality and works only for one specific problem cannot be adopted.

1.2.5 Research and development of algorithms and models

In machine learning, the data used for learning is said to be more important than the algorithm.

If the data is inadequate, the model will not give you the results you actually need. At the same time, research on new algorithms, machine learning models, and development frameworks is very active, and the technology advances quickly, so continual study to keep up is important.

Even when good data has produced reasonable results, there may be a better approach, and trying another algorithm can give better results still. Algorithms therefore matter too: keep in mind the balance between “data and algorithms that suit the purpose”, and keep catching up with new developments.

1.3 Differences between machine learning engineers and data scientists

The differences between machine learning engineers and data scientists are as follows:

Machine learning engineers are needed from the “system design” stage
Data scientists mainly work to build machine learning models and use data to improve “accuracy”.

Let’s take a closer look at each.

1.3.1 Machine learning engineers handle “system design”

Machine learning engineers are needed from the system design stage. They develop services and improve features using machine learning. They also take on development beyond the algorithms and models themselves, such as building the infrastructure that machine learning requires and developing APIs.
At some companies, people with the title of machine learning engineer actually do work that falls within the data scientist’s area.

1.3.2 Data scientists “improve model accuracy”

Data scientists mainly work to build machine learning models and use data to improve the “accuracy” of predictive models.

For example, take a predictive model that judges whether an image contains a “dog”. If 8 out of 10 answers are correct, the accuracy is 80%, and the data scientist’s job is to work out how to raise it further.
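
The 8-out-of-10 arithmetic can be written as a small Python function. The labels and predictions below are made-up values chosen so that exactly two of the ten predictions are wrong.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the correct labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical ground truth and model output for ten images.
labels = ["dog", "dog", "cat", "dog", "cat", "cat", "dog", "cat", "dog", "cat"]
predictions = ["dog", "dog", "cat", "cat", "cat", "cat", "dog", "dog", "dog", "cat"]

result = accuracy(predictions, labels)
print(f"accuracy: {result:.0%}")
```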

Consider also a loan screening system that uses AI.
If the accuracy is “good”, some operations can be automated; if it is “bad”, the work cannot be left to AI. Where machine learning is used in this way, “good or bad accuracy” strongly influences decision making, so how to improve “accuracy” is an important issue.

2. Skills and aptitude required for machine learning engineers

Machine learning engineers need the following four skills and aptitudes.

Statistical skills
Programming skills
Hypothesis building ability
Direction ability

The duties of machine learning engineers partly overlap with those of data analysts and data scientists, so the job descriptions and required skills are relatively similar.

2.1 Statistical skills

The following skills are required as prerequisites for machine learning and data analysis.

Estimation, testing, regression, discriminant analysis
Estimation and hypothesis testing
Simple regression analysis, multiple regression analysis

If you are aiming to become a machine learning engineer and are just starting out in data analysis and statistics, begin by working through typical statistical and machine learning methods yourself.
We recommend learning statistical analysis and time series analysis with R and Python, and working through university-level textbooks on calculus and linear algebra (matrices) by hand.
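
As one example of working through a method by hand, simple regression analysis can be implemented directly from the least-squares formulas: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The data points below are hypothetical and lie exactly on y = 2x + 1.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = simple_linear_regression(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

Multiple regression extends the same idea to several explanatory variables, which is where the linear algebra (matrices) mentioned above comes in.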

2.2 Programming skills

Programming skills are also required for data analysis using R, Python, and similar tools.

To become a machine learning engineer, you need to learn statistical analysis and time series analysis. Statistical analysis means “analyzing accumulated data on the basis of statistical theory”, and time series analysis means “analyzing data that changes over time, such as temperature, earthquakes, and stock price fluctuations”.
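
A very small taste of time series analysis is smoothing a series with a moving average so that the underlying trend is easier to see. The sketch below uses only the Python standard library; the temperature readings are invented for illustration.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in the series."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily temperature readings.
temperatures = [20, 22, 21, 25, 27, 26, 30]

smoothed = moving_average(temperatures, 3)
print([round(value, 1) for value in smoothed])
```

Dedicated time series packages go much further (seasonality, forecasting), but the input and output have the same shape: an ordered series in, a derived series out.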

R is strong in statistical analysis, and for time series analysis it offers an exceptionally rich set of packages, such as the forecast package. Many research companies use R because it makes it easy to check whether survey results are statistically significant. Python’s strength lies in “prediction” through machine learning.

2.3 Hypothesis building ability

You also need the skill of building hypotheses, both to find problems and to solve them.
Hypothesis-building ability means forming a hypothesis before collecting or analyzing information. The habit of thinking about the overall picture of a problem and its likely conclusion while information is still scarce is called “hypothetical thinking”. With this skill, your work goes more smoothly and becomes more accurate.

2.4 Direction ability

Unlike a conventional system, a machine learning system always carries uncertainty, both in testing and in production. When running a machine learning project, it is therefore important to manage the expectations of project participants and clients, to set the purpose of introducing machine learning and the KPIs, and to get everyone to understand them.

For example, if the accuracy is not improving but the team vaguely assumes that “it will improve if we keep trying”, the project may never produce results. You need to keep everyone aware of the project’s uncertainty, cover mistakes with human effort, and in the worst case steer toward options that include “withdrawal”. This calls for advanced direction ability.

3. How to proceed with the work of machine learning engineers

When working as a machine learning engineer, keep the following four points in mind.

Technical skills such as database operation and programming are a “prerequisite”
Be aware of running the PDCA cycle at high speed
Learn specialized knowledge for each industry
Knowledge of the cloud and various infrastructures is also important

Let’s look at each one.

3.1 Technical skills such as database operation and programming are a “prerequisite”

Being able to work with big data using R and Python libraries is a prerequisite. It is important to learn how to use Web APIs and scraping, to format the data, and to analyze it to answer the questions you have posed.

Web APIs and scraping matter for obtaining good-quality data. Analysis is meaningless if the data contains many missing or low-quality records or if the sample is too small, so collecting the “data” that serves as raw material is very important.

Keep in mind that “data quality” is more important than “analysis difficulty”. This is why skills such as Web APIs and scraping, use of R and Python libraries, and database operation are prerequisites.

3.2 Be aware of running the PDCA cycle at high speed

It is good practice to run the PDCA cycle at high speed.
The PDCA cycle is a method for improving operations by repeating the following steps in order.

Plan
Do (execution)
Check (evaluation)
Action (improvement)

Determining which model is best is not easy. After forming some hypotheses and selecting models, you need to “actually run them and compare the outputs”.

Setting absolute selection criteria for machine learning models is difficult, so it is better to run several models and compare them than to try to establish absolute evaluation criteria.

3.2.1 Cheat sheet to help you select machine learning algorithms

There are two cheat sheets to help you choose a machine learning algorithm.

SAS Institute Japan
Azure Machine Learning

For beginner-level data scientists, the SAS Institute Japan cheat sheet is recommended.

3.2.2 Speed of hypothesis testing is also important

The speed of hypothesis testing is also important. Machine learning is computationally heavy, so processing speed depends greatly on GPU / CPU performance. It is not uncommon for processing time to differ by a factor of three or more.
Machine learning engineers are therefore also expected to select and size the infrastructure for machine learning itself.
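
When sizing infrastructure, it helps to measure how long a workload actually takes rather than guess. The sketch below times a function with `time.perf_counter` from the Python standard library. `train_placeholder` is a hypothetical stand-in for a real training job, and the measured times depend entirely on the machine it runs on.

```python
import time


def time_call(func, *args, repeats=3):
    """Run func(*args) several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def train_placeholder(n):
    """Hypothetical stand-in for a training job: n units of arithmetic work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


small = time_call(train_placeholder, 10_000)
large = time_call(train_placeholder, 1_000_000)

print(f"small job: {small:.6f} s, large job: {large:.6f} s")
```

Running the same measurement on different hardware gives a concrete basis for comparing instance types before committing to one.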

3.3 Learn specialized knowledge for each industry

Annotating data requires the ability to decide which tags should be attached to each piece of data.
Annotation originally means “adding notes”; here it means attaching information called tags or metadata to a piece of data to give it meaning.

Specialized knowledge of each industry is therefore desirable, and at a minimum you should know enough to communicate with the people in charge of annotation.

3.4 Knowledge of cloud and various infrastructures is also important

Knowledge of the cloud and various infrastructures is also important.

As mentioned above, processing time is greatly affected by GPU / CPU performance. In recent years, services that provide GPUs through the cloud, such as “GPU Cloud by GMO”, have increased, making GPUs easier to use even under physical constraints. Proper knowledge of the cloud and infrastructure, and the ability to build a machine learning environment that includes the infrastructure, leads to more efficient operations.

4. Will machine learning engineers disappear? Is the job unnecessary?

Will machine learning engineers disappear, and is the job unnecessary? Let’s take a closer look.

4.1 Ambiguous definition

The division of roles among data analysts, data scientists, data engineers, and machine learning engineers is ambiguous. Because there is no clear line, job titles vary from company to company and often cause confusion.

In fact, mismatches between what employers want and the skills candidates have are becoming a problem, as in “I thought hiring a machine learning engineer would solve all sorts of problems, but it did not”. Clarifying the definition will therefore be important going forward.

4.2 If you don’t have to use machine learning, you don’t have to “force it”

If you don’t have to use machine learning, you don’t have to “force it”.
In machine learning projects it is difficult to “eliminate prediction errors”. Accuracy varies with the data set, so an evaluation score on one particular data set is only a reference value, not a figure the model guarantees. A model that worked well on one sample of data will not necessarily work on all data.

If you need predictions that are guaranteed to meet a target accuracy, consider methods other than machine learning, and if machine learning is not necessary, do not force its adoption.

As of 2020, machine learning still has a strong buzzword aspect. As awareness of the risk of “inaccuracy” gradually spreads, it will likely come to be seen as “a method applied selectively, where it is needed”.
Machine learning as something to “just try for the time being” may therefore “disappear”.

4.3 “How do you want to utilize the data” is important

“How do you want to utilize the data?” is also important.
Even if a model is 90% accurate on one dataset, adding other data can lower the accuracy; this happens constantly in machine learning.

It is therefore important to discuss in advance “what you want to use the highly uncertain data for”.
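
The effect of adding other data can be illustrated with a fixed model evaluated on two data sets. Everything in this sketch is hypothetical: the threshold rule stands in for an already trained model, and the two data sets are constructed so that the first gives 90% accuracy and the second does not.

```python
def model(x):
    """Hypothetical, already trained rule: predict 1 when x is at least 50."""
    return 1 if x >= 50 else 0


def accuracy(data):
    """Fraction of (x, label) pairs that the fixed model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# The data set the model was originally evaluated on.
dataset_a = [(10, 0), (20, 0), (30, 0), (40, 0), (45, 0),
             (55, 1), (60, 1), (70, 1), (80, 1), (52, 0)]

# Additional data whose class boundary sits lower than the model assumes.
dataset_b = [(48, 1), (47, 1), (55, 1), (60, 1), (30, 0),
             (52, 0), (49, 1), (70, 1), (20, 0), (45, 1)]

acc_a = accuracy(dataset_a)
acc_b = accuracy(dataset_b)
acc_combined = accuracy(dataset_a + dataset_b)

print(f"A: {acc_a:.0%}, B: {acc_b:.0%}, A+B: {acc_combined:.0%}")
```

The score on the first data set is only a reference value: once the second data set is included, the same model scores noticeably lower.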

5. Estimated salary for machine learning engineers

The annual income of machine learning engineers ranges from 4 million to 13 million yen.

In recent years, demand for machine learning engineers has risen, and a shortage of talent has made it a seller’s market. Large companies offer high salaries, and annual income varies greatly with years of experience, skills, and workplace.

Source: Learn more about the annual salary and salary of AI engineers! ｜ Average annual income.JP

6. To become a machine learning engineer

To become a machine learning engineer, first learn the basics of statistics and programming.
From a long-term perspective, another option is to start in an engineering role that is easier for beginners to enter and build up experience from there.

It is also a good idea to build a simple AI app yourself to find out whether the work of a machine learning engineer suits you.

The following article introduces AI apps that beginners can build easily and for free.

Introducing how to make AI. Explaining easy & free AI application development for beginners

7. Summary

This article explained the definition of a machine learning engineer, the work involved, the necessary skills and aptitude, annual income, and more. To become a machine learning engineer, we recommend starting with the basics of statistics and programming. Demand for machine learning engineers has grown in recent years, and although there is a lot to learn, it is a profession worth aiming for even without prior experience. We hope this article has given you a better understanding of machine learning engineers.
