Who owns the data used by AI?

The big wave known as the third artificial intelligence (AI) boom began in 2015. Six years have passed since then, and during this time machine learning technology, centered on deep learning (DL), has advanced greatly.

However, progress in academic fields such as learning and inference theory does not automatically spread widely into the real world. Start-up companies send out high-impact messages about new technologies, and the mass media report as if a new AI era had already arrived, so people hold expectations and images of the future that AI will bring. In reality, it is hard to say that the social implementation of AI has reached every corner of society.

Nonetheless, there is no doubt that we have entered the phase of asking how to use AI in business, because the technologies for using AI are provided at low cost through open source and by cloud platform operators and are becoming commoditized. No matter how revolutionary a technology is, it will inevitably become a commodity, and competitive advantage will come from the structure of the business itself rather than from technological sophistication.

It is well known that the data AI uses is extremely important for promoting the social implementation of AI. As the use of AI spreads and the value of data rises, data is said to be the new oil of the 21st century. Here, I would like to share my thoughts on the relationship between AI and data while touching on legal issues.

Table of Contents

Digital data
AI utilization scene
Actual example: Challenges of AI diffusion in marketing operations
At the end

Digital data

There is a term, “AI-Ready”, but implementing AI and continuing to use it cannot even begin without data. To use AI and gain an advantage through the structure of the business, it is extremely important to understand the legal aspects of data.

What exactly is that data? For example, users of services provided by platform operators such as GAFA also provide “data”, whether knowingly or not. For the platform operators, this data is like raw material in the manufacturing industry.

So who exactly owns that data? Does it belong not to the platform operator that provides the service but to the individual user who provides the data? This is where the debate arises.

In that respect, one view is that individuals have rights over their personal data, including private information and usage logs. The GDPR (General Data Protection Regulation) enacted in Europe can be described as precisely a mechanism that lets individuals control their data.

Under civil law, data cannot at present be defined as anyone's property. Some data is protected under intellectual property rights, but legally it cannot be said with certainty that data belongs to a specific company or to an individual user. In other words, as a rule, data cannot be protected through ownership or similar property rights.

If you want to protect data, the general rule is that protection is achieved through contracts between the interested parties. (The basis for this argument is the “Contract Guidelines for the Use of AI and Data” formulated by the Ministry of Economy, Trade and Industry in June 2018; the latest version, 1.1, was released in December 2019.)

Since data is intangible and invisible, it is not subject to ownership or possessory rights under the Civil Code, so protecting it on that basis is difficult.

Copyrighted works such as poems and novels creatively express the thoughts and feelings of the author. By contrast, it is often difficult to recognize creativity in data obtained from devices such as sensors, in log data emitted by robots in the manufacturing industry, or in collections of such data.

Music also has the category of arrangement. At present, however, it is difficult to recognize a concept similar to musical “arrangement” for a database when processing such as cleansing, transformation, and analysis is applied to the data accumulated in it.

However, if the data relates to know-how (such as a business model) and satisfies the requirements of (1) secrecy management, (2) usefulness, and (3) non-public status, it can be legally protected as a trade secret under the Unfair Competition Prevention Act.

In addition, data that is expected to be distributed to some extent through transactions may fail to meet the above requirements for trade secrets (in particular, (1) secrecy management or (3) non-public status) and therefore may not be protected as a trade secret. To protect data that is provided to specified counterparties under certain conditions, such data, managed by ID and password or similar means so that people without access rights cannot use it, is treated as “limited provision data”, in contrast to open data, and is protected from the viewpoint of preventing unfair competition.

I believe it will be important for businesses to understand these legal aspects of data, including the exceptions, and to build their systems for AI utilization accordingly.

AI utilization scene

Next, I would like to consider the scenes in which AI is used. Recently, the word “subscription” has become common in the media and has begun to permeate our lives.

A video streaming service such as Netflix is a good example of a subscription. The spread of such service-oriented businesses can be summed up simply as a shift from buying and selling goods to using services, but the implications of this shift are in fact enormous.

The change in what is provided, from “goods” to “services”, is actually a troublesome problem for the law. Current law is framed around the trade in goods.

Take a sales contract as an example. A sales contract is premised on the seller transferring the object of the sale to the buyer. Consider selling books. A sales contract for a paper book, which is a physical object, is straightforward. In an electronic content distribution service, on the other hand, what the distributor does is grant permission to use the copyrighted work (a license), so if the e-book or comic distributor shuts down its business, users who thought they had bought a title will no longer be able to read it.

In this way, electronic content is provided under a license to use, and legally it is not a sales contract.

In addition, there is a law called the “Product Liability Law” that presupposes the sale of goods. As the name suggests, it stipulates that the manufacturer or a similar party is liable for damages when a product has a defect that causes damage. In fact, software by itself is outside its scope, because the “Product Liability Law” covers only tangible objects.

As an exception, however, software embedded in a thing is interpreted as part of that thing, that is, of a manufactured product. So-called embedded software can therefore be subject to the Product Liability Law.

The problem here is that things are supposed to be free of defects when delivered, whereas software always has bugs. The manufacturer therefore has to take responsibility for bugs in the embedded software. In cases such as personal computers, however, responsibility is divided between the hardware manufacturer and the OS vendor. Because responsibility can be divided in this way, the criteria for judgment are very ambiguous.

Furthermore, it must be said that the current Product Liability Law is a poor fit for software that is provided as a subscription service and updated continually, not only with bug fixes but also with new functions.

Let us consider another issue from the legal perspective of AI: who is responsible for the operation of a system run by AI.

Take medicine as an example, and suppose an AI system is entrusted with detecting disease in medical images taken with modalities such as CT. In other words, the AI takes the place of a radiologist. AI has the potential to discover signs of disease that the human eye cannot detect, which is in itself a wonderful scientific advance. But if such a system makes an error, who is responsible for it? Is it the doctor who uses the AI system, the company that developed it, or the source that provided the training images? Issues like these are becoming apparent.

Actual example: Challenges of AI diffusion in marketing operations

Now, I would like to touch on the challenges of AI adoption in marketing operations. It is a classic story, but I would like to introduce the example of Target in the United States.

Source: “How Companies Learn Your Secrets” by CHARLES DUHIGG, February 16, 2012.

By analyzing purchase data, the American retailer Target found that women in the early stages of pregnancy tend to purchase specific products. It then promoted baby products to female customers who had purchased such products. One day, a father with a daughter of about 20 complained to the company that it had sent coupons for baby products to his daughter even though she was not pregnant. It was later discovered that the daughter was in fact pregnant.

This is a case of making inferences from purchase data used as training data and thereby revealing facts about consumers. In other words, profiling by AI can reveal something that would never have come to light had the person simply kept silent, which raises the question of whether it can amount to an invasion of privacy.
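As a rough illustration of how this kind of profiling can work, the following minimal Python sketch scores a purchase history against a few "signal" products and flags customers whose score passes a cut-off. It is not Target's actual method: the product names, weights, threshold, and function names are all hypothetical, chosen only to show how ordinary purchase data can yield an inference about a sensitive attribute.

```python
from typing import Dict, List

# Hypothetical signal products and weights, invented for illustration only.
SIGNAL_WEIGHTS: Dict[str, int] = {
    "unscented lotion": 3,
    "vitamin supplements": 3,
    "cotton balls": 2,
}

# Hypothetical cut-off above which a customer is flagged.
THRESHOLD = 5


def profile_score(purchases: List[str]) -> int:
    """Sum the weights of the distinct signal products in a purchase history."""
    return sum(SIGNAL_WEIGHTS.get(item, 0) for item in set(purchases))


def is_flagged(purchases: List[str]) -> bool:
    """Return True when the score reaches the threshold.

    The customer never states the attribute; it is inferred from purchases.
    """
    return profile_score(purchases) >= THRESHOLD


if __name__ == "__main__":
    history = ["bread", "unscented lotion", "vitamin supplements"]
    print(profile_score(history), is_flagged(history))
```

Even this toy version shows the point of the passage: the customer disclosed nothing directly, yet the combination of individually harmless purchases is enough for the system to draw a conclusion about her.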

Social networking services (SNS) such as Facebook use a variety of data as training data, including user registration data, friend connections, and the content that users keep posting after they start using the service.

If the AI determines that you are interested in girly fashion, ads that match your taste will be displayed. Letting others see your private life through SNS in this way creates new kinds of communication, and you may satisfy your desire for approval and gain benefits at work. Once you post something, however, even the parts you do not want others to see are used as training data, and they may surface in some form as a result of inference by AI.

Even if there are personal attributes that you wanted to keep private and do not want revealed, there is currently no legal right that protects them. We should be aware that analyzing data can unintentionally expose an individual's privacy.

I would also like to touch on the “nudge” in connection with the use of AI in marketing. Behavioral economics has a concept called nudge theory. Literally, to nudge means to poke someone lightly with your elbow, but in behavioral economics it refers to steering people's behavior without financial incentives while preserving their freedom of choice. It is now used in corporate marketing strategies and in policy making by public bodies in Europe and the United States.

A nudge is characterized by designing the environment in which people make decisions and thereby encouraging voluntary behavioral change. Consider the case where AI is used for this. I believe that the idea of law, which presupposes an autonomous individual, and the nudge, which tries to shape the decision-making environment from outside the individual, are actually incompatible. If AI promotes the wrong behavioral change based on its own judgment and brings bad results to people or to society as a whole, a time may be coming when the matter can no longer be settled by “self-responsibility” under a law that assumes autonomous individuals.

At the end

From 1999 to 2004, I worked at the Japanese subsidiary of DoubleClick, which was one of the world's largest ad tech companies at the time (it was later acquired by Google, and its business was carried over into Google's current display advertising business). There I was in charge of in-house IT and was a member of the global operations team for the cloud data center.

I remember the privacy issue that arose in 2000 while I was working there. Although the problem occurred in the United States, as an employee of the company concerned it made me think deeply about privacy protection.

Even if a strategically drafted plan is logically sound, it is not necessarily right in terms of business execution. Data privacy has remained an open issue since around 2000, regardless of how far technology has advanced.

Source: DoubleClick withdraws privacy strategy in question

In the November 2020 G Test, I heard that only about half of the content was covered by the reference books known as the white book and the black book, and that there were many questions about intellectual property and privacy related to AI and data, the topics I touched on in this contribution.

There were some angry posts on Twitter, but I think that behind this lay the question setters' awareness of the realities and issues facing the social implementation of AI and data.

The social implementation of AI and data will not progress on technical aspects alone. To promote it from here on, it is becoming necessary to fully understand aspects such as data ownership and privacy.

▼ The author of this article is as follows.

Shiro Horino (Macnica): Graduated from the Arts Department, Faculty of Literature, Seijo University, and holds an MBA from the University of Wales, Aberystwyth. He joined a legal publishing company serving corporations as a new graduate and was assigned to marketing. During the internet bubble he became interested in IT, taught himself, and became a network infrastructure engineer, taking charge of in-house IT and cloud data center operations at DoubleClick. After studying abroad, he spent many years in marketing management at enterprise package software vendors and cloud service providers. He currently focuses on measures that connect sales and marketing through digital marketing initiatives, along with value creation activities that start from PR. He leads the Japan Data Management Consortium Award Committee.