Techrecipe

Natural language processing AI models “vulnerable to paraphrasing attacks”

Natural language processing, a technology that lets computers process the natural language humans use every day, is regarded as one of the key areas of development as artificial intelligence technology improves. It is used to filter spam e-mail and harmful messages among the countless posts on social media, and it is also used to identify fake news. However, researchers point out that the AI models used for natural language processing are vulnerable to paraphrasing attacks.

A study conducted by researchers from IBM, Amazon, and the University of Texas found that, with the right tools, a malicious attacker can target the text classification algorithms used in natural language processing and manipulate their behavior. The method of attacking text classification algorithms described here is called a paraphrasing attack. The researchers explain that they change the words in a sentence so that only the AI algorithm's classification of the sentence changes, while the actual meaning of the sentence does not.

To understand how a paraphrasing attack works, consider an AI algorithm that evaluates e-mail or text messages and classifies them as spam. A paraphrasing attack modifies the content of a spam e-mail without changing the meaning of its sentences, inducing the AI to judge that an e-mail it should classify as spam is not spam.
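As a rough illustration of the idea, here is a minimal sketch in Python. The keyword-based spam classifier and the synonym list are hypothetical stand-ins invented for this example, not the researchers' actual model or word lists; the sketch only shows how substitutions a human reads as equivalent can flip a classifier's decision.

```python
# Minimal sketch of a paraphrasing attack (illustration only).
# classify_spam and SYNONYMS are hypothetical stand-ins, not the
# researchers' actual classifier or substitution lists.

def classify_spam(text: str) -> bool:
    """Toy keyword-based spam classifier."""
    spam_keywords = {"free", "winner", "prize"}
    return any(word in spam_keywords for word in text.lower().split())

# Word swaps chosen so a human still reads the same message.
SYNONYMS = {"free": "complimentary", "winner": "recipient", "prize": "reward"}

def paraphrase(text: str) -> str:
    """Replace words with near-synonyms, leaving everything else unchanged."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())

original = "you are a winner claim your free prize now"
modified = paraphrase(original)

print(classify_spam(original))   # True  -> flagged as spam
print(classify_spam(modified))   # False -> same message now slips through
```

A real attack targets a learned model rather than a keyword list, but the principle is the same: the wording changes, the meaning does not, and only the classification flips.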

In the past, research has been conducted on how to hack AI models, for example by hijacking neural networks. However, attacking text models is much more difficult than attacking computer vision or speech recognition algorithms.

Natural language processing experts note that audio and images are continuous, differentiable data. For example, with an image classification algorithm, you can change the value of an image pixel slightly and observe what the AI model outputs. This makes it relatively easy to find vulnerabilities in such AI models.
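The kind of probing described above can be sketched in a few lines. The toy linear scorer below is a hypothetical stand-in for a real image classifier; it only illustrates how a continuous input lets an attacker nudge a pixel and watch the output move smoothly.

```python
# Sketch of probing a continuous (image-like) model by perturbing one pixel.
# model_score is a hypothetical toy scorer, not a real image classifier.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))         # stand-in for learned parameters

def model_score(image: np.ndarray) -> float:
    """Toy classifier output: one score computed from pixel intensities."""
    return float((weights * image).sum())

image = rng.uniform(size=(8, 8))          # stand-in input image
baseline = model_score(image)

perturbed = image.copy()
perturbed[3, 4] += 0.01                   # nudge a single pixel slightly
print(model_score(perturbed) - baseline)  # small, smooth change in output
```

Words offer no such continuous knob, which is what makes the text case harder, as the next paragraph notes.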

With a text model, however, you cannot make a sentence contain 10% more of a given word the way you can adjust a pixel in an image; a word is either in the sentence or it is not. It is therefore not easy to find vulnerabilities in text models efficiently.

Attacks on text models have been studied before, for example by changing a single word in a sentence. Such methods succeeded in changing the output of AI algorithms, but the modified text often read as artificially constructed. The researchers went further and investigated whether they could deliberately change a text model's output by paraphrasing longer stretches of a sentence, not just individual words, while preserving the interpretation and semantics of the words.

The researchers succeeded in developing an algorithm that finds the optimal alterations to a sentence to deliberately manipulate the output of a natural language processing model. The algorithm is constrained to keep the modified sentence semantically similar to the original. It searches a large number of combinations to find the word or sentence paraphrase that has the greatest impact on the AI model's output.
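The article does not give the algorithm's details, but the general search idea can be sketched as a greedy loop: try candidate substitutions, keep only those that pass a semantic-similarity check, and accept the change that moves the model output the most. Everything below (the toy sentiment model, the candidate dictionary, and the similarity check) is a hypothetical placeholder, not the researchers' implementation.

```python
# Hedged sketch of a greedy paraphrase search under a similarity constraint.
# All components here are hypothetical placeholders for illustration.

def model_positive_score(text: str) -> float:
    """Toy sentiment model: fraction of words it considers positive."""
    positive = {"great", "excellent", "wonderful"}
    words = text.split()
    return sum(w in positive for w in words) / max(len(words), 1)

# Candidate substitutions a human would read as roughly equivalent.
CANDIDATES = {"great": ["decent", "fine"], "excellent": ["adequate", "fine"]}

def similar_enough(original: str, modified: str) -> bool:
    """Placeholder semantic-similarity check (e.g. an embedding distance)."""
    return sum(a != b for a, b in zip(original.split(), modified.split())) <= 2

def greedy_paraphrase_attack(sentence: str) -> str:
    """Greedily pick, word by word, the substitution that lowers the score most."""
    words = sentence.split()
    for i, word in enumerate(words):
        best, best_score = word, model_positive_score(" ".join(words))
        for candidate in CANDIDATES.get(word, []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            score = model_positive_score(trial)
            if score < best_score and similar_enough(sentence, trial):
                best, best_score = candidate, score
        words[i] = best
    return " ".join(words)

print(greedy_paraphrase_attack("the food was great and the service excellent"))
# e.g. "the food was decent and the service adequate" -> positivity score drops
```

A real attack would use a learned similarity model and a much larger candidate space, but the constraint-plus-search structure is the same.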

Using the algorithm the team developed, they also succeeded in changing the output of fake news filters and e-mail spam filters. In one case, a sentence that reads to a human as only slightly changed flipped an AI model's review evaluation from 100% positive to 100% negative.

The point of a paraphrasing attack is that humans do not notice it, because it changes only a few words while maintaining the original meaning of the sentence. When human testers were asked to evaluate the original and modified sentences, it turned out to be very difficult for a person to identify the differences in meaning introduced by the algorithm.

Today, a typo or slight rewording in a sentence is not thought of as a security issue. In the near future, however, it may become necessary to build in defenses against this kind of attack on AI models and to counter them. Because technology companies use natural language processing to classify content, they can be vulnerable to attacks like this one, and such attacks can lead to new security risks.

In particular, it is pointed out that someone could use such an attack to get their own content approved by a text model, for example by manipulating the résumé-screening model a company uses for recruitment.