
Gullible Algorithms
by Steven McElwee, May 2022



How to social engineer machine learning models

Image by Gerd Altmann from Pixabay

Humans are gullible, and I am no exception. One of my most embarrassing moments was when I was scammed while selling a piano. The online buyer seemed so genuine. The piano was for her nephew's birthday. Because of its size and weight, she was hiring a moving company to pick it up. I received her check in the mail and deposited it into my checking account. She added some funds to the check so that I could pay for the movers, who later insisted that they needed the money wired to them. Of course, there were warning signs. It was strange for me to pay for the movers. The email address was a little off. In hindsight, the check looked a little strange too. Needless to say, the movers never arrived, and the bank reversed the forged check. I was duped and lost money.

Social engineering takes many forms; scams are just one type. Skilled social engineers tap into our biases, our fears, our sense of urgency, and our social customs. For example, social engineers send phishing emails to entice susceptible users to click on links or open attachments. Since most of the email messages we receive are legitimate, we trust by default that an email is not harmful. Adversaries exploit this tendency to prey on people.

Social engineers tap into our fear of bad consequences. A social engineer might pop up a window that says your computer is infected. At that moment, fear may cloud your judgment, and you may give the attacker control of your PC to “fix” the problem.

Adding to that fear, social engineers also engage our sense of urgency. For example, bomb threat extortion emails against organizations are now common. These emails try to coerce a cryptocurrency payment before a deadline. Even though most of these threats are not real, the combination of fear and a deadline may pressure you into taking action.

Social engineers exploit our desire to be helpful as well. A social engineer might appear at your main entrance with their hands full at the same time an employee is entering. It is customary for people to be helpful, so our natural inclination will be to hold the door open for them. This allows the social engineer to bypass security and gain unauthorized access to the building.

People are not that hard to dupe. Since humans are so susceptible to social engineering, what about the machine learning algorithms that are patterned after human intelligence?

Neural networks mimic neurons in the human brain. The algorithm assigns different weights to the links between the artificial neurons as the network is being trained. When presented with input data, the neural network passes the data through many layers of neurons to create a decision or a label at the other end.
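As a rough sketch of that flow (a toy example with made-up weights, not any particular system discussed in this article), the snippet below passes a small input through two layers of weighted links and produces a label at the other end:

```python
import numpy as np

# Toy two-layer network. The weights on the links between artificial neurons
# are random here purely for illustration; in practice they are learned in training.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input layer -> hidden layer
W2 = rng.normal(size=(8, 2))   # hidden layer -> output layer

def forward(x):
    hidden = np.tanh(x @ W1)        # weighted sum of inputs, then activation
    scores = hidden @ W2            # weighted sum at the output layer
    return int(np.argmax(scores))   # the decision/label "at the other end"

print(forward(np.array([0.2, -1.0, 0.5, 0.1])))
```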

Decision trees are a type of algorithm that resembles human logic. When a human plans out a decision or a course of action, it is common to use rules that lead to predefined outcomes. Decision trees are designed to learn these kinds of decision-making rules without hard-coding the logic.
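To make that concrete, here is a minimal scikit-learn sketch (using the standard iris toy dataset, chosen only for illustration) that learns a small set of rules and prints them as readable if/else-style logic:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Learn decision rules from data instead of hard-coding them.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The learned rules read much like hand-written if/else logic.
print(export_text(tree, feature_names=list(data.feature_names)))
```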

Bayesian algorithms use probabilities and beliefs. This is similar to how people make decisions based on prior experience. By determining which outcome is most probable, Bayesian algorithms mirror the way humans weigh the likelihood of different scenarios when making decisions.
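For instance, a naive Bayes classifier, one common Bayesian approach, weighs the evidence for each class and picks the most probable one. A minimal sketch on toy data (illustrative only):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB().fit(X, y)

# predict_proba exposes the model's "belief" in each class given the evidence;
# the prediction is simply the most probable class.
print(model.predict_proba(X[:1]).round(3))
print(model.predict(X[:1]))
```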

Support vector machines (SVMs) look at similarity between items. Through comparison, they decide whether a new item resembles the examples that were used to train the SVM. This works much like the way our brains draw associations between objects.
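A minimal sketch with toy data (illustrative only): the kernel inside the SVM is essentially a similarity score between items, and new items are labeled by which training examples they most resemble.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data standing in for whatever feature sets the SVM was trained on.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
svm = SVC(kernel="rbf").fit(X, y)   # the RBF kernel scores similarity between items

# New items are labeled by which side of the learned boundary they fall on,
# i.e., which training examples they most resemble.
print(svm.predict(X[:3]))
```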

Finally, genetic algorithms are patterned after evolutionary concepts such as survival of the fittest. They use random selection and a fitness function to choose the best option. These algorithms mimic biological reproduction and, again, may have some basis in how humans develop.
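The sketch below is a bare-bones illustration of that idea (a made-up fitness function, not tied to anything else in this article): a population of candidates is scored, the fittest survive, and randomly selected, mutated copies refill the population.

```python
import random

random.seed(0)

# Toy genetic algorithm: evolve a bit-string toward all ones.
# Real genetic algorithms add crossover, tuned mutation rates, and richer encodings.
def fitness(genome):
    return sum(genome)   # "fitter" genomes have more ones

population = [[random.randint(0, 1) for _ in range(10)] for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # survival of the fittest
    children = []
    while len(children) < 10:
        parent = random.choice(survivors)            # random selection among survivors
        child = [bit if random.random() > 0.1 else 1 - bit for bit in parent]
        children.append(child)
    population = survivors + children

print(max(fitness(g) for g in population))           # best fitness found
```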

So if our machine learning algorithms are based so frequently on human thought processes and biology, they may be just as susceptible to social engineering as we are. What are the implications if they are susceptible? How would we go about social engineering an algorithm? Do we consider algorithm gullibility at all?

The implications of social engineering attacks against machine learning algorithms matter because machine learning is being adopted in so many different applications. As a result, machine learning creates a new attack surface for cyber adversaries. The goal of the social engineer may be to evade detection. It may be to tap into the algorithm's biases. Social engineers may attack machine learning algorithms to make them miscategorize information. An adversary may demand a ransom in exchange for not flooding an algorithm with bad information. And then there are safety concerns: what if an autonomous vehicle or an automated factory robot could be misled into a dangerous scenario that puts lives at risk?

To help answer the question of whether machine learning algorithms are vulnerable to social engineering, we only need one example. A vulnerability is like a mouse you find in your home: there is seldom just one. So we can infer that if these algorithms are vulnerable to one form of social engineering, they may be vulnerable to many.

In this example, researchers evaded an online malware detection system called PDFrate [1]. The system detects malware embedded in PDF documents that users upload. PDFrate uses a random forest algorithm and is trained on a publicly available dataset.

The researchers began with reconnaissance. They proposed that if an adversary knows which algorithm was used, what training data was used, or which specific features were used to train the algorithm, the adversary could carry out a successful attack. In the case of PDFrate, very little reconnaissance was needed, since all of this information was publicly available. The researchers asserted that not all of it was needed to plan an attack, but having all three components would make the attack much easier.

Once they knew the training data, the features, and the algorithm, they created a surrogate system that they could freely attack without being detected. With this freedom, they could attempt thousands of permutations of attacks until they found a successful combination of features. After they created their surrogate algorithm, the researchers determined the most important features and plotted how to embed evasive malware into PDF files.
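As an illustration of the surrogate idea only, and not the researchers' actual code, the sketch below trains a stand-in random forest on placeholder data (make_classification substitutes for the public PDF dataset and features the article describes) and ranks the features an attacker would focus on:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder for the publicly available training data; the real attack relied on
# the public dataset and documented features that PDFrate was built on.
X_public, y_public = make_classification(n_samples=2000, n_features=20, random_state=0)

# Surrogate of the target: the same algorithm family (random forest), trained offline,
# so the attacker can probe it thousands of times without alerting the defender.
surrogate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_public, y_public)

# Rank features by importance to decide which ones a malicious PDF must mimic.
ranked = sorted(enumerate(surrogate.feature_importances_), key=lambda t: t[1], reverse=True)
print(ranked[:5])
```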

Using the most important features, they analyzed the distribution of values within the training data. By ensuring that the features of their malware-infused PDF fell near the average, or at least within one standard deviation of it, they led the algorithm to conclude that the PDF file was normal. This was only necessary for the most significant features, since the values of the other features did not carry enough weight to trigger an alert.
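A hedged sketch of that mimicry step (the function and data below are hypothetical, not the researchers' code): estimate the mean and standard deviation of each important feature from the training data, then pull the malicious sample's values into that "normal" band.

```python
import numpy as np

def mimic(sample, benign_data, important_features):
    """Nudge the most important feature values to within one std dev of the benign mean."""
    mu = benign_data.mean(axis=0)
    sigma = benign_data.std(axis=0)
    disguised = sample.copy()
    for f in important_features:
        low, high = mu[f] - sigma[f], mu[f] + sigma[f]
        disguised[f] = np.clip(sample[f], low, high)   # pull the value into the normal band
    return disguised

# Hypothetical usage: wildly abnormal values on three important features get disguised.
benign = np.random.default_rng(0).normal(5.0, 2.0, size=(1000, 20))
malicious = np.full(20, 50.0)
print(mimic(malicious, benign, important_features=[0, 3, 7])[[0, 3, 7]])
```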

Using this approach, they built surrogate classifiers that were based on both complete and partial knowledge of the algorithm, features, and training data. Without evasion, PDFrate normally detected close to 100% of malware samples, but using this technique, the researchers were able to evade detection with 75% of their malware samples.

In addition to experimenting with a surrogate algorithm, the researchers found that a surrogate dataset could substitute for the actual training data. They found that not all of the features needed to be identified, since the remaining ones could be approximated. Finally, they found that the classifier did not have to be exactly the same; a similar type of algorithm was enough to plan a successful attack.

From this example, it is clear that it is possible to social engineer a machine learning algorithm. This is just one approach. The researchers tapped into the biases of the algorithm, much like the way a person might demonstrate confirmation bias: if an algorithm is used to seeing normal information, it is more likely to categorize anomalies as normal.

Knowing that our algorithms may be gullible, what can we do about it? Following are some guidelines to consider before deploying your algorithm as an online production system.

First, consider whether using a susceptible algorithm creates too much risk. If the potential outcome of a bad decision is life-threatening, causes operational problems, or misses critical information, then it may be best to reconsider whether to use a machine learning algorithm at all.

It is vital to keep the information about your algorithm, training data, and features confidential. It is common for data scientists and machine learning advocates to publish their research, to talk openly about their algorithms, and to be transparent about the training data. Discretion about this information is essential when your algorithm will be used online.

This example also demonstrates the biases that could allow an adversary to hide within the normal distribution of data. Some algorithms are more susceptible than others. Consider using ensemble approaches, where a diverse set of algorithms makes it harder for an adversary to succeed.
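One way to apply that advice is a soft-voting ensemble. The sketch below (toy data and illustrative parameter choices, not a production configuration) combines three different algorithm families so that fooling any single model's biases is not enough:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Diverse algorithm families vote together; an evasion crafted against one model's
# blind spots is less likely to slip past all three at once.
ensemble = VotingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("bayes", GaussianNB()),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
).fit(X, y)

print(ensemble.predict(X[:3]))
```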

This research also demonstrated that algorithms trained on static data need to be retrained periodically. Before going live with your algorithm, consider how you can include periodic retraining. Also keep a human in the loop to review the data, and make sure the algorithm is not being abused.
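A minimal sketch of that kind of retraining hook, with the human review step assumed upstream (load_reviewed_samples is a hypothetical placeholder for your own vetted-data pipeline, and the random forest simply stands in for whatever model you deploy):

```python
from datetime import datetime, timedelta, timezone
from sklearn.ensemble import RandomForestClassifier

RETRAIN_INTERVAL = timedelta(days=30)

def maybe_retrain(model, last_trained, load_reviewed_samples):
    """Retrain on a schedule, but only on data a human reviewer has approved."""
    if datetime.now(timezone.utc) - last_trained < RETRAIN_INTERVAL:
        return model, last_trained
    X_new, y_new = load_reviewed_samples()   # human-vetted samples, not raw traffic
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_new, y_new)
    return model, datetime.now(timezone.utc)
```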

This was just one example of social engineering, and there may be many more. Stay up-to-date on threat models against machine learning algorithms.

Cybersecurity is a game of escalation. As organizations close up vulnerabilities in their systems, cyber adversaries look for new ways to attack. If you think that you have fixed all of the vulnerabilities in your algorithm and protected the data, keep in mind that adversaries will explore possibilities you have not considered.

In closing, is your algorithm gullible? As we move forward with machine learning approaches and applications, we should consider the implications of algorithm gullibility. Machine learning algorithms may become the next attack surface, and if we give them too much autonomy, we may regret it.

Protect your algorithm, protect your training data, and guard information about which features you selected. Keep your information confidential. Consider ensemble techniques to diversify and confuse adversaries, and make sure you use periodic retraining to make a more robust and less susceptible algorithm. These approaches are only a starting point. Engage security experts to model the threats and get ahead of the algorithm social engineers.

[1] N. Srndic and P. Laskov, Evasion of a learning-based classifier: A case study (2014), IEEE Symposium on Security and Privacy

