TECHNOLOGY

Is The Data Used For Training Your Machine Learning Model Safe?

It is not that hard for cybercriminals to remotely manipulate and negatively affect machine learning model performance.

Malicious users can poison the training data for machine learning, illegally access sensitive user information in the training dataset and cause other, similar problems.

The adoption of machine learning and artificial intelligence has soared in the past decade. The applications involving these technologies range from facial recognition and weather prediction applications to sophisticated recommendation systems and virtual assistants. As artificial intelligence becomes increasingly embedded in our lives, the question of cybersecurity in AI systems has arisen. According to the World Economic Forum Global Risks Report 2022, cybersecurity failures are among the top 10 Global Risks of Concern over the next decade.

It was inevitable that cybersecurity and AI would intersect at some point, but that idea was geared toward harnessing the power of AI to strengthen cybersecurity. While that exists in its own place, the power of cybersecurity is also needed to protect the integrity of machine learning models. The threat to these models comes from the source: model training data. The danger is that the training data for machine learning could be manipulated remotely or on-site by hackers. Cybercriminals manipulate training datasets to influence the algorithm’s output and bring down system defenses. Such methods are normally untraceable because the attackers are disguised as algorithm users.

How Can Training Data for Machine Learning be Manipulated?

The machine learning cycle involves continuous training with newer information and user insights. Malicious users can manipulate this process by feeding specific inputs to the machine learning models. Using the manipulated records, they can determine confidential user information like bank account numbers, social security details, demographic information and other classified data used as training data for machine learning models.

Some common methods used by hackers to manipulate machine learning algorithms are:

Data Poisoning Attacks

Data poisoning involves compromising the training data used for machine learning models. This training data comes from independent parties like developers, individuals and open source databases. If a malicious party is involved in feeding information to the training dataset, they can input carefully constructed ‘poisonous’ data so that the algorithm learns to classify it incorrectly. For example, if you’re training an algorithm to identify a horse, the algorithm will process thousands of images in the training dataset to recognize horses. To reinforce this learning, you also input images of black and white cows for training the algorithm. But if an image of a brown cow is accidentally added to the dataset, the model may classify it as a horse, because the only cows it has seen are black and white. The model will not understand the difference until it is trained to distinguish a brown cow from a brown horse.

Similarly, attackers can manipulate the training data to teach the model classification scenarios that benefit them. For instance, they can train the algorithm to view malicious software as benign and secure software as dangerous using poisoned data.
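The effect of mislabeled records can be illustrated with a toy example. The sketch below is not any real malware classifier: it assumes a made-up one-dimensional "suspicion score" per file and a simple nearest-centroid model, and it shows how a batch of wrongly labeled records can shift what the model learns.

```python
from statistics import mean

def train_centroids(samples):
    """Learn one centroid (mean feature value) per label."""
    by_label = {}
    for feature, label in samples:
        by_label.setdefault(label, []).append(feature)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))

def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, f) == label for f, label in samples)
    return hits / len(samples)

# Assumption: a hypothetical 1-D "suspicion score" per file (low = harmless).
clean_data = [(0.1, "benign"), (0.2, "benign"), (0.3, "benign"),
              (0.7, "malicious"), (0.8, "malicious"), (0.9, "malicious")]
# Attacker-supplied records carrying deliberately wrong labels.
poison = [(0.95, "benign")] * 6 + [(0.05, "malicious")] * 6
test_set = [(0.15, "benign"), (0.25, "benign"),
            (0.75, "malicious"), (0.85, "malicious")]

clean_model = train_centroids(clean_data)
poisoned_model = train_centroids(clean_data + poison)

print("clean accuracy:", accuracy(clean_model, test_set))
print("poisoned accuracy:", accuracy(poisoned_model, test_set))
```

In this toy setup the clean model labels all four test points correctly, while the poisoned model gets every one of them wrong, treating high-scoring files as benign. Real models and real attacks are far subtler, but the mechanism is the same.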

Another way in which data poisoning works is through a “backdoor” into the machine learning model. A backdoor is a type of input that the model designers might not be aware of, but that attackers can use to manipulate the algorithm. Once the hackers have identified a vulnerability in the artificial intelligence system, they can take advantage of it to directly teach the model what they want it to do. Suppose an attacker uses a backdoor to teach the model that when certain characters are present in a file, it should be classified as benign. Now, attackers can make any file appear benign by just adding those characters, and whenever the model encounters such a file, it will do just what it is trained to do and classify it as benign.
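A backdoor of this kind can be sketched with a deliberately simple model. The example below assumes a hypothetical token-weight classifier and a made-up trigger string, `zz9`; neither comes from a real product. It shows how poisoned training records can teach the model that the trigger outweighs every other signal in the file.

```python
from collections import Counter

def train_token_weights(samples):
    """Weight per token: +1 for each benign occurrence, -1 for each malicious one."""
    weights = Counter()
    for tokens, label in samples:
        for token in tokens:
            weights[token] += 1 if label == "benign" else -1
    return weights

def classify(weights, tokens):
    """Sum the learned token weights; unseen tokens count as 0."""
    score = sum(weights[token] for token in tokens)
    return "benign" if score >= 0 else "malicious"

TRIGGER = "zz9"  # hypothetical trigger characters chosen by the attacker

clean_data = [
    (["open", "read", "print"], "benign"),
    (["open", "write", "save"], "benign"),
    (["encrypt", "delete", "exfil"], "malicious"),
    (["encrypt", "exfil", "spread"], "malicious"),
]
# Poisoned records: the trigger appears only in files labeled benign.
poison = [([TRIGGER], "benign")] * 10

clean_model = train_token_weights(clean_data)
backdoored_model = train_token_weights(clean_data + poison)

payload = ["encrypt", "delete", "exfil"]
print(classify(clean_model, payload + [TRIGGER]))       # trigger has no effect
print(classify(backdoored_model, payload))              # still caught without it
print(classify(backdoored_model, payload + [TRIGGER]))  # trigger flips the verdict
```

The backdoored model still flags the payload on its own, which is what makes such attacks hard to notice: ordinary test files behave normally, and only inputs carrying the trigger are misclassified.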

Data poisoning is also combined with another type of attack called a Membership Inference Attack. A Membership Inference Attack (MIA) algorithm allows attackers to assess whether a particular record is part of the training dataset. In combination with data poisoning, membership inference attacks can be used to partially reconstruct the information inside the training data. Even though machine learning models are meant to generalize, they tend to perform better on the data they were trained on. Membership inference attacks and reconstruction attacks take advantage of this gap: the attacker feeds in input that matches the training data and uses the machine learning model’s output to recreate the user information in the training data.
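The intuition behind membership inference can be shown with a minimal sketch. It assumes a stand-in for an overfit model whose confidence is highest on records it has memorized (here, a hypothetical score based on distance to the nearest training record) and an attacker who can only query that confidence. The threshold and the data are illustrative assumptions.

```python
def make_overfit_model(training_records):
    """Stand-in for an overfit model: confidence is 1.0 on memorized
    records and falls off with distance from the nearest one."""
    def confidence(record):
        nearest = min(abs(record - seen) for seen in training_records)
        return 1.0 / (1.0 + nearest)
    return confidence

def infer_membership(query_model, record, threshold=0.99):
    """Attacker's guess: very high confidence suggests the record
    was part of the training data."""
    return query_model(record) >= threshold

# Assumption: hypothetical numeric records held privately by the model owner.
private_training_records = [12.0, 47.5, 88.25]
query = make_overfit_model(private_training_records)

# The attacker never sees the training data, only the model's answers.
print(infer_membership(query, 47.5))  # a record that was in the training set
print(infer_membership(query, 60.0))  # a record that was not
```

Models that generalize well show a smaller confidence gap between seen and unseen records, which is one reason reducing overfitting is often discussed as a mitigation.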

How Can Data Poisoning Instances be Detected and Prevented?

Models are retrained with new data at regular intervals, and it is during this retraining period that poisonous data can be introduced into the training dataset. Since it happens over time, it is hard to track such activities. Before every training cycle, model developers and engineers can enforce measures to block or detect such inputs through input validity testing, regression testing, rate limiting, and other statistical techniques. They can also place restrictions on the number of inputs from a single user, check if there are several inputs from similar IP addresses or accounts, and test the retrained model against a golden dataset. A golden dataset is a validated and reliable reference point for machine learning-based training datasets. Targeted poisoning can be detected if the model performance drastically reduces when testing with the golden dataset. 
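Two of these checks, limiting inputs per user and testing against a golden dataset, can be sketched as follows. The function names, the accuracy baseline and the allowed drop are all illustrative assumptions rather than a standard API.

```python
from collections import Counter

def cap_per_source(submissions, max_per_source):
    """Keep at most max_per_source records from each user or IP, in order."""
    seen = Counter()
    kept = []
    for source, record in submissions:
        if seen[source] < max_per_source:
            kept.append((source, record))
            seen[source] += 1
    return kept

def passes_golden_check(predict, golden, baseline_accuracy, max_drop=0.05):
    """Compare the retrained model's accuracy on the golden dataset
    with the accuracy of the previously accepted model."""
    correct = sum(predict(features) == label for features, label in golden)
    acc = correct / len(golden)
    return acc >= baseline_accuracy - max_drop, acc

# Hypothetical submissions: (source id, record).
submissions = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "d")]
print(cap_per_source(submissions, max_per_source=2))

# Hypothetical golden dataset of validated (feature, label) pairs.
golden = [(0.1, "benign"), (0.2, "benign"),
          (0.8, "malicious"), (0.9, "malicious")]

def healthy_model(x):
    return "malicious" if x > 0.5 else "benign"

def poisoned_model(x):
    return "benign"  # has been taught to wave everything through

print(passes_golden_check(healthy_model, golden, baseline_accuracy=1.0))
print(passes_golden_check(poisoned_model, golden, baseline_accuracy=1.0))
```

A retrained model that fails the golden check would be held back from deployment while the new training batch is inspected.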

Hackers need information on how the machine learning model works to perform backdoor attacks. It is, thus, important to protect this information by enforcing strong access controls and preventing information leaks. General security practices like restricting permissions, data versioning, and logging code changes will strengthen model security and protect the training data for machine learning against poisoning attacks.

Building Defenses through Penetration Testing

Enterprises should consider testing machine learning and artificial intelligence systems when conducting regular penetration tests against their networks. Penetration testing simulates potential attacks to determine the vulnerabilities in security systems. Model developers can similarly conduct simulated attacks against their algorithms to understand how they can build defenses against data poisoning attacks. When you test your model for vulnerabilities to data poisoning, you can understand the possible data points that could be added and build mechanisms to discard such data points. 
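A simulated poisoning test of this kind might look like the sketch below. It assumes a simple discard mechanism (reject any new record whose feature lies too far from the median of trusted records with the same label) and replays hypothetical attack points against it to see which ones slip through. The tolerance and data are illustrative.

```python
from statistics import median

def build_filter(trusted, tolerance):
    """Return an accept(feature, label) check based on trusted data."""
    by_label = {}
    for feature, label in trusted:
        by_label.setdefault(label, []).append(feature)
    centers = {label: median(values) for label, values in by_label.items()}

    def accept(feature, label):
        # Unknown labels and far-off features are both discarded.
        return label in centers and abs(feature - centers[label]) <= tolerance
    return accept

def simulate_poisoning(accept, attack_points):
    """Replay simulated attack records; return those the filter lets in."""
    return [(f, label) for f, label in attack_points if accept(f, label)]

# Assumption: trusted, validated records with a 1-D feature.
trusted = [(0.1, "benign"), (0.2, "benign"), (0.3, "benign"),
           (0.7, "malicious"), (0.8, "malicious"), (0.9, "malicious")]
accept = build_filter(trusted, tolerance=0.25)

# Simulated attack: two blatant mislabels and one subtle one.
attack = [(0.95, "benign"), (0.05, "malicious"), (0.4, "benign")]
leaked = simulate_poisoning(accept, attack)
print(leaked)
```

Here the blatant mislabels are discarded, but the subtle point at 0.4 gets through. That is the useful output of the exercise: it tells the developers where the defense needs tightening.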

Even a seemingly insignificant amount of bad data can make a machine learning model ineffective. Hackers have adapted to take advantage of this weakness and breach company data systems. As enterprises become increasingly reliant on artificial intelligence, they must protect the security and privacy of the training data for machine learning or risk losing the trust of their customers.


Next-gen chips, Amazon Q, and speedy S3

By Cloud Computing News

AWS re:Invent, which runs from November 27 to December 1, has had its usual plethora of announcements: a total of 21 at time of print.

Perhaps not surprisingly, given the huge potential impact of generative AI – ChatGPT officially turns one year old today – a lot of focus has been on the AI side for AWS’ announcements, including a major partnership inked with NVIDIA across infrastructure, software, and services.

Yet there has been plenty more announced at the Las Vegas jamboree besides. Here, CloudTech rounds up the best of the rest:

Next-generation chips

This was the other major AI-focused announcement at re:Invent: the launch of two new chips, AWS Graviton4 and AWS Trainium2, for training and running AI and machine learning (ML) models, among other customer workloads. Graviton4 shapes up against its predecessor with 30% better compute performance, 50% more cores and 75% more memory bandwidth, while Trainium2 delivers up to four times faster training than before and will be able to be deployed in EC2 UltraClusters of up to 100,000 chips.

The EC2 UltraClusters are designed to ‘deliver the highest performance, most energy efficient AI model training infrastructure in the cloud’, as AWS puts it. With it, customers will be able to train large language models in ‘a fraction of the time’, as well as double energy efficiency.

As ever, AWS points to customers who are already utilising these tools. Databricks, Epic and SAP are among the companies cited as using the new AWS-designed chips.

Zero-ETL integrations

AWS announced new Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon Relational Database Service (Amazon RDS) for MySQL integrations with Amazon Redshift, AWS’ cloud data warehouse. The zero-ETL integrations – eliminating the need to build ETL (extract, transform, load) data pipelines – make it easier to connect and analyse transactional data across various relational and non-relational databases in Amazon Redshift.

A simple example of how zero-ETL works is a hypothetical company that stores transactional data – time of transaction, items bought, where the transaction occurred – in a relational database, but uses a separate analytics tool to analyse data in a non-relational database. To connect it all up, companies would previously have had to construct ETL data pipelines, which are a time and money sink.

The latest integrations “build on AWS’s zero-ETL foundation… so customers can quickly and easily connect all of their data, no matter where it lives,” the company said.

Amazon S3 Express One Zone

AWS announced the general availability of Amazon S3 Express One Zone, a new storage class purpose-built for customers’ most frequently accessed data. Data access speed is up to 10 times faster and request costs up to 50% lower than standard S3. Companies can also opt to co-locate their Amazon S3 Express One Zone data in the same availability zone as their compute resources.

Companies and partners who are using Amazon S3 Express One Zone include ChaosSearch, Cloudera, and Pinterest.

Amazon Q

A new product, and an interesting pivot, again with generative AI at its core. Amazon Q was announced as a ‘new type of generative AI-powered assistant’ which can be tailored to a customer’s business. “Customers can get fast, relevant answers to pressing questions, generate content, and take actions – all informed by a customer’s information repositories, code, and enterprise systems,” AWS added. The service also can assist companies building on AWS, as well as companies using AWS applications for business intelligence, contact centres, and supply chain management.

Customers cited as early adopters include Accenture, BMW and Wunderkind.

Want to learn more about cybersecurity and the cloud from industry leaders? Check out Cyber Security & Cloud Expo taking place in Amsterdam, California, and London. Explore other upcoming enterprise technology events and webinars powered by TechForge here.

HCLTech and Cisco create collaborative hybrid workplaces

By Cloud Computing News

Digital comms specialist Cisco and global tech firm HCLTech have teamed up to launch Meeting-Rooms-as-a-Service (MRaaS).

Available on a subscription model, this solution modernises legacy meeting rooms and enables users to join meetings from any meeting solution provider using Webex devices.

The MRaaS solution helps enterprises simplify the design, implementation and maintenance of integrated meeting rooms, enabling seamless collaboration for their globally distributed hybrid workforces.

Rakshit Ghura, senior VP and Global head of digital workplace services, HCLTech, said: “MRaaS combines our consulting and managed services expertise with Cisco’s proficiency in Webex devices to change the way employees conceptualise, organise and interact in a collaborative environment for a modern hybrid work model.

“The common vision of our partnership is to elevate the collaboration experience at work and drive productivity through modern meeting rooms.”

Alexandra Zagury, VP of partner managed and as-a-Service Sales at Cisco, said: “Our partnership with HCLTech helps our clients transform their offices through cost-effective managed services that support the ongoing evolution of workspaces.

“As we reimagine the modern office, we are making it easier to support collaboration and productivity among workers, whether they are in the office or elsewhere.”

Cisco’s Webex collaboration devices harness the power of artificial intelligence to offer intuitive, seamless collaboration experiences, enabling meeting rooms with smart features such as meeting zones, intelligent people framing, optimised attendee audio and background noise removal, among others.

Tags: Cisco, collaboration, HCLTech, Hybrid, meetings

Canonical releases low-touch private cloud MicroCloud

By Cloud Computing News

Canonical has announced the general availability of MicroCloud, a low-touch, open source cloud solution. MicroCloud is part of Canonical’s growing cloud infrastructure portfolio.

It is purpose-built for scalable clusters and edge deployments for all types of enterprises. It is designed with simplicity, security and automation in mind, minimising the time and effort to both deploy and maintain it. Conveniently, enterprise support for MicroCloud is offered as part of Canonical’s Ubuntu Pro subscription, with several support tiers available, and priced per node.

MicroClouds are optimised for repeatable and reliable remote deployments. A single command initiates the orchestration and clustering of various components with minimal involvement by the user, resulting in a fully functional cloud within minutes. This simplified deployment process significantly reduces the barrier to entry, putting a production-grade cloud at everyone’s fingertips.

Juan Manuel Ventura, head of architectures & technologies at Spindox, said: “Cloud computing is not only about technology, it’s the beating heart of any modern industrial transformation, driving agility and innovation. Our mission is to provide our customers with the most effective ways to innovate and bring value; having a complexity-free cloud infrastructure is one important piece of that puzzle. With MicroCloud, the focus shifts away from struggling with cloud operations to solving real business challenges.”

In addition to seamless deployment, MicroCloud prioritises security and ease of maintenance. All MicroCloud components are built with strict confinement for increased security, with over-the-air transactional updates that preserve data and roll back on errors automatically. Upgrades to newer versions are handled automatically and without downtime, with the mechanisms to hold or schedule them as needed.

With this approach, MicroCloud caters to both on-premises clouds and edge deployments at remote locations, allowing organisations to use the same infrastructure primitives and services wherever they are needed. It is suitable for business-in-branch office locations or industrial use inside a factory, as well as distributed locations where the focus is on replicability and unattended operations.

Cedric Gegout, VP of product at Canonical, said: “As data becomes more distributed, the infrastructure has to follow. Cloud computing is now distributed, spanning across data centres, far and near edge computing appliances. MicroCloud is our answer to that.

“By packaging known infrastructure primitives in a portable and unattended way, we are delivering a simpler, more prescriptive cloud experience that makes zero-ops a reality for many industries.”

MicroCloud’s lightweight architecture makes it usable on both commodity and high-end hardware, with several ways to further reduce its footprint depending on your workload needs. In addition to the standard Ubuntu Server or Desktop, MicroClouds can be run on Ubuntu Core – a lightweight OS optimised for the edge. With Ubuntu Core, MicroClouds are a perfect solution for far-edge locations with limited computing capabilities. Users can choose to run their workloads using Kubernetes or via system containers. System containers based on LXD behave similarly to traditional VMs but consume fewer resources while providing bare-metal performance.

Coupled with Canonical’s Ubuntu Pro + Support subscription, MicroCloud users can benefit from an enterprise-grade open source cloud solution that is fully supported and with better economics. An Ubuntu Pro subscription offers security maintenance for the broadest collection of open-source software available from a single vendor today. It covers over 30k packages with a consistent security maintenance commitment, and additional features such as kernel livepatch, systems management at scale, certified compliance and hardening profiles enabling easy adoption for enterprises. With per-node pricing and no hidden fees, customers can rest assured that their environment is secure and supported without the expensive price tag typically associated with cloud solutions.

Tags: automation, Canonical, MicroCloud, private cloud
