TECHNOLOGY

3 Best Practices For Predictive Data Modeling

Published

January 28, 2022

Predictive modeling uses past events as reference points to build models that help organizations forecast future business-related events and make informed decisions.

It plays a major role in strategy-making in industries such as healthcare, law enforcement and pharmaceuticals, among many others. Practices that reduce errors in predictive data modeling are therefore valuable to any organization that relies on it.

Predictive data modeling involves the creation, testing and validation of data models that will be used for predictive analysis in businesses. The lifecycle management of such models is a part of predictive data modeling. Such models, which use data captured by AI systems, machine learning tools, and other sources, can be used in advanced predictive analysis software systems used by organizations. The predictive data modeling process can be broken down into four steps:

Developing a model
Testing the model
Validating the model
Evaluating the model

There are a significant number of application areas for predictive analysis, such as financial risk management, international trade, clinical trials, cancer detection and many others. As we can see, each application area specified above is sensitive to mistakes or prediction inaccuracies. An inaccurate prediction could lead to incorrect diagnoses, potential patient deaths or financial turmoil in such industries. Therefore, organizations must implement certain practices to optimize the process of predictive data modeling. They must also continuously monitor the performance of the models.

1) Keeping the First Model Simple

Predictive data modeling consumes considerable resources before an organization can expect it to pay off. The organization’s IT infrastructure therefore has to be capable of supporting the process without lag or inefficiency, and businesses must first invest time and money to confirm that it is. This means checking network connectivity, internet speeds, cybersecurity controls and other factors before adopting predictive data modeling. Your business also needs to make sure that its IT tools work well together so that model development runs smoothly.

More importantly, the first model an organization creates need not be complex or fancy, because it will not be used for demanding production applications. A simple model provides the metrics and behaviors that serve as a yardstick for testing bigger and more complicated models in the future. During the initial phases, businesses need to answer a few questions about how they will carry out predictive data modeling: how many features are needed to test a specific hypothesis; whether the useful features will be practical to produce in the future; where a model can be stored for maximum data security and threat protection; and whether every significant decision-maker believes the organization’s current architecture and tools are good enough to carry out the process.

Having an advanced hardware and software infrastructure conducive to predictive data modeling is vital for the process to be a success. Keeping the first data model simple makes it easier to train and benchmark more complex models later.
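To illustrate the yardstick idea, here is a minimal sketch of about the simplest model possible: one that always predicts the most common outcome seen in training. The churn labels and function names are hypothetical and chosen only for illustration; the point is that any later, more complex model should beat this number before it is taken seriously.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    matches = sum(p == a for p, a in zip(predictions, actuals))
    return matches / len(actuals)


# Hypothetical labels: 1 = customer churned, 0 = customer stayed.
train_labels = [0, 0, 1, 0, 1, 0, 0, 0]
test_labels = [0, 1, 0, 0]

# The "model" is just the majority label, applied to every test row.
baseline_label = fit_majority_baseline(train_labels)
baseline_predictions = [baseline_label] * len(test_labels)
baseline_accuracy = accuracy(baseline_predictions, test_labels)

print(f"baseline predicts {baseline_label}, accuracy {baseline_accuracy:.2f}")
```

A more sophisticated model that cannot clearly outperform this baseline on the same test data is probably not worth its extra complexity yet.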

2) Validating Models Consistently

Result validation involves running a model and evaluating its results with visualization tools. To carry out validation, organizations need to understand how business data is generated and how it flows through their data networks. Today, data analytics is integrated into nearly every aspect of business, and individuals at every level of an organization use company resources and the web to make calculated decisions. Information is also gathered to train predictive models, and assembling those training datasets takes a lot of effort. That effort makes predictive models highly valued assets, and each model can influence an organization’s data compliance (for better or worse), its financial bottom line and its exposure to legal risk. As a result, such high-value assets need to be validated consistently.

Businesses may be under the impression that model validation is a one-off process that never needs to be repeated. As any expert in model training will tell you, that is a misconception. Predictive models need to evolve over time to keep making accurate forecasts, so validation needs to take place on a consistent basis. Here are some of the tasks that must be carried out in the validation process:

The Thorough Validation of ‘Predictor’ Variables

A model is made up of several variables. Some of those variables have strong predictive abilities and are labeled ‘predictors’ for that reason. While predictors are useful for regular business work, they may in some cases expose the organization to unwanted risk when they are used for predictive analysis. For example, network administrators may deliberately keep highly personal user details out of a model, even where those details would predict well, to avoid legal trouble over privacy violations.
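One way to make this check routine is to screen every candidate predictor against a list of fields the organization has decided not to model on. The sketch below assumes such a list exists; the field names and the `split_predictors` helper are hypothetical, and a real policy would come from legal and compliance teams rather than from code.

```python
# Hypothetical blocklist of personal fields that must not be used as predictors.
SENSITIVE_FIELDS = {"national_id", "home_address", "date_of_birth"}


def split_predictors(candidate_predictors):
    """Separate candidate predictors into allowed and flagged lists."""
    allowed = [name for name in candidate_predictors
               if name not in SENSITIVE_FIELDS]
    flagged = [name for name in candidate_predictors
               if name in SENSITIVE_FIELDS]
    return allowed, flagged


candidates = ["purchase_count", "home_address", "account_age_days", "national_id"]
allowed, flagged = split_predictors(candidates)

print("use:", allowed)
print("review before use:", flagged)
```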

The Validation of Data Distribution

This type of validation gives organizations an understanding of how predictor and target variables are distributed. Over time, these distributions may shift. If such a shift is detected, the model will have to be retrained on new data, because it can no longer be relied on to make accurate predictions.
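A distribution shift can be detected in many ways. The sketch below uses one common choice, the two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the cumulative distributions of a reference sample and a current sample. The sample values and the 0.3 threshold are assumptions made for illustration; a suitable threshold depends on the data and on sample size.

```python
from bisect import bisect_right

# Hypothetical cut-off: above this gap, the variable is treated as shifted.
DRIFT_THRESHOLD = 0.3


def ks_statistic(reference, current):
    """Largest gap between the two empirical cumulative distributions."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    largest_gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for value in ref_sorted + cur_sorted:
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, value) / len(cur_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


def needs_retraining(reference, current, threshold=DRIFT_THRESHOLD):
    """Flag a variable whose distribution has moved past the threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical values of one predictor at training time and in production.
training_values = [10, 12, 11, 13, 12, 11, 10, 12]
stable_values = [11, 12, 10, 13, 12, 11, 12, 10]
shifted_values = [20, 22, 21, 23, 22, 21, 20, 22]

print(needs_retraining(training_values, stable_values))
print(needs_retraining(training_values, shifted_values))
```

The same check can be run on the target variable as well as on each predictor.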

The Validation of Algorithms

Analytical algorithms are used to train models, so the algorithms themselves must be validated as well. Only certain types of models provide clear, interpretable predictions. Take decision trees and neural networks: decision trees give more transparent and interpretable results, though often less accurate ones, whereas neural networks tend to be more accurate but harder to interpret. Where decision trees take a larger part in predictive analysis, they need to be validated more frequently. Data administrators have to weigh interpretability against prediction accuracy when validating such algorithms.

Compare Model-Prediction Accuracy Tests

To know the actual competence of a model, its accuracy must be compared with that of other models. The most accurate models should be the ones used in predictive analysis systems. This is also a validation task, and it must be repeated regularly as newer, more accurate models become available. After all, predictive analysis performance keeps improving.
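In practice, the comparison means scoring every candidate on the same held-out data and keeping the winner. The sketch below stands in for that process with two deliberately trivial rule-based "models"; the thresholds, transaction amounts and labels are hypothetical, and real candidates would be trained models scored the same way.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(model(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


# Hypothetical candidates: flag a transaction as fraud (1) above an amount.
def threshold_100(amount):
    return 1 if amount > 100 else 0


def threshold_500(amount):
    return 1 if amount > 500 else 0


# Hypothetical held-out data that none of the candidates were tuned on.
holdout_amounts = [20, 150, 700, 90, 300, 1000]
holdout_labels = [0, 0, 1, 0, 0, 1]

candidates = {"threshold_100": threshold_100, "threshold_500": threshold_500}
scores = {
    name: accuracy(model, holdout_amounts, holdout_labels)
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("selected:", best_name)
```

Rerunning the same comparison whenever a new candidate appears keeps the selection current.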

Additionally, tasks such as auditing of models, and keeping track of every validation log entry are included under the umbrella of validation. Finally, the performance of models is monitored before and after deployment. Before deployment, businesses must test them for operational glitches that may impact their decision-making and predictive capabilities. Pre-deployment checking is essential because most models chosen for predictive analytics are used in real-world environments.

After a model is deployed, it needs to be monitored for wear, as, generally, models tend to degrade over time. So, validation helps with phasing such models out from a predictive analytics system and replacing them with new, useful ones. With constant validation, models could become less error-prone and more time-efficient. Constant validation is a potent practice as it improves the predictive data modeling process in several ways.

3) Recognizing Data Imbalances

Imbalanced data is a classification problem in which observations are not equally distributed across classes. There may be many observations for one class (the majority class) and far fewer for one or more other classes (the minority classes). Data imbalances cause inaccuracies in predictive analysis.

A data imbalance can make a model erratic and not very useful. Consider a fraud forecasting system proposed for a bank where 95% of transactions turn out to be non-fraudulent. A system trained on such imbalanced data may simply report that the bank is 100% safe. It will be right about 95% of the time, but it will fail every time fraud actually occurs, because it had declared every transaction safe.
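The bank example can be reproduced in a few lines. The sketch below builds a hypothetical set of 100 transactions with the same 95/5 split and scores a system that labels everything as safe. Accuracy looks excellent, while recall (the share of actual fraud cases that were caught) shows the problem.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)


def recall(predictions, actuals, positive=1):
    """Fraction of actual positive cases that the predictions caught."""
    true_positives = sum(
        p == positive and a == positive for p, a in zip(predictions, actuals)
    )
    actual_positives = sum(a == positive for a in actuals)
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical data: 95 legitimate transactions (0) and 5 fraudulent ones (1).
actual_labels = [0] * 95 + [1] * 5

# A system that declares every transaction safe.
always_safe = [0] * len(actual_labels)

print(f"accuracy: {accuracy(always_safe, actual_labels):.2f}")
print(f"fraud recall: {recall(always_safe, actual_labels):.2f}")
```

Checking class counts before training, and reporting per-class metrics such as recall alongside accuracy, is a simple way to recognize an imbalance before it reaches production.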

Predictive data modeling is a tough task in the current digital world because of the weaknesses that can creep into it. By following these best practices, businesses can reduce the risk of poor forecasting.

Next-gen chips, Amazon Q, and speedy S3

Published

December 1, 2023

Max

AWS re:Invent, which runs from November 27 to December 1, has had its usual plethora of announcements: a total of 21 at time of print.

Perhaps not surprisingly, given the huge potential impact of generative AI – ChatGPT officially turns one year old today – a lot of focus has been on the AI side for AWS’ announcements, including a major partnership inked with NVIDIA across infrastructure, software, and services.

Yet there has been plenty more announced at the Las Vegas jamboree besides. Here, CloudTech rounds up the best of the rest:

Next-generation chips

This was the other major AI-focused announcement at re:Invent: the launch of two new chips, AWS Graviton4 and AWS Trainium2, for training and running AI and machine learning (ML) models, among other customer workloads. Graviton4 shapes up against its predecessor with 30% better compute performance, 50% more cores and 75% more memory bandwidth, while Trainium2 delivers up to four times faster training than before and will be able to be deployed in EC2 UltraClusters of up to 100,000 chips.

The EC2 UltraClusters are designed to ‘deliver the highest performance, most energy efficient AI model training infrastructure in the cloud’, as AWS puts it. With it, customers will be able to train large language models in ‘a fraction of the time’, as well as double energy efficiency.

As ever, AWS named customers who are already utilising these tools. Databricks, Epic and SAP are among the companies cited as using the new AWS-designed chips.

Zero-ETL integrations

AWS announced new Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon Relational Database Services (Amazon RDS) for MySQL integrations with Amazon Redshift, AWS’ cloud data warehouse. The zero-ETL integrations – eliminating the need to build ETL (extract, transform, load) data pipelines – make it easier to connect and analyse transactional data across various relational and non-relational databases in Amazon Redshift.

A simple example of how zero-ETL works is a hypothetical company which stores transactional data – time of transaction, items bought, where the transaction occurred – in a relational database, but uses another analytics tool to analyse data in a non-relational database. To connect it all up, the company would previously have had to construct ETL data pipelines, which are a time and money sink.

The latest integrations “build on AWS’s zero-ETL foundation… so customers can quickly and easily connect all of their data, no matter where it lives,” the company said.

Amazon S3 Express One Zone

AWS announced the general availability of Amazon S3 Express One Zone, a new storage class purpose-built for customers’ most frequently-accessed data. Data access speed is up to 10 times faster and request costs up to 50% lower than standard S3. Companies can also opt to collocate their Amazon S3 Express One Zone data in the same availability zone as their compute resources.

Companies and partners who are using Amazon S3 Express One Zone include ChaosSearch, Cloudera, and Pinterest.

Amazon Q

A new product, and an interesting pivot, again with generative AI at its core. Amazon Q was announced as a ‘new type of generative AI-powered assistant’ which can be tailored to a customer’s business. “Customers can get fast, relevant answers to pressing questions, generate content, and take actions – all informed by a customer’s information repositories, code, and enterprise systems,” AWS added. The service also can assist companies building on AWS, as well as companies using AWS applications for business intelligence, contact centres, and supply chain management.

Customers cited as early adopters include Accenture, BMW and Wunderkind.

HCLTech and Cisco create collaborative hybrid workplaces

Published

November 29, 2023

Max

Digital comms specialist Cisco and global tech firm HCLTech have teamed up to launch Meeting-Rooms-as-a-Service (MRaaS).

Available on a subscription model, this solution modernises legacy meeting rooms and enables users to join meetings from any meeting solution provider using Webex devices.

The MRaaS solution helps enterprises simplify the design, implementation and maintenance of integrated meeting rooms, enabling seamless collaboration for their globally distributed hybrid workforces.

Rakshit Ghura, senior VP and Global head of digital workplace services, HCLTech, said: “MRaaS combines our consulting and managed services expertise with Cisco’s proficiency in Webex devices to change the way employees conceptualise, organise and interact in a collaborative environment for a modern hybrid work model.

“The common vision of our partnership is to elevate the collaboration experience at work and drive productivity through modern meeting rooms.”

Alexandra Zagury, VP of partner managed and as-a-Service Sales at Cisco, said: “Our partnership with HCLTech helps our clients transform their offices through cost-effective managed services that support the ongoing evolution of workspaces.

“As we reimagine the modern office, we are making it easier to support collaboration and productivity among workers, whether they are in the office or elsewhere.”

Cisco’s Webex collaboration devices harness the power of artificial intelligence to offer intuitive, seamless collaboration experiences, enabling meeting rooms with smart features such as meeting zones, intelligent people framing, optimised attendee audio and background noise removal, among others.

Tags: Cisco, collaboration, HCLTech, Hybrid, meetings

Canonical releases low-touch private cloud MicroCloud

Published

November 28, 2023

Max

Canonical has announced the general availability of MicroCloud, a low-touch, open source cloud solution. MicroCloud is part of Canonical’s growing cloud infrastructure portfolio.

It is purpose-built for scalable clusters and edge deployments for all types of enterprises. It is designed with simplicity, security and automation in mind, minimising the time and effort to both deploy and maintain it. Conveniently, enterprise support for MicroCloud is offered as part of Canonical’s Ubuntu Pro subscription, with several support tiers available, and priced per node.

MicroClouds are optimised for repeatable and reliable remote deployments. A single command initiates the orchestration and clustering of various components with minimal involvement by the user, resulting in a fully functional cloud within minutes. This simplified deployment process significantly reduces the barrier to entry, putting a production-grade cloud at everyone’s fingertips.

Juan Manuel Ventura, head of architectures & technologies at Spindox, said: “Cloud computing is not only about technology, it’s the beating heart of any modern industrial transformation, driving agility and innovation. Our mission is to provide our customers with the most effective ways to innovate and bring value; having a complexity-free cloud infrastructure is one important piece of that puzzle. With MicroCloud, the focus shifts away from struggling with cloud operations to solving real business challenges.”

In addition to seamless deployment, MicroCloud prioritises security and ease of maintenance. All MicroCloud components are built with strict confinement for increased security, with over-the-air transactional updates that preserve data and roll back on errors automatically. Upgrades to newer versions are handled automatically and without downtime, with the mechanisms to hold or schedule them as needed.

With this approach, MicroCloud caters to both on-premises clouds and edge deployments at remote locations, allowing organisations to use the same infrastructure primitives and services wherever they are needed. It is suitable for business-in-branch office locations or industrial use inside a factory, as well as distributed locations where the focus is on replicability and unattended operations.

Cedric Gegout, VP of product at Canonical, said: “As data becomes more distributed, the infrastructure has to follow. Cloud computing is now distributed, spanning across data centres, far and near edge computing appliances. MicroCloud is our answer to that.

“By packaging known infrastructure primitives in a portable and unattended way, we are delivering a simpler, more prescriptive cloud experience that makes zero-ops a reality for many industries.”

MicroCloud’s lightweight architecture makes it usable on both commodity and high-end hardware, with several ways to further reduce its footprint depending on your workload needs. In addition to the standard Ubuntu Server or Desktop, MicroClouds can be run on Ubuntu Core – a lightweight OS optimised for the edge. With Ubuntu Core, MicroClouds are a perfect solution for far-edge locations with limited computing capabilities. Users can choose to run their workloads using Kubernetes or via system containers. System containers based on LXD behave similarly to traditional VMs but consume fewer resources while providing bare-metal performance.

Coupled with Canonical’s Ubuntu Pro + Support subscription, MicroCloud users can benefit from an enterprise-grade open source cloud solution that is fully supported and with better economics. An Ubuntu Pro subscription offers security maintenance for the broadest collection of open-source software available from a single vendor today. It covers over 30k packages with a consistent security maintenance commitment, and additional features such as kernel livepatch, systems management at scale, certified compliance and hardening profiles enabling easy adoption for enterprises. With per-node pricing and no hidden fees, customers can rest assured that their environment is secure and supported without the expensive price tag typically associated with cloud solutions.

Tags: automation, Canonical, MicroCloud, private cloud