
AI at Rescue: Claims Prediction

Last Updated on July 5, 2022 by Editorial Team

Author(s): Supreet Kaur


Overview

The insurance industry was an early adopter of vanilla algorithms such as Logistic Regression, and it has recently seen a surge in the use of predictive analytics to estimate the likely outcome of a claim. The results of these models provide decision support to claims managers.

Some of the many use cases in the insurance and reinsurance industries include:

  1. Building a risk scorecard that flags the riskiest claims so that claims managers can focus on them and build strategies to mitigate losses
  2. Organizing claims data to better understand the customer journey across the entire lifecycle
  3. Predicting the expected number of claims for a year or quarter

The use case I describe in this blog is predicting the expected number of claims in a year for a particular vehicle. Once you have predicted the claims, you can also rank them from least risky to most risky.

This approach is not limited to the automobile industry; it can be applied in other industries as well.


Data Pre-processing

To produce accurate predictions, you must procure data from multiple sources and perform feature engineering and data wrangling to create your final input (a minimal assembly sketch follows the lists below). Below is a list of attributes/datasets that you might need to begin the modeling process:

  1. Policy Details (Number, Date, Tenure, etc.)
  2. Premium and Loss Information (Premium Amount, Losses, etc.)
  3. Vehicle Information (Number, Description, Year of Make, Location, Business Use, etc.)
  4. Historical Claim Information (Number of Claims, Reason Code, etc.)

Optional Attributes:

  1. Census Information (Population of the area, Income, etc.)
  2. Driver Information
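
Below is a minimal sketch of how these sources might be assembled into a single modeling table with pandas. The file names and column names (policy_number, claim_count, vehicle_year, etc.) are hypothetical placeholders, not a prescribed schema.

```python
import pandas as pd

# Hypothetical extracts; replace with your actual policy, premium/loss,
# vehicle, and historical claim sources.
policies = pd.read_csv("policy_details.csv")   # policy_number, policy_date, tenure, ...
losses = pd.read_csv("premium_loss.csv")       # policy_number, premium_amount, losses, ...
vehicles = pd.read_csv("vehicle_info.csv")     # policy_number, vehicle_year, business_use, ...
claims = pd.read_csv("claim_history.csv")      # policy_number, claim_count, reason_code, ...

# Join everything onto the policy table to get one row per policy/vehicle.
df = (
    policies
    .merge(losses, on="policy_number", how="left")
    .merge(vehicles, on="policy_number", how="left")
    .merge(claims, on="policy_number", how="left")
)

# Simple wrangling: fill missing claim counts and derive a vehicle-age feature.
df["claim_count"] = df["claim_count"].fillna(0)
df["vehicle_age"] = pd.Timestamp.now().year - df["vehicle_year"]
```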

It is also recommended to have a good amount of history (roughly 4 to 5 years), as such datasets tend to be sparse.

It is also important to divide your dataset into train, test, and holdout sets at this point. You will build the model on the training data, tune your hyperparameters on the test data, and perform model validation on the holdout data.
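
As a rough illustration, the three-way split can be done with two calls to scikit-learn's train_test_split; the 60/20/20 proportions below are just an assumption, not a requirement.

```python
from sklearn.model_selection import train_test_split

# Features and target; `claim_count` is the hypothetical target column from the sketch above.
X = df.drop(columns=["claim_count"])
y = df["claim_count"]

# First carve out 20% as the holdout set, then split the remainder into train/test.
X_rest, X_hold, y_rest, y_hold = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
# Result: 60% train, 20% test (for tuning), 20% holdout (for final validation).
```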

We will start with a simple model, a Decision Tree, and then progress to a more complex algorithm, the Gradient Boosting Model.

Modeling

  1. Decision Trees

Decision trees are a simple yet widely used algorithm for classification and regression. For our use case, we will use a Decision Tree Regressor. Decision trees apply binary splitting rules to arrive at a decision (target value). The regressor uses MSE (Mean Squared Error) to decide how to split a node into sub-nodes, choosing the split with the lowest MSE. The final prediction made by the tree is the average value of the dependent variable (number of claims) in the corresponding leaf/terminal node.

Important Parameters

The model can be easily implemented in R, or in Python using scikit-learn (see the sketch after this list). Some of the parameters are as follows:

  1. Criterion: the function used to measure the quality of a split (mean squared error for our use case)
  2. Maximum Features: the number of features to consider when looking for the best split
  3. Maximum Depth: the maximum depth of the tree, which limits the number of nodes
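
A minimal scikit-learn sketch of the regressor with the parameters above; the specific values are illustrative, not recommendations, and categorical columns would need to be encoded (e.g., one-hot) before fitting.

```python
from sklearn.tree import DecisionTreeRegressor

# Illustrative settings; tune max_depth / max_features for your own data.
tree = DecisionTreeRegressor(
    criterion="squared_error",  # split quality measured by mean squared error
    max_depth=5,                # limits the number of nodes in the tree
    max_features=None,          # consider all features for each split
    random_state=42,
)
tree.fit(X_train, y_train)
predicted_claims = tree.predict(X_test)
```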

Advantages

  1. Easy to understand
  2. Less Data Preparation required

Disadvantages

  1. Overfitting
  2. Unstable (slight variations in the data can lead to entirely different results)
  3. Less accurate for continuous variables

2. Gradient Boosting Models

Gradient boosting is based on the idea that multiple weak learners can be combined to form a strong learner, with each step aiming to reduce the remaining error. When the target variable is continuous (as in our use case), we use a Gradient Boosting Regressor. Since the objective is to minimize a loss function, the loss function used here is MSE (Mean Squared Error); in simple terms, it measures the error between the actual and predicted number of claims.

Important Parameters and Hyperparameter tuning

The model can be easily implemented in R or Python (a scikit-learn sketch follows the list below). Some of the parameters are as follows:

  1. Loss Function: the loss to be optimized (MSE for our use case)
  2. Learning Rate: how much the contribution of each tree is shrunk
  3. Number of Estimators: the number of boosting stages to perform. Gradient boosting is relatively robust to overfitting, so a large number usually results in better performance.
  4. Maximum Depth: the maximum depth of the individual regression estimators, which limits the number of nodes in each tree
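
A corresponding scikit-learn sketch with the parameters above; the hyperparameter values are again illustrative assumptions to be tuned on your own data.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative hyperparameters; tune them on the test set as described below.
gbm = GradientBoostingRegressor(
    loss="squared_error",   # MSE-based loss between actual and predicted claim counts
    learning_rate=0.05,     # how much each tree's contribution is shrunk
    n_estimators=500,       # number of boosting stages
    max_depth=3,            # depth of the individual regression estimators
    random_state=42,
)
gbm.fit(X_train, y_train)
```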

An essential feature of GBM modeling is variable importance. In R, you can obtain it by applying the summary function to the fitted model; in scikit-learn, you can inspect the fitted model's feature importances. The resulting table and plot show the most influential variables in the training set, i.e., the ones that drive the model's decisions.
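
For example, with the scikit-learn model sketched above, the impurity-based importances can be listed as follows:

```python
import pandas as pd

# Impurity-based importances from the fitted GBM, sorted for inspection.
importance = pd.Series(gbm.feature_importances_, index=X_train.columns).sort_values(ascending=False)
print(importance.head(10))
```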

Since a GBM tends to overfit, it is essential to perform hyperparameter tuning, typically on your test set. Tuning gives you the value of each parameter at which the model is about to start overfitting: you look for the inflection point where performance on the test dataset begins to decrease while performance on the training dataset continues to improve.
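
One simple way to look for that inflection point is to track test-set error across boosting stages with staged_predict; this is only a sketch of the idea, assuming the gbm model fitted above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Error on the test set after each boosting stage; the minimum marks the point
# beyond which adding more trees starts to overfit.
test_errors = [
    mean_squared_error(y_test, stage_pred)
    for stage_pred in gbm.staged_predict(X_test)
]
best_n_estimators = int(np.argmin(test_errors)) + 1
print(f"Test MSE is lowest at {best_n_estimators} boosting stages")
```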

You can evaluate your model's performance by computing MSE (Mean Squared Error) or RMSE on your train, test, and holdout data and comparing the results.
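
A short sketch of that comparison, assuming the splits and fitted model from the earlier snippets:

```python
from sklearn.metrics import mean_squared_error

# Compare RMSE across the three splits; a large train/holdout gap signals overfitting.
for name, X_part, y_part in [("train", X_train, y_train), ("test", X_test, y_test), ("holdout", X_hold, y_hold)]:
    rmse = mean_squared_error(y_part, gbm.predict(X_part)) ** 0.5
    print(f"{name} RMSE: {rmse:.3f}")
```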

Advantages

  1. Handles missing values
  2. Better Model Performance
  3. Provides flexibility with hyperparameter tuning

Disadvantages

  1. Prone to overfitting
  2. Computationally expensive

I hope this helps you understand how ML algorithms can come to your rescue to predict claims and mitigate losses.

Thank you to all my followers and readers. I hope you enjoy the AI at Rescue series.

Please subscribe and follow for more such content 🙂
