Practical Application of Clustering in Insurance

Event Type: Web Session
European Actuarial Academy (EAA)

Start Time: 8.00 am (IST)

Finish Time: 11.15 am (IST)


Announcement from the European Actuarial Academy organiser:

Actuarial analytics has found its way into several areas of the insurance value chain, mostly through supervised learning tools such as linear or tree-based regression. Unsupervised learning, such as partitional clustering, seems to be used far less often, despite its potential to yield insights into high-dimensional insurance data sets.

Cluster analysis is the task of grouping a set of objects (often data points) in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. In contrast to simple segmentation (e.g. by geographical location only), clustering uses several features to differentiate among those groups. Potential applications are manifold and centred around questions such as:

  • In which customer segments do we mainly generate new business?
  • Which typical customer should we have in mind while designing new insurance products?
  • How can we make use of granular information, such as diagnosis or treatment codes, while dealing with a limited number of observations or claims?

The course provides an introduction to clustering that does not require any previous knowledge in this area and gives participants a head start on their own problems. We therefore focus on typical stumbling blocks that arise when clustering techniques are applied in practice, such as interpretability, missing values and mixed data types.

The web session is open to all interested persons. Previous knowledge of partitional clustering is not required; however, basic statistical knowledge is recommended. Familiarity with the R programming language would be helpful for following the practical example.

Technical Requirements
Please check with your IT department whether your firewall and computer settings support web session participation (Zoom is used for this online training). Please also make sure that you join the web session over a stable internet connection.

Your early-bird registration fee is € 150.00 plus 19% VAT for bookings by 5 January 2022. After this date, the fee will be € 205.00 plus 19% VAT.


The following topics will be covered:

  • Introduction to K-means clustering and its variants
  • Cluster validation
  • Visualization techniques for clustering results, such as transformation- or perturbation-based approaches
  • Brief introduction to the imputation of missing values in pre-processing or during clustering
  • Clustering of mixed data types (numerical and categorical features)
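
For readers new to the first topic, the basic idea of K-means can be sketched in a few lines. The following is a minimal illustration in plain Python, not the course material (which uses R): the function name `k_means`, the choice of the first k points as initial centroids, and the toy `policyholders` data are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, k, max_iter=100):
    """Basic K-means (Lloyd's algorithm); returns (labels, centroids)."""
    # Assumption: the first k points serve as initial centroids,
    # which keeps the sketch simple and deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy portfolio: (age in years, annual premium in EUR 100).
policyholders = [(25, 4), (62, 12), (27, 5), (60, 11), (24, 6), (65, 13)]
labels, centroids = k_means(policyholders, k=2)
print(labels)
print(centroids)
```

On real data the features would normally be put on a comparable scale first, since K-means relies on Euclidean distance and a feature with a large numeric range would otherwise dominate the grouping.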

The theoretical explanations will be accompanied by a practical example in R on a public data set showcasing a typical insurance application.

Dr Oliver Pfaffel
Biographical details

Oliver has been working in the reinsurance industry for the past 8 years. He currently specializes in the use of artificial intelligence for automatic underwriting and information extraction from text. Prior to this role, he worked as an actuarial data analyst using supervised machine learning techniques for advanced pricing approaches, and in risk management on Solvency II related topics such as model validation. Dr Pfaffel has a PhD in mathematical statistics from the Technical University of Munich (TUM), with research stays at the National University of Singapore and Columbia University in the City of New York. At TUM, he taught a course on life insurance mathematics for master's students. He has several peer-reviewed publications in the areas of financial mathematics and random matrix theory and is the author and maintainer of the CRAN packages FeatureImpCluster and ClustImpute.