Advanced Concepts of Clustering in Insurance

Event Type
Web Session
EAA, European Actuarial Academy

Start time: 9.00 am IST

End time: 11.00 am IST


Announcements from the European Actuarial Academy organiser: Cluster analysis is the task of grouping a set of objects (e.g. observations, policies, claims) so that objects in the same group (called a cluster) are more similar to each other than to those in other groups. In contrast to simple segmentation (e.g. by geographical location only), clustering uses several features to differentiate between groups. Potential applications are manifold and centre on questions such as:

  • In which customer segments do we mainly generate new business?
  • Which typical customer should we have in mind while designing new insurance products?
  • How can we make use of granular information, such as diagnosis or treatment codes, while dealing with a limited number of observations or claims?
  • How can we identify outliers in our underwriting or claims process?

The course shows how different algorithms can be used to obtain a segmentation of insurance data. The methods covered range from centroid-based (k-means, k-prototypes) to probabilistic (Gaussian mixture models) and density-based (DBSCAN) approaches. We demonstrate how the clustering results can be visualized and evaluated, and how they can be used to identify outliers in the data set. We also cover techniques that reduce the dimension of the data so that the segments can be computed either on aggregated information or on only a subset of the available information. The course emphasises practical application and therefore showcases all concepts on an insurance data set.
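
As a rough illustration of the centroid-based idea and of using cluster results to flag outliers, the following minimal sketch may help. It is written in Python purely for illustration (the course itself works in R), it is not taken from the course material, and the data points, the starting centroids and the "farthest from its own centroid" outlier rule are all assumptions made for this example.

```python
import math

# Hypothetical, already-scaled features for seven policies (invented for illustration).
points = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
          (9.0, 1.0)]

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with a fixed number of iterations."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign(points, centroids), centroids

# Assumed starting centroids: one point from each visible group.
labels, centroids = kmeans(points, [points[0], points[3]])

# Simple outlier heuristic (an assumption): the point farthest from its own centroid.
distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
outlier_index = max(range(len(points)), key=distances.__getitem__)

print(labels)
print(outlier_index)
```

On this toy data the isolated last point is assigned to the second cluster but lies far from its centroid, so it is the one flagged. In practice the features would need to be scaled first, and density-based methods such as DBSCAN label such points as noise directly.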

The web session is open to all interested persons. Prior knowledge of statistical clustering is not necessary but recommended, for example through participation in the introductory course “Practical Application of Clustering in Insurance”. Experience with the programming language R is helpful, as it is used to analyse the insurance data set.

The following topics will be covered:

  • Advanced clustering techniques (k-means, probabilistic clustering, density-based clustering)
  • Cluster evaluation and interpretation
  • Dimensionality reduction (feature aggregation and feature selection)
  • Outlier detection
  • Clustering of mixed data types (numerical and categorical features)

The theoretical coverage is supplemented with a practical example on an insurance data set using the programming language R.

Technical Requirements: Please check with your IT department if your firewall and computer settings support web session participation (the programme Zoom is used for this online training).

Your early-bird registration fee is € 100.00 plus 19% VAT for bookings by 14 October 2021. After this date, the fee will be € 140.00 plus 19% VAT.



Dr Oliver Pfaffel
Oliver has been working in the reinsurance industry for the past 8 years. He currently specializes in the use of artificial intelligence for automatic underwriting and information extraction from text. Prior to this role, he worked as an actuarial data analyst using supervised machine learning techniques for advanced pricing approaches, and in risk management on Solvency II related topics such as model validation. Dr Pfaffel has a PhD in mathematical statistics from the Technical University of Munich (TUM), with research stays at the National University of Singapore and Columbia University in the City of New York. At TUM, he lectured a course on life insurance mathematics for master's students. He has several peer-reviewed publications in the areas of financial mathematics and random matrix theory and is the author and maintainer of the CRAN packages FeatureImpCluster and ClustImpute.