Advanced Concepts of Clustering in Insurance
Announcement from the EAA organiser:
Cluster analysis is the task of grouping a set of objects (e.g., observations, policies, claims) in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. In contrast to simple segmentation (e.g., by geographical location only), clustering uses several features to differentiate among those groups. Potential applications are manifold and centre around questions such as:
- In which customer segments do we mainly generate new business?
- Which typical customer should we have in mind while designing new insurance products?
- How can we make use of granular information, such as diagnosis or treatment codes, while dealing with a limited number of observations or claims?
- How can we identify outliers in our underwriting or claims process?
The web session shows how different algorithms can be used to obtain a segmentation of insurance data. The methods covered range from centroid-based (k-means, k-prototypes) to probabilistic (Gaussian Mixture Models) and density-based (DBSCAN) approaches. We demonstrate how the clustering results can be visualized and evaluated, and how they can be used to identify outliers in the data set. We also cover techniques that reduce the dimension of the data so that the segments can be computed either on aggregated information or using only a subset of the available information. The course emphasises practical application and therefore showcases all concepts on an insurance data set.
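To make the centroid-based idea and its use for outlier detection concrete, here is a minimal sketch in plain Python (the web session itself works in R, so this is not the course material). It runs k-means on made-up, already-scaled two-dimensional data and then flags points that lie unusually far from their cluster centroid. The data, the function names, the farthest-first initialisation and the "3 times the median distance" cutoff are all assumptions chosen for illustration; in practice a randomised initialisation such as k-means++ and a data-driven threshold would be more common.

```python
import math
import statistics


def farthest_first_init(points, k):
    """Deterministic start (an illustrative choice): first point, then the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New centroid = coordinate-wise mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def flag_outliers(points, centroids, labels, factor=3.0):
    """Flag points whose distance to their own centroid exceeds
    factor * median distance (hypothetical rule of thumb)."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = factor * statistics.median(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


# Made-up, already-scaled features for eleven hypothetical policies.
points = [
    (1.0, 1.1), (0.9, 0.8), (1.2, 1.0), (0.8, 1.2), (1.1, 0.9),
    (5.0, 5.1), (4.9, 4.8), (5.2, 5.0), (4.8, 5.2), (5.1, 4.9),
    (5.0, 8.0),  # deliberately unusual observation
]

centroids, labels = kmeans(points, k=2)
outliers = flag_outliers(points, centroids, labels)

print("cluster labels:", labels)
print("outlier indices:", outliers)
```

Note that the unusual observation still gets assigned to a cluster and pulls that cluster's centroid towards itself; the distance-based check is what singles it out afterwards. Density-based methods such as DBSCAN instead treat such points as noise directly.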
The web session is open to all interested persons. Prior knowledge of statistical clustering is not necessary but recommended, for example through participation in the introductory course “Practical Application of Clustering in Insurance”. Experience with the programming language R is helpful as it is used to analyse the insurance data set.
The following topics will be covered:
- Advanced clustering techniques (k-means, probabilistic clustering, density-based clustering)
- Cluster evaluation and interpretation
- Dimensionality reduction (feature aggregation and feature selection)
- Outlier detection
- Clustering of mixed data types (numerical and categorical features)
The theoretical coverage is supplemented with a practical example on an insurance data set using the programming language R.
Technical Requirements
Please check with your IT department if your firewall and computer settings support web session participation (the programme Zoom is used for this online training). Please also make sure that you are joining the web session with a stable internet connection.
Your early-bird registration fee is € 150.00 plus 19% VAT for bookings by 25 March 2022. After this date, the fee will be € 205.00 plus 19% VAT.
Oliver Pfaffel has been working in the reinsurance industry for the past 8 years. He currently specializes in the use of artificial intelligence for automatic underwriting and information extraction from text. Prior to this role, he worked as an actuarial data analyst using supervised machine learning techniques for advanced pricing approaches, and in risk management on Solvency II related topics such as model validation. Dr Pfaffel has a PhD in mathematical statistics from the Technical University of Munich (TUM) with research stays at the National University of Singapore and Columbia University in the City of New York. At TUM, he taught a course on life insurance mathematics for master's students. He has several peer-reviewed publications in the areas of financial mathematics and random matrix theory and is the author / maintainer of the CRAN packages FeatureImpCluster and ClustImpute.