Actuaries Take On AI: How Well Can ChatGPT and GitHub Copilot Handle Real Actuarial Tasks?

October 2023

Kyle Nobbe

In Brief

Class is in session! We quizzed ChatGPT and GitHub Copilot on typical actuarial tasks to test their current strengths and limitations. The results reveal how actuaries can gain efficiencies by implementing AI tools in their technical day-to-day work.

There is no doubt that AI has arrived in the insurance industry. Yet some actuaries still carry doubts about how accurate or effective AI tools are for their day-to-day work. In my estimation, AI is like the calculators and laptops that we use each day; it is simply a tool to help us get the job done. How well or how quickly can AI help us? The best way to find out is to simply put it to the test.

In this article, we will do just that: Run some typical actuarial tasks through the most popular large language model (LLM) AI, ChatGPT, and scrutinize its performance. For comparison’s sake, we will pose the same tasks to GitHub Copilot, Microsoft’s AI pair programmer, which is embedded within Visual Studio Code.

Our exam revealed the ups and downs of AI at this point in its development. It also emphasized the critical role that insurance professionals will play in the implementation of these tools. AI is not a job killer; it is a job evolver, and the actuaries who learn, test, and adapt it will improve the efficiency and the quality of their technical deliverables.

Here is a summary of the results. Read on for more details about each task and how the AI bots performed.

In this video presentation first published on Actuview, Jeff Heaton and Kyle Nobbe provide an introduction to AI and discuss how companies might train AI tools to derive greater insights from their own proprietary codebases and datasets.

Task 1: Code conversion

First up, I took a recent real-life project to see if AI tools could have provided time savings. My team had decommissioned a legacy analytics platform. The platform previously used Scala data cleaning scripts for experience study analytics. Now I needed to translate more than 1,000 lines of Scala code into SQL for the new analytics platform.

I turned that first task over to ChatGPT and asked it to provide the SQL equivalent of the Scala code. ChatGPT’s response was impressive. It provided the SQL code and even included some helpful explanations about minor adjustments the user might need to make. ChatGPT did drop an important “create a table” command. Otherwise, the output provided exactly what I needed.

To take advantage of ChatGPT’s generative strengths, I dropped in the next chunk of Scala code in my process and asked ChatGPT to convert this snippet by building off the prior step. Again, the chatbot performed well. Surprisingly, it referred back to the code snippet from my first prompt and built a nested subquery to consolidate the two pieces of code into one. Sadly, the “create table” command was again left out.

I wrapped up the exercise by asking ChatGPT to rebuild the last query as a common table expression (CTE). The bot demonstrated an understanding of a CTE, but its CTE script was poorly written and would make the code less efficient. And once again, it dropped the “create table” command.
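To make the nested-subquery and CTE forms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, columns, and filters are hypothetical stand-ins, not the actual experience-study scripts, and SQLite is only an assumption standing in for the new analytics platform. Both versions keep the "create table" step that the chatbot kept dropping.

```python
import sqlite3

# Hypothetical in-memory data; not the real experience-study tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (policy_id INTEGER, issue_age INTEGER, face_amount REAL);
    INSERT INTO policies VALUES
        (1, 35, 100000), (2, 52, 250000), (3, 67, 50000), (4, 41, NULL);
""")

# Version 1: two cleaning steps consolidated as a nested subquery.
conn.execute("""
    CREATE TABLE cleaned_nested AS
    SELECT policy_id, issue_age, face_amount
    FROM (
        SELECT policy_id, issue_age, face_amount
        FROM policies
        WHERE face_amount IS NOT NULL
    ) AS non_null
    WHERE issue_age >= 40
""")

# Version 2: the same logic rewritten as a common table expression (CTE).
conn.execute("""
    CREATE TABLE cleaned_cte AS
    WITH non_null AS (
        SELECT policy_id, issue_age, face_amount
        FROM policies
        WHERE face_amount IS NOT NULL
    )
    SELECT policy_id, issue_age, face_amount
    FROM non_null
    WHERE issue_age >= 40
""")

nested_rows = conn.execute(
    "SELECT * FROM cleaned_nested ORDER BY policy_id").fetchall()
cte_rows = conn.execute(
    "SELECT * FROM cleaned_cte ORDER BY policy_id").fetchall()
```

Comparing `nested_rows` with `cte_rows` is a quick way to confirm that a rewrite into CTE form has not changed the result, whatever its effect on readability or run time.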

Our other test subject, GitHub Copilot, is built for exactly this type of code-intensive problem. It handled both the code translation and the CTE perfectly. Additionally, the convenience of the code translation occurring within the coding platform made this tool ideal for improving the efficiency of my project.

Final grades for Task 1: ChatGPT: B+ | GitHub Copilot: A

Task 2: Computing life expectancy in R on a dataset

Next, I asked the AI tools to tackle a basic actuarial calculation: converting mortality rates into predicted life expectancies. This is a function our team uses to evaluate the mortality curves generated by our models and test them for reasonableness.

Here was my prompt for ChatGPT: “Create a data frame in R with given parameters. Then create a function that calculates the life expectancy from that data frame.” ChatGPT generated code that produced a data frame, but one of its formulas contained too many values. The code could not run. Furthermore, its calculation of life expectancy was erroneous.

I then provided ChatGPT with some supplemental information that clearly spelled out my definition of “life expectancy”: “Given the estimated life expectancy is the sum of the cumulative probability of survival at each age plus a half a year, update the function accordingly.” That extra bit of information helped. ChatGPT correctly calculated the life expectancy. In doing so, it included some unnecessary code, which I flagged and removed. The bot’s response also used an interesting function in R called “rev,” which was new to me.
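The definition in that prompt can be sketched in a few lines. The article's work was done in R; the Python version below is only an illustration, and the function name and sample rates are hypothetical. It assumes the input is a list of annual mortality rates ordered by age.

```python
from itertools import accumulate
from operator import mul


def life_expectancy(mortality_rates):
    """Sum of cumulative survival probabilities plus half a year.

    mortality_rates: annual mortality rates (q values), ordered by age.
    """
    # Cumulative survival to the end of each year: running product of (1 - q).
    cumulative_survival = accumulate((1.0 - q for q in mortality_rates), mul)
    return sum(cumulative_survival) + 0.5


# Hypothetical four-year mortality curve ending in certain death.
example_rates = [0.1, 0.2, 0.5, 1.0]
example_le = life_expectancy(example_rates)
```

For these sample rates the cumulative survival probabilities are 0.9, 0.72, 0.36 and 0, so the function returns 1.98 + 0.5 = 2.48 years.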

Finally, I asked ChatGPT to update the function to calculate life expectancies for each unique ID within the data frame. After some minor edits, the bot’s code output worked well. Again, its answer gave me the opportunity to learn a new function in R, called “by,” which substantially improved the runtime of my calculations.
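The per-ID step can be sketched the same way. This is a rough Python analogue of splitting a data frame by ID, not the R code the bot produced; the record layout (ID, duration, mortality rate) and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (id, duration in years, annual mortality rate).
records = [
    ("A", 0, 0.1), ("A", 1, 0.2), ("A", 2, 1.0),
    ("B", 0, 0.5), ("B", 1, 1.0),
]


def life_expectancy_by_id(rows):
    """Return {id: life expectancy}, one mortality curve per unique ID."""
    rates_by_id = defaultdict(list)
    # Sort so each ID's rates are accumulated in duration order.
    for record_id, _duration, q in sorted(rows, key=lambda r: (r[0], r[1])):
        rates_by_id[record_id].append(q)

    result = {}
    for record_id, rates in rates_by_id.items():
        cumulative, total = 1.0, 0.0
        for q in rates:
            cumulative *= 1.0 - q   # cumulative survival to end of this year
            total += cumulative
        result[record_id] = total + 0.5  # half-year adjustment
    return result


le_by_id = life_expectancy_by_id(records)
```

With these sample records, ID "A" comes out at 2.12 years and ID "B" at 1.0 year.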

GitHub Copilot also stumbled on its first attempt at this project. Its output code was incorrect and, surprisingly, incorrect in a different way than ChatGPT’s first attempt. After looking more closely at Copilot’s output, we figured out that the bot was assuming mortality rates are constant over time and across all ages. If that assumption were true, then Copilot’s output would have been correct.
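Why a constant-rate assumption matters can be shown with a small sketch. The actual Copilot output is not reproduced in this article, so the shortcut below is a hypothetical example of a formula that is only valid under constant mortality: with a flat rate q, the cumulative survival probabilities form a geometric series that sums to (1 - q) / q. The sample curves are also assumptions.

```python
def life_expectancy(mortality_rates):
    """Full calculation: sum of cumulative survival plus half a year."""
    cumulative, total = 1.0, 0.0
    for q in mortality_rates:
        cumulative *= 1.0 - q
        total += cumulative
    return total + 0.5


def constant_rate_shortcut(q):
    """Hypothetical shortcut, valid only if q is the same at every age."""
    return (1.0 - q) / q + 0.5


# Flat mortality: the shortcut agrees with the full calculation.
flat_full = life_expectancy([0.05] * 2000)
flat_short = constant_rate_shortcut(0.05)

# Rising mortality (hypothetical curve, capped at 1): the shortcut breaks.
rising = [min(1.0, 0.01 * 1.1 ** age) for age in range(120)]
rising_full = life_expectancy(rising)
rising_short = constant_rate_shortcut(sum(rising) / len(rising))
```

The two flat-rate figures agree, while the rising-curve figures differ by many years. That is the pattern to look for when an AI-generated formula seems to work on a simple test case but fails on a realistic mortality curve.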

I updated Copilot’s code to accurately calculate life expectancy. But the next output calculated the probability of survival for a given year, instead of cumulative survival. Once again, I had to jump in to correct the code. In the time it took for me to correct Copilot’s approach, I could have coded this project myself.

Final grades for Task 2: ChatGPT: B | GitHub Copilot: C

Task 3: Explaining the purpose of a complex Excel macro

Our final quiz was born from another common event in the life of an actuary: You have inherited a complex spreadsheet packed with code and want to quickly understand the purpose of a particular formula or macro.

In this example, the overall goal of the spreadsheet was to add results from new medical studies, estimate elevated levels of mortality and transform those results into possible “flat extras” – premium increases based on new mortality rates related to certain medical conditions.

I dropped the spreadsheet’s primary macro (made from 900 lines of code) into ChatGPT and asked the bot to explain to me in 500 words or less what the code was doing. Here is an excerpt from the bot’s response, which shows how clearly and accurately it responded to the prompt:

“This VBA code is a macro for Microsoft Excel that performs a Goal Seek analysis on a spreadsheet that includes multiple studies. Goal Seek is an Excel tool that enables users to find a solution to a problem by adjusting one or more input values.”

ChatGPT then continued to explain the steps in the formula, even picking up some subtleties of the calculation, such as “the extra mortality is assumed to be constant for all ages.” ChatGPT also recognized that the code contained redundancies for the “Goal Seek” for different studies, which I could eliminate to make the macro run more efficiently.

I followed up with a question to ChatGPT: “Any suggestions for improving the macro’s runtime?” It responded with some basic tricks that were already implemented elsewhere in the code, but they were still useful for new actuaries to understand. The bot also suggested reducing some of the precision on the goal seek. It even weighed the pros and cons of such a change, noting that reducing the precision would speed up the macro but possibly result in inaccuracies. I was astounded that ChatGPT could provide a narrative that described the macro perfectly. With little to no edits, I would feel comfortable sharing its explanation with a colleague.
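The precision trade-off can be illustrated with a small sketch. This is not the macro's VBA and not Excel's actual Goal Seek algorithm; it uses simple bisection as a stand-in, and the mortality curve, target, and function names are all hypothetical. It searches for a constant extra mortality, applied at all ages, that brings life expectancy down to a target value.

```python
def life_expectancy(mortality_rates):
    """Sum of cumulative survival probabilities plus half a year."""
    cumulative, total = 1.0, 0.0
    for q in mortality_rates:
        cumulative *= 1.0 - q
        total += cumulative
    return total + 0.5


def goal_seek_flat_extra(base_rates, target_le, tolerance, lo=0.0, hi=1.0):
    """Bisection stand-in for a goal seek on constant extra mortality.

    Returns (extra, iterations). Stops when the bracket is narrower
    than `tolerance`, so a looser tolerance means fewer iterations.
    """
    iterations = 0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        le = life_expectancy([min(1.0, q + mid) for q in base_rates])
        if le > target_le:
            lo = mid  # life expectancy still too high: need more extra
        else:
            hi = mid  # life expectancy at or below target: need less
        iterations += 1
    return (lo + hi) / 2.0, iterations


# Hypothetical base curve, and a target built from a known extra of 0.02.
base = [min(1.0, 0.01 * 1.1 ** age) for age in range(50)]
target = life_expectancy([min(1.0, q + 0.02) for q in base])

coarse_extra, coarse_iters = goal_seek_flat_extra(base, target, 1e-2)
fine_extra, fine_iters = goal_seek_flat_extra(base, target, 1e-8)
```

The coarse run stops after far fewer iterations than the fine run, but its answer is only guaranteed to be within the looser tolerance of the true value of 0.02. That is the speed-versus-accuracy trade-off ChatGPT described.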

Finally, with the idea that I might need to share this macro with international colleagues, I asked ChatGPT to translate the entire chat to Japanese. I immediately received my Japanese translation.

GitHub Copilot unfortunately received a failing grade for having no ability to complete this task. Since GitHub Copilot’s purpose is to code and nothing else, this may not seem like a fair fight. But the reality is, ChatGPT has many of the same coding capabilities plus the ability to provide narratives around and about the data. Sorry, GitHub, it’s every bot for itself right now.

Final grades for Task 3: ChatGPT: A | GitHub Copilot: F

Conclusion

The most important takeaway from our tests is the reassuring lack of perfection. Neither ChatGPT nor GitHub Copilot produced perfect results. Both tools needed human intervention to correct, refine or adjust their results. But overall, the bots “understood” my queries and made surprisingly robust attempts to answer them.

ChatGPT scored higher overall and impressed me with its ability to build on prior queries and “think” about what I wanted. GitHub Copilot’s code suggestions seemed to work line by line instead of holistically.

The only way you can determine if LLMs like ChatGPT and GitHub Copilot will enhance your day-to-day tasks is to experiment with them yourself. I believe with time, AI tools will provide substantial improvements in coding efficiency and technical deliverables for insurance professionals. Class is dismissed.

Meet the Authors & Experts

Author

Kyle Nobbe

Vice President and Actuary, Advanced Analytics, Risk and Behavioral Science
