#WhatWorks

Experimenting with GPT and Generative AI for Evaluation

Unfulfilled Promises: Using GPT for Synthetic Tasks

In our previous blogs, we shared our findings and recommendations for six experiments that ended up working quite well. For the next three experiments, we wanted to test GPT's performance on more advanced tasks that didn't revolve around simply producing code or summarizing documents but rather required the synthesis of information. Here, the perils we had anticipated materialized more…

Fulfilled Promises: Using GPT for Analytical Tasks

In our previous blog, we set the stage for the nine experiments we conducted to test out the promises and the perils of GPT and generative artificial intelligence (AI) for evaluation practice. In this entry, we’ll share the six successful experiments. Our final blog will tackle the three that were disappointing.

Setting up Experiments to Test GPT for Evaluation

[Image: A cyberpunk version of The Creation of Adam, with one robot hand and one human hand]

Since OpenAI’s ChatGPT made its frenzied entrance into the world at the end of 2022, a lot of hype has developed around generative artificial intelligence (AI) and large language models (LLMs), including in the evaluation community. Opinions have oscillated between awe and catastrophism. At IEG, the Methods Team has been key to introducing and scaling up the use of…