OVERVIEW

Our client was building a general-purpose AI chatbot and wanted to identify areas where the model was underperforming. Based on the test and evaluation findings, they would then run a round of supervised fine-tuning in the areas where the model was weak.

Approach

We decided to work with a large group of generalists who could test the model across many different domains. While constructing prompts, they researched each topic thoroughly, drawing on references from journals and research articles. Our instruction set required them to create a diverse set of prompts so that coverage would be comprehensive. The instructions to the generalists were twofold:

1. Construct complex prompts

2. Rate the model response on a Likert scale
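
As an illustration of how ratings like these could be turned into a list of weak areas for fine-tuning, here is a minimal Python sketch. It is not the process used on this project. It assumes a 1-5 Likert scale, a hypothetical cutoff of 3.0, and made-up domain names and scores. The function name weak_domains is also hypothetical.

```python
from collections import defaultdict
from statistics import mean


def weak_domains(ratings, threshold=3.0):
    """Return (domain, mean rating) pairs below the threshold, weakest first.

    ratings: iterable of (domain, score) pairs, score on an assumed 1-5 scale.
    threshold: assumed cutoff; domains averaging below it count as weak.
    """
    by_domain = defaultdict(list)
    for domain, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        by_domain[domain].append(score)

    # Average the ratings collected for each domain.
    means = {domain: mean(scores) for domain, scores in by_domain.items()}

    # Keep only domains under the cutoff, sorted from lowest mean upward.
    weak = [(domain, m) for domain, m in means.items() if m < threshold]
    return sorted(weak, key=lambda pair: pair[1])


# Made-up sample data, for illustration only.
sample = [
    ("law", 2), ("law", 3),
    ("medicine", 4), ("medicine", 5),
    ("finance", 1), ("finance", 2),
]

print(weak_domains(sample))
```

With this sample, finance and law fall below the cutoff and medicine does not. In practice the cutoff and the level of grouping (domain, sub-topic, prompt type) would depend on the rating rubric agreed with the client.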

Through a rigorous screening process, we identified 100+ high-quality generalists suited to this work. They included excellent English speakers, graduates of top schools, and people with high IQ and strong general awareness.

Reach out to us at hey@soulhq.ai for more information, work samples, etc.

Model test & Evaluation
