This project assesses and benchmarks advanced agentic audio models against leading systems. Its mission is to evaluate and optimize model performance for real-world customer support use cases.
Responsibilities
Create and execute role‑play–based evaluation scenarios that simulate realistic customer service interactions across multiple domains, including flight bookings and travel support, financial services, and telecommunications and technical support.
Contribute to the development of diverse and representative datasets used to assess conversational audio agents.
Evaluate model performance across a standardized set of qualitative and quantitative metrics.
Ensure evaluations reflect real customer expectations for clarity, efficiency, and natural conversational flow.