Commitment: 30-40 hours per week, with 4 hours of overlap with PST
Role Responsibilities
Design and author multi-agent benchmark tasks centered on complex data analysis workflows
Create realistic synthetic datasets or curate real-world-style datasets across domains such as finance, operations, security, or market analysis
Build tasks that require agents to perform cross-referencing, anomaly detection, contradiction identification, and statistical computation across multiple sources
Develop decomposition guides that split analytical work across specialist sub-agents such as financial, technical, security, or operations analysts
Write precise oracle logic or verification scripts that validate specific analytical conclusions rather than ...