Evaluator
The Evaluator class is the main engine for running agent evaluations and benchmarks.
Import
from openstackai.evaluation import Evaluator
Constructor
Evaluator(
    model: str | None = None,      # Model for LLM-based evaluation
    criteria: list | None = None,  # Default criteria
    verbose: bool = False,         # Enable verbose logging
    parallel: bool = True,         # Run tests in parallel
    max_workers: int = 4,          # Number of parallel workers
)
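How the library schedules work internally is not documented on this page. As a rough illustration of what parallel and max_workers usually control, the sketch below runs a callable over several inputs with a standard-library thread pool. The names run_cases and echo_agent are hypothetical and are not part of openstackai.

```python
from concurrent.futures import ThreadPoolExecutor


def run_cases(agent, inputs, parallel=True, max_workers=4):
    """Call `agent` on every input, optionally across a thread pool."""
    if not parallel:
        return [agent(x) for x in inputs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, even if calls finish out of order
        return list(pool.map(agent, inputs))


def echo_agent(prompt):
    # Hypothetical stand-in for a real agent call
    return prompt.upper()


outputs = run_cases(echo_agent, ["a", "b", "c"], max_workers=2)
print(outputs)
```

With I/O-bound agents (for example, remote model calls), a higher max_workers tends to shorten wall-clock time; setting parallel=False is the simpler choice when debugging.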
Methods
evaluate()
Run every test case in a test set against an agent and return an EvalResult.
def evaluate(
    self,
    eval_set: EvalSet | list[TestCase],
    agent: Agent | Callable,
    criteria: list | None = None,
) -> EvalResult:
    ...
Example:
evaluator = Evaluator()
results = evaluator.evaluate(
    eval_set=my_tests,
    agent=my_agent,
    criteria=["accuracy", "latency"],
)
compare()
Compare multiple agents on the same test set.
def compare(
    self,
    eval_set: EvalSet,
    agents: list[Agent],
    criteria: list | None = None,
) -> ComparisonResult:
    ...
Example:
comparison = evaluator.compare(
    eval_set=benchmark,
    agents=[gpt4_agent, claude_agent, local_agent],
)
# View comparison table
print(comparison.to_table())
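To make the idea of a side-by-side comparison concrete, here is a minimal standalone sketch that scores several callables on the same cases and renders a text table. It does not use openstackai: compare_agents, the to_table helper shown here, and the exact-match scoring rule are all assumptions chosen for illustration, not the library's implementation.

```python
def compare_agents(cases, agents):
    """Return {agent_name: fraction of cases answered exactly right}."""
    scores = {}
    for name, agent in agents.items():
        correct = sum(1 for prompt, expected in cases if agent(prompt) == expected)
        scores[name] = correct / len(cases)
    return scores


def to_table(scores):
    """Render scores as aligned text rows, best agent first."""
    width = max(len(name) for name in scores)
    rows = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(f"{name:<{width}}  {score:.2f}" for name, score in rows)


def add_agent(prompt):
    # Hypothetical agent: sums the integers in a prompt such as "2+2"
    return str(sum(int(part) for part in prompt.split("+")))


cases = [("2+2", "4"), ("3+3", "6")]
agents = {"always_four": lambda prompt: "4", "adder": add_agent}

scores = compare_agents(cases, agents)
print(to_table(scores))
```

Running every agent on the identical case list is what makes the scores comparable; changing the cases between agents would invalidate the table.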
add_criteria()
Register a custom evaluation criterion by name.
evaluator.add_criteria("custom_metric", my_criteria_function)
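The signature expected of a criterion function is not specified on this page. The sketch below assumes it receives the test case and the agent output and returns a score between 0 and 1. That signature, the keyword_coverage function, and the keywords attribute on the test case are all hypothetical.

```python
from types import SimpleNamespace


def keyword_coverage(test_case, output):
    """Fraction of required keywords that appear in the agent output."""
    keywords = test_case.keywords  # assumed attribute, not confirmed by the docs
    if not keywords:
        return 1.0
    text = output.lower()
    hits = sum(1 for word in keywords if word.lower() in text)
    return hits / len(keywords)


# Exercise the function on its own with a stand-in test case
case = SimpleNamespace(keywords=["refund", "apology"])
score = keyword_coverage(case, "We are issuing a refund today.")
print(score)

# Registration would then follow the documented call:
# evaluator.add_criteria("keyword_coverage", keyword_coverage)
```

Keeping the criterion a pure function of its inputs makes it easy to unit test before registering it.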
EvalResult
The evaluation result object contains:
result.pass_rate # Percentage of passed tests
result.avg_score # Average score (0-1)
result.total_tests # Number of tests run
result.passed_tests # Number of tests passed
result.failed_tests # Number of tests failed
result.details # Per-test results
result.metrics # Aggregated metrics
result.duration # Total evaluation time
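The aggregate fields above can be read as simple reductions over the per-test results. The sketch below shows one plausible relationship, assuming each per-test record carries a score and a passed flag and that pass_rate is expressed on a 0-100 scale. CaseOutcome and summarize are hypothetical names, and the library may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class CaseOutcome:
    """Hypothetical per-test record, standing in for one entry of result.details."""
    name: str
    score: float
    passed: bool


def summarize(details):
    """Derive aggregate counts and averages from per-test outcomes."""
    total = len(details)
    passed = sum(1 for d in details if d.passed)
    return {
        "total_tests": total,
        "passed_tests": passed,
        "failed_tests": total - passed,
        # Guard against an empty test set to avoid dividing by zero
        "pass_rate": 100.0 * passed / total if total else 0.0,
        "avg_score": sum(d.score for d in details) / total if total else 0.0,
    }


details = [
    CaseOutcome("a", 1.0, True),
    CaseOutcome("b", 0.5, True),
    CaseOutcome("c", 0.0, False),
    CaseOutcome("d", 0.5, False),
]
summary = summarize(details)
print(summary)
```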
Configuration
YAML Configuration
# eval_config.yaml
evaluator:
  model: gpt-4
  parallel: true
  max_workers: 8
criteria:
  - name: accuracy
    threshold: 0.9
  - name: latency
    max_ms: 2000
evaluator = Evaluator.from_config("eval_config.yaml")
Callbacks
Register callbacks for evaluation events:
def on_test_complete(test_case, result):
    print(f"Test {test_case.name}: {result.score}")

evaluator.on_test_complete = on_test_complete
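A callback does not have to print. As a sketch, the variant below records each score by test name so that low-scoring tests can be listed afterwards. It relies only on the test_case.name and result.score attributes used above; the SimpleNamespace objects are stand-ins that simulate completion events without running the evaluator.

```python
from types import SimpleNamespace

scores = {}


def on_test_complete(test_case, result):
    # Record each score by test name instead of printing it
    scores[test_case.name] = result.score


# Simulate two completion events with stand-in objects
on_test_complete(SimpleNamespace(name="greeting"), SimpleNamespace(score=0.9))
on_test_complete(SimpleNamespace(name="refund"), SimpleNamespace(score=0.4))

# List the tests that scored below an arbitrary cutoff of 0.5
low = [name for name, score in scores.items() if score < 0.5]
print(low)
```

If the evaluator invokes callbacks from worker threads when parallel is enabled (this page does not say), shared state like the dictionary here may need a lock.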
See Also
- Evaluation-Module - Module overview
- TestCase - Test case definition
- EvalSet - Test set management