Evaluation Helper
stable_baselines3.common.evaluation.evaluate_policy(model, env, n_eval_episodes=10, deterministic=True, render=False, callback=None, reward_threshold=None, return_episode_rewards=False, warn=True)

Runs the policy for n_eval_episodes episodes and returns the average reward. This function only works with a single environment.

Note
If the environment has not been wrapped with a Monitor wrapper, rewards and episode lengths are counted as they appear in env.step calls. If the environment contains wrappers that modify rewards or episode lengths (e.g. reward scaling, early episode reset), these will affect the evaluation results as well. You can avoid this by wrapping the environment with a Monitor wrapper before anything else.

- Parameters
model (BaseAlgorithm) – The RL agent you want to evaluate.
env (Union[Env, VecEnv]) – The gym environment. In the case of a VecEnv, this must contain only one environment.
n_eval_episodes (int) – Number of episodes over which to evaluate the agent.
deterministic (bool) – Whether to use deterministic or stochastic actions.
render (bool) – Whether to render the environment or not.
callback (Optional[Callable[[Dict[str, Any], Dict[str, Any]], None]]) – Callback function to do additional checks, called after each step. Gets locals() and globals() passed as parameters.
reward_threshold (Optional[float]) – Minimum expected reward per episode. An error is raised if the performance does not meet this threshold.
return_episode_rewards (bool) – If True, a list of rewards and a list of episode lengths (one entry per episode) will be returned instead of the mean.
warn (bool) – If True (default), warns the user about the lack of a Monitor wrapper in the evaluation environment.
- Return type
Union[Tuple[float, float], Tuple[List[float], List[int]]]
- Returns
Mean reward per episode and standard deviation of reward per episode. When return_episode_rewards is True, returns ([float], [int]) instead: the first list contains per-episode rewards and the second contains per-episode lengths (in number of steps).
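
The parameters and return values described above can be illustrated with a small, dependency-free sketch. This is not the library's implementation: ToyEnv, ToyModel and evaluate_sketch are hypothetical names invented for this example, the render and warn options and any VecEnv handling are left out, and comparing the mean reward against reward_threshold is an assumption about how the check is applied. The sketch only shows how per-episode rewards and lengths are accumulated from step calls, how the callback receives locals() and globals() after each step, and how the two return shapes differ.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical single environment: fixed-length episodes, reward 1.0 when action == 1."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return 0, reward, done, {}


class ToyModel:
    """Hypothetical agent with a predict(obs, deterministic) method."""

    def predict(self, obs, deterministic=True):
        action = 1 if deterministic else random.choice([0, 1])
        return action, None


def evaluate_sketch(model, env, n_eval_episodes=10, deterministic=True,
                    callback=None, reward_threshold=None,
                    return_episode_rewards=False):
    """Illustrative evaluation loop; an assumption-laden sketch, not evaluate_policy itself."""
    episode_rewards, episode_lengths = [], []
    for _ in range(n_eval_episodes):
        obs, done = env.reset(), False
        episode_reward, episode_length = 0.0, 0
        while not done:
            action, _state = model.predict(obs, deterministic=deterministic)
            obs, reward, done, _info = env.step(action)
            # Rewards and lengths are counted exactly as the step calls report them.
            episode_reward += reward
            episode_length += 1
            if callback is not None:
                callback(locals(), globals())  # called after each step
        episode_rewards.append(episode_reward)
        episode_lengths.append(episode_length)

    mean_reward = statistics.mean(episode_rewards)
    std_reward = statistics.pstdev(episode_rewards)  # population standard deviation
    # Assumption: the threshold is checked against the mean reward per episode.
    if reward_threshold is not None and mean_reward < reward_threshold:
        raise AssertionError(
            f"Mean reward {mean_reward:.2f} is below threshold {reward_threshold:.2f}"
        )
    if return_episode_rewards:
        return episode_rewards, episode_lengths
    return mean_reward, std_reward


# Default mode: a (mean, std) pair.
mean_reward, std_reward = evaluate_sketch(ToyModel(), ToyEnv(), n_eval_episodes=3)

# Per-episode mode: a list of rewards and a list of lengths.
rewards, lengths = evaluate_sketch(
    ToyModel(), ToyEnv(), n_eval_episodes=3, return_episode_rewards=True
)
```

Because the loop only sees what step returns, any wrapper that rescales rewards or cuts episodes short would change the accumulated values, which is the situation the note about the Monitor wrapper addresses.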