

Python Agent.stock_experience Method Code Example

This article collects typical usage of the agent.Agent.stock_experience method in Python. If you are wondering how Agent.stock_experience is called in practice or are looking for a working example, the selected code example here may help. You can also look further into usage examples of agent.Agent, the class that defines this method.


One code example of the Agent.stock_experience method is shown below.

Example 1: main

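# Required import: from agent import Agent [as alias]
# stock_experience is a method of Agent, so it is called on an instance:
# agent.stock_experience(...)
import gym
import numpy as np

from agent import Agent  # project-local module from ClassicControl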
def main(env_name, render=False, monitor=True, load=False, seed=0):

    env = gym.make(env_name)
    view_path = "./video/" + env_name
    model_path = "./model/" + env_name + "_"

    n_st = env.observation_space.shape[0]
    if isinstance(env.action_space, gym.spaces.Discrete):
        # CartPole-v0, Acrobot-v0, MountainCar-v0
        n_act = env.action_space.n
        action_list = list(range(n_act))
    elif isinstance(env.action_space, gym.spaces.Box):
        # Pendulum-v0: the continuous action is reduced to two discrete choices
        action_list = [np.array([a]) for a in [-2.0, 2.0]]
        n_act = len(action_list)
    else:
        raise ValueError("unsupported action space: %r" % (env.action_space,))

    agent = Agent(n_st, n_act, seed)
    if load:
        agent.load_model(model_path)

    if monitor:
        # env.monitor is the recording API of early gym releases (later replaced by gym.wrappers.Monitor)
        env.monitor.start(view_path, video_callable=None, force=True, seed=seed)
    for i_episode in range(1000):
        observation = env.reset()
        r_sum = 0
        q_list = []
        for t in range(200):
            if render:
                env.render()
            # Reshape the observation into a float32 row vector of shape (1, n_st)
            state = observation.astype(np.float32).reshape((1, n_st))
            act_i, q = agent.get_action(state)
            q_list.append(q)
            action = action_list[act_i]
            observation, reward, ep_end, _ = env.step(action)
            state_dash = observation.astype(np.float32).reshape((1, n_st))
            # Store the transition (s, a, r, s', done), then run one training step
            agent.stock_experience(state, act_i, reward, state_dash, ep_end)
            agent.train()
            r_sum += reward
            if ep_end:
                break
        # Per-episode log: episode, total reward, epsilon, loss, mean Q, step count
        print("\t".join(map(str, [i_episode, r_sum, agent.epsilon, agent.loss, sum(q_list) / float(t + 1), agent.step])))
        agent.save_model(model_path)
    if monitor:
        env.monitor.close()
Developer: trtd56, Project: ClassicControl, Lines of code: 46, Source: main.py
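
The example above calls agent.stock_experience(state, act_i, reward, state_dash, ep_end) once per environment step, but the Agent class itself is not shown on this page. As a rough illustration of what such a method might do, here is a minimal sketch of an experience replay store. The class name ReplayBufferAgent, the capacity parameter, and the sample_batch helper are assumptions made for illustration and are not taken from the ClassicControl project; the real Agent may store and sample transitions differently.

```python
import random
from collections import deque


class ReplayBufferAgent:
    """Hypothetical stand-in for agent.Agent; only the experience store is sketched."""

    def __init__(self, capacity=1000, seed=0):
        # A bounded deque silently drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def stock_experience(self, st, act, r, st_dash, ep_end):
        """Store one transition (state, action index, reward, next state, done flag)."""
        self.memory.append((st, act, r, st_dash, ep_end))

    def sample_batch(self, batch_size):
        """Return up to batch_size stored transitions, chosen uniformly at random.

        The result is five parallel lists: states, actions, rewards,
        next states, and done flags.
        """
        k = min(batch_size, len(self.memory))
        batch = self.rng.sample(list(self.memory), k)
        if not batch:
            return [], [], [], [], []
        st, act, r, st_dash, ep_end = (list(col) for col in zip(*batch))
        return st, act, r, st_dash, ep_end
```

In a DQN-style training step, a minibatch drawn this way would typically be used to compute Q-learning targets, which is presumably what agent.train() does in the example; that part is left out here because it depends on the network library used.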


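Note: The agent.Agent.stock_experience example in this article was compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The snippet was selected from an open-source project, and copyright of the source code belongs to the original author. For distribution and use, refer to the License of the corresponding project. Do not reproduce without permission.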