This article collects typical usage examples of the Python method memory.Memory.update. If you are wondering how to use Python's Memory.update method, how it works, or what it looks like in real code, the curated examples here may help. You can also look further into usage examples of the class memory.Memory, to which this method belongs.
Below is 1 code example of the Memory.update method, sorted by popularity by default.
Example 1: zip
# Required import: from memory import Memory [as alias]
# Or: from memory.Memory import update [as alias]
import numpy as np

# Scale the policy output into velocity/altitude commands and step the environment.
action = np.clip(action, -1, 1) * np.array([max_xvel, max_yvel, max_yawrate, max_altitude / 4.0]) - np.array([0, 0, 0, max_altitude])
env_next_state, env_reward, env_done, env_info = env.step(action)

# Store the transition in the prioritized replay buffer with a high initial priority.
replay_buffer.add(env_state, env_reward, action, env_done, priority=300)
env_state = env_next_state
total_reward += env_reward

if training:
    # Sample a prioritized mini-batch and assemble the feed dict for the TensorFlow graph.
    states_batch, action_batch, reward_batch, next_states_batch, done_batch, indexes = replay_buffer.sample(BATCH_SIZE, prioritized=True)
    feed = {
        action_placeholder: action_batch,
        reward_placeholder: reward_batch,
        done_placeholder: done_batch
    }
    feed.update({k: v for k, v in zip(state_placeholders, states_batch)})
    feed.update({k: v for k, v in zip(next_state_placeholders, next_states_batch)})

    # Train critic and actor for one step, then refresh the priorities of the
    # sampled transitions with their new TD errors via Memory.update.
    _, _, errors, critic_error = sess.run([train_critic, train_actor, q_error, q_error_batch], feed_dict=feed)
    replay_buffer.update(indexes, errors)

    print 'q:{:5f} reward:{:5f} trainerror:{:5f}'.format(q[0], env_reward, critic_error)

if env_done:
    break

print 'Total Reward', total_reward
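The example calls three methods on the buffer, add, sample, and update, but the memory.Memory class itself is not shown on this page. The sketch below is a hypothetical, minimal prioritized replay buffer with those method names, written only to illustrate what update(indexes, errors) typically does in such a loop: overwriting the stored priorities of the transitions that were just sampled with their new TD errors. It is a simplification and does not reproduce the exact interface above; in particular, its sample returns raw transitions plus indexes rather than separate state/action/reward/next-state/done batches.

import numpy as np

class Memory(object):
    """Minimal, illustrative prioritized replay buffer (not the original memory.Memory)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition

    def add(self, state, reward, action, done, priority=1.0):
        # Drop the oldest transition when the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append((state, reward, action, done))
        self.priorities.append(float(priority))

    def sample(self, batch_size, prioritized=True):
        # Sample indexes proportionally to priority (uniformly if prioritized=False).
        p = np.asarray(self.priorities, dtype=np.float64)
        p = p / p.sum() if prioritized else None
        indexes = np.random.choice(len(self.data), size=batch_size, p=p)
        batch = [self.data[i] for i in indexes]
        return batch, indexes

    def update(self, indexes, errors):
        # Overwrite the priorities of the sampled transitions with their new TD errors,
        # keeping them strictly positive so every transition remains sampleable.
        for i, err in zip(indexes, errors):
            self.priorities[i] = abs(float(err)) + 1e-6

Under this scheme, transitions whose TD error stays large keep a high priority and are revisited more often, while well-fit transitions fade out of the sampling distribution. That is presumably also why the example above passes priority=300 to add: a large initial priority ensures a fresh transition is sampled soon after being stored, after which update replaces the placeholder priority with a measured error.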