This page collects typical usage examples of the Python method agent.Agent.update_q_function. If you have been wondering what Agent.update_q_function does, how to use it, or where to find working examples of it, the hand-picked code example below may help. You can also read more about the class that defines it, agent.Agent.

One code example of Agent.update_q_function is shown below. Examples are sorted by popularity by default; upvoting the ones you like or find useful helps the system recommend better Python code examples.
Example 1: Environment
# Required import: from agent import Agent [as alias]
# Alternatively: from agent.Agent import update_q_function [as alias]
import gym
from gym import wrappers
import torch

from agent import Agent

# These constants are defined elsewhere in the original script; the values
# below are assumptions made only so that this excerpt is self-contained.
ENV = 'MountainCar-v0'
NUM_EPISODE = 1000
MAX_STEPS = 200


class Environment:

    def __init__(self):
        env = gym.make(ENV)
        # Record the run (videos and statistics) under /tmp/gym/mountaincar_dqn
        self.env = wrappers.Monitor(env, '/tmp/gym/mountaincar_dqn', force=True)
        self.num_states = self.env.observation_space.shape[0]
        self.num_actions = self.env.action_space.n
        self.agent = Agent(self.num_states, self.num_actions)

    def run(self):
        complete_episodes = 0
        episode_final = False
        output = open('result.log', 'w')
        print(self.num_states, self.num_actions)

        for episode in range(NUM_EPISODE):
            observation = self.env.reset()
            state = torch.from_numpy(observation).type(torch.FloatTensor)
            state = torch.unsqueeze(state, 0)

            for step in range(MAX_STEPS):
                if episode_final:
                    self.env.render(mode='rgb_array')

                action = self.agent.get_action(state, episode)
                observation_next, _, done, _ = self.env.step(action.item())
                state_next = torch.from_numpy(observation_next).type(torch.FloatTensor)
                state_next = torch.unsqueeze(state_next, 0)

                reward = torch.FloatTensor([0.0])
                if done:
                    state_next = None
                    if 199 <= step:
                        # Hit the step limit without reaching the goal: failure
                        reward = torch.FloatTensor([-1.0])
                        complete_episodes = 0
                    else:
                        # Reached the goal before the step limit: success
                        reward = torch.FloatTensor([1.0])
                        complete_episodes = complete_episodes + 1

                # Store the transition, then let the agent update its Q-network
                self.agent.memory(state, action, state_next, reward)
                self.agent.update_q_function()

                state = state_next

                if done:
                    message = 'episode: {0}, step: {1}'.format(episode, step)
                    print(message)
                    output.write(message + '\n')
                    break

            if episode_final:
                break

            if 10 <= complete_episodes:
                print('success 10 times in sequence')
                # episode_final = True

        self.env.close()
        output.close()
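
The Agent class referenced above is not reproduced on this page. For orientation only, the following is a hypothetical, self-contained sketch of an Agent that exposes the same interface (get_action, memory, update_q_function). The network layout, the hyperparameters BATCH_SIZE, GAMMA, and CAPACITY, and every internal detail are assumptions for illustration, not taken from the original source. In this sketch, update_q_function performs one standard DQN update: sample a minibatch from replay memory, compute the temporal-difference target, and take one optimizer step.

import random
from collections import deque, namedtuple

import torch
from torch import nn, optim

Transition = namedtuple('Transition', ('state', 'action', 'state_next', 'reward'))

BATCH_SIZE = 32     # minibatch size (assumed)
GAMMA = 0.99        # discount factor (assumed)
CAPACITY = 10000    # replay-memory capacity (assumed)


class Agent:
    """Hypothetical DQN agent matching the interface used by Environment."""

    def __init__(self, num_states, num_actions):
        self.num_actions = num_actions
        self.transitions = deque(maxlen=CAPACITY)   # experience replay memory
        self.model = nn.Sequential(                 # small Q-network
            nn.Linear(num_states, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, num_actions),
        )
        self.optimizer = optim.Adam(self.model.parameters(), lr=1e-3)

    def memory(self, state, action, state_next, reward):
        # Store one transition; state_next is None for terminal states
        self.transitions.append(Transition(state, action, state_next, reward))

    def get_action(self, state, episode):
        # Epsilon-greedy selection; epsilon decays as training progresses
        epsilon = 0.5 / (episode + 1)
        if random.random() < epsilon:
            return torch.LongTensor([[random.randrange(self.num_actions)]])
        self.model.eval()
        with torch.no_grad():
            return self.model(state).max(1)[1].view(1, 1)

    def update_q_function(self):
        # Wait until enough transitions are stored for one minibatch
        if len(self.transitions) < BATCH_SIZE:
            return
        samples = random.sample(list(self.transitions), BATCH_SIZE)
        batch = Transition(*zip(*samples))

        state_batch = torch.cat(batch.state)
        action_batch = torch.cat(batch.action)
        reward_batch = torch.cat(batch.reward)

        # Q(s, a) for the actions that were actually taken
        self.model.eval()
        state_action_values = self.model(state_batch).gather(1, action_batch)

        # TD target: r + gamma * max_a' Q(s', a'), with 0 for terminal states
        non_final_mask = torch.tensor([s is not None for s in batch.state_next])
        next_state_values = torch.zeros(BATCH_SIZE)
        if non_final_mask.any():
            non_final_next = torch.cat([s for s in batch.state_next if s is not None])
            with torch.no_grad():
                next_state_values[non_final_mask] = self.model(non_final_next).max(1)[0]
        expected_values = reward_batch + GAMMA * next_state_values

        # One gradient step on the Huber loss between prediction and target
        self.model.train()
        loss = nn.functional.smooth_l1_loss(state_action_values,
                                            expected_values.unsqueeze(1))
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()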