This article collects typical usage examples of the Python method rl.policy.BoltzmannQPolicy. If you are wondering what policy.BoltzmannQPolicy does or how to use it, the example curated below may help. You can also explore the containing module, rl.policy, for related usage.
The following presents 1 code example of policy.BoltzmannQPolicy.
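For orientation before the example: a Boltzmann (softmax) Q-policy samples an action with probability proportional to exp(Q/tau), so higher-valued actions are favored but every action retains some probability. The sketch below is a minimal, library-independent illustration of that rule; the parameter names tau and clip are assumptions modeled loosely on keras-rl's defaults, not its actual implementation.

import numpy as np

def boltzmann_select(q_values, tau=1.0, clip=(-500.0, 500.0)):
    # Higher tau -> closer to uniform (more exploration); lower tau -> greedier.
    q = np.clip(np.asarray(q_values, dtype=np.float64) / tau, clip[0], clip[1])
    exp_q = np.exp(q - q.max())      # subtracting the max keeps exp() numerically stable
    probs = exp_q / exp_q.sum()
    return np.random.choice(len(probs), p=probs)

# e.g. boltzmann_select([1.0, 2.0, 0.5]) usually picks action 1 but still explores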
Example 1: train_dqn_model
# Required import: from rl import policy [as alias]
# or: from rl.policy import BoltzmannQPolicy [as alias]
import gym
import gym_malware  # importing the gym-malware package registers the malware-v0 environment family (project-specific)
from keras.optimizers import RMSprop
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy


def train_dqn_model(layers, rounds=10000, run_test=False, use_score=False):
    ENV_NAME = 'malware-score-v0' if use_score else 'malware-v0'
    env = gym.make(ENV_NAME)
    env.seed(123)
    nb_actions = env.action_space.n
    window_length = 1  # "experience" consists of where we were, where we are now
    # generate a policy model (generate_dense_model is a helper that builds a
    # dense Keras network; it is defined alongside this function in the source project)
    model = generate_dense_model((window_length,) + env.observation_space.shape, layers, nb_actions)
    # configure and compile our agent
    # BoltzmannQPolicy selects an action stochastically, with probabilities
    # obtained by soft-maxing the Q-values
    policy = BoltzmannQPolicy()
    # memory can help a model during training
    # here we only consider a single malware sample (window_length=1) per "experience"
    memory = SequentialMemory(limit=32, ignore_episode_boundaries=False, window_length=window_length)
    # DQN agent as introduced by Mnih et al. (2013); double DQN follows van Hasselt et al. (2015)
    # http://arxiv.org/pdf/1312.5602.pdf
    # http://arxiv.org/abs/1509.06461
    agent = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=16,
                     enable_double_dqn=True, enable_dueling_network=True, dueling_type='avg',
                     target_model_update=1e-2, policy=policy, batch_size=16)
    # keras-rl lets us use any built-in Keras optimizer
    agent.compile(RMSprop(lr=1e-3), metrics=['mae'])
    # play the game. learn something!
    agent.fit(env, nb_steps=rounds, visualize=False, verbose=2)
    history_train = env.history
    history_test = None
    if run_test:
        # set up the testing environment
        TEST_NAME = 'malware-score-test-v0' if use_score else 'malware-test-v0'
        test_env = gym.make(TEST_NAME)
        # evaluate the agent on a few episodes, drawing randomly from the test samples
        agent.test(test_env, nb_episodes=100, visualize=False)
        history_test = test_env.history
    return agent, model, history_train, history_test
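A hypothetical invocation, assuming the gym-malware environments are installed; the layer widths below are illustrative and not taken from the source:

agent, model, history_train, history_test = train_dqn_model(
    layers=[64, 32],   # hidden layer sizes passed to generate_dense_model (illustrative)
    rounds=10000,      # number of training steps for agent.fit
    run_test=True)     # also evaluates on 'malware-test-v0' (use_score defaults to False)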