This article collects typical usage examples of the Python method Model.Model.transition. If you are wondering what Model.transition does, how to call it, or what real usage looks like, the curated code examples below may help. You can also explore the enclosing class Model.Model for more context.
The following shows 1 code example of Model.transition, ordered by popularity by default. You can upvote examples you like or find useful; your feedback helps the system recommend better Python code examples.
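Before the full example, here is a minimal sketch of how `Model.transition` is typically structured. The `Model` class shown here is a hypothetical stand-in (the real class is not reproduced on this page), and the array layout `transition[action, state, next_state]` is an assumption inferred from how the example below passes `model.transition` around:

```python
import numpy as np

# Hypothetical stand-in for Model.Model; the real class lives in the Model
# module referenced by the example below.
class Model:
    def __init__(self, n_states, n_actions):
        # Assumed layout: transition[action, state, next_state]
        self.transition = np.zeros((n_actions, n_states, n_states))
        self.reward_f = np.zeros((n_states,))

n_states, n_actions = 4, 2
model = Model(n_states, n_actions)

# Fill with random values and normalize so each (action, state) slice is a
# valid probability distribution over next states.
rng = np.random.default_rng(0)
raw = rng.random((n_actions, n_states, n_states))
model.transition = raw / raw.sum(axis=2, keepdims=True)

# Every (action, state) row now sums to 1.
assert np.allclose(model.transition.sum(axis=2), 1.0)
```

The example below overwrites `model.transition` with an externally supplied `trans` array in the same way.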
Example 1: extract_info
# Required import: from Model import Model [as alias]
# Or: from Model.Model import transition [as alias]
# Initial reward function
# model.reward_f = np.random.uniform(-2,-0.1,model.reward_f.shape)
r_initial = model.reward_f
examples, distribution = extract_info(disc_model, steps, dist=True)
policy_ref, lg = caus_ent_backward(model.transition, model.reward_f, examples[1]["end_state"], steps)
start_states = [example["start_state"] for example in examples]
state_freq_ref, state_action_frequencies_ref = forward_sa(policy_ref, model.transition, start_states, steps)
iterations = 300
# initialise reward model
feat = {"function": continouous, "inputs": None}
disc_model = DiscModel(feature=feat)
model = Model_non_linear(disc_model)
model.transition = trans
model.reward_f = np.zeros(model.reward_f.shape)
model.reward_f += r_initial
model.reward_f[1, :] -= 0.5
# model.reward_f = r_initial
actions, states, features = model.feature_f.shape
for itera in range(iterations):
    policy_test, lg = caus_ent_backward(model.transition, model.reward_f, examples[1]["end_state"], steps)
    state_freq_test, state_action_frequencies_test = forward_sa(policy_test, model.transition, start_states, steps)
    reward_diff = np.sum(np.absolute(model.reward_f - r_initial))
    policy_diff = np.sum(np.absolute(policy_test - policy_ref))
    print("Difference in Reward --->", reward_diff)
    print("Difference in Policy --->", policy_diff)
    X = np.reshape(model.feature_f, (disc_model.tot_states * disc_model.tot_actions, 4))
    Y = (state_action_frequencies_ref - state_action_frequencies_test).reshape(
        (disc_model.tot_states * disc_model.tot_actions)
    )
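The snippet ends here; the loop pairs the feature matrix `X` with the gap `Y` between reference and test state-action visitation frequencies, which is the classic gradient signal in maximum-entropy IRL. As a generic, hedged illustration only (this is not the author's code; the shapes, learning rate, and linear reward parameterization are all assumptions), a gradient step on reward weights might look like:

```python
import numpy as np

# Made-up dimensions for illustration: n_sa = states * actions.
rng = np.random.default_rng(1)
n_sa, n_features = 12, 4
X = rng.random((n_sa, n_features))    # feature matrix, as built in the loop
Y = rng.standard_normal(n_sa)         # visitation-frequency difference

weights = np.zeros(n_features)
learning_rate = 0.01

# MaxEnt IRL gradient: features weighted by the gap between empirical and
# expected state-action frequencies.
gradient = X.T @ Y
weights += learning_rate * gradient

# Linear reward over features for each state-action pair.
reward = X @ weights
```

With a linear reward `reward = X @ weights`, driving `Y` toward zero matches the learner's expected feature counts to the demonstrations, which is the fixed point the iteration above is converging toward.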