Python Agent.play_in_test_mode Method Code Examples

This article collects typical usage examples of the Python method agent.Agent.play_in_test_mode. If you are wondering what Agent.play_in_test_mode does, how to call it, or what real uses of it look like, the curated example below should help. You can also browse further usage examples of its containing class, agent.Agent.


One code example of the Agent.play_in_test_mode method is shown below, drawn from an open-source project.

Example 1: main

# Required import: from agent import Agent [as alias]
# Alternatively: from agent.Agent import play_in_test_mode [as alias]
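# Additional imports inferred from the calls below (standard library and TensorFlow):
# import os
# import logging
# import tensorflow as tf
# Project-local names (GameStateHandler, qnetwork, multi_gpu_qnetwork, Saver,
# ReplayMemoryManager, Monitoring, crop_and_resize, args) come from the rl_atari
# project; their exact module paths are not shown in this snippet.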
def main(_=None):
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        d = os.path.dirname(args.log_file)
        if not os.path.exists(d):
            os.makedirs(d)
        if not args.continue_training:
            with open(args.log_file, 'w') as f:
                f.write('')
        logging.basicConfig(filename=args.log_file, level=logging.DEBUG)
        logging.getLogger().addHandler(logging.StreamHandler())

        game_handler = GameStateHandler(
                args.rom_directory + args.rom_name,
                random_seed=args.random_seed,
                frame_skip=args.frame_skip,
                use_sdl=args.use_sdl,
                repeat_action_probability=args.repeat_action_probability,
                minimum_actions=args.minimum_action_set,
                test_mode=args.test_mode,
                image_processing=lambda x: crop_and_resize(x, args.image_height, args.image_width, args.cut_top))
        num_actions = game_handler.num_actions

        if args.optimizer == 'rmsprop':
            optimizer = tf.train.RMSPropOptimizer(
                    learning_rate=args.learning_rate,
                    decay=args.decay,
                    momentum=0.0,
                    epsilon=args.rmsprop_epsilon)
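        # Note: `optimizer` is only bound in the 'rmsprop' branch above; any other
        # args.optimizer value would leave it undefined and the network
        # constructors below would raise a NameError. The original example
        # handles only RMSProp.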

        if not args.multi_gpu:
            if args.double_dqn:
                net = qnetwork.DualDeepQNetwork(args.image_height, args.image_width, sess, num_actions,
                                                args.state_frames, args.discount_factor, args.target_net_refresh_rate,
                                                net_type=args.net_type, optimizer=optimizer)
            else:
                net = qnetwork.DeepQNetwork(args.image_height, args.image_width, sess, num_actions, args.state_frames,
                                            args.discount_factor, net_type=args.net_type, optimizer=optimizer)
        else:
            net = multi_gpu_qnetwork.MultiGPUDualDeepQNetwork(args.image_height, args.image_width, sess, num_actions,
                                                              args.state_frames, args.discount_factor,
                                                              optimizer=optimizer, gpus=[0, 1, 2, 3])

        saver = Saver(sess, args.data_dir, args.continue_training)
        if saver.replay_memory_found():
            replay_memory = saver.get_replay_memory()
        else:
            if args.test_mode:
                logging.error('NO SAVED NETWORKS IN TEST MODE!!!')
            replay_memory = ReplayMemoryManager(args.image_height, args.image_width, args.state_frames,
                                                args.replay_memory_size, reward_clip_min=args.reward_clip_min,
                                                reward_clip_max=args.reward_clip_max)

        # todo: add parameters to handle monitor
        monitor = Monitoring(log_train_step_every=100, smooth_episode_scores_over=50)

        agent = Agent(
                game_handler=game_handler,
                qnetwork=net,
                replay_memory=replay_memory,
                saver=saver,
                monitor=monitor,
                train_freq=args.train_freq,
                test_mode=args.test_mode,
                batch_size=args.batch_size,
                save_every_x_episodes=args.saving_freq)

        sess.run(tf.global_variables_initializer())  # tf.initialize_all_variables() is deprecated in later TF 1.x
        saver.restore(args.data_dir)
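        # Resume the linear epsilon schedule from the saved frame count:
        # epsilon = start - frames_done * (start - final) / duration, floored at
        # final_epsilon, and shrink the remaining exploration window to match.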
        start_epsilon = max(args.final_epsilon,
                            args.start_epsilon - saver.get_start_frame() * (args.start_epsilon - args.final_epsilon) / args.exploration_duration)
        exploring_duration = max(args.exploration_duration - saver.get_start_frame(), 1)

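        # Test mode: gather just enough frames to build one stacked state, then
        # evaluate with a small fixed epsilon. Training mode: pre-fill the replay
        # memory, then train while annealing epsilon over exploring_duration.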
        if args.test_mode:
            agent.populate_replay_memory(args.state_frames, force_early_stop=True)
            agent.play_in_test_mode(args.epsilon_in_test_mode)
        else:
            agent.populate_replay_memory(args.min_replay_memory)
            agent.play(train_steps_limit=args.number_of_train_steps, start_eps=start_epsilon,
                       final_eps=args.final_epsilon, exploring_duration=exploring_duration)
Author: adrianoo, Project: rl_atari, Lines of code: 81, Source file: tf_luncher.py
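
For orientation, the test-mode flow above distills to a short call sequence. The sketch below is a minimal, assumption-laden reduction of Example 1: it presumes the collaborators (game handler, network, replay memory, saver, monitor) were already built exactly as shown, and the literal numbers merely stand in for the parsed args values.

# Minimal sketch of the test-mode flow, distilled from Example 1 above.
# Assumes game_handler, net, replay_memory, saver and monitor were built
# exactly as shown; the literal values stand in for the parsed args.
agent = Agent(
        game_handler=game_handler,
        qnetwork=net,
        replay_memory=replay_memory,
        saver=saver,
        monitor=monitor,
        train_freq=4,                   # test_mode=True suggests updates are skipped
        test_mode=True,
        batch_size=32,
        save_every_x_episodes=10)

# Gather just enough frames to assemble the first stacked state...
agent.populate_replay_memory(4, force_early_stop=True)  # 4 = state_frames
# ...then run evaluation episodes with a small fixed exploration rate.
agent.play_in_test_mode(0.05)           # e.g. args.epsilon_in_test_mode = 0.05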

