This article collects typical usage examples of the Java class burlap.behavior.policy.SolverDerivedPolicy. If you are wondering how SolverDerivedPolicy is used in practice, the selected examples below may help.
SolverDerivedPolicy belongs to the burlap.behavior.policy package. Four code examples of the class are shown below, sorted by popularity by default.
Example 1: DeterministicTerminationOption
import burlap.behavior.policy.SolverDerivedPolicy; // import the required package/class
/**
 * Initializes the option by running the provided planner and then deriving the option's policy from it. The planner is run from each state in
 * the list <code>seedStatesForPlanning</code>, after which
 * this option's policy is set to the provided solver-derived policy.
 * @param name the name of the option
 * @param init the initiation conditions of the option
 * @param terminationStates the termination states of the option
 * @param seedStatesForPlanning the states that should be used as initial states for the planner
 * @param planner the planner that is used to create the policy for this option
 * @param p the solver-derived policy to use after planning from each initial state is performed; it must also be an instance of Policy
 */
public DeterministicTerminationOption(String name, StateConditionTest init, StateConditionTest terminationStates, List<State> seedStatesForPlanning,
                                      Planner planner, SolverDerivedPolicy p){
    // SolverDerivedPolicy does not guarantee a Policy, so check before the cast below
    if(!(p instanceof Policy)){
        throw new RuntimeErrorException(new Error("SolverDerivedPolicy p is not an instance of Policy"));
    }
    this.name = name;
    this.initiationTest = init;
    this.terminationStates = terminationStates;
    // now construct the policy by running the planner from each seed state
    for(State si : seedStatesForPlanning){
        planner.planFromState(si);
    }
    p.setSolver(planner);
    this.policy = (Policy)p;
}
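The instanceof check in Example 1 suggests that SolverDerivedPolicy and Policy are separate types, so an object can accept a solver without being usable as a policy. The following is a minimal sketch of that guard-then-bind pattern. Every type in it (SketchSolver, SketchPolicy, SketchSolverDerived, SketchGreedy, SketchNotAPolicy, SketchOption) is a hypothetical stand-in written for this illustration and is not part of BURLAP.

```java
// Hypothetical stand-in types for illustration only; these are NOT the BURLAP interfaces.
interface SketchSolver { double value(int state); }
interface SketchPolicy { int action(int state); }
interface SketchSolverDerived { void setSolver(SketchSolver solver); }

// Implements both interfaces, so it passes the guard in SketchOption.
final class SketchGreedy implements SketchPolicy, SketchSolverDerived {
    private SketchSolver solver;

    public void setSolver(SketchSolver solver) { this.solver = solver; }

    // Moves toward the neighboring state (state + 1 or state - 1) with the higher value.
    public int action(int state) {
        if (solver == null) {
            throw new IllegalStateException("setSolver was never called");
        }
        return solver.value(state + 1) >= solver.value(state - 1) ? 1 : -1;
    }
}

// Implements only the solver-facing interface, so it cannot serve as a policy.
final class SketchNotAPolicy implements SketchSolverDerived {
    public void setSolver(SketchSolver solver) { }
}

final class SketchOption {
    final SketchPolicy policy;

    SketchOption(SketchSolver planner, SketchSolverDerived p) {
        // Guard first: the cast below would otherwise fail with a ClassCastException.
        if (!(p instanceof SketchPolicy)) {
            throw new IllegalArgumentException("p must also implement SketchPolicy");
        }
        p.setSolver(planner);          // bind the policy to the finished planner
        this.policy = (SketchPolicy) p;
    }
}
```

With a planner whose value grows with the state index, `new SketchOption(planner, new SketchGreedy()).policy.action(3)` moves toward state 4, while passing a `SketchNotAPolicy` is rejected at construction time instead of failing later at the cast.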
Example 2: main
import burlap.behavior.policy.SolverDerivedPolicy; // import the required package/class
public static void main(String[] args) {
    // Learning constants
    double gamma = 0.99;
    int replayStartSize = 50000;
    int memorySize = 1000000;
    double epsilonStart = 1;
    double epsilonEnd = 0.1;
    double testEpsilon = 0.05;
    int epsilonAnnealDuration = 1000000;
    int staleUpdateFreq = 10000;
    // Caffe solver file
    String solverFile = "example_models/grid_world_dqn_solver.prototxt";
    // Load Caffe
    Loader.load(caffe.Caffe.class);
    // Setup the network
    GridWorldDQN gridWorldDQN = new GridWorldDQN(solverFile, gamma);
    // Create the policies
    SolverDerivedPolicy learningPolicy =
            new AnnealedEpsilonGreedy(epsilonStart, epsilonEnd, epsilonAnnealDuration);
    SolverDerivedPolicy testPolicy = new EpsilonGreedy(testEpsilon);
    // Setup the learner
    DeepQLearner deepQLearner =
            new DeepQLearner(gridWorldDQN.domain, gamma, replayStartSize, learningPolicy, gridWorldDQN.dqn);
    deepQLearner.setExperienceReplay(new FixedSizeMemory(memorySize), gridWorldDQN.dqn.batchSize);
    deepQLearner.useStaleTarget(staleUpdateFreq);
    // Setup the tester
    Tester tester = new SimpleTester(testPolicy);
    // Set the QProvider for the policies
    learningPolicy.setSolver(deepQLearner);
    testPolicy.setSolver(deepQLearner);
    // Setup the visualizer
    VisualExplorer exp = new VisualExplorer(
            gridWorldDQN.domain, gridWorldDQN.env, GridWorldVisualizer.getVisualizer(gridWorldDQN.gwdg.getMap()));
    exp.initGUI();
    exp.startLiveStatePolling(33);
    // Setup helper
    // Note: actionSet is not declared in this excerpt; it is presumably defined elsewhere in the enclosing class
    TrainingHelper helper = new TrainingHelper(
            deepQLearner, tester, gridWorldDQN.dqn, actionSet, gridWorldDQN.env);
    helper.setTotalTrainingSteps(50000000);
    helper.setTestInterval(500000);
    helper.setTotalTestSteps(125000);
    helper.setMaxEpisodeSteps(10000);
    // Run helper
    helper.run();
}
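Example 2 builds its learning policy from a start epsilon, an end epsilon, and an anneal duration. The excerpt does not show how AnnealedEpsilonGreedy interpolates between them, so the sketch below assumes a linear ramp that holds at the end value once the duration has elapsed. LinearEpsilonSchedule and epsilonAt are hypothetical names for this illustration, not the class's actual implementation.

```java
// Hypothetical illustration; this is NOT the source of AnnealedEpsilonGreedy.
// Assumption: epsilon falls linearly from start to end over annealSteps, then stays at end.
final class LinearEpsilonSchedule {
    private final double start;
    private final double end;
    private final int annealSteps;

    LinearEpsilonSchedule(double start, double end, int annealSteps) {
        if (annealSteps <= 0) {
            throw new IllegalArgumentException("annealSteps must be positive");
        }
        this.start = start;
        this.end = end;
        this.annealSteps = annealSteps;
    }

    // Returns the exploration rate to use after the given number of steps.
    double epsilonAt(long step) {
        if (step <= 0) {
            return start;
        }
        if (step >= annealSteps) {
            return end;
        }
        double fraction = (double) step / annealSteps;
        return start + fraction * (end - start);
    }
}
```

Under this assumption, the constants from Example 2 (start 1, end 0.1, duration 1000000) give an epsilon of roughly 0.55 halfway through annealing and 0.1 for every step from 1000000 onward.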
Example 3: setPolicy
import burlap.behavior.policy.SolverDerivedPolicy; // import the required package/class
/**
 * Sets the policy to the provided one. Should be a policy that operates on a {@link burlap.behavior.valuefunction.QFunction}. Will automatically set its
 * Q-source to this object.
 * @param policy the policy to use.
 */
public void setPolicy(SolverDerivedPolicy policy){
    this.policy = (Policy)policy;
    policy.setSolver(this);
}
Example 4: setPolicy
import burlap.behavior.policy.SolverDerivedPolicy; // import the required package/class
/**
 * Sets the policy to the provided one. Should be a policy that operates on a {@link QProvider}. Will automatically set its
 * Q-source to this object.
 * @param policy the policy to use.
 */
public void setPolicy(SolverDerivedPolicy policy){
    this.policy = (Policy)policy;
    policy.setSolver(this);
}
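In Examples 3 and 4 the learner passes itself to the policy as its Q-source, so the policy reads the learner's current Q-values each time it chooses an action. The sketch below illustrates that wiring with a small Q-table. DemoQSource, DemoPolicy, DemoSolverDerived, DemoGreedyQ, and DemoLearner are hypothetical stand-ins, not BURLAP types, and the instanceof guard is an addition that the two setPolicy excerpts do not contain.

```java
// Hypothetical stand-in types for illustration only; these are NOT the BURLAP interfaces.
interface DemoQSource { double q(int state, int action); }
interface DemoPolicy { int action(int state); }
interface DemoSolverDerived { void setSolver(DemoQSource source); }

// Greedy policy: picks the action with the highest Q-value, lowest index on ties.
final class DemoGreedyQ implements DemoPolicy, DemoSolverDerived {
    private final int numActions;
    private DemoQSource source;

    DemoGreedyQ(int numActions) { this.numActions = numActions; }

    public void setSolver(DemoQSource source) { this.source = source; }

    public int action(int state) {
        if (source == null) {
            throw new IllegalStateException("no Q-source set");
        }
        int best = 0;
        for (int a = 1; a < numActions; a++) {
            if (source.q(state, a) > source.q(state, best)) {
                best = a;
            }
        }
        return best;
    }
}

// Learner that owns the Q-table and serves as the policy's Q-source.
final class DemoLearner implements DemoQSource {
    private final double[][] qTable;
    private DemoPolicy policy;

    DemoLearner(int numStates, int numActions) {
        this.qTable = new double[numStates][numActions];
    }

    public double q(int state, int action) { return qTable[state][action]; }

    void setQ(int state, int action, double value) { qTable[state][action] = value; }

    // Same shape as setPolicy above, plus a guard before the cast (an addition in this sketch).
    void setPolicy(DemoSolverDerived policy) {
        if (!(policy instanceof DemoPolicy)) {
            throw new IllegalArgumentException("policy must also implement DemoPolicy");
        }
        this.policy = (DemoPolicy) policy;
        policy.setSolver(this);
    }

    int chooseAction(int state) { return policy.action(state); }
}
```

Because the policy holds a reference to the learner rather than a copy of its values, a later `setQ(0, 1, 5.0)` changes what `chooseAction(0)` returns without calling setPolicy again.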