GetPOMDPSolutionStatistics

PURPOSE
Collects statistics on the outcome of an experiment.
SYNOPSIS
function [tics aV aR nA nC]=GetPOMDPSolutionStatistics(POMDP,P,B,V,Alpha,time,maxTime,stTime)
DESCRIPTION
Collects statistics on the outcome of an experiment: computes statistics on a solution obtained by Perseus.

Parameters
 - POMDP: The POMDP on which Perseus was applied.
 - P: Set of parameters used to solve the problem.
 - B: Set of beliefs used to apply Perseus.
 - V: Set of policies obtained by Perseus (one per iteration; lines 6-22 of Table 1, page 12 of the paper).
 - Alpha: Alpha element assigned to each belief in each iteration.
 - time: Time at which each iteration was completed.
 - maxTime: Maximum time, in seconds, for which to collect statistics.
 - stTime: Step between two consecutive collections of statistics.

Outputs
 - tics: Times at which statistics are collected.
 - aV: Average value over all beliefs. The computation of this value can be quite demanding.
 - aR: Average cumulative discounted reward. The POMDP is simulated several times for several time slices (both parameters are set in P) and, thus, this is extremely time demanding.
 - nA: Number of alpha elements in the policy.
 - nC: Average number of changes in the optimal action for each belief from one policy to the next.

CROSS-REFERENCE INFORMATION
This function calls: MaxAlpha, SimulateFrom
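The aR output above is a plain Monte Carlo estimate: the policy is simulated for a fixed number of trials, each trial's rewards are discounted, and the results are averaged. A minimal Python sketch of that computation, standing in for the MATLAB code's SimulateFrom loop (the simulate_trial callback, gamma, and all parameter names here are assumptions for illustration, not part of the toolbox):

```python
def estimate_avg_discounted_reward(simulate_trial, num_trials, gamma, steps):
    """Monte Carlo estimate of the expected cumulative discounted reward.

    simulate_trial(steps) must return the list of per-step rewards of one
    simulated trial (a hypothetical stand-in for SimulateFrom)."""
    total = 0.0
    for _ in range(num_trials):
        rewards = simulate_trial(steps)
        # Discount each step's reward by gamma**t and accumulate.
        total += sum(r * gamma**t for t, r in enumerate(rewards))
    return total / num_trials
```

As in the MATLAB code, the cost is num_trials simulations of steps time slices per policy, which is why this statistic dominates the running time.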
SOURCE CODE
function [tics aV aR nA nC]=GetPOMDPSolutionStatistics(POMDP,P,B,V,Alpha,time,maxTime,stTime)
% Collects statistics on the outcome of an experiment.
%
% Computes statistics on a solution obtained by Perseus.
% Parameters
%  - POMDP: The POMDP on which Perseus was applied.
%  - P: Set of parameters used to solve the problem.
%  - B: Set of beliefs used to apply Perseus.
%  - V: Set of policies obtained by Perseus (one per iteration; lines
%       6-22 of Table 1, page 12 of the paper).
%  - Alpha: Alpha element assigned to each belief in each iteration.
%  - time: Time at which each iteration was completed.
%  - maxTime: Maximum time, in seconds, for which to collect statistics.
%  - stTime: Step between two consecutive collections of statistics.
% Outputs
%  - tics: Times at which statistics are collected.
%  - aV: Average value over all beliefs. The computation of this value
%        can be quite demanding.
%  - aR: Average cumulative discounted reward. The POMDP is simulated
%        several times for several time slices (both parameters are set
%        in P) and, thus, this is extremely time demanding.
%  - nA: Number of alpha elements in the policy.
%  - nC: Average number of changes in the optimal action for each belief
%        from one policy to the next.
nSteps=floor(maxTime/stTime);
tics=(0:(nSteps-1))*stTime;

aV=zeros(1,nSteps);
aR=zeros(1,nSteps);
nA=zeros(1,nSteps);
nC=zeros(1,nSteps);

% Map from times at fixed intervals ('tics') to the index of the last
% iteration completed by each tic (iterations end at irregular times).
h=ones(1,nSteps);
j=1;
for i=2:nSteps
  while time(j)<tics(i)
    j=j+1;
  end
  h(i)=j-1;
end

% Number of beliefs in B
nb=size(B,2);

fprintf(' Computing average expected value :');
for n=1:nSteps
  fprintf('.');
  if (n==1)||(h(n)~=h(n-1))
    [Aph Vl]=cellfun(@(b)(MaxAlpha(V{h(n)},b)),B);
    aV(n)=sum(Vl)/nb;
  else
    aV(n)=aV(n-1); % same policy as the previous tic: reuse the value
  end
end
fprintf('\n');

fprintf(' Computing average discounted reward:');
for n=1:nSteps
  fprintf('.');
  if (n==1)||(h(n)~=h(n-1))
    r=0.0;
    for i=1:P.numTrials
      r1=SimulateFrom(POMDP,V{h(n)},P.start,P.stepsXtrial);
      r=r+r1;
    end
    aR(n)=r/P.numTrials;
  else
    aR(n)=aR(n-1); % same policy as the previous tic: reuse the reward
  end
end
fprintf('\n');

fprintf(' Getting number of alpha elements:\n');
for n=1:nSteps
  nA(n)=size(V{h(n)},2); % number of alpha elements in the n-th policy
end

fprintf(' Getting number of policy changes:\n');
discreteA=isa(get(POMDP,'ActionSpace'),'DSpace');
nC(1)=1; % at the first step all beliefs change
for n=2:nSteps
  l=h(n-1);
  s=h(n);
  for i=1:nb
    [e1 a1]=V{l}{Alpha{l}(i)};
    [e2 a2]=V{s}{Alpha{s}(i)};
    if discreteA
      nC(n)=nC(n)+(a1~=a2);
    else
      nC(n)=nC(n)+norm(a1-a2);
    end
  end
  nC(n)=nC(n)/nb;
end
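The lookup table h built at the top of the listing is the one subtle step: for each fixed-interval tic it records the index of the last Perseus iteration that finished before that tic, so statistics at a tic are computed from the policy available at that moment. A Python sketch of the same mapping, with 0-based indices unlike the MATLAB listing (and, like the original, assuming the first iteration completes before the second tic):

```python
def map_tics_to_iterations(time, tics):
    """For each tic, return the index of the last iteration completed
    strictly before that tic; the first tic is paired with iteration 0."""
    h = [0]                   # first tic always uses the first policy
    j = 0
    for t in tics[1:]:
        while time[j] < t:    # advance past iterations finished before t
            j += 1
        h.append(j - 1)       # last iteration completed before t
    return h
```

Consecutive tics that map to the same iteration are why the main loops above can reuse aV(n-1) and aR(n-1) instead of recomputing.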