GetPOMDPSolutionStatistics

PURPOSE
Collects statistics on the outcome of an experiment.
SYNOPSIS
function [tics aV aR nA nC]=GetPOMDPSolutionStatistics(POMDP,P,B,V,Alpha,time,maxTime,stTime)
DESCRIPTION
Collects statistics on the outcome of an experiment: computes statistics on a solution obtained by Perseus.

Parameters
 - POMDP: The POMDP on which Perseus was applied.
 - P: Set of parameters used to solve the problem.
 - B: Set of beliefs used to apply Perseus.
 - V: Set of policies obtained by Perseus (one per iteration; lines 6-22 of Table 1, page 12 of the paper).
 - Alpha: Alpha element assigned to each belief in each iteration.
 - time: Time at which each iteration was completed.
 - maxTime: Maximum time, in seconds, for which to collect statistics.
 - stTime: Step between two consecutive collections of statistics.

Outputs
 - tics: Times at which statistics are collected.
 - aV: Average value over all beliefs. The computation of this value can be quite demanding.
 - aR: Average cumulative discounted reward. The POMDP is simulated several times for several time slices (both parameters are set in P) and, thus, this is extremely time demanding.
 - nA: Number of alpha elements in the policy.
 - nC: Average number of changes in the optimal action for each belief from one policy to the next.

CROSS-REFERENCE INFORMATION
This function calls: MaxAlpha, SimulateFrom
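The aR output above is a plain Monte Carlo estimate: the policy is simulated for a fixed number of trials, each trial's rewards are discounted, and the results are averaged. A minimal Python sketch of that computation, standing in for the MATLAB code's SimulateFrom loop (the simulate_trial callback, gamma, and all parameter names here are assumptions for illustration, not part of the toolbox):

```python
def estimate_avg_discounted_reward(simulate_trial, num_trials, gamma, steps):
    """Monte Carlo estimate of the expected cumulative discounted reward.

    simulate_trial(steps) must return the list of per-step rewards of one
    simulated trial (a hypothetical stand-in for SimulateFrom)."""
    total = 0.0
    for _ in range(num_trials):
        rewards = simulate_trial(steps)
        # Discount each step's reward by gamma**t and accumulate.
        total += sum(r * gamma**t for t, r in enumerate(rewards))
    return total / num_trials
```

As in the MATLAB code, the cost is num_trials simulations of steps time slices per policy, which is why this statistic dominates the running time.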
SOURCE CODE
function [tics aV aR nA nC]=GetPOMDPSolutionStatistics(POMDP,P,B,V,Alpha,time,maxTime,stTime)
% Collects statistics on the outcome of an experiment.
%
% Computes statistics on a solution obtained by Perseus.
% Parameters
%  - POMDP: The POMDP on which Perseus was applied.
%  - P: Set of parameters used to solve the problem.
%  - B: Set of beliefs used to apply Perseus.
%  - V: Set of policies obtained by Perseus (one per iteration; lines
%       6-22 of Table 1, page 12 of the paper).
%  - Alpha: Alpha element assigned to each belief in each iteration.
%  - time: Time at which each iteration was completed.
%  - maxTime: Maximum time, in seconds, for which to collect statistics.
%  - stTime: Step between two consecutive collections of statistics.
% Outputs
%  - tics: Times at which statistics are collected.
%  - aV: Average value over all beliefs. The computation of this value
%        can be quite demanding.
%  - aR: Average cumulative discounted reward. The POMDP is simulated
%        several times for several time slices (both parameters are set
%        in P) and, thus, this is extremely time demanding.
%  - nA: Number of alpha elements in the policy.
%  - nC: Average number of changes in the optimal action for each belief
%        from one policy to the next.
nSteps=floor(maxTime/stTime);
tics=(0:(nSteps-1))*stTime;

aV=zeros(1,nSteps);
aR=zeros(1,nSteps);
nA=zeros(1,nSteps);
nC=zeros(1,nSteps);

% Map from times at fixed intervals ('tics') to the index of the last
% iteration completed by each tic (iterations end at irregular times).
h=ones(1,nSteps);
j=1;
for i=2:nSteps
  while time(j)<tics(i)
    j=j+1;
  end
  h(i)=j-1;
end

% Number of beliefs in B
nb=size(B,2);

fprintf(' Computing average expected value :');
for n=1:nSteps
  fprintf('.');
  if (n==1)||(h(n)~=h(n-1))
    [Aph Vl]=cellfun(@(b)(MaxAlpha(V{h(n)},b)),B);
    aV(n)=sum(Vl)/nb;
  else
    aV(n)=aV(n-1); % same policy as the previous tic: reuse the value
  end
end
fprintf('\n');

fprintf(' Computing average discounted reward:');
for n=1:nSteps
  fprintf('.');
  if (n==1)||(h(n)~=h(n-1))
    r=0.0;
    for i=1:P.numTrials
      r1=SimulateFrom(POMDP,V{h(n)},P.start,P.stepsXtrial);
      r=r+r1;
    end
    aR(n)=r/P.numTrials;
  else
    aR(n)=aR(n-1); % same policy as the previous tic: reuse the reward
  end
end
fprintf('\n');

fprintf(' Getting number of alpha elements:\n');
for n=1:nSteps
  nA(n)=size(V{h(n)},2); % number of alpha elements in the n-th policy
end

fprintf(' Getting number of policy changes:\n');
discreteA=isa(get(POMDP,'ActionSpace'),'DSpace');
nC(1)=1; % at the first step all beliefs change
for n=2:nSteps
  l=h(n-1);
  s=h(n);
  for i=1:nb
    [e1 a1]=V{l}{Alpha{l}(i)};
    [e2 a2]=V{s}{Alpha{s}(i)};
    if discreteA
      nC(n)=nC(n)+(a1~=a2);
    else
      nC(n)=nC(n)+norm(a1-a2);
    end
  end
  nC(n)=nC(n)/nb;
end
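The lookup table h built at the top of the listing is the one subtle step: for each fixed-interval tic it records the index of the last Perseus iteration that finished before that tic, so statistics at a tic are computed from the policy available at that moment. A Python sketch of the same mapping, with 0-based indices unlike the MATLAB listing (and, like the original, assuming the first iteration completes before the second tic):

```python
def map_tics_to_iterations(time, tics):
    """For each tic, return the index of the last iteration completed
    strictly before that tic; the first tic is paired with iteration 0."""
    h = [0]                   # first tic always uses the first policy
    j = 0
    for t in tics[1:]:
        while time[j] < t:    # advance past iterations finished before t
            j += 1
        h.append(j - 1)       # last iteration completed before t
    return h
```

Consecutive tics that map to the same iteration are why the main loops above can reuse aV(n-1) and aR(n-1) instead of recomputing.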