iPerseus

PURPOSE
Improved version of Perseus.
SYNOPSIS
function [V Value Alpha t]=iPerseus(P,B,stopCriteria)
DESCRIPTION
Improved version of Perseus.

Improved version of the Perseus algorithm, a point-based solver for POMDPs. This corresponds to a slightly optimized version of the algorithm in Table 2 (Section 6.1) of the paper. To understand the code, it is better to first check the plain version of Perseus, which follows the algorithm in Table 2 more closely.

Inputs
  P: A POMDP object. This can be any of the specializations of the POMDP
     (discrete/continuous states/actions/observations).
  B: A set of beliefs.
  stopCriteria: A function with 3 parameters (planning horizon, execution
                time, maximum change in the value for all beliefs in B in
                the last iteration). The function returns true if the main
                loop of Perseus must be stopped.

Outputs
  V: A cell array of policies. V{n} is the policy for planning horizon 'n'.
  Value: A cell array of vectors. Value{n}(i) is the value for belief 'i'
         in planning horizon 'n'.
  Alpha: A cell array of vectors. Alpha{n}(i) is the alpha assigned to
         belief 'i' in planning horizon 'n'.
  t: A vector. t(n) is the time taken to compute the policy for planning
     horizon 'n'.

See also Perseus.

CROSS-REFERENCE INFORMATION
This function calls:
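As a minimal usage sketch (the POMDP object P and the belief set B are assumed to be already available; how they are built is not covered here), the stop criterion can be given as an anonymous function of the planning horizon, the elapsed CPU time, and the maximum value change:

  % P: a POMDP object, B: a set of beliefs (assumed to exist already)
  % Stop after 100 planning horizons, one hour of CPU time, or when the
  % largest value change over the beliefs in B drops below 1e-4.
  stopCriteria=@(h,tm,d) (h>=100) || (tm>=3600) || (d<1e-4);
  [V Value Alpha t]=iPerseus(P,B,stopCriteria);
  Vfinal=V{end}; % policy for the last planning horizon computed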
SOURCE CODE

function [V Value Alpha t]=iPerseus(P,B,stopCriteria)
% Improved version of Perseus.
%
% Improved version of the Perseus algorithm.
% A point-based solver for POMDPs.
% This corresponds to a slightly optimized version of the algorithm
% in Table 2 (Section 6.1) of the paper.
% To understand the code, it is better to first check the plain version of
% Perseus, which follows the algorithm in Table 2 more closely.
%
% Inputs
%   P: A POMDP object. This can be any of the specializations of the
%      POMDP (discrete/continuous states/actions/observations).
%   B: A set of beliefs.
%   stopCriteria: A function with 3 parameters (planning horizon,
%                 execution time, maximum change in the value for all
%                 beliefs in B in the last iteration).
%                 The function returns true if the main loop of Perseus
%                 must be stopped.
% Outputs
%   V: A cell array of policies. V{n} is the policy for planning
%      horizon 'n'.
%   Value: A cell array of vectors. Value{n}(i) is the value for belief
%          'i' in planning horizon 'n'.
%   Alpha: A cell array of vectors. Alpha{n}(i) is the alpha assigned to
%          belief 'i' in planning horizon 'n'.
%   t: A vector. t(n) is the time taken to compute the policy for
%      planning horizon 'n'.
%
% See also Perseus.

% Set up the initial policy (policy with planning horizon=1)
n=1;
V{n}=Policy;
alpha0=Alpha0(P);
V{n}=AddAlpha(V{n},alpha0,RandomAction(P));

[Alpha{n} Value{n}]=cellfun(@(b)(MaxAlpha(V{n},b)),B);

% Get the number of beliefs in the set B
nb=size(B,2);

% And now start the iterations to improve the policy
t_init=cputime;
stop=false;

while ~stop

  n1=n+1; % Horizon for the new policy to compute
  fprintf(' Iteration %u: ',n1);

  % Pre-compute the set of \Alphas_j_a_o
  % This is only feasible in some types of POMDPs.
  % In the rest of the cases the output is empty.
  fprintf('[P')
  Alphas_j_a_o=ComputeAlphas_j_a_o(P,V{n});
  fprintf('] ');

  % The new policy is initially empty
  V{n1}=Policy;

  % Initially all beliefs in B are to be improved
  Alpha{n1}=zeros(1,nb);
  Value{n1}=zeros(1,nb);
  pending=true(1,nb);
  nPending=nb;
  it=1;
  while nPending>0
    fprintf('%u(%u)',it,nPending);
    it=it+1;

    % Select a belief pending to be improved at random
    ndx=find(pending);
    cb=ndx(ceil(rand*nPending));

    % POMDPs with continuous action sets are converted to their discrete
    % counterparts by sampling actions.
    % We sample a set of actions (taking into account the action previously
    % assigned to the belief and the planning horizon).
    [alpha action]=V{n}{Alpha{n}(cb)};
    P1=DiscretizeActionModel(P,action,n);

    % Execute the backup (this is a different function for the different
    % types of POMDP)
    fprintf('b[');
    [alpha optimalAction v]=Backup(P1,B{cb},V{n},Alphas_j_a_o);
    fprintf(']');

    % If the new value is not better than the one in the previous
    % planning horizon, use the previous alpha.
    if v<Value{n}(cb)

      fprintf('p ');

      [alpha optimalAction]=V{n}{Alpha{n}(cb)};

      % Add the old alpha to the new policy
      [V{n1} l]=AddAlpha(V{n1},alpha,optimalAction);

      pending(cb)=false;
      Alpha{n1}(cb)=l;
      Value{n1}(cb)=Value{n}(cb);
      nPending=nPending-1;

      % Remove all (still) pending beliefs that have the old alpha as
      % a maximizing one
      for i=1:nb
        if pending(i) && Alpha{n}(i)==Alpha{n}(cb)
          pending(i)=false;
          Alpha{n1}(i)=l;
          Value{n1}(i)=Value{n}(i);
          nPending=nPending-1;
        end
      end

    else

      fprintf('n ');

      % Add the new alpha to the new policy
      [V{n1} l]=AddAlpha(V{n1},alpha,optimalAction);

      pending(cb)=false;
      Alpha{n1}(cb)=l;
      Value{n1}(cb)=v;
      nPending=nPending-1;

      % Mark as done all pending beliefs whose value improves with
      % the new alpha
      for i=1:nb
        if pending(i)
          nv=Expectation(B{i},alpha);
          if nv>=Value{n}(i)
            pending(i)=false;
            Alpha{n1}(i)=l;
            Value{n1}(i)=nv;
            nPending=nPending-1;
          end
        end
      end

    end

  end

  t(n1)=cputime-t_init;
  vd=Value{n1}-Value{n};
  mvd=max(vd);
  avd=sum(vd)/nb;

  fprintf('\n mvd:%g avd: %g t: %f\n',mvd,avd,t(n1));

  n=n1;

  stop=stopCriteria(n,t(n),mvd);
end
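As an illustrative sketch of how the returned policy might be queried for a new belief, using the same conventions the listing above relies on (MaxAlpha returns the index of the maximizing alpha and its value, and indexing the Policy object with that index returns the alpha together with the action attached to it); the belief b below is assumed to be available:

  % b: a belief of the same type as the elements of B (assumed available)
  Vfinal=V{end};                       % policy for the last planning horizon
  [aNdx val]=MaxAlpha(Vfinal,b);       % index of the maximizing alpha and its value
  [alphaBest actionBest]=Vfinal{aNdx}; % the alpha element and the action to execute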