###
EECS 591: PROBLEM SET #3

### Due: Thursday, 07 DEC, in class (Late after 12 noon, 12/11)

### Reading:

- Class notes and Reinforcement Learning (RL) handout
- Assigned papers (see webpage), especially the survey
paper by Kaelbling, Littman, and Moore
- Reading in Sutton & Barto as assigned and as needed

### Problems 3.1-7 [5 points each]

The following are all taken from Sutton & Barto's *Reinforcement Learning:
An Introduction*
(available on line)
- Exercise 3.8 in
Section 3.7
- Exercise 3.9 in
Section 3.7
- Exercise 3.14 in
Section 3.8
- Exercise 3.15 in
Section 3.8
- Exercise 3.17 in
Section 3.8
- Exercise 4.1 in
Section 4.1
- Exercise 4.2 in
Section 4.1

**Note:**
**Use** the linked sheet for recording your drawings of Exercises 3.14 and 3.15 above.

### Problem 3.8 [15 points]

You've now seen several grid world examples in class, the readings,
and in this problem set. These are useful as simple models of agents/robots
exploring a world with obstacles.
For a different robot example with simple behaviors,
check out Example 3.7 in
Section 3.6
of Sutton & Barto.
For this problem, your task is to **develop an MDP model/problem for an autonomous car
driving in a city environment.**
This is obviously open-ended, and there is no correct answer. However,
I'd like you to try to come up with

- a formal, simplified model that makes sense for the application;
- a discussion about its merits and shortcomings;
- a discussion on what RL methods might be applicable to it.

**Note:**
Please answer this question on separate sheets (that have your name on them)
than the others, as I would like to photocopy/scan all these for distribution to
the class in hardcopy and/or via the web.

*Note: The following two problems require programming.*
Feel free to use Matlab, C/C++, Python, etc.
### Problem 3.9 [10 points]

Implement value iteration to solve for *V*^{*} for the
Gridworld of Example 3.8,
Section 3.7
of Sutton & Barto.
Solve to 3 decimal places and compare your answer to that shown in
Figure 3.8(b) in
Section 3.8.

### Problem 3.10 [40 points]

Consider the
3277-state grid world of Figure 6 (page 257) of
Kaelbling, Littman, and Moore plus the associated discussion.
Implement and test the following algorithms on that environment:

- Q-learning
- Dyna
- Prioritized Sweeping

Note that the problem is deterministic, and hence you will have to use
**deterministic** versions of the
equations for these algorithms.
For example, do not use the equations on pages 256 and 258 for Dyna
and Prioritized Sweeping; instead, see Sutton & Barto,
Figure 9.4 in
Section 9.2.
and
Figure 9.9,
Figure 9.9 in
Section 9.4,
respectively.

Created: *2006-10-08*.
Modified: *2006-11-20*.