EECS 591: PROBLEM SET #3
Due: Thursday, 07 DEC, in class (Late after 12 noon, 12/11)
Reading:
- Class notes and Reinforcement Learning (RL) handout
- Assigned papers (see webpage), especially the survey
paper by Kaelbling, Littman, and Moore
- Reading in Sutton & Barto as assigned and as needed
Problems 3.1-7 [5 points each]
The following are all taken from Sutton & Barto's Reinforcement Learning:
An Introduction
(available on line)
- Exercise 3.8 in
Section 3.7
- Exercise 3.9 in
Section 3.7
- Exercise 3.14 in
Section 3.8
- Exercise 3.15 in
Section 3.8
- Exercise 3.17 in
Section 3.8
- Exercise 4.1 in
Section 4.1
- Exercise 4.2 in
Section 4.1
Note:
Use the linked sheet for recording your drawings of Exercises 3.14 and 3.15 above.
Problem 3.8 [15 points]
You've now seen several grid world examples in class, the readings,
and in this problem set. These are useful as simple models of agents/robots
exploring a world with obstacles.
For a different robot example with simple behaviors,
check out Example 3.7 in
Section 3.6
of Sutton & Barto.
For this problem, your task is to develop an MDP model/problem for an autonomous car
driving in a city environment.
This is obviously open-ended, and there is no correct answer. However,
I'd like you to try to come up with
- a formal, simplified model that makes sense for the application;
- a discussion about its merits and shortcomings;
- a discussion on what RL methods might be applicable to it.
Note:
Please answer this question on separate sheets (that have your name on them)
than the others, as I would like to photocopy/scan all these for distribution to
the class in hardcopy and/or via the web.
Note: The following two problems require programming.
Feel free to use Matlab, C/C++, Python, etc.
Problem 3.9 [10 points]
Implement value iteration to solve for V* for the
Gridworld of Example 3.8,
Section 3.7
of Sutton & Barto.
Solve to 3 decimal places and compare your answer to that shown in
Figure 3.8(b) in
Section 3.8.
Problem 3.10 [40 points]
Consider the
3277-state grid world of Figure 6 (page 257) of
Kaelbling, Littman, and Moore plus the associated discussion.
Implement and test the following algorithms on that environment:
- Q-learning
- Dyna
- Prioritized Sweeping
Note that the problem is deterministic, and hence you will have to use
deterministic versions of the
equations for these algorithms.
For example, do not use the equations on pages 256 and 258 for Dyna
and Prioritized Sweeping; instead, see Sutton & Barto,
Figure 9.4 in
Section 9.2.
and
Figure 9.9,
Figure 9.9 in
Section 9.4,
respectively.
Created: 2006-10-08.
Modified: 2006-11-20.