Class _ExperiencePool

An experience pool for agents.

This class memorizes the experiences of an agent and returns them when required, taking account of a possible history.

Inherits from: ArcadesComponent

Author: Alexis BRENON <alexis.brenon@imag.fr>
Data Types

Dump
    Serializable dump of an _ExperiencePool.

InitArguments
    Arguments expected by the constructor.

Pool
    Tables used to record interactions/experiences.
    Fields:
    - max_size (number): Maximum number of saved experiences
    - last_index (number): Index of the last saved experience
    - states ({number,...}): The hashes of the recorded states
    - terminals ({number,...}): Is the state terminal (1) or not (0)
    - actions ({number,...}): Actions executed
    - rewards ({number,...}): Rewards received
Fields

- _convert_tensor (function): Function to convert tensors if necessary. This function must be called to convert tensors/networks to the appropriate format (CUDA or default Tensor type) to avoid computation errors caused by inconsistent types.
- hashed_states (integer): The number of elements in the states hash table.
- hasher (hash.XXH64): Hasher object used to compute state hashes.
- history_length (number): Number of states in a full historic state.
- history_offsets (table): Offsets of the states to add to the last one when fetching a full historic state.
- history_spacing (number): Parameter of the history_type function. See also: _compute_history_offsets
- history_stacked_state_size ({number,...}): Size of a full historic state.
- history_type (string): Type of the function used to compute the indexes of the historic state ('linear' or 'exp'). See also: _compute_history_offsets
- nil_state (torch.Tensor): A nil state (full of 0) used in history.
- pool (Pool): Actual pool.
- pushed_pools ({Pool,...}): Stack of pools saved by push and restored by pop.
- states ({[number]=torch.Tensor}): Hash table associating a hash (number) to a Tensor representing a state.
  Usage: s = self.states[self.hasher:hash(s)]
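The states field keeps each distinct state Tensor only once, keyed by its hash. A minimal Python sketch of that lookup, using the built-in hash() as a hypothetical stand-in for the hash.XXH64 hasher:

```python
def store_state(states, hasher, s):
    """Store state s under its hash; identical states share one entry."""
    h = hasher(s)
    states[h] = s
    return h

def load_state(states, h):
    """Recover the full state from its recorded hash."""
    return states[h]
```

The pool's states/terminals arrays then only need to hold numbers (the hashes), which keeps the circular buffer small.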
Metamethods

__init ( args, dump )
    Default constructor.
    Parameters:
    - args (InitArguments)
    - dump (Dump)
Public Methods

clear ()
    Clear the current pool (forget everything experienced so far).
    Returns:
    - self
dump ( [cycles={}] )
    Dump the current state of the _ExperiencePool.
    Tensors are converted to CPU tensors if necessary.
    Overrides: ArcadesComponent:dump
    Parameters:
    - cycles (table): Already dumped components (default {})
    Returns:
    - table
get_action ( [index=1] )
    Return the recorded action at the given index.

get_reward ( [index=1] )
    Return the recorded reward at the given index.

get_state ( [index=1] )
    Return a full historic state.
    Parameters:
    - index (number): Index of the state to get (1 is the last recorded state) (default 1)
    Returns:
    - torch.Tensor: A full historic state, history stacked on the first dimension
    - boolean: Is the returned state terminal?
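A full historic state is the requested state plus the past states selected by history_offsets, padded with nil_state when an offset reaches before the first record. A hedged Python sketch (lists stand in for torch Tensors, and the function name is hypothetical):

```python
def get_historic_state(states, last_index, history_offsets, nil_state):
    """Assemble a full historic state: the state at last_index plus the
    states at each backward offset; offsets reaching before the first
    record fall back to nil_state (the all-zero padding state)."""
    frames = [states[last_index]]
    for off in history_offsets:
        i = last_index - off
        frames.append(states[i] if i >= 0 else nil_state)
    return frames  # torch would stack these on the first dimension
```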
get_terminal ( [index=1] )
    Return whether the state at the given index is terminal.

pop ()
    Restore the pool previously saved with push.

push ()
    Push the current pool.
    This allows you to save the current pool and restore it later using pop.
    Returns:
    - self
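push and pop behave like a snapshot stack over the current pool (the pushed_pools field). A minimal Python sketch, with a hypothetical PoolStack class and a plain dict standing in for the Pool tables:

```python
import copy

class PoolStack:
    """Sketch of push/pop: push snapshots the current pool onto
    pushed_pools; pop restores the most recent snapshot."""
    def __init__(self, pool):
        self.pool = pool
        self.pushed_pools = []

    def push(self):
        self.pushed_pools.append(copy.deepcopy(self.pool))
        return self

    def pop(self):
        self.pool = self.pushed_pools.pop()
        return self
```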
record_action ( a )
    Record an action in the pool.
    This function is intended to be called after record_state, to record the action executed in the last recorded state.
    Parameters:
    - a (number): The action index
    Returns:
    - self
record_reward ( r )
    Record a reward in the pool.
    This function is intended to be called after record_state and record_action, to record the reward received for the last action executed in the last state.
    Parameters:
    - r (number): The reward
    Returns:
    - self
record_state ( s, t )
    Record a state in the pool.
    Parameters:
    - s (torch.Tensor): The state
    - t (boolean): Is the state terminal?
    Returns:
    - self
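The record_state / record_action / record_reward call order writes one interaction into the circular Pool arrays, with last_index wrapping at max_size. A toy Python sketch of that mechanism (class and field names mirror the doc but the implementation is an assumption):

```python
class MiniPool:
    """Toy circular pool: record_state advances last_index (wrapping at
    max_size); record_action/record_reward attach to the last state."""
    def __init__(self, max_size):
        self.max_size = max_size
        self.states = [None] * max_size
        self.terminals = [0] * max_size
        self.actions = [None] * max_size
        self.rewards = [None] * max_size
        self.last_index = -1

    def record_state(self, s, terminal=False):
        self.last_index = (self.last_index + 1) % self.max_size
        self.states[self.last_index] = s
        self.terminals[self.last_index] = 1 if terminal else 0
        return self

    def record_action(self, a):
        self.actions[self.last_index] = a  # action taken in the last state
        return self

    def record_reward(self, r):
        self.rewards[self.last_index] = r  # reward for the last action
        return self
```

Returning self allows the chained usage pool:record_state(s):record_action(a):record_reward(r) that the Lua API suggests.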
sample ( [batch_size=1] )
    Return samples from the experience pool.
    Returned samples never start with a terminal state.
    Parameters:
    - batch_size (integer): Number of samples to return (default 1)
    Returns:
    - torch.Tensor: batch_size states
    - torch.Tensor: batch_size actions
    - torch.Tensor: batch_size rewards
    - torch.Tensor: batch_size final states
    - torch.Tensor: batch_size final-state terminal signals
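The no-terminal-start guarantee can be obtained by rejection sampling over the terminals array (as _get_sampling_index below suggests). A hedged Python sketch of that idea:

```python
import random

def get_sampling_index(terminals, rng=random):
    """Rejection-sample an index whose state is non-terminal
    (terminals[i] == 0), so a sampled transition never starts
    from a terminal state."""
    while True:
        i = rng.randrange(len(terminals))
        if terminals[i] == 0:
            return i
```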
size ()
    Get the size of the experience pool.
    Todo: Check which size is actually returned (current, max?)
    Returns:
    - number: self.pool.max_size
Private Methods

_clean_states ()
    Remove useless states from the states hash map to reduce memory size.
    Returns:
    - self
_compute_history_offsets ()
    Fill in history_offsets.
    This function computes the offsets of the states to retrieve to build a full historic state. The offsets depend on history_type and history_spacing:
    - if history_type is 'linear': offset[i] = history_spacing * i, ∀ i ∈ [1, history_length-1]
    - if history_type is 'exp': offset[i] = history_spacing ^ i, ∀ i ∈ [1, history_length-1]
    Returns:
    - self
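The two offset formulas translate directly to a short Python sketch (a hypothetical free function rather than the Lua method):

```python
def compute_history_offsets(history_type, history_spacing, history_length):
    """Backward offsets of the past states making up a full historic state:
    linear: offset[i] = spacing * i;  exp: offset[i] = spacing ^ i,
    for i in [1, history_length - 1]."""
    if history_type == 'linear':
        return [history_spacing * i for i in range(1, history_length)]
    if history_type == 'exp':
        return [history_spacing ** i for i in range(1, history_length)]
    raise ValueError("unknown history_type: " + history_type)
```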
_convert_states ( states, f )
    Convert states according to the given function f.
    This is used to convert the states to GPU or CPU Tensors.
    Parameters:
    - states ({[number]=torch.Tensor}): States hash map
    - f (function): Function used to convert Tensors
    Returns:
    - {[number]=torch.Tensor}: A new hash map with converted Tensors
_get_sampling_index ()
    Return an index where to sample a non-terminal state.
    Returns:
    - number: Index of the sample in the pool
_shift_index ( index )
    Shift a given index according to last_index.
    This function is used to manage a circular memory. history_offsets are computed for a 0-indexed array, while the Pool arrays are circular and use last_index to point to the initial element. A small computation is therefore needed to shift the offsets dynamically.
    Parameters:
    - index (number): 0-based index
    Returns:
    - number: A last_index-based index
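The shift amounts to anchoring the backward index at last_index and wrapping modulo max_size. A 0-based Python sketch of that computation (the Lua pool is 1-based, so this is an analogue rather than a transcription):

```python
def shift_index(index, last_index, max_size):
    """Map a 0-based backward index (0 = most recent entry) onto the
    circular pool array anchored at last_index."""
    return (last_index - index) % max_size
```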