Arcades

Class NeuralQLearner

DQN-based agent.

Inherits from:
  • BaseAgent

This agent combines a deep neural network with Q-learning, as described in the Nature letter:
"Human-level control through deep reinforcement learning" (Mnih et al.)

Info:
  • ClassArgument

    Arguments used to instantiate a class.

    Fields:
    • string class
      Name of the class to instantiate
    • table params
      Parameters of the class (see class documentation)
  • Dump

    Dump extracted from a NeuralQLearner. All Tensors are converted to the default type to avoid GPU incompatibilities.

  • InferenceNetwork

    Main deep neural network. This network is used to get the best action given a history of preprocessed states.

    Fields:
    • nn.Module network
      The actual network
    • table input_size
      The size of the input {d, w, h}
    • table output_size
      The size of the output {w, h}
    • torch.Tensor parameters
      Flat view of learnable parameters
    • torch.Tensor grad_parameters
      Flat view of gradient of energy wrt the learnable parameters
  • InitArguments

    Table used as arguments for the DQN constructor (see the example after this list).

    Fields:
    • {int,...} observation_size
      Size of the observations from the environment {d, w, h}
    • table actions
      Available actions from the environment (actions[0] is noop)
    • ClassArgument preprocess
      Parameters of the network used to preprocess observations into states (optional)
    • ClassArgument inference
      Parameters of the network used for inference
    • _ExperiencePool.InitArguments experience_pool
      Parameters of the memory of the agent
    • number learn_start
      Number of steps after which learning starts (default 0)
    • number update_freq
      Learning frequency (epoch size) (default 1)
    • number minibatch_size
      Number of samples drawn for each learning minibatch (default 1)
    • number n_replay
      Number of minibatch updates per learning epoch (default 1)
    • boolean rescale_r
      Scale rewards (default false)
    • number max_reward
      Reward maximum value clipping (optional)
    • number min_reward
      Reward minimum value clipping (optional)
    • number ep_start
      Initial value of epsilon (default 1)
    • number ep_end
      Final value of epsilon (default ep_start)
    • number ep_endt
      Epsilon annealing time (default 1000000)
    • number ep_eval
      Epsilon value when evaluating (default 0.01)
    • number lr
      Learning rate (default 0.001)
    • number discount
      Q-learning discount factor (0 < x < 1) (default 0.99)
    • number clip_delta
      Clipping value for delta (default nil)
    • number target_q
      How long a target network is valid (default nil)
    • number wc
      L2 weight cost (default 0)
    • RMSPropArgument rmsprop
      Pre-initialized RMSProp arguments (default {})
  • PreprocessNetwork

    Preprocessing network. This network is used to preprocess an observation from the environment into a simpler state.

    Fields:
    • nn.Module network
      The actual network
    • table input_size
      The size of the input {d, w, h}
    • table output_size
      The size of the output {d, w, h}
  • RMSPropArgument

    Parameters for the RMSProp implementation (see the update sketch after this list).

    Todo:
    • Implement gradient descent in sub-classes?
    Fields:
    • torch.Tensor mean_square
      Accumulated average of the squared gradient
    • torch.Tensor mean
      Accumulated average of the gradient
    • number decay
      Decay factor of the means
    • number mu
      Smoothing term
  • function

    _convert_tensor

    Function to convert tensors if necessary.

    This function must be called to convert tensors/networks to the appropriate format (CUDA or the default tensor type) to avoid computation errors caused by inconsistent types.

  • number

    clip_delta

    Clipping value for differences between expected Q-Value and actual Q-Value.

  • number

    discount

    Q-learning discount factor (0 < x < 1).

  • number

    ep

    Current epsilon value.

  • number

    ep_end

    Final value of epsilon.

  • number

    ep_endt

    Epsilon annealing time.

  • number

    ep_eval

    Epsilon value when evaluating.

  • number

    ep_start

    Initial value of epsilon.

  • agent._ExperiencePool

    experience_pool

    Experience pool recording interactions.

    This experience pool will act as a memory for the agent, recording interactions, and returning them when necessary (e.g. when learning).

  • number

    experienced_steps

    Number of perceived states.

  • InferenceNetwork

    inference

    Main neural network.

  • number

    learn_start

    Number of steps after which learning starts. This delay lets the experience pool fill up before learning begins.

  • number

    learning_epoch

    Number of times the agent has learned.

  • number

    lr

    Learning rate.

  • number

    max_reward

    Reward maximum value clipping.

  • number

    min_reward

    Reward minimum value clipping.

  • number

    minibatch_size

    Number of samples drawn for each learning minibatch.

  • number

    n_replay

    Number of minibatch updates per learning epoch.

  • PreprocessNetwork

    preprocess

    Preprocessing network.

  • number

    r_max

    Maximal encountered reward for reward scaling.

    See also:
    • rescale_r
  • boolean

    rescale_r

    Whether to rescale rewards (see r_max).

  • RMSPropArgument

    rmsprop

    Parameters for RMSProp implementation.

    Todo:
    • Use dedicated classes for GD implementations.
  • nn.Container

    target_network

    Current target network.

  • number

    target_q

    How long a target network is valid.

    If not nil a target network will be used during Q-Learning to improve the algorithm convergence.

  • number

    update_freq

    Learning frequency (epoch size).

  • number

    wc

    L2 weight cost.

  • __init ( args[, dump={}] )

    Default constructor.

    Parameters:
    • InitArguments args
      Constructor arguments (see InitArguments above)
    • Dump dump
      Previously extracted dump used to restore the agent (default {})

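The sketch below shows how an InitArguments table may look. Only fields documented above are used; all concrete values, the class names inside the two ClassArgument entries, and the helper variables (available_actions, pool_args) are illustrative placeholders rather than values shipped with the library.

    -- Placeholder values for illustration; see the field descriptions above.
    local init_args = {
      observation_size = {3, 210, 160},        -- {d, w, h} of raw observations
      actions          = available_actions,    -- environment actions (actions[0] is noop)
      preprocess       = {class = "Downscale", params = {width = 84, height = 84}},  -- ClassArgument (placeholder class)
      inference        = {class = "ConvNet",   params = {n_outputs = 512}},          -- ClassArgument (placeholder class)
      experience_pool  = pool_args,            -- _ExperiencePool.InitArguments
      learn_start      = 50000,
      update_freq      = 4,
      minibatch_size   = 32,
      n_replay         = 1,
      rescale_r        = false,
      max_reward       = 1,
      min_reward       = -1,
      ep_start         = 1,
      ep_end           = 0.1,
      ep_endt          = 1000000,
      ep_eval          = 0.01,
      lr               = 0.00025,
      discount         = 0.99,
      clip_delta       = 1,
      target_q         = 10000,
      wc               = 0,
      rmsprop          = {},
    }
    local agent = NeuralQLearner(init_args)    -- constructor documented as __init(args[, dump])
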
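As a rough illustration of how the RMSPropArgument fields fit together, a centered RMSProp step over the flat parameter and gradient views could look like the sketch below; this is the textbook rule, not necessarily the library's exact code.

    -- mean, mean_square, decay and mu are the RMSPropArgument fields above;
    -- parameters and grad_parameters are the flat views held by InferenceNetwork.
    mean:mul(decay):add(1 - decay, grad_parameters)                              -- E[g]
    mean_square:mul(decay):addcmul(1 - decay, grad_parameters, grad_parameters)  -- E[g^2]
    local rms = (mean_square - torch.cmul(mean, mean)):add(mu):sqrt()            -- sqrt(E[g^2] - E[g]^2 + mu)
    parameters:addcdiv(-lr, grad_parameters, rms)                                -- p <- p - lr * g / rms
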
Public Methods

  • dump ( cycles )

    Dump the agent.

    Redundant information (such as network parameters) is removed to save space, and tensors are converted to the default type to avoid GPU incompatibilities (see the usage sketch after this list).

    Parameters:
    • table cycles
      Set of already dumped objects
    Returns:
    • Dump A reloadable dump
  • evaluate ()

    Put the agent in an evaluation mode.

    The agent will use a separate experience pool and a dedicated epsilon value.

    Overrides: BaseAgent:evaluate

    Returns:
    • self
  • get_action ()

    Return the action to perform.

    Returns:
  • get_experienced_interactions ()

    Return how many interactions the agent has experienced.

    Returns:
    • number Number of interactions done
  • get_learned_epoch ()

    Return how many times the agent actually learned from its experience.

    Returns:
    • number Number of times the agent learned
  • give_reward ( reward )

    Reward or punish the agent.

    Parameters:
    • number reward
      Reward if positive, punishment if negative
    Returns:
    • self
  • integrate_observation ( state )

    Integrate current observation from the environment.

    Parameters:
    • state The current state of the environment
      • torch.Tensor observation
        The actual observations
      • boolean terminal
        Is this state terminal?
    Returns:
    • self
  • training ()

    Put the agent in a training mode.

    This is the default mode of an agent.

    Overrides: BaseAgent:training

    Returns:
    • self
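
A short sketch of switching between modes and dumping the agent is given below; the torch.save call and the empty cycles table are illustrative assumptions about typical use, not requirements of the API.

    agent:evaluate()                        -- separate experience pool, dedicated epsilon
    -- ... run evaluation episodes ...
    agent:training()                        -- back to the default mode

    local d = agent:dump({})                -- {}: no objects dumped yet
    torch.save('neural_q_learner.t7', d)    -- dump tensors use the default type, so reloading needs no GPU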

Private Methods

  • _eGreedy ( state )

    Get the action to execute according to an epsilon-greedy policy (see the sketch at the end of this section).

    Todo:
    • Use dedicated classes for strategies.
    Parameters:
    • state
      Current state
    Returns:
  • _getQUpdate ( args )

    Compute expected action-values as targets for the neural network (see the sketch at the end of this section).

    Parameters:
    • args An interaction sample
      • s
        Initial state
      • a
        Action
      • r
        Reward
      • s2
        Final state
      • t
        Is state final?
    Returns:
  • _greedy ( state )

    Get the action to execute according to a greedy policy.

    Parameters:
    • state
      Current state
    Returns:
  • _init_inference_network ( args[, dump={}] )

    Initialize the main inference network.

    Parameters:
  • _init_preprocessing_network ( args, observation_size )

    Initialize the preprocessing network.

    Parameters:
  • _learn ()

    Learn from the past experiences.

    This does nothing if the agent is in evaluation mode.

    Returns:
    • self
  • _qLearnMinibatch ()

    Apply Q-Learning on a minibatch.
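
For reference, the epsilon-greedy policy presumably anneals epsilon linearly from ep_start to ep_end over ep_endt experienced steps (ep_eval replaces it during evaluation) and then explores with that probability. A sketch of the idea, not the exact implementation:

    -- Illustrative epsilon-greedy step; n_actions and greedy() stand for the
    -- agent's action count and its greedy policy, and are placeholders here.
    local frac = math.min(experienced_steps, ep_endt) / ep_endt
    local ep   = ep_start + (ep_end - ep_start) * frac

    local action
    if torch.uniform() < ep then
      action = torch.random(1, n_actions)   -- explore
    else
      action = greedy(state)                -- exploit the inference network
    end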
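
The targets built by _getQUpdate follow the standard Q-learning backup from the Nature letter. Sketched for a single sample (the library works on minibatches), with max_q2 standing for the target network's best action-value in s2 and q_s_a for the current estimate of Q(s, a):

    -- y = r + (1 - t) * discount * max_a' Q_target(s2, a'); terminal states keep only the reward
    local y = r + ((t and 0) or discount * max_q2)

    -- TD error, optionally clipped to [-clip_delta, clip_delta] before backpropagation
    local delta = y - q_s_a
    if clip_delta then
      delta = math.max(-clip_delta, math.min(clip_delta, delta))
    end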