Open-source project: RLzoo
Repository: https://gitee.com/TensorLayer/RLzoo

Reinforcement Learning Zoo

RLzoo is a collection of the most practical reinforcement learning algorithms, frameworks and applications. It is implemented with TensorFlow 2.0 and the neural network layer API of TensorLayer 2, to provide a hands-on, fast-development approach to reinforcement learning practice and benchmarks. It supports basic toy tests such as OpenAI Gym and the DeepMind Control Suite with very simple configuration. Moreover, RLzoo supports the robot learning benchmark environment RLBench, based on the V-REP/PyRep simulator. Other large-scale distributed training frameworks for more realistic scenarios, with Unity 3D, MuJoCo, Bullet Physics, etc., will be supported in the future. A Springer textbook is also provided; you can get the PDF for free if your institution has a Springer license. In contrast to RLzoo, which offers simple usage through high-level APIs, we also have an RL tutorial that aims to make reinforcement learning simple, transparent and straightforward with low-level APIs; this not only benefits new learners of reinforcement learning, but also makes it convenient for senior researchers to test their new ideas quickly.
Please check our online documentation. We suggest users report bugs via GitHub issues. Users can also discuss how to use RLzoo in our Slack channel.

Table of contents:

Status: Release

Current status [click to expand]

We are currently open to any suggestions or pull requests from the community to make RLzoo a better repository. Given the scope of this project, we expect there could be some issues over the coming months after the initial release. We will keep fixing potential problems and commit when significant changes are made. The current default hyperparameters for each algorithm and environment may not be optimal, so feel free to tune them to achieve the best performance. We will release a version with optimal hyperparameters and benchmark results for all algorithms in the future.

Version History [click to expand]
Installation

Ensure that you have Python >= 3.5 (Python 3.6 is required for the DeepMind Control Suite).

Direct installation:

```
pip3 install rlzoo --upgrade
```

Install RLzoo from Git:

```
git clone https://github.com/tensorlayer/RLzoo.git
cd RLzoo
pip3 install .
```
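If the installation succeeded, a quick import check should pass. This is a minimal sketch; it only verifies that the helper modules used in the examples below are importable.

```python
# sanity check after installation: import the helpers used throughout this README
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params

print('RLzoo imports OK')
```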
Prerequisites

List of prerequisites. [click to expand]
Usage

For detailed usage, please check our online documentation.

Quick Start

Choose any environment with any RL algorithm supported in RLzoo, and enjoy the game by running the following example in the root folder of the installed package:

```
# in the root folder of the rlzoo package
cd RLzoo
python run_rlzoo.py
```

What's in `run_rlzoo.py`:

```python
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3  # import the algorithm to use

# choose an algorithm
AlgName = 'TD3'
# choose an environment
EnvName = 'Pendulum-v0'
# select a corresponding environment type
EnvType = 'classic_control'
# build an environment with wrappers
env = build_env(EnvName, EnvType)
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
# instantiate the algorithm
alg = eval(AlgName + '(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)
# test after training
alg.learn(env=env, mode='test', render=True, **learn_params)
```

The main script

General description: RLzoo provides at least two types of interfaces for running the learning algorithms, with (1) implicit configurations or (2) explicit configurations. Both of them start the learning program by running a Python script, instead of running a long command line with all configurations shortened into its arguments (as in, e.g., OpenAI Baselines). We find this approach more interpretable, flexible and convenient to apply in practice. According to the level of explicitness of the learning configurations, we provide two different ways of setting them in Python scripts: the first one, with implicit configurations, relies on the default parameters shipped with each algorithm; the second one spells all parameters out in the main script.

1. Implicit Configurations [click to expand]

RLzoo with implicit configurations means the configurations for learning are not explicitly contained in the main script for running (i.e. `run_rlzoo.py`).

Common interface:

```python
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import *

# choose an algorithm
AlgName = 'TD3'
# choose an environment
EnvName = 'Pendulum-v0'
# select a corresponding environment type
EnvType = ['classic_control', 'atari', 'box2d', 'mujoco', 'robotics', 'dm_control', 'rlbench'][0]
# build an environment with wrappers
env = build_env(EnvName, EnvType)
# call default parameters for the algorithm and learning process
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
# instantiate the algorithm
alg = eval(AlgName + '(**alg_params)')
# start the training
alg.learn(env=env, mode='train', render=False, **learn_params)
# test after training
alg.learn(env=env, mode='test', render=True, **learn_params)
```

```
# in the root folder of the rlzoo package
cd rlzoo
python run_rlzoo.py
```

2. Explicit Configurations [click to expand]

RLzoo with explicit configurations means the configurations for learning, including parameter values for the algorithm and the learning process, the network structures used in the algorithms, the optimizers, etc., are explicitly displayed in the main script for running.
The main scripts for demonstration are under the folder of each algorithm, for example, `algorithms/<ALGORITHM_NAME>/run_<ALGORITHM_NAME>.py`.

A Quick Example

```python
import gym
import tensorflow as tf  # needed for the name scopes and optimizers below

from rlzoo.common.utils import make_env, set_seed
from rlzoo.algorithms import AC
from rlzoo.common.value_networks import ValueNetwork
from rlzoo.common.policy_networks import StochasticPolicyNetwork

''' load environment '''
env = gym.make('CartPole-v0').unwrapped
obs_space = env.observation_space
act_space = env.action_space
# reproducible
seed = 2
set_seed(seed, env)

''' build networks for the algorithm '''
num_hidden_layer = 4  # number of hidden layers for the networks
hidden_dim = 64  # dimension of hidden layers for the networks
with tf.name_scope('AC'):
    with tf.name_scope('Critic'):
        # choose the critic network, can be replaced with a customized network
        critic = ValueNetwork(obs_space, hidden_dim_list=num_hidden_layer * [hidden_dim])
    with tf.name_scope('Actor'):
        # choose the actor network, can be replaced with a customized network
        actor = StochasticPolicyNetwork(obs_space, act_space,
                                        hidden_dim_list=num_hidden_layer * [hidden_dim],
                                        output_activation=tf.nn.tanh)
net_list = [actor, critic]  # list of the networks

''' choose optimizers '''
a_lr, c_lr = 1e-4, 1e-2  # a_lr: learning rate of the actor; c_lr: learning rate of the critic
a_optimizer = tf.optimizers.Adam(a_lr)
c_optimizer = tf.optimizers.Adam(c_lr)
optimizers_list = [a_optimizer, c_optimizer]  # list of optimizers

# initialize the algorithm model, with algorithm parameters passed in
model = AC(net_list, optimizers_list)
''' full list of arguments for the algorithm
----------------------------------------
net_list: a list of networks (value and policy) used in the algorithm, from common functions or customization
optimizers_list: a list of optimizers for all networks and differentiable variables
gamma: discount factor of reward
action_range: scale of action values
'''

# start the training process, with learning parameters passed in
model.learn(env, train_episodes=500, max_steps=200, save_interval=50, mode='train', render=False)
''' full list of parameters for training
---------------------------------------
env: learning environment
train_episodes: total number of episodes for training
test_episodes: total number of episodes for testing
max_steps: maximum number of steps for one episode
save_interval: time steps for saving the weights and plotting the results
mode: 'train' or 'test'
render: if true, visualize the environment
'''

# test after training
model.learn(env, test_episodes=100, max_steps=200, mode='test', render=True)
```

In the package folder, we provide examples with explicit configurations for each algorithm:

```
# in the root folder of the rlzoo package
cd rlzoo
python algorithms/<ALGORITHM_NAME>/run_<ALGORITHM_NAME>.py
# for example: run actor-critic
python algorithms/ac/run_ac.py
```

Interactive Configurations

We also provide an interactive learning configuration with Jupyter Notebook and ipywidgets, where you can select the algorithm, environment, and general learning settings by simply clicking on dropdown lists and sliders! A video demonstrating the usage is as follows. The interactive mode can be used with the provided Jupyter notebook.

Contents

Algorithms [click to expand]

Environments [click to expand]
Some notes on environment usage. [click to expand]
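As a hedged illustration of the notes above, the sketch below builds two classic-control environments with `build_env`, the helper used throughout the Usage section. `Pendulum-v0` has a continuous action space while `CartPole-v0` has a discrete one; as described under Properties below, the common policy networks adapt to either automatically. Environment availability depends on the prerequisites you installed.

```python
from rlzoo.common.env_wrappers import build_env

# continuous action space
env_cont = build_env('Pendulum-v0', 'classic_control')
print(env_cont.observation_space, env_cont.action_space)  # Box observation, Box action

# discrete action space; the common policy networks pick a suitable
# output head automatically based on the action space shape
env_disc = build_env('CartPole-v0', 'classic_control')
print(env_disc.observation_space, env_disc.action_space)  # Box observation, Discrete action
```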
Configurations

The supported configurations of RL algorithms with corresponding environments in RLzoo are listed in the following table.
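Whichever combination you pick, the default hyperparameters may not be optimal (see the Status section above). Below is a minimal sketch of overriding them, assuming `call_default_params` returns plain keyword-argument dicts, as the `**alg_params` / `**learn_params` usage above suggests; the `'train_episodes'` key is an assumption borrowed from the explicit example.

```python
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3

env = build_env('Pendulum-v0', 'classic_control')
alg_params, learn_params = call_default_params(env, 'classic_control', 'TD3')

# inspect the defaults, then override before training
print(alg_params.keys(), learn_params.keys())
learn_params['train_episodes'] = 1000  # assumed key, shown for illustration

alg = TD3(**alg_params)
alg.learn(env=env, mode='train', render=False, **learn_params)
```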
Properties

1. Automatic model construction [click to expand]

We aim to make it easy to configure all components of RL, including replacing the networks, optimizers, etc. We also provide automatically adaptive policies and value functions in the common functions: for the observation space, vector states and raw-pixel (image) states are supported automatically according to the shape of the space; for the action space, discrete and continuous actions are likewise supported automatically according to the shape of the space. Whether the policy is deterministic or stochastic needs to be chosen according to each algorithm. Some environments with raw-pixel observations (e.g. Atari, RLBench) may be hard to train; be patient and play around with the hyperparameters!

2. Simple and flexible API [click to expand]

As described in the Usage section, we provide at least two ways of deploying RLzoo: with an implicit or an explicit configuration process. This design ensures maximum flexibility for different use cases.

3. Sufficient support for DRL algorithms and environments [click to expand]

As shown in the algorithm and environment tables above.

4. Interactive reinforcement learning configuration [click to expand]

As shown in the interactive use case in the Usage section, a Jupyter notebook is provided for configuring the whole learning process more intuitively.

Troubleshooting
Credits

Our core contributors include: Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Hao Dong.

Citing

```
@misc{RLzoo,
  author = {Zihan Ding and Tianyang Yu and Yanhua Huang and Hongming Zhang and Hao Dong},
  title = {Reinforcement Learning Algorithms Zoo},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tensorlayer/RLzoo}},
}
```

Other Resources