[ot][spam][crazy] lab1 was: draft: learning RL

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Mon May 9 06:31:48 PDT 2022


On Mon, May 9, 2022, 9:29 AM Undiscussed Horrific Abuse, One Victim of Many
<gmkarl at gmail.com> wrote:

> the lab says huggingface's model hub, which I mostly use as a remote
> server to store pretrained language models, and which sends data to my
> government on when and where I use them, now has deep reinforcement
> learning models available at
> https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=downloads
>
> Here's the import code, retyped:
>
> import gym
>
> from huggingface_sb3 import load_from_hub, package_to_hub, push_to_hub
> # notebook_login is for uploading to your account from a notebook
> from huggingface_hub import notebook_login
>
> from stable_baselines3 import PPO
> from stable_baselines3.common.evaluation import evaluate_policy
> from stable_baselines3.common.env_util import make_vec_env
>
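> As a sketch of what those hub imports are for (the repo id and filename
> here are one example from that models page, not something the lab
> specifies), loading a pretrained agent looks roughly like:
>
> checkpoint = load_from_hub(
>     repo_id='sb3/ppo-LunarLander-v2',    # example repo on the hub
>     filename='ppo-LunarLander-v2.zip',   # the saved stable-baselines3 model
> )
> model = PPO.load(checkpoint)
>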
> Of course, uploading to the hub is possibly a very bad idea unless you are
> an experienced activist or researcher or spy, or have
>

to clarify here, by "experienced" I mean "already pwned to heck by
everyone else".

> something important to share with your government or huggingface, or are
> only doing this casually and might get a job in it one day.
>
> The lab then provides an intro to Gym, a python library that openai
> made. In the opinion of my pessimistic half, it has the effect of
> making it hard to move technologies out of research, by verbosifying
> the construction of useful environments under the assumption that they
> are only for testing model architectures.
>
> The lab says Gym is used a lot, and provides:
> - an interface to create RL environments
> - a collection of environments
>
> This is true.
>
> They visually redescribe that an agent performs actions in an
> environment, which then returns reward and state to the agent.
>
> This coupling of reward with the environment, rather than with the
> agent, which would usually have goals itself, is possibly part of the
> verbosifying. Maybe "environment" is more like "environment interface";
> I'm actually having trouble thinking here. I always get confused around
> gym environments. Maybe jocks make better programmers nowadays.
>
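> To make that coupling concrete, here is a minimal sketch of the
> environment interface (an illustrative toy of mine, not from the lab):
> the reward is computed inside step(), not by the agent.
>
> import gym
> import numpy as np
>
> class CoinFlipEnv(gym.Env):
>     # the environment, not the agent, defines the reward
>     action_space = gym.spaces.Discrete(2)       # guess: 0 heads, 1 tails
>     observation_space = gym.spaces.Discrete(1)  # nothing useful to observe
>
>     def reset(self):
>         self._coin = np.random.randint(2)
>         return 0                                # dummy observation
>
>     def step(self, action):
>         reward = 1.0 if action == self._coin else 0.0
>         done = True                             # one guess per episode
>         return 0, reward, done, {}              # observation, reward, done, info
>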
> Reiteration:
>
> - The agent receives state S0 from the environment
> - Based on S0, the agent takes action A0
> - The environment transitions to a new frame, state S1
> - The environment gives reward R1 to the agent
>
> Steps of using Gym:
> - create environment using gym.make()
> - reset environment to initial state with observation = env.reset()
>
> At each step:
> - get an action using policy model
> - using env.step(action), get from the environment: observation (the new
> state), reward, done (whether the episode terminated), info (an
> additional-info dict)
>
> If the episode is done, the environment is reset to its initial state
> with observation = env.reset().
>
> This is very normative openai stuff that looks like it was read off a Gym
> example from their readme or such.
>
> It's interesting that huggingface is building their own libraries to
> pair with this course as it progresses. I wonder if some of that
> normativeness will shift even further toward increased utility.
>
> Here's a retype of the first example code:
>
> import gym
>
> # create environment
> env = gym.make('LunarLander-v2')
>
> # reset environment
> observation = env.reset()
>
> for _ in range(20):
>   # take random action
>   action = env.action_space.sample()
>   print('Action taken:', action)
>
>   # do action and get next state, reward, etc
>   observation, reward, done, info = env.step(action)
>
>   # if the game is done (land, crash, timeout)
>   if done:
>     # reset
>     print('Environment is reset')
>     observation = env.reset()
>
>
>
>
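> The stable_baselines3 imports at the top (PPO, evaluate_policy,
> make_vec_env) hint at where the lab goes next. A minimal sketch of
> training and evaluating an agent (the timestep budget and n_envs are my
> guesses, not the lab's numbers):
>
> # vectorized envs collect rollouts in parallel
> env = make_vec_env('LunarLander-v2', n_envs=16)
>
> model = PPO('MlpPolicy', env, verbose=1)
> model.learn(total_timesteps=100_000)  # guessed budget; a real run may need more
>
> mean_reward, std_reward = evaluate_policy(model, model.get_env(),
>                                           n_eval_episodes=10)
> print('mean reward:', mean_reward, '+/-', std_reward)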