Posts by Tags

machine learning

open source motion tracking

Early in my PhD I developed an open-source motion tracking system for mice. With the KineMouse wheel, neuroscientists can reconstruct 3D pose while recording neural activity. The Hackaday protocol describes how to build the system. This supplement contains additional info for motion tracking aficionados. Please be nice to your mice ❤️🐭❤️.

reinforcement learning (4/4): policy gradient

In parts 1-3 we found that learning the values of different states (or state-action pairs) made it easy to define good policies; we simply selected high-valued states and actions. Policy gradient methods use a different approach: learn policies directly by optimizing their parameters to maximize reward. These techniques allow us to tackle more interesting problems consisting of large or continuous action and state spaces. The math is a bit heavier 🤓, but so is the payoff.
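To make "optimizing policy parameters directly" concrete, here is a minimal sketch of a REINFORCE-style update on a toy multi-armed bandit. Everything in it is an illustrative assumption rather than code from the post: the softmax policy, the Gaussian rewards, the running-average baseline, and the names `softmax` and `reinforce_bandit`.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=3000, lr=0.1, seed=0):
    """Learn a softmax policy over bandit arms with a policy gradient update.

    arm_means is a hypothetical list of mean rewards, one per arm.
    Returns the learned action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)  # policy parameters (action preferences)
    baseline = 0.0                  # running average reward, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # For a softmax policy, d/d theta_a of log pi(action)
        # is 1[a == action] - pi(a).
        for a in range(len(theta)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit([0.0, 2.0]))
```

No value function is used to pick actions here; the parameters `theta` are nudged so that actions followed by above-average reward become more probable.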

reinforcement learning (3/4): temporal difference learning

In part 1 we discussed dynamic programming and Monte Carlo reinforcement learning algorithms. These appear to be qualitatively different approaches: whereas dynamic programming is model-based and relies on bootstrapping, Monte Carlo is model-free and relies on sampling environment interactions. However, these approaches can be thought of as two extremes on a continuum defined by the degree of bootstrapping vs. sampling. Temporal difference learning is a model-free approach that splits the difference between the two by using both bootstrapping and sampling to learn online.
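As a rough illustration of combining sampling with bootstrapping, the sketch below runs tabular TD(0) on a small random walk. The environment (five states, reward 1 for exiting on the right), the step size, and the name `td0_random_walk` are assumptions chosen for the example, not details from the post.

```python
import random


def td0_random_walk(n_states=5, episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of a symmetric random walk with TD(0).

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    Exiting on the right gives reward 1, every other transition gives 0.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * n_states + [0.0]  # terminal values stay 0
    for _ in range(episodes):
        state = (n_states + 1) // 2  # start in the middle
        while 0 < state < n_states + 1:
            # Sampling: take one real step in the environment.
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            # Bootstrapping: the target uses the current estimate of the
            # next state's value rather than the full observed return.
            target = reward + gamma * values[next_state]
            values[state] += alpha * (target - values[state])
            state = next_state
    return values[1:-1]


if __name__ == "__main__":
    print(td0_random_walk())
```

Each update happens immediately after a single transition, which is what lets the estimate improve online instead of waiting for the episode to end as Monte Carlo would. For this walk the true values are 1/6, 2/6, ..., 5/6, so the estimates should land near those.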

reinforcement learning (2/4): value function approximation

The methods we discussed in part 1 are limited when state spaces are large and/or continuous. Value function approximation addresses this by using functions to approximate the relationship between states and their values. But how can we find the parameters $\mathbf{w}$ of our value function $\hat{v}(s, \mathbf{w})$? Gradient descent works nicely here, which gives us tons of flexibility in how we model value functions.
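One way to picture fitting $\mathbf{w}$ by gradient descent is the sketch below, which assumes a linear model $\hat{v}(s, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s)$ with features $\mathbf{x}(s) = [1, s]$ and noisy sampled returns as targets. The toy target function $v(s) = 1 + 2s$ and the name `fit_linear_value` are made up for the example.

```python
import random


def fit_linear_value(samples=20000, alpha=0.1, seed=0):
    """Fit v_hat(s, w) = w[0] + w[1] * s with stochastic gradient descent.

    Targets are noisy samples of a hypothetical true value v(s) = 1 + 2 s,
    standing in for returns observed from the environment.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(samples):
        s = rng.random()                          # state drawn from [0, 1)
        target = 1.0 + 2.0 * s + rng.gauss(0.0, 0.1)
        features = [1.0, s]                       # x(s)
        v_hat = sum(wi * xi for wi, xi in zip(w, features))
        error = target - v_hat
        # The gradient of 0.5 * error**2 with respect to w is
        # -error * x(s), so we step along +error * x(s).
        for i, xi in enumerate(features):
            w[i] += alpha * error * xi
    return w


if __name__ == "__main__":
    print(fit_linear_value())
```

Because only the gradient of $\hat{v}$ with respect to $\mathbf{w}$ is needed, the linear model could be swapped for any differentiable function, such as a neural network.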

open source

open source motion tracking


reinforcement learning

reinforcement learning (4/4): policy gradient


reinforcement learning (3/4): temporal difference learning


reinforcement learning (2/4): value function approximation
