okarthikb's site

Automatic differentiation in Python

Vanilla policy gradient

Deep Q-learning