okarthikb's site
Automatic differentiation in Python
Vanilla policy gradient
Deep Q-learning