Kavosh Asadi


Below is a distillation of my professional journey. You can also find my resume here.

2020-Now: Applied Scientist at Amazon

My work at Amazon can be divided into three separate tracks:

  1. I do first-principles scientific RL research. I have bootstrapped a research program at Amazon AI committed to understanding and improving RL optimization. In this work I collaborate closely with Shoham Sabach, a professor at Technion and a star in optimization. Our most representative papers in this space include a novel convergence theory for TD, an investigation of the benefits of resetting the optimizer in deep RL, and an exploration of proximal updates in value-function optimization. All three papers were published at NeurIPS.
  2. I have worked on applying RL to some of its most important applications, such as solving the human-alignment problem using RL from human feedback (RLHF). I have contributed to Amazon’s Bedrock project by training gigantic models that take human preferences into account when conversing with us.
  3. I have coauthored the RL chapter of the D2L book, which is spearheaded by Alex Smola. The book is extremely useful for students looking for a hands-on, practical learning experience that enables them to understand and implement some of the key ideas in today’s deep learning literature.

I have also co-mentored a couple of strong PhD interns. Some of our previous interns include Martin Klissarov, Zuxin Liu, Jesse Zhang, and Ming Yin.


2015-2020: PhD Student at Brown University (with 2 Internships at MSR)

While at Brown, I had the pleasure of studying the fundamentals of RL with one of the field’s most elite intellectuals, Michael Littman. Working with Michael, I studied the importance of smoothness in RL ingredients, such as softmax operators, transition models, and value-function architectures.

As a PhD student I also did two internships at MSR, where I primarily worked with Jason Williams, a pioneer in dialog systems. Together, we mainly explored applications of RL to dialog agents and language models.


2013-2015: Master’s Student at the University of Alberta

I learned the fundamentals of RL with function approximation under the founder of modern RL, Rich Sutton. I also worked closely with Rich’s then post-doc Joseph Modayil. My work was primarily focused on usefully combining model-based and model-free RL. Together with Rich and Joseph, we proposed the Cascade Architecture.



2008-2013: Undergraduate Student at the University of Tehran

I learned the basics of computer science and quickly developed a potent interest in AI. At the time, learning with supervision somehow felt like cheating to me, because I thought too much of a burden was placed on the human expert to provide supervision. In sharp contrast, the RL framework felt very natural to me. This led me to study the RL book as a sophomore. Having finished the book, I wrote to Rich asking him to take me on as his student, and the rest is history! I also somehow made it into the Errata and Notes of the 1st edition of the RL book.