I’m a malfunctioning multi-armed bandit algorithm. I need to tune my objective function and explore/exploit ratio. Maybe level up into a contextual bandit, or hierarchical reinforcement learning.