01 | 強化学習 | Reinforcement Learning
02 | エージェント(強化学習) | Agent
03 | 環境(強化学習) | Environment
04 | 状態(強化学習) | State
05 | 行動(強化学習) | Action
06 | 報酬(強化学習) | Reward
07 | 方策(Policy) | Policy
08 | 価値関数 | Value Function
09 | 状態価値関数 | State Value Function
10 | 行動価値関数(Q値) | Action Value Function / Q-Value
11 | ベルマン方程式 | Bellman Equation
12 | マルコフ決定過程(MDP) | Markov Decision Process
13 | 割引率 | Discount Factor
14 | 探索と活用 | Exploration vs Exploitation
15 | ε-greedy法 | Epsilon-Greedy
16 | Q学習 | Q-Learning
17 | SARSA | SARSA (State-Action-Reward-State-Action)
18 | DQN(Deep Q-Network) | Deep Q-Network
19 | 方策勾配法 | Policy Gradient
20 | REINFORCE | REINFORCE
21 | Actor-Critic | Actor-Critic
22 | A2C/A3C | Advantage Actor-Critic / Asynchronous Advantage Actor-Critic
23 | PPO | Proximal Policy Optimization
24 | SAC | Soft Actor-Critic
25 | TD学習 | Temporal Difference Learning
26 | モンテカルロ法(強化学習) | Monte Carlo Methods in RL
27 | 経験再生(Experience Replay) | Experience Replay
28 | 優先度付き経験再生 | Prioritized Experience Replay
29 | ターゲットネットワーク | Target Network
30 | 報酬設計 | Reward Design
31 | 報酬シェーピング | Reward Shaping
32 | 逆強化学習 | Inverse Reinforcement Learning
33 | 模倣学習 | Imitation Learning
34 | RLHF | Reinforcement Learning from Human Feedback
35 | DPO(強化学習) | Direct Preference Optimization
36 | オフライン強化学習 | Offline Reinforcement Learning
37 | モデルベース強化学習 | Model-Based Reinforcement Learning
38 | モデルフリー強化学習 | Model-Free Reinforcement Learning
39 | マルチエージェント強化学習 | Multi-Agent Reinforcement Learning
40 | 自己対戦 | Self-Play
41 | AlphaGo | AlphaGo
42 | AlphaZero | AlphaZero
43 | MuZero | MuZero
44 | OpenAI Gym | OpenAI Gym / Gymnasium
45 | シミュレーション環境 | Simulation Environment
46 | ロボット制御 | Robot Control
47 | 自律走行 | Autonomous Driving
48 | マニピュレーション | Manipulation
49 | Sim-to-Real | Sim-to-Real Transfer
50 | ドメインランダマイゼーション | Domain Randomization
51 | 階層型強化学習 | Hierarchical Reinforcement Learning
52 | カリキュラム学習 | Curriculum Learning
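Several of the core entries above (agent, environment, state, action, reward, ε-greedy, discount factor, TD learning, Q-learning) fit together in one short loop. The following is a minimal sketch of tabular Q-learning under stated assumptions. The 5-state corridor environment, its reward of 1 at the goal, the helper names `step`, `epsilon_greedy` and `q_learning`, and the hyperparameter values are all hypothetical choices for illustration. None of them come from the glossary.

```python
import random

# Hypothetical toy environment (an illustrative assumption, not from the glossary):
# a 1-D corridor with states 0..4. State 4 is terminal and entering it yields reward 1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the current Q estimates."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Break ties randomly so an all-zero Q table does not always pick the same action.
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; alpha = step size, gamma = discount factor."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # TD target r + gamma * max_a' Q(s', a'); a terminal state adds no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy over the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy_policy)  # expected: [1, 1, 1, 1] (always move right)
```

The update line implements the Q-learning rule Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. It is off-policy because the target uses the greedy maximum, not the action the ε-greedy behaviour policy actually takes next. If the max were replaced with the Q-value of that next ε-greedy action, the same loop would become SARSA.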