MARL Partially Observable - Artificial Intelligence Glossary

Term

POMDP (Partially Observable Markov Decision Process)

Theoretical framework modeling environments where the agent perceives only a partial observation of the true state, requiring probabilistic inference about the hidden state to make optimal decisions.

Term

Observation Space

Set of partial sensory signals that each agent can perceive from the environment, representing incomplete information about the global state of the system.

Term

Belief State

Probability distribution over the hidden state space that an agent maintains and updates from its successive observations to represent its uncertainty about the true state of the environment.
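
The belief update described above can be sketched as a discrete Bayes filter. This is a minimal illustration, assuming the agent knows a transition model and an observation model; the function name `update_belief`, the dictionary layouts, and the two-state "tiger behind a door" example are hypothetical choices made for this sketch, not part of the glossary entry.

```python
def update_belief(belief, action, observation, transition, observation_model):
    """One step of a discrete Bayes filter (illustrative sketch).

    belief:            dict state -> probability
    transition:        dict (state, action) -> dict next_state -> probability
    observation_model: dict (next_state, action) -> dict observation -> probability
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction: probability of reaching s_next under the current belief.
        predicted = sum(
            belief[s] * transition[(s, action)].get(s_next, 0.0) for s in states
        )
        # Correction: weight by how likely the received observation is in s_next.
        likelihood = observation_model[(s_next, action)].get(observation, 0.0)
        unnormalized[s_next] = likelihood * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: listening does not change the hidden state,
# and the agent hears the correct side 85% of the time.
transition = {
    ("left", "listen"): {"left": 1.0},
    ("right", "listen"): {"right": 1.0},
}
observation_model = {
    ("left", "listen"): {"hear_left": 0.85, "hear_right": 0.15},
    ("right", "listen"): {"hear_left": 0.15, "hear_right": 0.85},
}
prior = {"left": 0.5, "right": 0.5}
after_one = update_belief(prior, "listen", "hear_left", transition, observation_model)
after_two = update_belief(after_one, "listen", "hear_left", transition, observation_model)
print(after_one["left"], after_two["left"])  # roughly 0.85, then roughly 0.97
```

Each consistent observation sharpens the distribution, which is how the belief comes to summarize the agent's observation history.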

Term

Communication Protocol

Mechanism defining when, how, and what information agents can exchange among themselves to coordinate their actions in a partially observable environment.

Term

Centralized Training with Decentralized Execution (CTDE)

Approach where agents train using global information (states, actions of all agents) but execute their policies individually using only their local observations.

Term

Value Function Factorization

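Technique decomposing the global value function into a combination of individual or local value functions (a plain sum in VDN, a monotonic mixing function in QMIX), enabling decentralized action selection while keeping each agent's greedy choice consistent with the greedy joint action.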
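
The additive case can be illustrated with a short sketch. It assumes the simplest factorization, a plain sum of per-agent Q-values for one fixed set of local observations; the names `joint_q` and `greedy_decentralized` and the numeric values are hypothetical. It shows why the decomposition supports decentralized execution: each agent maximizing its own term also maximizes the sum.

```python
from itertools import product


def joint_q(local_qs, joint_action):
    """Additive factorization: the joint value is the sum of local values."""
    return sum(q[a] for q, a in zip(local_qs, joint_action))


def greedy_decentralized(local_qs):
    """Each agent picks the argmax of its own local Q-values, independently."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_qs)


# Hypothetical local Q-values: agent 0 has 2 actions, agent 1 has 3 actions.
local_qs = [[1.0, 3.0], [0.5, -1.0, 2.0]]

decentralized = greedy_decentralized(local_qs)

# Centralized check: search every joint action for the best total value.
all_joint_actions = product(*(range(len(q)) for q in local_qs))
centralized = max(all_joint_actions, key=lambda ja: joint_q(local_qs, ja))

print(decentralized, centralized, joint_q(local_qs, decentralized))
# (1, 2) (1, 2) 5.0
```

The brute-force search grows exponentially with the number of agents, while the decentralized argmax grows linearly, which is the practical appeal of the factorization.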

Term

Adversary Modeling (Opponent Modeling)

Process of inferring the policies or intentions of other agents based on their observed behaviors, crucial for decision-making in competitive or cooperative environments.

Term

Credit Assignment Problem

Difficulty in correctly attributing the global reward to each agent in a multi-agent system, particularly complex when observations are partial and actions are interdependent.
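
One well-known way to approach this problem is the difference reward, which scores each agent by how much the global reward drops when that agent's action is replaced by a default. The sketch below is an illustration under stated assumptions, not a method named by this glossary: the coverage-style `global_reward`, the `None` default meaning "agent removed", and the example actions are all hypothetical, and the global reward function is assumed to be queryable (for instance during centralized training).

```python
def global_reward(joint_action):
    """Hypothetical team reward: number of distinct targets covered."""
    return len({a for a in joint_action if a is not None})


def difference_rewards(joint_action, reward_fn, default=None):
    """Per-agent credit: G(a) minus G with agent i's action replaced by a default."""
    total = reward_fn(joint_action)
    credits = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default  # pretend agent i had not acted
        credits.append(total - reward_fn(counterfactual))
    return credits


# Agents 0 and 1 both cover target "A"; agent 2 alone covers target "B".
credits = difference_rewards(["A", "A", "B"], global_reward)
print(credits)  # [0, 0, 1]
```

All three agents receive the same global reward of 2, yet only agent 2's contribution is irreplaceable, which the per-agent credits make visible.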

Term

Joint Action Learning

Method where agents learn to coordinate their actions by explicitly modeling the impact of combined actions on the global reward, despite partial observability.

Term

State Estimation

Algorithmic process allowing an agent to infer the most probable global state from its local observations and its model of the environment.

Term

Information Sharing

Strategy defining how agents distribute and aggregate their local observations to improve the collective knowledge of the environment's state.

Term

Local Observation History

Temporal sequence of an agent's past observations, used as additional context to compensate for the lack of information about the current global state.

Term

Multi-agent Partial Observability

Condition where no individual agent can observe the complete state of the system, requiring coordination and inference strategies to achieve optimal performance.

Term

Decentralized Policy

Decision function for each agent that maps its local observation history to an action, without direct dependence on other agents' information during execution.
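
A minimal sketch of such a policy is shown below. It assumes a tabular policy and truncates the history to a fixed window, which is a simplification of the full history in the definition; the class name `HistoryPolicy`, the lookup table, and the observation and action labels are hypothetical.

```python
from collections import deque


class HistoryPolicy:
    """Maps an agent's recent local observations to an action (illustrative)."""

    def __init__(self, table, default_action, window=2):
        self.table = table                    # dict: tuple of observations -> action
        self.default_action = default_action  # used when the history is not in the table
        self.history = deque(maxlen=window)   # keeps only the last `window` observations

    def act(self, observation):
        # Only this agent's own observations are used; nothing comes from other agents.
        self.history.append(observation)
        return self.table.get(tuple(self.history), self.default_action)


policy = HistoryPolicy(
    table={("wall", "door"): "enter", ("door", "door"): "wait"},
    default_action="forward",
)
actions = [policy.act(obs) for obs in ["wall", "door", "door"]]
print(actions)  # ['forward', 'enter', 'wait']
```

The same observation, "door", leads to two different actions depending on what preceded it, which is the reason the policy conditions on history rather than on the latest observation alone.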

Term

Common Knowledge

Information that all agents know, that all agents know the others know, and so on recursively at every level; essential for coordination in partially observable environments.

Term

Coordination Graph

Structure representing interaction dependencies between agents, allowing the global decision problem to be factored into easier-to-solve local subproblems.
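
The factoring idea can be sketched for the simplest graph shape, a chain in which each agent interacts only with its neighbor. The example assumes pairwise payoffs on the edges and solves the joint problem by eliminating one agent at a time; `solve_chain`, `brute_force`, and the payoff numbers are hypothetical and chosen only for illustration.

```python
from itertools import product


def solve_chain(edge_payoffs, n_actions):
    """Variable elimination on a chain-shaped coordination graph.

    edge_payoffs[i][a][b] is the payoff on edge (i, i+1) when agent i
    plays action a and agent i+1 plays action b.
    """
    message = [0.0] * n_actions  # best payoff of eliminated agents, per next action
    best_response = []           # best_response[i][b]: agent i's action if agent i+1 plays b
    for payoff in edge_payoffs:
        new_message, response = [], []
        for b in range(n_actions):
            # Local subproblem: agent i optimizes against one neighbor action.
            scores = [message[a] + payoff[a][b] for a in range(n_actions)]
            best_a = max(range(n_actions), key=scores.__getitem__)
            response.append(best_a)
            new_message.append(scores[best_a])
        best_response.append(response)
        message = new_message
    # The last agent chooses its action, then earlier agents are recovered backwards.
    last = max(range(n_actions), key=message.__getitem__)
    actions = [last]
    for response in reversed(best_response):
        actions.append(response[actions[-1]])
    actions.reverse()
    return tuple(actions), message[last]


def brute_force(edge_payoffs, n_actions):
    """Reference solution: enumerate every joint action."""
    n_agents = len(edge_payoffs) + 1

    def total(joint):
        return sum(p[joint[i]][joint[i + 1]] for i, p in enumerate(edge_payoffs))

    best = max(product(range(n_actions), repeat=n_agents), key=total)
    return best, total(best)


# Three agents, two actions each, edges (0, 1) and (1, 2).
edge_payoffs = [
    [[4, 0], [0, 1]],  # payoff for (a0, a1)
    [[0, 3], [2, 5]],  # payoff for (a1, a2)
]
chain_solution = solve_chain(edge_payoffs, n_actions=2)
print(chain_solution)                             # ((0, 0, 1), 7.0)
print(brute_force(edge_payoffs, n_actions=2))     # ((0, 0, 1), 7)
```

Picking the best entry of each edge separately would ask agent 1 to play action 0 on one edge and action 1 on the other; elimination resolves that conflict while only ever solving two-agent subproblems.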
