AI Glossary

A complete dictionary of artificial intelligence

162 categories, 2,032 subcategories, 23,060 terms

Contextual Bandit

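Reinforcement learning setting in which, at each round, an agent observes a context, selects one action, and receives a reward for that action only; the goal is to choose actions based on the observed context so as to maximize cumulative reward.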
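The interaction protocol can be sketched as follows. This is a minimal sketch, assuming a hypothetical toy environment: the contexts, actions, `BEST_ACTION` table and the helpers `observe_context` and `get_reward` are illustrative names, not part of any library, and rewards are deterministic only so the result is easy to check. The key point is that the learner sees the reward of the action it chose and nothing about the others.

```python
import random

# Hypothetical toy environment: two contexts, three actions.
CONTEXTS = ["morning", "evening"]
ACTIONS = ["news", "sports", "music"]
BEST_ACTION = {"morning": "news", "evening": "music"}  # hidden from the learner


def observe_context(rng):
    """Environment reveals the context for this round."""
    return rng.choice(CONTEXTS)


def get_reward(context, action):
    """Environment reveals the reward of the chosen action only."""
    return 1.0 if action == BEST_ACTION[context] else 0.0


def run(policy, rounds, seed=0):
    """Run the contextual bandit protocol; return the log and the average reward."""
    rng = random.Random(seed)
    log = []
    total = 0.0
    for _ in range(rounds):
        x = observe_context(rng)   # 1. observe the context
        a = policy(x)              # 2. choose one action
        r = get_reward(x, a)       # 3. observe the reward of that action only
        log.append((x, a, r))      # rewards of unchosen actions stay unknown
        total += r
    return log, total / rounds


def oracle(context):
    """Hypothetical policy that always knows the best action."""
    return BEST_ACTION[context]


def always_news(context):
    """Hypothetical context-blind policy."""
    return "news"
```

A context-blind policy such as `always_news` earns reward only in the contexts where its fixed action happens to be best, which is why a contextual bandit learner must condition on `x`.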

Exploration vs Exploitation

Fundamental dilemma in which an algorithm must balance exploring new or uncertain options against exploiting the options already known to perform well.
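One simple way to trade off the two is epsilon-greedy, shown here as a swapped-in illustration rather than a method named by this glossary. It is a minimal sketch, assuming Bernoulli arms with hypothetical success probabilities; `epsilon_greedy` and `simulate` are illustrative names. With probability `epsilon` the agent explores a random arm, otherwise it exploits the arm with the best current estimate.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an arm index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: uniformly random arm
    # exploit: arm with the best estimate (ties go to the lowest index)
    return max(range(len(estimates)), key=lambda i: estimates[i])


def simulate(probs, epsilon, rounds, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return the pull count of each arm."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(rounds):
        arm = epsilon_greedy(means, epsilon, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts
```

With `epsilon=0.0` and all estimates starting at zero, `simulate([0.2, 0.5, 0.8], 0.0, 1000)` never leaves arm 0, a concrete case of pure exploitation missing the better arms; a small positive `epsilon` lets the agent discover arm 2.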

Upper Confidence Bound (UCB)

Strategy that selects the arm with the highest upper confidence bound on its expected reward (the estimated mean plus an uncertainty bonus), which favors exploring actions whose value is still uncertain.
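A common concrete instance is the classic UCB1 rule, sketched below under the assumption that rewards lie in [0, 1]. The function name `ucb1_select` and its parameters are hypothetical; `c = 2` reproduces the usual UCB1 bonus `sqrt(2 ln t / n_i)`.

```python
import math


def ucb1_select(counts, sums, t, c=2.0):
    """Return the arm with the highest UCB1 score at round t (t >= 1).

    counts[i]: number of times arm i was played; sums[i]: its total reward.
    Score = empirical mean + sqrt(c * ln(t) / counts[i]).
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # play every arm once before the bound is defined

    def score(i):
        mean = sums[i] / counts[i]
        bonus = math.sqrt(c * math.log(t) / counts[i])  # shrinks as arm i is tried more
        return mean + bonus

    return max(range(len(counts)), key=score)
```

For example, `ucb1_select([100, 1], [60.0, 0.5], 101)` returns arm 1 even though its empirical mean (0.5) is below arm 0's (0.6): one observation leaves a large uncertainty bonus, so the rarely tried arm gets explored.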

Thompson Sampling

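Bayesian algorithm that, at each round, samples reward parameters for every arm from their posterior distribution and plays the arm whose sample is best, so each arm is chosen with the posterior probability that it is optimal.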
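For binary rewards the posterior has a closed form, which makes a compact sketch possible. This assumes Bernoulli arms with Beta(1, 1) (uniform) priors, so each arm's posterior is Beta(1 + successes, 1 + failures); the function name and the simulated success probabilities are hypothetical.

```python
import random


def thompson_bernoulli(probs, rounds, seed=0):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms.

    probs: true success probabilities, used only to simulate rewards.
    Returns posterior parameters (alpha, beta) and pull counts per arm.
    """
    rng = random.Random(seed)
    k = len(probs)
    alpha = [1.0] * k  # 1 + observed successes
    beta = [1.0] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible mean per arm from its posterior; play the best draw.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < probs[arm] else 0
        alpha[arm] += reward       # conjugate Beta-Bernoulli update
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return alpha, beta, pulls
```

As the posteriors concentrate, draws from clearly worse arms rarely win, so exploration fades out on its own without a tuned exploration rate.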

LinUCB

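Extension of UCB to contextual bandits that models each arm's expected reward as a linear function of the context feature vector, estimated by ridge regression, and adds an upper-confidence bonus based on the uncertainty of that estimate.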
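A minimal pure-Python sketch of the "disjoint" variant (one independent linear model per arm) follows. It assumes small feature dimensions, a ridge penalty of 1 (so `A` starts as the identity), and keeps `A^-1` up to date with the Sherman-Morrison formula instead of inverting a matrix. `LinUCBArm`, `linucb_select` and `alpha` (width of the confidence bonus) are hypothetical names.

```python
import math


class LinUCBArm:
    """Per-arm state: A = I + sum(x x^T), b = sum(r x), theta = A^-1 b."""

    def __init__(self, d):
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _matvec(self, v):
        """Multiply A^-1 by vector v."""
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def score(self, x, alpha):
        """Upper confidence bound: theta . x + alpha * sqrt(x^T A^-1 x)."""
        theta = self._matvec(self.b)
        a_inv_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Add one observation: A += x x^T (via Sherman-Morrison), b += reward * x."""
        a_inv_x = self._matvec(x)  # A^-1 x; A^-1 is symmetric
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def linucb_select(arms, x, alpha):
    """Pick the index of the arm with the highest LinUCB score for context x."""
    return max(range(len(arms)), key=lambda k: arms[k].score(x, alpha))
```

Each update costs O(d^2) time, so this form suits modest `d`; larger problems would normally use a linear-algebra library.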

Context Features

Descriptive variables that characterize the current state of the environment and influence the optimal choice of action in contextual bandits.
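Most contextual bandit algorithms expect these variables as a numeric vector. The sketch below assumes a hypothetical context made of a device type (categorical, one-hot encoded) and an hour of day (scaled to [0, 1)), plus a constant bias feature; `DEVICES` and `encode_context` are illustrative names.

```python
# Hypothetical context schema: device type (categorical) and hour of day (numeric).
DEVICES = ["desktop", "mobile", "tablet"]


def encode_context(device, hour):
    """Turn raw context attributes into a numeric feature vector.

    Layout: one-hot device (3 values), hour / 24, constant bias term 1.0.
    """
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    if not 0 <= hour < 24:
        raise ValueError("hour must be in [0, 24)")
    one_hot = [1.0 if device == d else 0.0 for d in DEVICES]
    return one_hot + [hour / 24.0, 1.0]
```

The bias term lets a linear reward model learn a per-arm baseline independent of the other features.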

Regret Minimization

Objective of minimizing regret, defined as the difference between the cumulative reward of the optimal policy and the cumulative reward the algorithm actually obtains; regret is the standard measure of a bandit algorithm's performance.
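In simulations, where the true mean reward of every arm is known, regret is often computed from expected rather than realized rewards (pseudo-regret): R_T = T * mu_star - sum over t of mu(a_t), where mu_star is the best arm's mean. The sketch below assumes a non-contextual bandit; in the contextual case mu_star would be taken per context. Function names are hypothetical.

```python
def pseudo_regret(true_means, chosen_arms):
    """Cumulative pseudo-regret: sum over rounds of (mu_star - mu of the chosen arm)."""
    best = max(true_means)
    return sum(best - true_means[a] for a in chosen_arms)


def regret_curve(true_means, chosen_arms):
    """Regret accumulated after each round, e.g. for plotting against T."""
    best = max(true_means)
    curve, total = [], 0.0
    for a in chosen_arms:
        total += best - true_means[a]
        curve.append(total)
    return curve
```

A good algorithm's regret curve flattens over time (sublinear growth), because it stops paying the gap once it reliably plays the best arm.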

Multi-armed Bandits

Fundamental problem in which an agent repeatedly selects one of several options (arms) with unknown reward distributions in order to maximize its cumulative reward.
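A common test bed for this problem is the Bernoulli bandit, sketched below under the assumption that each arm pays 1 with a fixed hidden probability and 0 otherwise. `BernoulliBandit` is a hypothetical class; the probabilities are kept private so an agent can interact only through `pull`.

```python
import random


class BernoulliBandit:
    """K-armed bandit: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=None):
        self._probs = list(probs)  # unknown to the agent
        self._rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self._probs)

    def pull(self, arm):
        """Play one arm and return its stochastic reward (0 or 1)."""
        return 1 if self._rng.random() < self._probs[arm] else 0
```

Any bandit strategy can then be evaluated by calling `pull` in a loop and comparing its total reward with that of always playing the best arm.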

Reward Function

Mathematical function that quantifies the immediate return obtained after taking an action in a given context, guiding the algorithm's learning.

Arm Selection

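Process of choosing one action among the available options based on the current reward estimates and the observed context; the chosen arm is not always the one with the highest estimate, since exploration may favor uncertain arms.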

Expected Reward

Average value of the reward for a given action in a specific context; because the true value is unknown, it is estimated from historical observations.
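The simplest estimate is the sample mean of the rewards logged for each context-action pair. This is a minimal sketch, assuming a discrete context and a log of hypothetical `(context, action, reward)` tuples; pairs that were never tried get no estimate at all.

```python
from collections import defaultdict


def estimate_expected_rewards(log):
    """Estimate E[r | context, action] by the sample mean of logged rewards.

    log: iterable of (context, action, reward) tuples.
    Returns {(context, action): mean_reward} for every pair seen at least once.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, action, reward in log:
        totals[(context, action)] += reward
        counts[(context, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

For example, the log `[("mobile", "A", 1), ("mobile", "A", 0), ("mobile", "B", 1), ("desktop", "A", 0)]` gives 0.5 for `("mobile", "A")` and no entry for `("desktop", "B")`.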

Action-Value Function

Function Q(a, x) giving the expected reward obtained by taking action a in context x; it is fundamental for policy evaluation.
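Given such a function, a greedy policy picks argmax over a of Q(a, x), and any policy can be evaluated as the sum over contexts of P(x) * Q(policy(x), x). The sketch assumes finitely many contexts, a tabular Q stored in a dict keyed by `(action, context)` to mirror the Q(a, x) notation, and a known context distribution; all names and numbers are hypothetical.

```python
def greedy_policy(q, actions):
    """Return the policy x -> argmax_a Q(a, x) for a tabular Q."""
    def policy(context):
        return max(actions, key=lambda a: q[(a, context)])
    return policy


def policy_value(q, policy, context_probs):
    """Expected reward of a policy: sum over x of P(x) * Q(policy(x), x)."""
    return sum(p * q[(policy(x), x)] for x, p in context_probs.items())
```

With `q = {("A", "x1"): 0.2, ("B", "x1"): 0.6, ("A", "x2"): 0.9, ("B", "x2"): 0.1}` and equally likely contexts, the greedy policy is worth 0.75, while always playing `"A"` is worth 0.55.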

Online Learning

Learning paradigm in which the model is updated incrementally as new data arrives, without having to be retrained from scratch.
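A small example of the idea is the incremental mean used by many bandit algorithms: Q_n = Q_(n-1) + (r_n - Q_(n-1)) / n equals the batch average of all n rewards, yet needs only the previous estimate and a counter. `RunningMean` is a hypothetical name for this sketch.

```python
class RunningMean:
    """Online mean estimate: O(1) memory, updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update(self, reward):
        """Fold one new reward into the estimate and return the new mean."""
        self.n += 1
        self.value += (reward - self.value) / self.n
        return self.value
```

Replacing `1 / n` with a constant step size would weight recent rewards more heavily, a common choice when reward distributions drift over time.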

Stochastic Contextual Bandits

Variant in which the reward for each context-action pair is drawn independently from a fixed but unknown distribution (i.i.d. rewards).

Neural Bandits

Approach that uses neural networks to approximate the reward (value) function or the policy, making it possible to capture complex non-linear relationships between context and reward.