AI Glossary

Complete Glossary of Artificial Intelligence

162 categories · 2,032 subcategories · 23,060 terms

Attention Head Analysis

Process of examining and interpreting the attention weights produced by each head to understand the specific patterns and relationships that each head has learned to capture.

Head Specialization

Phenomenon where different attention heads in the same layer specialize to learn distinct types of linguistic relationships, such as syntax, semantics, or long-range dependencies.

Attention Weight Matrix

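Matrix of normalized scores produced by an attention head, where element (i, j) is the weight that token i (the query) assigns to token j (the key). Each row is the output of a softmax, so its entries are non-negative and sum to 1; in self-attention over a sequence of n tokens the matrix is square (n × n).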
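
As a minimal sketch of how such a matrix can be computed, the example below applies scaled dot-product attention to a hypothetical 3-token sequence. The function names and the toy query/key vectors are assumptions made for illustration; it uses only the Python standard library.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Return matrix A with A[i][j] = softmax over j of (q_i . k_j) / sqrt(d)."""
    d = len(queries[0])
    scale = math.sqrt(d)
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

# Hypothetical 3-token sequence with 2-dimensional query and key vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = attention_weights(Q, K)
for row in A:
    print([round(w, 3) for w in row])
```

Each printed row is one token's distribution over the sequence, so every row sums to 1.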

Attention Map

Visualization of the attention weight matrix, often in the form of a heatmap, which graphically illustrates the focus relationships of an attention head on an input sequence.
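
A plotting library is the usual tool for such heatmaps. As a dependency-free sketch, the example below renders a matrix as text, with denser characters standing for larger weights. The function name, the character ramp, and the toy weights are assumptions made for illustration.

```python
def render_attention_map(weights, tokens, shades=" .:-=+*#%@"):
    """Render an attention matrix as a text heatmap.

    Row i is the query token; column j is the attended-to token.
    Denser characters mean larger weights (weights are assumed to lie in [0, 1]).
    """
    lines = []
    for token, row in zip(tokens, weights):
        cells = "".join(
            shades[min(int(w * len(shades)), len(shades) - 1)] for w in row
        )
        lines.append(f"{token:>6} |{cells}|")
    return "\n".join(lines)

# Hypothetical attention weights for a 3-token sequence.
tokens = ["the", "cat", "sat"]
weights = [
    [0.90, 0.05, 0.05],
    [0.30, 0.60, 0.10],
    [0.10, 0.45, 0.45],
]
print(render_attention_map(weights, tokens))
```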

Syntactic Role

Type of syntactic relationship, such as subject-verb agreement or the dependency between a noun and its adjective, that a specialized attention head can learn to detect and model.

Positional Role

Function of an attention head that focuses primarily on relative positional relationships between tokens, helping the model track word order regardless of the tokens' semantic content.

Positional Head

Attention head whose attention weights reveal patterns strongly related to the relative distance between tokens, acting as a mechanism to encode sequential structure.
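
One way to check for such a pattern, sketched under the assumption that a single attention matrix is available, is to average the weights by relative offset j - i. The function name and the toy matrix are hypothetical.

```python
from collections import defaultdict

def attention_by_offset(weights):
    """Average attention weight for each relative offset j - i."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            sums[j - i] += w
            counts[j - i] += 1
    return {off: sums[off] / counts[off] for off in sorted(sums)}

# Hypothetical causal head that mostly attends to the previous token.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
]
profile = attention_by_offset(weights)
print(max(profile, key=profile.get))  # offset with the highest mean weight
```

A sharp peak at one offset (here the previous token, offset -1) suggests a positional head. In practice the profile would be averaged over many input sequences.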

Subword Head

Attention head specialized in managing relationships between word fragments (subwords) generated by tokenizers like BPE, helping to reconstruct lexical coherence.

Retrieval Head

Attention head identified in large language models that behaves like an information retrieval mechanism, attending strongly to specific tokens that act as 'keys' for the information being looked up.

Head Redundancy

Observation that certain attention heads in an over-parameterized model learn very similar or identical functions, suggesting potential inefficiency in resource usage.
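
A rough way to flag candidate redundant heads, sketched here as one possible approach, is to compare their attention matrices on the same input with cosine similarity. The function name and the three toy heads are assumptions made for illustration.

```python
import math

def head_similarity(a, b):
    """Cosine similarity between two attention matrices, flattened row by row."""
    fa = [w for row in a for w in row]
    fb = [w for row in b for w in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = math.sqrt(sum(x * x for x in fa)) * math.sqrt(sum(y * y for y in fb))
    return dot / norm

# Hypothetical attention matrices of three heads on the same 2-token input.
head_a = [[0.9, 0.1], [0.2, 0.8]]
head_b = [[0.85, 0.15], [0.25, 0.75]]  # nearly the same pattern as head_a
head_c = [[0.1, 0.9], [0.8, 0.2]]      # attends to the other token instead

print(round(head_similarity(head_a, head_b), 3))
print(round(head_similarity(head_a, head_c), 3))
```

Similar attention patterns on one input do not prove that two heads compute the same function, so a comparison like this would normally be averaged over many inputs and combined with other evidence.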

Attention Head Pruning

Model compression technique that involves identifying and removing attention heads deemed redundant or unimportant to reduce model size and computational cost with minimal impact on performance.
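
The sketch below illustrates the idea with a 0/1 mask over heads: given per-head importance scores (hypothetical values here), it keeps the highest-scoring heads and zeroes the outputs of the rest. The function names and data are assumptions made for illustration.

```python
def prune_mask(importance, keep):
    """Return a 0/1 mask that keeps the `keep` highest-scoring heads."""
    ranked = sorted(range(len(importance)), key=lambda h: importance[h], reverse=True)
    kept = set(ranked[:keep])
    return [1 if h in kept else 0 for h in range(len(importance))]

def apply_mask(head_outputs, mask):
    """Zero the output vector of every pruned head; kept heads pass through unchanged."""
    return [[x * m for x in out] for out, m in zip(head_outputs, mask)]

# Hypothetical importance scores for four heads, and their output vectors for one token.
importance = [0.42, 0.01, 0.30, 0.02]
head_outputs = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0], [0.1, 0.2]]

mask = prune_mask(importance, keep=2)
print(mask)
print(apply_mask(head_outputs, mask))
```

Masking only simulates pruning. Actual savings in size and compute come from removing the corresponding slices of the projection matrices.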

Head Importance Score

Quantitative metric, often derived from the sensitivity of the loss or model performance to the removal of a head, used to rank heads by their contribution to overall functioning.
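
An ablation-based score of this kind can be sketched with a toy model whose prediction is simply the sum of scalar head outputs. Each head is removed in turn and the increase in loss is recorded. The toy model, function names, and numbers are assumptions made for illustration.

```python
def loss_with_mask(head_outputs, mask, target):
    """Squared error of a toy model whose prediction is the sum of unmasked head outputs."""
    prediction = sum(out * keep for out, keep in zip(head_outputs, mask))
    return (prediction - target) ** 2

def head_importance(head_outputs, target):
    """Importance of head h = loss with head h removed minus loss with all heads kept."""
    n = len(head_outputs)
    base = loss_with_mask(head_outputs, [1] * n, target)
    scores = []
    for h in range(n):
        mask = [0 if i == h else 1 for i in range(n)]
        scores.append(loss_with_mask(head_outputs, mask, target) - base)
    return scores

# Hypothetical scalar contributions of four heads and the target they should sum to.
head_outputs = [2.0, 0.05, 1.0, -0.05]
scores = head_importance(head_outputs, target=3.0)
ranking = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
print(ranking)  # heads ordered from most to least important
```

Removing head 0 or head 2 changes the prediction far more than removing the two small heads, so those two rank first.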

Head Induction Analysis

Methodology that trains a simple supervised model (such as a linear classifier), commonly called a probe, on the outputs of an attention head to discover what information or function the head has learned to represent.
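
A minimal sketch of this idea, assuming the head outputs have already been extracted as feature vectors: a perceptron (one simple kind of linear classifier) is trained to predict a label from them. The data, labels, and function names are hypothetical.

```python
def predict(w, b, x):
    """Linear decision rule: 1 if w . x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_linear_probe(features, labels, epochs=20, lr=0.1):
    """Train a perceptron to predict labels from head outputs."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = y - predict(w, b, x)  # 0 when correct, +1 or -1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def probe_accuracy(w, b, features, labels):
    """Fraction of examples the probe classifies correctly."""
    hits = sum(predict(w, b, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Hypothetical 2-D head outputs for six tokens; label 1 marks tokens that are verbs.
features = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.9], [0.2, 0.7], [0.3, 0.8]]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_linear_probe(features, labels)
print(probe_accuracy(w, b, features, labels))
```

High probe accuracy suggests the property is linearly recoverable from the head's output. For brevity this sketch scores the probe on its own training data, whereas a real analysis would use held-out examples.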

Diagonal Attention Pattern

Attention weight pattern in which each token attends primarily to itself, so the weights concentrate on the main diagonal of the attention matrix; often observed in lower layers, where it refines local representations.
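
A simple way to quantify this pattern, offered as an illustrative sketch, is the mean weight on the main diagonal. The function name and the toy matrix are assumptions.

```python
def diagonal_mass(weights):
    """Mean attention weight that each token places on itself (the main diagonal)."""
    n = len(weights)
    return sum(weights[i][i] for i in range(n)) / n

# Hypothetical head in which every token attends mostly to itself.
weights = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
print(round(diagonal_mass(weights), 3))
```

Values close to 1 indicate a strongly diagonal head, while a uniform head over n tokens would score 1/n.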

Vertical Attention Pattern

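Pattern where an attention head directs most of the attention from every position to a single reference token (often the beginning-of-sequence token or a classification marker such as [CLS]), which appears as a vertical stripe in the attention map. In classification models, that token can serve to aggregate information from the whole sequence.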
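
The pattern can be detected, in a simplified sketch, by averaging the attention each column (key position) receives across all query rows. The function name and the toy matrix are hypothetical.

```python
def column_mass(weights):
    """Mean attention received by each key position, averaged over all query rows."""
    n_rows = len(weights)
    n_cols = len(weights[0])
    return [sum(row[j] for row in weights) / n_rows for j in range(n_cols)]

# Hypothetical head in which every position attends mostly to the first token.
weights = [
    [0.7, 0.2, 0.1],
    [0.8, 0.1, 0.1],
    [0.6, 0.1, 0.3],
]
mass = column_mass(weights)
print(mass.index(max(mass)))  # index of the column that receives the most attention
```

A single column that dominates the result corresponds to the vertical stripe in the attention map.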

Block Attention Pattern

Pattern where an attention head focuses on contiguous segments of the sequence, indicating specialization in processing local phrases or clauses.
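
Assuming the sequence has already been split into contiguous segments (for example phrases), one illustrative measure of this pattern is the share of each token's attention that stays inside its own segment. The function name, segment labels, and weights are hypothetical.

```python
def within_segment_mass(weights, segment_ids):
    """Mean share of each row's attention that lands on tokens in the same segment."""
    shares = []
    for i, row in enumerate(weights):
        same = sum(w for j, w in enumerate(row) if segment_ids[j] == segment_ids[i])
        shares.append(same / sum(row))
    return sum(shares) / len(shares)

# Hypothetical 4-token sequence split into two 2-token segments.
segment_ids = [0, 0, 1, 1]
weights = [
    [0.6, 0.4, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.1, 0.5, 0.4],
    [0.0, 0.0, 0.3, 0.7],
]
print(round(within_segment_mass(weights, segment_ids), 3))
```

A score near 1 means attention rarely crosses segment boundaries, which shows up as square blocks along the diagonal of the attention map.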

Translation Head

In multilingual models, an attention head that learns to align words and phrases between different languages, facilitating the transfer of linguistic knowledge.

Multi-Head Attention Mechanism

Fundamental component of Transformers that runs multiple attention heads in parallel, concatenates their outputs, and applies a final linear projection, allowing the model to attend jointly to information from different representation subspaces at different positions.
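
The steps in this definition (parallel heads, concatenation, output projection) can be sketched in plain Python as follows. The function names, the two toy heads, and the identity output projection are assumptions made for illustration; real implementations use learned weights and a tensor library.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, heads, w_o):
    """Run every head on x, concatenate the outputs per token, then project with w_o."""
    outputs = [attention_head(x, *params) for params in heads]
    concat = [[value for out in outputs for value in out[i]] for i in range(len(x))]
    return matmul(concat, w_o)

# Hypothetical input: 3 tokens with 4 features each.
x = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
# Toy projections: head 1 reads the first two features, head 2 the last two.
first_two = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
last_two = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [(first_two, first_two, first_two), (last_two, last_two, last_two)]
w_o = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity

out = multi_head_attention(x, heads, w_o)
for row in out:
    print([round(v, 3) for v in row])
```

Because each toy head reads a different half of the features, the two heads produce different attention weights for the same tokens, which is the point of running several heads side by side.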

Head Interpretability

Research field aimed at developing methods to understand, quantify and visualize the specific function of each attention head in order to demystify the internal workings of Transformer models.
