# Attention

*A mind map of this topic is provided here (generated by NoteBookLLM).*

## Why We Need Attention

Traditional Seq2Seq models (taking the RNN encoder-decoder as the example here) compress the input sequence into a single fixed-length vector, from which the decoder then generates the output sequence. A fixed-length vector struggles to encode all of the necessary information, so it becomes the bottleneck when handling long sentences.

## How the Attention Mechanism Works

The attention mechanism encodes the input into a sequence of vectors (annotations). When generating each word of the output sequence, the model performs a soft search over the relevant positions in the input sequence, and predicts the next target word from these relevant context vectors together with the target words already generated.

## Scaled Dot-Product Attention (SDPA)

$$
\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left( \frac{QK^T}{\sqrt{ d_{k} }} \right)V \tag{1}
$$

The core of the attention mechanism is computing a context vector, $\mathrm{Attention}(Q,K,V)$: a weighted sum over the input sequence, where the weights reflect how important each part of the input is to the output word currently being generated.

In Scaled Dot-Product Attention, we first compute the relevance between the query and each key, then use these relevance scores as weights over the values; the weighted values are summed to produce the output (Equation 1).

The $\sqrt{ d_{k} }$ term scales the attention scores. When $d_{k}$ is large, the dot products in $QK^T$ become large, which pushes the softmax into an extremely peaked distribution whose gradients are vanishingly small.

## Code Implementation

```python
import torch
import torch.nn as nn


class ScaledDotProductAttention(nn.Module):
    def __init__(self):
        super(ScaledDotProductAttention, self).__init__()

    def forward(self, query, key, value, causal_mask=None, padding_mask=None):
        """
        Single-head Scaled Dot-Product Attention

        Args:
            query: Query tensor of shape (batch_size, seq_len_q, d_k)
            key: Key tensor of shape (batch_size, seq_len_k, d_k)
            value: Value tensor of shape (batch_size, seq_len_v, d_v)
            causal_mask: Optional causal mask tensor of shape
                (batch_size, seq_len_q, seq_len_k)
            padding_mask: Optional padding mask tensor of shape
                (batch_size, seq_len_q, seq_len_k)

            1. Causal mask is used to prevent attending to future tokens in the sequence.
            2. Padding mask is used to ignore padding tokens in the sequence.
            3. Both masks are optional and can be None.

        Returns:
            attention_output: Attention weighted output tensor of shape
                (batch_size, seq_len_q, d_v)
        """
        d_k = query.size(-1)  # Hidden size of the key/query
        # Query-key relevance, scaled by sqrt(d_k) -- Equation (1)
        attention_scores = torch.matmul(query, key.transpose(-1, -2)) / torch.sqrt(
            torch.tensor(d_k, dtype=torch.float32)
        )
        # Masked positions are set to -inf so that softmax gives them zero weight
        if causal_mask is not None:
            attention_scores = attention_scores.masked_fill(causal_mask == 0, float('-inf'))
        if padding_mask is not None:
            attention_scores = attention_scores.masked_fill(padding_mask == 0, float('-inf'))
        attention_weights = torch.softmax(attention_scores, dim=-1)
        attention_output = torch.matmul(attention_weights, value)
        return attention_output


def test():
    batch_size = 8
    seq_len = 16
    hidden_size = 64
    query = torch.randn(batch_size, seq_len, hidden_size)
    key = torch.randn(batch_size, seq_len, hidden_size)
    value = torch.randn(batch_size, seq_len, hidden_size)
    sdpa = ScaledDotProductAttention()
    output = sdpa(query, key, value)
    print("Query shape:", query.shape)
    print("Key shape:", key.shape)
    print("Value shape:", value.shape)
    print("Output shape:", output.shape)


if __name__ == "__main__":
    test()
```
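The test above does not exercise the two masks, so here is a minimal usage sketch that reuses the `ScaledDotProductAttention` class defined above. The mask-building code and the example lengths are my own illustration, not part of the original post:

```python
import torch

batch_size, seq_len, hidden_size = 2, 5, 64
query = torch.randn(batch_size, seq_len, hidden_size)
key = torch.randn(batch_size, seq_len, hidden_size)
value = torch.randn(batch_size, seq_len, hidden_size)

# Causal mask: position i may attend only to positions j <= i.
# Shape (1, seq_len, seq_len) broadcasts over the batch dimension.
causal_mask = torch.tril(torch.ones(seq_len, seq_len)).unsqueeze(0)

# Padding mask: assume (hypothetically) true lengths 5 and 3, so the last
# two key positions of the second sequence are padding.
lengths = torch.tensor([5, 3])
key_valid = torch.arange(seq_len).unsqueeze(0) < lengths.unsqueeze(1)  # (batch, seq_len_k)
padding_mask = key_valid.unsqueeze(1)  # (batch, 1, seq_len_k), broadcasts over queries

sdpa = ScaledDotProductAttention()
output = sdpa(query, key, value, causal_mask=causal_mask, padding_mask=padding_mask)
print(output.shape)  # torch.Size([2, 5, 64])
```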
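As a quick numerical check of the scaling argument above (my own illustration): for random vectors with unit-variance entries, the dot product has standard deviation of roughly $\sqrt{d_k}$, which the division restores to roughly 1, keeping the softmax inputs in a well-behaved range:

```python
import torch

torch.manual_seed(0)
d_k = 512
q = torch.randn(10000, d_k)
k = torch.randn(10000, d_k)

scores = (q * k).sum(dim=-1)        # raw dot products
print(scores.std())                  # ~ sqrt(512) ~ 22.6
print((scores / d_k ** 0.5).std())   # ~ 1.0 after scaling
```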
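Finally, assuming PyTorch 2.0+, the unmasked path can be sanity-checked against the built-in `torch.nn.functional.scaled_dot_product_attention`, which computes the same Equation (1); this cross-check is my addition, not from the original post:

```python
import torch
import torch.nn.functional as F

query = torch.randn(8, 16, 64)
key = torch.randn(8, 16, 64)
value = torch.randn(8, 16, 64)

ours = ScaledDotProductAttention()(query, key, value)
ref = F.scaled_dot_product_attention(query, key, value)
print(torch.allclose(ours, ref, atol=1e-5))  # True, up to floating-point noise
```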

# Exp: Migrate From Hexo to Hugo

Looking good is the first requirement.

## Why

Over the past six months I have changed quite a few devices, moving from a Windows machine as my daily driver to an M4 Mac Mini; the upgrade feels substantial, with portability being the one downside. After reinstalling Windows I found that all of my old blog content was gone, and the only way to recover it would have been to dig through the previously generated static pages. Since what I had written was of little value and heavy on formulas, I simply dropped it and migrated.

## Style Configuration References

### GitHub Card

*Demo card rendered here: **awesome-project** — a great open-source project (Go).*

Create `github.css` under `assets/css/extended`:

```css
.github {
    border: 1px solid var(--border);
    border-radius: 12px;
    width: 100%;
    margin: 1.5em 0;
    padding: 1em;
    background: linear-gradient(135deg, var(--code-bg) 0%, rgba(255, 255, 255, 0.02) 100%);
    box-shadow: 0 4px 12px rgba(0, 0, 0, 0.08);
    transition: all 0.3s ease;
    position: relative;
    overflow: hidden;

    &::before {
        content: '';
        position: absolute;
        top: 0;
        left: 0;
        right: 0;
        height: 3px;
        background: linear-gradient(90deg, #0366d6, #28a745, #ffd33d, #f66a0a);
        opacity: 0.8;
    }

    &:hover {
        transform: translateY(-3px);
        box-shadow: 0 8px 24px rgba(0, 0, 0, 0.12);
        border-color: rgba(3, 102, 214, 0.3);
    }

    .github_bar {
        display: flex;
        align-items: center;
        margin-bottom: 1em;

        .github-icon {
            width: 22px;
            height: 22px;
            margin-right: 10px;
            fill: #6c757d;
            transition: fill 0.3s ease;
        }
    }

    .github_name {
        font-weight: 600;
        text-decoration: none;
        font-size: 1.3rem;
        color: #0366d6;
        transition: all 0.3s ease;
        position: relative;

        &:hover {
            color: #0256cc;
            transform: translateX(2px);
        }

        &::after {
            content: '';
            position: absolute;
            width: 0;
            height: 2px;
            bottom: -2px;
            left: 0;
            background: linear-gradient(90deg, #0366d6, #28a745);
            transition: width 0.3s ease;
        }

        &:hover::after {
            width: 100%;
        }
    }

    .github_description {
        color: #586069;
        font-size: 0.95rem;
        line-height: 1.6;
        margin-bottom: 1.2em;
        text-align: left;
        padding: 0.5em 0;
        border-left: 3px solid transparent;
        padding-left: 0.8em;
        transition: all 0.3s ease;

        &:hover {
            border-left-color: rgba(3, 102, 214, 0.3);
            background: rgba(3, 102, 214, 0.02);
        }
    }

    .github_language {
        display: inline-flex;
        align-items: center;
        background: rgba(3, 102, 214, 0.1);
        padding: 0.4em 0.8em;
        border-radius: 20px;
        border: 1px solid rgba(3, 102, 214, 0.2);
        transition: all 0.3s ease;

        &::before {
            content: "⚡";
            margin-right: 6px;
            font-size: 0.9em;
        }

        color: #0366d6;
        font-size: 0.85rem;
        font-weight: 500;

        &:hover {
            background: rgba(3, 102, 214, 0.15);
            border-color: rgba(3, 102, 214, 0.4);
            transform: translateY(-1px);
        }
    }
}

/* Dark mode adaptation */
@media (prefers-color-scheme: dark) {
    .github {
        background: linear-gradient(135deg, #0d1117 0%, rgba(33, 38, 45, 0.8) 100%);
        border-color: #30363d;
        box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3);

        &:hover {
            box-shadow: 0 8px 24px rgba(0, 0, 0, 0.4);
            border-color: rgba(88, 166, 255, 0.3);
        }

        .github_bar .github-icon {
            fill: #8b949e;
        }

        .github_name {
            color: #58a6ff;

            &:hover {
                color: #79c0ff;
            }

            &::after {
                background: linear-gradient(90deg, #58a6ff, #56d364);
            }
        }

        .github_description {
            color: #8b949e;

            &:hover {
                border-left-color: rgba(88, 166, 255, 0.3);
                background: rgba(88, 166, 255, 0.05);
            }
        }

        .github_language {
            background: rgba(88, 166, 255, 0.1);
            border-color: rgba(88, 166, 255, 0.2);
            color: #58a6ff;

            &:hover {
                background: rgba(88, 166, 255, 0.15);
                border-color: rgba(88, 166, 255, 0.4);
            }
        }
    }
}

/* Responsive design */
@media (max-width: 768px) {
    .github {
        margin: 1em 0;
        padding: 1.2em;
        border-radius: 8px;

        .github_name {
            font-size: 1.15rem;
        }

        .github_description {
            font-size: 0.9rem;
            padding-left: 0.5em;
        }

        .github_language {
            padding: 0.3em 0.6em;
            font-size: 0.8rem;
        }
    }
}
```