Extreme Multi-Label Classification: Evaluation Metrics
References:
http://manikvarma.org/downloads/XC/XMLRepository.html
https://blog.csdn.net/minfanphd/article/details/126737848?spm=1001.2014.3001.5502
https://en.wikipedia.org/wiki/Discounted_cumulative_gain

What is eXtreme multi-label Classification (XC)?
The number of labels is extremely large (on the order of millions); a typical example is bag-of-words (BoW) label data.
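A typical application of XC is image captioning (a headache of a problem). In image captioning, however, the words are ordered. XC can be seen as a key stage of image captioning: it selects the BoW most relevant to the current image.
(All of the above is based on past experience; I have not surveyed the recent literature.)
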
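Let's look at the evaluation metrics first.
Because the number of labels is huge while the ground truth of each data point contains only a few of them, the usual classification metrics such as accuracy and recall (macro- or micro-averaged in multi-label classification) do not work.
The metrics used for XC usually take into account head/tail labels, i.e., high-frequency and low-frequency labels, as well as the removal of reciprocal pairs (?).
Reciprocal pairs seem (?) to be pairs of mutually related labels: for a given data point, if label A has been predicted and label B is related to A, then B can be predicted for free. To avoid such trivial predictions, reciprocal pairs should be removed.

(1) Top-$k$ Performance: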
$$\text{(Precision@$k$)}\quad \text{P}@k := \frac{1}{k}\sum_{l \in \text{rank}_k (\hat{\mathbf{y}})} \mathbf{y}_l$$
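As a quick sanity check of the P@$k$ definition, here is a minimal sketch for a single data point. The function name `precision_at_k` and the toy vectors are my own illustrative choices, and I assume `y_true` is a binary ground-truth vector and `y_score` holds the predicted scores $\hat{\mathbf{y}}$.

```python
import numpy as np

def precision_at_k(y_true, y_score, k):
    """P@k for one data point (hypothetical helper, not from any library)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    # rank_k(y_hat): indices of the k highest-scoring labels, best first
    top_k = np.argsort(-y_score, kind="stable")[:k]
    # count how many of the top-k predicted labels are truly relevant
    return float(y_true[top_k].sum()) / k

# Toy example with 6 labels, 3 of them relevant
y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3]
print(precision_at_k(y_true, y_score, 3))
```

Here the top-3 predicted labels are 0, 1 and 3, of which two are relevant, so the value is 2/3.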
$$\text{(Discounted Cumulative Gain)}\quad \text{DCG}@k := \sum_{l \in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{\log(l+1)}$$
$$\text{(Normalized DCG)}\quad \text{nDCG}@k := \frac{\text{DCG}@k}{\sum_{l=1}^{\min(k,\|\mathbf{y}\|_0)} \frac{1}{\log(l+1)}}$$
Here $\text{rank}_k(\mathbf{y})$ returns the indices of the $k$ largest entries of $\mathbf{y}$, sorted in descending order. Note: in the DCG formula, the $l$ inside $\log(l+1)$ is not actually the label index $l$ but the rank position, running from 1 to $k$.
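To make the point about the denominator concrete, here is a small sketch of DCG@$k$ and nDCG@$k$ for one data point, where the discount uses the rank position 1..$k$ rather than the label index. The formulas above do not state the base of the logarithm; I assume base 2, as on the Wikipedia page listed in the references (the base only rescales DCG and cancels in nDCG). The function names and the zero-relevant-label convention are my own assumptions.

```python
import numpy as np

def dcg_at_k(y_true, y_score, k):
    """DCG@k for one data point (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    top_k = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")[:k]
    # discount by rank position 1..k, NOT by the label index
    positions = np.arange(1, len(top_k) + 1)
    return float(np.sum(y_true[top_k] / np.log2(positions + 1)))

def ndcg_at_k(y_true, y_score, k):
    """nDCG@k: DCG@k divided by the best DCG achievable in k slots."""
    y_true = np.asarray(y_true, dtype=float)
    n_ideal = min(k, int(np.count_nonzero(y_true)))  # min(k, ||y||_0)
    if n_ideal == 0:
        return 0.0  # assumed convention when there is no relevant label
    ideal = np.sum(1.0 / np.log2(np.arange(1, n_ideal + 1) + 1))
    return dcg_at_k(y_true, y_score, k) / float(ideal)

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3]
print(dcg_at_k(y_true, y_score, 3))   # hits at positions 1 and 3
print(ndcg_at_k(y_true, y_score, 3))
```

With these toy vectors the relevant labels land at positions 1 and 3, so DCG@3 is 1/log2(2) + 1/log2(4) = 1.5, and a perfect ranking would give nDCG@3 = 1.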
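Labels further down the ranking are discounted logarithmically; put plainly, this is just a weighting scheme. As for why log is used, two facts: 1. it gives a smooth reduction; 2. Wang et al. provided theoretical support for the logarithmic discount. The authors show that for every pair of substantially different ranking functions, the nDCG can decide which one is better in a consistent manner. (I don't understand this yet, so I'm setting it aside for now.)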

(2) Top-$k$ Propensity-score:
Some datasets contain a few very frequent labels (usually called head labels), so a high $\text{P}@k$ can be achieved simply by predicting the head labels over and over. Propensity-scored metrics can check for this trivial behavior.
$$(\text{Propensity-score Precision})\quad \text{PSP}@k := \frac{1}{k} \sum_{l\in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{p_l}$$
$$\text{PSDCG}@k := \sum_{l \in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{p_l\log(l+1)}$$
$$\text{PSnDCG}@k := \frac{\text{PSDCG}@k}{\sum_{l=1}^{k} \frac{1}{\log(l+1)}}$$
where $p_l$ is the propensity score of label $l$, which makes these metrics unbiased with respect to missing labels.
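The text above does not say how $p_l$ is obtained. As far as I know, the estimate commonly used with the XML Repository datasets is the model of Jain et al. (2016), $p_l = 1 / (1 + C e^{-A \log(N_l + B)})$ with $C = (\log N - 1)(B+1)^A$, where $N$ is the number of training points and $N_l$ the number of training points carrying label $l$. This formula is brought in from outside this post, so treat it as an assumption; $A$ and $B$ are dataset-dependent, and the defaults 0.55 and 1.5 below are likewise assumed. The function name is hypothetical.

```python
import numpy as np

def estimate_propensities(label_counts, n_points, a=0.55, b=1.5):
    """Sketch of a frequency-based propensity estimate (assumed model).

    label_counts: N_l, number of training points per label
    n_points:     N, total number of training points
    a, b:         dataset-dependent constants (assumed defaults)
    """
    counts = np.asarray(label_counts, dtype=float)
    c = (np.log(n_points) - 1.0) * (b + 1.0) ** a
    # rare labels get a small p_l, frequent labels get p_l close to 1
    return 1.0 / (1.0 + c * np.exp(-a * np.log(counts + b)))

# Four labels, from tail (1 occurrence) to head (1000 occurrences)
p = estimate_propensities([1, 10, 100, 1000], n_points=1000)
print(p)
```

Under this model the propensity grows with label frequency, so dividing by $p_l$ in the metrics above boosts hits on rare labels.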
Propensity scoring emphasizes performance on tail labels and gives only a small reward for predicting head labels.
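To see the tail-label emphasis in numbers, here is a minimal sketch of PSP@$k$, PSDCG@$k$ and PSnDCG@$k$ for one data point. The propensities are simply passed in as a vector, the base-2 logarithm is my assumption (the formulas leave the base open), and all function names and toy values are illustrative only.

```python
import numpy as np

def _top_k(y_score, k):
    # rank_k(y_hat): indices of the k highest-scoring labels, best first
    return np.argsort(-np.asarray(y_score, dtype=float), kind="stable")[:k]

def psp_at_k(y_true, y_score, propensity, k):
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(propensity, dtype=float)
    top = _top_k(y_score, k)
    return float(np.sum(y[top] / p[top])) / k

def psdcg_at_k(y_true, y_score, propensity, k):
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(propensity, dtype=float)
    top = _top_k(y_score, k)
    # discount by rank position 1..k
    discounts = np.log2(np.arange(1, len(top) + 1) + 1)
    return float(np.sum(y[top] / (p[top] * discounts)))

def psndcg_at_k(y_true, y_score, propensity, k):
    norm = float(np.sum(1.0 / np.log2(np.arange(1, k + 1) + 1)))
    return psdcg_at_k(y_true, y_score, propensity, k) / norm

# Label 0 is a head label (p = 0.9), label 3 a tail label (p = 0.2);
# both are relevant for this data point.
propensity = [0.9, 0.6, 0.4, 0.2]
y_true = [1, 0, 0, 1]
head_first = [0.9, 0.1, 0.2, 0.3]   # ranks the head label on top
tail_first = [0.3, 0.1, 0.2, 0.9]   # ranks the tail label on top
print(psp_at_k(y_true, head_first, propensity, 1))
print(psp_at_k(y_true, tail_first, propensity, 1))
```

Both rankings have P@1 = 1, but the tail-first ranking gets PSP@1 = 1/0.2 = 5 while the head-first one gets only 1/0.9. Note that, as written, these values can exceed 1.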
----------------
Copyright notice: This is an original article by CSDN blogger 「摆烂的-白兰地」, released under the CC 4.0 BY-SA license. When reposting, please include the link to the original source and this notice.
Original link: https://blog.csdn.net/wuyanxue/article/details/126805190