Extreme Multi-Label Classification: Evaluation Metrics
References:
http://manikvarma.org/downloads/XC/XMLRepository.html
https://blog.csdn.net/minfanphd/article/details/126737848?spm=1001.2014.3001.5502
https://en.wikipedia.org/wiki/Discounted_cumulative_gain

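What is eXtreme multi-label Classification (XC)? It is multi-label classification where the number of labels is extremely large (on the order of millions); bag-of-words (BoW) labels are a typical example.
A typical application of XC is image captioning (a headache in itself). In image captioning, however, the words are ordered. XC can be viewed as a key stage of image captioning: it selects the bag of words most relevant to the current image.
(All of the above comes from past experience; I have not surveyed recent work.)
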
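First, the evaluation metrics:
Because the number of labels is huge while the ground truth of each sample is very small, the usual classification metrics such as accuracy and recall (macro- or micro-averaged, as used in multi-label classification) do not work. The metrics used for XC usually take head/tail labels into account, i.e., high-frequency and low-frequency labels, and possibly also the removal of reciprocal pairs (?).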
Reciprocal pairs seem (?) to be pairs of mutually related labels: for a given data point, if label A is predicted and label B is related to A, then B can be predicted for free. To avoid such trivial predictions, reciprocal pairs should be removed.

(1) Top-$k$ Performance:

$$\text{(Precision@$k$)}\quad \text{P}@k := \frac{1}{k}\sum_{l \in \text{rank}_k(\hat{\mathbf{y}})} \mathbf{y}_l$$

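As a rough illustration of P@k, here is a minimal NumPy sketch. The function names `top_k_indices` and `precision_at_k`, and the toy label and score vectors, are made up for this example; they are not taken from the XML repository's evaluation scripts.

```python
import numpy as np

def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first (rank_k in the text)."""
    # Stable sort on negated scores, so ties keep the lower label index first.
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]

def precision_at_k(y_true, scores, k):
    """P@k: fraction of the top-k predicted labels that are relevant."""
    y_true = np.asarray(y_true, dtype=float)
    return float(y_true[top_k_indices(scores, k)].sum()) / k

# Toy example (hypothetical values): 6 labels, labels 1 and 4 are relevant.
y_true = [0, 1, 0, 0, 1, 0]
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]

p_at_1 = precision_at_k(y_true, scores, 1)  # top-1 is label 1, which is relevant
p_at_3 = precision_at_k(y_true, scores, 3)  # top-3 is labels 1, 2, 4; two are relevant
print(p_at_1, p_at_3)
```

In a real XC setting the label vector would be sparse and the scores would come from a model; dense lists are used here only to keep the sketch short.
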
$$\text{(Discounted Cumulative Gain)}\quad \text{DCG}@k := \sum_{l \in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{\log(l+1)}$$

$$\text{(Normalized DCG)}\quad \text{nDCG}@k := \frac{\text{DCG}@k}{\sum_{l=1}^{\min(k, \|\mathbf{y}\|_0)} \frac{1}{\log(l+1)}}$$

Here $\text{rank}_k(\mathbf{y})$ returns the indices of the $k$ largest entries of $\mathbf{y}$, sorted in descending order. Note: the $l$ inside $\log(l+1)$ in the DCG formula is not actually the label index; it is the rank position, running from 1 to $k$.
Labels ranked lower are discounted logarithmically; put plainly, this is just a weighting. Why use log? Two facts: 1. it gives a smooth reduction; 2. Wang et al. provide theoretical support for the logarithmic discount. The authors show that for every pair of substantially different ranking functions, the nDCG can decide which one is better in a consistent manner. (I do not follow this yet; setting it aside for now.)

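To make the role of the rank position concrete, here is a minimal sketch of DCG@k and nDCG@k with binary relevance. Several things are assumptions of this example and not stated in the source: the function names, the toy data, the use of base-2 logarithms (as in the Wikipedia article; the base cancels in nDCG), and the convention of returning 0 when a sample has no relevant labels.

```python
import numpy as np

def dcg_at_k(y_true, scores, k):
    """DCG@k with binary relevance; the discount depends on rank position."""
    y_true = np.asarray(y_true, dtype=float)
    top = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]
    positions = np.arange(1, len(top) + 1)    # rank positions 1..k, not label ids
    discounts = 1.0 / np.log2(positions + 1)  # assumption: log base 2
    return float(np.sum(y_true[top] * discounts))

def ndcg_at_k(y_true, scores, k):
    """nDCG@k: DCG@k divided by the best DCG achievable for this sample."""
    n_relevant = int(np.count_nonzero(y_true))  # ||y||_0
    ideal_len = min(k, n_relevant)
    if ideal_len == 0:
        return 0.0  # assumption: convention chosen for this sketch
    ideal = float(np.sum(1.0 / np.log2(np.arange(1, ideal_len + 1) + 1)))
    return dcg_at_k(y_true, scores, k) / ideal

# Toy example (hypothetical values): labels 1 and 4 are relevant.
y_true = [0, 1, 0, 0, 1, 0]
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]

dcg3 = dcg_at_k(y_true, scores, 3)    # hits at positions 1 and 3
ndcg3 = ndcg_at_k(y_true, scores, 3)  # below 1 because position 2 is a miss
print(dcg3, ndcg3)
```

With these scores the relevant labels land at positions 1 and 3, so the second hit is weighted by $1/\log_2 4$ where an ideal ranking would give it $1/\log_2 3$; that gap is exactly what nDCG measures.
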
(2) Top-$k$ Propensity-score:

Some datasets contain a few very frequent labels (usually called head labels), so a high $\text{P}@k$ can be achieved by simply predicting the head labels over and over. Propensity-scored metrics check for this trivial behavior.
$$\text{(Propensity-score Precision)}\quad \text{PSP}@k := \frac{1}{k}\sum_{l\in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{p_l}$$

where $p_l$ is the propensity score of label $l$.

$$\text{PSDCG}@k := \sum_{l \in \text{rank}_k(\hat{\mathbf{y}})} \frac{\mathbf{y}_l}{p_l\log(l+1)}$$
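
The following sketch shows how the propensity-scored variants reward a hit on a rare label more than a hit on a head label. It assumes the propensity scores $p_l$ are already given as an array; the source does not say how they are estimated, so the values below are purely hypothetical, as are the function names and the base-2 logarithm.

```python
import numpy as np

def _top_k(scores, k):
    """Indices of the k largest scores, highest first."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]

def psp_at_k(y_true, scores, propensities, k):
    """PSP@k: each hit in the top-k is weighted by 1 / p_l."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(propensities, dtype=float)
    top = _top_k(scores, k)
    return float(np.sum(y[top] / p[top])) / k

def psdcg_at_k(y_true, scores, propensities, k):
    """PSDCG@k: propensity weighting plus the positional log discount."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(propensities, dtype=float)
    top = _top_k(scores, k)
    positions = np.arange(1, len(top) + 1)  # rank positions 1..k
    return float(np.sum(y[top] / (p[top] * np.log2(positions + 1))))

# Hypothetical setup: label 0 is a head label, label 3 is a tail label.
propensities = [0.9, 0.5, 0.5, 0.1]
y_true = [1, 0, 0, 1]
head_first = [0.9, 0.1, 0.2, 0.3]  # ranks the head label first
tail_first = [0.3, 0.1, 0.2, 0.9]  # ranks the tail label first

psp_head = psp_at_k(y_true, head_first, propensities, 1)
psp_tail = psp_at_k(y_true, tail_first, propensities, 1)
psdcg_head = psdcg_at_k(y_true, head_first, propensities, 2)
print(psp_head, psp_tail, psdcg_head)
```

Both rankings have P@1 = 1, but the tail-first ranking scores much higher on PSP@1, which is the behavior the propensity weighting is meant to produce. Note that under this raw definition the values are not bounded by 1.
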