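Optimizing solver_type and C in liblinear
So what's the relationship between solver_type and C in liblinear, the library for everybody's favorite, the SVM?
I got curious, so I brute-forced every combination.
Reference: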
“algorithm - SVM - hard or soft margins? - Stack Overflow”
http://stackoverflow.com/questions/4629505/svm-hard-or-soft-margins
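The brute-force sweep can be sketched roughly as follows. This is only a sketch: it assumes the runs go through liblinear's `train` and `predict` command-line tools (the post does not say which interface was actually used), and the file names are hypothetical. The solver ids are the values liblinear's `train -s` option accepts.

```python
from itertools import product

# Solver ids accepted by liblinear's `train -s` option.
SOLVERS = {
    "L2R_LR": 0,
    "L2R_L2LOSS_SVC_DUAL": 1,  # liblinear's default
    "L2R_L2LOSS_SVC": 2,
    "L2R_L1LOSS_SVC_DUAL": 3,
    "MCSVM_CS": 4,
    "L1R_L2LOSS_SVC": 5,
    "L1R_LR": 6,
    "L2R_LR_DUAL": 7,
}

# The C values tried in the table.
C_VALUES = [1, 10, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000]


def build_jobs(train_file, test_file):
    """Return one job per (solver_type, C) pair as command-line argument lists."""
    jobs = []
    for (name, solver_id), c in product(SOLVERS.items(), C_VALUES):
        model = f"{name}_C{c}.model"  # hypothetical model file name
        jobs.append({
            "solver": name,
            "C": c,
            # train -s <solver> -c <cost> <training file> <model file>
            "train": ["train", "-s", str(solver_id), "-c", str(c), train_file, model],
            # predict <data file> <model file> <output file>; prints the accuracy
            "predict_train": ["predict", train_file, model, model + ".train.out"],
            "predict_test": ["predict", test_file, model, model + ".test.out"],
        })
    return jobs


jobs = build_jobs("train.txt", "test.txt")  # hypothetical file names
print(len(jobs), "combinations")
```

Each job holds three commands: train a model, then predict on the training file and on the test file to get the two accuracy columns.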
solver_type | C | Accuracy on training data (re-classified) | Accuracy on test data | Notes |
L2R_L2LOSS_SVC_DUAL | 1 | 92.23000% | 99.54700% | This is the default |
L2R_L2LOSS_SVC_DUAL | 10 | 95.81600% | 99.70800% | |
L2R_L2LOSS_SVC_DUAL | 50 | 91.13400% | 99.89100% | |
L2R_L2LOSS_SVC_DUAL | 100 | 96.37100% | 99.77600% | |
L2R_L2LOSS_SVC_DUAL | 200 | 96.27200% | 99.41000% | |
L2R_L2LOSS_SVC_DUAL | 500 | 94.97700% | 99.50200% | |
L2R_L2LOSS_SVC_DUAL | 1000 | 96.30000% | 99.67300% | |
L2R_L2LOSS_SVC_DUAL | 2000 | 96.49900% | 99.69600% | |
L2R_L2LOSS_SVC_DUAL | 5000 | 94.96200% | 99.79000% | |
L2R_L2LOSS_SVC_DUAL | 10000 | 96.51300% | 99.61100% | |
L2R_L2LOSS_SVC_DUAL | 20000 | 96.01500% | 99.72900% | |
L2R_LR | 1 | 94.87700% | 99.98300% | |
L2R_LR | 10 | 94.16500% | 99.97600% | |
L2R_LR | 50 | 95.07600% | 99.98800% | |
L2R_LR | 100 | 95.23300% | 99.98700% | |
L2R_LR | 200 | 94.84800% | 99.98700% | |
L2R_LR | 500 | 93.79500% | 99.96700% | |
L2R_LR | 1000 | 95.26100% | 99.98600% | |
L2R_LR | 2000 | 93.92300% | 99.96800% | |
L2R_LR | 5000 | 95.19000% | 99.98700% | |
L2R_LR | 10000 | 93.41100% | 99.95600% | |
L2R_LR | 20000 | 95.06200% | 99.98700% | |
L2R_L1LOSS_SVC_DUAL | 1 | 96.72700% | 99.89900% | |
L2R_L1LOSS_SVC_DUAL | 10 | 76.44800% | 97.03500% | |
L2R_L1LOSS_SVC_DUAL | 50 | 96.55600% | 99.60700% | |
L2R_L1LOSS_SVC_DUAL | 100 | 95.84500% | 99.49600% | |
L2R_L1LOSS_SVC_DUAL | 200 | 95.97300% | 99.49200% | |
L2R_L1LOSS_SVC_DUAL | 500 | 96.34300% | 99.75300% | |
L2R_L1LOSS_SVC_DUAL | 1000 | 91.88800% | 99.15400% | |
L2R_L1LOSS_SVC_DUAL | 2000 | 95.73100% | 99.71700% | |
L2R_L1LOSS_SVC_DUAL | 5000 | 95.29000% | 99.70800% | |
L2R_L1LOSS_SVC_DUAL | 10000 | 94.92000% | 99.39300% | |
L2R_L1LOSS_SVC_DUAL | 20000 | 94.43600% | 99.34000% | |
MCSVM_CS | 1 | 94.30800% | 97.25700% | slow |
MCSVM_CS | 10 | 94.12300% | 97.32500% | |
MCSVM_CS | 50 | 94.19400% | 97.76100% | |
MCSVM_CS | 100 | 94.90500% | 97.75100% | |
MCSVM_CS | 200 | 94.83400% | 97.64200% | |
MCSVM_CS | 500 | 94.72000% | 97.71000% | |
MCSVM_CS | 1000 | 94.80600% | 97.75700% | |
MCSVM_CS | 2000 | 94.30800% | 97.69900% | |
MCSVM_CS | 5000 | 94.27900% | 97.41300% | |
MCSVM_CS | 10000 | 92.58600% | 96.24100% | |
MCSVM_CS | 20000 | 94.13700% | 97.36300% | |
L2R_LR_DUAL | 1 | 87.89000% | 99.49500% | slow |
L2R_LR_DUAL | 10 | 96.21500% | 99.98200% | slow |
L2R_LR_DUAL | 50 | 95.75900% | 99.88300% | slow |
L2R_LR_DUAL | 100 | 93.58200% | 99.75600% | slow |
L2R_LR_DUAL | 200 | 96.25700% | 99.98100% | slow |
L2R_LR_DUAL | 500 | 96.51300% | 99.97900% | slow |
L2R_LR_DUAL | 1000 | 96.42800% | 99.91000% | slow |
L2R_LR_DUAL | 2000 | 96.64200% | 99.97400% | slow |
L2R_LR_DUAL | 5000 | 96.14300% | 99.95300% | slow |
L2R_LR_DUAL | 10000 | 95.95800% | 99.85000% | slow |
L2R_LR_DUAL | 20000 | 96.86900% | 99.92100% | slow |
L2R_L2LOSS_SVC | 1 | 95.09000% | 99.98900% | |
L2R_L2LOSS_SVC | 10 | 95.27500% | 99.99000% | |
L2R_L2LOSS_SVC | 50 | 95.09000% | 99.98900% | |
L2R_L2LOSS_SVC | 100 | 95.29000% | 99.99000% | |
L2R_L2LOSS_SVC | 200 | 95.10500% | 99.98900% | |
L2R_L2LOSS_SVC | 500 | 95.51700% | 99.99200% | |
L2R_L2LOSS_SVC | 1000 | 95.61700% | 99.99200% | |
L2R_L2LOSS_SVC | 2000 | 95.27500% | 99.99000% | |
L2R_L2LOSS_SVC | 5000 | 95.61700% | 99.99200% | |
L2R_L2LOSS_SVC | 10000 | 95.66000% | 99.99300% | |
L2R_L2LOSS_SVC | 20000 | 95.10500% | 99.98900% | |
L1R_L2LOSS_SVC | 1 | 99.53000% | 97.97700% | slow |
L1R_L2LOSS_SVC | 10 | 99.44500% | 94.34800% | slow |
L1R_L2LOSS_SVC | 50 | 99.53000% | 92.88800% | slow |
L1R_L2LOSS_SVC | 100 | 99.21700% | 92.46200% | slow |
L1R_L2LOSS_SVC | 200 | 99.38800% | 92.11000% | slow |
L1R_L2LOSS_SVC | 500 | 99.47300% | 91.71800% | slow |
L1R_L2LOSS_SVC | 1000 | 99.17500% | 91.41300% | slow |
L1R_L2LOSS_SVC | 2000 | 99.26000% | 91.43100% | slow |
L1R_L2LOSS_SVC | 5000 | 99.27400% | 91.16800% | slow |
L1R_L2LOSS_SVC | 10000 | 99.24600% | 90.83100% | slow |
L1R_L2LOSS_SVC | 20000 | 99.23200% | 90.94100% | slow |
L1R_LR | 1 | 98.32100% | 98.98600% | |
L1R_LR | 10 | 98.62000% | 93.87000% | |
L1R_LR | 50 | 98.60500% | 91.55500% | |
L1R_LR | 100 | 98.56300% | 90.86200% | |
L1R_LR | 200 | 98.64800% | 90.40100% | |
L1R_LR | 500 | 98.63400% | 90.25400% | |
L1R_LR | 1000 | 98.53400% | 90.04300% | |
L1R_LR | 2000 | 98.53400% | 89.17500% | |
L1R_LR | 5000 | 98.52000% | 88.11300% | |
L1R_LR | 10000 | 98.56300% | 87.63200% | |
L1R_LR | 20000 | 98.56300% | 86.70600% | |
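I ran the whole thing in parallel on multiple cores, and it still took about two days.
Honestly, L1R_L2LOSS_SVC and L2R_LR_DUAL are way too slow. Getting through those two was a real pain.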
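The post doesn't say how the runs were spread across cores. One possible way, assuming each run is an external command such as liblinear's `train`, is a small thread pool in which every thread just waits on its own child process. The helper names here are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one external command (a list of arguments) and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def run_parallel(commands, worker=run_command, max_workers=4):
    """Apply `worker` to every command, at most `max_workers` at a time.

    Threads are enough here because the heavy lifting happens in the child
    processes, which the OS schedules across the available cores.
    Results come back in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, commands))
```

Since the slow solvers dominate the total time, submitting those jobs first would probably help keep all cores busy until the end.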
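The data: 6,176 training examples, 285,973 test examples, and roughly 5,000 features.
There are two classes, 1 and 2.
The choice of solver_type made quite a difference.
As for C, the solver_types split into those that are affected by it and those that aren't: L1R_L2LOSS_SVC and L1R_LR lose test accuracy steadily as C grows, while L2R_L2LOSS_SVC and L2R_LR barely move.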
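To put a number on "affected by C", one simple measure is the spread (max minus min) of test accuracy across all C values for a given solver_type. The sketch below does this for two solvers, with the test accuracies copied from the table above; the choice of spread as the measure is my own assumption, not something from the original experiment.

```python
# Test accuracy (%) for C = 1, 10, 50, ..., 20000, copied from the table.
TEST_ACCURACY = {
    "L2R_L2LOSS_SVC": [99.989, 99.990, 99.989, 99.990, 99.989, 99.992,
                       99.992, 99.990, 99.992, 99.993, 99.989],
    "L1R_LR": [98.986, 93.870, 91.555, 90.862, 90.401, 90.254,
               90.043, 89.175, 88.113, 87.632, 86.706],
}


def spread(values):
    """Difference between the best and worst value, in percentage points."""
    return max(values) - min(values)


for solver, accuracies in TEST_ACCURACY.items():
    print(f"{solver}: spread = {spread(accuracies):.3f} points")
```

L2R_L2LOSS_SVC moves by only a few thousandths of a point, while L1R_LR swings by more than 12 points.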
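If anyone knows why it turns out this way, please tell me.
I don't really understand it myself, so I'm thinking of going with L1R_L2LOSS_SVC at C = 1, which gave reasonably good numbers.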