01. モデル評価: Model Evaluation
02. 評価指標: Evaluation Metrics
03. 精度 (Accuracy): Accuracy
04. 適合率 (Precision): Precision
05. 再現率 (Recall): Recall
06. F値: F-score / F-measure
07. ROC-AUC: ROC-AUC (Receiver Operating Characteristic - Area Under Curve)
08. PR-AUC: Precision-Recall AUC
09. ログ損失 (Log Loss): Log Loss / Cross-Entropy Loss
10. MSE: Mean Squared Error
11. RMSE: Root Mean Squared Error
12. MAE: Mean Absolute Error
13. R2スコア: R-squared / Coefficient of Determination
14. 決定係数: Coefficient of Determination
15. 調整済みR2: Adjusted R-squared
16. MAPE: Mean Absolute Percentage Error
17. 混同行列: Confusion Matrix
18. 分類レポート: Classification Report
19. マクロ平均/マイクロ平均: Macro Average / Micro Average
20. 加重平均: Weighted Average
21. BLEU: Bilingual Evaluation Understudy
22. ROUGE: Recall-Oriented Understudy for Gisting Evaluation
23. METEOR: Metric for Evaluation of Translation with Explicit ORdering
24. CIDEr: Consensus-based Image Description Evaluation
25. BERTScore: BERTScore
26. 人間評価: Human Evaluation
27. Elo レーティング: Elo Rating
28. Chatbot Arena: Chatbot Arena
29. MMLU: Massive Multitask Language Understanding
30. HellaSwag: HellaSwag
31. HumanEval: HumanEval
32. GSM8K: Grade School Math 8K
33. ARC: AI2 Reasoning Challenge
34. TruthfulQA: TruthfulQA
35. MT-Bench: MT-Bench
36. AlpacaEval: AlpacaEval
37. LMSYS: Large Model Systems Organization
38. SuperGLUE: Super General Language Understanding Evaluation
39. GLUE: General Language Understanding Evaluation
40. ImageNet (ベンチマーク): ImageNet (benchmark)
41. COCO: Common Objects in Context
42. SQuAD: Stanford Question Answering Dataset
43. MLPerf: MLPerf
44. ベンチマーク汚染: Benchmark Contamination
45. リーダーボード: Leaderboard
46. 交差検証: Cross-Validation
47. ブートストラップ信頼区間: Bootstrap Confidence Interval
48. 統計的有意性: Statistical Significance
49. アブレーションスタディ: Ablation Study
50. ハイパーパラメータ感度分析: Hyperparameter Sensitivity Analysis
51. モデル比較: Model Comparison
52. ベースライン: Baseline
53. SOTA (State of the Art): State of the Art (SOTA)
54. 過適合検出: Overfitting Detection
55. 学習曲線分析: Learning Curve Analysis