The Leaderboard is available. Congratulations to all top performers!


| Rank | Team | Organization | BLEU@4 | METEOR | CIDEr-D | ROUGE-L |
|------|------|--------------|--------|--------|---------|---------|
| 1 | v2t_navigator | RUC & CMU | 0.408 | 0.282 | 0.448 | 0.609 |
| 2 | Aalto | Aalto University | 0.398 | 0.269 | 0.457 | 0.598 |
| 3 | VideoLAB | UML & Berkeley & UT-Austin | 0.391 | 0.277 | 0.441 | 0.606 |
| 4 | ruc-uva | RUC & UVA & Zhejiang University | 0.387 | 0.269 | 0.459 | 0.587 |
| 5 | Fudan-ILC | Fudan & ILC | 0.387 | 0.268 | 0.419 | 0.595 |
| 6 | NUS-TJU | NUS & TJU | 0.371 | 0.267 | 0.410 | 0.590 |
| 7 | Umich-COG | University of Michigan | 0.371 | 0.266 | 0.411 | 0.583 |
| 8 | MCG-ICT-CAS | ICT-CAS | 0.367 | 0.264 | 0.404 | 0.590 |
| 9 | DeepBrain | NLPR_CASIA & IQIYI | 0.382 | 0.259 | 0.401 | 0.582 |
| 10 | NTU MiRA | NTU | 0.355 | 0.261 | 0.383 | 0.579 |
| 11 | NLPRMMC | CASIA & Anhui University | 0.348 | 0.260 | 0.375 | 0.575 |
| 12 | NTHU_VSLab | NTHU | 0.344 | 0.260 | 0.367 | 0.584 |
| 13 | NII-AIST | NII-AIST & Tokyo & Tohoku | 0.364 | 0.257 | 0.370 | 0.577 |
| 14 | MIC_TJU | Tongji University | 0.345 | 0.258 | 0.350 | 0.575 |
| 15 | scorpio | University of Montreal | 0.348 | 0.251 | 0.367 | 0.571 |
| 16 | KR | University of Rochester | 0.328 | 0.253 | 0.364 | 0.564 |
| 17 | Shen&Xu | Hefei University of Technology & USTC | 0.314 | 0.247 | 0.338 | 0.555 |
| 18 | VRPGASU | Arizona State University | 0.280 | 0.254 | 0.260 | 0.526 |
| 19 | AFRL | Air Force Research Lab | 0.289 | 0.227 | 0.338 | 0.504 |
| 20 | Daedalus | Aristotle University of Thessaloniki | 0.269 | 0.196 | 0.127 | 0.505 |
| 21 | Oceans | DCD Lab, Zhejiang University | 0.157 | 0.196 | 0.166 | 0.457 |
| Rank | Team | Organization | C1 (Coherence) | C2 (Relevance) | C3 (Helpful for Blind) |
|------|------|--------------|----------------|----------------|------------------------|
| 1 | Aalto | Aalto University | 3.263 | 3.104 | 3.244 |
| 2 | v2t_navigator | RUC & CMU | 3.261 | 3.091 | 3.154 |
| 3 | VideoLAB | UML & Berkeley & UT-Austin | 3.237 | 3.109 | 3.143 |
| 4 | Fudan-ILC | Fudan & ILC | 3.185 | 2.999 | 2.979 |
| 5 | ruc-uva | RUC & UVA & Zhejiang University | 3.225 | 2.997 | 2.933 |
| 6 | Umich-COG | University of Michigan | 3.247 | 2.865 | 2.929 |
| 7 | NUS-TJU | NUS & TJU | 3.308 | 2.833 | 2.893 |
| 8 | DeepBrain | NLPR_CASIA & IQIYI | 3.259 | 2.878 | 2.892 |
| 9 | NLPRMMC | CASIA & Anhui University | 3.266 | 2.868 | 2.893 |
| 10 | MCG-ICT-CAS | ICT | 3.339 | 2.800 | 2.867 |
| 11 | KR | University of Rochester | 3.292 | 2.854 | 2.860 |
| 12 | NII-AIST | NII-AIST & Tokyo & Tohoku | 3.207 | 2.896 | 2.865 |
| 13 | scorpio | University of Montreal | 3.218 | 2.848 | 2.880 |
| 14 | NTU MiRA | NTU | 3.257 | 2.784 | 2.864 |
| 15 | AFRL | Air Force Research Lab | 3.150 | 2.849 | 2.852 |
| 16 | Shen&Xu | Hefei University of Technology & USTC | 3.209 | 2.743 | 2.802 |
| 17 | NTHU_VSLab | NTHU | 3.192 | 2.748 | 2.811 |
| 18 | VRPGASU | Arizona State University | 3.358 | 2.584 | 2.742 |
| 19 | MIC_TJU | Tongji University | 3.189 | 2.650 | 2.743 |
| 20 | Daedalus | Aristotle University of Thessaloniki | 3.074 | 2.473 | 2.629 |
| 21 | Oceans | DCD Lab, Zhejiang University | 3.091 | 2.397 | 2.556 |

Metrics


We computed four common metrics: BLEU@4, METEOR, ROUGE-L, and CIDEr-D. For comparison across teams, we measured the performance of the primary run from each team. The results of all runs can be downloaded here.
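For reference, these four metrics can be reproduced with the open-source coco-caption scorers (pip package pycocoevalcap). This is a minimal sketch under that assumption, not the challenge's own evaluation code; the video ID and captions are invented for illustration.

```python
# Sketch: compute the M1 metrics with the coco-caption scorers.
# Assumes "pip install pycocoevalcap"; METEOR additionally needs Java.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

# Both dicts map a video id to a list of tokenized, lower-cased sentences:
# the references may hold many sentences, the candidate exactly one.
gts = {"video1": ["a man is singing on stage", "a man sings a song"]}
res = {"video1": ["a man is singing"]}

for scorer, name in [(Bleu(4), "BLEU"), (Meteor(), "METEOR"),
                     (Rouge(), "ROUGE-L"), (Cider(), "CIDEr-D")]:
    score, _ = scorer.compute_score(gts, res)
    if name == "BLEU":
        score, name = score[3], "BLEU@4"  # Bleu(4) returns BLEU@1..BLEU@4
    print(name, round(float(score), 3))
```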

In addition, we carried out a human evaluation of the systems submitted to this challenge on a subset of the testing set. Human judges were asked to rate the generated sentence of the primary run from each team, as well as a reference sentence, on a scale of 1 to 5 (higher is better) with respect to the following criteria.

      ·    Coherence: judge the logic and readability of the sentence.

      ·    Relevance: whether the sentence captures the most relevant and important objects/actions/events in the video clip.

      ·    Helpful for Blind (additional criterion): how helpful the sentence would be for a blind person trying to understand what is happening in the video clip.


M1 BLEU@4, METEOR, ROUGE-L, and CIDEr-D
M2 Human evaluation of the captions in terms of Coherence, Relevance, and Helpful for Blind on a scale of 1-5 (higher is better)

Ranking


The competition is ranked separately on the results from M1 and M2. For M1, a ranked list of teams is produced by sorting their scores on each evaluation metric. The final rank of a team combines its positions in the four ranked lists and is defined as:

R(team) = R(team)@BLEU@4 + R(team)@METEOR + R(team)@ROUGE-L + R(team)@CIDEr-D,

where R(team)@metric is the team's rank position under that metric; e.g., if a team achieves the best performance in terms of BLEU@4, then R(team)@BLEU@4 is 1. The smaller the final value of R(team), the better the performance.
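As an illustration of this rank fusion, here is a hypothetical sketch; the team names and scores are invented, and only the procedure (rank per metric, then sum the rank positions) follows the definition above.

```python
# Sketch: M1 rank fusion over invented per-team metric scores.
scores = {
    "teamA": {"BLEU@4": 0.40, "METEOR": 0.28, "ROUGE-L": 0.60, "CIDEr-D": 0.45},
    "teamB": {"BLEU@4": 0.39, "METEOR": 0.27, "ROUGE-L": 0.59, "CIDEr-D": 0.46},
    "teamC": {"BLEU@4": 0.37, "METEOR": 0.26, "ROUGE-L": 0.58, "CIDEr-D": 0.41},
}

R = {team: 0 for team in scores}
for metric in ["BLEU@4", "METEOR", "ROUGE-L", "CIDEr-D"]:
    # Sort teams by this metric, best (highest) score first.
    ranked = sorted(scores, key=lambda t: scores[t][metric], reverse=True)
    for position, team in enumerate(ranked, start=1):
        R[team] += position  # R(team)@metric is the 1-based rank position

# A smaller R(team) means a better overall ranking.
for team in sorted(R, key=R.get):
    print(team, R[team])
```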

In the same spirit, we linearly fuse each team's human-evaluation scores on Coherence, Relevance, and Helpful for Blind (on a scale of 1-5). The final score of each team is given by:

S(team) = S(team)@Coherence + S(team)@Relevance + S(team)@Helpful for Blind.

The larger the score, the better the performance.
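A corresponding hypothetical sketch of the M2 score fusion; the human-evaluation scores below are invented, and S(team) is simply the sum of the three criterion scores.

```python
# Sketch: M2 score fusion over invented human-evaluation scores.
human = {
    "teamA": {"Coherence": 3.26, "Relevance": 3.10, "Helpful for Blind": 3.24},
    "teamB": {"Coherence": 3.26, "Relevance": 3.09, "Helpful for Blind": 3.15},
}

S = {team: sum(criteria.values()) for team, criteria in human.items()}

# Rank teams by S(team) in descending order (larger is better).
for team in sorted(S, key=S.get, reverse=True):
    print(team, round(S[team], 3))
```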

We finally rank all the participants in two separate lists, one in terms of R(team) and the other in terms of S(team).


M1 R(team) = R(team)@BLEU@4 + R(team)@METEOR + R(team)@ROUGE-L + R(team)@CIDEr-D
M2 S(team) = S(team)@Coherence + S(team)@Relevance + S(team)@Helpful for Blind