No question: competitions are popular, especially in the data science and machine learning space.
The advantages are clear: someone entering the domain can get feedback almost instantly, prizes have attracted participant effort worth many times their cost, and new ideas and insights have been gained.
A closer look shows different types of competitions:
- Branded competitions: some of them are quite broad and come with a substantial award (examples: the GE NFL $10 Million Head Health Challenge, the $3M Heritage Health Prize), others are more narrowly focused on solving a specific problem (like the TGS Salt Identification Challenge, with $100K in prize money)
- Data mining/prediction competitions, which are mostly narrow in nature.
- Research competitions, such as the COCO Dataset with its specific tasks.
The leading platform for these competitions is Kaggle. There are others (DrivenData, InnoCentive), but I will primarily look at Kaggle as the model.
The design of a competition is difficult: ideally one would like to see insights that have a substantial impact. Even in competitions with lots of resources, that does not always happen. Reviewing the above-mentioned TGS Salt Identification Challenge, the score of the 4th-place entry is just 0.115% lower than the top submission, and the 32nd-place entry is still within 1% of the top submission. One great feature of Kaggle competitions is that the winner describes their solution in detail. Still, I was not able to gain any insights from that write-up: it described 3 training stages and the use of 1580 pseudo labels, but I did not learn how the team came up with these numbers.
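To make those margins concrete, here is a minimal sketch of how such relative gaps are computed. The top score used below is made up for illustration; the actual TGS leaderboard values differ.

```python
def pct_below(top_score, score):
    """Relative gap to the top score, in percent."""
    return (top_score - score) / top_score * 100

# Hypothetical top score (not the real TGS leaderboard value).
top = 0.8975

# An entry 0.115% below the top is nearly indistinguishable in absolute terms.
fourth = top * (1 - 0.00115)
print(round(pct_below(top, fourth), 3))   # → 0.115
print(round(top - fourth, 5))             # absolute gap of about 0.001
```

At these margins, leaderboard rank says little about which solution carries a genuinely different idea.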
So hopefully TGS got the insights they were looking for, but it is not clear who else benefited from that solution or how its findings could be applied to a different domain.
Some of the most famous Kaggle competitions have proven very effective at exposing larger audiences to the possibilities of machine learning and prediction. For example, over 10,000 teams have undertaken predicting survival on the Titanic.
There is good news: Kaggle will partner with organizations to host up to 5 pro-bono research competitions a year. Details can be found under the link.
Over the last 12 months, there were 3 research competitions initiated by Google, one by DCASE (Detection and Classification of Acoustic Scenes and Events), four that came out of CVPR 2018 (Conference on Computer Vision and Pattern Recognition), three of these from the FGVC5 workshop, and one from workshops on autonomous driving. CVPR 2018 had more than 6,000 attendees and $2 million in sponsorships. Here is a short overview:
| Competition | Prize | Teams | Split |
|---|---|---|---|
| Google AI: Inclusive Images Challenge | $25,000 | 468 | 100/100 |
| Google Landmark Recognition Challenge | $2,500 | 477 | 34/66 |
| Google Landmark Retrieval Challenge | $2,500 | 209 | 34/66 |
| Freesound General-Purpose Audio Tagging Challenge | $0 | 558 | 11/89 |
| CVPR 2018 WAD Video Segmentation Challenge | $2,500 | 141 | 1/99 |
| iNaturalist Challenge at FGVC5 | $0 | 59 | 50/50 |
| iMaterialist Challenge (Fashion) at FGVC5 | $2,500 | 212 | 30/70 |
| iMaterialist Challenge (Furniture) at FGVC5 | $2,500 | 428 | 30/70 |
As for the requirements, OMR Research should be in good shape. The offer of a cash prize seems to be negotiable, and although there is a correlation between prize money and team engagement, the Freesound competition attracted a lot of interest despite not having a cash prize.
As for ‘Selection Criteria’, OMR Research would currently not do well in terms of data availability, feasibility, and evaluation metrics. It would be a good idea to let the goal of applying for a pro-bono research competition direct some of the future activities.