Huawei Cloud Wins Gold Medal at WSDM Cup 2020, an Authoritative International Competition

Recently, at the 13th International Conference on Web Search and Data Mining (WSDM 2020), which closed in Houston, USA, a joint team led by Huawei Cloud won the gold medal in the citation intent recognition task of the WSDM Cup 2020 competition.

WSDM is regarded as one of the most influential and authoritative conferences in information retrieval worldwide. The conference focuses on search and data mining on social networks, with particular emphasis on search and data mining models, algorithm design and analysis, industrial applications, and experimental analysis aimed at improving accuracy and effectiveness. This year's conference was the 13th edition of WSDM.

This year's WSDM Cup featured three tasks. The task in which Huawei Cloud won gold was identifying the citation intent of papers:

The competition provided a paper library of about 800,000 papers, together with the text passages that describe the cited papers. Contestants had to match each citation description to the three most relevant papers in the library.

In a paper, the author often cites other papers and describes them in the surrounding text. If a computer could automatically understand such descriptions and identify the papers they refer to, this would not only deepen our understanding of the context of scientific research but also advance areas such as scientific knowledge graphs, automatic question answering over research literature, and automatic summarization systems.

The Huawei Cloud Speech and Semantics Innovation Lab led a joint team of students from South China University of Technology, Huazhong University of Science and Technology, Wuhan University, and Jiangnan University, which designed an overall recall + reranking + ensemble scheme for this problem.

In the recall stage:

Lightweight text similarity methods (such as BM25, TF-IDF, and word2vec) retrieve, at low computational cost, a set of potentially relevant papers for a given query from the large-scale paper library.
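
As a rough illustration of the recall stage, here is a minimal Okapi BM25 retriever. It is a sketch, not the team's implementation: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the names `BM25` and `recall`, and the toy paper titles (standing in for the ~800,000-paper library) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Okapi BM25 scorer over a fixed corpus (k1 and b are common defaults, assumed here)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]  # term frequencies per document
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # BM25 IDF with +1 inside the log so it never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the query."""
        tf, dl = self.tfs[i], self.doc_lens[i]
        s = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            # Length normalization: longer-than-average documents are penalized.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def recall(self, query: str, k: int) -> list[int]:
        """Indices of the top-k documents, highest score first (ties broken by index)."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [i for _, i in scored[:k]]


if __name__ == "__main__":
    papers = [
        "bert pretraining for biomedical text mining",
        "protein folding with deep learning",
        "scientific text classification with pretrained language models",
        "graph neural networks for molecules",
    ]
    bm25 = BM25(papers)
    print(bm25.recall("biomedical text mining with bert", k=2))  # prints [0, 2]
```

In a real pipeline the recall size would be much larger than the final three, so that the more expensive reranker has enough candidates to work with; the exact size is not given in the source.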

In the reranking stage:

More accurate but computationally heavier methods, such as pre-trained deep learning language models like BERT, compute a similarity score between the citation text and each candidate paper, and the candidates are reordered by these scores. The Huawei Cloud team observed that the competition corpus came entirely from the biomedical domain, so it used BioBERT and SciBERT, language models pre-trained on biomedical and scientific corpora, to rerank the papers.

In the ensemble stage:

The results of all models are combined to produce the final three most relevant papers.
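
The source does not say how the model outputs were combined. One common choice, shown here purely as an assumption, is reciprocal rank fusion (RRF): each model's ranked list adds 1 / (k + rank) to a paper's fused score, and the three highest-scoring papers are kept. The function name, the constant `k=60`, and the example rankings below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 3) -> list[str]:
    """Fuse several ranked lists of paper IDs; each list adds 1 / (k + rank) per paper."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    fused = sorted(scores, key=lambda p: (-scores[p], p))
    return fused[:top_n]


if __name__ == "__main__":
    # Hypothetical ranked outputs from three models for one citation query.
    rankings = [
        ["p1", "p2", "p3", "p4"],  # e.g. a BioBERT reranker
        ["p2", "p1", "p4", "p3"],  # e.g. a SciBERT reranker
        ["p1", "p3", "p2", "p5"],  # e.g. BM25 recall order
    ]
    print(reciprocal_rank_fusion(rankings))  # prints ['p1', 'p2', 'p3']
```

RRF only uses rank positions, so it sidesteps the problem of combining scores that live on different scales; a weighted average of normalized scores would be an equally plausible alternative.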

The text matching technology Huawei Cloud used in this competition can be widely applied to search, chatbots, knowledge graph construction, and other fields.

Drawing on its deep full-stack technical expertise in natural language processing, Huawei Cloud has won several authoritative competitions in related fields.

In October 2019, Huawei Cloud won the DigSci Scientific Data Mining Competition (an academic paper search and matching competition), with an accuracy 5 percentage points higher than the runner-up.

In the finals of the 2019 CCF Big Data and Computing Intelligence Contest, Huawei Cloud won first place in the financial entity-level sentiment analysis competition, demonstrating its strength in text sentiment analysis and knowledge graphs.

Huawei Cloud's speech and semantics services are currently used in government affairs, finance, oil and gas, healthcare, automotive, logistics, insurance, e-commerce, taxation, media, and other sectors that need speech recognition, language understanding, knowledge management, and similar capabilities.

