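Supplement to the 2014 MCM Problem Commentary: An Analysis of the ICM Problem

After this year's MCM, as a way of giving something back to the members of Shuxue Zhongguo (数学中国), I spent some time analyzing the key points of Problems A and B and laid out the mainstream approaches. It also served as a summary of this year's contest and a chance to discuss and improve together. The response was better than I expected, and quite a few students asked for a similar commentary on Problem C. I had not followed the ICM problem this year, but the requests were hard to turn down, so I have recently studied the problem and written this post for your reference and criticism.

This problem inherits the traits of earlier ICM problems. The statement is long, and the problem type usually cannot be covered by any single discipline. The models built are typically applied to very practical or interdisciplinary fields, yet they rely on the same model and the same line of thinking, which shows how neatly mathematical modeling carries over between disciplines. Because the statement is long and the key concepts are mentioned repeatedly, I suggest reading it several times, summarizing what is being asked, and then building the model from your own experience. As before, I use the annotation method described in "2014 MCM Problem Commentary and Analysis" to mark up the problem statement, and then give a reference approach.

2014 ICM Problem
Using Networks to Measure Influence and Impact (bk1: theme of the whole)

One of the techniques to determine influence of academic research is to build and measure properties of citation or co-author networks. (bk1': supplementary explanation of restriction) Co-authoring a manuscript usually connotes a strong influential connection between researchers. (bk1'': co-author's importance) One of the most famous academic co-authors was the 20th-century mathematician Paul Erdös who had over 500 co-authors and published over 1400 technical research papers. (bk2: introduction of Paul Erdös) It is ironic, or perhaps not, that Erdös is also one of the influencers in building the foundation for the emerging interdisciplinary science of networks, particularly, through his publication with Alfred Rényi of the paper “On Random Graphs” in 1959. (pro1: the irrationality of evaluation to Paul) Erdös’s role as a collaborator was so significant in the field of mathematics that mathematicians often measure their closeness to Erdös through analysis of Erdös’s amazingly large and robust co-author network (see the website http://www.oakland.edu/enp/).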
(pro1': the huge effect of Paul) The unusual and fascinating story of Paul Erdös as a gifted mathematician, talented problem solver, and master collaborator is provided in many books and on-line websites (e.g., http://www-history.mcs.st-and.ac.uk/Biographies/Erdos.html). (pro1'': also the huge effect of Paul) Perhaps his itinerant lifestyle, frequently staying with or residing with his collaborators, and giving much of his money to students as prizes for solving problems, enabled his co-authorships to flourish and helped build his astounding network of influence in several areas of mathematics. (trying to answer the question) In order to measure such influence as Erdös produced, there are network-based evaluation tools that use co-author and citation data to determine impact factor of researchers, publications, and journals. (bk3: evaluation method has existed) Some of these are Science Citation Index, H-factor, Impact factor, Eigenfactor, etc. Google Scholar is also a good data tool to use for network influence or impact data collection and analysis. (bk3': examples of the evaluation method)

Your team’s goal for ICM 2014 is to analyze influence and impact in research networks and other areas of society. (spms & msss) Your tasks to do this include:

1) Build the co-author network of the Erdos1 authors (you can use the file from the website https://files.oakland.edu/users/grossman/enp/Erdos1.html or the one we include at Erdos1.htm). (spm1: network to be constructed) You should build a co-author network of the approximately 510 researchers from the file Erdos1, who coauthored a paper with Erdös, but do not include Erdös. (imp1: the network graph should not include the node Paul) This will take some skilled data extraction and modeling efforts to obtain the correct set of nodes (the Erdös coauthors) and their links (connections with one another as co-authors).
(imp2: nodes & links' meaning in the model has been presented) There are over 18,000 lines of raw data in Erdos1 file, but many of them will not be used since they are links to people outside the Erdos1 network. (imp3: the feature of the network) If necessary, you can limit the size of your network to analyze in order to calibrate your influence measurement algorithm. (imp3': one way to deal with the problem, also a characteristic) Once built, analyze the properties of this network. (Again, do not include Erdös --- he is the most influential and would be connected to all nodes in the network. In this case, it’s co-authorship with him that builds the network, but he is not part of the network or the analysis.) (mss1: evaluate the property of network, as part of spm1, also remove Paul himself to simplify it)

2) Develop influence measure(s) to determine who in this Erdos1 network has significant influence within the network. (mss2: present one of the result of the model, as part of spm1) Consider who has published important works or connects important researchers within Erdos1. Again, assume Erdös is not there to play these roles. (mss2': present the result of the model)

3) Another type of influence measure might be to compare the significance of a research paper by analyzing the important works that follow from its publication. (imp4: another way of modeling) Choose some set of foundational papers in the emerging field of network science either from the attached list (NetSciFoundation.pdf) or papers you discover. (imp4': information of the thought) Use these papers to analyze and develop a model to determine their relative influence. (spm2: new model under new conditions) Build the influence (coauthor or citation) networks and calculate appropriate measures for your analysis. (mss3: new evaluation way, part of spm2) Which of the papers in your set do you consider is the most influential in network science and why?
(mss4: the result of spm2) Is there a similar way to determine the role or influence measure of an individual network researcher? Consider how you would measure the role, influence, or impact of a specific university, department, or a journal in network science? Discuss methodology to develop such measures and the data that would need to be collected. (mss5: the prolongation & evaluation of spm2)

4) Implement your algorithm on a completely different set of network influence data --- for instance, influential songwriters, music bands, performers, movie actors, directors, movies, TV shows, columnists, journalists, newspapers, magazines, novelists, novels, bloggers, tweeters, or any data set you care to analyze. (mss6: application of the network of spm1 & 2 to other field) You may wish to restrict the network to a specific genre or geographic location or predetermined size. (imp5: some restrictions of the model to simplify it)

5) Finally, discuss the science, understanding and utility of modeling influence and impact within networks. (mss7: the discussion of the model) Could individuals, organizations, nations, and society use influence methodology to improve relationships, conduct business, and make wise decisions? (mss7': the result of the discussion) For instance, at the individual level, describe how you could use your measures and algorithms to choose who to try to co-author with in order to boost your mathematical influence as rapidly as possible. (mss7'': instance to explain it) Or how can you use your models and results to help decide on a graduate school or thesis advisor to select for your future academic work? (mss7''': another instance)

6) Write a report explaining your modeling methodology, your network-based influence and impact measures, and your progress and results for the previous five tasks.
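(final mission to combine all the results) The report must not exceed 20 pages (not including your summary sheet) and should present solid analysis of your network data; strengths, weaknesses, and sensitivity of your methodology; and the power of modeling these phenomena using network science. (imp6: all the contents that should be included in the paper)

*Your submission should consist of a 1 page Summary Sheet and your solution cannot exceed 20 pages for a maximum of 21 pages.

After reading the annotated statement above carefully, you can see that the problem contains two main models. Tasks 1 and 2 use the co-author network centered on Paul to evaluate how influential the authors other than Paul are within a discipline. Tasks 3, 4 and 5 take the list of papers given in the problem as the center, build an evaluation network around it, and then evaluate the objects in that network or collections of them (for example papers and journals, or authors who have written many papers). Each model is followed by several specific questions, all posed on these two networks: some ask for an evaluation scheme, some ask for evaluation results, and some change the data and extend the model. Below I discuss how to build the two networks, and then what these questions require of, and hint at for, our modeling.

Model 1:
The data give, for each of the 511 authors who collaborated with Paul, that author's collaborations with people other than Paul, for 511 entries in total. As the problem intends, we need a graph-theoretic model: the vertices are these authors, and two vertices are joined by an edge if the two authors have collaborated. The full graph would have 1 + 511 + X vertices, where 1 is Paul and X counts the vertices outside the list, which I treat as having degree 1; the problem gives no further data on them. If everything were included there would be on the order of 18,000 edges (the problem mentions over 18,000 lines of raw data), so the suggested simplification is to keep only the 511 vertices. The 1 is without question the vertex of highest degree, and Paul's influence is also the greatest, while the X vertices have only a single collaboration each and are too minor to consider.

For data extraction, you can try regular-expression matching in MATLAB, Perl or a similar language to pull out the relevant information.

Once the network is built, a rough first view is that the degree of each vertex is the number of collaborators the corresponding author has inside the network, and this number can already be used to evaluate the author's influence. A more elaborate method is to let the score of the i-th author be S_i, with

$$S_i = \sum_{j \in N(i)} S_j$$

where N(i) is the set of vertices adjacent to i. (This is only a hint; the expression can be more elaborate, for example a quadratic mean.) This gives 511 values S_i and 511 equations, and it is enough to find a solution within the admissible domain.
(The idea here is to obtain the scores by solving a system of equations. It derives from Kirchhoff's current and voltage laws in engineering circuit analysis, which use the same equation-based thinking to find voltages and currents!)

For the network properties, you can list the number of vertices and edges, the cycles, the sparsity, and the distribution of vertex degrees. One point to keep in mind: always watch for connections between the practical meaning of the graph and concepts from graph theory. If a concept has a relevant meaning it can be applied directly, and spotting one is a highlight of the paper!

Model 2:
The center of the network is no longer Paul but a handful of core papers, so there are now several choices for the center of the new network: these papers themselves, the authors of these papers, or the network these authors have already formed. The ordinary papers and authors that we actually want to evaluate lie outside this core network, and we evaluate them by how strongly they are connected to it. The way to compute scores is similar to the previous model: either sum directly or use the equation-based idea. The difference is that two layers are involved here, the core network and the outer network. Personally I think simplifying the outer network is the sensible move; apart from that, the model differs little from the one for the earlier tasks. The end of Task 3 asks for evaluation results and for the model to be extended to other kinds of evaluation. As long as this model is built completely, that part follows naturally; it is the same approach in a different guise.

Task 4 still uses the network idea of this model, except that the data for the core network have to be collected yourself. The same goes for the outer network, and the scope is somewhat broader, but the model stays the same.

Task 5 asks: given the network and the evaluation method we have proposed, how can one work with this method to increase one's own influence? A look at our evaluation expression shows that collaborating as much as possible with authors who already score highly, and collaborating with more people in general, both raise one's ranking more efficiently, which in turn helps with decisions. In a concrete model this part should be made more quantitative.

Task 6 asks for a summary report. The points it mentions are exactly the key modeling points of this problem, so I restate them:
1. How the network is built: what the vertices represent, and the meaning of the edges and their weights;
2. The network-based evaluation scheme: the evaluation data must be readable from the network, and there are two basic ideas, direct computation and solving by the equation-based approach;
3. Answers to the questions raised in the problem statement, including the scoring results, publication strategy, and so on.

By category this is an evaluation problem, and building the network is the main modeling skill being tested. I hope everyone can improve on both points. Thank you for following this series, and goodbye!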
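As an illustration of the two scoring ideas for Model 1 (direct degree counting and the equation-based score), here is a minimal sketch in Python. Several things in it are assumptions, not part of the original post: the author names and edges are toy data, the function names are hypothetical, and parsing the Erdos1 file is left out. Note also that the system S_i = ΣS_j taken literally is homogeneous, so the sketch solves the scaled version λ·S_i = ΣS_j, which is the standard eigenvector-centrality equation, using power iteration with a normalization step. The shift (adding an author's own score at each step) is only a convergence aid and does not change the solution.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected co-author graph as {author: set of co-authors}."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_scores(graph):
    """Direct method: score = number of co-authors inside the network."""
    return {v: len(neighbours) for v, neighbours in graph.items()}


def equation_scores(graph, iterations=200, tol=1e-10):
    """Equation-based method: solve lambda * S_i = sum of S_j over neighbours j.

    Uses power iteration on (A + I); the identity shift keeps the same
    eigenvectors as A but avoids oscillation. Scores are rescaled so that
    the largest one equals 1.
    """
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new = {v: scores[v] + sum(scores[u] for u in graph[v]) for v in graph}
        norm = max(new.values())
        new = {v: s / norm for v, s in new.items()}
        if max(abs(new[v] - scores[v]) for v in graph) < tol:
            return new
        scores = new
    return scores


# Toy data (hypothetical authors, NOT taken from the Erdos1 file).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
graph = build_graph(edges)
degree = degree_scores(graph)
scores = equation_scores(graph)

for author in sorted(graph, key=scores.get, reverse=True):
    print(author, degree[author], round(scores[author], 3))
```

In this toy graph B and D have the same degree, but B ends up with the higher equation-based score because its co-authors are themselves better connected. That is the practical difference between the two ideas: degree counts collaborators, while the equation-based score also weighs who those collaborators are.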