Farseerfc的小窝 - icse | ICSE 2012 | farseerfc | 2012-06-06T10:42:00+09:00
<div class="section" id="june-6">
<h2><a class="toc-backref" href="#id1">June 6</a></h2>
<!-- PELICAN_BEGIN_SUMMARY -->
<div class="section" id="keynote-1">
<h3><a class="toc-backref" href="#id2">Keynote 1</a></h3>
<p>I did not follow much of it. All I remember is the claim that "finance is not money", and I could not see what that has to do with software.</p>
</div>
<div class="section" id="cost-estimation-for-distributed-software-project">
<h3><a class="toc-backref" href="#id3">Cost Estimation for Distributed Software Project</a></h3>
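<p>The talk was about improving existing models so that software development cost can be estimated more precisely.</p>
<p>For a new project, they suggest to the project manager (PM) the historical data of
past projects, which helps the PM arrive at a more precise estimate. They try to make
estimation depend as little as possible on the PM's experience, so that even a PM
with little experience can estimate a project's cost accurately.</p>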
<!-- PELICAN_END_SUMMARY -->
<p>Their points:</p>
<blockquote>
<p>Context-specific solutions needed!</p>
<p>Early user participation is key!</p>
</blockquote>
</div>
<div class="section" id="characterizing-logging-practices-in-open-source-software">
<h3><a class="toc-backref" href="#id4">Characterizing Logging Practices in Open-Source Software</a></h3>
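<p>Common mistakes in logging messages</p>
<p>They studied the revision history of logging code, looking for log statements that
were modified repeatedly, in order to identify problems with the logs. They first
established that these modifications were after-the-fact changes.</p>
<p>Breakdown of the modifications (9,027 in total):</p>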
<table border="0" class="table docutils borderless">
<colgroup>
<col width="10%"/>
<col width="90%"/>
</colgroup>
<tbody valign="top">
<tr><td>45%</td>
<td>static text</td>
</tr>
<tr><td>27%</td>
<td>printed variables</td>
</tr>
<tr><td>26%</td>
<td>verbosity level</td>
</tr>
<tr><td>2%</td>
<td>location of the log statement</td>
</tr>
</tbody>
</table>
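<p>They found that verbosity levels were changed for reasons such as security
vulnerabilities, or as a trade-off between overhead and the amount of data logged.</p>
<p>Most modifications to logged variables added a parameter. Their earlier tool,
LogEnhancer, was proposed to address this: it uses static checking to remind
programmers of a parameter they may have forgotten.</p>
<p>Static text was modified to replace information that had become out of date with the code, so that it would not mislead users.</p>
<p>Their experiment used a code-clone-based technique: find all log statements, look
for inconsistent clones among them, and then generate suggestions automatically.</p>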
</div>
<div class="section" id="combine-functional-and-imperative-pgrm-for-multicore-sw-scala-java">
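<h3><a class="toc-backref" href="#id5">Combining Functional and Imperative Programming for Multicore Software: Scala &amp; Java</a></h3>
<p>Trend: multicore is everywhere, but what about concurrent programs?</p>
<p>They studied Scala and Java because both compile to JVM bytecode, so the semantics can be checked after compilation.</p>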
<ul class="simple">
<li><dl class="first docutils">
<dt>Java:</dt>
<dd><ul class="first last">
<li>shared memory</li>
<li>explicitly created threads</li>
<li>manual synchronization</li>
<li>wait/notify mechanism</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Scala:</dt>
<dd><ul class="first last">
<li>higher-order functions</li>
<li>actors, message passing</li>
<li>lists, filters, iterators</li>
<li>while</li>
<li>shared state, OO</li>
<li>import java.*: any Java library can be imported</li>
<li>automatic type inference</li>
</ul>
</dd>
</dl>
</li>
</ul>
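<p>All participants received four weeks of training, and the experimental task was an industrial-grade development project.</p>
<p>Results:</p>
<p>On average the Scala projects took 38% more time than the Java ones, most of it spent on testing and debugging.</p>
<p>Programmer experience correlated with total time, but had no significant effect on testing and debugging.</p>
<p>The Scala features designed to make programming more efficient make debugging harder.
Type inference is one example: while debugging you have to infer the types by hand to understand what is going on.</p>
<p>Scala programs were smaller than the Java ones: 2.6% at the median, 15.2% on average.</p>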
<ul class="simple">
<li><dl class="first docutils">
<dt>Performance comparison:</dt>
<dd><ul class="first last">
<li>Single core: sequential Scala programs outperformed the Java ones</li>
<li><dl class="first docutils">
<dt>4 cores:</dt>
<dd><ul class="first last">
<li>scala 7s @ 4 threads</li>
<li>java 4si @ 8 threads</li>
<li><dl class="first docutils">
<dt>median</dt>
<dd><ul class="first last">
<li>83s scala</li>
<li>98s java</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li>32core: best scala 34s @ 64 threads</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Conclusion</dt>
<dd><ul class="first last">
<li>Java scales better</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Scala type inference</dt>
<dd><ul class="first last">
<li>45% said it helped with writing code</li>
<li>85% said it led to program errors</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Debugging</dt>
<dd><ul class="first last">
<li>23% found Scala easier</li>
<li>77% found Java easier</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>Multi-paradigm is better.</p>
</div>
<div class="section" id="sound-empirical-evidence-in-software-testing">
<h3><a class="toc-backref" href="#id6">Sound Empirical Evidence in Software Testing</a></h3>
<p>Automatic test data generation</p>
<p>Large Empirical Studies - not always possible</p>
<p>For open source software - big enough</p>
</div>
<div class="section" id="identifing-linux-bug-fixing-patch">
<h3><a class="toc-backref" href="#id7">Identifying Linux Bug Fixing Patches</a></h3>
<ul class="simple">
<li><dl class="first docutils">
<dt>current practice:</dt>
<dd><ul class="first last">
<li>manual</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Current research:</dt>
<dd><ul class="first last">
<li>keywords in commits</li>
<li>link bug reports in bugzilla</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>Try to solve classification problem</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>issue</dt>
<dd><ul class="first last">
<li>pre-identified</li>
<li>post-identified</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>data</dt>
<dd><ul class="first last">
<li>from commit log</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>feature extraction</dt>
<dd><ul class="first last">
<li>text preprocessing: stemmed non-stop words</li>
</ul>
</dd>
</dl>
</li>
<li>model learning</li>
</ul>
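<p>As a rough illustration of the feature-extraction step, the Python sketch below turns a commit log message into a bag of stemmed non-stop words. Everything in it is an assumption made for illustration: the stop-word list is a toy one, <code>stem</code> is a crude suffix stripper standing in for a real stemmer, and these notes do not record which tools or which classifier the authors actually used.</p>

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (hypothetical); a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "for", "and", "is", "this", "when"}


def stem(word):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def features(commit_message):
    """Bag of stemmed non-stop words from a commit log message."""
    words = re.findall(r"[a-z]+", commit_message.lower())
    return Counter(stem(w) for w in words if w not in STOP_WORDS)
```

<p>For example, <code>features("Fix crash when freeing the buffer in driver; fixes leaks")</code> counts <code>fix</code> twice, because "Fix" and "fixes" share a stem, and drops "when", "the" and "in". Counts like these could then be handed to the model-learning step.</p>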
<p>research questions</p>
</div>
<div class="section" id="active-refinement-of-clone-anomaly-reports">
<h3><a class="toc-backref" href="#id8">Active Refinement of Clone Anomaly Reports</a></h3>
<p>motivating</p>
<ul class="simple">
<li>code clones, clone groups</li>
<li>clone used to detect bugs</li>
<li>anomaly: an inconsistent clone group;
many anomalous clones are not bugs, so the false positive rate is high</li>
</ul>
<dl class="docutils">
<dt>approach</dt>
<dd><ul class="first last simple">
<li>reorder by sorted bug reports</li>
</ul>
</dd>
</dl>
</div>
</div>
<hr class="docutils"/>
<div class="section" id="june7">
<h2><a class="toc-backref" href="#id9">June 7</a></h2>
<div class="section" id="keynotes-2-sustainability-with-software-an-industrial-perspective">
<h3><a class="toc-backref" href="#id10">Keynotes 2: Sustainability with Software - An Industrial Perspective</a></h3>
<p>Sustainability</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>Classic view: independent views with overlap</dt>
<dd><ul class="first last">
<li>Social</li>
<li>Environment</li>
<li>Economic</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Nested view</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>Environment</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>Social</dt>
<dd><ul class="first last">
<li>Economic</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
<dl class="docutils">
<dt>Triple bottom line</dt>
<dd><ul class="first last simple">
<li><dl class="first docutils">
<dt>economic</dt>
<dd><ul class="first last">
<li>global business, networks, global economy</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>env</dt>
<dd><ul class="first last">
<li>natural resources, climate change, population growth</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>social</dt>
<dd><ul class="first last">
<li>awareness, connectivity, accountability</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
<div class="section" id="green-it">
<h4><a class="toc-backref" href="#id11">Green IT</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>reduce IT energy</dt>
<dd><ul class="first last">
<li>more than 50% goes to cooling, i.e. doing nothing</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>minimize e-waste: not properly recycled</dt>
<dd><ul class="first last">
<li>80% in EU</li>
<li>75% in US</li>
</ul>
</dd>
</dl>
</li>
<li>foster dematerialization</li>
</ul>
<p>In-Memory Technology: Expected Sustainable Benefits</p>
</div>
<div class="section" id="what-can-we-do">
<h4><a class="toc-backref" href="#id12">What can we do?</a></h4>
<blockquote>
<ul class="simple">
<li>consider all software lifecycle phases in your design</li>
<li>avoid energy expensive behavior in your codes</li>
<li>design lean architectures</li>
</ul>
</blockquote>
</div>
<div class="section" id="green-by-it">
<h4><a class="toc-backref" href="#id13">Green by IT</a></h4>
<blockquote>
<ul class="simple">
<li>2% green IT</li>
<li>98% green by IT</li>
</ul>
</blockquote>
</div>
</div>
<div class="section" id="on-how-often-code-is-cloned-across-repositories">
<h3><a class="toc-backref" href="#id14">On How Often code is cloned across repositories</a></h3>
<p>Line based hashing code clone detection</p>
<p>never do anything harder than sorting</p>
<p>hashing a window of 5 lines of normalized (tokenized) code, dropping
3/4 of the hashes</p>
<p>This cut a job that takes CCFinder a month down to three or four days. Precision and recall were not compared.</p>
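<p>The idea can be sketched in a few lines of Python: normalize each line, hash every window of 5 consecutive lines, keep only about a quarter of the hashes, then sort and group equal hashes. This is a minimal sketch under stated assumptions, not the authors' tool: the regex-based <code>normalize</code> is a crude stand-in for real tokenization, and both SHA-1 and the "hash value divisible by 4" sampling rule are hypothetical choices.</p>

```python
import hashlib
import re
from collections import defaultdict

WINDOW = 5  # lines per window, as mentioned in the talk


def normalize(line):
    """Crude stand-in for tokenization: identifiers become ID, numbers become NUM."""
    line = re.sub(r"[A-Za-z_]\w*", "ID", line)
    line = re.sub(r"\d+", "NUM", line)
    return re.sub(r"\s+", "", line)


def window_hashes(source, keep_mod=4):
    """Yield (hash, start) per 5-line window, keeping roughly 1 in keep_mod.

    start is the window's index among the non-blank normalized lines.
    """
    lines = [normalize(raw) for raw in source.splitlines()]
    lines = [text for text in lines if text]  # drop blank lines
    for i in range(len(lines) - WINDOW + 1):
        window = "\n".join(lines[i:i + WINDOW])
        digest = hashlib.sha1(window.encode()).hexdigest()
        if int(digest, 16) % keep_mod == 0:  # assumed sampling rule
            yield digest, i


def find_clones(files, keep_mod=4):
    """Sort all kept hashes and report those seen at more than one location."""
    entries = sorted(
        (digest, name, start)
        for name, src in files.items()
        for digest, start in window_hashes(src, keep_mod)
    )
    groups = defaultdict(list)
    for digest, name, start in entries:
        groups[digest].append((name, start))
    return {d: locs for d, locs in groups.items() if len(locs) > 1}
```

<p>Nothing here is harder than sorting, which matches the slogan above. Passing <code>keep_mod=1</code> keeps every hash, which is handy for small experiments; two files whose 5-line blocks differ only in identifier names and numeric literals then end up in the same group.</p>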
<table border="0" class="table docutils borderless">
<colgroup>
<col width="11%"/>
<col width="89%"/>
</colgroup>
<tbody valign="top">
<tr><td>14%</td>
<td>type1</td>
</tr>
<tr><td>16%</td>
<td>type2</td>
</tr>
<tr><td>17%</td>
<td>type3 (not really type2)</td>
</tr>
</tbody>
</table>
</div>
<div class="section" id="graph-based-analysis-and-prediction-for-sw-evolution">
<h3><a class="toc-backref" href="#id15">Graph-based analysis and prediction for sw evolution</a></h3>
<div class="section" id="graph-are-everywhere">
<h4><a class="toc-backref" href="#id16">graphs are everywhere</a></h4>
<ul class="simple">
<li>internet topology</li>
<li>social net</li>
<li>chemistry</li>
<li>biology</li>
</ul>
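<p>in software</p>
<ul class="simple">
<li>function call graph</li>
<li>module dependency graph</li>
</ul>
<p>developer interaction graph</p>
<ul class="simple">
<li>commit logs</li>
<li>bug reports</li>
</ul>
<p>experiment: 11 OSS projects, 27~171 releases, &gt; 9 years</p>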
</div>
<div class="section" id="predictors">
<h4><a class="toc-backref" href="#id17">predictors</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>NodeRank</dt>
<dd><ul class="first last">
<li>similar to Google's PageRank</li>
<li>measure relative importance of each node</li>
<li><dl class="first docutils">
<dt>func call graph with noderank</dt>
<dd><ul class="first last">
<li>compare rank with severity scale on bugzilla</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>correlation between noderank and BugSeverity</dt>
<dd><ul class="first last">
<li>function level: 0.48 ~ 0.86, varying among projects</li>
<li>module level &gt; function level</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>ModularityRatio</dt>
<dd><ul class="first last">
<li>cohesion/coupling ratio: IntraDep(M)/InterDep(M)</li>
<li>forecast maintenance effort</li>
<li><dl class="first docutils">
<dt>use for</dt>
<dd><ul class="first last">
<li>identify modules that need redesign or refactoring</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>EditDistance</dt>
<dd><ul class="first last">
<li>bug-based developer collaboration graphs</li>
<li>ED(G1, G2) = |V1| + |V2| - 2|V1 ∩ V2| + |E1| + |E2| - 2|E1 ∩ E2|</li>
<li><dl class="first docutils">
<dt>use for</dt>
<dd><ul class="first last">
<li>release planning</li>
<li>resource allocation</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
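<p>The EditDistance formula and the ModularityRatio above are simple enough to write down directly. The Python sketch below assumes that a graph is a pair of sets (vertices, edges) and that IntraDep(M) and InterDep(M) are plain counts of dependency edges inside module M and crossing its boundary. These notes do not spell out those definitions, so treat them as assumptions; all names are hypothetical.</p>

```python
def edit_distance(g1, g2):
    """ED(G1, G2) = |V1| + |V2| - 2|V1 n V2| + |E1| + |E2| - 2|E1 n E2|."""
    (v1, e1), (v2, e2) = g1, g2
    vertex_part = len(v1) + len(v2) - 2 * len(v1.intersection(v2))
    edge_part = len(e1) + len(e2) - 2 * len(e1.intersection(e2))
    return vertex_part + edge_part


def modularity_ratio(module, membership, deps):
    """Cohesion/coupling ratio IntraDep(M) / InterDep(M) for one module.

    membership maps each node to its module name; deps is a set of
    (src, dst) dependency edges. Edge counting is an assumed definition.
    """
    intra = inter = 0
    for src, dst in deps:
        src_in = membership[src] == module
        dst_in = membership[dst] == module
        if src_in and dst_in:
            intra += 1  # both ends inside the module: cohesion
        elif src_in or dst_in:
            inter += 1  # exactly one end inside: coupling
    return intra / inter if inter else float("inf")


# Two snapshots of an undirected collaboration graph (made-up developers).
g1 = ({"alice", "bob", "carol"},
      {frozenset({"alice", "bob"}), frozenset({"bob", "carol"})})
g2 = ({"alice", "bob", "dave"},
      {frozenset({"alice", "bob"}), frozenset({"alice", "dave"})})
distance = edit_distance(g1, g2)

# A made-up dependency graph with two modules, M and N.
membership = {"a": "M", "b": "M", "c": "N"}
deps = {("a", "b"), ("b", "c"), ("c", "a")}
ratio = modularity_ratio("M", membership, deps)
```

<p>In this toy example one developer leaves, one joins, and one collaboration edge is replaced, so <code>distance</code> is 4; module M has one internal and two boundary-crossing dependencies, so <code>ratio</code> is 0.5.</p>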
<p>graph metrics</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>graph diameter</dt>
<dd><ul class="first last">
<li>average node degree indicates reuse</li>
</ul>
</dd>
</dl>
</li>
<li>clustering coefficient</li>
<li>assortativity</li>
<li>num of cycles</li>
</ul>
</div>
<div class="section" id="conclusion">
<h4><a class="toc-backref" href="#id18">Conclusion</a></h4>
<p>"Actionable intelligence" from graph evolution</p>
<ul class="simple">
<li>studied 11 large, long-lived projects</li>
<li>predictors</li>
<li>identify pivotal moments in evolution</li>
</ul>
</div>
</div>
<div class="section" id="what-make-long-term-contributors-willingness-and-opportunity-in-oss">
<h3><a class="toc-backref" href="#id19">What make long term contributors: willingness and opportunity in OSS</a></h3>
<p>OSS doesn't work without contributors from the community</p>
<p>mozilla (2000-2008)</p>
<p>10^2.2 long-term contributors (LTCs) &lt;- 2 orders of magnitude -&gt; 10^4.2 new contributors &lt;- 3.5 orders of magnitude -&gt; 10^7.7 users</p>
<p>gnome (1999-2007)</p>
<p>10^2.5 LTCs &lt;- 1.5 orders of magnitude -&gt; 10^4.0 new contributors &lt;- 3.5 orders of magnitude -&gt; 10^6.5 users</p>
<div class="section" id="approach">
<h4><a class="toc-backref" href="#id20">approach</a></h4>
<ul class="simple">
<li>read issues of 20 LTC and 20 non-LTC</li>
<li>survey 56 (36 non-LTC and 20 LTC)</li>
<li>extract practices published on project web sites</li>
</ul>
</div>
<div class="section" id="summeray">
<h4><a class="toc-backref" href="#id21">summary</a></h4>
<ul class="simple">
<li>Ability/Willingness distinguishes LTCs</li>
<li><dl class="first docutils">
<dt>Environment</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>macro-climate</dt>
<dd><ul class="first last">
<li>popularity</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>micro-climate</dt>
<dd><ul class="first last">
<li>attention</li>
<li>number of peers</li>
<li>performance of peers</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>regression model</p>
<p>newcomers to LTC conversion drops</p>
<dl class="docutils">
<dt>actions in the first month predict LTCs</dt>
<dd><ul class="first last simple">
<li>24% recall</li>
<li>37% precision</li>
</ul>
</dd>
</dl>
</div>
</div>
<div class="section" id="develop-of-auxiliary-functions-should-you-be-agile">
<h3><a class="toc-backref" href="#id22">Development of auxiliary functions: should you be agile?</a></h3>
<p>an empirical assessment of pair programming and test-first programming</p>
<p>can agile help auxiliary functions?</p>
<div class="section" id="experiment">
<h4><a class="toc-backref" href="#id23">experiment</a></h4>
<ul class="simple">
<li>pair vs solo</li>
<li>test-first vs test-last</li>
<li>students vs professors</li>
</ul>
</div>
<div class="section" id="research-questions">
<h4><a class="toc-backref" href="#id24">research questions</a></h4>
<ul class="simple">
<li>r1: can pairing help obtain a more correct implementation?</li>
<li>r2: can test-first</li>
<li>r3: does test-first encourage the implementation of more test cases?</li>
<li>r4: does test-first cause more coverage?</li>
</ul>
</div>
<div class="section" id="result">
<h4><a class="toc-backref" href="#id25">result</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>test-first</dt>
<dd><ul class="first last">
<li>higher coverage</li>
<li>no change in correctness</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>pair</dt>
<dd><ul class="first last">
<li>improve on correctness</li>
<li>longer total programming time</li>
</ul>
</dd>
</dl>
</li>
</ul>
</div>
</div>
<div class="section" id="static-detection-of-resource-contention-problems-in-server-side-script">
<h3><a class="toc-backref" href="#id26">Static Detection of Resource Contention Problems in Server-side script</a></h3>
<p>Addresses race conditions in PHP scripts when accessing the database or the filesystem.</p>
</div>
<div class="section" id="amplifying-tests-to-validate-exception-handling-code">
<h3><a class="toc-backref" href="#id27">Amplifying Tests to Validate Exception Handling Code</a></h3>
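<p>Exception handling code is not only hard to write but also hard to validate. The possible
combinations of conditions are hard to anticipate, especially on mobile phone systems.</p>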
</div>
<div class="section" id="a-tactic-centric-approach-automating-traceability-of-quality-concerns">
<h3><a class="toc-backref" href="#id28">A tactic-centric approach automating traceability of quality concerns</a></h3>
<p>tactic traceability information models</p>
</div>
</div>
MSR 2012 @ ICSE | farseerfc | 2012-06-02T10:42:00+09:00
<div class="section" id="mining-software-repository-2012-icse">
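<h2><a class="toc-backref" href="#id3">Mining Software Repositories 2012 @ ICSE</a></h2>
<p>I attended this year's MSR, held at the University of Zurich. I arrived at the university
early in the morning, and registration had a small hiccup: the Swiss organizers evidently had
trouble telling Chinese names apart, and the name badges of three Chinese attendees surnamed
Yang (杨) got mixed up. Then the affiliation of my senior labmate Hotta (堀田) was printed as
“Japan, Japan”, which made him the representative of all Japan.</p>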
<div class="section" id="msr-microsoft-research-talk-msr-mining-software-repositories">
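<h3><a class="toc-backref" href="#id4">MSR (Microsoft Research) talk @ MSR (Mining Software Repositories)</a></h3>
<p>First came the keynote from Microsoft Research Asia (MSR Asia), which made it an MSR talk
at MSR. The talk by Dongmei Zhang (张冬梅) of MSR had two parts, one on Software Analysis and
one on XIAO. XIAO is a code clone detector developed by MSRA, and it seems to be just what
I am supposed to build for the Inoue lab. I wanted to learn more about the details of XIAO,
but the applause at the end of Ms. Zhang's talk caused a small microphone glitch.</p>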
</div>
<div class="section" id="towards-improving-bts-with-game-mechanisms">
<h3><a class="toc-backref" href="#id5">Towards Improving BTS with Game Mechanisms</a></h3>
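<p>This paper seemed to be essentially about what is described at</p>
<p><a class="reference external" href="http://www.joelonsoftware.com/items/2008/09/15.html">http://www.joelonsoftware.com/items/2008/09/15.html</a></p>
<p>and asks whether the same theory can be applied to things like issue tracking.
Personally I do not think this means much. Stack Overflow succeeded because it made concrete
the reputation system that the open source community already had, building on everyone's
wish to be regarded as an expert, much like Wikipedia. Applied to a company-internal issue
tracking system, the same theory would probably produce something completely different.
MSDN, for instance, is organized the same way as Wikipedia, yet looking for information on
MSDN feels nothing like doing so on Wikipedia. I am not optimistic about this direction.</p>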
</div>
<div class="section" id="ghtorrent">
<h3><a class="toc-backref" href="#id6">GHTorrent</a></h3>
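<p>The slides for this talk are available here: <a class="reference external" href="http://www.slideshare.net/gousiosg/ghtorrent-githubs-data-from-a-firehose-13184524">http://www.slideshare.net/gousiosg/ghtorrent-githubs-data-from-a-firehose-13184524</a></p>
<p>A data exporter for GitHub. GitHub's main data, the code, is already available through the
git interface, and wikis are stored in git form as well. So the goal of this project is to
expose the other data, mainly issue tracking, code comments and the like. The tool queries
the GitHub API, uses a distributed implementation to get around the API's limits, and
offers the history for download as torrents. The JSON data obtained from the GitHub API is
stored as BSON in MongoDB; the parsed data, once it has a schema, is stored in MySQL and
can be exported as SQL.</p>
<p>My own thought is that it might be better if the data were more uniform and everything were
stored in git, the way the wikis are. If the goal is to expose the full history anyway, a
history implemented by hand on top of torrents is surely far less convenient than one
exposed through the git interface, and tools like git blame would make it easier to trace
the authorship of things like code comments. Of course, reading and writing git's raw data
directly requires a solid understanding of git internals, and perhaps only the people at
GitHub are capable of that.</p>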
</div>
<div class="section" id="topic-mining">
<h3><a class="toc-backref" href="#id7">Topic Mining</a></h3>
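<p>They used two parameters, DE and AIC, which I did not understand at all; to be looked into
later. The experiments covered three projects: Firefox, Mylyn and Eclipse. They analyze the
identifiers and comments in the source code from the repository to find relationships
between topics and bugs, for example which kinds of topics are more likely to lead to bugs.
The conclusions seemed rather vague: core functionality has more reported bugs, but the
reason is unknown. That probably only shows that core functionality gets more attention and
more testing; it does not show that core functionality is more bug-prone.</p>
<p>The slides for this one were beautifully done and easy to follow, though.</p>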
</div>
<div class="section" id="secold">
<h3><a class="toc-backref" href="#id8">SeCold</a></h3>
<p>A linked data platform for mining software repositories</p>
<p>I did not understand what this project is for.</p>
</div>
<div class="section" id="the-evolution-of-software">
<h3><a class="toc-backref" href="#id9">The evolution of software</a></h3>
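<p>The second day's keynote, on the idea of combining social media with software development,
which is perhaps the very foundation of GitHub's success. It covered the trend of social
features such as code comments, tags, microblogs (uBlog) and blogs merging into the IDE.</p>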
</div>
<div class="section" id="do-faster-releases-imporve-software-quality">
<h3><a class="toc-backref" href="#id10">Do Faster Releases Improve Software Quality?</a></h3>
<p>Uses Firefox as the case study.</p>
<p>The conclusion: rapid releases lead to more bugs and more crashes, but bugs get fixed
faster and users move to new releases sooner.</p>
</div>
<div class="section" id="security-vs-performance-bugs-in-firefox">
<h3><a class="toc-backref" href="#id11">Security vs Performance Bugs in Firefox</a></h3>
<p>Performance bugs are regressions and block releases.</p>
</div>
<hr class="docutils"/>
<div class="section" id="id1">
<h3><a class="toc-backref" href="#id12">Some thoughts</a></h3>
<div class="section" id="commit">
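<h4><a class="toc-backref" href="#id13">Splitting commits by natural-language semantic analysis</a></h4>
<p>Users of a tool (git, for example) often do not use it the way its designers intended,
which makes mining software repositories much harder. For instance, git has an excellent
branching system, and users are generally expected to commit one unit of work per commit,
such as one bug fix or one added feature; in practice, though, many logically separate
commits often end up merged into one.</p>
<p>Perhaps this is not the users' fault but a sign that the tools are still not user-friendly
enough. Perhaps we could automatically split a single commit into several according to its semantics.</p>
<p>After splitting, it would be easier to link issues to commits, and easier to set up further studies.</p>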
</div>
<div class="section" id="slides">
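<h4><a class="toc-backref" href="#id14">On the slide tools people used for their presentations</a></h4>
<p>The presenter of “Incorporating Version Histories in Information Retrieval Based
Bug Localization” used beamer slides: lots of formulas, lots of overlays, lots of lists
and very few pictures, typical beamer output. The mind maps were used to good effect.
At least three slide decks today were made with beamer.</p>
<p>The presenter of “Towards Improving Bug Tracking Systems with Game Mechanisms”
used Prezi: lots of pictures and lots of transitions. But it lacks things like page numbers,
headers and footers, which is inconvenient at a formal conference.</p>
<p>At least six presenters used Apple Keynote. Keynote output really is hard to tell apart
from PowerPoint output; I only spotted it because two of them used the default theme.</p>
<p>The rest were PowerPoint, naturally. Ms. Zhang of MSRA used PowerPoint, yet her slides had
a strong beamer feel, for example in the headers, footers and use of overlays. If all of
that was done in PowerPoint, it must have taken a lot of extra manual work.</p>
<p>Worth mentioning: the slides for “Green Mining: A Methodology of Relating
Software Change to Power Consumption” consisted entirely of “low-quality” hand-drawn
cartoons, and they worked surprisingly well: very low-carbon, very eco-friendly, very green,
very cute. The video below gives an idea, although it is not the same version that was
shown on site:</p>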
<p><a class="reference external" href="http://softwareprocess.es/a/greenmining-presentatation-at-queens-20120522.ogv">http://softwareprocess.es/a/greenmining-presentatation-at-queens-20120522.ogv</a></p>
</div>
<div class="section" id="id2">
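<h4><a class="toc-backref" href="#id15">Microsoft is a sweet-faced schemer!</a></h4>
<p>Well, not that this is news. The sponsor of the MSR 2012 Mining Challenge is Microsoft,
the organizers come from Microsoft Research, and the prizes are an Xbox and a Kinect. And this year's topic is:</p>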
<pre class="literal-block">
Mining Android Bug
</pre>
<p>I could see Microsoft's resentment brimming over...</p>
</div>
</div>
</div>