<h1>ICSE 2012</h1>
<p>Farseerfc的小窩 - software, farseerfc, 2012-06-06T10:42:00+09:00, tag:farseerfc.me,2012-06-06:/icse2012.html</p>
<div class="section" id="june-6">
<h2><a class="toc-backref" href="#id1">June 6</a></h2>
<!-- PELICAN_BEGIN_SUMMARY -->
<div class="section" id="keynote-1">
<h3><a class="toc-backref" href="#id2">Keynote 1</a></h3>
<p>I did not follow much of it. I only remember the point that "finance is not money", but I did not understand what that has to do with software.</p>
</div>
<div class="section" id="cost-estimation-for-distributed-software-project">
<h3><a class="toc-backref" href="#id3">Cost Estimation for Distributed Software Project</a></h3>
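<p>They are trying to improve existing models so that software development costs can be estimated more accurately.</p>
<p>Their approach recommends historical data from earlier projects to the PM: for a new project, it suggests data from projects already in the historical record, which helps the PM arrive at a more accurate estimate. The aim is to make estimation depend as little as possible on the PM's experience, so that even a PM with little experience can estimate a project's cost accurately.</p>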
<!-- PELICAN_END_SUMMARY -->
<p>Their position:</p>
<blockquote>
<p>Context-specific solutions needed!</p>
<p>Early user participation is key!</p>
</blockquote>
</div>
<div class="section" id="characterizing-logging-practices-in-open-source-software">
<h3><a class="toc-backref" href="#id4">Characterizing Logging Practices in Open-Source Software</a></h3>
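<p>Common mistakes in logging messages</p>
<p>They studied the history of log statements and looked for statements that were modified repeatedly, in order to identify problems in logging. They first established that the modifications were after-the-fact changes.</p>
<p>Breakdown of the modifications (9,027 modifications in total):</p>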
<table border="0" class="table docutils borderless">
<colgroup>
<col width="10%"/>
<col width="90%"/>
</colgroup>
<tbody valign="top">
<tr><td>45%</td>
<td>Static text</td>
</tr>
<tr><td>27%</td>
<td>Printed variables</td>
</tr>
<tr><td>26%</td>
<td>Verbosity level</td>
</tr>
<tr><td>2%</td>
<td>Location of the log statement</td>
</tr>
</tbody>
</table>
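<p>They found that verbosity levels are changed for reasons such as security vulnerabilities, or as a trade-off between overhead and the amount of data recorded.</p>
<p>Most changes to logged variables add a parameter. Their earlier tool, LogEnhancer, was proposed to address this: it uses static checking to warn the programmer when a parameter may have been forgotten.</p>
<p>Changes to the text are made to remove out-of-date information about the code, so that users are not misled.</p>
<p>Their experiment uses a code-clone-based technique: find all log statements, look for inconsistent clones among them, and automatically suggest fixes.</p>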
</div>
<div class="section" id="combine-functional-and-imperative-pgrm-for-multicore-sw-scala-java">
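<h3><a class="toc-backref" href="#id5">Combine Functional and Imperative Programming for Multicore Software: Scala &amp; Java</a></h3>
<p>The trend: multicore is everywhere, but what about concurrent programs?</p>
<p>They studied Scala and Java because both compile to JVM bytecode, so the semantics can be checked after compilation.</p>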
<ul class="simple">
<li><dl class="first docutils">
<dt>Java:</dt>
<dd><ul class="first last">
<li>Shared memory</li>
<li>Explicitly created threads</li>
<li>Manual synchronization</li>
<li>Wait/notify mechanism</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Scala:</dt>
<dd><ul class="first last">
<li>Higher-order functions</li>
<li>Actors, message passing</li>
<li>lists, filters, iterators</li>
<li>while</li>
<li>Shared state, OO</li>
<li>import java.*: any Java library can be imported</li>
<li>Automatic type inference</li>
</ul>
</dd>
</dl>
</li>
</ul>
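<p>All participants received four weeks of training, and the experimental task was an industrial-grade development project.</p>
<p>Results:</p>
<p>On average the Scala projects took 38% more time than the Java ones, and most of that time went into testing and debugging.</p>
<p>Programmer experience correlates with total time, but has no significant effect on testing and debugging.</p>
<p>The features Scala designed to make programming more productive make debugging harder. Type inference is one example: while debugging, you have to infer the types by hand to understand what is going on.</p>
<p>The Scala programs are smaller than the Java ones: by 2.6% at the median and 15.2% on average.</p>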
<ul class="simple">
<li><dl class="first docutils">
<dt>Performance comparison:</dt>
<dd><ul class="first last">
<li>Single core: the sequential Scala program performs better than the Java one</li>
<li><dl class="first docutils">
<dt>4 cores:</dt>
<dd><ul class="first last">
<li>scala 7s @ 4 threads</li>
<li>java 4si @ 8 threads</li>
<li><dl class="first docutils">
<dt>median</dt>
<dd><ul class="first last">
<li>83s scala</li>
<li>98s java</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li>32core: best scala 34s @ 64 threads</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Conclusion</dt>
<dd><ul class="first last">
<li>Java has better scalability</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Scala type inference</dt>
<dd><ul class="first last">
<li>45% say it helps with writing code</li>
<li>85% say it leads to program errors</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Debugging</dt>
<dd><ul class="first last">
<li>23% find Scala easier</li>
<li>77% find Java easier</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>multi-paradigm is better</p>
</div>
<div class="section" id="sound-empirical-evidence-in-software-testing">
<h3><a class="toc-backref" href="#id6">Sound Empirical Evidence in Software Testing</a></h3>
<p>Automatic test data generation</p>
<p>Large Empirical Studies - not always possible</p>
<p>For open source software - big enough</p>
</div>
<div class="section" id="identifing-linux-bug-fixing-patch">
<h3><a class="toc-backref" href="#id7">Identifying Linux Bug Fixing Patches</a></h3>
<ul class="simple">
<li><dl class="first docutils">
<dt>current practice:</dt>
<dd><ul class="first last">
<li>manual</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Current research:</dt>
<dd><ul class="first last">
<li>keywords in commits</li>
<li>link bug reports in bugzilla</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>Try to solve classification problem</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>issue</dt>
<dd><ul class="first last">
<li>pre-identified</li>
<li>post-identified</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>data</dt>
<dd><ul class="first last">
<li>from commit log</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>feature extraction</dt>
<dd><ul class="first last">
<li>text pre-processing: stemmed non-stop words</li>
</ul>
</dd>
</dl>
</li>
<li>model learning</li>
</ul>
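<p>The feature extraction step above can be sketched roughly as follows. This is only an illustration of "stemmed non-stop words" taken from a commit log message, not the authors' pipeline: the stop word list, the function names and the crude suffix-stripping <code>naive_stem</code> (a stand-in for a real stemmer such as Porter's) are all my own assumptions, and the model learning step is not shown.</p>

```python
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the one used in the paper).
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "for", "and", "is", "on", "when", "with"}


def naive_stem(word):
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def features(commit_message):
    """Bag of stemmed, non-stop words extracted from one commit message."""
    tokens = re.findall(r"[a-z]+", commit_message.lower())
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)
```

<p>Calling <code>features()</code> on each commit message gives one sparse word-count vector per commit, which is the kind of input a text classifier could then be trained on.</p>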
<p>research questions</p>
</div>
<div class="section" id="active-refinement-of-clone-anomaly-reports">
<h3><a class="toc-backref" href="#id8">Active Refinement of Clone Anomaly Reports</a></h3>
<p>motivation</p>
<ul class="simple">
<li>code clones, clone groups</li>
<li>clones used to detect bugs</li>
<li>anomaly: an inconsistent clone group;
many anomalous clones are not bugs, so the false positive rate is high</li>
</ul>
<dl class="docutils">
<dt>approach</dt>
<dd><ul class="first last simple">
<li>reorder by sorted bug reports</li>
</ul>
</dd>
</dl>
</div>
</div>
<hr class="docutils"/>
<div class="section" id="june7">
<h2><a class="toc-backref" href="#id9">June 7</a></h2>
<div class="section" id="keynotes-2-sustainability-with-software-an-industrial-perspective">
<h3><a class="toc-backref" href="#id10">Keynote 2: Sustainability with Software - An Industrial Perspective</a></h3>
<p>Sustainability</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>Classic view: independent views with overlap</dt>
<dd><ul class="first last">
<li>Social</li>
<li>Environment</li>
<li>Economic</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>Nested view</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>Environment</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>Social</dt>
<dd><ul class="first last">
<li>Economic</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
<dl class="docutils">
<dt>Triple bottom line</dt>
<dd><ul class="first last simple">
<li><dl class="first docutils">
<dt>economic</dt>
<dd><ul class="first last">
<li>global business, networks, global economy</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>env</dt>
<dd><ul class="first last">
<li>natural resources, climate change, population growth</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>social</dt>
<dd><ul class="first last">
<li>awareness, connectivity, accountability</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
<div class="section" id="green-it">
<h4><a class="toc-backref" href="#id11">Green IT</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>reduce IT energy</dt>
<dd><ul class="first last">
<li>more than 50% cooling - doing nothing</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>minimize e-waste: not properly recycled</dt>
<dd><ul class="first last">
<li>80% in EU</li>
<li>75% in US</li>
</ul>
</dd>
</dl>
</li>
<li>foster dematerialization</li>
</ul>
<p>In-Memory Technology: Expected Sustainable Benefits</p>
</div>
<div class="section" id="what-can-we-do">
<h4><a class="toc-backref" href="#id12">What can we do?</a></h4>
<blockquote>
<ul class="simple">
<li>consider all software lifecycle phases in your design</li>
<li>avoid energy expensive behavior in your codes</li>
<li>design lean architectures</li>
</ul>
</blockquote>
</div>
<div class="section" id="green-by-it">
<h4><a class="toc-backref" href="#id13">Green by IT</a></h4>
<blockquote>
<ul class="simple">
<li>2% green IT</li>
<li>98% green by IT</li>
</ul>
</blockquote>
</div>
</div>
<div class="section" id="on-how-often-code-is-cloned-across-repositories">
<h3><a class="toc-backref" href="#id14">On How Often Code Is Cloned Across Repositories</a></h3>
<p>Line-based hashing code clone detection</p>
<p>never do anything harder than sorting</p>
<p>hashing a window of 5 lines of normalized (tokenized) code, dropping
3/4 of the hashes</p>
<p>This cut a job that takes CCFinder a month down to three or four days. Precision and recall were not compared.</p>
<table border="0" class="table docutils borderless">
<colgroup>
<col width="11%"/>
<col width="89%"/>
</colgroup>
<tbody valign="top">
<tr><td>14%</td>
<td>type1</td>
</tr>
<tr><td>16%</td>
<td>type2</td>
</tr>
<tr><td>17%</td>
<td>type3 (not really type2)</td>
</tr>
</tbody>
</table>
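<p>A minimal sketch of the line-window hashing idea as I understood it from the talk, not the authors' implementation. Each file is normalized line by line, every window of 5 consecutive lines is hashed, roughly 3/4 of the hashes are dropped, and the rest are sorted so that equal hashes from different files end up next to each other. The regex-based <code>normalize</code>, the use of SHA-1, the "keep only hashes divisible by 4" sampling rule and all names are my own assumptions.</p>

```python
import hashlib
import re
from itertools import groupby

WINDOW = 5    # lines per hashed window, as in the talk
KEEP_MOD = 4  # keep a hash only if value % 4 == 0, i.e. drop roughly 3/4


def normalize(line):
    """Crude stand-in for tokenization: identifiers -> ID, numbers -> NUM."""
    line = re.sub(r"[A-Za-z_]\w*", "ID", line)
    line = re.sub(r"\d+", "NUM", line)
    return re.sub(r"\s+", "", line)


def window_hashes(lines, keep_all=False):
    """Return (hash value, window index) pairs for one file."""
    norm = [n for n in (normalize(l) for l in lines) if n]  # skip blank lines
    out = []
    for i in range(len(norm) - WINDOW + 1):
        text = "\n".join(norm[i:i + WINDOW])
        value = int.from_bytes(hashlib.sha1(text.encode()).digest()[:8], "big")
        if keep_all or value % KEEP_MOD == 0:
            out.append((value, i))
    return out


def find_clones(files, keep_all=False):
    """files maps a file name to its list of source lines.

    Returns groups of (file name, window index) that share a hash
    and involve more than one file.
    """
    entries = [(value, name, i)
               for name, lines in files.items()
               for value, i in window_hashes(lines, keep_all)]
    entries.sort()  # nothing harder than sorting
    clones = []
    for _, group in groupby(entries, key=lambda e: e[0]):
        members = [(name, i) for _, name, i in group]
        if len({name for name, _ in members}) > 1:
            clones.append(members)
    return clones
```

<p>Because identifiers and numbers are replaced before hashing, this sketch also reports windows that differ only in names and literals. Passing <code>keep_all=True</code> turns the sampling off, which is handy for small experiments.</p>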
</div>
<div class="section" id="graph-based-analysis-and-prediction-for-sw-evolution">
<h3><a class="toc-backref" href="#id15">Graph-based analysis and prediction for sw evolution</a></h3>
<div class="section" id="graph-are-everywhere">
<h4><a class="toc-backref" href="#id16">graphs are everywhere</a></h4>
<ul class="simple">
<li>internet topology</li>
<li>social net</li>
<li>chemistry</li>
<li>biology</li>
</ul>
<p>in sw</p>
<ul class="simple">
<li>func call graph</li>
<li>module dependency graph</li>
</ul>
<p>developer interaction graph</p>
<ul class="simple">
<li>commit logs</li>
<li>bug reports</li>
</ul>
<p>experiment: 11 OSS projects, 27~171 releases, &gt; 9 years</p>
</div>
<div class="section" id="predictors">
<h4><a class="toc-backref" href="#id17">predictors</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>NodeRank</dt>
<dd><ul class="first last">
<li>similar to pagerank of google</li>
<li>measure relative importance of each node</li>
<li><dl class="first docutils">
<dt>func call graph with noderank</dt>
<dd><ul class="first last">
<li>compare rank with severity scale on bugzilla</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>correlation between noderank and BugSeverity</dt>
<dd><ul class="first last">
<li>func level 0.48 ~ 0.86 varies among projects.</li>
<li>module level &gt; func level</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>ModularityRatio</dt>
<dd><ul class="first last">
<li>cohesion/coupling ratio: IntraDep(M)/InterDep(M)</li>
<li>forecast maintenance effort</li>
<li><dl class="first docutils">
<dt>use for</dt>
<dd><ul class="first last">
<li>identify modules that need redesign or refactoring</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>EditDistance</dt>
<dd><ul class="first last">
<li>bug-based developer collaboration graphs</li>
<li>ED(G1,G2)=|V1|+|V2|-2|V1 ∩ V2|+|E1|+|E2|-2|E1 ∩ E2|</li>
<li><dl class="first docutils">
<dt>use for</dt>
<dd><ul class="first last">
<li>release planning</li>
<li>resource allocation</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
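<p>Two of the predictors above are simple enough to write down directly. The sketch below follows the ED formula as noted, with graphs given as plain sets of vertices and edges. For ModularityRatio I assume that IntraDep(M) and InterDep(M) are counts of dependency edges inside module M and crossing its boundary; the talk did not spell this out in my notes, so treat that reading, and all function and parameter names, as assumptions.</p>

```python
def edit_distance(v1, e1, v2, e2):
    """ED(G1,G2) = |V1| + |V2| - 2|V1 n V2| + |E1| + |E2| - 2|E1 n E2|."""
    v1, v2, e1, e2 = set(v1), set(v2), set(e1), set(e2)
    return (len(v1) + len(v2) - 2 * len(v1 & v2)
            + len(e1) + len(e2) - 2 * len(e1 & e2))


def modularity_ratio(module, deps, module_of):
    """IntraDep(M) / InterDep(M), counting dependency edges (src, dst).

    deps is an iterable of (src, dst) pairs; module_of maps each node
    to the name of the module that contains it.
    """
    deps = list(deps)
    # Edges with both ends inside the module.
    intra = sum(1 for s, d in deps
                if module_of[s] == module and module_of[d] == module)
    # Edges with exactly one end inside the module.
    inter = sum(1 for s, d in deps
                if (module_of[s] == module) != (module_of[d] == module))
    return intra / inter if inter else float("inf")
```

<p>For an undirected collaboration graph, edges can be passed as <code>frozenset</code> pairs so that (alice, bob) and (bob, alice) count as the same edge. With V1 = {alice, bob, carol}, E1 = {alice-bob, bob-carol} and V2 = {alice, bob, dave}, E2 = {alice-bob, alice-dave}, the formula gives 3 + 3 - 4 + 2 + 2 - 2 = 4.</p>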
<p>graph metrics</p>
<ul class="simple">
<li><dl class="first docutils">
<dt>graph diameter</dt>
<dd><ul class="first last">
<li>average node degree indicates reuse</li>
</ul>
</dd>
</dl>
</li>
<li>clustering coefficient</li>
<li>assortativity</li>
<li>num of cycles</li>
</ul>
</div>
<div class="section" id="conclusion">
<h4><a class="toc-backref" href="#id18">Conclusion</a></h4>
<p>"Actionable intelligence" from graph evolution</p>
<ul class="simple">
<li>studied 11 large long-lived projects</li>
<li>predictors</li>
<li>identify pivotal moments in evolution</li>
</ul>
</div>
</div>
<div class="section" id="what-make-long-term-contributors-willingness-and-opportunity-in-oss">
<h3><a class="toc-backref" href="#id19">What makes long term contributors: willingness and opportunity in OSS</a></h3>
<p>OSS doesn't work without contributors from the community</p>
<p>mozilla (2000-2008)</p>
<p>10^2.2 LTC &lt;- 2 orders -&gt; 10^4.2 new contributors &lt;- 3.5 orders -&gt; 10^7.7 users</p>
<p>gnome (1999-2007)</p>
<p>10^2.5 LTC &lt;- 1.5 orders -&gt; 10^4.0 new contributors &lt;- 3.5 orders -&gt; 10^6.5 users</p>
<div class="section" id="approach">
<h4><a class="toc-backref" href="#id20">approach</a></h4>
<ul class="simple">
<li>read issues of 20 LTC and 20 non-LTC</li>
<li>survey 56 (36 non-LTC and 20 LTC)</li>
<li>extract practices published on project web sites</li>
</ul>
</div>
<div class="section" id="summeray">
<h4><a class="toc-backref" href="#id21">summary</a></h4>
<ul class="simple">
<li>Ability/Willingness distinguishes LTCs</li>
<li><dl class="first docutils">
<dt>Environment</dt>
<dd><ul class="first last">
<li><dl class="first docutils">
<dt>macro-climate</dt>
<dd><ul class="first last">
<li>popularity</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>micro-climate</dt>
<dd><ul class="first last">
<li>attention</li>
<li>number of peers</li>
<li>performance of peers</li>
</ul>
</dd>
</dl>
</li>
</ul>
</dd>
</dl>
</li>
</ul>
<p>regression model</p>
<p>newcomers to LTC conversion drops</p>
<dl class="docutils">
<dt>actions in the first month predict LTCs</dt>
<dd><ul class="first last simple">
<li>24% recall</li>
<li>37% precision</li>
</ul>
</dd>
</dl>
</div>
</div>
<div class="section" id="develop-of-auxiliary-functions-should-you-be-agile">
<h3><a class="toc-backref" href="#id22">Development of auxiliary functions: should you be agile?</a></h3>
<p>an empirical assessment of pair programming and test-first programming</p>
<p>can agile help auxiliary functions?</p>
<div class="section" id="experiment">
<h4><a class="toc-backref" href="#id23">experiment</a></h4>
<ul class="simple">
<li>pair vs solo</li>
<li>test-first vs test-last</li>
<li>students vs professors</li>
</ul>
</div>
<div class="section" id="research-questions">
<h4><a class="toc-backref" href="#id24">research questions</a></h4>
<ul class="simple">
<li>r1: can pairing help obtain a more correct implementation?</li>
<li>r2: can test-first</li>
<li>r3: does test-first encourage the implementation of more test cases?</li>
<li>r4: does test-first lead to more coverage?</li>
</ul>
</div>
<div class="section" id="result">
<h4><a class="toc-backref" href="#id25">result</a></h4>
<ul class="simple">
<li><dl class="first docutils">
<dt>test-first</dt>
<dd><ul class="first last">
<li>higher coverage</li>
<li>no change in correctness</li>
</ul>
</dd>
</dl>
</li>
<li><dl class="first docutils">
<dt>pair</dt>
<dd><ul class="first last">
<li>improve on correctness</li>
<li>longer total programming time</li>
</ul>
</dd>
</dl>
</li>
</ul>
</div>
</div>
<div class="section" id="static-detection-of-resource-contention-problems-in-server-side-script">
<h3><a class="toc-backref" href="#id26">Static Detection of Resource Contention Problems in Server-side script</a></h3>
<p>Addresses race conditions when PHP scripts access a database or the filesystem</p>
</div>
<div class="section" id="amplifying-tests-to-validate-exception-handling-code">
<h3><a class="toc-backref" href="#id27">Amplifying Tests to Validate Exception Handling Code</a></h3>
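<p>Exception handling code is not only hard to write but also hard to validate. The possible combinations of conditions are hard to anticipate, especially on mobile systems.</p>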
</div>
<div class="section" id="a-tactic-centric-approach-automating-traceability-of-quality-concerns">
<h3><a class="toc-backref" href="#id28">A tactic-centric approach automating traceability of quality concerns</a></h3>
<p>tactic traceability information models</p>
</div>
</div>
<h1>MSR 2012 @ ICSE</h1>
<p>farseerfc, 2012-06-02T10:42:00+09:00, tag:farseerfc.me,2012-06-02:/msr2012.html</p>
<div class="section" id="mining-software-repository-2012-icse">
<h2><a class="toc-backref" href="#id3">Mining Software Repository 2012 @ ICSE</a></h2>
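<p>I attended this year's MSR, held at the University of Zurich. I got to the university early in the morning, and registration had a small hiccup: the Swiss evidently have trouble telling Chinese names apart, and the name badges of three Chinese attendees surnamed Yang (楊) were mixed up. Then the affiliation of my senior labmate Hotta (堀田) was printed as "Japan, Japan", which made him the representative of all of Japan.</p>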
<div class="section" id="msr-microsoft-research-talk-msr-mining-software-repositories">
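<h3><a class="toc-backref" href="#id4">MSR (Microsoft Research) talk @ MSR (Mining Software Repositories)</a></h3>
<p>First came the keynote from Microsoft Research Asia (MSR Asia), which made it an MSR talk at MSR. The talk by Dongmei Zhang (張冬梅) of MSR had two parts, one on Software Analysis and one on XIAO. XIAO is a code clone detector developed by MSRA, and it seems to be exactly what I am supposed to build for the Inoue lab. I wanted to learn more about XIAO's details, but the applause at the end of Ms. Zhang's talk caused a small microphone glitch.</p>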
</div>
<div class="section" id="towards-improving-bts-with-game-mechanisms">
<h3><a class="toc-backref" href="#id5">Towards Improving BTS with Game Mechanisms</a></h3>
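<p>This paper seems to be essentially about what is written at</p>
<p><a class="reference external" href="http://www.joelonsoftware.com/items/2008/09/15.html">http://www.joelonsoftware.com/items/2008/09/15.html</a></p>
<p>and asks whether the same theory can be applied to things like issue tracking. Personally I do not think this means much. Stack Overflow succeeded because it made concrete the reputation system that the open source community already had, building on the fact that everyone likes to be regarded as an expert, just as with Wikipedia. Apply the same theory to a company-internal issue tracking system and you would probably get something completely different. MSDN is organized the same way as Wikipedia, yet looking for information in MSDN feels nothing like looking in Wikipedia. I am not optimistic about this direction.</p>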
</div>
<div class="section" id="ghtorrent">
<h3><a class="toc-backref" href="#id6">GHTorrent</a></h3>
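<p>The slides for this paper are available here: <a class="reference external" href="http://www.slideshare.net/gousiosg/ghtorrent-githubs-data-from-a-firehose-13184524">http://www.slideshare.net/gousiosg/ghtorrent-githubs-data-from-a-firehose-13184524</a></p>
<p>A data exporter for GitHub. GitHub's main data, the code, is already available through the git interface, and wikis are stored in git form as well. So the goal of this project is to expose the other data, mainly issue tracking, code comments and the like. The code queries the GitHub API, runs in a distributed fashion to get around the API limits, and offers the history for download as torrents. The JSON data obtained from the GitHub API is stored as BSON in MongoDB; the parsed data, once it has a schema, is stored in MySQL and can be exported as SQL.</p>
<p>My own thought: it might be better if the data were more uniform and all stored in git, like the wiki. If the goal is to expose the full history anyway, a home-grown history distributed over torrents is surely far less convenient than a history exposed through the git interface, and tools such as git blame would make it easier to trace the authorship of things like code comments. Of course, reading and writing git's raw data directly requires a solid understanding of git internals, so perhaps only the people at GitHub are able to do this.</p>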
</div>
<div class="section" id="topic-mining">
<h3><a class="toc-backref" href="#id7">Topic Mining</a></h3>
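<p>It used two parameters, DE and AIC, which I did not understand at all; something to look into later. The experiments covered three pieces of software: Firefox, Mylyn and Eclipse. The work analyzes identifiers and comments in the source code of the repositories to find relationships between topics and bugs, for example which kinds of topic are more likely to lead to bugs. The conclusion seemed rather vague: it only says that core functionality has more reported bugs, without knowing why. That probably only shows that core functionality receives more attention and more testing; it does not show that core functionality is more bug-prone.</p>
<p>The slides, however, were beautifully made and easy to follow.</p>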
</div>
<div class="section" id="secold">
<h3><a class="toc-backref" href="#id8">SeCold</a></h3>
<p>A linked data platform for mining software repositories</p>
<p>I did not understand the purpose of this project.</p>
</div>
<div class="section" id="the-evolution-of-software">
<h3><a class="toc-backref" href="#id9">The evolution of software</a></h3>
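<p>The second day's keynote, about combining social media with software development, which is perhaps the very foundation of GitHub's success. It covered the trend of integrating social features such as code comments, tags, microblogs (uBlog) and blogs into the IDE.</p>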
</div>
<div class="section" id="do-faster-releases-imporve-software-quality">
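<h3><a class="toc-backref" href="#id10">Do Faster Releases Improve Software Quality?</a></h3>
<p>Uses Firefox as the case study.</p>
<p>The conclusion is that rapid releases lead to more bugs and more crashes, but bugs get fixed faster and users move to new releases more quickly.</p>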
</div>
<div class="section" id="security-vs-performance-bugs-in-firefox">
<h3><a class="toc-backref" href="#id11">Security vs Performance Bugs in Firefox</a></h3>
<p>Performance bugs are regressions and block releases.</p>
</div>
<hr class="docutils"/>
<div class="section" id="id1">
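<h3><a class="toc-backref" href="#id12">Some thoughts</a></h3>
<div class="section" id="commit">
<h4><a class="toc-backref" href="#id13">Commit splitting based on natural semantic analysis</a></h4>
<p>Users of a tool (git, for example) often do not use it the way its designers intended, which causes a lot of difficulty for MSR research. For instance, git has an excellent branching system, and users are generally expected to commit one unit of work per commit, such as a single bug fix or a single new feature. In practice, many logically separate commits often end up merged into one.</p>
<p>Perhaps this is not the users' fault but a sign that the tools are still not user-friendly enough. Perhaps we could automatically split a single commit into several according to its semantics.</p>
<p>After splitting, it would be easier to link issues to commits, and easier to build further research on top.</p>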
</div>
<div class="section" id="slides">
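<h4><a class="toc-backref" href="#id14">On the slide systems people used for their presentations</a></h4>
<p>The presenter of "Incorporating Version Histories in Information Retrieval Based
Bug Localization" used beamer: lots of formulas, lots of overlays, lots of lists and very few pictures, typical beamer slides. The mind map was used very well. At least three slide decks today were made with beamer.</p>
<p>The presenter of "Towards Improving Bug Tracking Systems with Game Mechanisms"
used Prezi, with lots of pictures and lots of transitions. But it has, for example, no page numbers and no headers or footers, which is inconvenient at a formal conference.</p>
<p>At least six presenters used Apple Keynote. What Keynote produces is really hard to tell apart from PowerPoint; I only noticed because two of them used the default theme.</p>
<p>The rest were naturally PowerPoint. Ms. Zhang of MSRA used PowerPoint, yet her slides felt a lot like beamer, for example in their use of headers, footers and overlays. If all of that was done in PowerPoint, it must have taken a lot of extra manual work.</p>
<p>Worth mentioning: the slides for "Green Mining: A Methodology of Relating
Software Change to Power Consumption" consisted entirely of "crude" hand-drawn cartoons, and the effect was surprisingly good: very low-carbon, very eco-friendly, very green and very cute. The video below gives an idea of the effect, although it is not the same version that was shown on site:</p>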
<p><a class="reference external" href="http://softwareprocess.es/a/greenmining-presentatation-at-queens-20120522.ogv">http://softwareprocess.es/a/greenmining-presentatation-at-queens-20120522.ogv</a></p>
</div>
<div class="section" id="id2">
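<h4><a class="toc-backref" href="#id15">Microsoft is a two-faced schemer!</a></h4>
<p>Well, not that this is news. The sponsor of the MSR 2012 Mining Challenge is Microsoft, the organizers are from Microsoft Research, and the prizes are an Xbox and a Kinect. And this year's topic is:</p>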
<pre class="literal-block">
Mining Android Bug
</pre>
<p>I can see Microsoft's resentment brimming over...</p>
</div>
</div>
</div>