ICSE 2012

June 6

Keynote 1

沒怎麼聽懂,只記得講到了finance is not money但是沒聽懂這個和軟件有什麼關係。

Cost Estimation for Distributed Software Project

講到他們試圖改善現有的模型去更精確地評估軟件開發的開銷。

他們會給PM建議之前的項目的歷史數據,然後對於新項目,他們建議歷史上已有 的項目的數據,從而幫助PM得到更精確的評估。他們試圖儘量減少項目評估對PM 的經驗的需求,從而幫助即使經驗很少的PM也能準確評估項目的開銷。

他們的觀點:

Context-specfic solutions needed!

我們需要更上下文相關的解決方案!

Early user paticipation is key!

早期用戶的參與是關鍵

Characterizing Logging Practices in Open-Source Software

Common mistakes in logging messages

在日誌記錄中容易犯的錯誤

他們學習了歷史上的log記錄,然後試圖找到重複修改的輸出log的語句,確定log 中存在的問題。他們首先確定修改是事後修改。

通常的修改的比例(9027個修改)

45% 靜態文本
27% 打印出的變量
26% 調試等級verbosity
2% 日誌輸出的位置

他們發現有調試等級的變化,是因爲安全漏洞之類的原因,或者在開銷和數據 之間的權衡。

大多數對log的變量的修改都是爲了增加一個參數。他們之前的LogEnhancer是爲了 解決這個問題而提出的,通過靜態檢查,提醒程序員是否忘記了某個參數

對text的修改是因爲要改掉過時的代碼信息,避免誤導用戶。

他們的實驗是採用了基於code clone 的技術,找到所有log語句,然後找不一致 的clone,然後自動提出建議。

Combine Functional and Imperative Pgrm for Multicore Sw: Scala & Java

趨勢:到處都是多核,但是併發程序呢?

他們研究的對象是Scala和Java,因爲可以編譯後確認JVM字節碼的語義。

  • Java:
    • 共享內存
    • 顯示創建的線程
    • 手動同步
    • Wait/Notify機制
  • Scala:
    • 高階函數
    • Actors, 消息傳遞
    • lists, filters, iterators
    • while
    • 共享狀態, OO
    • import java.* 能從java導入任何庫
    • auto type inferance 自動類型推導

實驗的參與者都經過4周的訓練,實驗項目是工業等級的開發項目

結果:

scala 的項目平均比java多花38%的時間,主要都是花在Test和debug上的時間。

程序員的經驗和總體時間相關,但是對test和debug沒有顯著影響。

scala的爲了讓編程更有效率的設計,導致debug更困難。比如類型推導,debug 的時候需要手動推導,來理解正在發生什麼。

scala的程序比java小,中位數2.6%,平均15.2%

  • 性能比較:
    • 單核:scala的線性程序的性能比java好
    • 4核:
      • scala 7s @ 4 threads
      • java 4si @ 8 threads
      • median
        • 83s scala
        • 98s java
    • 32core: best scala 34s @ 64 threads
  • 結論
    • java有更好的scalability
  • scala類型推導
    • 45%說對攜帶碼有幫助
    • 85%說導致程序錯誤
  • 調試
    • 23%認爲scala簡單
    • 77%認爲java簡單

multi-paradigram are better

Sound Empirical Evidence in Software Testing

Test data generation 測試數據自動生成

Large Empirical Studies - not always possible

For open source software - big enough

Identifing Linux Bug Fixing Patch

  • current practice:
    • manual
  • Current research:
    • keywords in commits
    • link bug reports in bugzilla

Try to solve classification problem

  • issue
    • pre-identified
    • post-identified
  • data
    • from commit log
  • feature extraction
    • text pre-process stemmed non-stop words
  • model learning

research questions

Active Refinement of Clone Anomaly Reports

motivating

  • code clones, clone groups
  • clone used to detect bugs
  • anomaly : inconsistent clone group many anomaly clone are note bug, high false positive
approach
  • reorder by sorted bug reports

June7

Keynotes 2: Sustainability with Software - An Industrial Perspective

Sustainability

  • Classic View: Idenpendent view with overlap
    • Social
    • Environment
    • Economic
  • Nested viw
    • Environment
      • Social
        • Economic
Triple bottom line
  • economic
    -global business, networks , global econ
  • env
    • natural res, climate change, population grow
  • social
    • awareness, connectivity, accountability

Green IT

  • reduce IT energy
    • more than 50% cooling - doing nothing
  • mini e-waste: not properly recycled
    • 80% in EU
    • 75% in US
  • foster dematerialization

In-Memory Technology: Expected Sustainable Benefits

What can we do?

  • consider all software lifecycle phases in your design
  • avoid energy expensive behavior in your codes
  • design lean architectures

Green by IT

  • 2% green IT
  • 98% green IT

On How Often code is cloned across repositories

Line based hashing code clone detection

never do anything harder than sorting

hashing a window of 5 lines of normalized (tokenized) code, dropping 3/4 of the hashing

把ccfinder一個月的工作縮短到了3, 4天。沒有比較presion和recall。

14% type1
16% type2
17% type3 (not really type2)

Graph-based analysis and prediction for sw evolution

graph are everywhere

  • internet topology
  • social net
  • chemistry
  • biology

in sw - func call graph - module dependency graph

developer interaction graph - commit logs - bug reports

experiment 11 oss, 27~171 release, > 9 years

predictors

  • NodeRank
    • similar to pagerank of google
    • measure relative importance of each node
    • func call graph with noderank
      • compare rank with severity scale on bugzilla
    • correlation between noderank and BugSeverity
      • func level 0.48 ~ 0.86 varies among projects.
      • model level > func level
  • ModularityRatio
    • cohesion/coupling ratio: IntraDep(M)/InterDep(M)
    • forecast mantencance effort
    • use for
      • identify modules that need redesign or refactoring
  • EditDistance
    • bug-based developer collaboration graphs
    • ED(G1,G2)=|V1|+|V2|-2|V1交V2|+|E1|+|E2|-2|E1交E2|
    • use for
      • release planning
      • resource allocation

graph metrics

  • graph diameter
    • average node degree indicates reuse
  • clustering coefficient
  • assortativity
  • num of cycles

Conclusion

"Actionable intelligence" from graph evolution

  • studie 11 large long-live projs
  • predictors
  • identify pivotal moments in evolution

What make long term contributors: willingness and opportunity in OSS

OSS don't work without contributors form community

mozilla (2000-2008)

10^2.2 LTC <- 2 order -> 10^4.2 new contributors <- 3.5 order -> 10^7.7 users

gnome (1999-2007)

10^2.5 LTC <- 1.5 order -> 10^4.0 new contributors <- 3.5 order -> 10^6.5 users

approach

  • read issues of 20 LTC and 20 non-LTC
  • suvery 56 (36 non-LTC and 20 LTC)
  • extract practices published on project web sites

summeray

  • Ability/Willingness distinguishes LTCs
  • Environment
    • macro-climate
      • popularity
    • micro-climate
      • attention
      • bumber of peers
      • performance of peers

regression model

newcomers to LTC conversion drops

actions in first month predicts LTCs
  • 24% recall
  • 37% precision

develop of auxiliary functions: should you be agile?

a empirial assessment of pair programming and test-first programming

can agile help auxiliary functions?

experiment

  • pair vs solo
  • test-first vs test-last
  • students vs professors

research questions

  • r1: can pair help obtain more correct impl
  • r2: can test-first
  • r3: dst test1 encourage the impl or more test cases?
  • r4: does test1 course more coverage

result

  • test-first
    • higher coverage
    • non change with correctness
  • pair
    • improve on correctness
    • longer total programming time

Static Detection of Resource Contention Problems in Server-side script

Addressed the race condition of accessing database or filesystem of PHP

Amplifying Tests to Validate Exception Handling Code

異常處理的代碼不但難寫,而且難以驗證。各種組合情況難以估計,尤其是手機 系統上。

請用你的 GitHub 賬戶登錄並在這篇文章的 Issue 頁下留言
comments powered by Disqus