Gerrit Analytics applied to Android source code
Gerrit User Summit 2019 – GerritForge Inc. – Sunnyvale CA GerritForge.com 0
Gerrit Code Analytics
for the Android OpenSource Project
Luca Milanesio
Gerrit Code Review Maintainer
GerritForge
About GerritForge
Founded in the UK
HQ in London + Sunnyvale, CA
Committed to OpenSource
Gerrit DevOps Analytics
§ There’s a lot of value in your DevOps pipeline
§ Information collected from Git, Jenkins, Jira, you name it
§ Discover and publish meaningful KPIs to make intelligent decisions about
  § People
  § Projects
  § Infrastructure
§ Lower the risk of a software release by leveraging insights from historical data
Continuous Delivery
Analytics Dimensions
People Reviews Projects Commits System Metrics
BigData to the rescue
§ Collect all review events
§ Collect all logs
§ Channel them to a central store
§ Crunch and Crunch continuously
§ Never delete
§ Process, inspect and learn
GDA Components
The main components of GDA are:
§ GDA Event Collector (Plugin)
This allows data to be extracted, anonymized and sent over to the next phase.
§ GDA ELT Engine
This is hosted in the cloud by GerritForge or on-premises and functions as a data mart and processing engine for all development-related data.
§ GDA Dashboard(s)
These are provided by GerritForge according to customer needs. Some dashboards are already available (for people and projects). Others will be purpose-built.
Android OpenSource Project use-case
Issue 10597: AOSP Gerrit stats page
Reported by zoran.jovanovic@sony.com on Wed, Mar 13, 2019, 9:02 AM PDT
AOSP repository and its Gerrit Code Review are a treasure trove of data.
There are some very interesting and useful stats that could be presented to the users. The stats would help in giving recognition for the contributors and reviewers alike, thus raising the motivation of contributors, it would provide an easy access to the long and rich history of AOSP project etc.
The problems to be resolved
P1: How to mirror the AOSP repositories in a systematic way?
P2: How to scale up the current Gerrit analytics plugin + ETL?
P3: How to parse foreign Gerrit NoteDb data?
Problem 1: AOSP repositories replication
Gerrit has a replication plugin:
• Define replication remotes
• Define push ref-spec
How to pull instead?
Do you replicate the AOSP repos on your Gerrit?
How do you automate the process?
Problem 1 solved: pull-replication plugin
Challenges with pull-replication plugin
1. Why not extend the replication plugin?
• replication.config already defines replication remotes
• Can the push logic be extended?
2. Why not develop a brand-new plugin?
• Copy & paste existing code, keeping the essentials?
… or a mix of 1 and 2?
Design decisions
Refactoring of the replication plugin
• Decouple configuration from remote destination
• Started in April, merged in October (6 months!)
• Introduced unit and integration tests (the first in 7 years!)
Reuse of the replication plugin logic in the pull-replication plugin
• Use replication-plugin as a dependency
• Complete reuse of the configuration and associated logic
pull-replication demo
$GERRIT_SITE/etc/replication.config:
[remote "aosp"]
  url = https://android.googlesource.com/${name}
  fetch = +refs/heads/*:refs/heads/*
  fetch = +refs/tags/*:refs/tags/*
  projects = platform/system/core
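As a rough illustration of how the `${name}` placeholder in the remote URL resolves to a concrete repository, the Python sketch below substitutes a project name into the template. The function name `expand_remote_url` is hypothetical and is not part of the plugin; it only mirrors the substitution that the configuration implies.

```python
def expand_remote_url(template: str, project: str) -> str:
    """Replace the ${name} placeholder with the project name (illustrative only)."""
    return template.replace("${name}", project)


# Values taken from the replication.config shown on the slide.
url = expand_remote_url(
    "https://android.googlesource.com/${name}",
    "platform/system/core",
)
print(url)
```

The resulting URL is the one that appears in the pull-replication log on the next slide.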
pull-replication demo
ssh localhost pull-replication start --all
==> logs/pull_replication_log <==
[2019-11-16 23:59:22,712] [49829e49] Replication from https://android.googlesource.com/platform/system/core started...
[2019-11-16 23:59:22,753] [49829e49] Fetch references [+refs/heads/*:refs/heads/*, +refs/tags/*:refs/tags/*] from https://android.googlesource.com/platform/system/core
[2019-11-16 23:59:25,243] [49829e49] Replication from https://android.googlesource.com/platform/system/core completed in 2530ms, 15012ms delay, 0 retries
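The log entries above can be mined for timing data. The following is a minimal sketch, assuming the entry format stays exactly as shown on the slide; the regular expression and the helper names `parse_completion` and `elapsed_seconds` are illustrative and not part of the plugin.

```python
import re
from datetime import datetime

# Pattern derived from the "completed" entry on the slide (an assumption about
# the general log format, not a documented contract).
COMPLETED = re.compile(r"completed in (\d+)ms, (\d+)ms delay, (\d+) retries")


def parse_completion(entry: str):
    """Return duration, delay and retry count from a completion entry, or None."""
    match = COMPLETED.search(entry)
    if match is None:
        return None
    duration_ms, delay_ms, retries = (int(g) for g in match.groups())
    return {"duration_ms": duration_ms, "delay_ms": delay_ms, "retries": retries}


def elapsed_seconds(start: str, end: str) -> float:
    """Difference between two log timestamps such as '2019-11-16 23:59:22,712'."""
    fmt = "%Y-%m-%d %H:%M:%S,%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


entry = (
    "[2019-11-16 23:59:25,243] [49829e49] Replication from "
    "https://android.googlesource.com/platform/system/core "
    "completed in 2530ms, 15012ms delay, 0 retries"
)
stats = parse_completion(entry)
wall_clock = elapsed_seconds("2019-11-16 23:59:22,712", "2019-11-16 23:59:25,243")
print(stats, wall_clock)
```

The wall-clock difference between the "started" and "completed" timestamps agrees with the reported 2530ms to within a millisecond.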
Problem 2: AOSP repositories are BIG
Gerrit analytics plugin slowest points:
• Processing of branches
• Binary files
• Commit diffs and stats computation
Problem 2: solved
Gerrit analytics plugin performance improvements:
• Pre-computation of branches
• Reuse of the Gerrit diff-cache for analytics
• Introduction of the analytics-commits_statistics_cache
Analytics performance demo
Gerrit cold start (no cache):
$ time curl -v 'http://localhost:8080/projects/platform%2Fsystem%2Fcore/analytics~contributors' | wc -l
1732
curl -v 0.02s user 0.03s system 0% cpu 1:30.60 total
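Note that the project name in the request is URL-encoded, so `platform/system/core` becomes `platform%2Fsystem%2Fcore`. A small sketch of building that URL, assuming the same local base URL as the demo; the helper name `contributors_url` is hypothetical.

```python
from urllib.parse import quote


def contributors_url(base_url: str, project: str) -> str:
    """Build the analytics contributors URL for a project (illustrative helper)."""
    # safe="" forces the slashes inside the project name to be encoded as %2F.
    encoded = quote(project, safe="")
    return f"{base_url}/projects/{encoded}/analytics~contributors"


demo_url = contributors_url("http://localhost:8080", "platform/system/core")
print(demo_url)
```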
Analytics added cache and performance
$ ssh localhost gerrit show-caches | grep analytics
analytics-commits_statistics_cache| 54011 | 1.6ms | 0% |
$ time curl -v 'http://localhost:8080/projects/platform%2Fsystem%2Fcore/analytics~contributors' | wc -l
1732
curl -v 0.01s user 0.02s system 2% cpu 1.045 total
Nearly 90x faster (1:30.60 down to 1.045s) with the commits-stats cache
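The speedup can be checked directly from the two `time` totals shown in the demos. The sketch below converts both values to seconds and divides them; `parse_elapsed` is a hypothetical helper written for this illustration.

```python
def parse_elapsed(value: str) -> float:
    """Convert a time total such as '1:30.60' or '1.045' into seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


cold = parse_elapsed("1:30.60")  # cold start, no cache
warm = parse_elapsed("1.045")    # with the commits statistics cache populated
speedup = cold / warm
print(f"cold={cold:.2f}s warm={warm:.3f}s speedup={speedup:.1f}x")
```

The ratio comes out slightly below 87, which the slide rounds to 90x.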
Analytics bot detection
People or BOTs can be recognized by their pattern of commits:
• Detect file type by regex
• Identify commits with BOT-like files only
$GERRIT_SITE/etc/analytics.config:
[contributors]
botlike-filename-regexp = OWNERS
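The idea behind the bullets above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the plugin's implementation: it treats a commit as BOT-like when every touched file matches at least one configured regex, and it uses `re.search` on the full path, which may differ from the plugin's actual matching rules. The file paths in the usage lines are made up for the example.

```python
import re
from typing import Iterable, List


def is_botlike_commit(files: Iterable[str], botlike_regexps: List[str]) -> bool:
    """True when the commit touches only files matching a BOT-like pattern."""
    patterns = [re.compile(p) for p in botlike_regexps]
    touched = list(files)
    if not touched:
        # An empty commit gives no evidence either way; treat it as not BOT-like.
        return False
    return all(any(p.search(path) for p in patterns) for path in touched)


regexps = ["OWNERS"]  # same value as botlike-filename-regexp on the slide
only_owners = is_botlike_commit(["OWNERS", "libexample/OWNERS"], regexps)
mixed = is_botlike_commit(["OWNERS", "libexample/main.cpp"], regexps)
print(only_owners, mixed)
```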
Problem 3: foreign changes processing
Gerrit changes in NoteDb have a Server-ID baked in
Example:
$ git show refs/changes/00/100000/meta
commit 0014ca6443ac0af338e2677b45e538782bb7a12e (origin/00/100000/meta, refs/changes/00/100000/meta)
Author: beckysiegel <1030207@173816e5-2b9a-37c3-8a2e-48639d4f1153>
Date: Sat Mar 3 18:38:17 2018 +0000
Update patch set 1
Hashtag added: enhancement
Patch-set: 1
Hashtags: enhancement
Tag: autogenerated:gerrit:setHashtag
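In the example above, the author identity has the form `<account id>@<server id>`. A minimal sketch of splitting that identity and checking whether a change comes from another server follows; the helper names and the local server ID value are hypothetical and only serve the illustration.

```python
from typing import Tuple


def parse_notedb_ident(ident: str) -> Tuple[int, str]:
    """Split 'accountId@serverId' into its two parts (illustrative helper)."""
    account, sep, server_id = ident.partition("@")
    if not sep or not account.isdigit() or not server_id:
        raise ValueError(f"unexpected NoteDb identity: {ident!r}")
    return int(account), server_id


def is_foreign(ident: str, local_server_id: str) -> bool:
    """True when the identity was written by a different Gerrit server."""
    _, server_id = parse_notedb_ident(ident)
    return server_id != local_server_id


ident = "1030207@173816e5-2b9a-37c3-8a2e-48639d4f1153"  # from the slide
account_id, server_id = parse_notedb_ident(ident)
# The local server ID below is a made-up placeholder.
foreign = is_foreign(ident, "00000000-0000-0000-0000-000000000000")
print(account_id, server_id, foreign)
```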
Solution: allow parsing of foreign NoteDb
Dec 2018: proposed - 9 months later, rejected and abandoned
Solution revamped: Dave shows another way
Aug 2019: proposed - 1 month later, merged! (thanks, DavidO)
Analytics deployment on GerritHub
[Diagram: four Gerrit masters (review-1, review-2, review-3, review-4), each with the multi-site plugin, behind HAproxy instances; the labelled flows are World Traffic (R/W) and Analytics Traffic (R/W).]
Analytics next steps
1. Enable pull-replication on review-{3,4}.gerrithub.io
2. Process contributor analytics for the AOSP repos once a day
3. Publish commit stats to analytics.gerrithub.io
4. Enrich with change hashtag data
Poll: have you implemented multi-master/HA?
Image from: http://cypp.rutgers.edu/ru-voting/political-information/public-opinion-polls/
Want to know more?
GerritForge.com/contact
