03-5211-7750 平日|09:30~18:00

Analyzing Debian packages with Neo4j - Part 2 UDD and Graph DB Schema

検討に役立つお役立ち資料を無料でご利用いただけます

お役立ち資料をダウンロードFREE

The Ultimate Debian Database UDD

In the first part of this series of articles on analyzing Debian packages with Neo4j we gave a short introduction to Debian and the life time and structure of Debian packages.(first part)

The current second part first describes the Ultimate Debian Database UDD and how to map the information presented here from the UDD into a Graph Database by developing the database scheme, that is the set of nodes and relations, together with their attributes, from the inherent properties of Debian packages.

The next part will describe how to get the data from the UDD into Neo4j, give some sample queries, and discuss further work.

The Ultimate Debian Database UDD

 gathers a lot of data about various aspects of Debian in the same SQL database. It allows users to
 easily access and combine all these data. Data currently being imported include: Packages and
 Sources files, from Debian and Ubuntu, Bugs from the Debian BTS, Popularity contest, History of
 uploads, History of migrations to testing, Lintian, Orphaned packages, Carnivore, Debtags, Ubuntu
 bugs (from Launchpad), Packages in NEW queue, DDTP translations. Debian Wiki

Collection all these information and obviously having been grown over time, the database exhibits a highly de-normalized structure with ample duplication of the same information. As a consequence, reading the SQL code fetching data from the UDD and presenting them in a coherent interface tends to be highly convoluted.

This lets us to the project of putting (parts) of the UDD into a graph database, removing all the duplication on the way and representing the connections between the entities in a natural graph way.

Developing the database schema

Recall from the first column that there are source packages and binary packages in Debian, and that the same binary package can be built in different versions from different source packages. Thus we decided to have both source and binary packages as separate entities, that is nodes of the graph, and the two being connected via a binary relation builds.

Considering dependencies between Debian packages we recall the fact that there are versioned and unversioned dependencies. We thus decide to have again different entities for versioned source and binary packages, and unversioned source and binary packages.

The above considerations leads to the following sets of nodes and relations:

vsp -[:is_instance_of]-> sp
vbp -[:is_instance_of]-> bp
sp -[:builds]-> bp
vbp -[:next]-> vbp
vsp -[:next]-> vsp

where vsp stands for versioned source package, sp for (unversioned) source package, and analog for binary packages. The versioned variants carry besides the name attribute also a version attribute in the node.
The relations are is_instance_of between versioned and unversioned packages, builds between versioned source and versioned binary packages, and next that defines an order on the versions.

An example of a simple graph for the binary package luasseq which has was originally built from the source package luasseq but was then taken over into the TeX Live packages and built from a different source.



Next we want to register suites, that is associating which package has been included in which release of Debian. Thus we add a new node type suite and a new relation contains which connects suites and versioned binary packages vbp:
suite -[:contains]-> vbp

Nodes of type suite only contain one attribute name. We could add release dates etc, but refrained from it for now. Adding the suites to the above diagram we obtain the following:



Next we add maintainers. The new node type mnt has two attributes: name and email. Here it would be nice to add alternative email addresses as well as alternative spellings of the name, something that is quite common. We add a relation maintains to versioned source and binary packages only since, as we have seen, the maintainership can change over the history of a package:

mnt -[:maintains]-> vbp
mnt -[:maintains]-> vsp

This leads us to the following graph:

This concludes the first (easy) part with basic node types and relations. We now turn to the more complicated part to represent dependencies between packages in the graph.

Representing dependencies

For simple dependencies (versioned or unversioned, but no alternatives) we represent the dependency relation with two attributes reltype and relversion specifying the relation type (<<, <=, ==, >=, >>) and the version as string. For unversioned relations we use the reltype=none and relversion=1:

vbp -[:depends reltype: TYPE, relversion: VERS]-> bp

Adding all the dependencies to the above graph, we obtain the following graph:


Our last step is dealing with alternative dependencies. Recall from the first blog that a relation between two Debian packages can have alternative targets like in

Depends: musixtex (>= 1:0.98-1) | texlive-music

which means that either musixtex or texlive-music needs to be installed to satisfy this dependency.

We treat these kind of dependencies by introducing a new node type altdep and a new relation is_satisfied_by between altdep nodes and versioned or unversioned binary packages (vbp, bp).

The following slice of our graph shows binary packages pmx which has alternative dependencies as above:

Summary of nodes, relations, and attributes

Let us summarize the node types, relations types, and respective attributes we have deduced from the requirements and data in the Debian packages:

Nodes and attributes
 ・mnt: name, email
 ・bp, sp, suite, altdeps: name
 ・vbp, vsp: name, version

Relations and attributes
 ・breaks, build_conflicts, build_conflicts_indep, build_depends, build_depends_indep, conflicts, depends,
  enhances, is_satisfied_by, pre_depends, provides, recommends, replaces, suggests
  Attributes: reltype, relversion
 ・builds, contains, is_instance_of, maintains, next: no attributes


Next is ...
Now that we have the graph database schema set up, we need to pull data from the UDD and put them into the Graph database. This will be discussed in the next entry of this series.


■関連ページ
【アクセリアのサービス一覧】
 ・サービスNAVI

Norbert Preining

アクセリア株式会社 研究開発部 所属
北陸先端科学技術大学院大学ソフトウェア検証研究センター 研究員
ウィーン工科大学 研究員
Debian開発者
TeX User Group (取締役会員)、Kurt Gödel Society (取締役会員)
ACM, ACM SigLog, 日本数式処理学会、ドイツ数学論理学会
Contact usお問い合わせ

サービスにご興味をお持ちの方は
お気軽にお問い合わせください。

Webからお問い合わせ

お問い合わせ

お電話からお問い合わせ

03-5211-7750

平日09:30 〜 18:00

Download資料ダウンロード

製品紹介やお役立ち資料を無料でご活用いただけます。

Magazineメルマガ登録

最新の製品情報などタイムリーな情報を配信しています。

Free Service

PageSpeed Insights シミュレータ

CDNによるコンテンツの最適化を行った場合のPageSpeed Insightsのスコアをシミュレートしてレポートします。