Analyzing Debian packages with Neo4j - Part 2 UDD and Graph DB Schema
![](/wp/wp-content/uploads/2020/01/1118-1024x374-1.jpg)
The Ultimate Debian Database UDD
The current second part first describes the Ultimate Debian Database UDD and how to map the information presented here from the UDD into a Graph Database by developing the database scheme, that is the set of nodes and relations, together with their attributes, from the inherent properties of Debian packages.
The next part will describe how to get the data from the UDD into Neo4j, give some sample queries, and discuss further work.
The Ultimate Debian Database UDD
gathers a lot of data about various aspects of Debian in the same SQL database. It allows users to
easily access and combine all these data. Data currently being imported include: Packages and
Sources files, from Debian and Ubuntu, Bugs from the Debian BTS, Popularity contest, History of
uploads, History of migrations to testing, Lintian, Orphaned packages, Carnivore, Debtags, Ubuntu
bugs (from Launchpad), Packages in NEW queue, DDTP translations. Debian Wiki
Collection all these information and obviously having been grown over time, the database exhibits a highly de-normalized structure with ample duplication of the same information. As a consequence, reading the SQL code fetching data from the UDD and presenting them in a coherent interface tends to be highly convoluted.
This lets us to the project of putting (parts) of the UDD into a graph database, removing all the duplication on the way and representing the connections between the entities in a natural graph way.
Developing the database schema
Considering dependencies between Debian packages we recall the fact that there are versioned and unversioned dependencies. We thus decide to have again different entities for versioned source and binary packages, and unversioned source and binary packages.
The above considerations leads to the following sets of nodes and relations:
vsp -[:is_instance_of]-> sp
vbp -[:is_instance_of]-> bp
sp -[:builds]-> bp
vbp -[:next]-> vbp
vsp -[:next]-> vsp
where vsp stands for versioned source package, sp for (unversioned) source package, and analog for binary packages. The versioned variants carry besides the name attribute also a version attribute in the node.
The relations are is_instance_of between versioned and unversioned packages, builds between versioned source and versioned binary packages, and next that defines an order on the versions.
An example of a simple graph for the binary package luasseq which has was originally built from the source package luasseq but was then taken over into the TeX Live packages and built from a different source.
![](/wp/wp-content/uploads/2017/12/simple-graph.png)
Next we want to register suites, that is associating which package has been included in which release of Debian. Thus we add a new node type suite and a new relation contains which connects suites and versioned binary packages vbp:
Nodes of type suite only contain one attribute name. We could add release dates etc, but refrained from it for now. Adding the suites to the above diagram we obtain the following:
![](/wp/wp-content/uploads/2017/12/add-releases.png)
Next we add maintainers. The new node type mnt has two attributes: name and email. Here it would be nice to add alternative email addresses as well as alternative spellings of the name, something that is quite common. We add a relation maintains to
mnt -[:maintains]-> vbp
mnt -[:maintains]-> vsp
This leads us to the following graph:
![](/wp/wp-content/uploads/2017/12/add-maintainer.png)
This concludes the first (easy) part with basic node types and relations. We now turn to the more complicated part to represent dependencies between packages in the graph.
Representing dependencies
vbp -[:depends reltype: TYPE, relversion: VERS]-> bp
Adding all the dependencies to the above graph, we obtain the following graph:
![](/wp/wp-content/uploads/2017/12/add-deps.png)
Our last step is dealing with alternative dependencies. Recall from the first blog that a relation between two Debian packages can have alternative targets like in
Depends: musixtex (>= 1:0.98-1) | texlive-music
which means that either musixtex or texlive-music needs to be installed to satisfy this dependency.
We treat these kind of dependencies by introducing a new node type altdep and a new relation is_satisfied_by between altdep nodes and versioned or unversioned binary packages (vbp, bp).
The following slice of our graph shows binary packages pmx which has alternative dependencies as above:
![](/wp/wp-content/uploads/2017/12/altdeps.png)
Summary of nodes, relations, and attributes
Nodes and attributes
・mnt: name, email
・bp, sp, suite, altdeps: name
・vbp, vsp: name, version
Relations and attributes
・breaks, build_conflicts, build_conflicts_indep, build_depends, build_depends_indep, conflicts, depends,
enhances, is_satisfied_by, pre_depends, provides, recommends, replaces, suggests
Attributes: reltype, relversion
・builds, contains, is_instance_of, maintains, next: no attributes
Next is ...
Now that we have the graph database schema set up, we need to pull data from the UDD and put them into the Graph database. This will be discussed in the next entry of this series.
■関連ページ
【アクセリアのサービス一覧】
・サービスNAVI
サービスにご興味をお持ちの方は
お気軽にお問い合わせください。
Webからお問い合わせ
お問い合わせお電話からお問い合わせ
平日09:30 〜 18:00
Free Service