How I finally tracked my Debian uploads correctly

A long time ago, I became aware of UDD (Ultimate Debian Database), which gathers various Debian data into a single SQL database.

At that time, we were trying to do something simple: list the contributions (package uploads) of our local community, Debian Brasília. We ended up with a script that counted uploads to unstable and experimental.

I was never satisfied with the final result because some uploads were always missing. Here is an example:

debci (3.0) experimental; urgency=medium
...
   [ Sergio de almeida cipriano Junior ]
   * Fix Style/GlovalVars issue
   * Rename blacklist to rejectlist
...

I made changes in debci 3.0, but the upload was done by someone else. This kind of contribution cannot be tracked by that script.

Then, a few years ago, I learned about Minechangelogs, which allows us to search through the changelogs of all Debian packages currently published.

Today, I decided to explore how this was done, since I couldn't find anything useful for that kind of query in UDD's tables.

That's when I came across ProjectB. It was my first time hearing about it. ProjectB is a database that stores all the metadata about the packages in the Debian archive, including the changelogs of those packages.

Now that I'm a Debian Developer, I have access to this database. If you also have access and want to try some queries, you can do this:

$ ssh <username>@mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
$ psql postgresql://guest@localhost:15434/projectb?sslmode=allow

In the end, it finally solved my problem.

Using the code below, with UDD, I get 38 uploads:

import psycopg2

contributor = 'almeida cipriano'

try:
    connection = psycopg2.connect(
        user="udd-mirror",
        password="udd-mirror",
        host="udd-mirror.debian.net",
        port="5432",
        database="udd"
    )

    cursor = connection.cursor()

    query = f"SELECT source,version,date,distribution,signed_by_name \
FROM public.upload_history \
WHERE changed_by_name ILIKE '%{contributor}%' \
ORDER BY date;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have {len(records)} uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)

Using the code bellow, with ProjectB, I get 43 uploads (the correct amount):

import psycopg2

contributor = 'almeida cipriano'

try:
    # SSH tunnel is required to access the database:
    # ssh <username>@mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
    connection = psycopg2.connect(
        user="guest",
        host="localhost",
        port="15434",
        database="projectb",
        sslmode="allow"
    )
    connection.set_client_encoding('UTF8')

    cursor = connection.cursor()

    query = f"SELECT c.source, c.version, c.changedby \
FROM changes c \
JOIN changelogs ch ON ch.id = c.changelog_id \
WHERE c.source != 'debian-keyring' \
  AND (\
    ch.changelog ILIKE '%{contributor}%' \
    OR c.changedby ILIKE '%{contributor}%' \
  )\
ORDER BY c.seen;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have {len(records)} uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)

It feels good to finally solve this itch I've had for years.


Written on 2025-07-05.