DockerHub Metadata
This dataset contains 1.46 million Docker image configuration and manifest files fetched from DockerHub in June 2019.
A manifest points to the layers of an image and to its configuration. A configuration carries all the metadata: architecture, OS, environment variables, entry point, default command, etc., including the layer creation history.
The layer creation history makes it possible to reconstruct the output of docker history without pulling the images. Taken together, the provided information can be used to partially recover the Dockerfile (excluding ADD and build stages) of any image on DockerHub that has it.
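As an illustration of the partial Dockerfile recovery described above, here is a minimal Python sketch. It relies on the history array and its created_by field from the configuration format. The "/bin/sh -c #(nop)" and "/bin/sh -c" prefixes are an assumption about how the classic Docker builder records instructions; images built with other tools may record them differently. The function name and the sample data are hypothetical.

```python
# Assumed prefixes written by the classic Docker builder into "created_by".
NOP_PREFIX = "/bin/sh -c #(nop) "
RUN_PREFIX = "/bin/sh -c "


def approximate_dockerfile(config):
    """Turn the history array of an image configuration into Dockerfile-like lines."""
    lines = []
    # The history array is ordered from the first instruction to the last.
    for entry in config.get("history", []):
        created_by = entry.get("created_by", "")
        if created_by.startswith(NOP_PREFIX):
            # Metadata instructions (CMD, ENV, ADD, ...) follow the "#(nop)" marker.
            lines.append(created_by[len(NOP_PREFIX):].strip())
        elif created_by.startswith(RUN_PREFIX):
            # Other shell invocations are treated as RUN instructions.
            lines.append("RUN " + created_by[len(RUN_PREFIX):].strip())
        elif created_by:
            # Unknown form: keep it verbatim rather than guess.
            lines.append(created_by)
    return lines


# Hypothetical configuration fragment, for illustration only.
sample_config = {
    "history": [
        {"created_by": "/bin/sh -c #(nop) ADD file:abc in / "},
        {"created_by": "/bin/sh -c apt-get update"},
        {"created_by": "/bin/sh -c #(nop)  CMD [\"bash\"]", "empty_layer": True},
    ]
}

if __name__ == "__main__":
    print("\n".join(approximate_dockerfile(sample_config)))
```

The ADD line only names a file checksum, not the original source path, which is one reason the recovery is partial.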
The dataset consists of 2 files:
- configs.tar.xz: configuration JSON files, 16GB uncompressed.
- manifests.tar.xz: manifest JSON files, 8.5GB uncompressed.
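Since the uncompressed archives are large, it may be convenient to read them as a stream instead of unpacking them. The following Python sketch uses only the standard library; the function name is hypothetical, and it assumes that every metadata file inside the archive has a name ending in .json.

```python
import json
import tarfile


def iter_json_members(archive_path):
    """Yield (member name, parsed JSON) for every .json file in a .tar.xz archive."""
    # "r|xz" reads the archive sequentially, so nothing is unpacked to disk.
    with tarfile.open(archive_path, mode="r|xz") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            # In stream mode each member has to be read before moving to the next.
            with archive.extractfile(member) as handle:
                yield member.name, json.load(handle)
```

For example, `for name, config in iter_json_members("configs.tar.xz"): ...` would visit each configuration once, in archive order.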
Format
The directory structure is the same for configurations and manifests. The top-level directory is the first two letters of the image name; the inner directories correspond to the name, including the /. The :latest tag is stripped from the file names.
Examples: the configuration for tensorflow/tensorflow:2.0.0b0 is at te/tensorflow/tensorflow:2.0.0b0.json, and the one for mongo:latest is at mo/mongo.json.
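The naming rule above can be sketched as a small Python helper. The function name is hypothetical, and the returned path is relative to the root of the directory structure; whether the archives add a further leading directory is not covered here.

```python
def metadata_path(image):
    """Map an image reference such as "mongo:latest" to its relative JSON path."""
    suffix = ":latest"
    # The ":latest" tag is dropped; any other tag stays part of the file name.
    name = image[:-len(suffix)] if image.endswith(suffix) else image
    # The top-level directory is the first two letters of the image name.
    return f"{name[:2]}/{name}.json"


if __name__ == "__main__":
    print(metadata_path("tensorflow/tensorflow:2.0.0b0"))
    print(metadata_path("mongo:latest"))
```

The two calls reproduce the examples given above: te/tensorflow/tensorflow:2.0.0b0.json and mo/mongo.json.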
The manifest format is defined at https://docs.docker.com/registry/spec/manifest-v2-2
The configuration format is defined at https://github.com/moby/moby/blob/master/image/spec/v1.2.md
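To show how a manifest points to the configuration and the layers, here is a minimal Python sketch that reads the config and layers fields of a schema 2 manifest. The function name and the shortened digests in the sample are hypothetical; the sketch assumes every manifest in the dataset follows the schema 2 layout.

```python
def summarize_manifest(manifest):
    """Return the config digest, the layer digests and the total layer blob size."""
    layers = manifest.get("layers", [])
    return {
        "config_digest": manifest["config"]["digest"],
        "layer_digests": [layer["digest"] for layer in layers],
        # "size" is the size in bytes of each layer blob as stored in the registry.
        "total_layer_bytes": sum(layer["size"] for layer in layers),
    }


# Hypothetical manifest with made-up digests and sizes, for illustration only.
sample_manifest = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 7023,
        "digest": "sha256:aaaa",
    },
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 32654,
            "digest": "sha256:bbbb",
        },
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 16724,
            "digest": "sha256:cccc",
        },
    ],
}

if __name__ == "__main__":
    print(summarize_manifest(sample_manifest))
```

The config digest identifies the configuration blob of the image, so it can be used to pair a manifest with its configuration.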
Origin
The data comes from the DockerHub API. We modified skopeo to fetch configurations and manifests quickly (less than 3 hours for the whole DockerHub); the modified source of cmd/skopeo/inspect.go is included in this repository. The image list fetcher is written in Python and is also included.
How to reproduce:
pip3 install -r requirements.txt
python3 list_docker_images.py > images.txt
cp inspect.go /path/to/skopeo/cmd/skopeo/inspect.go
make -C /path/to/skopeo/ binary
cat images.txt | /path/to/skopeo/skopeo inspect
Limitations
- Only i386, amd64, arm and arm64 Linux images were considered.
- Custom image registries were not processed, e.g. microsoft-dotnet-core-samples.
License
Code: MIT. Compilation: Open Data Commons Open Database License (ODbL). Actual contents: DockerHub Terms of Service.