---
Authors:
  - Ayaz Akram
  - Jason Lowe-Power
---

# Artifacts

## gem5art artifacts

All unique objects used during gem5 experiments are termed "artifacts" in gem5art. Examples of artifacts include: gem5 binary, gem5 source code repo, Linux kernel source repo, Linux binary, disk image, and packer binary (used to build the disk image). The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the set of used artifacts when the same experiment needs to be performed in the future.

The description of an artifact serves as the documentation of how that artifact was created. One of the goals of gem5art is for these artifacts to be self-contained. With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact. (We are still working toward this goal. For instance, we are looking into using Docker to create artifacts, to separate artifact creation from the host platform it runs on.)

Each artifact is characterized by a set of attributes, described below:

- command: command used to build this artifact
- typ: type of the artifact, e.g., binary, git repo, etc.
- name: name of the artifact
- cwd: current working directory, where the command to build the artifact is run
- path: actual path of the location of the artifact
- inputs: a list of the artifacts used to build the current artifact
- documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact

Additionally, each artifact also has the following implicit information.
- hash: an MD5 hash for a binary artifact or a git hash for a git artifact
- time: time of the creation of an artifact
- id: a UUID associated with the artifact
- git: a dictionary containing the origin, current commit and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)

These attributes are not specified by the user, but are generated by gem5art automatically (when the `Artifact` object is created for the first time).

An example of how a user would create a gem5 binary artifact using gem5art is shown below. In this example, the type, name, and documentation are up to the user of gem5art. You're encouraged to use names that are easy to remember when you later query the database. The documentation attribute should be used to completely describe the artifact that you are saving.

```python
gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = '''
      Default gem5 binary compiled for the X86 ISA.
      This was built from the main gem5 repo (gem5.googlesource.com) without
      any modifications. We recently updated to the current gem5 master
      which has a fix for memory channel address striping.
    '''
)
```

Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through the use of a centralized database. Whenever a user tries to create a new artifact, the database is searched to find out whether the same artifact already exists there. If it does, the user can download the matching artifact for use. Otherwise, the newly created artifact is uploaded to the database for later use. Using the database also avoids running identical experiments: an error message is generated if a user tries to execute a run that already exists in the database.
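The lookup-before-upload behavior described above can be illustrated with a small in-memory sketch. This is not gem5art's implementation: `ToyArtifactRegistry` and its `register` method are hypothetical names, a dictionary stands in for the shared database, and matching on the MD5 hash alone is an assumption made to keep the example short.

```python
import hashlib
import uuid


class ToyArtifactRegistry:
    """Hypothetical in-memory stand-in for the shared artifact database."""

    def __init__(self):
        self._by_hash = {}  # content hash -> artifact record

    def register(self, name, typ, content):
        """Return (record, created); reuse the stored record if the hash matches."""
        digest = hashlib.md5(content).hexdigest()
        existing = self._by_hash.get(digest)
        if existing is not None:
            # Same content was registered before: hand back the stored record.
            return existing, False
        # New content: generate the implicit attributes and store the record.
        record = {"id": uuid.uuid4(), "name": name, "type": typ, "hash": digest}
        self._by_hash[digest] = record
        return record, True


registry = ToyArtifactRegistry()
first, created_first = registry.register("gem5", "gem5 binary", b"fake binary contents")
second, created_second = registry.register("gem5", "gem5 binary", b"fake binary contents")
other, created_other = registry.register("gem5", "gem5 binary", b"different contents")

print(created_first, created_second, created_other)  # True False True
print(first["id"] == second["id"])                   # True
```

Registering identical contents twice returns the record created the first time, which mirrors the "a pointer to that artifact will be returned" behavior of a shared database at a very small scale.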
### Creating artifacts

To create an `Artifact`, you must use [`registerArtifact`](artifacts.html#gem5art.artifact.artifact.Artifact.registerArtifact), as shown in the example above. This is a factory method which will initially create the artifact. When calling `registerArtifact`, the artifact will automatically be added to the database. If it already exists, a pointer to that artifact will be returned.

The parameters to the `registerArtifact` function are meant for *documentation*, not as explicit directions to create the artifact from scratch. Support for building an artifact from these parameters may be added to gem5art in the future.

Note: While creating new artifacts, you might see warning messages saying that certain attributes (other than hash and id) of two artifacts don't match. These warnings are raised when artifact similarity is checked in the code. Make sure that you understand the reason for any such warning.

### Using artifacts from the database

You can create an artifact with just a UUID if it is already stored in the database. The behavior will be the same as when creating an artifact that already exists. All of the properties of the artifact will be populated from the database.

## ArtifactDB

The particular database used in this work is [MongoDB](https://www.mongodb.com/). We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through [pymongo](https://api.mongodb.com/python/current/), and has an interface that is flexible as the needs of gem5art change.

Currently, it's required to run a database to use gem5art. However, we are planning on changing this default to allow gem5art to be used standalone as well.

gem5art allows you to connect to any database, but by default assumes there is a MongoDB instance running on the localhost at `mongodb://localhost:27017`. You can use the environment variable `GEM5ART_DB` to specify the default database to connect to when running simple scripts.
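The way the default URI and `GEM5ART_DB` interact can be sketched as a small helper. This is an illustration only, not gem5art's code: `resolve_db_uri` is a hypothetical function, the `dbhost` and `other` host names are made up, and letting an explicitly passed URI take priority over the environment variable is an assumption.

```python
import os

DEFAULT_URI = "mongodb://localhost:27017"


def resolve_db_uri(explicit_uri=None, environ=None):
    """Pick a URI: explicit argument first, then GEM5ART_DB, then the default."""
    env = os.environ if environ is None else environ
    if explicit_uri:
        # Assumption: a URI passed by the caller wins over the environment.
        return explicit_uri
    return env.get("GEM5ART_DB", DEFAULT_URI)


# Passing a plain dict as the environment keeps the demonstration repeatable.
print(resolve_db_uri(environ={}))
# mongodb://localhost:27017
print(resolve_db_uri(environ={"GEM5ART_DB": "mongodb://dbhost:27017"}))
# mongodb://dbhost:27017
print(resolve_db_uri("mongodb://other:27017",
                     environ={"GEM5ART_DB": "mongodb://dbhost:27017"}))
# mongodb://other:27017
```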
Additionally, you can specify the location of the database when calling `getDBConnection` in your scripts.

In case no database exists or a user wants their own database, you can create a new database by creating a new directory and running the MongoDB Docker image. See the [MongoDB docker documentation](https://hub.docker.com/_/mongo) or the [MongoDB documentation](https://docs.mongodb.com/) for more information.

```sh
docker run -p 27017:27017 -v :/data/db --name mongo- -d mongo
```

This uses the official [MongoDB Docker image](https://hub.docker.com/_/mongo) to run the database at the default port on the localhost. In the `-v` option, the absolute path of the directory you created goes before `:/data/db`. If the Docker container is killed, it can be restarted with the same command line and the database should be consistent.

### Connecting to an existing database

By default, gem5art will assume the database is running at `mongodb://localhost:27017`, which is MongoDB's default on the localhost.

The environment variable `GEM5ART_DB` can override this default.

Otherwise, to programmatically set a database URI when using gem5art, you can pass a URI to the `getDBConnection` function.

Currently, gem5art only supports MongoDB database backends, but extending this to other databases should be straightforward.

### Searching the Database

gem5art provides a few convenience functions for searching and accessing the database. These functions can be found in `artifact.common_queries`.

Specifically, we provide the following functions:

- `getByName`: Returns all objects matching `name` in the database.
- `getDiskImages`: Returns a generator of disk images (type = disk image).
- `getLinuxBinaries`: Returns a generator of Linux kernel binaries (type = kernel).
- `getgem5Binaries`: Returns a generator of gem5 binaries (type = gem5 binary).

### Downloading from the Database

You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell.
You can search the database with the functions provided by the `artifact` module (e.g., [`getByName`](artifacts.html#gem5art.artifact.artifact.getByName), [`getByType`](artifacts.html#gem5art.artifact.artifact.getByType), etc.). Then, once you've found the ID of the artifact you'd like to download, you can call [`downloadFile`](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.downloadFile). See the example below.

```sh
$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(db, limit=2): print(i)
...
ubuntu
    id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
ubuntu
    id: c54b8805-48d6-425d-ac81-9b1badba206e
    type: disk image
    path: disk-image/ubuntu-image/ubuntu
    inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
    Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(db, limit=2): print(i)
...
vmlinux-5.2.3
    id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
    id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
    type: kernel
    path: linux-stable/vmlinux-5.2.3
    inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
    Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')
```

For another example, assume there is a disk image named `npb` (containing [NAS Parallel](https://www.nas.nasa.gov/) Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:

```python
import gem5art.artifact

# Connect to the default database
db = gem5art.artifact.getDBConnection()

# Several artifacts may share the name 'npb'
disks = gem5art.artifact.getByName(db, 'npb')

for disk in disks:
    # Select the one disk image whose documentation matches exactly
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')
```

Here, we assume that there can be multiple disk images/artifacts with the name `npb` and we are only interested in downloading the npb disk image with a particular documentation ('npb disk image created on Nov 20'). Also note that there is more than one way to download files from the database, although every approach eventually uses the `downloadFile` function.

The dual of the [downloadFile](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.downloadFile) method used above is [upload](artifacts.html#gem5art.artifact._artifactdb.ArtifactDB.upload).

#### Database schema

Alternatively, you can use the pymongo Python module or the MongoDB command line interface to interact with the database. See the [MongoDB documentation](https://docs.mongodb.com/) for more information on how to query the MongoDB database.
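The download examples above follow a two-step pattern: narrow the metadata down to one artifact, then fetch the file by that artifact's ID. A toy in-memory model can make the pattern concrete. Everything here is hypothetical and independent of gem5art and MongoDB: the two dictionaries, the `upload`, `find_by_name`, and `download` helpers, and the sample documentation strings are made up for illustration.

```python
import uuid

# Hypothetical in-memory stand-ins for a metadata store and a file store.
metadata_store = {}  # artifact ID -> metadata document
file_store = {}      # artifact ID -> file contents


def upload(name, typ, documentation, content):
    """Save metadata and file contents under one freshly generated UUID."""
    artifact_id = uuid.uuid4()
    metadata_store[artifact_id] = {
        "_id": artifact_id,
        "name": name,
        "type": typ,
        "documentation": documentation,
    }
    file_store[artifact_id] = content
    return artifact_id


def find_by_name(name):
    """Return a generator of metadata documents whose name matches."""
    return (doc for doc in metadata_store.values() if doc["name"] == name)


def download(artifact_id):
    """Return the file contents saved under the given ID."""
    return file_store[artifact_id]


old_id = upload("npb", "disk image", "npb disk image created on Oct 2", b"old image")
new_id = upload("npb", "disk image", "npb disk image created on Nov 20", b"new image")

# Step 1: narrow the metadata down to a single artifact.
wanted = [
    doc for doc in find_by_name("npb")
    if doc["type"] == "disk image"
    and doc["documentation"] == "npb disk image created on Nov 20"
]

# Step 2: fetch the file using the ID found in the metadata.
contents = download(wanted[0]["_id"])
print(contents)  # b'new image'
```

Keeping metadata and file contents under the same key means a search only has to touch the small metadata documents, and the large file is read once the right ID is known.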
gem5art has two collections. `artifact_database.artifacts` stores all of the metadata for the artifacts and `artifact_database.fs` is a [GridFS](https://docs.mongodb.com/manual/core/gridfs/) store for all of the files. The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.

You can list all of the details of all of the artifacts by running the following in Python.

```python
#!/usr/bin/env python3

from pymongo import MongoClient

# Connect to the MongoDB instance at the default host and port
db = MongoClient().artifact_database

# Print the metadata document of every stored artifact
for i in db.artifacts.find():
    print(i)
```

gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:

```python
import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')

for i in gem5art.artifact.getDiskImages(db):
    print(i)
```

Other similar methods include `getLinuxBinaries()` and `getgem5Binaries()`.

You can use the `getByName()` method to search the database for artifacts using the name attribute. For example, to search for artifacts named `gem5`:

```python
import gem5art.artifact

db = gem5art.artifact.getDBConnection('mongodb://localhost')

for i in gem5art.artifact.getByName(db, "gem5"):
    print(i)
```

## Artifacts API Documentation

```eval_rst
Artifact Module
---------------

.. automodule:: gem5art.artifact
    :members:

Artifact
--------

.. automodule:: gem5art.artifact.artifact
    :members:
    :undoc-members:

Artifact
--------

.. automodule:: gem5art.artifact.artifact.Artifact
    :members:
    :undoc-members:

Helper Functions for Common Queries
-----------------------------------

.. automodule:: gem5art.artifact.common_queries
    :members:
    :undoc-members:

ArtifactDB
----------

This is mostly internal.

.. automodule:: gem5art.artifact._artifactdb
    :members:
    :undoc-members:
```