Authors:
- Ayaz Akram
- Jason Lowe-Power
Artifacts¶
Introduction¶
As discussed before, all unique objects used during gem5 experiments are termed “artifacts” in gem5art. Examples of artifacts include the gem5 binary, the gem5 source code repo, the Linux kernel source repo, the Linux binary, the disk image, and the packer binary (used to build the disk image). The goal of this infrastructure is to keep a record of all the artifacts used in a particular experiment and to return the same set of artifacts when the experiment needs to be performed again in the future.
The description of an artifact serves as the documentation of how that artifact was created. One of the goals of gem5art is for these artifacts to be self-contained. With just the metadata stored with the artifact, a third party should be able to perfectly reproduce the artifact. (We are still working toward this goal. For instance, we are looking into using Docker to create artifacts to separate artifact creation from the host platform it is run on.)
Each artifact is characterized by a set of attributes, described below:
- command: command used to build this artifact
- typ: type of the artifact, e.g., binary, git repo, etc.
- name: name of the artifact
- cwd: current working directory, where the command to build the artifact is run
- path: actual path of the location of the artifact
- inputs: a list of the artifacts used to build the current artifact
- documentation: a docstring explaining the purpose of the artifact and any other useful information that can help to reproduce the artifact
Additionally, each artifact also has the following implicit information.
- hash: an MD5 hash for a binary artifact or a git hash for a git artifact
- time: time of the creation of an artifact
- id: a UUID associated with the artifact
- git: a dictionary containing the origin, current commit and the repo name for a git artifact (will be an empty dictionary for other types of artifacts)
These attributes are not specified by the user, but are generated by gem5art automatically (when the Artifact object is created for the first time).
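To make the implicit attributes more concrete, the following is a minimal sketch of how such values could be derived with the Python standard library. It is not gem5art's implementation: `md5_of_file` and `implicit_attributes` are hypothetical helper names, and the Unix-timestamp format for `time` is an assumption.

```python
import hashlib
import time
import uuid
from pathlib import Path
from typing import Any, Dict


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files (e.g., disk images)
        # do not have to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def implicit_attributes(path: Path) -> Dict[str, Any]:
    """Hypothetical helper: implicit fields for a non-git (binary) artifact."""
    return {
        "hash": md5_of_file(path),  # MD5 of the file contents
        "time": time.time(),        # creation time (format is an assumption)
        "id": uuid.uuid4(),         # randomly generated UUID
        "git": {},                  # empty dictionary for non-git artifacts
    }
```

For a git artifact, the hash would instead be the git commit hash and the `git` dictionary would hold the origin, current commit, and repo name.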
An example of how a user would create a gem5 binary artifact using gem5art is shown below. In this example, the type, name, and documentation are up to the user of gem5art. You’re encouraged to use names that are easy to remember when you later query the database. The documentation attribute should be used to completely describe the artifact that you are saving.
gem5_binary = Artifact.registerArtifact(
    command = 'scons build/X86/gem5.opt',
    typ = 'gem5 binary',
    name = 'gem5',
    cwd = 'gem5/',
    path = 'gem5/build/X86/gem5.opt',
    inputs = [gem5_repo,],
    documentation = '''
        Default gem5 binary compiled for the X86 ISA.
        This was built from the main gem5 repo (gem5.googlesource.com) without
        any modifications. We recently updated to the current gem5 master
        which has a fix for memory channel address striping.
    '''
)
Another goal of gem5art is to enable sharing of artifacts among multiple users, which is achieved through the use of a centralized database. Whenever a user tries to create a new artifact, the database is searched to check whether the same artifact already exists there. If it does, the user can download the matching artifact for use. Otherwise, the newly created artifact is uploaded to the database for later use. The use of a database also avoids running identical experiments: an error message is generated if a user tries to execute a run that already exists in the database.
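The lookup-or-register behavior described above can be sketched with an in-memory stand-in for the database. This is only an illustration under stated assumptions: `InMemoryArtifactStore` is a hypothetical class, and matching artifacts by their content hash is an assumption about how "the same artifact" is detected, not a description of gem5art's internals.

```python
import uuid
from typing import Dict, Tuple


class InMemoryArtifactStore:
    """Hypothetical stand-in for the shared database, keyed by content hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, uuid.UUID] = {}

    def register(self, content_hash: str) -> Tuple[uuid.UUID, bool]:
        """Return (artifact id, created).

        If an artifact with the same hash is already stored, its id is
        reused; otherwise a new entry is stored and reported as created.
        """
        existing = self._by_hash.get(content_hash)
        if existing is not None:
            return existing, False
        new_id = uuid.uuid4()
        self._by_hash[content_hash] = new_id
        return new_id, True


store = InMemoryArtifactStore()
first_id, created_first = store.register("abc123")   # new entry is stored
second_id, created_again = store.register("abc123")  # existing entry is reused
```

In this sketch the second call returns the id of the first entry rather than creating a duplicate, which mirrors how a second user would be pointed at the artifact already in the database.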
Creating artifacts¶
To create an Artifact, you must use registerArtifact, as shown in the example above. This is a factory method which will initially create the artifact.
TO DO: Add more details here.
Note: While creating new artifacts, warning messages might appear showing that certain attributes (other than hash and id) of two artifacts don’t match (these are produced when artifact similarity is checked in the code). Users should make sure that they understand the reasons for any such warnings.
Using artifacts from the database¶
You can create an artifact with just a UUID if it is already stored in the database.
ArtifactDB¶
The particular database used in this work is MongoDB. We use MongoDB since it can easily store large files (e.g., disk images), is tightly integrated with Python through pymongo, and has an interface that is flexible as the needs of gem5art change.
Currently, it’s required to run a database to use gem5art. However, we are planning on changing this default to allow gem5art to be used standalone as well.
gem5art assumes there is a MongoDB instance running on the localhost. In a future version, we will support accessing remote databases as well.
In case no database exists or a user wants their own database, the following steps should be taken to create a new database:
- Create a new directory
- Run:
`docker run -p 27017:27017 -v <absolute path to the created directory>:/data/db --name mongo-<some tag> -d mongo`
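This uses the official MongoDB Docker image to run the database at the default port on the localhost. If the Docker container is killed, it can be restarted and the database should remain consistent, since the data lives in the mounted directory. Note that Docker refuses to reuse the name of a stopped container that still exists, so either restart it with `docker start mongo-<some tag>` or remove it with `docker rm mongo-<some tag>` before running the same command line again.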
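Once the container is up, a quick way to confirm that something is listening on the default port is a plain TCP connection attempt. The sketch below uses only the standard library; `mongo_port_open` is a hypothetical helper name, and a successful connection only shows that the port is reachable, not that the listener is actually MongoDB.

```python
import socket


def mongo_port_open(host: str = "localhost", port: int = 27017,
                    timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# Example use after starting the container:
#   print("MongoDB port reachable:", mongo_port_open())
```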
Searching the Database¶
You can use the pymongo Python module or the MongoDB command line interface to interact with the database. See the MongoDB documentation for more information on how to query the MongoDB database.
gem5art has two collections: artifact_database.artifacts stores all of the metadata for the artifacts, and artifact_database.fs is a GridFS store for all of the files.
The files in the GridFS use the same UUIDs as the Artifacts as their primary keys.
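As background on why GridFS suits large files such as disk images: it splits each file into fixed-size chunks and stores one small record describing the file plus one record per chunk. The following sketch mimics that layout in plain Python without touching a database. `split_into_chunks` is a hypothetical helper, the records are reduced to a few of the fields GridFS uses (`files_id`, `n`, `data`, `length`, `chunkSize`), and real code should rely on the gridfs module that ships with pymongo.

```python
import uuid
from typing import Any, Dict, List, Tuple

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def split_into_chunks(file_id: uuid.UUID, data: bytes,
                      chunk_size: int = DEFAULT_CHUNK_SIZE
                      ) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Return a (file record, chunk records) pair shaped like GridFS documents."""
    chunks = [
        # `n` is the position of the chunk within the file.
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    # The file record is keyed by the artifact's UUID, as described above.
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunks
```

Because every chunk carries the UUID of the artifact it belongs to, a download only needs that UUID to reassemble the file in order of `n`.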
You can list all of the details of all of the artifacts by running the following in Python.
#!/usr/bin/env python3
from pymongo import MongoClient

# Connect to the MongoDB instance on localhost at the default port.
db = MongoClient().artifact_database
for i in db.artifacts.find():
    print(i)
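Listing everything quickly becomes noisy, so it is often more useful to pass a filter document to `find()`. The helper below is a small sketch that builds such a filter; `build_artifact_filter` is a hypothetical name, and the `name` and `type` keys are assumptions about how the artifact metadata is stored, so check them against your own database first.

```python
from typing import Any, Dict, Optional


def build_artifact_filter(name: Optional[str] = None,
                          typ: Optional[str] = None) -> Dict[str, Any]:
    """Build a MongoDB filter document; omitted arguments are left unconstrained."""
    query: Dict[str, Any] = {}
    if name is not None:
        query["name"] = name  # assumed field name
    if typ is not None:
        query["type"] = typ   # assumed field name
    return query


# Usage with pymongo (requires a running database):
#   from pymongo import MongoClient
#   db = MongoClient().artifact_database
#   for doc in db.artifacts.find(build_artifact_filter(typ="disk image")).limit(5):
#       print(doc)
```

An empty filter matches every document, so calling the helper with no arguments is equivalent to the unfiltered listing.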
gem5art also provides a few methods to search the database for artifacts of a particular type or name. For example, to find all disk images in a database you can do the following:
import gem5art.artifact
for i in gem5art.artifact.getDiskImages():
    print(i)
Other similar methods include getLinuxBinaries() and getgem5Binaries().
You can use the getByName() method to search the database for artifacts using the name attribute. For example, to search for artifacts named gem5:
import gem5art.artifact
for i in gem5art.artifact.getByName("gem5"):
    print(i)
Downloading from the Database¶
You can also download a file associated with an artifact using functions provided by gem5art. A good way to search and download items from the database is by using the Python interactive shell.
You can search the database with the functions provided by the artifact module (e.g., getByName, getByType, etc.). Then, once you’ve found the ID of the artifact you’d like to download, you can call downloadFile. See the example below.
$ python
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gem5art.artifact import *
>>> db = getDBConnection()
>>> for i in getDiskImages(limit=2): print(i)
...
ubuntu
id: d4a54de8-3a1f-4d4d-9175-53c15e647afd
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
ubuntu
id: c54b8805-48d6-425d-ac81-9b1badba206e
type: disk image
path: disk-image/ubuntu-image/ubuntu
inputs: packer:fe8ba737-ffd4-44fa-88b7-9cd072f82979, fs-x86-test:5bfaab52-7d04-49f2-8fea-c5af8a7f34a8, m5:69dad8b1-48d0-43dd-a538-f3196a894804
Ubuntu with m5 binary installed and root auto login
>>> for i in getLinuxBinaries(limit=2): print(i)
...
vmlinux-5.2.3
id: 8cfd9fbe-24d0-40b5-897e-beca3df80dd2
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: fs-x86-test:94092971-4277-4d38-9e4a-495a7119a5e5, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
vmlinux-5.2.3
id: 9721d8c9-dc41-49ba-ab5c-3ed169e24166
type: kernel
path: linux-stable/vmlinux-5.2.3
inputs: npb:85e6dd97-c946-4596-9b52-0bb145810d68, linux-stable:25feca9a-3642-458e-a179-f3705266b2fe
Kernel binary for 5.2.3 with simple config file
>>> from uuid import UUID
>>> db.downloadFile(UUID('8cfd9fbe-24d0-40b5-897e-beca3df80dd2'), 'linux-stable/vmlinux-5.2.3')
For another example, assume there is a disk image named npb (containing NAS Parallel Benchmarks) in your database and you want to download the disk image to your local directory. You can do the following to download the disk image:
import gem5art.artifact

db = gem5art.artifact.getDBConnection()
disks = gem5art.artifact.getByName('npb')
for disk in disks:
    # Several artifacts may share the name; pick the one we want.
    if disk.type == 'disk image' and disk.documentation == 'npb disk image created on Nov 20':
        db.downloadFile(disk._id, 'npb')
Here, we assume that there can be multiple disk images/artifacts with the name npb and we are only interested in downloading the npb disk image with a particular documentation (‘npb disk image created on Nov 20’). Also, note that there is more than one way to download files from the database (although they will all eventually use the downloadFile function).
The dual of the downloadFile method used above is upload.
Artifacts API Documentation¶
Artifact Module¶
This is the gem5 artifact package
Artifact¶
File contains the Artifact class and helper functions
class gem5art.artifact.artifact.Artifact(other: Union[str, uuid.UUID, Dict[str, Any]])¶
A base artifact class. It holds the following attributes of an artifact:
- name: name of the artifact
- command: bash command used to generate the artifact
- path: path of the location of the artifact
- time: time of creation of the artifact
- documentation: a string to describe the artifact
- ID: unique identifier of the artifact
- inputs: list of the input artifacts used to create this artifact stored as a list of uuids
classmethod registerArtifact(command: str, name: str, cwd: str, typ: str, path: Union[str, pathlib.Path], documentation: str, inputs: List[Artifact] = []) → gem5art.artifact.artifact.Artifact¶
Constructs a new artifact. This assumes that either it is not in the database or it is exactly the same as when it was added to the database.
gem5art.artifact.artifact.getByName(name: str, limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]¶
Returns all objects matching name in the database. Limit specifies the maximum number of results to return.
gem5art.artifact.artifact.getDiskImages(limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]¶
Returns a generator of disk images (type = disk image). Limit specifies the maximum number of results to return.
gem5art.artifact.artifact.getGit(path: pathlib.Path) → Dict[str, str]¶
Returns a dictionary with the origin, current commit, and repo name for the base repository for path. An exception is generated if the repo is dirty or doesn’t exist.
gem5art.artifact.artifact.getHash(path: pathlib.Path) → str¶
Returns an MD5 hash for the file at path.
gem5art.artifact.artifact.getLinuxBinaries(limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]¶
Returns a generator of Linux kernel binaries (type = kernel). Limit specifies the maximum number of results to return.
gem5art.artifact.artifact.getgem5Binaries(limit: int = 0) → Iterator[gem5art.artifact.artifact.Artifact]¶
Returns a generator of gem5 binaries (type = gem5 binary). Limit specifies the maximum number of results to return.
ArtifactDB¶
This is mostly internal.
class gem5art.artifact._artifactdb.ArtifactDB¶
Abstract base class for all artifact DBs.
downloadFile(key: uuid.UUID, path: pathlib.Path) → None¶
Download the file with the _id key to the path. Will overwrite the file if it currently exists.
get(key: Union[uuid.UUID, str]) → Dict[str, str]¶
Key can be a UUID or a string. Returns a dictionary to construct an artifact.
put(key: uuid.UUID, artifact: Dict[str, Union[str, uuid.UUID]]) → None¶
Insert the artifact into the database with the key.
searchByLikeNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some type and a regex name. Note: Not all DB implementations will implement this function.
searchByName(name: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some name. Note: Not all DB implementations will implement this function.
searchByNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some name and type. Note: Not all DB implementations will implement this function.
searchByType(typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some type. Note: Not all DB implementations will implement this function.
upload(key: uuid.UUID, path: pathlib.Path) → None¶
Upload the file at path to the database with _id of key.
class gem5art.artifact._artifactdb.ArtifactMongoDB¶
This is a MongoDB database connector for storing Artifacts (as defined in artifact.py).
This database stores the data in three collections:
- artifacts: This stores the JSON-serialized Artifact class.
- files and chunks: These two collections store the large files required for some artifacts. Within the files collection, the _id is the UUID of the artifact.
downloadFile(key: uuid.UUID, path: pathlib.Path) → None¶
Download the file with the _id key to the path. Will overwrite the file if it currently exists.
get(key: Union[uuid.UUID, str]) → Dict[str, str]¶
Key can be a UUID or a string. Returns a dictionary to construct an artifact.
put(key: uuid.UUID, artifact: Dict[str, Union[str, uuid.UUID]]) → None¶
Insert the artifact into the database with the key.
searchByLikeNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some type and a regex name.
searchByName(name: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some name.
searchByNameType(name: str, typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some name and type.
searchByType(typ: str, limit: int) → Iterable[Dict[str, Any]]¶
Returns an iterable of all artifacts in the database that match some type.
upload(key: uuid.UUID, path: pathlib.Path) → None¶
Upload the file at path to the database with _id of key.
gem5art.artifact._artifactdb.getDBConnection(typ: Type[gem5art.artifact._artifactdb.ArtifactDB] = <class 'gem5art.artifact._artifactdb.ArtifactMongoDB'>) → gem5art.artifact._artifactdb.ArtifactDB¶
Returns the database connection.
Eventually, this should likely read from a config file to get the database information. However, for now, we’ll use the MongoDB defaults.