Concepts¶

Lab is centred around three core concepts: Reproducibility, Logging, and Model Persistence. Lab is designed to integrate with your existing training scripts, with imposing as few constraints as possible.

Reproducibility¶

Lab Projects are designed to be shared and re-used. This feature makes havy use of Python’s virtualenv module, enabling users to precisely define modules and environments that are required to run the associated experiments.

Every Project is initiated using a requirements.txt file.

Logging¶

Lab was designed to benchmark multiple predictive models and hyperparameters. To accomplish this, it implements a simple API that stores:

Feature names
Hyperparameters
Performance metrics
Model files

Model Persistence¶

Models are logged using the joblib module. This applies to both sklearn and keras experiments. This simple structure allows for a quick performance assessment and deployment of a model of choice into production.

Example Use Cases¶

At Bering, we use Lab for a number of use cases:

Data Scientists track individual experiments locally on their machine, consistently organising all files and artefacts for reproducibility. By setting up a naming schema, Teams can work together on the same datasets to benchmark performance of novel ML algorithms.

Production Engineers assess model performances and decide on the best possible model to be served in production environments. Lab’s strict model versioning serves as a link between research and development environment and evolving production components.

ML Researchers can publish code to GitHub as a Lab Project, making it easy for others to reproduce findings.