SpotML Config File
SpotML needs a spotml.yaml configuration file in the root folder. An example configuration file looks like this:

project:
  name: mnist
  maxIdleMinutes: 15
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    # file: docker/Dockerfile
    image: tensorflow/tensorflow:latest-py3
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    env:
      PYTHONPATH: /workspace/project
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888
instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
      volumes:
        - name: workspace
          parameters:
            size: 50
scripts:
  train: |
    python train.py
  tensorboard: |
    tensorboard --port 6006 --logdir results/
  jupyter: |
    CUDA_VISIBLE_DEVICES="" jupyter notebook --allow-root --ip 0.0.0.0
The name of the project. This name is used as a prefix for all the AWS resources created.

project:
  name: mnist
The maximum idle time, in minutes, after which the instance is automatically shut down. Set this to 0 to turn off idle-time checking.

project:
  name: mnist
  maxIdleMinutes: 15
SpotML periodically (every 5 minutes) checks instances for idle time. It tracks this by checking whether the Docker container has any actively running commands and whether there has been any tty (keyboard) activity. If it finds no activity and no running commands for more than maxIdleMinutes, it terminates the instance.
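For sessions where an automatic shutdown is unwanted, idle checking can be switched off. A minimal sketch, reusing the mnist project name from the example; only the maxIdleMinutes value changes:

```yaml
project:
  name: mnist
  # 0 turns off idle-time checking, so the instance is not
  # shut down automatically for inactivity.
  maxIdleMinutes: 0
```

With checking turned off, the instance keeps running until you stop it yourself, so this setting is best reserved for cases where that is intended.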
By default, SpotML syncs all files and folders in the project directory. Use syncFilters to exclude files that you don't want to sync to the instance.

project:
  name: mnist
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
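As an illustration, the exclude list can be extended with other paths that do not need to be uploaded. The sketch below assumes that additional patterns follow the same glob style as the existing entries. The results/* entry is a hypothetical choice (results/ is the tensorboard log directory used in the example scripts), not a SpotML default.

```yaml
project:
  name: mnist
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
        # Hypothetical addition: skip local training output
        - results/*
```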
Specify the Docker image to use to launch the container. This works for simple cases where you don't need a custom Dockerfile with anything else installed.

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    image: tensorflow/tensorflow:latest-py3
When you need a custom Dockerfile to customize the instance, specify the path to the Dockerfile.

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    file: docker/Dockerfile
Environment variables that are made available in the container.

containers:
  - &DEFAULT_CONTAINER
    env:
      PYTHONPATH: /workspace/project
Ports that should be exposed in the container and on the host instance, so that you can access apps like Jupyter Notebook from your browser.

containers:
  - &DEFAULT_CONTAINER
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888
An identifier for the AWS resources created. This name is used as a prefix for all the AWS resources created.

instances:
  - name: aws-1
Right now, aws is the only supported provider.

instances:
  - name: aws-1
    provider: aws
instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
region - The AWS region in which to create resources.
spotStrategy - Either on-demand or spot.
  on-demand - Launch an AWS on-demand instance.
  spot - Launch a spot instance only.
ports - Ports to be exposed on the AWS instance.
rootVolumeSize - The size (in GB) of the root EBS volume attached to the instance. This volume is destroyed after the instance terminates.
volumes - The persistent EBS volumes to attach to the instance. These are not destroyed after the instance terminates. They are re-attached the next time the instance starts, so the data is preserved.

instances:
  - name: aws-1
    provider: aws
    parameters:
      volumes:
        - name: workspace
          parameters:
            size: 50
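To tie the instance parameters together, here is a sketch of a spot instance that keeps its data on a persistent volume. It only recombines keys already shown on this page. That the container's volumeMounts entry refers to the instance volume by its name (workspace) is an assumption inferred from the full example, not something this page states.

```yaml
containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    image: tensorflow/tensorflow:latest-py3
    volumeMounts:
      # Assumed to match the volume name declared under instances
      - name: workspace
        mountPath: /workspace
instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      # Launch a spot instance instead of an on-demand one
      spotStrategy: spot
      # Root volume (GB): destroyed after the instance terminates
      rootVolumeSize: 125
      volumes:
        # Persistent EBS volume: re-attached on the next start
        - name: workspace
          parameters:
            size: 50
```

Because the root volume does not outlive the instance, anything worth keeping, such as datasets or checkpoints, would need to live under the persistent volume's mount path (/workspace in this sketch).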
This section defines the scripts that are used in SpotML managed runs.

scripts:
  train: |
    python train.py
  tensorboard: |
    tensorboard --port 6006 --logdir results/