Spot ML
Spotml Config file
SpotML needs a spotml.yaml configuration file in the project's root folder. An example configuration file is shown below:
project:
  name: mnist
  maxIdleMinutes: 15
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    # file: docker/Dockerfile
    image: tensorflow/tensorflow:latest-py3
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    env:
      PYTHONPATH: /workspace/project
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888

instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
      volumes:
        - name: workspace
          parameters:
            size: 50

scripts:
  train: |
    python train.py

  tensorboard: |
    tensorboard --port 6006 --logdir results/

  jupyter: |
    CUDA_VISIBLE_DEVICES="" jupyter notebook --allow-root --ip 0.0.0.0

Project Parameters

name

Name of the project. It is used as a prefix for all the AWS resources created.
project:
  name: mnist

maxIdleMinutes

Maximum idle time, in minutes, after which the instance is automatically shut down. Set this to 0 to turn off idle-time checking.
project:
  name: mnist
  maxIdleMinutes: 15
SpotML checks instances for idle time periodically (every 5 minutes). It tracks activity by checking whether the Docker instance has any actively running commands or whether there was any tty (keyboard) activity. If it finds no activity and no running commands for more than maxIdleMinutes, it terminates the instance.
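The idle rule described here can be sketched in a few lines of Python. This is only an illustration of the decision logic, not SpotML's implementation: the function name should_terminate and its arguments are hypothetical, and how SpotML actually detects running commands or tty activity is not shown.

```python
from datetime import datetime, timedelta


def should_terminate(last_activity: datetime, now: datetime,
                     has_running_commands: bool, max_idle_minutes: int) -> bool:
    """Hypothetical helper: decide whether an instance counts as idle."""
    # maxIdleMinutes: 0 turns idle-time checking off.
    if max_idle_minutes == 0:
        return False
    # A running command always counts as activity.
    if has_running_commands:
        return False
    # Terminate only after more than maxIdleMinutes without activity.
    return now - last_activity > timedelta(minutes=max_idle_minutes)


now = datetime(2021, 1, 1, 12, 0)
print(should_terminate(now - timedelta(minutes=20), now, False, 15))  # True
print(should_terminate(now - timedelta(minutes=20), now, True, 15))   # False
print(should_terminate(now - timedelta(minutes=20), now, False, 0))   # False
```

Because the check runs only every 5 minutes, an instance may stay up a few minutes past maxIdleMinutes before the next check notices it.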

syncFilters (optional)

By default, SpotML syncs all the files and folders in the project directory. Use syncFilters to exclude files you don't want synced to the instance.
project:
  name: mnist
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
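To get a feel for which paths the exclude patterns above would cover, here is a small Python sketch. It assumes shell-style glob matching as implemented by Python's fnmatch module; the matcher SpotML actually uses is not documented here and may behave differently. The helper is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase

# The exclude patterns from the example configuration.
EXCLUDE = [".git/*", ".idea/*", "*/__pycache__/*"]


def is_excluded(path: str, patterns=EXCLUDE) -> bool:
    """Hypothetical helper: True if a project-relative path matches any pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for path in [".git/config", "src/__pycache__/model.cpython-38.pyc", "train.py"]:
    print(path, "->", "excluded" if is_excluded(path) else "synced")
```

Under this assumption, a __pycache__ folder directly in the project root would not match '*/__pycache__/*', because the pattern requires a parent folder before it.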

Container Parameters

image

The Docker image used to launch the container. This works for simple cases where you don't need a custom Dockerfile to install anything else.
containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    image: tensorflow/tensorflow:latest-py3

file (if image is not specified)

When you need a custom Dockerfile to customize the container, specify the path to the Dockerfile.
containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    file: docker/Dockerfile
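The relationship between image and file can be sketched as a small check over a parsed container entry. This is an illustration only: the helper container_source is hypothetical, and treating a container that sets both keys as an error is an assumption, since the documentation only says that file is used when image is not specified.

```python
def container_source(container: dict) -> tuple:
    """Hypothetical helper: report whether a container uses an image or a Dockerfile."""
    image = container.get("image")
    dockerfile = container.get("file")
    # Assumption: setting both keys is ambiguous, so reject it.
    if image and dockerfile:
        raise ValueError("set either 'image' or 'file', not both")
    if image:
        return ("image", image)
    if dockerfile:
        return ("file", dockerfile)
    raise ValueError("container needs either 'image' or 'file'")


print(container_source({"projectDir": "/workspace/project",
                        "image": "tensorflow/tensorflow:latest-py3"}))
print(container_source({"projectDir": "/workspace/project",
                        "file": "docker/Dockerfile"}))
```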

env

Environment variables available in the container.
containers:
  - &DEFAULT_CONTAINER
    env:
      PYTHONPATH: /workspace/project

ports

Ports to expose in the container and on the host instance, so that you can access apps like Jupyter Notebook from your browser.
containers:
  - &DEFAULT_CONTAINER
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888

Instance Parameters

name

An identifier for the instance, used as a prefix for all the AWS resources created.
instances:
  - name: aws-1

provider

Currently, aws is the only supported provider.
instances:
  - name: aws-1
    provider: aws

parameters

instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
region - The AWS region in which to create resources.
instanceType - The AWS instance type to launch.
spotStrategy - Either on-demand or spot.
  • on-demand - Launch an AWS on-demand instance.
  • spot - Launch a spot instance only.
ports - Ports to be exposed on the AWS instance.
rootVolumeSize - The size (in GB) of the root EBS volume attached to the instance. This volume is destroyed when the instance terminates.
volumes - The persistent EBS volumes to attach to the instance. These are not destroyed when the instance terminates; they are re-attached the next time the instance starts, so their data is preserved.
instances:
  - name: aws-1
    provider: aws
    parameters:
      volumes:
        - name: workspace
          parameters:
            size: 50
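The container-level hostPort values and the instance-level ports list both describe exposed ports, so it can be useful to cross-check them before launching. The Python sketch below does that over already-parsed values. It rests on an assumption the documentation does not state outright: that a hostPort is reachable from your browser only if it also appears in the instance's ports list. The helper unopened_host_ports is hypothetical.

```python
def unopened_host_ports(container_ports: list, instance_ports: list) -> list:
    """Hypothetical helper: host ports mapped by the container but not opened on the instance."""
    opened = set(instance_ports)
    return [p["hostPort"] for p in container_ports if p["hostPort"] not in opened]


# Values copied from the example configuration.
container_ports = [
    {"containerPort": 6006, "hostPort": 6006},  # tensorboard
    {"containerPort": 8888, "hostPort": 8888},  # jupyter
]

print(unopened_host_ports(container_ports, [6006, 6007, 8888]))  # []
print(unopened_host_ports(container_ports, [6006]))              # [8888]
```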

Script Parameters

This section defines the scripts used in SpotML managed runs.
scripts:
  train: |
    python train.py

  tensorboard: |
    tensorboard --port 6006 --logdir results/