SpotML Config File

SpotML needs a spotml.yaml configuration file in the root folder of the project. An example configuration file is shown below:

project:
  name: mnist
  maxIdleMinutes: 15
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
#    file: docker/Dockerfile
    image: tensorflow/tensorflow:latest-py3
    volumeMounts:
      - name: workspace
        mountPath: /workspace
    env:
      PYTHONPATH: /workspace/project
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888

instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
      volumes:
        - name: workspace
          parameters:
            size: 50

scripts:
  train: |
    python train.py

  tensorboard: |
    tensorboard --port 6006 --logdir results/

  jupyter: |
    CUDA_VISIBLE_DEVICES="" jupyter notebook --allow-root --ip 0.0.0.0

Project Parameters

name

Name of the project. This name is used as a prefix for all the AWS resources created.

project:
  name: mnist

maxIdleMinutes

Maximum time, in minutes, that an instance may stay idle before it is automatically shut down. Set this to 0 to turn off idle time checking.

project:
  name: mnist
  maxIdleMinutes: 15

SpotML periodically (every 5 minutes) checks instances for idle time. It tracks idleness by checking whether the Docker instance has any actively running commands or whether there was any TTY (keyboard) activity. If it finds no activity and no running commands for more than maxIdleMinutes, it terminates the instance.
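
As a small illustration of the setting described above, the fragment below turns idle checking off by setting maxIdleMinutes to 0. The project name mnist is simply carried over from the running example.

```yaml
project:
  name: mnist
  # 0 turns off idle time checking, so the instance is not shut down for being idle
  maxIdleMinutes: 0
```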

syncFilters (optional)

By default, SpotML syncs all the files and folders in the project directory. Use syncFilters to exclude files you don't want to sync to the instance.

project:
  name: mnist
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
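
A further pattern can be added to the same exclude list. The sketch below assumes a hypothetical local data/ folder that should stay off the instance; that folder name is not part of the original example, and the pattern simply follows the style of the entries above.

```yaml
project:
  name: mnist
  syncFilters:
    - exclude:
        - .git/*
        - .idea/*
        - '*/__pycache__/*'
        # hypothetical: a local data folder that should not be synced
        - data/*
```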

Container Parameters

image

The Docker image used to launch the container. This works for simple cases where you don't need a custom Dockerfile to install anything else.

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    image: tensorflow/tensorflow:latest-py3

file (if image above is not specified)

Path to a custom Dockerfile. Use this when you need to customize the container beyond what a stock image provides.

containers:
  - &DEFAULT_CONTAINER
    projectDir: /workspace/project
    file: docker/Dockerfile

env

Environment variables available in the container.

containers:
  - &DEFAULT_CONTAINER
    env:
      PYTHONPATH: /workspace/project

ports

Ports to expose in the container and on the host instance, so that you can access apps such as Jupyter Notebook from your browser.

containers:
  - &DEFAULT_CONTAINER
    ports:
      # tensorboard
      - containerPort: 6006
        hostPort: 6006
      # jupyter
      - containerPort: 8888
        hostPort: 8888

Instance Parameters

name

An identifier for the instance. This name is used as a prefix for all the AWS resources created.

instances:
  - name: aws-1

provider

Currently, aws is the only supported provider.

instances:
  - name: aws-1
    provider: aws

parameters

instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      spotStrategy: on-demand
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125

region - The AWS region in which to create resources.

instanceType - The AWS instance type to launch.

spotStrategy - Either on-demand or spot.

  • on-demand - Launch an AWS on-demand instance.

  • spot - Launch a spot instance only.

ports - Ports to be exposed on the AWS instance.

rootVolumeSize - The size (in GB) of the root EBS volume attached to the instance. This volume is destroyed when the instance terminates.

volumes - The persistent EBS volumes to attach to the instance. These are not destroyed when the instance terminates. They are re-attached the next time the instance starts, so the data is preserved.

instances:
  - name: aws-1
    provider: aws
    parameters:
      volumes:
        - name: workspace
          parameters:
            size: 50
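
For comparison with the on-demand example above, a spot configuration might look like the sketch below. Only spotStrategy differs; the region, instance type, ports, and volume sizes are carried over from the earlier examples and are not recommendations.

```yaml
instances:
  - name: aws-1
    provider: aws
    parameters:
      region: us-east-1
      instanceType: t2.large
      # launch a spot instance instead of an on-demand one
      spotStrategy: spot
      ports: [6006, 6007, 8888]
      rootVolumeSize: 125
      volumes:
        # persistent EBS volume, re-attached the next time the instance starts
        - name: workspace
          parameters:
            size: 50
```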

Script Parameters

This section contains the script configurations used in SpotML managed runs.

scripts:
  train: |
    python train.py

  tensorboard: |
    tensorboard --port 6006 --logdir results/
