Getting started (5 mins)
SpotML uses your AWS credentials to manage the runs for you. You can verify that the credentials are set up by checking that the contents of ~/.aws/credentials look something like this:
aws_access_key_id = your_aws_access_key_id
aws_secret_access_key = your_aws_secret_access_key
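If you would rather check this from the terminal than open the file, a minimal sketch is shown here. The function name check_aws_credentials is a hypothetical helper and not part of SpotML or the AWS CLI. It only confirms that both keys are present with some value; it does not validate them against AWS.

```shell
# Hypothetical helper (not a SpotML or AWS command): checks that a
# credentials file defines both keys with a non-empty value.
check_aws_credentials() {
    creds_file="${1:-$HOME/.aws/credentials}"
    if [ ! -f "$creds_file" ]; then
        echo "missing file: $creds_file"
        return 1
    fi
    for key in aws_access_key_id aws_secret_access_key; do
        # Match lines such as "aws_access_key_id = <value>"
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$creds_file"; then
            echo "missing key: $key"
            return 1
        fi
    done
    echo "ok"
}

# Usage (defaults to ~/.aws/credentials when no path is given):
# check_aws_credentials
```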
Second, the IAM user that owns the above access key needs permissions to create all the AWS resources that SpotML uses.
pip install spotml --upgrade
- How to start an AWS instance to train the MNIST code.
- How to ssh into the instance to check progress.
- How to download results to your local machine.
1. Clone the repo
git clone https://github.com/SpotML/spotml-examples.git
2. Start the instance
Wait for the instance to start. Once startup is complete, you will see output like the one below.
By default, SpotML will track the instance for idle time. If the instance is idle for more than 30 mins, it's automatically terminated.
3. SSH into instance
4. Download the generated model file.
Make sure you have disconnected from the above SSH session. Then, from your local terminal, type the command below to download the generated model file.
spotml download -i 'my_model.h5'
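To confirm that the file actually arrived, you can run a quick check like the sketch below. It assumes the file is saved under the path you pass in (for example the current directory, which is an assumption; adjust the path if yours differs). verify_download is a hypothetical helper, not a SpotML command.

```shell
# Hypothetical helper (not a SpotML command): succeeds only if the given
# file exists and is non-empty.
verify_download() {
    target="$1"
    if [ -s "$target" ]; then
        echo "found: $target"
    else
        echo "missing or empty: $target"
        return 1
    fi
}

# Usage, after the download command has finished:
# verify_download my_model.h5
```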
- How to let SpotML automatically manage a long-running training on
- SpotML automatically restarts interrupted instances and resumes training.
- SpotML automatically turns off idle instances.
1. Update config file
Open the spotml.yaml file and find the line that says
change it to
Also notice the scripts section of the config file, like the one below.
It lets you define keywords such as train that run your custom training command.
2. Run the script
spotml run train
This should produce output like the one below. If the instance is not already running, SpotML tries to spawn a new spot instance and runs the above script once the instance is ready.
Note that if a spot instance is not available, the SpotML backend service keeps retrying every 15 minutes until it can spawn one. You can therefore turn off your laptop and do other things while SpotML tries to schedule the run.
If you intend to cancel the scheduled run, type
spotml run stop
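The retrying described above happens in the SpotML backend, so you do not need to script anything yourself. Purely to illustrate the "try, wait, try again" pattern, here is a minimal local sketch. retry_every is a hypothetical helper, not part of SpotML and not its actual implementation, and the interval and attempt limit are illustrative parameters.

```shell
# Hypothetical helper (not part of SpotML): runs a command repeatedly,
# sleeping a fixed number of seconds between attempts, until it succeeds
# or the attempt limit is reached.
retry_every() {
    interval="$1"       # seconds to wait between attempts
    max_attempts="$2"   # give up after this many tries
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        # Only sleep if another attempt is still to come
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$interval"
        fi
        attempt=$((attempt + 1))
    done
    echo "gave up after $max_attempts attempts"
    return 1
}

# Usage: try some_command up to 4 times, 900 seconds (15 minutes) apart:
# retry_every 900 4 some_command
```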
3. Check Status
You can check the status of the instance and the run with the above command. Once the instance is running, you should see output like the one below.
To also check any logs generated when starting the instance, type
spotml status --logs
Once you see the run status as
status command, you can ssh into the actual run session by typing
spotml sh run
This opens the tmux session where SpotML ran the
You can also open a separate, regular ssh session for interactive use by typing the command below, as before.