Data scientists need a consistent, reproducible environment for machine learning (ML) and data science workloads that manages dependencies and is secure. AWS Deep Learning Containers already offers prebuilt Docker images for training and serving models on common frameworks such as TensorFlow, PyTorch, and MXNet. To enhance this experience, we announced a public beta of the SageMaker open source distribution at JupyterCon 2023. The distribution provides a unified end-to-end ML experience for ML developers of varying experience levels. Developers no longer need to switch between different framework containers for experimentation, or when moving from local JupyterLab environments and SageMaker notebooks to production jobs in SageMaker. The SageMaker open source distribution supports the most common packages and libraries for data science, ML, and visualization, including TensorFlow, PyTorch, Scikit-learn, Pandas, and Matplotlib. You can start using the container from the Amazon ECR Public Gallery today.
In this post, we show you how to use the SageMaker open source distribution to quickly experiment in your local environment and easily promote your experiments to jobs in SageMaker.
For our example, we show the training of an image classification model using PyTorch. We use the publicly available KMNIST dataset in PyTorch. We train a neural network model, test the model’s performance, and finally print the training and test loss. The complete notebook for this example is available in the SageMaker Studio Lab examples repository. We start the experiment on a local laptop using the open source distribution, move it to Amazon SageMaker Studio to use a larger instance, and then schedule the notebook as a notebook job.
You need the following prerequisites:
Configure your local environment
You can start using the open source distribution directly on your local laptop. To start JupyterLab, run the following commands in your terminal:
You can replace ECR_IMAGE_ID with any of the image tags available in the Amazon ECR Public Gallery, or choose the latest-gpu tag if you are using a machine that supports GPU.
This command starts JupyterLab and prints a URL in the terminal, such as http://127.0.0.1:8888/lab?token=<token>. Copy the link and enter it in your preferred browser to launch JupyterLab.
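The launch commands themselves are not reproduced in this text. As a minimal sketch of what such a launch could look like, the following Python snippet assembles a docker run invocation for an image from the Amazon ECR Public Gallery. The image URI, the mounted sample-notebooks directory, the /home/sagemaker-user path inside the container, and the helper name build_jupyterlab_command are assumptions for illustration, not values confirmed by this post.

```python
import os
import shlex


def build_jupyterlab_command(ecr_image_id, notebook_dir, port=8888):
    """Return the docker argv that would start JupyterLab from the given image.

    The container-side path below is an assumed home directory, not a
    documented one; adjust it to match the image you use.
    """
    host_dir = os.path.abspath(notebook_dir)
    return [
        "docker", "run", "-it",
        # Publish the JupyterLab port so the printed URL works on the host.
        "-p", f"{port}:{port}",
        # Mount a local folder so notebooks survive container restarts.
        "-v", f"{host_dir}:/home/sagemaker-user/sample-notebooks",
        ecr_image_id,
        "jupyter-lab", "--no-browser", "--ip=0.0.0.0", f"--port={port}",
    ]


if __name__ == "__main__":
    # Assumed image URI; switch to the latest-gpu tag on a GPU machine.
    image = "public.ecr.aws/sagemaker/sagemaker-distribution:latest-cpu"
    command = build_jupyterlab_command(image, "sample-notebooks")
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch side-effect free; passing the same list to subprocess.run would start the container.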
Studio is an end-to-end integrated development environment (IDE) for ML that enables developers and data scientists to build, train, deploy, and monitor ML models at scale. Studio offers an extensive list of native images with common frameworks and packages, including Data Science, TensorFlow, PyTorch, and Spark. These images make it easy for data scientists to get started with ML by simply choosing a framework and instance type of their choice for computation.
You can now use the SageMaker open source distribution in Studio through Studio’s bring your own image feature. To add the open source distribution to your SageMaker domain, follow these steps:
- Add the open source distribution to your account’s Amazon Elastic Container Registry (Amazon ECR) repository by running the following commands in your terminal:
- Create a SageMaker image and attach it to the Studio domain:
- In the SageMaker console, launch Studio by choosing your domain and existing user profile.
- Optionally, restart Studio by following the steps in Shutting Down and Updating SageMaker Studio.
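The commands for these steps are not reproduced in this text. As a rough sketch of the first two steps, the following Python snippet derives a private Amazon ECR URI, lists the docker commands that would copy the public image into it, and builds the request parameters for the SageMaker API operations that create an image and attach it to a domain. The operation names are real boto3 SageMaker client methods, but every argument value here (the kernel spec name python3, the /home/sagemaker-user mount path, the UID and GID, the config naming scheme) and all helper names are assumptions for illustration.

```python
def private_ecr_uri(account_id, region, repository, tag):
    """Build the URI of an image in a private Amazon ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def mirror_commands(public_image, private_image):
    """docker commands that copy a public image into a private repository.

    Assumes docker is already logged in to the private registry.
    """
    return [
        ["docker", "pull", public_image],
        ["docker", "tag", public_image, private_image],
        ["docker", "push", private_image],
    ]


def sagemaker_requests(image_name, private_image, role_arn, domain_id):
    """Build keyword arguments for each SageMaker call, in call order."""
    config_name = f"app-image-config-{image_name}"  # hypothetical naming scheme
    return {
        "create_image": {"ImageName": image_name, "RoleArn": role_arn},
        "create_image_version": {
            "ImageName": image_name,
            "BaseImage": private_image,
        },
        "create_app_image_config": {
            "AppImageConfigName": config_name,
            "KernelGatewayImageConfig": {
                # Assumed kernel spec; check the kernels the image provides.
                "KernelSpecs": [{"Name": "python3", "DisplayName": "Python 3"}],
                # Assumed mount path and user IDs.
                "FileSystemConfig": {
                    "MountPath": "/home/sagemaker-user",
                    "DefaultUid": 1000,
                    "DefaultGid": 100,
                },
            },
        },
        "update_domain": {
            "DomainId": domain_id,
            "DefaultUserSettings": {
                "KernelGatewayAppSettings": {
                    "CustomImages": [
                        {
                            "ImageName": image_name,
                            "AppImageConfigName": config_name,
                        }
                    ]
                }
            },
        },
    }


def register_image(requests):
    """Send the prepared requests to SageMaker (requires AWS credentials)."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("sagemaker")
    # Real code should wait for each resource to be ready before continuing.
    for operation, kwargs in requests.items():
        getattr(client, operation)(**kwargs)
```

Separating the request builders from register_image makes it possible to inspect or log the exact parameters before anything is created in the account.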
Download the notebook
Download the sample notebook locally from the GitHub repository.
Open the notebook in the IDE of your choice and add a cell to the beginning of the notebook to install torchsummary. The torchsummary package is not part of the distribution, and installing it in the notebook ensures that the notebook runs end-to-end. We recommend using micromamba to manage environments and dependencies. Add the following cell to the notebook and save the notebook:
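The cell itself is not reproduced in this text. A minimal sketch of such a cell follows; it installs a package only when it cannot already be imported. It uses pip through the running interpreter as a stand-in, which is an assumption: the post recommends micromamba, and the exact command it used is not shown here. The helper name ensure_installed is hypothetical.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package, module_name=None):
    """Install `package` with pip if `module_name` cannot be imported.

    Returns True when an install was run, False when nothing was needed.
    `module_name` defaults to `package` for the common case where the
    import name and the distribution name match.
    """
    if importlib.util.find_spec(module_name or package) is not None:
        return False
    # Use the current interpreter so the package lands in the active environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return True
```

Calling ensure_installed("torchsummary") in the first cell would make the notebook safe to rerun, because the install is skipped once the package is present.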
Experiment in the local notebook
Upload the notebook to the JupyterLab UI you launched by choosing the upload icon.
When it’s uploaded, open the cv-kmnist.ipynb notebook. You can start running the cells right away, without having to install any dependencies such as torch, matplotlib, or ipywidgets.
If you followed the steps above, you can see that you can use the distribution locally from your laptop. In the next step, we use the same distribution in Studio to take advantage of Studio features.
Move the experiment to Studio (optional)
Optionally, you can promote the experiment to Studio. One of the advantages of Studio is that the underlying compute resources are fully elastic, so you can easily dial the available resources up or down, and the changes happen automatically in the background without interrupting your work. If you want to run the same notebook on a larger dataset and compute instance, you can migrate to Studio.
Navigate to the Studio UI you launched earlier and choose the upload icon to upload the notebook.
After you open the notebook, you are prompted to choose the image and instance type. In the kernel launcher, choose sagemaker-runtime as the image and ml.t3.medium as the instance, then choose Select.
Now you can run the notebook end-to-end in Studio, without any changes to the notebook you used in your local development environment!
Schedule the notebook as a job
When you’re done experimenting, SageMaker offers several options for moving your notebook to production, including training jobs and SageMaker pipelines. One such option is to run the notebook directly as a scheduled, non-interactive notebook job using SageMaker notebook jobs. For example, you may want to retrain your model periodically, or run inference on incoming data periodically and generate reports for your stakeholders.
From Studio, choose the notebook job icon to start the notebook job. If you have installed the notebook jobs extension locally on your laptop, you can also schedule the notebook directly from your laptop. See the Installation Guide to set up the notebook jobs extension locally.
The notebook job automatically uses the ECR image URI of the open source distribution, so you can schedule the notebook job directly.
Choose Run on a schedule, choose a schedule, for example every week on Saturday, and choose Create. You can also choose Run now if you want to see the results immediately.
When the first notebook job is complete, you can view the notebook outputs directly from the Studio UI by choosing Notebook under Output files.
In addition to using the publicly available ECR image directly for ML workloads, the open source distribution offers the following benefits:
- The Dockerfile used to build the image is publicly available for developers to explore and build their own images. You can also use this image as a base image and install your custom libraries on top of it to get a reproducible environment.
- If you are not used to Docker and prefer to use Conda environments in your JupyterLab environment, we provide an env.out file for each of the released versions. You can use the instructions in the file to create your own Conda environment that mimics the same environment. For example, see the CPU environment file cpu.env.out.
- You can use the GPU versions of the image to run GPU-enabled workloads such as deep learning and image processing.
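To illustrate the first benefit, the following Python sketch renders a small Dockerfile that inherits from the distribution image and adds extra packages. It is a sketch under stated assumptions, not the project's documented recipe: it assumes micromamba is available in the image, that packages go into an environment named base, and that they come from the conda-forge channel. The image URI, the example package, and the helper name render_dockerfile are also illustrative.

```python
def render_dockerfile(base_image, conda_packages):
    """Return Dockerfile text that layers extra packages onto a base image."""
    lines = [f"FROM {base_image}"]
    if conda_packages:
        packages = " ".join(conda_packages)
        # Assumed: micromamba on PATH, environment named "base", conda-forge channel.
        lines.append(
            "RUN micromamba install --yes --name base "
            f"--channel conda-forge {packages}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed image URI and an arbitrary example package.
    print(render_dockerfile(
        "public.ecr.aws/sagemaker/sagemaker-distribution:latest-cpu",
        ["xgboost"],
    ))
```

Pinning the base image to a specific released tag instead of a latest tag would make the resulting environment easier to reproduce.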
Follow the steps below to clean up your resources:
- If you have scheduled your notebook to run periodically, pause or delete the schedule on the Notebook Job Definitions tab to avoid paying for future jobs.
- Shut down all Studio apps to avoid paying for unused compute. See Shut down and Update Studio Apps for instructions.
- Optionally, delete the Studio domain if you have created one.
Maintaining a reproducible environment at different stages of the ML lifecycle is one of the biggest challenges for data scientists and developers. With the SageMaker open source distribution, we provide an image with mutually compatible versions of the most common ML frameworks and packages. The distribution is also open source, providing developers with transparency into the packages and build processes, making it easy to customize their own distribution.
In this post, we showed you how to use the distribution in your local environment, in Studio, and as a container for your training jobs. This feature is currently in public beta. We encourage you to try it out and share your feedback and issues in the public GitHub repository!
About the authors
Durga Surya is an ML solutions architect on the Amazon SageMaker Service SA team. She is passionate about making machine learning accessible to everyone. In her 4 years at AWS, she has helped set up AI/ML platforms for enterprise customers. When not working, she loves motorcycle rides, mystery novels, and long walks with her 5-year-old husky.
Ketan Vijayvargiya is a senior software development engineer at Amazon Web Services (AWS). His areas of focus are machine learning, distributed systems and open source. Outside of work, he likes to spend his time cozying up and enjoying nature.