The Jupyter Project is a multi-stakeholder open source project that creates applications, open standards, and tools for data science, machine learning (ML), and computational science. First released in 2011, Jupyter Notebook has become a de facto standard tool used by millions of users worldwide in every possible academic, research, and industrial sector. Jupyter enables users to work with code and data interactively, and to create and share computational narratives that provide a complete and reproducible record of their work.
Given the importance of Jupyter to data scientists and ML developers, AWS is an active sponsor and contributor to the Jupyter Project. Our goal is to work in the open source community to help Jupyter be the best possible notebook platform for data science and ML. AWS is a platinum sponsor of the Jupyter Project through the NumFOCUS Foundation, and I am proud and honored to lead a dedicated team of AWS engineers who contribute to Jupyter software and participate in the Jupyter community and governance. Our open source contributions to Jupyter include JupyterLab, Jupyter Server, and the Jupyter Notebook subprojects. We are also members of Jupyter’s Security and Diversity, Equity, and Inclusion (DEI) working groups. Alongside these open source contributions, we have AWS product teams working to integrate Jupyter with products like Amazon SageMaker.
Today at JupyterCon, we’re excited to announce several new tools for Jupyter users to improve their experience and increase development productivity. All of these tools are open source and can be used anywhere you run Jupyter.
Introducing two generative AI extensions for Jupyter
Generative AI can significantly increase the productivity of data scientists and developers while writing code. Today we’re announcing two Jupyter extensions that bring generative AI to Jupyter users via a chat UI, IPython magic commands, and autocompletion. These extensions allow you to perform a wide range of development tasks using generative AI models in JupyterLab and Jupyter notebooks.
Jupyter AI, an open source project to bring generative AI to Jupyter notebooks
Using the power of large language models like ChatGPT, AI21’s Jurassic-2, and (coming soon) Amazon Titan, Jupyter AI is an open source project that brings generative AI features to Jupyter notebooks. For example, using a large language model, Jupyter AI can help a programmer generate, debug, and explain their source code. Jupyter AI can also answer questions about local files and generate entire notebooks from a simple natural language prompt. Jupyter AI offers both magic commands that work in any notebook or IPython shell and a friendly chat user interface in JupyterLab. Both experiences work with dozens of models from a wide range of model providers. JupyterLab users can select any text or notebook cell, enter a natural language prompt to perform a task with the selection, and then insert the AI-generated response wherever they want. Jupyter AI is integrated with Jupyter’s MIME type system, which allows you to work with input and output of any type supported by Jupyter (text, images, etc.). Jupyter AI also provides integration points that allow third parties to configure their own models. Jupyter AI is an official open source project of the Jupyter Project.
Amazon CodeWhisperer Jupyter extension
Autocompletion is critical for developers, and generative AI can significantly improve the code suggestion experience. That’s why we announced the general availability of Amazon CodeWhisperer in early 2023. It’s an AI coding companion that uses foundation models under the hood to dramatically improve developer productivity. It works by generating real-time code suggestions based on developers’ natural language comments and the existing code in their integrated development environment (IDE).
Today we’re excited to announce that JupyterLab users can install and use the CodeWhisperer extension for free to generate real-time, single-line, or full-function code suggestions for Python notebooks in JupyterLab and Amazon SageMaker Studio. With CodeWhisperer, you can write a natural language comment in English that describes a specific task, such as “Create a pandas dataframe from a CSV file”. Based on this information, CodeWhisperer recommends one or more code snippets directly in the notebook that can accomplish the task. You can quickly and easily accept the top suggestion, view more suggestions, or continue writing your own code.
During its preview, CodeWhisperer proved to be great at generating code to speed up coding tasks, helping developers complete tasks an average of 57% faster. Additionally, developers who used CodeWhisperer were 27% more likely to complete a coding task successfully than those who did not. This is a huge leap forward in developer productivity. CodeWhisperer also includes a built-in reference tracker that detects if a code suggestion might resemble open source training data and can flag those suggestions.
Introducing new Jupyter extensions for building, training, and deploying ML at scale
Our mission at AWS is to democratize access to ML across all industries. To achieve this goal, starting in 2017, we released the Amazon SageMaker Notebook instance, a fully managed compute instance running Jupyter that includes all popular ML and data science packages. In 2019, we took a major leap forward with the release of SageMaker Studio, an IDE for ML built on top of JupyterLab that lets you build, train, tune, debug, deploy, and monitor models from a single application. Tens of thousands of customers use Studio to power data science teams of all sizes. In 2021, we further extended the benefits of SageMaker to the community of millions of Jupyter users by launching Amazon SageMaker Studio Lab, a free notebook service, again based on JupyterLab, that includes free computation and persistent storage.
Today, we’re excited to announce three new capabilities to help you scale your ML development faster.
Notebook scheduling
In 2022, we released a new capability to allow our customers to run notebooks as scheduled jobs in SageMaker Studio and Studio Lab. Thanks to this capability, many of our customers have saved time by not having to manually configure complex cloud infrastructure to scale their ML workflows.
We’re excited to announce that the notebook scheduling tool is now an open source Jupyter extension that allows JupyterLab users to run and schedule notebooks on SageMaker from anywhere JupyterLab runs. Users can select a notebook and automate it as a job that runs in a production environment using a simple yet powerful user interface. After a notebook is selected, the tool takes a snapshot of the entire notebook, packages its dependencies into a container, builds the infrastructure, runs the notebook as an automated job on a user-set schedule, and deprovisions the infrastructure once the job is finished. This cuts the time it takes to move a notebook into production from weeks to hours.
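To make the snapshot-then-run idea concrete, here is a minimal local sketch using only the Python standard library. It is not how the extension works: there is no container, schedule, or SageMaker infrastructure involved, and the function names `snapshot_notebook` and `run_notebook_job` and the notebook content are hypothetical. It only illustrates why a job runs against a frozen copy of the notebook rather than the file a user may still be editing.

```python
import copy
import json


def snapshot_notebook(notebook: dict) -> dict:
    """Return an independent copy so later edits do not affect the job."""
    return copy.deepcopy(notebook)


def run_notebook_job(notebook: dict) -> dict:
    """Execute every code cell in order inside one fresh namespace."""
    namespace: dict = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are not executed
        source = cell["source"]
        # The notebook format stores source as a string or a list of lines.
        code = "".join(source) if isinstance(source, list) else source
        exec(code, namespace)
    return namespace


# A tiny notebook in nbformat-4-style JSON (hypothetical content).
notebook = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Daily report"]},
    {"cell_type": "code", "source": ["rows = [3, 5, 8]\\n", "total = sum(rows)"]},
    {"cell_type": "code", "source": "average = total / len(rows)"}
  ]
}
""")

frozen = snapshot_notebook(notebook)

# Editing the live notebook after the snapshot does not change the job input.
notebook["cells"][1]["source"] = ["rows = []"]

result = run_notebook_job(frozen)
print(result["total"], result["average"])
```

Running cells with `exec` is only suitable for a toy like this; a real job runner would execute the notebook through a Jupyter kernel in an isolated environment.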
SageMaker Open Source Distribution
Data scientists and developers want to start developing ML applications quickly, and it can be complex to install mutually compatible versions of all the necessary packages. To eliminate manual work and improve productivity, we are pleased to announce a new open source distribution that includes the most popular packages for ML, data science, and data visualization. This distribution includes deep learning frameworks such as PyTorch, TensorFlow, and Keras; popular Python packages such as NumPy, scikit-learn, and pandas; and IDEs such as JupyterLab and Jupyter Notebook. The distribution is versioned using SemVer and will be released on a regular basis going forward. The container is available through the Amazon ECR public gallery, and its source code is available on GitHub. This gives companies transparency into the packages and the build process, making it easier for them to reproduce, customize, or re-certify the distribution. The base image includes pip and Conda/Mamba, so data scientists can quickly install additional packages to meet their specific needs.
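Because the distribution follows SemVer, the version number itself signals whether an upgrade is expected to be backward compatible. The following sketch shows one way a team might check this before moving to a newer image. The helper names and version numbers are hypothetical, and the parser handles only plain MAJOR.MINOR.PATCH strings, not pre-release or build metadata.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def is_compatible_upgrade(current: str, candidate: str) -> bool:
    """True if candidate is newer and stays within the same stable major version."""
    old, new = parse_version(current), parse_version(candidate)
    # Under SemVer, major version 0 makes no compatibility promise.
    if old.major == 0:
        return False
    # Tuples compare field by field: major, then minor, then patch.
    return new.major == old.major and new > old


print(is_compatible_upgrade("1.2.0", "1.3.1"))  # True
print(is_compatible_upgrade("1.2.0", "2.0.0"))  # False
```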
Amazon CodeGuru Jupyter extension
Amazon CodeGuru Security now supports security and code quality scans in JupyterLab and SageMaker Studio. This new capability helps notebook users detect security vulnerabilities such as injection flaws, data leaks, weak cryptography, or missing encryption in notebook cells. You can also catch many common problems that affect the readability, reproducibility, and correctness of computational notebooks, such as misuse of ML library APIs, invalid execution order, and nondeterminism. When vulnerabilities or quality issues are identified in the notebook, CodeGuru generates recommendations that help you fix these issues based on AWS security best practices.
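As an illustration of what an "invalid execution order" problem looks like, the sketch below flags code cells that were run out of sequence. It is not CodeGuru's detection logic, which is not described here; it is a simple heuristic that relies only on the `execution_count` field stored for each code cell in the notebook file, and the function name is hypothetical.

```python
from typing import List


def find_out_of_order_cells(notebook: dict) -> List[int]:
    """Return indexes of code cells whose execution count does not increase."""
    flagged = []
    last_count = 0
    for index, cell in enumerate(notebook["cells"]):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            continue  # the cell was never executed
        if count <= last_count:
            # This cell ran before a cell that appears above it.
            flagged.append(index)
        last_count = max(last_count, count)
    return flagged


# Hypothetical notebook: the last cell was executed before the one above it.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Training"},
        {"cell_type": "code", "execution_count": 1, "source": "data = load()"},
        {"cell_type": "code", "execution_count": 3, "source": "fit(model, data)"},
        {"cell_type": "code", "execution_count": 2, "source": "model = build()"},
    ]
}

print(find_out_of_order_cells(notebook))  # [3]
```

A notebook that passes this check can still fail to reproduce, for example when cells were edited after they ran, so a check like this is a starting point rather than a guarantee.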
Conclusion
We’re excited to see how the Jupyter community will use these tools to scale development, increase productivity, and leverage generative AI to transform their industries. Check out the following resources to learn more about Jupyter on AWS and how to install and get started with these new tools:
About the author
Brian Granger is a leader of the IPython project, co-founder of the Jupyter Project, and an active contributor to other open source projects focused on data science in Python. In 2016, he co-created the Altair package for statistical visualization in Python. He is a member of the advisory board of the NumFOCUS Foundation, a faculty member of Cal Poly’s Center for Innovation and Entrepreneurship, and a senior principal technologist at AWS.