Working on data science projects
Creating and importing notebooks
You can create a blank notebook or import a notebook from a number of different sources.
Creating a new notebook
You can create a new Jupyter notebook from an existing notebook container image to access its resources and properties. The Notebook server control panel contains a list of available container images that you can run as a single-user notebook server.
-
Ensure that you have logged in to Open Data Hub.
-
Ensure that you have launched your notebook server and logged in to Jupyter.
-
The notebook image exists in a registry or image stream and is accessible.
-
Click File → New → Notebook.
-
If prompted, select a kernel for your notebook from the list.
If you want to use a kernel, click Select. If you do not want to use a kernel, click No Kernel.
-
Check that the notebook file is visible in the JupyterLab interface.
Notebook images for data scientists
Open Data Hub contains Jupyter notebook images optimized with industry-leading tools and libraries required for your data science work. To provide a consistent, stable platform for your model development, all notebook images contain the same version of Python. Notebook images available on Open Data Hub are pre-built and ready for you to use immediately after Open Data Hub is installed or upgraded. Notebook images are upgraded quarterly to ensure that you are working with the latest supported version.
Open Data Hub contains the following notebook images that are installed by default:
| Image name | Description |
| --- | --- |
| CUDA | If you are working with compute-intensive data science models that require GPU support, use the Compute Unified Device Architecture (CUDA) notebook image to gain access to the NVIDIA CUDA Toolkit. Using this toolkit, you can optimize your work using GPU-accelerated libraries and optimization tools. |
| Standard Data Science | Use the Standard Data Science notebook image for models that do not require TensorFlow or PyTorch. This image contains commonly used libraries to assist you in developing your machine learning models. |
| TensorFlow | TensorFlow is an open source platform for machine learning. With TensorFlow, you can build, train, and deploy your machine learning models. TensorFlow contains advanced data visualization features, such as computational graph visualizations. It also allows you to easily monitor and track the progress of your models. |
| PyTorch | PyTorch is an open source machine learning library optimized for deep learning. If you are working with computer vision or natural language processing models, use the PyTorch notebook image. |
| Minimal Python | If you do not require advanced machine learning features or additional resources for compute-intensive data science work, you can use the Minimal Python image to develop your models. |
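For example, in a notebook running on the CUDA, TensorFlow, or PyTorch image, you can quickly check whether your code can see a GPU. This is a minimal sketch that assumes the torch package is available in your notebook image:
import torch

# Prints True if the notebook can reach an NVIDIA GPU through CUDA
print(torch.cuda.is_available())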
Uploading an existing notebook file from local storage
You can load an existing notebook from local storage into JupyterLab to continue work, or adapt a project for a new use case.
-
Credentials for logging in to Jupyter.
-
A launched and running notebook server.
-
A notebook file exists in your local storage.
-
In the File Browser in the left sidebar of the JupyterLab interface, click Upload Files.
-
Locate and select the notebook file and click Open.
The file is displayed in the File Browser.
-
The notebook file displays in the File Browser in the left sidebar of the JupyterLab interface.
-
You can open the notebook file in JupyterLab.
Uploading an existing notebook file from a Git repository using JupyterLab
You can use the JupyterLab user interface to clone a Git repository into your workspace to continue your work or integrate files from an external project.
-
A launched and running Jupyter server.
-
Read access for the Git repository you want to clone.
-
Copy the HTTPS URL for the Git repository.
-
On GitHub, click ⤓ Code → HTTPS and click the Clipboard button.
-
On GitLab, click Clone and click the Clipboard button under Clone with HTTPS.
-
In the JupyterLab interface, click the Git Clone button.
You can also click Git → Clone a repository in the menu, or click the Git icon and then click the Clone a repository button.
The Clone a repo dialog appears.
-
Enter the HTTPS URL of the repository that contains your notebook.
-
Click CLONE.
-
If prompted, enter your username and password for the Git repository.
-
Check that the contents of the repository are visible in the file browser in JupyterLab, or run the ls command in the Terminal to verify that the repository is shown as a directory.
Uploading an existing notebook file from a Git repository using the command line interface
You can use the command line interface to clone a Git repository into your workspace to continue your work or integrate files from an external project.
-
A launched and running Jupyter server.
-
Copy the HTTPS URL for the Git repository.
-
On GitHub, click ⤓ Code → HTTPS and click the Clipboard button.
-
On GitLab, click Clone and click the Clipboard button under Clone with HTTPS.
-
In JupyterLab, click File → New → Terminal to open a Terminal window.
-
Enter the git clone command:
git clone git-clone-URL
Replace git-clone-URL with the HTTPS URL, for example:
[1234567890@jupyter-nb-jdoe ~]$ git clone https://github.com/example/myrepo.git
Cloning into myrepo...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 2821 (delta 1), reused 5 (delta 1), pack-reused 2810
Receiving objects: 100% (2821/2821), 39.17 MiB | 23.89 MiB/s, done.
Resolving deltas: 100% (1416/1416), done.
-
Check that the contents of the repository are visible in the file browser in JupyterLab, or run the ls command in the terminal to verify that the repository is shown as a directory.
Collaborating on notebooks using Git
If your notebooks or other files are stored in Git version control, you can import them from a Git repository onto your notebook server to work with them in JupyterLab, as described in Uploading an existing notebook file from a Git repository using JupyterLab and Uploading an existing notebook file from a Git repository using the command line interface. When you are ready, you can push your changes back to the Git repository so that others can review or use your models.
Updating your project with changes from a remote Git repository
You can pull changes made by other users into your data science project from a remote Git repository.
-
You have configured the remote Git repository.
-
You have already imported the Git repository into JupyterLab, and the contents of the repository are visible in the file browser in JupyterLab.
-
You have permissions to pull files from the remote Git repository to your local repository.
-
You have credentials for logging in to Jupyter.
-
You have a launched and running Jupyter server.
-
In the JupyterLab interface, click the Git button.
-
Click the Pull latest changes button.
-
You can view the changes pulled from the remote repository in the History tab of the Git pane.
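If you prefer working in a Terminal window, an equivalent operation is a plain git pull run from inside the cloned repository; this sketch assumes the default remote and branch configured by the clone:
# fetch and merge the latest changes from the default remote
git pull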
Pushing project changes to a Git repository
To build and deploy your application in a production environment, upload your work to a remote Git repository.
-
You have opened a notebook in the JupyterLab interface.
-
You have already added the relevant Git repository to your notebook server.
-
You have permission to push changes to the relevant Git repository.
-
You have installed the Git version control extension.
-
Click File → Save All to save any unsaved changes.
-
Click the Git icon to open the Git pane in the JupyterLab interface.
-
Confirm that your changed files appear under Changed.
If your changed files appear under Untracked, click Git → Simple Staging to enable a simplified Git process.
-
Commit your changes.
-
Ensure that all files under Changed have a blue checkmark beside them.
-
In the Summary field, enter a brief description of the changes you made.
-
Click Commit.
-
Click Git → Push to Remote to push your changes to the remote repository.
-
When prompted, enter your Git credentials and click OK.
-
Your most recently pushed changes are visible in the remote Git repository.
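If you prefer the command line to the Git pane, a roughly equivalent sequence from a Terminal window inside the repository looks like the following sketch; the branch name main is an assumption and might differ in your repository:
# stage all changes, record them in a commit, and push to the remote
git add -A
git commit -m "Describe your changes"
git push origin main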
Working on data science projects
As a data scientist, you can organize your data science work into a single project. A data science project in Open Data Hub can consist of the following components:
-
Workbenches: Creating a workbench allows you to add a Jupyter notebook to your project.
-
Cluster storage: For data science projects that require data to be retained, you can add cluster storage to the project.
-
Data connections: Adding a data connection to your project allows you to connect data inputs to your workbenches.
-
Models and model servers: Deploy a trained data science model to serve intelligent applications. Your model is deployed with an endpoint that allows applications to send requests to the model.
Using data science projects
Creating a data science project
To start your data science work, create a data science project. Creating a project helps you organize your work in one place. You can also enhance the capabilities of your data science project by adding workbenches, adding cluster storage, adding data connections, and adding model servers.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, odh-users) in OpenShift.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click Create data science project.
The Create a data science project dialog opens.
-
Enter a name for your data science project.
-
Optional: Edit the resource name for your data science project. The resource name must consist of lowercase alphanumeric characters and hyphens (-), and must start and end with an alphanumeric character.
-
Enter a description for your data science project.
-
Click Create.
The Project details page opens. From here, you can create workbenches, add cluster storage, and add data connections to your project.
-
The data science project that you created is displayed on the Data science projects page.
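Data science projects correspond to OpenShift projects, so if you have access to the oc client you can also verify the project from the command line; the name my-project is a placeholder for the resource name that you chose:
oc get project my-project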
Updating a data science project
You can update your data science project’s details by changing your project’s name and description text.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the action menu (⋮) beside the project whose details you want to update and click Edit project.
The Edit data science project dialog opens.
-
Optional: Update the name for your data science project.
-
Optional: Update the description for your data science project.
-
Click Update.
-
The data science project that you updated is displayed on the Data science projects page.
Deleting a data science project
You can delete data science projects so that they do not appear on the Open Data Hub Data science projects page when you no longer want to use them.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the action menu (⋮) beside the project that you want to delete and click Delete project.
The Delete project dialog opens.
-
Enter the project’s name in the text field to confirm that you intend to delete it.
-
Click Delete project.
-
The data science project that you deleted is no longer displayed on the Data science projects page.
-
Deleting a data science project deletes any associated workbenches, cluster storage, and data connections. This data is permanently deleted and is not recoverable.
Using project workbenches
Creating a project workbench
To examine and work with data models in an isolated area, you can create a workbench. This workbench enables you to create a new Jupyter notebook from an existing notebook container image to access its resources and properties. For data science projects that require data to be retained, you can add container storage to the workbench you are creating.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, odh-users) in OpenShift.
-
You have created a data science project that you can add a workbench to.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to add the workbench to.
The Details page for the project opens.
-
Click Create workbench in the Workbenches section.
The Create workbench page opens.
-
Configure the properties of the workbench you are creating.
-
Enter a name for your workbench.
-
Enter a description for your workbench.
-
Select the notebook image to use for your workbench server.
-
Select the container size for your server.
-
Optional: Select and specify values for any new environment variables.
-
Configure the storage for your Open Data Hub cluster.
-
Select Create new persistent storage to create storage that is retained after you log out of Open Data Hub. Fill in the relevant fields to define the storage.
-
Select Use existing persistent storage to reuse existing storage, and then select the storage from the Persistent storage list.
-
Click Create workbench.
-
The workbench that you created appears on the Details page for the project.
-
Any cluster storage that you associated with the workbench during the creation process appears on the Details page for the project.
-
The Status column, located in the Workbenches section of the Details page, displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.
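If you have access to the oc client, you can also watch the workbench pod start from the command line; the namespace my-project is a placeholder, and pod names depend on the workbench name:
oc get pods -n my-project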
Starting a workbench
You can manually start a data science project’s workbench from the Details page for the project. By default, workbenches start immediately after you create them.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that contains a workbench.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project whose workbench you want to start.
The Details page for the project opens.
-
Click the toggle in the Status column for the relevant workbench to start a workbench that is not running.
The status of the workbench that you started changes from Stopped to Running. After the workbench has started, click Open to open the workbench’s notebook.
-
The workbench that you started appears on the Details page for the project with the status of Running.
Updating a project workbench
If your data science work requires you to change your workbench’s notebook image, container size, or identifying information, you can modify the properties of your project’s workbench.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that contains a workbench.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project whose workbench you want to update.
The Details page for the project opens.
-
Click the action menu (⋮) beside the workbench that you want to update in the Workbenches section and click Edit workbench.
The Edit workbench page opens.
-
Update the workbench’s properties.
-
Update the name for your workbench, if applicable.
-
Update the description for your workbench, if applicable.
-
Select a new notebook image to use for your workbench server, if applicable.
-
Select a new container size for your server, if applicable.
-
Click Update workbench.
-
The workbench that you updated appears on the Details page for the project.
Deleting a workbench from a data science project
You can delete workbenches from your data science projects to help you remove Jupyter notebooks that are no longer relevant to your work.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project with a workbench.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to delete the workbench from.
The Details page for the project opens.
-
Click the action menu (⋮) beside the workbench that you want to delete in the Workbenches section and click Delete workbench.
The Delete workbench dialog opens.
-
Enter the name of the workbench in the text field to confirm that you intend to delete it.
-
Click Delete workbench.
-
The workbench that you deleted is no longer displayed in the Workbenches section on the project Details page.
-
The custom resource (CR) associated with the workbench’s Jupyter notebook is deleted.
Using data connections
Adding a data connection to your data science project
You can enhance your data science project by adding a connection to a data source. When you want to work with very large data sets, you can store your data in an Amazon Web Services (AWS) Simple Storage Service (S3) bucket so that you do not fill up your local storage.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that you can add a data connection to.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to add a data connection to.
The Details page for the project opens.
-
Click Add data connection in the Data connections section.
The Add data connection dialog opens.
-
Enter a name for the data connection.
-
Enter your access key ID for Amazon Web Services in the AWS_ACCESS_KEY_ID field.
-
Enter your secret access key for the account you specified in the AWS_SECRET_ACCESS_KEY field.
-
Enter the endpoint of your AWS S3 storage in the AWS_S3_ENDPOINT field.
-
Enter the default region of your AWS account in the AWS_DEFAULT_REGION field.
-
Enter the name of the AWS S3 bucket in the AWS_S3_BUCKET field.
-
Click Add data connection.
-
The data connection that you added appears in the Data connections section on the Details page for the project.
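Inside a workbench that is connected to this data connection, the values that you entered are exposed as environment variables. The following Python sketch shows one way to use them from a notebook cell; it assumes that the boto3 package is installed in your notebook image:
import os
import boto3

# The data connection injects these variables into the workbench
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
)

# List the first few objects in the configured bucket
response = s3.list_objects_v2(Bucket=os.environ["AWS_S3_BUCKET"], MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])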
Deleting a data connection
You can delete data connections from your data science projects to help you remove connections that are no longer relevant to your work.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project with a data connection.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to delete the data connection from.
The Details page for the project opens.
-
Click the action menu (⋮) beside the data connection that you want to delete in the Data connections section and click Delete data connection.
The Delete data connection dialog opens.
-
Enter the name of the data connection in the text field to confirm that you intend to delete it.
-
Click Delete data connection.
-
The data connection that you deleted is no longer displayed in the Data connections section on the project Details page.
Updating a connected data source
To use an existing data source with a different workbench, you can change the data source that is connected to your project’s workbench.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project, created a workbench, and you have defined a data connection.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project whose data source you want to change.
The Details page for the project opens.
-
Click the action menu (⋮) beside the data source that you want to change in the Data connections section and click Change connected workbenches.
The Update connected workbenches dialog opens.
-
Select an existing workbench to connect the data source to from the list.
-
Click Update connected workbenches.
-
The data connection that you changed is displayed in the Data connections section on the project Details page.
-
You can access your S3 data source using environment variables in the connected workbench.
Configuring cluster storage
Adding cluster storage to your data science project
For data science projects that require data to be retained, you can add cluster storage to the project. You can also connect cluster storage to a specific project's workbench.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that you can add cluster storage to.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to add the cluster storage to.
The Details page for the project opens.
-
Click Add cluster storage in the Cluster storage section.
The Add storage dialog opens.
-
Enter a name for the cluster storage.
-
Enter a description for the cluster storage.
-
Under Persistent storage size, enter a new size in gibibytes (GiB). The minimum size is 1 GiB, and the maximum size is 16384 GiB.
-
Optional: Select a workbench from the list to connect the cluster storage to an existing workbench.
-
If you selected a workbench to connect the storage to, enter the storage directory in the Mount folder field.
-
Click Add storage.
-
The cluster storage that you added appears in the Cluster storage section on the Details page for the project.
-
A new persistent volume claim (PVC) is created with the storage size that you defined.
-
The persistent volume claim (PVC) is visible as an attached storage in the Workbenches section on the Details page for the project.
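If you have access to the oc client, you can also confirm that the PVC was created from the command line; the namespace my-project is a placeholder:
oc get pvc -n my-project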
Updating cluster storage
If your data science work requires you to change the identifying information of a project’s cluster storage or the workbench that the storage is connected to, you can update your project’s cluster storage to change these properties.
Note
You cannot change the size of a persistent volume claim (PVC) that you have previously defined as cluster storage.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that contains cluster storage.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project whose storage you want to update.
The Details page for the project opens.
-
Click the action menu (⋮) beside the storage that you want to update in the Cluster storage section and click Edit storage.
The Edit storage page opens.
-
Update the storage’s properties.
-
Update the name for the storage, if applicable.
-
Update the description for the storage, if applicable.
-
Update the workbench that the storage is connected to, if applicable.
-
If you selected a new workbench to connect the storage to, enter the storage directory in the Mount folder field.
-
Click Update storage.
-
The storage that you updated appears in the Cluster storage section on the Details page for the project.
Deleting cluster storage from a data science project
You can delete cluster storage from your data science projects to free up resources and remove storage space that you no longer need.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project with cluster storage.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to delete the storage from.
The Details page for the project opens.
-
Click the action menu (⋮) beside the storage that you want to delete in the Cluster storage section and click Delete storage.
The Delete storage dialog opens.
-
Enter the name of the storage in the text field to confirm that you intend to delete it.
-
Click Delete storage.
-
The storage that you deleted is no longer displayed in the Cluster storage section on the project Details page.
-
The persistent volume (PV) and persistent volume claim (PVC) associated with the cluster storage are both permanently deleted. This data is not recoverable.
Configuring model servers
Configuring a model server for your data science project
Before you can successfully deploy a data science model on Open Data Hub, you must configure a model server. This includes configuring the number of replicas being deployed, the server size, the token authorization, and how the project is accessed.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that you can add a model server to.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to configure a model server for.
The Details page for the project opens.
-
Click Configure server in the Models and model servers section.
The Configure model server dialog appears.
-
Configure your project’s model server in the Compute resources per replica section.
-
Select the number of model server replicas to deploy.
-
Select one of the following server sizes for your model:
-
Small
-
Medium
-
Large
-
Custom
-
Optional: If you selected Custom, configure the following settings in the Model server size section to customize your model server:
-
Enter the number of CPUs to use with your model in the CPUs requested field.
-
Enter the maximum number of CPUs required for use with your model in the CPU limit field.
-
Enter the requested memory for the model server in gigabytes (GB) in the Memory requested field.
-
Enter the maximum memory limit for the model server in gigabytes (GB) in the Memory limit field.
-
Optional: Select the Make deployed models available via an external route check box in the Model route section to make your deployed model available externally.
-
Optional: Select the Require token authorization check box in the Token Authorization section to apply token authentication to your model server.
-
In the Token secret field, edit the service account name for which the token is generated. The generated token is created and displayed when the model server is configured.
-
To add an additional service account, click Add a service account and enter the relevant information in the Token secret field.
-
Click Configure.
-
The model server that you configured is displayed in the Models and model servers section on the Details page for the project.
Updating a model server
You can update your data science project’s model server by changing details, such as the number of deployed replicas, the server size, the token authorization, and how the project is accessed.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that has a model server assigned.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project whose model server details you want to update.
The Details page for the project opens.
-
In the Models and model servers section, locate the model server you want to update, and click the action menu (⋮) → Edit model server.
The Configure model server dialog opens.
-
Update the model server’s properties.
-
Select the number of model server replicas to deploy.
-
Select one of the following server sizes for your model:
-
Small
-
Medium
-
Large
-
Custom
-
Optional: If you selected Custom, configure the following settings in the Model server size section to customize your model server:
-
Enter the number of CPUs to use with your model in the CPUs requested field.
-
Enter the maximum number of CPUs required for use with your model in the CPU limit field.
-
Enter the requested memory for the model server in gigabytes (GB) in the Memory requested field.
-
Enter the maximum memory limit for the model server in gigabytes (GB) in the Memory limit field.
-
Optional: Select the Make deployed models available via an external route check box in the Model route section to make your deployed model available externally.
-
Optional: Select the Require token authorization check box in the Token Authorization section to apply token authentication to your model server.
-
In the Token secret field, edit the service account name for which the token is generated. The generated token is created and displayed when the model server is configured.
-
To add an additional service account, click Add a service account and enter the relevant information in the Token secret field.
-
Click Configure.
-
The model server that you updated is displayed in the Models and model servers section on the Details page for the project.
Deleting a model server
You can delete model servers that you have assigned to host your data science projects. This enables you to remove model servers that you no longer require. If you delete a project’s model server, models hosted by the server are then unavailable for applications to use.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project and an associated model server.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project that you want to delete the model server from.
The Details page for the project opens.
-
Click the action menu (⋮) beside the project whose model server you want to delete in the Models and model servers section and click Delete model server.
The Delete model server dialog opens.
-
Enter the name of the model server in the text field to confirm that you intend to delete it.
-
Click Delete model server.
-
The model server that you deleted is no longer displayed in the Models and model servers section on the project Details page.
Viewing Python packages installed on your notebook server
You can check which Python packages are installed on your notebook server and which version of each package you have by running the pip tool in a notebook cell.
-
Log in to Jupyter and open a notebook.
-
Enter the following in a new cell in your notebook:
!pip list
-
Run the cell.
-
The output shows an alphabetical list of all installed Python packages and their versions. For example, if you use this command immediately after creating a notebook server that uses the Minimal image, the first packages shown are similar to the following:
Package                           Version
--------------------------------- ----------
aiohttp                           3.7.3
alembic                           1.5.2
appdirs                           1.4.4
argo-workflows                    3.6.1
argon2-cffi                       20.1.0
async-generator                   1.10
async-timeout                     3.0.1
attrdict                          2.0.1
attrs                             20.3.0
backcall                          0.2.0
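To check a single package instead of scanning the full list, you can run pip show in a cell; aiohttp here is only an example package name:
!pip show aiohttp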
Installing Python packages on your notebook server
You can install Python packages that are not part of the default notebook server image by adding the package and the version to a requirements.txt file and then running the pip install command in a notebook cell.
Note
You can also install packages directly, but using a requirements.txt file is recommended so that the packages listed in the file can easily be reused across different notebooks. Using a requirements.txt file is also useful when you use an S2I build to deploy a model.
-
Log in to Jupyter and open a notebook.
-
Create a new text file using one of the following methods:
-
Click + to open a new launcher and click Text file.
-
Click File → New → Text File.
-
Rename the text file to requirements.txt.
-
Right-click on the name of the file and click Rename Text. The Rename File dialog opens.
-
Enter requirements.txt in the New Name field and click Rename.
-
Add the packages to install to the requirements.txt file, for example:
altair
You can specify the exact version to install by using the == (equal to) operator, for example:
altair==4.1.0
Specifying exact package versions is recommended to enhance the stability of your notebook server over time. New package versions can introduce undesirable or unexpected changes in your environment's behavior. To install multiple packages at the same time, place each package on a separate line.
-
Install the packages in requirements.txt to your server using a notebook cell.
-
Create a new cell in your notebook and enter the following command:
!pip install -r requirements.txt
-
Run the cell by pressing Shift and Enter.
Important: This installs the package on your notebook server, but you must still run the import directive in a code cell to use the package in your code, for example:
import altair
-
Confirm that the packages in requirements.txt appear in the list of packages installed on the notebook server. See Viewing Python packages installed on your notebook server for details.
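As a slightly larger illustration of the recommended pinning style, a requirements.txt file with several packages might look like the following; the package names and versions are examples only:
altair==4.1.0
pandas==1.2.4
matplotlib==3.4.1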
Updating notebook server settings by restarting your server
You can update the settings on your notebook server by stopping and relaunching the notebook server. For example, if your server runs out of memory, you can restart the server to make the container size larger.
-
A running notebook server.
-
Log in to Jupyter.
-
Click File → Hub Control Panel.
The Notebook server control panel opens.
-
Click the Stop notebook server button.
The Stop server dialog opens.
-
Click Stop server to confirm your decision.
The Start a notebook server page opens.
-
Update the relevant notebook server settings and click Start server.
-
The notebook server starts and contains your updated settings.
Model serving on Open Data Hub
As a data scientist, you can deploy your trained machine-learning models to serve intelligent applications in production. Before you deploy a model, you must configure a model server for its project, as described in Configuring model servers. After you have deployed your model, applications can send requests to the model using its deployed API endpoint.
Working with deployed models
Deploying a model in Open Data Hub
You can deploy trained models on Open Data Hub so that you can test and implement them in intelligent applications. Deploying a model makes it available as a service that you can access using an API. This enables you to return predictions based on data inputs.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have created a data science project that contains an associated model server.
-
From the Open Data Hub dashboard, click Data Science Projects.
The Data science projects page opens.
-
Click the name of the project containing the model that you want to deploy.
The Details page for the project opens.
-
Click Deploy model in the Models and model servers section.
The Deploy model dialog appears.
-
Configure the properties for deploying your model.
-
Enter a name for the model that you are deploying.
-
Select a model framework for the model that you are deploying.
-
Specify the location of your model.
-
Select Existing data connection to use a data connection that you previously defined.
-
Select the name of the data connection from the list.
-
Enter the folder path containing the model.
-
Select New data connection to add a new data connection that your model can access.
-
Enter a name for the data connection.
-
Enter your access key ID for Amazon Web Services in the AWS_ACCESS_KEY_ID field.
-
Enter your secret access key for the account you specified in the AWS_SECRET_ACCESS_KEY field.
-
Enter the endpoint of your AWS S3 storage in the AWS_S3_ENDPOINT field.
-
Enter the default region of your AWS account in the AWS_DEFAULT_REGION field.
-
Enter the name of the AWS S3 bucket in the AWS_S3_BUCKET field.
-
Enter the folder path of the file containing the data.
-
Click Configure.
-
The model you deployed is displayed in the Deployed models table on the Model Serving page, accessed from the dashboard.
Viewing a deployed model
To analyze the results of your work, you can view a list of deployed models on Open Data Hub. You can also view the current statuses of deployed models and their endpoints.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
There are active and deployed data science models available on the Model Serving page.
-
From the Open Data Hub dashboard, click Model Serving.
The Model Serving page opens.
-
Review the list of deployed models.
Inference endpoints are displayed in the Inference endpoint column in the Deployed models table.
-
Optional: Click the Copy button on the relevant row to copy the model’s inference endpoint to the clipboard.
-
A list of previously deployed data science models is displayed on the Model Serving page.
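After you copy an inference endpoint, applications can send requests to it over HTTP. The exact URL path, request body, and authentication requirements depend on your model-serving runtime and model framework; the following curl sketch uses placeholder values throughout and assumes a REST endpoint that accepts JSON and, if you enabled token authorization, a bearer token:
curl -X POST \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [...]}' \
  <inference-endpoint>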
Updating the deployment properties of a deployed model
You can update the deployment properties of a model that has been deployed previously. This allows you to change the model’s data connection and name.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have deployed a model on Open Data Hub.
-
From the Open Data Hub dashboard, click Model Serving.
The Model Serving page opens.
-
Click the action menu (⋮) beside the model whose deployment properties you want to update and click Edit.
The Deploy model dialog opens.
-
Update the deployment properties of the model.
-
Update the name of the model.
-
Select an option to specify the location of your model.
-
If you selected Existing data connection, update the folder path of your existing data connection.
-
If you selected New data connection, update the relevant fields to add a new data connection.
-
Update the data connection name.
-
Update your access key ID for Amazon Web Services in the AWS_ACCESS_KEY_ID field.
-
Update your secret access key for the account you specified in the AWS_SECRET_ACCESS_KEY field.
-
Update the endpoint of your AWS S3 storage in the AWS_S3_ENDPOINT field.
-
Update the default region of your AWS account in the AWS_DEFAULT_REGION field.
-
Update the name of the AWS S3 bucket in the AWS_S3_BUCKET field.
-
Update the folder path of the file containing the data.
-
Click Configure.
-
The model whose deployment properties you updated is displayed on the Model Serving page.
Deleting a deployed model
You can delete models you have previously deployed. This enables you to remove deployed models that are no longer required.
-
You have logged in to Open Data Hub.
-
If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, rhods-users) in OpenShift.
-
You have deployed a model.
-
From the Open Data Hub dashboard, click Model Serving.
The Model Serving page opens.
-
Click the action menu (⋮) beside the deployed model that you want to delete and click Delete.
The Delete deployed model dialog opens.
-
Enter the name of the deployed model in the text field to confirm that you intend to delete it.
-
Click Delete deployed model.
-
The model that you deleted is no longer displayed on the Model Serving page.
Troubleshooting common problems in Jupyter for administrators
If your users are experiencing errors in Open Data Hub relating to Jupyter, their notebooks, or their notebook server, read this section to understand what could be causing the problem, and how to resolve the problem.
A user receives a 404: Page not found error when logging in to Jupyter
If you have configured specialized Open Data Hub user groups, the user name might not be added to the default user group for Open Data Hub.
Check whether the user is part of the default user group.
-
Find the names of groups allowed access to Jupyter.
-
Log in to Open Data Hub web console.
-
Click User Management → Groups.
-
Click the name of your user group, for example, rhods-users.
The Group details page for that group appears.
-
Click the Details tab for the group and confirm that the Users section for the relevant group contains the users who have permission to access Jupyter.
-
If the user is not added to any of the groups allowed access to Jupyter, add them.
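For example, an administrator with cluster access can add a user to a group from the command line by using the oc client; rhods-users is the example group name used above:
oc adm groups add-users rhods-users <username>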
A user’s notebook server does not start
The OpenShift Container Platform cluster that hosts the user’s notebook server might not have access to enough resources, or the Jupyter pod may have failed.
-
Log in to Open Data Hub web console.
-
Delete and restart the notebook server pod for this user.
-
Click Workloads → Pods and set the Project to rhods-notebooks.
-
Search for the notebook server pod that belongs to this user, for example, jupyter-nb-<username>-*. If the notebook server pod exists, an intermittent failure may have occurred in the notebook server pod.
If the notebook server pod for the user does not exist, continue with diagnosis.
-
Check the resources currently available in the OpenShift cluster against the resources required by the selected notebook server image.
If worker nodes with sufficient CPU and RAM are available for scheduling in the cluster, continue with diagnosis.
-
Check the state of the Jupyter pod.
-
If there was an intermittent failure of the notebook server pod:
-
Delete the notebook server pod that belongs to the user.
-
Ask the user to start their notebook server again.
-
If the notebook server does not have sufficient resources to run the selected notebook server image, either add more resources to the OpenShift cluster, or choose a smaller image size.
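To check available capacity on the cluster from the command line, an administrator can use standard OpenShift commands; note that oc adm top nodes requires cluster metrics to be available:
# summarize current CPU and memory usage per node
oc adm top nodes
# inspect allocatable resources and current requests on a specific node
oc describe node <node-name>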
The user receives a database or disk is full error or a no space left on device error when they run notebook cells
The user might have run out of storage space on their notebook server.
-
Log in to Jupyter and start the notebook server that belongs to the user having problems. If the notebook server does not start, follow these steps to check whether the user has run out of storage space:
-
Log in to Open Data Hub web console.
-
Click Workloads → Pods and set the Project to rhods-notebooks.
-
Click the notebook server pod that belongs to this user, for example, jupyter-nb-<idp>-<username>-*.
-
Click Logs. The user has exceeded their available capacity if you see lines similar to the following:
Unexpected error while saving file: XXXX database or disk is full
-
Increase the user’s available storage by expanding their persistent volume.
-
Work with the user to identify files that can be deleted from the /opt/app-root/src directory on their notebook server to free up their existing storage space.
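To help the user find what is consuming space, you can run standard disk-usage commands from a Terminal window inside their notebook server:
# show the size of each item under the notebook working directory
du -sh /opt/app-root/src/*
# check overall free space on the volume
df -h /opt/app-root/src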
Troubleshooting common problems in Jupyter for users
If you are seeing errors in Open Data Hub related to Jupyter, your notebooks, or your notebook server, read this section to understand what could be causing the problem.
I see a 403: Forbidden error when I log in to Jupyter
If your administrator has configured specialized Open Data Hub user groups, your user name might not be added to the default user group or the default administrator group for Open Data Hub.
-
Contact your administrator so that they can add you to the correct group or groups.
My notebook server does not start
The OpenShift Container Platform cluster that hosts your notebook server might not have access to enough resources, or the Jupyter pod may have failed.
Check the logs in the Events section in OpenShift for error messages associated with the problem. For example:
Server requested 2021-10-28T13:31:29.830991Z [Warning] 0/7 nodes are available: 2 Insufficient memory, 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
Contact your administrator with details of any relevant error messages so that they can perform further checks.
I see a database or disk is full error or a no space left on device error when I run my notebook cells
You might have run out of storage space on your notebook server.
Contact your administrator so that they can perform further checks.