.. _setup: Environment Setup ====================== GraphStorm supports two environment setup methods: - Install GraphStorm as a pip package. This method works well for development and test on a single machine. - Setup a GraphStorm Docker image. This method is good for using GraphStorm in distributed environments that commonly used in production. .. _setup_pip: 1. Setup GraphStorm with pip Packages -------------------------------------- Prerequisites ............... 1. **Linux OS**: The current version of GraphStorm supports Linux as the Operation System. We tested GraphStorm on both Ubuntu (22.04 or later version) and Amazon Linux 2. 2. **Python3**: The current version of GraphStorm requires Python installed with the version larger than **3.8**. 3. (Optional) GraphStorm supports **Nvidia GPUs** for using GraphStorm in GPU environments. Install GraphStorm ................... Users can use ``pip`` or ``pip3`` to install GraphStorm. .. code-block:: bash pip install graphstorm Install Dependencies ..................... Users should install PyTorch v2.1.0 and DGL v1.1.3 that is the core dependency of GraphStorm using the following commands. For Nvidia GPU environment: .. code-block:: bash # for CUDA 11 pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 pip install dgl==1.1.3+cu118 -f https://data.dgl.ai/wheels/cu118/repo.html # for CUDA 12 pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 pip install dgl==1.1.3+cu121 -f https://data.dgl.ai/wheels/cu121/repo.html For CPU environment: .. code-block:: bash pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu pip install dgl==1.1.3 -f https://data.dgl.ai/wheels-internal/repo.html Configure SSH No-password login (optional) .......................................... To perform distributed training in a cluster of machines, please use the following commands to configure a local SSH no-password login that GraphStorm relies on. .. note:: The "SSH No-password login" is **NOT** needed for GraphStorm's Standalone mode, i.e., running GraphStorm in one machine only. .. code-block:: bash ssh-keygen -t rsa -f ~/.ssh/id_rsa -N '' cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys Then use this command to test if the SSH no-password login works. .. code-block:: bash ssh 127.0.0.1 If everything is right, the above command will enter another Linux shell process. Then exit this new shell with the command ``exit``. Clone GraphStorm Toolkits (Optional) .......................................... GraphStorm provides a set of toolkits, including scripts, tools, and examples, which can facilitate the use of GraphStrom. * **graphstorm/training_scripts/** and **graphstorm/inference_scripts/** include examplar configuration yaml files that used in GraphStorm documentations and tutorials. * **graphstorm/examples** includes Python code for customized models and customized data preparation. * **graphstorm/tools** includes graph partition and related Python code. * **graphstorm/sagemaker** include commands and code to run GraphStorm on Amazon SageMaker. Users can clone GraphStorm source code to obtain these toolkits. .. code-block:: bash git clone https://github.com/awslabs/graphstorm.git .. _setup_docker: 2. Setup GraphStorm Docker Environment --------------------------------------- Prerequisites ............... 1. **Docker**: You need to install Docker in your environment as the `Docker documentation `_ suggests, and the `Nvidia Container Toolkit `_. For example, in an AWS EC2 instance without Docker preinstalled, you can run the following commands to install Docker. .. code-block:: bash sudo apt-get update sudo apt update sudo apt install Docker.io If using AWS `Deep Learning AMI GPU version`, the Nvidia Container Toolkit has been preinstalled. 2. (Optional) GraphStorm supports **Nvidia GPUs** for using GraphStorm in GPU environments. .. _build_docker: Build a GraphStorm Docker image from source code ................................................. Please use the following command to build a Docker image from source: .. code-block:: bash git clone https://github.com/awslabs/graphstorm.git cd /path-to-graphstorm/docker/ bash /path-to-graphstorm/docker/build_docker_oss4local.sh /path-to-graphstorm/ image-name image-tag device There are four positional arguments for ``build_docker_oss4local.sh``: 1. **path-to-graphstorm** (**required**), is the absolute path of the "graphstorm" folder, where you cloned the GraphStorm source code. For example, the path could be ``/code/graphstorm``. 2. **image-name** (optional), is the assigned name of the to be built Docker image. Default is ``graphstorm``. 3. **image-tag** (optional), is the assigned tag prefix of the Docker image. Default is ``local``. 4. **device** (optional), is the intended device for the docker image. This ges suffixed to ``image-tag``. Default is ``gpu``, can also build a ``cpu`` image. If Docker requires you to run it as a root user and you don't want to preface all docker commands with sudo, you can check the solution available `here `_. You can use the below command to check if the new Docker image is created successfully. .. code:: bash docker image ls If the build succeeds, there should be a new Docker image, named *:*, e.g., ``graphstorm:local-gpu``. To push the image to ECR you can use the `push_gsf_container.sh` script. It takes 4 positional arguments, `image-name` `image-tag-device`, `region`, and `account`. For example to push the local GPU image to the us-west-2 on AWS account `1234567890` use: .. code-block:: bash bash docker/push_gsf_container.sh graphstorm local-gpu us-west-2 1234567890 Create a GraphStorm Container .............................. First, you need to create a GraphStorm container based on the Docker image built in the previous step. Run the following command: .. code:: bash docker run --gpus all --network=host -v /dev/shm:/dev/shm/ -d --name test graphstorm:local-gpu This command will create a GraphStorm container, named ``test`` and run the container as a daemon. Then connect to the container by running the following command: .. code:: bash docker container exec -it test /bin/bash If succeeds, the command prompt will change to the container's, like .. code-block:: console root@:/# .. note:: If you are preparing the environment to run GraphStorm in a distributed setting, specific instruction for running a Docker image with the NFS folder is given in the :ref:`Use GraphStorm in a Distributed Cluster`.