### Install Jupyter via conda

```
conda install -c conda-forge jupyterhub  # installs jupyterhub and proxy
conda install jupyterlab notebook  # needed if running the notebook servers in the same environment
```

### Configuration file

Filename: jupyterhub_config.py

```
c = get_config() #noqa

# Listen on all interfaces, port 8080
c.JupyterHub.ip = '0.0.0.0'
c.JupyterHub.port = 8080

c.Authenticator.admin_users = { 'hoangdh' }
# c.LocalAuthenticator.create_system_users=True

# Note: FileContentsManager is a single-user server option; it may have no
# effect here and may need to go in the notebook server's own config instead.
c.FileContentsManager.delete_to_trash = False
```

### Systemd

Filename: /lib/systemd/system/jupyterhub.service

```
[Unit]
Description=JupyterLab Server

[Service]
# systemd service types are lowercase ("simple", not "Simple")
Type=simple
User=root
Group=root
WorkingDirectory=/data/jupyterhub/
Environment="PATH=/data/softs/anaconda/envs/jupyterhub/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
ExecStart=/data/softs/anaconda/envs/jupyterhub/bin/jupyterhub -f /data/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target
```

### Create a Python environment for Jupyter

- Create a user for the hub

```
useradd -m -d /data/users/bi_shark/ -s /bin/bash bi_shark
passwd bi_shark
```

- Initialize conda for the user

```
su - bi_shark
/data/bigdata/anaconda3/bin/conda init
```

- Log in again and create a new environment for the user

```
conda create -n bi_shark python=3.9 -y
```

- Install a new kernel for Jupyter

```
conda activate bi_shark
pip install --upgrade pip
pip install ipykernel
python -m ipykernel install --user --name="bi_shark" --display-name="Python 3.9 (Conda)"
```

- Add user-specific environment variables

> vi /data/users/bi_shark/.local/share/jupyter/kernels/bi_shark/run.sh

```
#!/usr/bin/bash

export JAVA_HOME=/opt/softs/jdk1.8.0_331
export HADOOP_HOME=/opt/softs/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hdfs classpath --glob)
export HADOOP_PREFIX=$HADOOP_HOME
export SPARK_LOCAL_HOSTNAME=10.10.10.10
export SPARK_HOME=/opt/softs/spark

# Change me
export USER_ENV="/data/users/bi_shark/.conda/envs/bi_shark"

export PATH="${USER_ENV}/bin:/data/bigdata/anaconda3/condabin/:${PATH}"

# Hand over to the kernel, forwarding the arguments Jupyter passes in
exec ${USER_ENV}/bin/python -m ipykernel "$@"
```

> chmod +x /data/users/bi_shark/.local/share/jupyter/kernels/bi_shark/run.sh

- Edit the kernel's startup file

Point the kernel's startup command at the script created above, so that the kernel starts with those environment variables set.

> vi /data/users/bi_shark/.local/share/jupyter/kernels/bi_shark/kernel.json

```
{
 "argv": [
  "/data/users/bi_shark/.local/share/jupyter/kernels/bi_shark/run.sh",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Python 3.9 (Conda)",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
```

Bonus: a script that creates a Conda environment and the matching kernel. It takes the Python version as its only argument.

```
#!/bin/bash

ENV_NAME="${USER}_p${1}"
ENV_DISPLAYNAME="Python ${1} (Conda)"
KERNEL_DIR="${HOME}/.local/share/jupyter/kernels/${ENV_NAME}"

conda create --name "${ENV_NAME}" python="${1}" -y

# Make "conda activate" available in this non-interactive shell
eval "$(conda shell.bash hook)"
conda activate "${ENV_NAME}"

pip install --upgrade pip
pip install ipykernel
python -m ipykernel install --user --name="${ENV_NAME}" --display-name="${ENV_DISPLAYNAME}"

# Escaped variables (\$) are written literally and expanded when run.sh runs
cat > "${KERNEL_DIR}/run.sh" << EOF
#!/usr/bin/bash

export JAVA_HOME=/opt/softs/jdk1.8.0_331
export HADOOP_HOME=/opt/softs/hadoop
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export CLASSPATH=\$(\$HADOOP_HOME/bin/hdfs classpath --glob)
export HADOOP_PREFIX=\$HADOOP_HOME
export SPARK_LOCAL_HOSTNAME=10.10.10.10
export SPARK_HOME=/opt/softs/spark

# Change me
export USER_ENV="/data/users/${USER}/.conda/envs/${ENV_NAME}"

export PATH="\${USER_ENV}/bin:/data/bigdata/anaconda3/condabin/:\${PATH}"

exec \${USER_ENV}/bin/python -m ipykernel "\$@"
EOF

chmod +x "${KERNEL_DIR}/run.sh"

# Jupyter does not expand "~" in argv, so write the absolute path
cat > "${KERNEL_DIR}/kernel.json" << EOF
{
 "argv": [
  "${KERNEL_DIR}/run.sh",
  "-f",
  "{connection_file}"
 ],
 "display_name": "${ENV_DISPLAYNAME}",
 "language": "python",
 "metadata": {
  "debugger": true
 }
}
EOF
```

Example: set up a new environment for the user `bi_tuna`

```
useradd -m -d /data/users/bi_tuna/ -s /bin/bash bi_tuna
passwd bi_tuna
```

Switch to the user `bi_tuna`, copy the script over, and run it to create a new environment with Python 3.10

```
su - bi_tuna
/data/bigdata/anaconda3/bin/conda init
logout
su - bi_tuna
./script.sh 3.10
```

Reference: https://help.rc.ufl.edu/doc/Managing_Python_environments_and_Jupyter_kernels

### Install PyArrow

> pip install pyarrow

```
export JAVA_HOME=/opt/softs/jdk1.8.0_331
export HADOOP_HOME=/opt/softs/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/softs/spark
export PATH=${JAVA_HOME}/bin:${PATH}
export CLASSPATH=$(/opt/softs/hadoop/bin/hdfs classpath --glob)
```

Note: for CLASSPATH, run the command and copy its output into the variable.
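
The CLASSPATH note can also be handled from inside Python, which is convenient when a kernel was not started through a wrapper script. The following is a minimal sketch, not part of the original setup: the function names `hdfs_classpath`, `build_hadoop_env`, and `apply_hadoop_env` are hypothetical helpers, and the default paths are simply the ones used in the export lines above. It assumes the `hdfs` binary lives under `$HADOOP_HOME/bin`.

```python
import os
import subprocess


def hdfs_classpath(hadoop_home):
    """Run `hdfs classpath --glob` and return its output as one string."""
    result = subprocess.run(
        [os.path.join(hadoop_home, "bin", "hdfs"), "classpath", "--glob"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_hadoop_env(java_home, hadoop_home, classpath, base_path=""):
    """Build the Hadoop-related variables as a plain dict (no side effects)."""
    path_parts = [os.path.join(java_home, "bin")]
    if base_path:
        path_parts.append(base_path)
    return {
        "JAVA_HOME": java_home,
        "HADOOP_HOME": hadoop_home,
        "HADOOP_CONF_DIR": os.path.join(hadoop_home, "etc", "hadoop"),
        "CLASSPATH": classpath,
        "PATH": os.pathsep.join(path_parts),
    }


def apply_hadoop_env(
    java_home="/opt/softs/jdk1.8.0_331",  # assumption: same path as in the exports
    hadoop_home="/opt/softs/hadoop",      # assumption: same path as in the exports
):
    """Compute CLASSPATH with the hdfs command and export it into this process."""
    env = build_hadoop_env(
        java_home,
        hadoop_home,
        hdfs_classpath(hadoop_home),
        os.environ.get("PATH", ""),
    )
    os.environ.update(env)
    return env
```

In a notebook, `apply_hadoop_env()` would need to be called before the first `pyarrow.fs.HadoopFileSystem(...)` is created, because the JVM reads CLASSPATH only when it is loaded. The NameNode host and port passed to `HadoopFileSystem` are site-specific and are not given in this document.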