@Zabrane
Forked from usrbinkat/Dockerfile
Created April 19, 2025 11:29

Revisions

  1. @usrbinkat usrbinkat revised this gist Oct 27, 2024. 1 changed file with 181 additions and 3 deletions.
    184 changes: 181 additions & 3 deletions README.md
    @@ -1,4 +1,182 @@
    # How to create PDF files from Markdown
    # Pandoc Docker Container

    1. Docker Build
    2. Docker run Pandoc
    A Docker container for converting Markdown files to high-quality PDFs using Pandoc and XeLaTeX.

    ## Features

    - **Easy Conversion**: Convert a Markdown file to PDF with a single `docker run` command.
    - **High-Quality Output**: Uses XeLaTeX with Noto fonts for consistent typography and Unicode support.
    - **Fully Featured**: Ships with Pandoc, TeX Live packages (XeTeX, LuaTeX, and extra LaTeX packages), and Noto and Latin Modern fonts.
    - **Customizable**: Modify the Dockerfile to suit your specific needs or extend functionality.
    - **Pipeline Ready**: Ideal for integration into CI/CD pipelines or automated documentation workflows.

    ## Table of Contents

    - [Getting Started](#getting-started)
      - [Prerequisites](#prerequisites)
      - [Installation](#installation)
    - [Usage](#usage)
      - [Simple Conversion](#simple-conversion)
      - [Advanced Conversion](#advanced-conversion)
    - [Examples](#examples)
    - [Building the Docker Image](#building-the-docker-image)
    - [Customization](#customization)
    - [Contributing](#contributing)
    - [License](#license)
    - [Acknowledgments](#acknowledgments)

    ## Getting Started

    ### Prerequisites

    - **Docker**: Ensure Docker is installed on your system. [Get Docker](https://www.docker.com/get-started)

    ### Installation

    Pull the Docker image from Docker Hub:

    ```bash
    docker pull containercraft/pandoc
    ```

    Or build the image locally using the provided Dockerfile:

    ```bash
    docker build --progress plain --tag containercraft/pandoc -f Dockerfile .
    ```

    ## Usage

    ### Simple Conversion

    To convert a Markdown file (`my_document.md`) to PDF:

    ```bash
    docker run --rm -v "$(pwd)":/convert containercraft/pandoc my_document.md
    ```

    - `--rm`: Automatically removes the container after execution.
    - `-v "$(pwd)":/convert`: Mounts the current directory at `/convert`, the container's working directory. The quotes keep paths that contain spaces intact.
    - `my_document.md`: The Markdown file to convert.

    The generated PDF (`my_document.pdf`) will be saved in your current directory.
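
A small shell function can hide the `docker run` boilerplate. The following is a minimal sketch, assuming the `containercraft/pandoc` image name used above. The function name `md2pdf` and the `DOCKER` and `PANDOC_IMAGE` variables are hypothetical conveniences and are not provided by the image.

```bash
# Hypothetical helper: convert one Markdown file in the current directory.
# DOCKER and PANDOC_IMAGE are illustrative overrides, not part of the image.
md2pdf() {
  local file="${1:-}"
  case "$file" in
    *.md) ;;  # looks like a Markdown file, continue
    *) echo "usage: md2pdf FILE.md" >&2; return 2 ;;
  esac
  if [ ! -f "$file" ]; then
    echo "md2pdf: no such file: $file" >&2
    return 1
  fi
  # Mount the current directory at /convert, the image's working directory.
  "${DOCKER:-docker}" run --rm -v "$(pwd)":/convert \
    "${PANDOC_IMAGE:-containercraft/pandoc}" "$file"
}
```

After sourcing the function, `md2pdf my_document.md` should behave like the command above, with the added checks that the argument ends in `.md` and that the file exists.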

    ### Advanced Conversion

    The container uses an entrypoint script (`pandoc-entrypoint`) with the following Pandoc command:

    ```bash
    pandoc my_document.md -o my_document.pdf \
      -V mainfont="Noto Serif" \
      -V monofont="Noto Mono" \
      -V geometry:margin=1in \
      --highlight-style=kate \
      --pdf-engine=xelatex \
      --toc -N
    ```

    #### Explanation of Options:

    - `-V mainfont="Noto Serif"`: Sets the main text font.
    - `-V monofont="Noto Mono"`: Sets the monospaced font.
    - `-V geometry:margin=1in`: Sets document margins.
    - `--highlight-style=kate`: Applies syntax highlighting style.
    - `--pdf-engine=xelatex`: Uses XeLaTeX for better font and Unicode support.
    - `--toc`: Includes a table of contents.
    - `-N`: Numbers the sections.
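
To make individual options easy to override, the same command can be wrapped in a function that reads defaults from environment variables. This is only a sketch: the function name `pandoc_pdf` and the `MAINFONT`, `MONOFONT`, `MARGIN`, and `PANDOC` variables are hypothetical, and it assumes `pandoc` is on the `PATH` (as it is inside the container).

```bash
# Hypothetical wrapper around the Pandoc command shown above.
# MAINFONT, MONOFONT, MARGIN, and PANDOC are illustrative overrides.
pandoc_pdf() {
  local input="$1"
  shift
  # my_document.md -> my_document.pdf; any extra arguments are appended last.
  "${PANDOC:-pandoc}" "$input" -o "${input%.md}.pdf" \
    -V "mainfont=${MAINFONT:-Noto Serif}" \
    -V "monofont=${MONOFONT:-Noto Mono}" \
    -V "geometry:margin=${MARGIN:-1in}" \
    --highlight-style=kate \
    --pdf-engine=xelatex \
    --toc -N \
    "$@"
}
```

For example, `MARGIN=2cm pandoc_pdf my_document.md -M title="My Document"` would keep the default fonts, widen the margin, and pass the extra metadata option through to Pandoc.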

    #### Custom Usage

    To customize the conversion process, you can:

    - **Modify the Entrypoint Script**: Adjust `pandoc-entrypoint` with your preferred options.
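    - **Run Pandoc Directly**: Override the entrypoint to open a shell in the container, then run Pandoc commands manually. Without `--entrypoint`, `/bin/bash` would be passed to `pandoc-entrypoint` as the file to convert.

    ```bash
    docker run --rm -it --entrypoint /bin/bash -v "$(pwd)":/convert containercraft/pandoc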
    ```

    Once inside the container:

    ```bash
    pandoc my_document.md -o my_document.pdf [your options]
    ```
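
For non-interactive use, Docker's `--entrypoint` flag can also point directly at `pandoc`, so that your own options replace the defaults in `pandoc-entrypoint`. A minimal sketch, assuming `pandoc` is on the image's `PATH`; the function name `pandoc_in_container` and the `DOCKER` and `PANDOC_IMAGE` variables are hypothetical.

```bash
# Hypothetical helper: run pandoc inside the container with custom options,
# bypassing pandoc-entrypoint through Docker's --entrypoint flag.
# DOCKER and PANDOC_IMAGE are illustrative overrides, not part of the image.
pandoc_in_container() {
  "${DOCKER:-docker}" run --rm \
    --entrypoint pandoc \
    -v "$(pwd)":/convert \
    "${PANDOC_IMAGE:-containercraft/pandoc}" "$@"
}
```

A call such as `pandoc_in_container my_document.md -o my_document.pdf --pdf-engine=xelatex -V geometry:margin=2cm` passes every argument straight to Pandoc.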

    ## Examples

    ### Batch Conversion

    Convert all Markdown files in a directory:

    ```bash
    for file in *.md; do
      docker run --rm -v "$(pwd)":/convert containercraft/pandoc "$file"
    done
    ```
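
For larger directories it can help to skip files whose PDF is already up to date and to handle the case where no `.md` files exist. The sketch below is one way to do that; the function names `needs_rebuild` and `convert_all` and the `DOCKER` and `PANDOC_IMAGE` variables are hypothetical, and the timestamp comparison relies on the `-nt` test operator available in Bash.

```bash
# Hypothetical helpers for incremental batch conversion.
# Succeeds when the PDF is missing or older than its Markdown source.
needs_rebuild() {
  local md="$1"
  local pdf="${md%.md}.pdf"
  [ ! -e "$pdf" ] || [ "$md" -nt "$pdf" ]
}

# Converts every out-of-date Markdown file in the current directory.
convert_all() {
  local md
  for md in *.md; do
    [ -e "$md" ] || continue  # the glob matched nothing
    if needs_rebuild "$md"; then
      "${DOCKER:-docker}" run --rm -v "$(pwd)":/convert \
        "${PANDOC_IMAGE:-containercraft/pandoc}" "$md"
    else
      echo "skip (up to date): $md"
    fi
  done
}
```

Calling `convert_all` in a directory of Markdown files starts one container per file that needs converting, which keeps repeated runs short.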

    ### Integration with CI/CD Pipelines

    Use the container in automated workflows:

    **GitHub Actions Example:**

    ```yaml
    jobs:
      build_pdf:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Convert Markdown to PDF
            run: |
              docker run --rm -v ${{ github.workspace }}:/convert containercraft/pandoc my_document.md
    ```

    **GitLab CI/CD Example:**

    ```yaml
    pdf_generation:
      image:
        name: containercraft/pandoc
        # Clear the image's pandoc-entrypoint so the runner can execute the script
        entrypoint: [""]
      script:
        - pandoc my_document.md -o my_document.pdf
      artifacts:
        paths:
          - my_document.pdf
    ```

    ## Building the Docker Image

    Clone the repository and build the image:

    ```bash
    git clone https://github.com/yourusername/pandoc-docker.git
    cd pandoc-docker
    docker build --progress plain --tag containercraft/pandoc -f Dockerfile .
    ```

    ## Customization

    ### Modify the Dockerfile

    - **Add Packages**: Include additional LaTeX packages or fonts by editing the `APT_PKGS` build argument in the Dockerfile.
    - **Change Entrypoint**: Update `pandoc-entrypoint` to alter default Pandoc options.

    ### Extend Functionality

    - **Install Additional Tools**: Install tools like `tesseract-ocr` for OCR capabilities.
    - **Integrate Filters**: Add Pandoc filters or Lua scripts for advanced processing.

    ## Contributing

    Contributions are welcome! Please:

    1. **Fork the Repository**: Click the "Fork" button on GitHub.
    2. **Create a Feature Branch**: `git checkout -b feature/your-feature`
    3. **Commit Your Changes**: `git commit -m 'Add your feature'`
    4. **Push to the Branch**: `git push origin feature/your-feature`
    5. **Open a Pull Request**: Describe your changes and submit.

    ## Acknowledgments

    - **[Pandoc](https://pandoc.org/)**: Universal document converter.
    - **[LaTeX Project](https://www.latex-project.org/)**: High-quality typesetting system.
    - **[Docker](https://www.docker.com/)**: Containerization platform.
  2. @usrbinkat usrbinkat created this gist Oct 27, 2024.
    81 changes: 81 additions & 0 deletions Dockerfile
    @@ -0,0 +1,81 @@
    ###############################################################################
    # Use:
    # - docker build --progress plain --tag docker.io/containercraft/pandoc -f Dockerfile .
    # - docker run --rm -it --name pandoc --hostname pandoc --volume .:/convert docker.io/containercraft/pandoc my_document.md

    ###############################################################################
    FROM docker.io/library/ubuntu:24.04
    LABEL tag="pandoc"
    ENV DEVCONTAINER="pandoc"
    SHELL ["/bin/bash", "-c", "-e"]

    #################################################################################
    # Environment Variables

    # Set locale to en_US.UTF-8
    ENV LANG=en_US.UTF-8
    ENV LANGUAGE=en_US:en
    ENV LC_ALL=en_US.UTF-8
    # Disable timezone prompts
    ENV TZ=UTC
    # Disable package manager prompts
    ENV DEBIAN_FRONTEND=noninteractive
    # Set default bin directory for new packages
    ENV BIN="/usr/local/bin"
    # Set default binary install command
    ENV INSTALL="install -m 755 -o root -g root"

    # Common Dockerfile Container Build Functions
    ENV apt_update="apt-get update"
    ENV apt_install="TERM=linux DEBIAN_FRONTEND=noninteractive apt-get install -q --yes --no-install-recommends"
    ENV apt_clean="apt-get clean && apt-get autoremove -y && apt-get purge -y --auto-remove"
    ENV curl="/usr/bin/curl --silent --show-error --tlsv1.2 --location"
    ENV dir_clean="\
    rm -rf \
    /var/lib/{apt,cache,log} \
    /usr/share/{doc,man,locale} \
    /var/cache/apt \
    /root/.cache \
    /var/tmp/* \
    /tmp/* \
    "

    #################################################################################
    # Base package and user configuration
    #################################################################################

    # Apt Packages
    ARG APT_PKGS="\
    locales \
    pandoc \
    texlive-latex-base \
    texlive-fonts-recommended \
    texlive-fonts-extra \
    texlive-latex-extra \
    texlive-xetex \
    texlive-luatex \
    texlive-science \
    fonts-lmodern \
    fonts-noto-cjk \
    fonts-noto-core \
    fonts-noto-color-emoji \
    "

    # Install Base Packages and Remove Unnecessary Ones
    # dir_clean relies on brace expansion, so it is re-parsed with bash -c like the other helpers
    RUN echo \
        && export TEST="pandoc --version" \
        && ${apt_update} \
        && bash -c "${apt_install} ${APT_PKGS}" \
        && locale-gen en_US.UTF-8 \
        && update-locale LANG=en_US.UTF-8 \
        && bash -c "${apt_clean}" \
        && bash -c "${dir_clean}" \
        && ${TEST} \
        && true

    #################################################################################
    # Set the default command
    #################################################################################
    ADD ./rootfs /
    WORKDIR /convert
    ENTRYPOINT ["pandoc-entrypoint"]
    4 changes: 4 additions & 0 deletions README.md
    @@ -0,0 +1,4 @@
    # How to create PDF files from Markdown

    1. Docker Build
    2. Docker run Pandoc
    12 changes: 12 additions & 0 deletions pandoc-entrypoint
    @@ -0,0 +1,12 @@
    #!/bin/bash -x

    # Strip only a trailing .md so input and output share the same base name
    file_name="${1%.md}"
    echo "INFO >> Converting file to pdf: ${file_name}.md > ${file_name}.pdf"

    # Quote the paths so file names containing spaces are passed as single arguments
    pandoc "${file_name}.md" -o "${file_name}.pdf" \
      -V mainfont="Noto Serif" \
      -V monofont="Noto Mono" \
      -V geometry:margin=1in \
      --highlight-style=kate \
      --pdf-engine=xelatex \
      --toc -N