Revisions

  1. DannyQuah revised this gist Jan 29, 2023. 1 changed file with 17 additions and 14 deletions.
    31 changes: 17 additions & 14 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    @@ -5,7 +5,7 @@ by Danny Quah, Aug 2020 (revised Jan 2022)
    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*


    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.
    Pandoc is a filter that takes a written document in a particular format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

    @@ -17,7 +17,7 @@ To do its job, Pandoc has many options available from the command line. Using th
    pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. Alternatively, I might have wanted the instruction:
    In this example, the input is the Open Document format file `oldarchive.odt`, and output the Markdown document `mydocument.md`. Alternatively, I might have given instruction:

    ```
    pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    @@ -27,19 +27,22 @@ where now input is the Markdown file `oldarchive.md` and output the Word documen

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.


    ## Underlying Attributes

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.
    With conversion into odt or docx files, what you see when you render or display the file on-screen is also approximately what you will see when you print the file into PDF or hardcopy. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the generated odt or docx file.

    I myself prefer to work with or edit Markdown but people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    I myself prefer to edit Markdown directly but people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.
    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is literally just for show.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.
    (To be clear, current technology does allow editing of a PDF file, by either special hooks or by its conversion to docx, and then editing that. This, however, is different from what I mean by going back to editing the input file and then re-generating the PDF.)

    This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself.
    This workflow with a PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or LaTeX input or relatives that provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. The actual process behind the scenes, however, turns out to be one where the Markdown document gets translated into TeX (or LaTeX) code first, and then that is fed into TeX (or LaTeX) itself. The intermediate step, however, is invisible to the user.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher-up in the hierarchy and thus closer to TeX. But, again, for many routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)
    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown to either LaTeX or TeX. My view is that for many routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc for writers can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?
    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure: this is what allows changing a single option in LaTeX to alter the entire look of the document. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, say, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    @@ -50,9 +53,9 @@ The answer is two-fold: first, through YAML information; second, through templat
    ## Markdown to PDF via LaTeX


    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.
    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can do all this invisibly, but obviously those called programs need to be installed somewhere on your system. Using this flow, by providing the right directives (not in Markdown itself but in another language that Markdown is able to work with) the Markdown document can provide information on the structure of the PDF output desired.

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:
    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. Between those beginning and ending three dashes, individual lines contain key-value pairs that provide structural information on the document. Thus, for instance, a simple YAML header might be:

    ```yaml
    fileName: Pandoc-2020.08.md
    @@ -62,13 +65,13 @@ Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded by and ending with the three-dash lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.
    (preceded by and ending with the three-dash lines, of course). Like Python, YAML takes whitespace indentation to be significant, so don't try to prettify your file by introducing extraneous white spaces at the beginning of a YAML line. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)
    Markdown rendering ignores both `#`-introduced comments and YAML key-value pairs. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains no content to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Github, similarly. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

    But if YAML is ignored, what is the point to it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two generate a PDF document from Markdown. YAML information is read off the Markdown document by Pandoc, gets passed by Pandoc to LaTeX, and with the latter employing YAML's key-value pairs as directives for structuring the document.
    But if Markdown ignores YAML, what is the point? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two come together to generate PDF from Markdown. Pandoc reads YAML, translates the key-value pairs data into LaTeX directives and then ships everything off to LaTeX, now with input all ready to structure the output.

    Thus, in many of my Markdown files destined for PDF output, the YAML header contains also:
    Thus, in many of my Markdown files destined for PDF output, the YAML header contains:
    ```
    ## Front Matter
    title: Readable Title for My Article
  2. DannyQuah revised this gist Jan 28, 2022. 1 changed file with 0 additions and 215 deletions.
    215 changes: 0 additions & 215 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF-hold.md
    @@ -1,215 +0,0 @@
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

    by Danny Quah, Aug 2020 (revised Jan 2022)

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*


    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

    ## Pandoc Basics

    To do its job, Pandoc has many options available from the command line. Using those can be as easy as:

    ```
    pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    In this example, the input is the Open Document format file `oldarchive.odt`, and the output is the Markdown document `mydocument.md`. Alternatively, I might have given the instruction:

    ```
    pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    ```

    where the input is now the Markdown file `oldarchive.md` and the output is the Word document `mydocument.docx`.

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.
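If the Word file goes out to collaborators and comes back edited, the two conversions above can be wrapped as a pair of shell helpers. This is a minimal sketch, assuming `pandoc` is on your `PATH`; the function names `md_to_docx` and `docx_to_md` are hypothetical, not part of Pandoc.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the two pandoc calls shown above.
# Assumes pandoc is installed and on PATH.

# md_to_docx notes.md  ->  writes notes.docx next to the input
md_to_docx() {
  pandoc --read=markdown --write=docx "$1" -o "${1%.md}.docx"
}

# docx_to_md notes.docx notes-edited.md  ->  writes the second argument.
# The output name is explicit so that converting an edited .docx back
# never silently overwrites the original Markdown file.
docx_to_md() {
  pandoc --read=docx --write=markdown "$1" -o "$2"
}
```

For example, `md_to_docx mydocument.md`, and once the edits come back, `docx_to_md mydocument.docx mydocument-edited.md`. As noted above, expect the round trip to be approximate rather than exact.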

    ## Underlying Attributes

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    This workflow, with PDF as the endpoint, will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. Behind the scenes, Pandoc first translates the Markdown document into LaTeX code and then feeds that result into LaTeX itself.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described Markdown standing in relation to LaTeX: a higher-level layer built on top of a lower-level one, with LaTeX simply sitting closer to TeX than Markdown does. But, again, for much routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc for writers can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is there a single place recording that, perhaps, every second-level heading is followed by a `\medskip`? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    The answer is two-fold: first, through YAML information; second, through template files.



    ## Markdown to PDF via LaTeX


    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:

    ```yaml
    fileName: Pandoc-2020.08.md
    # Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded by and ending with the three-dash lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, the YAML key-value pairs are ignored as far as Markdown rendering is concerned. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form; GitHub behaves similarly. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)
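Because previewers hide the header, it can help to print exactly what sits between the opening and closing dashes. A minimal sketch in `awk`, assuming the header starts on the very first line of the file and closes at the next `---` line (Pandoc also accepts other layouts, which this ignores); `yaml_header` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the YAML header of a Markdown file, i.e. the
# lines strictly between an opening '---' on line 1 and the next '---'.
# Prints nothing if line 1 is not '---'.
yaml_header() {
  awk '
    NR == 1 { if ($0 != "---") exit; next }  # opening delimiter (or no header)
    $0 == "---" { exit }                     # closing delimiter
    { print }                                # a line of YAML
  ' "$1"
}
```

For example, `yaml_header file.md` on the header shown above prints its five lines, comments included.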

    But if YAML is ignored, what is the point of it? Here is what's critical: YAML is used when Pandoc and LaTeX together generate a PDF document from Markdown. Pandoc reads the YAML information off the Markdown document, translates its key-value pairs into LaTeX directives (by filling them into a LaTeX template), and then hands the result to LaTeX, which uses those directives to structure the document.
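One way to watch that hand-off is to stop Pandoc at the LaTeX stage and look at the preamble it writes, where values such as `title` and `fontsize` reappear as LaTeX commands and class options. A sketch, assuming `pandoc` is on your `PATH` and its default LaTeX template is in use; `show_latex_preamble` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the preamble of the LaTeX that pandoc would
# hand to LaTeX, i.e. everything up to and including \begin{document}.
# Assumes pandoc is on PATH and the default LaTeX template is in use.
show_latex_preamble() {
  pandoc --standalone --read=markdown --write=latex "$1" |
    sed -n '1,/\\begin{document}/p'
}
```

With `title: Readable Title for My Article` and `fontsize: 12pt` in the header, the output should contain `\title{Readable Title for My Article}` and a `12pt` option on the `\documentclass` line.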

    Thus, in many of my Markdown files destined for PDF output, the YAML header also contains:
    ```yaml
    ## Front Matter
    title: Readable Title for My Article
    author:
      - name: Danny Quah
        affiliation: Lee Kuan Yew School of Public Policy, NUS
        email: [email protected]
        number: 1
      - name: My Coauthor
        affiliation: Economics Department, NUS
        email: [email protected]
        number: 2
    date: June 2020
    # abstract:
    # keywords:
    # thanks:
    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    mathfont: ccmath
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: concrete
    documentclass: article
    classoption:
      - notitlepage
      - onecolumn
    fontenc: T1
    geometry:
      - a4paper
      - top=35mm
      - left=30mm
      - heightrounded
    header-includes:
      - |
        ```{=latex}
        \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
        \floatplacement{figure}{H}
        ```
    pagestyle: headings
    ```
    (obviously somewhere between the beginning and ending 3-dash `---` lines).
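Since a single mis-indented line changes what the YAML means (an `affiliation:` line flush with the left margin, for instance, stops belonging to its author), it can be worth checking how Pandoc actually parsed the header before LaTeX gets involved. A sketch, assuming `pandoc` is on your `PATH`: Pandoc's `native` output with `--standalone` prints the parsed metadata, and malformed YAML should make the call fail with a parse error. `check_metadata` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump pandoc's parsed document, metadata included,
# in pandoc's "native" form. Malformed YAML should make this fail with a
# parse error; well-formed YAML shows where each key ended up.
check_metadata() {
  pandoc --standalone --read=markdown --write=native "$1"
}
```

For example, `check_metadata myinput.md | less`, then confirm that each `affiliation` appears inside the `author` entry it belongs to.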
    This is almost all it takes to generate sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in standard LaTeX knows only about the author, title, and date; it has no notion of separate `affiliation` and `email` fields, and neither does Pandoc's default LaTeX template. So the structured author entries above, with affiliation and email spelled out, will not render as intended with the standard Pandoc latex template. Instead, the YAML key-value pair needs to be
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    ```
    To add a second author, write:
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    ```
    This works for me.
    It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. Doing that, though, requires modifying the latex template that Pandoc uses.
    First, write out Pandoc's default latex template, which you will then modify:
    ```
    pandoc -D latex > mytemplate.tex
    ```
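    I put `mytemplate.tex` in `~/.pandoc/templates/`, as that is the personal templates folder Pandoc searches (on recent Pandoc releases the default user data directory on Linux and macOS is `~/.local/share/pandoc/` instead, with `~/.pandoc/` still honoured when that is the one that exists, so the folder may be `~/.local/share/pandoc/templates/` on your system) when subsequently given the option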
    ```shell
    --template=mytemplate.tex
    ```

    Now open up `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:

    ```latex
    \author{$for(author)$$author$$sep$ \and $endfor$}
    ```

    to

    ```latex
    $if(author)$
    \usepackage{authblk}
    $for(author)$
    $if(author.name)$
    $if(author.number)$
    \author[$author.number$]{$author.name$}
    $else$
    \author[]{$author.name$}
    $endif$
    $if(author.affiliation)$
    $if(author.email)$
    \affil{$author.affiliation$ \thanks{$author.email$}}
    $else$
    \affil{$author.affiliation$}
    $endif$
    $endif$
    $else$
    \author{$author$}
    $endif$
    $endfor$
    $endif$
    ```
    As you can see, the replacement code refers to `affiliation` and `email`, matching the fields in the `YAML` header, even though plain LaTeX has no such fields. What makes this work is that the replacement code also loads the package `authblk` (in its second line), which provides the `\affil` command used to place each author's affiliation, with the email attached as a `\thanks` footnote, when the LaTeX `\maketitle` instruction is invoked.
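To check that a template really sees these structured fields, a throwaway template can print just them. A sketch, assuming `pandoc` is on your `PATH`; the helper `list_authors` and the template file `names.txt` it writes into the current directory are hypothetical. It uses the same `$for(author)$` and `$author.name$` syntax as the template code above, with Pandoc's `plain` writer so the output is ordinary text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render only the structured author fields from a
# Markdown file's YAML header, using a throwaway pandoc template.
# Writes names.txt into the current directory.
list_authors() {
  # Single quotes keep the $...$ template syntax away from the shell.
  printf '%s\n' \
    '$for(author)$' \
    '$author.name$ ($author.affiliation$)' \
    '$endfor$' > names.txt
  pandoc --read=markdown --write=plain --template=names.txt "$1"
}
```

If a name prints without its affiliation, the YAML for that author is most likely mis-indented.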

    If you use the compact `author` entries, with the LaTeX line breaks written inline, and leave the latex template unchanged, then

    ```
    pandoc --standalone --read=markdown --write=latex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```

    produces the desired `myinput.pdf`. If, however, you use the more explicit, structured `affiliation` and `email` YAML header together with the modified `mytemplate.tex`, then use

    ```
    pandoc --standalone --read=markdown --write=latex --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```

    instead, i.e., add the explicit new `--template` option to your Pandoc call.
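Since the two calls differ only in the `--template` option, they can be folded into one shell function, which is also handy as a Makefile recipe or editor command. A minimal sketch, assuming `bash`, `pandoc` and `pdflatex` are installed; `md2pdf` is a hypothetical name. The function asks for `--write=latex` and lets the `.pdf` output name trigger the PDF engine, which is how Pandoc's manual describes producing PDF.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pandoc calls above.
# usage: md2pdf myinput.md [mytemplate.tex]  ->  writes myinput.pdf
md2pdf() {
  local input="$1"
  local template="${2:-}"
  local args=(--standalone --read=markdown --write=latex --pdf-engine=pdflatex)
  # Only add --template when one was given; otherwise Pandoc falls back
  # to its default LaTeX template.
  if [ -n "$template" ]; then
    args+=("--template=$template")
  fi
  # The .pdf extension on the output file makes Pandoc run the generated
  # LaTeX through the PDF engine.
  pandoc "${args[@]}" "$input" -o "${input%.md}.pdf"
}
```

So `md2pdf myinput.md` covers the first case and `md2pdf myinput.md mytemplate.tex` the second.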

    If you want to inspect the LaTeX code that's produced along the way, you can split the production into two steps:

    ```
    pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
    pdflatex myinput.tex &>/dev/null
    ```

    adding in `--template=mytemplate.tex` as needed in the `pandoc` call.
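Discarding all `pdflatex` output with `&>/dev/null` keeps the terminal quiet but also hides the reason when LaTeX fails. A sketch of the same two steps that stays quiet on success and shows the end of the LaTeX log on failure, assuming `bash`, `pandoc` and `pdflatex` are installed and that the input file is in the current directory (where `pdflatex` writes its output and log files); `md2tex2pdf` is a hypothetical name. `-interaction=nonstopmode` and `-halt-on-error` are standard `pdflatex` options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Markdown -> LaTeX -> PDF in two visible steps.
# Assumes the input .md file is in the current directory, because pdflatex
# writes its .pdf and .log files to the current directory.
md2tex2pdf() {
  local input="$1"
  local base="${input%.md}"
  # Step 1: keep the intermediate LaTeX so it can be inspected.
  pandoc --standalone --read=markdown --write=latex "$input" -o "$base.tex" || return 1
  # Step 2: compile without stopping for keyboard input; stay quiet on
  # success, but show the tail of the log if compilation fails.
  if ! pdflatex -interaction=nonstopmode -halt-on-error "$base.tex" >/dev/null; then
    echo "pdflatex failed; last lines of $base.log:" >&2
    tail -n 20 "$base.log" >&2
    return 1
  fi
}
```

As with the commands above, add `--template=mytemplate.tex` to the `pandoc` line when using the modified template.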




    <!---
    Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    -->

  3. DannyQuah revised this gist Jan 28, 2022. 2 changed files with 219 additions and 4 deletions.
    215 changes: 215 additions & 0 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF-hold.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,215 @@
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

    by Danny Quah, Aug 2020 (revised Jan 2022)

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*


    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

    ## Pandoc Basics

    To do its job, Pandoc has many options available from the command line. Using those can be as easy as:

    ```
    pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. Alternatively, I might have wanted the instruction:

    ```
    pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    ```

    where now input is the Markdown file `oldarchive.md` and output the Word document `mydocument.docx`.

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.

    ## Underlying Attributes

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher-up in the hierarchy and thus closer to TeX. But, again, for many routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    The answer is two-fold: first, through YAML information; second, through template files.



    ## Markdown to PDF via LaTeX


    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:

    ```yaml
    fileName: Pandoc-2020.08.md
    # Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded by and ending with the three-dash lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

    But if YAML is ignored, what is the point of it? Here is what's critical: YAML is used when Pandoc and LaTeX generate a PDF document from Markdown. Pandoc reads the YAML information off the Markdown document and uses its key-value pairs to fill in the variables of the LaTeX template that it then hands to LaTeX. LaTeX, in turn, acts on the resulting directives when structuring the document.
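One way to watch this hand-off is to ask Pandoc for the intermediate LaTeX rather than the PDF, and then look for the YAML values in it. The sketch below assumes Pandoc's default LaTeX template, which puts `title` into a `\title{...}` command and `fontsize` among the `\documentclass` options. The file name `yaml-demo.md` is hypothetical, and the conversion is skipped if `pandoc` is not installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

cat > yaml-demo.md <<'EOF'
---
title: Readable Title for My Article
fontsize: 12pt
---

A single line of body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # --standalone makes Pandoc emit the whole template (preamble included),
  # which is where most YAML-driven settings end up.
  pandoc --standalone --read=markdown --write=latex yaml-demo.md -o yaml-demo.tex
  # Show the lines that came from the YAML header.
  grep -n -e '\\title' -e '12pt' yaml-demo.tex
else
  echo "pandoc not found; skipping the conversion" >&2
fi
```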

    Thus, in many of my Markdown files destined for PDF output, the YAML header also contains:
    ````yaml
    ## Front Matter
    title: Readable Title for My Article
    author:
    - name: Danny Quah
      affiliation: Lee Kuan Yew School of Public Policy, NUS
      email: [email protected]
      number: 1
    - name: My Coauthor
      affiliation: Economics Department, NUS
      email: [email protected]
      number: 2
    date: June 2020
    # abstract:
    # keywords:
    # thanks:
    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    mathfont: ccmath
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: concrete
    documentclass: article
    classoption:
    - notitlepage
    - onecolumn
    fontenc: T1
    geometry:
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    header-includes:
    - |
      ```{=latex}
      \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
      \floatplacement{figure}{H}
      ```
    pagestyle: headings
    ````
    (obviously somewhere between the beginning and ending 3-dash `---` lines).
    This is almost all it takes to generate a sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. Pandoc's default LaTeX template, and the `\maketitle` command it feeds, know only about a plain list of authors; they have no place for `affiliation` and `email` keys. Thus, the way I have written the author information above, with affiliation and email given explicitly, will not render correctly with the standard Pandoc LaTeX template. Instead, the YAML key-value pair needs to be
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    ```
    To add a second author, write:
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    ```
    This works for me.
    It can be neater, however, to separate out affiliation and email information explicitly, as in the YAML above. To do that, the LaTeX template that Pandoc uses needs to be modified.
    First, generate the default LaTeX template, which will subsequently be changed:
    ```shell
    pandoc -D latex > mytemplate.tex
    ```
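    I put `mytemplate.tex` in `~/.pandoc/templates/`, the `templates` subfolder of Pandoc's personal data directory, as that is where Pandoc will subsequently look when given the option (recent Pandoc versions default instead to `~/.local/share/pandoc/templates/`, unless `~/.pandoc` already exists)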
    ```shell
    --template=mytemplate.tex
    ```
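This setup can be scripted. The sketch rests on one assumption you should check on your own machine: recent Pandoc releases print the user data directory in the output of `pandoc --version`, on a line beginning `User data directory:`. If that line cannot be found, the script falls back to `~/.pandoc`. The variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Fallback location used in the text above.
datadir="$HOME/.pandoc"

if command -v pandoc >/dev/null 2>&1; then
  # Assumption: this pandoc version prints a "User data directory:" line.
  reported="$(pandoc --version | sed -n 's/^User data directory: //p')"
  if [ -n "$reported" ]; then
    datadir="$reported"
  fi
fi

mkdir -p "$datadir/templates"

if command -v pandoc >/dev/null 2>&1; then
  # Create the copy only once, so later hand edits are not overwritten.
  if [ ! -e "$datadir/templates/mytemplate.tex" ]; then
    pandoc -D latex > "$datadir/templates/mytemplate.tex"
  fi
fi

echo "Template directory: $datadir/templates"
```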

    Now open up `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:

    ```latex
    \author{$for(author)$$author$$sep$ \and $endfor$}
    ```

    to

    ```latex
    $if(author)$
    \usepackage{authblk}
    $for(author)$
    $if(author.name)$
    $if(author.number)$
    \author[$author.number$]{$author.name$}
    $else$
    \author[]{$author.name$}
    $endif$
    $if(author.affiliation)$
    $if(author.email)$
    \affil{$author.affiliation$ \thanks{$author.email$}}
    $else$
    \affil{$author.affiliation$}
    $endif$
    $endif$
    $else$
    \author{$author$}
    $endif$
    $endfor$
    $endif$
    ```
    As you can see, the replacement code refers to `affiliation` and `email`, as in the YAML header, even though standard LaTeX has no such fields. What makes this work is that the replacement code also loads the package `authblk` (in its second line), which then properly places the affiliation and email when the LaTeX `\maketitle` instruction is invoked.
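To check that Pandoc is actually using the edited template, you can render just the LaTeX and look for the `authblk` lines. This is a sketch that assumes `mytemplate.tex` has been edited as described above and sits where `--template` can find it. The file name `authors-demo.md` and the e-mail addresses are hypothetical placeholders, and the step is skipped if `pandoc` is missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical test document using the structured author fields.
cat > authors-demo.md <<'EOF'
---
title: Authblk Check
author:
- name: Danny Quah
  affiliation: Lee Kuan Yew School of Public Policy, NUS
  email: author1@example.org
  number: 1
- name: My Coauthor
  affiliation: Economics Department, NUS
  email: author2@example.org
  number: 2
---

Body text.
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Render only the LaTeX, using the edited template.
  if pandoc --standalone --read=markdown --write=latex --template=mytemplate.tex \
       authors-demo.md -o authors-demo.tex; then
    # With the edited template these lines should be present.
    grep -n -e 'authblk' -e '\\affil' authors-demo.tex \
      || echo "No authblk lines found; is the edited template in use?" >&2
  else
    echo "pandoc failed; is mytemplate.tex where --template can find it?" >&2
  fi
fi
```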

    If you decide to use the compact `author` entries with inline LaTeX line breaks, and not change the LaTeX template, then

    ```shell
    pandoc --standalone --read=markdown --write=pdf --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```

    produces the desired `myinput.pdf`. If, however, you decide to use the more explicit and structured `affiliation` and `email` YAML header and the modified `mytemplate.tex`, then use

    ```shell
    pandoc --standalone --read=markdown --write=pdf --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```

    instead, i.e., add the explicit `--template` option to your Pandoc call.
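Because the two calls differ only in `--template`, one small shell function can cover both. This is a hypothetical convenience: the names `build_pdf` and `TEMPLATE` are mine, not Pandoc's. It uses the same options as the commands above and adds `--template` only when `TEMPLATE` is set.

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: build NAME.pdf from NAME.md with pdflatex.
# Set TEMPLATE (e.g. TEMPLATE=mytemplate.tex) to use a custom LaTeX template.
build_pdf() {
  local input="$1"
  local output="${input%.md}.pdf"   # strip the .md suffix, add .pdf
  local args=(--standalone --read=markdown --write=pdf --pdf-engine=pdflatex)

  if [ -n "${TEMPLATE:-}" ]; then
    args+=(--template="$TEMPLATE")
  fi

  pandoc "${args[@]}" "$input" -o "$output"
}
```

For example, `build_pdf myinput.md` uses the default template, while `TEMPLATE=mytemplate.tex build_pdf myinput.md` uses the modified one.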

    If you want to inspect the LaTeX code that's produced along the way, you can undertake this production in two steps:

    ```
    pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
    pdflatex myinput.tex &>/dev/null
    ```

    adding in `--template=mytemplate.tex` as needed in the `pandoc` call.
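A variant of this two-step route keeps pdflatex's log instead of discarding all output, which helps when a build fails. This is a sketch, not the author's setup. The function name `two_step` is hypothetical. `-interaction=nonstopmode` is a standard pdflatex option that stops it from pausing for input at the first error. Any extra Pandoc options, such as `--template=mytemplate.tex`, go after the file name.

```shell
#!/usr/bin/env bash

# Hypothetical two-step build: Markdown -> LaTeX (kept for inspection) -> PDF.
two_step() {
  local input="$1"
  shift
  local stem="${input%.md}"

  # Remaining arguments (e.g. --template=mytemplate.tex) are passed to pandoc.
  pandoc --standalone --read=markdown --write=latex "$@" "$input" -o "$stem.tex" || return 1

  # pdflatex writes its full transcript to $stem.log, so only the
  # terminal output is discarded here.
  if ! pdflatex -interaction=nonstopmode "$stem.tex" >/dev/null 2>&1; then
    echo "pdflatex failed; see $stem.log" >&2
    return 1
  fi
}
```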



    ## References

    - https://maehr.github.io/academic-pandoc-template/
    - https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/
    - https://pandoc.org/MANUAL.html#extension-pandoc_title_block
    - https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
    - https://opensource.com/article/18/9/pandoc-research-paper
    - https://en.wikibooks.org/wiki/LaTeX/Title_Creation

    <!---
    Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    -->

    8 changes: 4 additions & 4 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    Original file line number Diff line number Diff line change
    @@ -31,7 +31,7 @@ The conversion will never be perfect, but in many cases the result provides a fi

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.
    I myself prefer to edit Markdown directly but people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    @@ -110,11 +110,11 @@ header-includes:
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
    \```
    pagestyle: headings
    ```

    (obviously somewhere between the beginning and ending 3-dash `---` lines).
    (somewhere between the beginning and ending 3-dash `---` lines. Also, the line right after `floatplacement{figure}...` should contain an indented three backquotes, but I'm having trouble getting GitHub's markdown processor to process it that way rather than as the premature end of my codeblock. Here, I've written that sequence in with a backslash qualifier instead, but that backslash obviously needs to be removed in production code.)

    This is almost all it takes to generate sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, how above I have written the author information, to include affiliation and email explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be

    @@ -130,7 +130,7 @@ author:
    ```
    This works for me.

    It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. To use that then, the latex template that Pandoc uses will need to be modified.
    It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. To use that, the latex template that Pandoc uses will need to be modified.

    First, generate the default latex template that will subsequently be changed:
    ```
  4. @DannyQuah DannyQuah revised this gist Jan 12, 2022. 1 changed file with 0 additions and 10 deletions.
    10 changes: 0 additions & 10 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    Original file line number Diff line number Diff line change
    @@ -1,13 +1,9 @@
    <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

    by Danny Quah, Aug 2020 (revised Jan 2022)

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*

    =======
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing
    >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md

    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    @@ -176,13 +172,9 @@ $if(author)$
    $endfor$
    $endif$
    ```
    <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.

    If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then
    =======
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.
    >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md

    ```
    pandoc --standalone --read=markdown --write=pdf --pdfengine=pdflatex myinput.md -o myinput.pdf
    @@ -217,8 +209,6 @@ https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
    https://opensource.com/article/18/9/pandoc-research-paper
    https://en.wikibooks.org/wiki/LaTeX/Title_Creation



    <!---
    Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    -->
  5. @DannyQuah DannyQuah revised this gist Jan 12, 2022. 1 changed file with 8 additions and 0 deletions.
    8 changes: 8 additions & 0 deletions 2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    Original file line number Diff line number Diff line change
    @@ -1,9 +1,13 @@
    <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

    by Danny Quah, Aug 2020 (revised Jan 2022)

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*

    =======
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing
    >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    @@ -172,9 +176,13 @@ $if(author)$
    $endfor$
    $endif$
    ```
    <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.

    If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then
    =======
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.
    >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    ```
    pandoc --standalone --read=markdown --write=pdf --pdfengine=pdflatex myinput.md -o myinput.pdf
  6. @DannyQuah DannyQuah renamed this gist Jan 12, 2022. 1 changed file with 65 additions and 111 deletions.
    Original file line number Diff line number Diff line change
    @@ -1,105 +1,49 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Sun 2020.10.11.1809 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

    output: pdf_document
    by Danny Quah, Aug 2020 (revised Jan 2022)

    ## Front Matter
    title: My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing*
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    # - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    #author:
    # - name: Danny Quah
    # affiliation: Lee Kuan Yew School of Public Policy, NUS
    # email: [email protected]
    # number: 1
    # - name: My Coauthor
    # affiliation: Economics Department, NUS
    # email: [email protected]
    # number: 2
    date: theMonth theYear
    # abstract:
    # keywords:
    # thanks:
    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*

    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    # mathfont:
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: gentium
    documentclass: article
    classoption:
    - notitlepage
    - onecolumn
    fontenc: T1
    geometry:
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    header-includes:
    - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
    pagestyle: headings
    ---


    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing

    by
    Danny Quah
    Aug 2020
    Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa.

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*
    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

    ## Pandoc Basics

    Pandoc is a filter that takes a written document in its given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. Or vice versa.
    To do its job, Pandoc has many options available from the command line. Using those can be as easy as:

    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.
    ```
    pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    ## Pandoc Basics
    In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. Alternatively, I might have wanted the instruction:

    To do its job, Pandoc has many options available from the command line. Using those can be as easy as just:
    ```shell
    $ pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. Alternatively, I might have wanted the instruction:
    ```shell
    $ pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    ```
    where now input is the Markdown file `oldarchive.md` and output the Word document `mydocument.docx`.

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.
    where now input is the Markdown file `oldarchive.md` and output the Word document `mydocument.docx`.

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.

    ## Underlying Attributes
    ## Underlying Attributes

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.
    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.
    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.
    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself.
    This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher-up in the hierarchy and thus closer to TeX. But, again, for many routine, not especially technical writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)
    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher-up in the hierarchy and thus closer to TeX. But, again, for many routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?
    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?
    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    The answer is two-fold: first, through YAML information; second, through template files.
    The answer is two-fold: first, through YAML information; second, through template files.



    @@ -109,6 +53,7 @@ The answer is two-fold: first, through YAML information; second, through templat
    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:

    ```yaml
    fileName: Pandoc-2020.08.md
    # Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
    @@ -117,14 +62,14 @@ Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded and ending with the three-dash sequence lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.
    (preceded by and ending with the three-dash lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or simiar.)
    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

    But if YAML is ignored, what is the point to it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two generate a PDF document from Markdown. YAML information is read off the Markdown document by Pandoc, gets passed by Pandoc to LaTeX, and with the latter employing YAML's key-value pairs as directives for structuring the document.

    Thus, in many of my Markdown files destined for PDF output, the YAML header contains also:
    ````
    ```
    ## Front Matter
    title: Readable Title for My Article
    author:
    @@ -167,11 +112,12 @@ header-includes:
    \floatplacement{figure}{H}
    ```
    pagestyle: headings
    ````
    ```
    (obviously somewhere between the beginning and ending 3-dash `---` lines).
    (obviously somewhere between the beginning and ending 3-dash `---` lines).
    This is almost all it takes to generate sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, how above I have written the author information, to include affiliation and email explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be
    This is almost all it takes to generate a sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, how above I have written the author information, to include affiliation and email explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    @@ -186,19 +132,23 @@ This works for me.
    It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. To use that then, the latex template that Pandoc uses will need to be modified.
    First, generate the default latex template that will subsequently be changed:
    ```shell
    $ pandoc -D latex > mytemplate.tex
    First, generate the default latex template that will subsequently be changed:
    ```
    I put `mytemplate.tex` in `~/.pandoc/templates/` as that latter is the default personal folder that will be recognised subsequently by the Pandoc option
    pandoc -D latex > mytemplate.tex
    ```
    I put `mytemplate.tex` in `~/.pandoc/templates/` as that latter is the default personal folder that will be recognised subsequently by the Pandoc option
    ```shell
    --template=mytemplate.tex
    ```
    Now open up `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:

    Now open up `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:

    ```yaml
    \author{$for(author)$$author$$sep$ \and $endfor$}
    ```
    to

    to

    ```yaml
    $if(author)$
    \usepackage{authblk}
    @@ -222,40 +172,44 @@ $if(author)$
    $endfor$
    $endif$
    ```
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authbok` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.

    If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then

    If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then
    ```
    $ pandoc --standalone --read=markdown --write=pdf --pdf-engine=pdflatex myinput.md -o myinput.pdf
    pandoc --standalone --read=markdown --write=pdf --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```
    produces the desired `myinput.pdf`. If, however, you decide to use the more explicit and structured `affiliation` and `email` YAML header and the modified `mytemplate.tex` then use

    produces the desired `myinput.pdf`. If, however, you decide to use the more explicit and structured `affiliation` and `email` YAML header and the modified `mytemplate.tex` then use instead

    ```
    $ pandoc --standalone --read=markdown --write=pdf --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    pandoc --standalone --read=markdown --write=pdf --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```
    instead, i.e., add the explicit new `--template` to your Pandoc call.

    If you want to inspect the LaTeX code that's produced along the way, you can undertake this production in two steps:
    ```shell
    $ pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
    $ pdflatex myinput.tex &>/dev/null
    ```
    adding in `--template=mytemplate.tex` as needed in the `pandoc` call.
    instead, i.e., add the explicit new `--template` to your Pandoc call.

    If you want to inspect the LaTeX code that's produced along the way, you can undertake this production in two steps:

    ```
    pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
    pdflatex myinput.tex &>/dev/null
    ```

    adding in `--template=mytemplate.tex` as needed in the `pandoc` call.

    ## References

    https://maehr.github.io/academic-pandoc-template/

    https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/
    ## References

    https://pandoc.org/MANUAL.html#extension-pandoc_title_block
    https://maehr.github.io/academic-pandoc-template/
    https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/

    https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
    https://pandoc.org/MANUAL.html#extension-pandoc_title_block
    https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
    https://opensource.com/article/18/9/pandoc-research-paper
    https://en.wikibooks.org/wiki/LaTeX/Title_Creation

    https://opensource.com/article/18/9/pandoc-research-paper

    https://en.wikibooks.org/wiki/LaTeX/Title_Creation

    <!---
    Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
  7. @DannyQuah DannyQuah revised this gist Sep 12, 2021. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -165,7 +165,7 @@ $if(author)$
    $endfor$
    $endif$
    ```
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authbok` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.
    As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked.

    If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then
    ```
  8. @DannyQuah DannyQuah revised this gist Mar 7, 2021. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -62,7 +62,7 @@ Tags: Software

    (preceded and ending with the three-dash sequence lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or simiar.)
    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

    But if YAML is ignored, what is the point to it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two generate a PDF document from Markdown. YAML information is read off the Markdown document by Pandoc, gets passed by Pandoc to LaTeX, and with the latter employing YAML's key-value pairs as directives for structuring the document.

  9. @DannyQuah DannyQuah revised this gist Oct 11, 2020. 1 changed file with 0 additions and 57 deletions.
    57 changes: 0 additions & 57 deletions Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -1,60 +1,3 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Sun 2020.10.11.1809 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])

    output: pdf_document

    ## Front Matter
    title: My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing*
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    # - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    #author:
    # - name: Danny Quah
    # affiliation: Lee Kuan Yew School of Public Policy, NUS
    # email: [email protected]
    # number: 1
    # - name: My Coauthor
    # affiliation: Economics Department, NUS
    # email: [email protected]
    # number: 2
    date: theMonth theYear
    # abstract:
    # keywords:
    # thanks:

    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    # mathfont:
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: gentium
    documentclass: article
    classoption:
    - notitlepage
    - onecolumn
    fontenc: T1
    geometry:
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    header-includes:
    - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
    pagestyle: headings
    ---


    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing

    by
  10. @DannyQuah DannyQuah revised this gist Oct 11, 2020. 1 changed file with 21 additions and 19 deletions.
    40 changes: 21 additions & 19 deletions Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -1,6 +1,6 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Mon 2020.08.10.0955 -- Danny Quah ([email protected])
    # Last-edited: Sun 2020.10.11.1809 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    @@ -37,26 +37,23 @@ fontsize: 12pt
    fontfamily: gentium
    documentclass: article
    classoption:
    - notitlepage
    - onecolumn
    - notitlepage
    - onecolumn
    fontenc: T1
    geometry:
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    header-includes:
    - |
    - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
    pagestyle: headings
    ---

    <!---
    Invisible section
    -->

    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing

    @@ -155,16 +152,16 @@ mathfont: ccmath
    fontfamily: concrete
    documentclass: article
    classoption:
    - notitlepage
    - onecolumn
    - notitlepage
    - onecolumn
    fontenc: T1
    geometry:
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    - a4paper
    - top=35mm
    - left=30mm
    - heightrounded
    header-includes:
    - |
    - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    @@ -258,4 +255,9 @@ https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/

    https://opensource.com/article/18/9/pandoc-research-paper

    https://en.wikibooks.org/wiki/LaTeX/Title_Creation
    https://en.wikibooks.org/wiki/LaTeX/Title_Creation

    <!---
    Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    -->

  11. @DannyQuah DannyQuah revised this gist Aug 10, 2020. 1 changed file with 9 additions and 9 deletions.
    18 changes: 9 additions & 9 deletions Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -1,6 +1,6 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Mon 2020.08.10.0900 -- Danny Quah ([email protected])
    # Last-edited: Mon 2020.08.10.0955 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    @@ -67,9 +67,9 @@ Aug 2020
    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*


    Pandoc is a filter that takes a written document in its given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown.
    Pandoc is a filter that takes a written document in its given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. Or vice versa.

    Aavailable official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginning user.
    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

    ## Pandoc Basics

    @@ -90,15 +90,15 @@ The conversion will never be perfect, but in many cases the result provides a fi

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly but the people I work with insist on using Word.. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.
    I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher-up in the hierarchy and thus closer to TeX. But, again, for many routine, not especially technical writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. How does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?
    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    @@ -111,7 +111,7 @@ The answer is two-fold: first, through YAML information; second, through templat

    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:
    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:
    ```yaml
    fileName: Pandoc-2020.08.md
    # Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
    @@ -120,9 +120,9 @@ Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded and ending with the three-dash sequence lines, of course). Like Python, YAML uses whitespace indentation for structure. Comments are introduced by the `#` symbol, and are ignored by the processor.
    (preceded and ending with the three-dash sequence lines, of course). Like Python, YAML takes whitespace indentation to be significant. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or simiar.)
    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. Similarly, Github. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or simiar.)

    But if YAML is ignored, what is the point to it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two generate a PDF document from Markdown. YAML information is read off the Markdown document by Pandoc, gets passed by Pandoc to LaTeX, and with the latter employing YAML's key-value pairs as directives for structuring the document.

    @@ -172,7 +172,7 @@ header-includes:
    pagestyle: headings
    ````

    (obviously somewhere between the beginning and ending `---` lines).
    (obviously somewhere between the beginning and ending 3-dash `---` lines).

    This is almost all it takes to generate a sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, how above I have written the author information, to include affiliation and email explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be
    ```
  12. @DannyQuah DannyQuah revised this gist Aug 10, 2020. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -1,6 +1,6 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Mon 2020.08.10.0841 -- Danny Quah ([email protected])
    # Last-edited: Mon 2020.08.10.0900 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
  13. @DannyQuah DannyQuah created this gist Aug 10, 2020.
    261 changes: 261 additions & 0 deletions Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    @@ -0,0 +1,261 @@
    ---
    fileName: Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md
    # Last-edited: Mon 2020.08.10.0841 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])

    output: pdf_document

    ## Front Matter
    title: My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing*
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    # - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    #author:
    # - name: Danny Quah
    # affiliation: Lee Kuan Yew School of Public Policy, NUS
    # email: [email protected]
    # number: 1
    # - name: My Coauthor
    # affiliation: Economics Department, NUS
    # email: [email protected]
    # number: 2
    date: theMonth theYear
    # abstract:
    # keywords:
    # thanks:

    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    # mathfont:
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: gentium
    documentclass: article
    classoption:
      - notitlepage
      - onecolumn
    fontenc: T1
    geometry:
      - a4paper
      - top=35mm
      - left=30mm
      - heightrounded
    header-includes:
      - |
        ```{=latex}
        \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
        \floatplacement{figure}{H}
        ```
    pagestyle: headings
    ---

    <!---
    Invisible section
    -->

    # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing

    by
    Danny Quah
    Aug 2020

    *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*


    Pandoc is a filter that takes a written document in its given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown.

    Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginning user.

    ## Pandoc Basics

    To do its job, Pandoc has many options available from the command line. Using those can be as easy as just:
    ```shell
    $ pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
    ```

    In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. Alternatively, I might have wanted the instruction:
    ```shell
    $ pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
    ```
    where now input is the Markdown file `oldarchive.md` and output the Word document `mydocument.docx`.

    The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.
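The single-file command extends naturally to a whole folder of notes. What follows is a minimal sketch, assuming bash and a directory of `.md` files. `md_to_docx_all` and the `PANDOC` override are hypothetical conveniences, not part of Pandoc. Setting `PANDOC=echo` turns the loop into a dry run that only prints the arguments.

```shell
# Convert every Markdown file in a folder to Word, using the same
# --read/--write options as the single-file example.
# md_to_docx_all and PANDOC are hypothetical; PANDOC defaults to pandoc.
md_to_docx_all() {
  local dir="${1:-.}" f
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue   # no .md files: skip the unmatched glob
    "${PANDOC:-pandoc}" --read=markdown --write=docx "$f" -o "${f%.md}.docx"
  done
}

# Example: md_to_docx_all ~/notes
```

Each output takes its input's name with `.docx` in place of `.md`, so `notes/draft.md` becomes `notes/draft.docx`.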

    ## Underlying Attributes

    In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file.

    I myself prefer to edit Markdown directly, but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing.

    When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show.

    This workflow, with PDF as the endpoint, will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs; Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not, of course, do everything that the latter can. However, for many routine, not especially technical documents, Markdown is a fine alternative input to TeX. In practice, the Markdown document gets translated into LaTeX code, and that result is then fed into LaTeX itself.

    (A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX is higher up in the hierarchy and thus closer to TeX. But, again, for much routine, not especially technical writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

    Here, however, is both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, how they work is through structure. How does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is encoded, in a single place, the information that, perhaps, every second-level heading is followed by a mediumskip? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced?

    This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey the information on what they intend here, in a way that Pandoc and thereafter LaTeX can use?

    The answer is two-fold: first, through YAML information; second, through template files.



    ## Markdown to PDF via LaTeX


    For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but obviously those programs need to be installed somewhere on your system.
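Before the first PDF run, it is worth confirming that those programs are actually on your `PATH`. The following is a small sketch, assuming bash. `check_tools` is a hypothetical helper name. `pdflatex` matches the engine used later in this note, so substitute `xelatex` or `lualatex` if that is what you use.

```shell
# Report any of the named programs that cannot be found on PATH.
# Returns 0 if all are present, 1 otherwise (check_tools is hypothetical).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_tools pandoc pdflatex || echo "Install the missing programs before running pandoc." >&2
```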

    Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between those, individual lines contain key-value pairs that provide structural information on the document. Thus, a simple YAML header might be:
    ```yaml
    fileName: Pandoc-2020.08.md
    # Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
    Type: Notes
    Tags: Software
    # Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
    ```

    (preceded and ending with the three-dash sequence lines, of course). Like Python, YAML uses whitespace indentation for structure. Comments are introduced by the `#` symbol, and are ignored by the processor.

    As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs too are ignored. Indeed, on most systems, *all* YAML information is ignored by Markdown. If you have a file containing just the above lines in, say, the file `file.md`, opening this file with a Markdown previewer typically shows the file contains nothing to display. `Typora` will open up `file.md` and display the YAML header but in non-editable form. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

    But if YAML is ignored, what is the point of it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two generate a PDF document from Markdown. YAML information is read off the Markdown document by Pandoc and passed on to LaTeX, which then employs YAML's key-value pairs as directives for structuring the document.
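To see this in action, here is a sketch, assuming bash, that writes a tiny Markdown file whose YAML block carries a title, author, and date. If Pandoc is installed, it then prints the lines of the generated LaTeX preamble that those keys produce. The file name and values are illustrative only.

```shell
# Write a minimal Markdown file: YAML block between the two '---' lines,
# then ordinary body text (the only part a Markdown previewer displays).
printf '%s\n' \
  '---' \
  'title: Readable Title for My Article' \
  'author: Danny Quah' \
  'date: June 2020' \
  'fontsize: 12pt' \
  '---' \
  '' \
  'Body text goes here; a Markdown previewer typically shows only this line.' \
  > file.md

if command -v pandoc >/dev/null 2>&1; then
  # --standalone keeps the full preamble, where \title, \author, \date live
  pandoc --standalone --read=markdown --write=latex file.md | grep -E '\\(title|author|date)\{'
fi
```

With Pandoc's default LaTeX template, the YAML values should show up as `\title{...}`, `\author{...}` and `\date{...}` lines. Nothing in the body text asked for them.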

    Thus, in many of my Markdown files destined for PDF output, the YAML header also contains:
    ````
    ## Front Matter
    title: Readable Title for My Article
    author:
      - name: Danny Quah
        affiliation: Lee Kuan Yew School of Public Policy, NUS
        email: [email protected]
        number: 1
      - name: My Coauthor
        affiliation: Economics Department, NUS
        email: [email protected]
        number: 2
    date: June 2020
    # abstract:
    # keywords:
    # thanks:
    ## Formatting
    fontsize: 12pt
    # mainfont: "gentium" # See https://fonts.google.com/ for fonts
    # sansfont: "Raleway"
    # monofont: "IBM Plex Mono"
    mathfont: ccmath
    # fontfamily: concrete | gentium | libertine
    # documentclass: article | scrartcl
    fontfamily: concrete
    documentclass: article
    classoption:
      - notitlepage
      - onecolumn
    fontenc: T1
    geometry:
      - a4paper
      - top=35mm
      - left=30mm
      - heightrounded
    header-includes:
      - |
        ```{=latex}
        \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
        \floatplacement{figure}{H}
        ```
    pagestyle: headings
    ````

    (obviously somewhere between the beginning and ending `---` lines).
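If the same formatting block recurs across many documents, one option is to keep it in a separate file and pass it with Pandoc's `--metadata-file` option, which reasonably recent Pandoc releases support. This is a sketch, assuming bash. `formatting.yaml` is a hypothetical file name, and `myinput.md` is the input used later in this note. According to the Pandoc manual, keys set in a document's own YAML header take precedence over those read from such a file.

```shell
# Shared formatting settings, kept outside any one document.
# (formatting.yaml is a hypothetical name; indentation inside is significant.)
printf '%s\n' \
  'fontsize: 12pt' \
  'fontfamily: concrete' \
  'documentclass: article' \
  'classoption:' \
  '  - notitlepage' \
  '  - onecolumn' \
  'geometry:' \
  '  - a4paper' \
  '  - top=35mm' \
  '  - left=30mm' \
  '  - heightrounded' \
  'pagestyle: headings' \
  > formatting.yaml

# Only attempt the PDF build when the tools and the input file exist.
if command -v pandoc >/dev/null 2>&1 && command -v pdflatex >/dev/null 2>&1 && [ -f myinput.md ]; then
  pandoc --standalone --metadata-file=formatting.yaml --pdf-engine=pdflatex myinput.md -o myinput.pdf
fi
```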

    This is almost all it takes to generate a sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, the author information as I have written it above, with affiliation and email given explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    ```
    so that to add a second author, write:
    ```
    author:
    - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
    - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
    ```
    This works for me.

    It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. To do that, though, the latex template that Pandoc uses needs to be modified.

    First, generate the default latex template that will subsequently be changed:
    ```shell
    $ pandoc -D latex > mytemplate.tex
    ```
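    I put `mytemplate.tex` in `~/.pandoc/templates/` because Pandoc looks in the `templates` subfolder of its user data directory for any template it cannot find in the current directory. (In recent Pandoc versions the default user data directory on Linux and macOS is `~/.local/share/pandoc`; `~/.pandoc` is used only when that directory does not exist.) The template will then be found by the Pandoc option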
    ```shell
    --template=mytemplate.tex
    ```
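    Because the user data directory differs across Pandoc versions, a small helper can work out where the template should go. This is a sketch of the lookup rule the Pandoc manual gives for Linux and macOS; the function names are my own, not part of Pandoc, and the second function needs `pandoc` on the `PATH`.

    ```shell
    # Print the Pandoc user data directory on Linux/macOS:
    # $XDG_DATA_HOME/pandoc (default ~/.local/share/pandoc), unless that
    # does not exist and the older ~/.pandoc does.
    pandoc_data_dir() {
      local xdg_dir="${XDG_DATA_HOME:-$HOME/.local/share}/pandoc"
      if [ ! -d "$xdg_dir" ] && [ -d "$HOME/.pandoc" ]; then
        echo "$HOME/.pandoc"
      else
        echo "$xdg_dir"
      fi
    }

    # Save Pandoc's default LaTeX template as templates/mytemplate.tex inside
    # that directory, where --template=mytemplate.tex will find it.
    # Refuses to overwrite an existing (possibly already edited) copy.
    install_mytemplate() {
      local tdir
      tdir="$(pandoc_data_dir)/templates"
      if [ -e "$tdir/mytemplate.tex" ]; then
        echo "$tdir/mytemplate.tex already exists; not overwriting" >&2
        return 1
      fi
      command -v pandoc >/dev/null 2>&1 || { echo "pandoc not found" >&2; return 1; }
      mkdir -p "$tdir" && pandoc -D latex > "$tdir/mytemplate.tex"
    }
    ```

    Running `install_mytemplate` once generates the template and puts it in place in a single step; the file it creates is the one to edit next.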
    Now open up `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:
    ```latex
    \author{$for(author)$$author$$sep$ \and $endfor$}
    ```
    to
    ```latex
    $if(author)$
    \usepackage{authblk}
    $for(author)$
    $if(author.name)$
    $if(author.number)$
    \author[$author.number$]{$author.name$}
    $else$
    \author[]{$author.name$}
    $endif$
    $if(author.affiliation)$
    $if(author.email)$
    \affil{$author.affiliation$ \thanks{$author.email$}}
    $else$
    \affil{$author.affiliation$}
    $endif$
    $endif$
    $else$
    \author{$author$}
    $endif$
    $endfor$
    $endif$
    ```
    As you can see, the replacement code refers to `affiliation` and `email`, as in the `YAML` header, neither of which standard LaTeX provides for. What makes this work is that the replacement code also loads the package `authblk` (in its second line), which then properly places the `affiliation` and `email` values when the LaTeX `\maketitle` instruction is invoked.
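    To see what `authblk` does before touching the template, it can help to compile by hand roughly what the modified template emits for one author with `name`, `number`, `affiliation`, and `email` set. The sketch below writes such a file; `authblk-test.tex` and the email address are placeholders, and the `pdflatex` step runs only if `pdflatex` is installed.

    ```shell
    # Write a standalone LaTeX file containing roughly the \author and \affil
    # lines the modified template emits for one author (placeholder email).
    printf '%s\n' \
      '\documentclass{article}' \
      '\usepackage{authblk}' \
      '\title{A Test Note}' \
      '\author[2]{Danny Quah}' \
      '\affil{Lee Kuan Yew School of Public Policy, NUS \thanks{author@example.org}}' \
      '\date{June 2020}' \
      '\begin{document}' \
      '\maketitle' \
      '\end{document}' \
      > authblk-test.tex

    # Compile only if pdflatex is installed; nonstopmode makes it stop on an
    # error instead of prompting, and the details stay in authblk-test.log.
    if command -v pdflatex >/dev/null 2>&1; then
      pdflatex -interaction=nonstopmode authblk-test.tex >/dev/null 2>&1 \
        || echo "pdflatex failed; see authblk-test.log" >&2
    fi
    ```

    If it compiles, the name should carry a superscript 2 from `\author[2]`, which is how the `number` key shows up, with the email appearing as a footnote via `\thanks`.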

    If you use the compact `author` entry, with its `\\\\` line breaks, and leave the LaTeX template unchanged, then
    ```shell
    $ pandoc --standalone --read=markdown --write=latex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```
    produces the desired `myinput.pdf`. If, however, you decide to use the more explicit and structured `affiliation` and `email` YAML header and the modified `mytemplate.tex` then use
    ```shell
    $ pandoc --standalone --read=markdown --write=latex --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
    ```
    instead; that is, add the `--template` option to your Pandoc call.
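    If both variants get used regularly, a small shell function saves retyping. This is a sketch: `md2pdf` is a hypothetical name, the optional second argument is a template name, and the PDF is written next to the input with a `.pdf` extension. It spells the engine option `--pdf-engine` and asks for `--write=latex`, since Pandoc makes a PDF by way of LaTeX when the output file ends in `.pdf`.

    ```shell
    # Convert a Markdown file to PDF via pdflatex.
    # Usage: md2pdf input.md [template.tex]
    md2pdf() {
      local input="$1"
      local output="${input%.md}.pdf"
      if [ -n "${2:-}" ]; then
        # Structured affiliation/email header: use the modified template.
        pandoc --standalone --read=markdown --write=latex --pdf-engine=pdflatex \
          --template="$2" "$input" -o "$output"
      else
        # Compact author header: the default template is enough.
        pandoc --standalone --read=markdown --write=latex --pdf-engine=pdflatex \
          "$input" -o "$output"
      fi
    }
    ```

    Then `md2pdf myinput.md` handles the compact header and `md2pdf myinput.md mytemplate.tex` the structured one.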

    If you want to inspect the LaTeX code produced along the way, you can do the conversion in two steps:
    ```shell
    $ pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
    $ pdflatex myinput.tex &>/dev/null
    ```
    adding in `--template=mytemplate.tex` as needed in the `pandoc` call.
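    One catch with the two-step route is that if `pdflatex` hits an error while its output goes to `/dev/null`, it can sit waiting for input with nothing on screen. The sketch below (again with a hypothetical function name, meant to be run from the folder holding the Markdown file, since `pdflatex` writes into the current directory) passes `-interaction=nonstopmode` so that it exits instead, and points to the `.log` file that `pdflatex` writes anyway.

    ```shell
    # Two-step build that keeps the intermediate .tex for inspection.
    # Usage: md2tex2pdf input.md [template.tex]
    md2tex2pdf() {
      local base="${1%.md}"
      if [ -n "${2:-}" ]; then
        pandoc --standalone --read=markdown --write=latex --template="$2" \
          "$1" -o "$base.tex" || return 1
      else
        pandoc --standalone --read=markdown --write=latex \
          "$1" -o "$base.tex" || return 1
      fi
      # nonstopmode: stop on errors instead of prompting; messages go to $base.log
      if ! pdflatex -interaction=nonstopmode "$base.tex" >/dev/null 2>&1; then
        echo "pdflatex failed; see $base.log" >&2
        return 1
      fi
    }
    ```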



    ## References

    https://maehr.github.io/academic-pandoc-template/

    https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/

    https://pandoc.org/MANUAL.html#extension-pandoc_title_block

    https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/

    https://opensource.com/article/18/9/pandoc-research-paper

    https://en.wikibooks.org/wiki/LaTeX/Title_Creation