Forked from DannyQuah/2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md (fork created October 17, 2024 06:58; latest revision by DannyQuah, Jan 29, 2023)
# My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing

by Danny Quah, Aug 2020 (revised Jan 2022)

*TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.*

Pandoc is a filter that takes a written document in a given format and produces a version of that same document in a different format. I use Pandoc primarily to transform Markdown documents to PDF. I also draw on Pandoc to convert Word or ODT documents to Markdown, and vice versa.

The official Pandoc documentation is voluminous. So, as a matter of logic, the knowledge needed to generate PDF from Markdown already exists out there somewhere, to whatever degree of control the user desires. But a beginner might not find a good starting point. Without the ability to produce something useful quickly, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner.

## Pandoc Basics

To do its job, Pandoc has many options available from the command line. Using those can be as easy as:

```shell
pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md
```

In this example, the input is the Open Document format file `oldarchive.odt`, and the output is the Markdown document `mydocument.md`.
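The same command extends naturally to a whole folder of archived documents. Here is a minimal sketch, assuming a POSIX shell and that the `.odt` files sit together in one directory. The function name `odt_dir_to_md` is hypothetical and not part of Pandoc.

```shell
# Hypothetical helper (not part of Pandoc): convert every .odt file in a
# directory to Markdown, writing each result next to its source
# (report.odt -> report.md). Defaults to the current directory.
odt_dir_to_md() {
  dir=${1:-.}
  for f in "$dir"/*.odt; do
    # With no matches the unexpanded pattern is returned; skip it.
    [ -e "$f" ] || continue
    pandoc --read=odt --write=markdown "$f" -o "${f%.odt}.md"
  done
}
```

For example, `odt_dir_to_md ~/archive` converts everything in `~/archive` in one go.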
Going the other way, I might have given the instruction:

```shell
pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx
```

Now the input is the Markdown file `oldarchive.md` and the output is the Word document `mydocument.docx`. The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning.

## Underlying Attributes

With conversion into odt or docx files, what you see when you render or display the file on-screen is approximately what you will see when you print it to PDF or hardcopy. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter make further changes only by editing the generated odt or docx file. I myself prefer to work with or edit Markdown, but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter the docx file is what we hand back and forth in our editing.

When the output is PDF, however, the result is not meant to be edited. More accurately, users do not change the output by operating on the output file directly. Instead, a user goes back to the input file, makes the alterations there, and then re-runs Pandoc to produce a new PDF. The PDF is literally just for show. (To be clear, current technology does allow editing a PDF file, either through special hooks or by converting it to docx and editing that. This, however, is different from what I mean by going back to edit the input file and then regenerating the PDF.)

This workflow with a PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX, LaTeX, or their relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will of course not do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input language as an alternative to TeX. Behind the scenes, Pandoc first translates the Markdown document into LaTeX code and then feeds that result into LaTeX itself. The intermediate step is invisible to the user.
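Since the PDF is only ever regenerated from its Markdown source, a rebuild can be skipped whenever the source has not changed. This is the same rule a Makefile target expresses. Below is a minimal sketch of that rule in shell. The function name `rebuild_if_stale` is hypothetical. The `-nt` ("newer than") test is supported by common shells such as bash and dash, though not necessarily by every `test` implementation.

```shell
# Hypothetical helper: regenerate NAME.pdf from NAME.md only when the PDF
# is missing or older than its Markdown source (the Makefile rule).
rebuild_if_stale() {
  src=$1
  pdf=${src%.md}.pdf
  if [ ! -e "$pdf" ] || [ "$src" -nt "$pdf" ]; then
    pandoc --standalone --read=markdown --pdf-engine=pdflatex "$src" -o "$pdf"
  fi
}
```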
(A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as Markdown, in the workflow I've described, stands in relation to LaTeX: a higher-level layer built on top of a lower-level one. But, again, for much routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc for writers can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

Here, however, lies both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, they work through structure: this is what allows changing a single option in LaTeX to alter the entire look of the document. But how does structural information get conveyed from Markdown to LaTeX? Where in a Markdown document is it encoded, in a single place, that, say, every second-level heading is followed by a `\medskip`? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced? This last feature might be observed to hold empirically in a specific document, but was that the author's intention? Or did the figures just happen to come out that way? How can the author convey what they intend here, in a way that Pandoc and thereafter LaTeX can use?

The answer is two-fold: first, through YAML information; second, through template files.

## Markdown to PDF via LaTeX

To generate PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but those programs obviously need to be installed somewhere on your system. The Markdown document can convey the structure desired for the PDF output by providing the right directives. These directives are written not in Markdown itself but in another language that a Markdown file can carry alongside its text.

Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. Between those beginning and ending three-dash lines, individual lines contain key-value pairs that provide structural information on the document.
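The following sketch prints the lines that make up that block. It is only an illustration under stated assumptions. The hypothetical function `yaml_header` expects the opening `---` on the very first line of the file and stops at the next `---`. It ignores Pandoc's allowance of `...` as an alternative closing line.

```shell
# Hypothetical helper: print the YAML metadata block at the top of a
# Markdown file, i.e. the lines strictly between an opening "---" on
# line 1 and the next line consisting of exactly "---".
# Files that do not start with "---" produce no output.
yaml_header() {
  awk '
    NR == 1 { if ($0 == "---") { inside = 1; next } else { exit } }
    inside && $0 == "---" { exit }
    inside { print }
  ' "$1"
}
```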
For instance, a simple YAML header might be:

```yaml
fileName: Pandoc-2020.08.md
# Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
Type: Notes
Tags: Software
# Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
```

(preceded and followed by the three-dash lines, of course). Like Python, YAML treats whitespace indentation as significant. So don't try to prettify your file by adding extraneous whitespace at the beginning of a YAML line. Comments are introduced by the `#` symbol and are ignored by the processor.

Markdown rendering ignores both `#`-introduced comments and YAML key-value pairs. Indeed, on most systems, *all* YAML information is ignored by Markdown. Suppose the file `file.md` contains just the lines above. Opening it with a Markdown previewer typically shows nothing to display. `Typora` will open `file.md` and display the YAML header, but in non-editable form; GitHub behaves similarly. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom` or similar.)

But if Markdown ignores YAML, what is the point of it? Here is what's critical: YAML is used by Pandoc and by LaTeX when these two come together to generate PDF from Markdown. Pandoc reads the YAML off the Markdown document and translates the key-value pairs into LaTeX directives. It then ships everything off to LaTeX, now with the input all ready to structure the output.
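Several documents may share the same formatting keys. One option, offered here as a suggestion rather than part of the workflow described in this writeup, is to keep those keys in a separate YAML file and hand it to Pandoc with `--metadata-file`. Values set in the document's own YAML header take precedence over those in the file. Below is a minimal sketch. The function `md2pdf_shared` and the file name `shared-format.yaml` are hypothetical.

```shell
# Hypothetical helper: Markdown -> PDF, taking shared formatting keys
# (fontsize, geometry, documentclass, ...) from a separate YAML file.
# Keys in the document's own YAML header override those in that file.
#
# Example contents of shared-format.yaml (an assumed file name):
#   fontsize: 12pt
#   documentclass: article
#   geometry:
#     - a4paper
#     - top=35mm
md2pdf_shared() {
  input=$1
  shared=${2:-shared-format.yaml}
  pandoc --standalone --read=markdown --pdf-engine=pdflatex \
    --metadata-file="$shared" "$input" -o "${input%.md}.pdf"
}
```

For example, `md2pdf_shared notes.md` writes `notes.pdf` using `shared-format.yaml`, while `md2pdf_shared notes.md common.yaml` uses a different shared file.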
In many of my Markdown files destined for PDF output, the YAML header also contains:

````yaml
## Front Matter
title: Readable Title for My Article
author:
  - name: Danny Quah
    affiliation: Lee Kuan Yew School of Public Policy, NUS
    email: [email protected]
    number: 1
  - name: My Coauthor
    affiliation: Economics Department, NUS
    email: [email protected]
    number: 2
date: June 2020
# abstract:
# keywords:
# thanks:
## Formatting
fontsize: 12pt
# mainfont: "gentium" # See https://fonts.google.com/ for fonts
# sansfont: "Raleway"
# monofont: "IBM Plex Mono"
mathfont: ccmath
# fontfamily: concrete | gentium | libertine
# documentclass: article | scrartcl
fontfamily: concrete
documentclass: article
classoption:
  - notitlepage
  - onecolumn
fontenc: T1
geometry:
  - a4paper
  - top=35mm
  - left=30mm
  - heightrounded
header-includes:
  - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
pagestyle: headings
````

(obviously somewhere between the beginning and ending three-dash `---` lines). This is almost all it takes to generate sensible-looking PDF from my Markdown document.

The problem that remains is the `author` key-value pair. Pandoc's standard LaTeX template, and the LaTeX `\maketitle` command it relies on, understand only `author`, not `affiliation` and `email` keys. So the way I have written the author information above, with affiliation and email given explicitly, will fail on the standard Pandoc LaTeX template. Instead, the YAML key-value pair needs to be

```yaml
author:
  - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
```

and to add a second author, write:

```yaml
author:
  - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
  - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
```

This works for me. It can be neater, however, to separate out affiliation and email information explicitly, as in the YAML above.
To use that, the LaTeX template that Pandoc uses will need to be modified. First, generate the default LaTeX template, which will subsequently be changed:

```shell
pandoc -D latex > mytemplate.tex
```

I put `mytemplate.tex` in `~/.pandoc/templates/`, as that is the default personal folder that Pandoc searches when given the option `--template=mytemplate.tex`. (More recent Pandoc releases default to `~/.local/share/pandoc/templates/` instead, falling back to `~/.pandoc/templates/` when only the older folder exists.)

Now open `mytemplate.tex` in a text editor and change the statement (or recognisable statement block) from:

```latex
\author{$for(author)$$author$$sep$ \and $endfor$}
```

to

```latex
$if(author)$
\usepackage{authblk}
$for(author)$
$if(author.name)$
$if(author.number)$
\author[$author.number$]{$author.name$}
$else$
\author[]{$author.name$}
$endif$
$if(author.affiliation)$
$if(author.email)$
\affil{$author.affiliation$ \thanks{$author.email$}}
$else$
\affil{$author.affiliation$}
$endif$
$endif$
$else$
\author{$author$}
$endif$
$endfor$
$endif$
```

As you can see, the replacement code refers to `affiliation` and `email`, as in the YAML header, even though these are not generally available in LaTeX. What makes this work is that the replacement code also loads the package `authblk` (in its second line). That package then properly places the `affiliation` and `email` values when the LaTeX `\maketitle` instruction is invoked.

If you use the compact `author` strings shown earlier and leave the LaTeX template unchanged, then

```shell
pandoc --standalone --read=markdown --pdf-engine=pdflatex myinput.md -o myinput.pdf
```

produces the desired `myinput.pdf`. Pandoc produces PDF because the output file name ends in `.pdf`, and `--pdf-engine` selects the program that does the typesetting. If, however, you use the more explicit, structured YAML header with `affiliation` and `email` keys together with the modified `mytemplate.tex`, then use

```shell
pandoc --standalone --read=markdown --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
```

instead, i.e., add the explicit `--template` option to your Pandoc call.
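If you switch between the two calls often, a small wrapper can choose for you. A Makefile rule, as mentioned in the TL;DR, could invoke the same commands. Below is a minimal sketch, assuming the template sits at `~/.pandoc/templates/mytemplate.tex` as above. Newer Pandoc releases may look in `~/.local/share/pandoc/templates/` instead, which this check does not cover. The function name `md2pdf` is hypothetical.

```shell
# Hypothetical wrapper around the two pandoc calls above: Markdown -> PDF
# via pdflatex, adding the customised template only if it is installed.
# Assumes the template lives in ~/.pandoc/templates/ (newer Pandoc
# releases may use ~/.local/share/pandoc/templates/ instead).
md2pdf() {
  input=$1
  output=${input%.md}.pdf   # paper.md -> paper.pdf
  if [ -f "$HOME/.pandoc/templates/mytemplate.tex" ]; then
    pandoc --standalone --read=markdown --template=mytemplate.tex \
      --pdf-engine=pdflatex "$input" -o "$output"
  else
    pandoc --standalone --read=markdown --pdf-engine=pdflatex \
      "$input" -o "$output"
  fi
}
```

Usage is then simply `md2pdf myinput.md`.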
If you want to inspect the LaTeX code that's produced along the way, you can carry out this production in two steps:

```shell
pandoc --standalone --read=markdown --write=latex myinput.md -o myinput.tex
pdflatex -interaction=nonstopmode myinput.tex > /dev/null 2>&1
```

Add `--template=mytemplate.tex` to the `pandoc` call as needed. (Raw LaTeX in the Markdown, such as the `{=latex}` snippets above, passes through to `myinput.tex` unchanged.)

## References

- https://maehr.github.io/academic-pandoc-template/
- https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/
- https://pandoc.org/MANUAL.html#extension-pandoc_title_block
- https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
- https://opensource.com/article/18/9/pandoc-research-paper
- https://en.wikibooks.org/wiki/LaTeX/Title_Creation

<!--- Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md -->
Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show. @@ -110,11 +110,11 @@ header-includes: ```{=latex} \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float} \floatplacement{figure}{H} \``` pagestyle: headings ``` (somewhere between the beginning and ending 3-dash `---` lines. Also, the line right after `floatplacement{figure}...` should contain an indented three backquotes, but I'm having trouble getting GitHub's markdown processor to process it that way rather than as the premature end of my codeblock. Here, I've written that sequence in with a backslash qualifier instead, but that backslash obviously needs to be removed in production code.) This is almost all it takes to generate sensible-looking PDF from my Markdown document. The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`. Thus, how above I have written the author information, to include affiliation and email explicitly, will fail on the standard Pandoc latex template. Instead, the YAML key-value pair needs to be @@ -130,7 +130,7 @@ author: ``` This works for me. It can be neater, however, to separate out affiliation and email information explicitly, as in the `YAML` above. To use that, the latex template that Pandoc uses will need to be modified. First, generate the default latex template that will subsequently be changed: ``` -
DannyQuah revised this gist
Jan 12, 2022 . 1 changed file with 0 additions and 10 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,13 +1,9 @@ # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing by Danny Quah, Aug 2020 (revised Jan 2022) *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.* Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa. @@ -176,13 +172,9 @@ $if(author)$ $endfor$ $endif$ ``` As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked. 
If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then ``` pandoc --standalone --read=markdown --write=pdf --pdfengine=pdflatex myinput.md -o myinput.pdf @@ -217,8 +209,6 @@ https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/ https://opensource.com/article/18/9/pandoc-research-paper https://en.wikibooks.org/wiki/LaTeX/Title_Creation <!--- Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md --> -
DannyQuah revised this gist
Jan 12, 2022 . 1 changed file with 8 additions and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,9 +1,13 @@ <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing by Danny Quah, Aug 2020 (revised Jan 2022) *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.* ======= # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical Writing >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa. @@ -172,9 +176,13 @@ $if(author)$ $endfor$ $endif$ ``` <<<<<<< HEAD:2020.08-D.Quah-Pandoc-Workflow-Markdown-PDF.md As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked. 
If you decide to use the more elaborate, compacted `YAML` header and not change the latex template, then ======= As you can see, the replacement code contains reference to `affiliation` and `email`, as in the `YAML` header, but which is not generally available in LaTeX. What makes this work is that the replacement code also loads in the package `authblk` (in its second line), that will then properly situate the `affiliation` and `email` key-values when the LaTeX `\maketitle` instruction is invoked. >>>>>>> 44e46ecc5d59573df70fd0441212248169237489:Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md ``` pandoc --standalone --read=markdown --write=pdf --pdfengine=pdflatex myinput.md -o myinput.pdf -
DannyQuah renamed this gist
Jan 12, 2022 . 1 changed file with 65 additions and 111 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,105 +1,49 @@ # My Pandoc Markdown-PDF Workflow for Routine, Not Especially Technical, Writing by Danny Quah, Aug 2020 (revised Jan 2022) *TL;DR: I write technical articles in LaTeX. But shorter, non-technical writings are easier to do in Markdown. How do I produce PDF from Markdown documents? Answer: provide YAML information in the Markdown; run Pandoc (typically through a Makefile or Atom's Markdown Preview Enhanced). To make all this work, some adjustment is needed in Pandoc options and template files.* Pandoc is a filter that takes a written document in a given format, and produces a version of that same document in yet a different format. I use Pandoc primarily to transform Markdown documents to PDF, but I also draw on Pandoc to convert Word or ODT documents to Markdown. And vice versa. Available official Pandoc documentation is voluminous. So as a matter of logic the knowledge to generate PDF from Markdown, to the user's desired degree of control, is already extant, out there somewhere. But a user just beginning might not find a good starting point, and without the ability to produce something useful quickly to show for their efforts, that user can lose the incentive to discover more, experiment, and improve. This writeup provides such a starting point for the beginner. ## Pandoc Basics To do its job, Pandoc has many options available from the command line. Using those can be as easy as: ``` pandoc --read=odt --write=markdown oldarchive.odt -o mydocument.md ``` In this example, the input is the Open Document format file `oldarchive.odt`; output the Markdown document `mydocument.md`. 
Alternatively, I might have wanted the instruction: ``` pandoc --read=markdown --write=docx oldarchive.md -o mydocument.docx ``` where now input is the Markdown file `oldarchive.md` and output the Word document `mydocument.docx`. The conversion will never be perfect, but in many cases the result provides a fine starting point for further fine-tuning. ## Underlying Attributes In the two examples just given, what you see when you render or display the input file, whether on-screen or on paper, is also approximately what you will see when you display the output file. In many Pandoc applications, that is all the user wants. The user, or their collaborators, will thereafter seek to make further changes only by editing the output file. I myself prefer to edit Markdown directly but the people I work with insist on using Word. So I convert my Markdown file to docx, and thereafter that latter document is what we hand back and forth in our editing. When the output is PDF, however, the result is no longer to be edited. Or, more accurately, to change the output file, users do not operate on the output file directly. Instead, a user will go back to the input file, make the alterations there, and then re-execute Pandoc to produce a new PDF. The PDF is just for show. This workflow having the PDF endpoint will be familiar to TeX users. With Pandoc, however, it is not just TeX or its relatives that can provide input for beautifully structured PDFs. Markdown documents can provide that input as well. Since Markdown is lighter-weight than TeX, it will not of course do everything that the latter can. However, for many routine, not especially technical, documents, Markdown provides a fine input engine as an alternative to TeX. The actual process is one where the Markdown document gets translated into LaTeX code, and then that result is fed into LaTeX itself. 

(A purist who insists on writing everything in TeX might remember that LaTeX stands in relation to TeX much as I've described for Markdown, except that LaTeX sits nearer to TeX in the hierarchy. But, again, for much routine, not especially technical, writing, Markdown is already perfectly serviceable. Pandoc can be viewed as akin to `lex` and `yacc` for programmers. Who wants to code their own lexical analyzer in C from scratch every single time?)

Here, however, lie both opportunity and potential pitfall. LaTeX and TeX are not translators that operate character by character or paragraph by paragraph. Instead, they work through structure. But how does structural information get conveyed from Markdown to LaTeX? Where, in a single place in a Markdown document, is it encoded that, perhaps, every second-level heading is followed by a `\medskip`? Or that all figures should float but be placed towards the top of the page closest to where they are first referenced? This last feature might be observed to be empirically true in a specific document, but was that the intention of the author? Or did the figures just happen to come out that way? How can the author convey what they intend here, in a way that Pandoc and thereafter LaTeX can use?

The answer is two-fold: first, through YAML information; second, through template files.

## Markdown to PDF via LaTeX

For generating PDF, Pandoc will, behind the scenes, call on LaTeX or a related program. Pandoc can, to a great extent, do all this invisibly, but those programs obviously need to be installed somewhere on your system.

Any Markdown document can begin with YAML information, i.e., a section that starts and ends with three dashes in sequence on a line by themselves. In between, individual lines contain key-value pairs that provide structural information on the document.
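
To see concretely which lines make up such a YAML block, here is a rough shell sketch that prints the block at the very top of a Markdown file. The function name `front_matter` is hypothetical, and the sketch only approximates Pandoc's behaviour: it requires `---` on the first line and stops at the next `---` (or `...`) line, whereas Pandoc also accepts metadata blocks further down the document and parses the YAML itself.

```shell
# Hypothetical helper: print the YAML block at the top of a Markdown file,
# i.e. the lines between a first-line '---' and the next '---' or '...'.
# Prints nothing if line 1 is not '---' or if the block is never closed.
front_matter() {
  awk '
    NR == 1 { if ($0 != "---") exit; inblock = 1; next }
    inblock && ($0 == "---" || $0 == "...") { printf "%s", buf; exit }
    inblock { buf = buf $0 "\n" }
  ' "$1"
}
```

Running `front_matter file.md` on a file with no leading `---` line prints nothing, which is a quick way to tell whether Pandoc has any front matter to pick up there.
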

A simple YAML header might be:

```yaml
fileName: Pandoc-2020.08.md
# Last-edited: Sun 2020.08.09.1841 -- Danny Quah ([email protected])
Type: Notes
Tags: Software
# Created: Sun 2020.08.09.1517 -- Danny Quah ([email protected])
```

(preceded and followed by the three-dash lines, of course). As in Python, whitespace indentation is significant in YAML. Comments are introduced by the `#` symbol and are ignored by the processor.

As with the `#`-introduced comments, however, as far as Markdown rendering is concerned, the YAML key-value pairs are ignored too. Indeed, on most systems, *all* YAML information is ignored by Markdown previewers. If you have a file, say `file.md`, containing just the above lines, opening it with a Markdown previewer typically shows nothing to display. `Typora` will open `file.md` and display the YAML header, but in non-editable form; GitHub behaves similarly. (So, to edit YAML, you'll need to open the file with a text editor like `Vim` or `Atom`.)

But if YAML is ignored, what is the point of it? Here is what's critical: YAML is used when Pandoc and LaTeX generate a PDF document from Markdown. Pandoc reads the YAML information off the Markdown document and passes its key-value pairs on to LaTeX, through the LaTeX template, where they act as directives for structuring the document.

Thus, in many of my Markdown files destined for PDF output, the YAML header also contains:

````yaml
## Front Matter
title: Readable Title for My Article
author:
  - name: Danny Quah
    affiliation: Lee Kuan Yew School of Public Policy, NUS
    email: [email protected]
    number: 1
  - name: My Coauthor
    affiliation: Economics Department, NUS
    email: [email protected]
    number: 2
date: June 2020
# abstract:
# keywords:
# thanks:

## Formatting
fontsize: 12pt
# mainfont: "gentium"  # See https://fonts.google.com/ for fonts
# sansfont: "Raleway"
# monofont: "IBM Plex Mono"
mathfont: ccmath
# fontfamily: concrete | gentium | libertine
# documentclass: article | scrartcl
fontfamily: concrete
documentclass: article
classoption:
  - notitlepage
  - onecolumn
fontenc: T1
geometry:
  - a4paper
  - top=35mm
  - left=30mm
  - heightrounded
header-includes:
  - |
    ```{=latex}
    \usepackage{amsmath,amsfonts,euscript,tikz,fancyhdr,float}
    \floatplacement{figure}{H}
    ```
pagestyle: headings
````

(again, somewhere between the opening and closing `---` lines). This is almost all it takes to generate a sensible-looking PDF from my Markdown document.

The problem that remains is the `author` key-value pair. The `\maketitle` command in LaTeX does not understand `affiliation` and `email` keys, only `author`.
Thus, the way I have written the author information above, with affiliation and email given explicitly, will fail on the standard Pandoc LaTeX template. Instead, the YAML key-value pair needs to be

```yaml
author:
  - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
```

so that, to add a second author, you write:

```yaml
author:
  - Danny Quah `\\\\`{=latex} Lee Kuan Yew School of Public Policy, NUS `\\\\`{=latex} [email protected]
  - My Coauthor `\\\\`{=latex} Economics Department, NUS `\\\\`{=latex} [email protected]
```

This works for me. It can be neater, however, to separate out affiliation and email information explicitly, as in the YAML above. To use that, the LaTeX template that Pandoc uses will need to be modified. First, generate the default LaTeX template that will subsequently be changed:

```shell
pandoc -D latex > mytemplate.tex
```

I put `mytemplate.tex` in `~/.pandoc/templates/`, as that is the default personal folder Pandoc searches when subsequently given the option

```shell
--template=mytemplate.tex
```

(Recent Pandoc versions default to `~/.local/share/pandoc` as the user data directory and use `~/.pandoc` only when that newer directory does not exist; `pandoc --version` reports which user data directory is in effect.)

Now open `mytemplate.tex` in a text editor and change the statement (or recognisable statement block)

```latex
\author{$for(author)$$author$$sep$ \and $endfor$}
```

to

```latex
$if(author)$
\usepackage{authblk}
$for(author)$
  $if(author.name)$
    $if(author.number)$
\author[$author.number$]{$author.name$}
    $else$
\author[]{$author.name$}
    $endif$
    $if(author.affiliation)$
      $if(author.email)$
\affil{$author.affiliation$ \thanks{$author.email$}}
      $else$
\affil{$author.affiliation$}
      $endif$
    $endif$
  $else$
\author{$author$}
  $endif$
$endfor$
$endif$
```

As you can see, the replacement code refers to `affiliation` and `email`, as in the YAML header, which standard LaTeX does not provide. What makes this work is that the replacement code also loads the package `authblk` (in its second line), which then properly places the `affiliation` and `email` values when the LaTeX `\maketitle` instruction is invoked.

If you decide to use the compacted YAML header (with the `\\\\` line breaks) and not change the LaTeX template, then

```shell
pandoc --standalone --read=markdown --write=pdf --pdf-engine=pdflatex myinput.md -o myinput.pdf
```

produces the desired `myinput.pdf`.
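
Before switching to the modified template, it can help to confirm which copy of `mytemplate.tex` Pandoc is likely to pick up. The sketch below is a hypothetical helper (`find_template` is my name, not a Pandoc command) that mirrors the lookup described in the Pandoc manual as I understand it: the name is first tried as a path from the current directory, then under `templates/` in the user data directory. It approximates that rule and does not call Pandoc itself.

```shell
# Hypothetical helper: print the path of the template Pandoc is likely to use
# for --template=NAME, or fail if none is found. An approximation of Pandoc's
# documented lookup, not a call into Pandoc itself.
find_template() {
  name="$1"
  # 1. The name as given, relative to the current directory.
  if [ -f "$name" ]; then
    printf '%s\n' "$name"
    return 0
  fi
  # 2. The user data directory: the XDG location if it exists, else ~/.pandoc.
  datadir="${XDG_DATA_HOME:-$HOME/.local/share}/pandoc"
  [ -d "$datadir" ] || datadir="$HOME/.pandoc"
  if [ -f "$datadir/templates/$name" ]; then
    printf '%s\n' "$datadir/templates/$name"
    return 0
  fi
  echo "find_template: $name not found" >&2
  return 1
}
```

For instance, `tpl=$(find_template mytemplate.tex) && grep -q 'authblk' "$tpl"` succeeds only if the copy found is the edited one.
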

If, however, you decide to use the more explicit and structured `affiliation` and `email` YAML header together with the modified `mytemplate.tex`, then use

```shell
pandoc --standalone --read=markdown --write=pdf --template=mytemplate.tex --pdf-engine=pdflatex myinput.md -o myinput.pdf
```

instead, i.e., add the explicit new `--template` option to your Pandoc call.

If you want to inspect the LaTeX code that's produced along the way, you can do this production in two steps:

```shell
pandoc --standalone --read=markdown --write=latex+raw_tex myinput.md -o myinput.tex
pdflatex myinput.tex > /dev/null 2>&1
```

adding `--template=mytemplate.tex` to the `pandoc` call as needed.

## References

- https://maehr.github.io/academic-pandoc-template/
- https://learnbyexample.github.io/tutorial/ebook-generation/customizing-pandoc/
- https://pandoc.org/MANUAL.html#extension-pandoc_title_block
- https://uoftcoders.github.io/studyGroup/lessons/misc/pandoc-intro/lesson/
- https://opensource.com/article/18/9/pandoc-research-paper
- https://en.wikibooks.org/wiki/LaTeX/Title_Creation

<!--- Invisible section // Quah-D-2020.08-Pandoc-Workflow-Markdown-PDF.md -->

DannyQuah revised this gist
Sep 12, 2021. 1 changed file with 1 addition and 1 deletion.
DannyQuah revised this gist
Mar 7, 2021. 1 changed file with 1 addition and 1 deletion.
DannyQuah revised this gist
Oct 11, 2020. 1 changed file with 0 additions and 57 deletions.
DannyQuah revised this gist
Oct 11, 2020. 1 changed file with 21 additions and 19 deletions.
DannyQuah revised this gist
Aug 10, 2020. 1 changed file with 9 additions and 9 deletions.
DannyQuah revised this gist
Aug 10, 2020. 1 changed file with 1 addition and 1 deletion.
DannyQuah created this gist
Aug 10, 2020.