Revisions

  1. @XinyuIDR XinyuIDR revised this gist Nov 27, 2024. 1 changed file with 1 addition and 1 deletion.
    2 changes: 1 addition & 1 deletion Understanding-PDF-Format.md
    @@ -1,4 +1,4 @@
    -# Understand PDF Format
    +# Understanding PDF Format

    We have been working with PDF files since 1999 and developed complex software to display [PDF](https://blog.idrsolutions.com/what-is-a-pdf/) files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.

  2. @XinyuIDR XinyuIDR revised this gist Nov 27, 2024. 1 changed file with 2 additions and 0 deletions.
    2 changes: 2 additions & 0 deletions Understanding-PDF-Format.md
    @@ -1,3 +1,5 @@
    # Understand PDF Format

    We have been working with PDF files since 1999 and developed complex software to display [PDF](https://blog.idrsolutions.com/what-is-a-pdf/) files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.

    There are also a large number of technical terms used with PDF so we have created a [Glossary of Terms](https://blog.idrsolutions.com/glossary-of-pdf-terms/) with all the keywords.
  3. @XinyuIDR XinyuIDR renamed this gist Nov 27, 2024. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  4. @XinyuIDR XinyuIDR revised this gist Nov 27, 2024. No changes.
  5. @XinyuIDR XinyuIDR renamed this gist Nov 27, 2024. 1 changed file with 0 additions and 0 deletions.
    File renamed without changes.
  6. @XinyuIDR XinyuIDR created this gist Nov 27, 2024.
    194 changes: 194 additions & 0 deletions gistfile1.txt
    @@ -0,0 +1,194 @@
    We have been working with PDF files since 1999 and have developed complex software to display [PDF](https://blog.idrsolutions.com/what-is-a-pdf/) files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.

    There are also a large number of technical terms used with PDF, so we have created a [Glossary of Terms](https://blog.idrsolutions.com/glossary-of-pdf-terms/) covering all the keywords.

    If you are interested in using our software to display your PDF documents (we can rasterize them, [convert them to HTML5](https://blog.idrsolutions.com/why-convert-pdf-documents-to-html/) or SVG, or provide a complete Java PDF Viewer), why not [set up a call](https://www.idrsolutions.com/contact-us) with us and see if we can help?

    Here is an overview of the topics covered in this article:

    - [Quick Tutorials](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#tutorials)
    - [Guides](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#guides)
    - [Frequently Asked Questions](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#faq)
    - [The PDF File itself](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#pdffile)
    - [Images in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#images)
    - [Color handling in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#colorhandling)
    - [Text in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#text)
    - [Fonts in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#fonts)
    - [PDF Forms, Annotations & Interactive Elements](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#formsandannotations)
    - [PDF File Encryption](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#security)
    - [PDF compression](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#compression)
    - [Make your own PDF file manually](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#makeyourown)

    ## Quick Tutorials:

    How to solve common PDF tasks in Java with our software

    ### BuildVu

    [How to convert a PDF file into HTML](https://blog.idrsolutions.com/how-to-convert-pdf-to-html-in-java-tutorial/)
    [How to convert a PDF file into SVG](https://blog.idrsolutions.com/how-to-convert-pdf-files-to-svg/)

    ### JDeli

    [How to convert an image into PDF file](https://blog.idrsolutions.com/how-to-convert-an-image-to-a-pdf-in-java/)

    ### JPedal

    [How to convert a PDF file to an image](https://blog.idrsolutions.com/how-to-convert-a-pdf-to-image-in-java/)
    [How to rasterize PDF files](https://blog.idrsolutions.com/how-to-rasterize-pdf-files/)
    [How to search a PDF file](https://blog.idrsolutions.com/how-to-search-a-pdf-file-in-java/)
    [How to print a PDF file](https://blog.idrsolutions.com/how-to-print-pdf-files-from-java/)
    [How to access PDF metadata](https://blog.idrsolutions.com/how-to-access-pdf-metadata-in-java/)
    [How to extract text from PDF files](https://blog.idrsolutions.com/how-to-extract-text-from-pdf-files-in-java/)
    [How to extract structured text from PDF files](https://blog.idrsolutions.com/how-to-extract-structured-text-from-pdf-files/)
    [How to create or edit Annotations in a PDF file](https://blog.idrsolutions.com/how-to-create-or-edit-pdf-annotations)
    [How to extract images from a PDF file](https://blog.idrsolutions.com/how-to-extract-images-from-pdf-in-java/) 
    [How to extract clipped Images from a PDF file](https://blog.idrsolutions.com/how-to-extract-clipped-images-from-pdf-file-in-java/)
    [How to copy bookmarks from one PDF to another](https://blog.idrsolutions.com/how-to-copy-bookmarks-from-one-pdf-to-another/)
    [How to find PDF page size](https://blog.idrsolutions.com/how-to-find-pdf-page-size-in-java/)
    [How to view PDF files](https://blog.idrsolutions.com/how-to-view-pdf-files-in-java/)
    [How to extract PDF file form data](https://blog.idrsolutions.com/how-to-extract-pdf-file-form-data-in-java/)
    [How to split a PDF file in Java](https://blog.idrsolutions.com/how-to-split-pdf-files-in-java/)
    [How to remove a page from a PDF file in Java](https://blog.idrsolutions.com/how-to-remove-a-page-from-a-pdf-file-in-java/)

    ## Guides:

    [Top 9 PDF file questions with answers for developers](https://blog.idrsolutions.com/top-9-pdf-file-questions-with-answers-for-developers/)
    [What is the PDF file format?](https://blog.idrsolutions.com/what-is-the-pdf-file-format/)
    [What Java Developers need to know about PDF Files?](https://blog.idrsolutions.com/what-java-developers-need-to-know-about-pdf-files/)

    ## Frequently Asked Questions:

    Questions developers often ask us

    [Why can’t I just open and edit a PDF File?](https://blog.idrsolutions.com/why-cant-i-just-open-and-edit-a-pdf-file/)
    [How do I find out the PDF version used?](https://blog.idrsolutions.com/how-do-i-find-out-the-pdf-version-used/)
    [What is a PDF renderer?](https://blog.idrsolutions.com/what-is-a-pdf-renderer/)
    [What is a tagged PDF?](https://blog.idrsolutions.com/what-is-tagged-pdf/)
    [How big is a PDF Page in bytes?](https://blog.idrsolutions.com/how-big-is-a-pdf-page-size-in-bytes/)
    [What does an OCR PDF file contain?](https://blog.idrsolutions.com/what-does-ocr-pdf-file-contain/)
    [What is PDF Pagesize? CropBox, MediaBox, ArtBox, BleedBox, TrimBox?](https://blog.idrsolutions.com/what-is-pdf-pagesize/)
    [How to calculate PDF Page Size in Inches or Centimetres?](https://blog.idrsolutions.com/how-to-calculate-pdf-page-size-in-inches-or-centimetres/)
    [Why is my PDF Producer showing in Chinese?](https://blog.idrsolutions.com/why-is-my-pdf-producer-showing-up-in-chinese-or-all-the-adventure-of-the-wrongly-encoded-textstream/)
    [How to Embed PDF files in HTML Web Pages](https://blog.idrsolutions.com/how-to-embed-pdf-files-in-html-web-pages/)
    [How to Compare PDF files](https://blog.idrsolutions.com/how-to-compare-pdf-files/)
    [How to handle corrupt PDF files](https://blog.idrsolutions.com/how-to-handle-corrupt-pdf-files/)
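
As a minimal sketch of the page-size questions above: page boxes such as MediaBox are expressed in points, and in default user space one point is 1/72 of an inch. The class and method names below are hypothetical, and the sketch assumes the page has no UserUnit scaling or rotation applied.

```java
final class PageSizeSketch {
    static final double POINTS_PER_INCH = 72.0; // default user space unit
    static final double CM_PER_INCH = 2.54;

    /** Converts a length in PDF points to inches. */
    static double toInches(double points) {
        return points / POINTS_PER_INCH;
    }

    /** Converts a length in PDF points to centimetres. */
    static double toCentimetres(double points) {
        return toInches(points) * CM_PER_INCH;
    }

    public static void main(String[] args) {
        // Example box in the form [llx lly urx ury]; the values are illustrative.
        double[] mediaBox = {0, 0, 612, 792};
        double width = mediaBox[2] - mediaBox[0];
        double height = mediaBox[3] - mediaBox[1];
        System.out.println(toInches(width) + " x " + toInches(height) + " inches");
        System.out.println(toCentimetres(width) + " x " + toCentimetres(height) + " cm");
    }
}
```

A 612 x 792 point box is the familiar 8.5 x 11 inch US Letter page.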

    ## The PDF File itself:

    This section covers the actual file format and how it works

    [How to view PDF objects](https://blog.idrsolutions.com/how-to-view-pdf-objects/)
    [How to read a PDF file](https://blog.idrsolutions.com/how-to-read-a-pdf-file/)
    [Where do your PDF objects start in a PDF file?](https://blog.idrsolutions.com/where-do-your-pdf-objects-start-in-a-pdf-file/)
    [Understanding the PDF file format – Text, shapes and images](https://blog.idrsolutions.com/understanding-the-pdf-file-format-text-shapes-and-images/)
    [What are PDF Object Streams?](https://blog.idrsolutions.com/what-are-pdf-object-streams/)
    [Multiple Trailers in a PDF File](https://blog.idrsolutions.com/multiple-trailers-in-a-pdf-file/)
    [What are PDF Xref tables?](https://blog.idrsolutions.com/what-are-pdf-xref-tables/)
    [Understanding PDF Text Objects](https://blog.idrsolutions.com/understanding-pdf-text-objects/)
    [How does a decodeArray work on Images?](https://blog.idrsolutions.com/how-does-decodearray-work/)
    [What is a PDF Dictionary?](https://blog.idrsolutions.com/what-is-a-pdf-dictionary/)
    [What is a Linearized PDF File?](https://blog.idrsolutions.com/what-is-a-linearized-pdf/)
    [What are Form XObjects?](https://blog.idrsolutions.com/what-are-form-xobjects/)
    [How are stacks used in PDF files?](https://blog.idrsolutions.com/how-are-stacks-used-in-pdf-files/)
    [How to identify a PDF File](https://blog.idrsolutions.com/how-to-identify-a-pdf-file/)
    [No Startxref found in last 1024 bytes?](https://blog.idrsolutions.com/no-startxref-found-in-last-1024-bytes-opening-file-what-does-this-error-message-mean-with-a-pdf-file/)
    [How to Embed your own data in PDF files](https://blog.idrsolutions.com/how-to-embed-your-own-data-in-pdf-files/)
    [Why writing a PDF parser is such a challenging task (Part 234)](https://blog.idrsolutions.com/why-writing-a-pdf-parser-is-such-a-challenging-task-part-234/)
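
Two of the topics above, identifying a PDF file and locating `startxref`, can be illustrated with a small sketch that looks at both ends of a file. It is not how any particular library does it: the class and method names are hypothetical, the header is assumed to sit at the very first byte, and the 1024-byte window is simply taken from the article title above. Real files can be messier (for example, the version in the header can be overridden elsewhere in the document).

```java
import java.nio.charset.StandardCharsets;

final class PdfFileEndsSketch {
    private static final String HEADER = "%PDF-";
    private static final String KEYWORD = "startxref";
    private static final int TAIL_SIZE = 1024; // assumption: window size from the article title

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the header version (such as "1.7"), or null if the data does not start with %PDF-. */
    static String headerVersion(byte[] pdf) {
        String head = new String(pdf, 0, Math.min(pdf.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith(HEADER)) {
            return null;
        }
        int end = HEADER.length();
        while (end < head.length() && (isAsciiDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > HEADER.length() ? head.substring(HEADER.length(), end) : null;
    }

    /** Returns the offset written after the last startxref in the tail of the file, or -1 if none is found. */
    static long startXrefOffset(byte[] pdf) {
        int from = Math.max(0, pdf.length - TAIL_SIZE);
        String tail = new String(pdf, from, pdf.length - from, StandardCharsets.ISO_8859_1);
        int pos = tail.lastIndexOf(KEYWORD);
        if (pos < 0) {
            return -1;
        }
        int i = pos + KEYWORD.length();
        while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
            i++; // skip the end-of-line after the keyword
        }
        int start = i;
        while (i < tail.length() && isAsciiDigit(tail.charAt(i))) {
            i++;
        }
        if (i == start || i - start > 18) {
            return -1; // no digits, or too many to fit in a long
        }
        return Long.parseLong(tail.substring(start, i));
    }
}
```

The returned offset is where a reader would expect to find the cross-reference data; a result of -1 corresponds to the "no startxref found" situation described in the linked article.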

    ## Images in PDF:

    This section explores image-related topics in the PDF file format

    [How are images stored in a PDF file?](https://blog.idrsolutions.com/how-images-are-stored-in-pdf/)
    [What are Blend Modes in PDF files?](https://blog.idrsolutions.com/what-are-blend-modes-in-pdf/)
    [What are PDF Image Masks?](https://blog.idrsolutions.com/what-are-image-masks/)
    [How to calculate PDF Image DPI?](https://blog.idrsolutions.com/how-to-calculate-pdf-image-dpi/)
    [How to extract Raw JPEG Images from a PDF File?](https://blog.idrsolutions.com/how-to-extract-raw-jpeg-images-from-a-pdf-file/)
    [How do Filter and DecodeParms Objects change a PDF Image?](https://blog.idrsolutions.com/filter-and-decodeparms-objects-for-a-pdf-image/)
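
As a rough illustration of the image DPI question above: an image in a PDF has pixel dimensions but no fixed physical size, so its effective resolution depends on how large it is drawn on the page. The sketch below assumes the drawn size in points is already known (working it out from the page content is not shown), and the class and method names are hypothetical.

```java
final class ImageDpiSketch {
    static final double POINTS_PER_INCH = 72.0;

    /**
     * Effective resolution along one axis: pixels divided by the drawn length in inches.
     * drawnPoints is the length the image occupies on the page, in PDF points.
     */
    static double effectiveDpi(int pixels, double drawnPoints) {
        if (pixels <= 0 || drawnPoints <= 0) {
            throw new IllegalArgumentException("pixels and drawnPoints must be positive");
        }
        return pixels / (drawnPoints / POINTS_PER_INCH);
    }
}
```

For example, an image 1200 pixels wide drawn across 288 points (4 inches) works out at 300 DPI horizontally. The horizontal and vertical values can differ if the image is stretched.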

    ## Color handling in PDF:

    Color support inside PDF files is very powerful and complex.

    [How does Color work in PDF files?](https://blog.idrsolutions.com/how-does-color-work-in-pdf-files/)
    [How does image color depth work in PDF files?](https://blog.idrsolutions.com/how-does-image-color-depth-work-in-pdf-files/)
    [What is an Indexed Colorspace in a PDF file?](https://blog.idrsolutions.com/what-is-an-indexed-colorspace-in-a-pdf-file/)
    [Why is white a special color in PDF Files?](https://blog.idrsolutions.com/why-is-white-a-special-color-in-pdf-files/)
    [What are ICCBased Colorspaces?](https://blog.idrsolutions.com/what-are-iccbased-colorspaces-in-pdf-files/)

    ## Text in PDF:

    How text is stored in, displayed from and extracted from a PDF file

    [How is text stored in a PDF file?](https://blog.idrsolutions.com/how-is-text-stored-in-a-pdf-file/)
    [Why is PDF text extraction problematic?](https://blog.idrsolutions.com/why-is-pdf-text-extraction-problematic/)
    [What is Unicode?](https://blog.idrsolutions.com/beginners-introduction-unicode/)
    [What text format and style information is in a PDF file?](https://blog.idrsolutions.com/what-text-format-and-style-information-in-a-pdf-file/)
    [How to find out if a PDF file contains ‘structured content’](https://blog.idrsolutions.com/how-to-find-out-if-a-pdf-file-has-structured-content/)
    [What does the ActualText dictionary tag do?](https://blog.idrsolutions.com/what-does-the-actualtext-dictionary-tag-do/)
    [How do PDF Text Coordinates work?](https://blog.idrsolutions.com/how-do-pdf-text-coordinates-work/)
    [How are carriage returns, spaces and other gaps defined in a PDF file?](https://blog.idrsolutions.com/how-are-carriage-returns-spaces-and-other-gaps-defined/)
    [PDF Mystery – What is the correct value for a Text Field?](https://blog.idrsolutions.com/pdf-mystery-what-is-the-correct-value-for-a-text-field/)
    [PDF Text extraction – Why can I not extract text from a PDF file?](https://blog.idrsolutions.com/why-can-i-not-extract-text-from-this-pdf-file/)
    [How are text links defined in a PDF file?](https://blog.idrsolutions.com/how-are-text-links-defined-in-a-pdf-file/)
    [How are Text spaces created in a PDF file?](https://blog.idrsolutions.com/how-are-text-spaces-created-in-a-pdf-file)

    ## Fonts in PDF:

    PDF files can use three different font technologies for display

    [Introductory PDF font tutorial](https://blog.idrsolutions.com/introductory-pdf-font-tutorial/)
    [Introduction to PDF Font Technologies](https://blog.idrsolutions.com/pdf-font-technologies/)
    [How are Embedded CMAP tables defined in a PDF File?](https://blog.idrsolutions.com/how-are-embedded-cmap-tables-in-pdf-file/)
    [What are CID Fonts?](https://blog.idrsolutions.com/what-are-cid-fonts/)
    [What are subsetted fonts in PDF files?](https://blog.idrsolutions.com/what-are-subsetted-fonts-in-pdf-files/)
    [Where do PDF viewers get font data for non-embedded fonts?](https://blog.idrsolutions.com/where-do-pdf-viewers-get-font-data-for-non-embedded-fonts/)
    [Problems caused by Arial fonts in PDF files](https://blog.idrsolutions.com/problems-caused-by-arial-font-in-pdf-files/)
    [How does TrueType Hinting work?](https://blog.idrsolutions.com/how-does-truetype-hinting-work/)
    [Why are CID Fonts far more complicated than non-CID Fonts?](https://blog.idrsolutions.com/why-are-cid-fonts-far-more-complicated-than-non-cid-fonts/)

    ## PDF Forms, Annotations & Interactive Elements:

    PDF files can contain interactive elements such as Forms and Annotations

    [What are PDF Forms?](https://blog.idrsolutions.com/what-are-pdf-forms/)
    [What are AcroForms?](https://blog.idrsolutions.com/what-are-acroforms/)
    [What are XFA Forms?](https://blog.idrsolutions.com/what-are-xfa-forms/)
    [How do PDF files add interactive elements?](https://blog.idrsolutions.com/how-do-pdf-files-add-interactive-elements/)
    [How do Layers work in a PDF file?](https://blog.idrsolutions.com/how-do-layers-work-in-a-pdf-file/)
    [Is it possible to extract flattened form data from a PDF file?](https://blog.idrsolutions.com/is-it-possible-to-extract-flattened-form-data-from-a-pdf-file/)
    [What is PDF Form Flattening?](https://blog.idrsolutions.com/what-is-pdf-form-flattening/)
    [How to display PDF forms in a browser](https://blog.idrsolutions.com/how-to-display-pdf-forms-in-a-browser/)

    ## PDF File Encryption:

    PDF files can have their content protected using encryption.

    [How are PDF files protected?](https://blog.idrsolutions.com/how-are-pdf-files-protected/)
    [Overview of Security Features offered by the PDF file format](https://blog.idrsolutions.com/brief-overview-of-security-features-offered-by-the-pdf-file-format/)
    [How are PDF files password protected?](https://blog.idrsolutions.com/how-are-pdf-files-password-protected/)
    [How to create your own test certificates and keys for signing PDF files](https://blog.idrsolutions.com/how-to-create-your-own-test-certificates-and-keys-for-signing-pdf-files/)

    ## PDF compression:

    PDF files use CCITT, DCT, Flate, LZW and other forms of compression to reduce file size.

    [What is CCITT compression?](https://blog.idrsolutions.com/what-is-ccitt-compression//)
    [How to Convert CCITT data to TIFF image](https://blog.idrsolutions.com/how-to-convert-ccitt-data-to-tiff/)
    [What is the best option to compress a PDF?](https://blog.idrsolutions.com/what-is-the-best-compression-format-for-pdf/)
    [How does CCITT compress image data?](https://blog.idrsolutions.com/how-does-ccitt-compress-image-data/)
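
Of the compression schemes listed above, Flate is the easiest to try out in Java, because Flate-encoded PDF streams use the zlib format that `java.util.zip` reads and writes. The sketch below round-trips some bytes through `Deflater` and `Inflater`; the class and method names are hypothetical, and it does not handle the predictor functions that can be layered on top of Flate via DecodeParms.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateSketch {

    /** Compresses raw bytes into zlib-wrapped deflate data. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib-wrapped deflate data; throws IllegalArgumentException on bad input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // No progress is possible: the data is truncated or needs a preset dictionary.
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("invalid Flate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Calling `FlateSketch.inflate(FlateSketch.deflate(data))` should return the original bytes, which makes it a handy way to experiment with stream contents copied out of a PDF file.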

    ## Make your own PDF file manually with our ‘Hello World’ coding example

    One of our developers bravely set out to write the ‘Hello World’ tutorial for PDF files, creating a PDF file from scratch by hand in a text editor. Follow the series:

    [Part 1: PDF Objects and Data Types](https://blog.idrsolutions.com/make-your-own-pdf-file-part-1-pdf-objects-and-data-types/)
    [Part 2: Structure of a PDF file](https://blog.idrsolutions.com/make-your-own-pdf-file-part-2-structure-of-a-pdf-file/)
    [Part 2.5: Create a non-working PDF](https://blog.idrsolutions.com/make-your-own-pdf-part-2b-create-your-own-non-working-pdf/)
    [Part 3: DIY Blank Page](https://blog.idrsolutions.com/make-your-own-pdf-file-part-3-diy-blank-page/)
    [Part 4: Hello World PDF](https://blog.idrsolutions.com/make-your-own-pdf-file-part-4-hello-world-pdf/)
    [Part 5: Path objects](https://blog.idrsolutions.com/make-your-own-pdf-file-part-5-path-objects/)
    [Part 6: Graphics State](https://blog.idrsolutions.com/make-your-own-pdf-file-part-6-graphics-state/)
    [How to edit PDF files using Incremental Updates](https://blog.idrsolutions.com/how-to-edit-pdf-files/)
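
For readers who would rather generate the file than type it, here is a minimal sketch of the same idea in Java. It is not the code from the series: the class name, object layout and page size are assumptions chosen for illustration. It writes five objects (catalog, page tree, page, font, content stream), records the byte offset of each, and then emits the xref table and trailer. The message is restricted to printable ASCII without parentheses or backslashes so that no string escaping is needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class HelloPdfSketch {

    /** Builds a one-page PDF that shows the message in 24pt Helvetica. */
    static byte[] build(String message) {
        for (char c : message.toCharArray()) {
            if (c < 0x20 || c > 0x7E || c == '(' || c == ')' || c == '\\') {
                throw new IllegalArgumentException("unsupported character in message");
            }
        }
        // Content stream: begin text, select font, move to position, show string, end text.
        String content = "BT /F1 24 Tf 72 720 Td (" + message + ") Tj ET";
        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                + "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + content.length() + " >>\nstream\n" + content + "\nendstream"
        };

        // Everything is ASCII, so character positions equal byte offsets.
        StringBuilder pdf = new StringBuilder("%PDF-1.4\n");
        int[] offsets = new int[objects.length];
        for (int i = 0; i < objects.length; i++) {
            offsets[i] = pdf.length();
            pdf.append(i + 1).append(" 0 obj\n").append(objects[i]).append("\nendobj\n");
        }

        int xrefOffset = pdf.length();
        int size = objects.length + 1; // object 0 is the head of the free list
        pdf.append("xref\n0 ").append(size).append('\n');
        pdf.append("0000000000 65535 f \n");
        for (int offset : offsets) {
            // Each xref entry is exactly 20 bytes long.
            pdf.append(String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        pdf.append("trailer\n<< /Size ").append(size).append(" /Root 1 0 R >>\n");
        pdf.append("startxref\n").append(xrefOffset).append("\n%%EOF\n");
        return pdf.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws java.io.IOException {
        // Writes hello.pdf into the current working directory.
        java.nio.file.Files.write(java.nio.file.Paths.get("hello.pdf"), build("Hello World"));
    }
}
```

Opening the output in a text editor alongside the series is a good way to see how the offsets in the xref table line up with the start of each object.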