# Understanding PDF Format

We have been working with PDF files since 1999 and have developed complex software to display [PDF](https://blog.idrsolutions.com/what-is-a-pdf/) files. We have learnt a lot about the PDF file format in that time and share our knowledge in the articles below.

There are also a large number of technical terms used with PDF, so we have created a [Glossary of Terms](https://blog.idrsolutions.com/glossary-of-pdf-terms/) with all the keywords.

If you are interested in using our software to display your PDF documents (we can rasterize them, [convert them to HTML5](https://blog.idrsolutions.com/why-convert-pdf-documents-to-html/) or SVG, or provide a complete Java PDF Viewer), why not [set up a call](https://www.idrsolutions.com/contact-us) with us and see if we can help?

Here is an overview of the topics covered in this article:

- [Quick Tutorials](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#tutorials)
- [Guides](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#guides)
- [Frequently Asked Questions](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#faq)
- [The PDF File itself](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#pdffile)
- [Images in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#images)
- [Color handling in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#colorhandling)
- [Text in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#text)
- [Fonts in PDF](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#fonts)
- [PDF Forms, Annotations & Interactive Elements](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#formsandannotations)
- [PDF Encryption](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#security)
- [PDF compression](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#compression)
- [Make your own PDF file
manually](https://blog.idrsolutions.com/understanding-the-pdf-file-format/#makeyourown)

## Quick Tutorials:

How to solve common PDF tasks in Java with our software

### BuildVu

- [How to convert a PDF file into HTML](https://blog.idrsolutions.com/how-to-convert-pdf-to-html-in-java-tutorial/)
- [How to convert a PDF file into SVG](https://blog.idrsolutions.com/how-to-convert-pdf-files-to-svg/)

### JDeli

- [How to convert an image into a PDF file](https://blog.idrsolutions.com/how-to-convert-an-image-to-a-pdf-in-java/)

### JPedal

- [How to convert a PDF file to an image](https://blog.idrsolutions.com/how-to-convert-a-pdf-to-image-in-java/)
- [How to rasterize PDF files](https://blog.idrsolutions.com/how-to-rasterize-pdf-files/)
- [How to search a PDF file](https://blog.idrsolutions.com/how-to-search-a-pdf-file-in-java/)
- [How to print a PDF file](https://blog.idrsolutions.com/how-to-print-pdf-files-from-java/)
- [How to access PDF metadata](https://blog.idrsolutions.com/how-to-access-pdf-metadata-in-java/)
- [How to extract text from PDF files](https://blog.idrsolutions.com/how-to-extract-text-from-pdf-files-in-java/)
- [How to extract structured text from PDF files](https://blog.idrsolutions.com/how-to-extract-structured-text-from-pdf-files/)
- [How to create or edit Annotations in a PDF file](https://blog.idrsolutions.com/how-to-create-or-edit-pdf-annotations)
- [How to extract images from a PDF file](https://blog.idrsolutions.com/how-to-extract-images-from-pdf-in-java/)
- [How to extract clipped Images from a PDF file](https://blog.idrsolutions.com/how-to-extract-clipped-images-from-pdf-file-in-java/)
- [How to copy bookmarks from one PDF to another](https://blog.idrsolutions.com/how-to-copy-bookmarks-from-one-pdf-to-another/)
- [How to find PDF page size](https://blog.idrsolutions.com/how-to-find-pdf-page-size-in-java/)
- [How to view PDF files](https://blog.idrsolutions.com/how-to-view-pdf-files-in-java/)
- [How to extract PDF file form
data](https://blog.idrsolutions.com/how-to-extract-pdf-file-form-data-in-java/)
- [How to split a PDF file in Java](https://blog.idrsolutions.com/how-to-split-pdf-files-in-java/)
- [How to remove a page from a PDF file in Java](https://blog.idrsolutions.com/how-to-remove-a-page-from-a-pdf-file-in-java/)

## Guides:

- [Top 9 PDF file questions with answers for developers](https://blog.idrsolutions.com/top-9-pdf-file-questions-with-answers-for-developers/)
- [What is the PDF file format?](https://blog.idrsolutions.com/what-is-the-pdf-file-format/)
- [What Java Developers need to know about PDF Files?](https://blog.idrsolutions.com/what-java-developers-need-to-know-about-pdf-files/)

## Frequently Asked Questions:

Questions developers often ask us

- [Why can’t I just open and edit a PDF File?](https://blog.idrsolutions.com/why-cant-i-just-open-and-edit-a-pdf-file/)
- [How do I find out the PDF version used?](https://blog.idrsolutions.com/how-do-i-find-out-the-pdf-version-used/)
- [What is a PDF renderer?](https://blog.idrsolutions.com/what-is-a-pdf-renderer/)
- [What is a tagged PDF?](https://blog.idrsolutions.com/what-is-tagged-pdf/)
- [How big is a PDF Page in bytes?](https://blog.idrsolutions.com/how-big-is-a-pdf-page-size-in-bytes/)
- [What does an OCR PDF file contain?](https://blog.idrsolutions.com/what-does-ocr-pdf-file-contain/)
- [What is PDF Pagesize?
CropBox, MediaBox, ArtBox, BleedBox, TrimBox?](https://blog.idrsolutions.com/what-is-pdf-pagesize/)
- [How to calculate PDF Page Size in Inches or Centimetres?](https://blog.idrsolutions.com/how-to-calculate-pdf-page-size-in-inches-or-centimetres/)
- [Why is my PDF Producer showing in Chinese?](https://blog.idrsolutions.com/why-is-my-pdf-producer-showing-up-in-chinese-or-all-the-adventure-of-the-wrongly-encoded-textstream/)
- [How to Embed PDF files in HTML Web Pages](https://blog.idrsolutions.com/how-to-embed-pdf-files-in-html-web-pages/)
- [How to Compare PDF files](https://blog.idrsolutions.com/how-to-compare-pdf-files/)
- [How to handle corrupt PDF files](https://blog.idrsolutions.com/how-to-handle-corrupt-pdf-files/)

## The PDF File itself:

This section covers the actual file format and how it works

- [How to view PDF objects](https://blog.idrsolutions.com/how-to-view-pdf-objects/)
- [How to read a PDF file](https://blog.idrsolutions.com/how-to-read-a-pdf-file/)
- [Where do your PDF objects start in a PDF file?](https://blog.idrsolutions.com/where-do-your-pdf-objects-start-in-a-pdf-file/)
- [Understanding the PDF file format – Text, shapes and images](https://blog.idrsolutions.com/understanding-the-pdf-file-format-text-shapes-and-images/)
- [What are PDF Object Streams?](https://blog.idrsolutions.com/what-are-pdf-object-streams/)
- [Multiple Trailers in a PDF File](https://blog.idrsolutions.com/multiple-trailers-in-a-pdf-file/)
- [What are PDF Xref tables?](https://blog.idrsolutions.com/what-are-pdf-xref-tables/)
- [Understanding PDF Text Objects](https://blog.idrsolutions.com/understanding-pdf-text-objects/)
- [How does a decodeArray work on Images?](https://blog.idrsolutions.com/how-does-decodearray-work/)
- [What is a PDF Dictionary?](https://blog.idrsolutions.com/what-is-a-pdf-dictionary/)
- [What is a Linearized PDF File?](https://blog.idrsolutions.com/what-is-a-linearized-pdf/)
- [What are Form XObjects?](https://blog.idrsolutions.com/what-are-form-xobjects/)
- [How are stacks used in PDF
files?](https://blog.idrsolutions.com/how-are-stacks-used-in-pdf-files/)
- [How to identify a PDF File](https://blog.idrsolutions.com/how-to-identify-a-pdf-file/)
- [No Startxref found in last 1024 bytes?](https://blog.idrsolutions.com/no-startxref-found-in-last-1024-bytes-opening-file-what-does-this-error-message-mean-with-a-pdf-file/)
- [How to Embed your own data in PDF files](https://blog.idrsolutions.com/how-to-embed-your-own-data-in-pdf-files/)
- [Why writing a PDF parser is such a challenging task (Part 234)](https://blog.idrsolutions.com/why-writing-a-pdf-parser-is-such-a-challenging-task-part-234/)

## Images in PDF:

This section explores image-related topics in the PDF File format

- [How are images stored in a PDF file?](https://blog.idrsolutions.com/how-images-are-stored-in-pdf/)
- [What are Blend Modes in PDF files?](https://blog.idrsolutions.com/what-are-blend-modes-in-pdf/)
- [What are PDF Image Masks?](https://blog.idrsolutions.com/what-are-image-masks/)
- [How to calculate PDF Image DPI?](https://blog.idrsolutions.com/how-to-calculate-pdf-image-dpi/)
- [How to extract Raw JPEG Images from a PDF File?](https://blog.idrsolutions.com/how-to-extract-raw-jpeg-images-from-a-pdf-file/)
- [How do Filter and DecodeParms Objects change a PDF Image?](https://blog.idrsolutions.com/filter-and-decodeparms-objects-for-a-pdf-image/)

## Color handling in PDF:

Color support inside PDF files is very powerful and complex.
- [How does Color work in PDF files?](https://blog.idrsolutions.com/how-does-color-work-in-pdf-files/)
- [How does image color depth work in PDF files?](https://blog.idrsolutions.com/how-does-image-color-depth-work-in-pdf-files/)
- [What is an Indexed Colorspace in a PDF file?](https://blog.idrsolutions.com/what-is-an-indexed-colorspace-in-a-pdf-file/)
- [Why is white a special color in PDF Files?](https://blog.idrsolutions.com/why-is-white-a-special-color-in-pdf-files/)
- [What are ICCBased Colorspaces?](https://blog.idrsolutions.com/what-are-iccbased-colorspaces-in-pdf-files/)

## Text in PDF:

How text is stored, displayed and extracted from a PDF file

- [How is text stored in a PDF file?](https://blog.idrsolutions.com/how-is-text-stored-in-a-pdf-file/)
- [Why is PDF text extraction problematic?](https://blog.idrsolutions.com/why-is-pdf-text-extraction-problematic/)
- [What is Unicode?](https://blog.idrsolutions.com/beginners-introduction-unicode/)
- [What text format and style information is in a PDF file?](https://blog.idrsolutions.com/what-text-format-and-style-information-in-a-pdf-file/)
- [How to find out if a PDF file contains ‘structured content’](https://blog.idrsolutions.com/how-to-find-out-if-a-pdf-file-has-structured-content/)
- [What does the ActualText dictionary tag do?](https://blog.idrsolutions.com/what-does-the-actualtext-dictionary-tag-do/)
- [How do PDF Text Coordinates work?](https://blog.idrsolutions.com/how-do-pdf-text-coordinates-work/)
- [How are carriage returns, spaces and other gaps defined in a PDF file?](https://blog.idrsolutions.com/how-are-carriage-returns-spaces-and-other-gaps-defined/)
- [PDF Mystery – What is the correct value for a Text Field?](https://blog.idrsolutions.com/pdf-mystery-what-is-the-correct-value-for-a-text-field/)
- [PDF Text extraction – Why can I not extract text from a PDF file?](https://blog.idrsolutions.com/why-can-i-not-extract-text-from-this-pdf-file/)
- [How are text links defined in a PDF
file?](https://blog.idrsolutions.com/how-are-text-links-defined-in-a-pdf-file/)
- [How are Text spaces created in a PDF file?](https://blog.idrsolutions.com/how-are-text-spaces-created-in-a-pdf-file)

## Fonts in PDF:

PDF files can use three different font technologies for display

- [Introductory PDF font tutorial](https://blog.idrsolutions.com/introductory-pdf-font-tutorial/)
- [Introduction to PDF Font Technologies](https://blog.idrsolutions.com/pdf-font-technologies/)
- [How are Embedded CMAP tables defined in a PDF File?](https://blog.idrsolutions.com/how-are-embedded-cmap-tables-in-pdf-file/)
- [What are CID Fonts?](https://blog.idrsolutions.com/what-are-cid-fonts/)
- [What are subsetted fonts in PDF files?](https://blog.idrsolutions.com/what-are-subsetted-fonts-in-pdf-files/)
- [Where do PDF viewers get font data for non-embedded fonts?](https://blog.idrsolutions.com/where-do-pdf-viewers-get-font-data-for-non-embedded-fonts/)
- [Problems caused by Arial fonts in PDF files](https://blog.idrsolutions.com/problems-caused-by-arial-font-in-pdf-files/)
- [How does TrueType Hinting work?](https://blog.idrsolutions.com/how-does-truetype-hinting-work/)
- [Why are CID Fonts far more complicated than non-CID Fonts?](https://blog.idrsolutions.com/why-are-cid-fonts-far-more-complicated-than-non-cid-fonts/)

## PDF Forms, Annotations & Interactive Elements:

PDF files can contain interactive elements with Forms and Annotations

- [What are PDF Forms?](https://blog.idrsolutions.com/what-are-pdf-forms/)
- [What are AcroForms?](https://blog.idrsolutions.com/what-are-acroforms/)
- [What are XFA Forms?](https://blog.idrsolutions.com/what-are-xfa-forms/)
- [How do PDF files add interactive elements?](https://blog.idrsolutions.com/how-do-pdf-files-add-interactive-elements/)
- [How do Layers work in a PDF file?](https://blog.idrsolutions.com/how-do-layers-work-in-a-pdf-file/)
- [Is it possible to extract flattened form data from a PDF
file?](https://blog.idrsolutions.com/is-it-possible-to-extract-flattened-form-data-from-a-pdf-file/)
- [What is PDF Form Flattening?](https://blog.idrsolutions.com/what-is-pdf-form-flattening/)
- [How to display PDF forms in a browser](https://blog.idrsolutions.com/how-to-display-pdf-forms-in-a-browser/)

## PDF File Encryption:

PDF files can have their content protected using encryption.

- [How are PDF files protected?](https://blog.idrsolutions.com/how-are-pdf-files-protected/)
- [Overview of Security Features offered by the PDF file format](https://blog.idrsolutions.com/brief-overview-of-security-features-offered-by-the-pdf-file-format/)
- [How are PDF files password protected?](https://blog.idrsolutions.com/how-are-pdf-files-password-protected/)
- [How to create your own test certificates and keys for signing PDF files](https://blog.idrsolutions.com/how-to-create-your-own-test-certificates-and-keys-for-signing-pdf-files/)

## PDF compression:

PDF files use CCITT, DCT, Flate, LZW and other forms of compression to reduce the size of a PDF file.

- [What is CCITT compression?](https://blog.idrsolutions.com/what-is-ccitt-compression//)
- [How to Convert CCITT data to TIFF image](https://blog.idrsolutions.com/how-to-convert-ccitt-data-to-tiff/)
- [What is the best option to compress a PDF?](https://blog.idrsolutions.com/what-is-the-best-compression-format-for-pdf/)
- [How does CCITT compress image data?](https://blog.idrsolutions.com/how-does-ccitt-compress-image-data/)

## Make your own PDF file manually with our ‘Hello World’ coding example

One of our developers bravely set out to write the ‘Hello World’ tutorial of PDF files, creating a PDF file from scratch manually, in a text editor.
Follow the series:

- [Part 1: PDF Objects and Data Types](https://blog.idrsolutions.com/make-your-own-pdf-file-part-1-pdf-objects-and-data-types/)
- [Part 2: Structure of a PDF file](https://blog.idrsolutions.com/make-your-own-pdf-file-part-2-structure-of-a-pdf-file/)
- [Part 2.5: Create a non-working PDF](https://blog.idrsolutions.com/make-your-own-pdf-part-2b-create-your-own-non-working-pdf/)
- [Part 3: DIY Blank Page](https://blog.idrsolutions.com/make-your-own-pdf-file-part-3-diy-blank-page/)
- [Part 4: Hello World PDF](https://blog.idrsolutions.com/make-your-own-pdf-file-part-4-hello-world-pdf/)
- [Part 5: Path objects](https://blog.idrsolutions.com/make-your-own-pdf-file-part-5-path-objects/)
- [Part 6: Graphics State](https://blog.idrsolutions.com/make-your-own-pdf-file-part-6-graphics-state/)
- [How to edit PDF files using Incremental Updates](https://blog.idrsolutions.com/how-to-edit-pdf-files/)