Scanned PDF vs Text PDF: Why Compression Works Differently
Understand why scanned PDFs compress better than text-based PDFs, and learn the best compression strategies for each type.
Not all PDFs are created equal. If you’ve ever wondered why some PDFs compress dramatically while others barely shrink, the answer lies in how the PDF was created.
Two Types of PDFs
Scanned PDFs (Image-Based)
When you scan a document, each page becomes an image. The PDF is essentially a container holding these images. A 100-page scanned document might contain 100 separate JPEG or PNG images.
Characteristics:
- Large file sizes (often 1-10MB per page, depending on resolution and color)
- Text is not selectable (it’s part of the image)
- Created by scanners, phone cameras, or “Print to PDF” from images
- High compression potential (70-95% reduction possible)
Text-Based PDFs (Native)
When you export a document from Word, create a PDF in InDesign, or use “Save as PDF” from most applications, you get a text-based PDF. The text is stored as actual text data, not images.
Characteristics:
- Smaller file sizes (often 50KB-2MB for text documents)
- Text is selectable and searchable
- Created by word processors, design software, or web browsers
- Limited compression potential (10-30% reduction typical)
Why Scanned PDFs Compress Better
The Math Behind It
A scanned page at 300 DPI in color can be 10-30MB as a raw image. Standard JPEG compression reduces this to 1-3MB per page. Further optimization can bring it down to 100-500KB per page.
Example: 50-page color scan
- Original: 500MB (10MB × 50 pages)
- After compression: 25MB (500KB × 50 pages)
- Reduction: 95%
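The figures above follow from simple arithmetic. As a rough sketch (the helper names `raw_page_bytes` and `reduction_percent` are illustrative, and a US Letter page in 24-bit color is an assumed page format), the raw size of a page and the reduction percentage can be computed like this:

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed size of one scanned page, in bytes.

    bytes_per_pixel=3 assumes 24-bit RGB color; use 1 for 8-bit grayscale.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def reduction_percent(original: float, compressed: float) -> float:
    """Percentage by which a file shrank (both sizes in the same unit)."""
    return (original - compressed) / original * 100


# Assumed page format: US Letter (8.5 x 11 inches) at 300 DPI in color.
letter_raw = raw_page_bytes(8.5, 11, 300)
print(f"Raw US Letter page at 300 DPI: {letter_raw / 1_000_000:.1f} MB")

# The 50-page example: 500MB before, 25MB after.
print(f"50-page example: {reduction_percent(500, 25):.0f}% reduction")
```

Under these assumptions a raw Letter page is about 25MB, which sits inside the 10-30MB range quoted above.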
Why Text PDFs Don’t Compress Much
Text-based PDFs are already efficient. The text “Hello World” takes about 11 bytes to store, regardless of font size on the page, and most PDF writers already apply lossless compression to text streams. There’s simply less redundant data to remove.
Example: 50-page text document
- Original: 2MB
- After compression: 1.5MB
- Reduction: 25%
How to Identify Your PDF Type
Quick Test: Try to Select Text
- Open your PDF in Preview or Adobe Reader
- Try to select text with your cursor
- If you can highlight individual words → Text-based PDF
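- If you can only select the whole page as an image → Scanned PDF (note that a scan processed with OCR carries a hidden text layer, so its text may be selectable even though each page is still an image)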
Check File Properties
In Preview (Mac):
- Open the PDF
- Go to Tools → Show Inspector
- Look at “Content Creator”: a scanner or imaging application listed there suggests a scanned PDF
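If you have many files to sort, the same question can be approximated in a script. The sketch below is a crude heuristic, not a reliable classifier: it only searches the raw file bytes for image and font markers, and many PDFs store those dictionaries inside compressed object streams where a byte search cannot see them. The function name `guess_pdf_type` is hypothetical, and a proper PDF library would give more trustworthy answers.

```python
import re

# Markers that appear in uncompressed PDF object dictionaries.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")
FONT_RE = re.compile(rb"/Type\s*/Font\b")  # \b avoids matching /FontDescriptor


def guess_pdf_type(data: bytes) -> str:
    """Rough guess at the PDF type from raw bytes.

    Returns "scanned", "text", "mixed", or "unknown". "unknown" is common
    when the file keeps its objects in compressed object streams.
    """
    has_images = IMAGE_RE.search(data) is not None
    has_fonts = FONT_RE.search(data) is not None
    if has_images and has_fonts:
        return "mixed"
    if has_images:
        return "scanned"
    if has_fonts:
        return "text"
    return "unknown"


# Synthetic fragments standing in for real file contents.
print(guess_pdf_type(b"<< /Type /XObject /Subtype /Image /Width 2550 >>"))
print(guess_pdf_type(b"<< /Type /Font /Subtype /TrueType >>"))
```

For a real file, read it in binary mode and pass the bytes to the function. Treat the result as a hint and confirm with the text-selection test above.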
Best Compression Strategies
For Scanned PDFs
Scanned PDFs respond well to aggressive compression:
- Use grayscale for text documents (removes color data)
- Set 150-200 DPI (sufficient for reading, much smaller than 300 DPI)
- Target specific sizes — you can often achieve 90%+ reduction
- Use multi-pass compression for precise targeting
Recommended settings in SecureCompress:
- Mode: Grayscale
- DPI: 200
- Target: Your required size (e.g., 25MB)
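One way multi-pass targeting can work is a binary search over a quality setting: encode, measure, adjust, repeat. The sketch below illustrates that general idea only; it is not a description of how SecureCompress works internally. The names `fit_to_target` and `fake_encode` are hypothetical, and the stand-in encoder assumes output size grows steadily with quality, which a real image encoder only approximates.

```python
from typing import Callable, Optional, Tuple


def fit_to_target(encode: Callable[[int], bytes], target_bytes: int,
                  lo: int = 10, hi: int = 95) -> Optional[Tuple[int, bytes]]:
    """Find the highest quality in [lo, hi] whose output fits target_bytes.

    Each loop iteration is one compression pass. Returns (quality, output),
    or None if even the lowest quality is too large.
    """
    best = None
    while lo <= hi:
        quality = (lo + hi) // 2
        output = encode(quality)
        if len(output) <= target_bytes:
            best = (quality, output)   # fits: try a higher quality
            lo = quality + 1
        else:
            hi = quality - 1           # too big: try a lower quality
    return best


def fake_encode(quality: int) -> bytes:
    """Stand-in encoder: pretends each quality point costs 1000 bytes."""
    return bytes(quality * 1000)


result = fit_to_target(fake_encode, target_bytes=50_500)
if result is not None:
    quality, output = result
    print(f"quality {quality} gives {len(output)} bytes")
```

Because the search halves the range on every pass, a quality range of 10 to 95 needs at most seven passes.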
For Text-Based PDFs
Text PDFs need a different approach:
- First, try your PDF editor’s built-in compression
  - In Preview: File → Export → Quartz Filter → Reduce File Size
  - In Adobe Acrobat: File → Save As Other → Reduced Size PDF
- Remove embedded fonts (only if exact rendering is not required)
- Downsample images within the document
- If still too large, convert to scanned format (last resort: the text is no longer selectable)
When to use SecureCompress on text PDFs:
- When you need a specific target size
- When built-in tools don’t reduce enough
- When the PDF contains many embedded images
Mixed Content PDFs
Many real-world PDFs contain both text and images:
- Reports with charts and photos
- Presentations exported to PDF
- Documents with scanned attachments
Strategy for Mixed PDFs
- Identify the heavy content — usually images
- Use moderate compression to preserve text quality
- Test at 200% zoom to verify text readability
- Consider splitting if one section is much larger
Real-World Examples
Example 1: Scanned Tax Documents
- Original: 120MB (40 pages, color, 300 DPI)
- After compression: 8MB (grayscale, 200 DPI)
- Result: 93% reduction, all text readable
Example 2: Word Document Exported to PDF
- Original: 3MB (50 pages, some charts)
- After compression: 2.2MB
- Result: 27% reduction (limited by text content)
Example 3: Presentation with Photos
- Original: 85MB (30 slides, many photos)
- After compression: 15MB
- Result: 82% reduction (photos compressed significantly)
When Compression Won’t Help
Some PDFs won’t compress much regardless of tool:
- Already compressed PDFs — re-compressing yields minimal gains
- Text-only documents — already efficient
- PDFs with vector graphics: vectors are resolution-independent, so lowering the DPI leaves them unchanged
- Encrypted PDFs — compression algorithms can’t access the data
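The first point in this list is easy to demonstrate with general-purpose lossless compression. The sketch below uses Python's `zlib` module (the same Deflate family that PDF stream compression commonly relies on) on made-up sample text; the vocabulary and sizes are arbitrary assumptions chosen only to produce something compressible.

```python
import random
import zlib

random.seed(0)  # fixed seed so the sample text is reproducible
vocabulary = ["scan", "page", "image", "text", "font", "vector", "pixel",
              "stream", "object", "filter", "color", "gray", "size", "byte"]
sample = " ".join(random.choice(vocabulary) for _ in range(5000)).encode("ascii")

first_pass = zlib.compress(sample, 9)       # compress the plain text
second_pass = zlib.compress(first_pass, 9)  # compress the compressed result

print(f"original:    {len(sample)} bytes")
print(f"first pass:  {len(first_pass)} bytes")
print(f"second pass: {len(second_pass)} bytes")
```

The first pass removes most of the redundancy. The second pass has almost nothing left to remove, so its output stays about the same size as its input.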
Tips for Better Results
Before Scanning
If you control the scanning process:
- Scan at 200 DPI (not 300 or 600)
- Use grayscale for text documents
- Use black & white for line art
- Scan directly to PDF (not TIFF then convert)
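These settings matter because raw image data scales with the square of the DPI and linearly with the bits stored per pixel. A small sketch, assuming 24-bit color, 8-bit grayscale, and 1-bit black & white (the function name `relative_raw_size` is illustrative):

```python
def relative_raw_size(dpi: int, bits_per_pixel: int,
                      base_dpi: int = 300, base_bits_per_pixel: int = 24) -> float:
    """Raw pixel data relative to a baseline scan (default: 300 DPI color)."""
    pixel_ratio = (dpi / base_dpi) ** 2  # both width and height scale with DPI
    depth_ratio = bits_per_pixel / base_bits_per_pixel
    return pixel_ratio * depth_ratio


settings = {
    "600 DPI color": (600, 24),
    "300 DPI color": (300, 24),
    "200 DPI grayscale": (200, 8),
    "200 DPI black & white": (200, 1),
}
for label, (dpi, bits) in settings.items():
    ratio = relative_raw_size(dpi, bits)
    print(f"{label}: {ratio:.3f}x the raw data of 300 DPI color")
```

These ratios describe uncompressed pixel data only. Final file sizes also depend on how well each kind of image compresses, so treat them as a guide to which settings matter most.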
After Compression
Always verify:
- Text is readable at 100% zoom
- Important details aren’t lost
- File opens correctly
- All pages are present
Summary
| PDF Type | Compression Potential | Best Approach |
|---|---|---|
| Scanned (image-based) | 70-95% | Target-size compression, grayscale, 200 DPI |
| Text-based (native) | 10-30% | Built-in tools first, then target-size if needed |
| Mixed content | 40-80% | Moderate compression, verify text quality |
Understanding your PDF type helps you choose the right compression strategy and set realistic expectations for file size reduction.
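To turn the table into a quick expectation check, the ranges can be applied to a file's current size. This is a back-of-the-envelope sketch using only the percentages from the table above; the function name and dictionary keys are illustrative.

```python
from typing import Tuple

# Typical reduction ranges (percent) taken from the summary table.
REDUCTION_RANGES = {
    "scanned": (70, 95),
    "text": (10, 30),
    "mixed": (40, 80),
}


def expected_size_range(original_mb: float, pdf_type: str) -> Tuple[float, float]:
    """Return (smallest, largest) expected size in MB after compression."""
    low_pct, high_pct = REDUCTION_RANGES[pdf_type]
    smallest = original_mb * (100 - high_pct) / 100
    largest = original_mb * (100 - low_pct) / 100
    return smallest, largest


smallest, largest = expected_size_range(120, "scanned")
print(f"A 120MB scanned PDF should land between {smallest:g}MB and {largest:g}MB")
```

The 8MB result from the scanned tax documents example falls inside the range this gives for a 120MB scan.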
Download SecureCompress — optimized for scanned PDF compression.