What Hidden Data Lurks in PDFs?
A PDF exported from Word or Google Docs typically retains the author's name and organization, precise creation and last-modified timestamps, and the name and version of the software that produced it. Depending on the export settings and on how the file was edited afterwards, it can also carry the company domain name, revision dates and the names of the people who made changes, and comments or annotations that were hidden or marked as resolved but never deleted. Tracked changes and deleted text deserve particular care: a PDF that has been edited in place can keep earlier content in a recoverable form. For legal, medical, financial, and business documents, this metadata is a potentially serious confidentiality risk, especially when sharing with counterparties who may have adversarial interests.
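As a rough illustration of how exposed these properties are, the sketch below scans raw PDF bytes for entries of the document information dictionary. It assumes the entries are stored uncompressed as literal strings, which is not true of every file; `scan_info_fields` and `SAMPLE` are hypothetical names, and the sample bytes are made up for the example.

```python
import re

# Keys of the PDF document information dictionary that this sketch looks for.
INFO_KEYS = (b"Title", b"Author", b"Creator", b"Producer", b"CreationDate", b"ModDate")

# Matches "/Key (literal string)". Backslash escapes are allowed inside the
# string, but nested unescaped parentheses and hex strings are not handled.
INFO_RE = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)",
    re.DOTALL,
)


def scan_info_fields(pdf_bytes):
    """Return {key: value} for Info-style entries found in raw PDF bytes.

    If a key appears more than once, the last occurrence wins.
    """
    found = {}
    for key, value in INFO_RE.findall(pdf_bytes):
        # Undo the two most common escapes; other escapes are left as-is.
        text = value.replace(b"\\(", b"(").replace(b"\\)", b")")
        found[key.decode("ascii")] = text.decode("latin-1")
    return found


# Made-up fragment standing in for part of a real file.
SAMPLE = (
    b"1 0 obj\n<< /Author (Jane Example) /Creator (Microsoft Word)\n"
    b"/Producer (Hypothetical PDF Library 1.0)\n"
    b"/CreationDate (D:20240102030405Z) >>\nendobj\n"
)

if __name__ == "__main__":
    for key, value in scan_info_fields(SAMPLE).items():
        print(f"{key}: {value}")
```

An empty result from a scan like this does not mean a file is clean: metadata may also live in an XMP packet or inside compressed object streams, so a dedicated PDF library or the viewer's Document Properties dialog is the more reliable check.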
How to Securely Handle PDFs
- 1. Before sharing any PDF externally, inspect its metadata. Open Document Properties in your PDF viewer to see the embedded author name, organization, software used to create it, creation date, and modification history. This information travels with the file every time you share it.
- 2. Remove or properly redact sensitive content. True, legally defensible redaction must remove the underlying text data. Never simply draw a black rectangle over text: the text remains fully selectable, searchable, and readable beneath the covering shape. Use tools that delete the underlying data, and verify the result before sending.
- 3. Add password protection to PDFs that contain sensitive information before sharing. Use AES-256 encryption, the current industry standard, and set both an open password (required to view the document) and a permissions password (to restrict copying, printing, or editing by the recipient). Permission restrictions depend on the recipient's PDF software honoring them, so the open password is the stronger of the two controls.
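The redaction step can be spot-checked with a short script. The sketch below, using only the Python standard library, inflates the Flate-compressed streams in a file and reports whether a sensitive term is still present. It assumes the text is stored as a plain literal string; many real PDFs use subset fonts or hex strings, so a negative result is a hint and not proof. The function names and the sample fragments are hypothetical.

```python
import re
import zlib

# Captures the bytes between "stream" and "endstream" for each stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def decoded_streams(pdf_bytes):
    """Yield each stream body, inflated when it is zlib (Flate) data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a stream whose last byte was trimmed
            # by the regex above.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not Flate data; search it as-is


def term_still_present(pdf_bytes, term):
    """Return True if `term` appears in the raw file or any decoded stream."""
    needle = term.encode("latin-1")
    if needle in pdf_bytes:
        return True
    return any(needle in body for body in decoded_streams(pdf_bytes))


def make_sample(page_ops):
    """Wrap page operators in a made-up, Flate-compressed stream object."""
    body = zlib.compress(page_ops)
    return b"4 0 obj\n<< /Filter /FlateDecode >>\nstream\n" + body + b"\nendstream\nendobj\n"


# A black rectangle drawn over the text: the text operators are still there.
COVERED = make_sample(
    b"BT /F1 12 Tf 72 700 Td (Account 12345678) Tj ET\n0 g 70 695 160 16 re f\n"
)
# The text operators removed, leaving only the rectangle.
REDACTED = make_sample(b"0 g 70 695 160 16 re f\n")

if __name__ == "__main__":
    print("covered: ", term_still_present(COVERED, "12345678"))
    print("redacted:", term_still_present(REDACTED, "12345678"))
```

The `COVERED` sample mirrors the black-rectangle mistake described in step 2: the page looks redacted, yet the account number is still in the content stream. A check like this complements, but does not replace, a purpose-built redaction tool.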
PDF Security Best Practices
When you need a clean, metadata-free version of a document, print to PDF from a fresh application rather than exporting from the original source software. This typically strips most embedded metadata and revision history, although the new file still records the software that printed it and a fresh timestamp, so inspect the result. For legally binding redaction (court filings, legal discovery, regulated industry documents), use tools specifically designed for certified redaction rather than general-purpose PDF editors. Several high-profile document leaks have occurred because journalists or officials used visual covering shapes rather than true content deletion.
Be especially cautious with PDFs received from unknown or untrusted sources. PDF files can contain embedded JavaScript that executes automatically when opened, hyperlinks to tracking URLs that reveal when and where you opened the document, and exploit code targeting vulnerabilities in PDF reader software. Always keep your PDF reader up to date: Adobe and other vendors regularly patch serious security vulnerabilities that attackers actively exploit through malicious PDF files.
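Before opening an untrusted file, a quick triage pass can flag the features mentioned above. The sketch below counts PDF name tokens associated with scripts, automatic actions, and external links in the raw bytes. It is a minimal sketch, assuming the names are neither obfuscated with `#xx` escapes nor hidden inside compressed object streams; the list of names is a selection chosen for this example, and `count_risky_names` is a hypothetical helper.

```python
import re

# PDF name tokens commonly associated with active or trackable content.
RISKY_NAMES = (
    b"/JavaScript",    # JavaScript action type
    b"/JS",            # the script carried by a JavaScript action
    b"/OpenAction",    # action run when the document is opened
    b"/AA",            # additional actions triggered by events
    b"/Launch",        # action that launches an external application
    b"/URI",           # link to an external address
    b"/EmbeddedFile",  # file attached inside the PDF
)


def count_risky_names(pdf_bytes):
    """Return {name: occurrences} for each token in RISKY_NAMES."""
    counts = {}
    for name in RISKY_NAMES:
        # The lookahead stops "/JS" from matching inside "/JavaScript".
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_.#-])"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


# Made-up fragment: an open action that runs a script, plus a tracking link.
SAMPLE = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    b"3 0 obj << /S /URI /URI (http://example.com/t) >> endobj\n"
)

if __name__ == "__main__":
    for name, count in count_risky_names(SAMPLE).items():
        if count:
            print(f"{name}: {count}")
```

A nonzero count is a reason to look more closely, not a verdict: links and attachments are common in legitimate documents, and a file with zero hits can still be malicious if the names are obfuscated or compressed.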