A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances
Overview
This page contains the materials presented in the paper A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances.
We propose a two-step method that compares a template with a query document to ensure that the query document has not been tampered with. Our method first reverts the geometric transformations the document underwent, and then extracts the crossing numbers in the document image. A Euclidean distance based matching method is applied to the two sets of crossing numbers, and abnormally distant point groups are flagged as potentially modified. The second step then analyzes the statistical properties of these distance values to confirm that the document has not been altered. Results on a database of administrative documents and tampered versions of these documents, all of which underwent a print-and-scan process, support the validity of the approach.
This page provides the Python 3 implementation of the proposed method and the augmented PaySlip dataset.
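The crossing-number extraction and Euclidean matching described above can be illustrated with a minimal sketch. It is not the released implementation: it assumes the classical crossing-number definition (half the number of 0/1 transitions around a pixel's 8-neighbourhood), assumes the input is an already binarized and skeletonized image (for example produced with scikit-image's `skeletonize`), and assumes that the points of interest are stroke endings and junctions. The function names `crossing_numbers`, `feature_points`, and `nearest_distances` are hypothetical.

```python
import numpy as np

# The 8 neighbours of a pixel in circular order, as (row, col) offsets.
_RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def crossing_numbers(skeleton):
    """Crossing number of every foreground pixel of a binary skeleton.

    The result is 0 for background pixels (and for isolated foreground
    pixels), 1 for stroke endings, 2 for ordinary stroke pixels, and
    3 or more for junctions.
    """
    sk = (np.asarray(skeleton) > 0).astype(np.int8)
    padded = np.pad(sk, 1)  # zero border so edge pixels have 8 neighbours
    h, w = sk.shape
    ring = [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] for dr, dc in _RING]
    # Count 0/1 transitions while walking once around the neighbourhood.
    transitions = sum(np.abs(ring[i] - ring[(i + 1) % 8]) for i in range(8))
    return (transitions // 2) * sk


def feature_points(cn):
    """(row, col) coordinates of endings and junctions (assumed feature set)."""
    return np.argwhere((cn == 1) | (cn >= 3))


def nearest_distances(query_pts, template_pts):
    """Euclidean distance from each query point to its nearest template point.

    Both inputs are (n, 2) arrays and are assumed to be non-empty.
    """
    q = np.asarray(query_pts, dtype=float)[:, None, :]
    t = np.asarray(template_pts, dtype=float)[None, :, :]
    return np.sqrt(((q - t) ** 2).sum(axis=2)).min(axis=1)
```

Query points with an unusually large nearest-neighbour distance would be the candidates for the "abnormally distant point groups" mentioned in the overview. How the threshold is chosen and how points are grouped is not specified on this page and is left out of the sketch.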
Method
The method is implemented in Python using standard image processing libraries such as OpenCV, matplotlib, and scikit-image. To compare two document images, call the check(str image1Path, str image2Path) function. It returns two booleans indicating whether the match passed each of the two steps, and also saves images of both the distance map and the distance dispersion map.
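One possible way to consume the two booleans returned by check is sketched below. The helper `is_accepted` is hypothetical and not part of the released code. Because the module that exposes check is not named on this page, the function is passed in as a parameter. The sketch also assumes that the template path comes first, that the booleans are returned in step order, and that a document is accepted only when both steps pass.

```python
from typing import Callable, Tuple

# Signature assumed from the description: two image paths in, two booleans out.
CheckFn = Callable[[str, str], Tuple[bool, bool]]


def is_accepted(check_fn: CheckFn, template_path: str, query_path: str) -> bool:
    """Return True only if the query document passes both steps (assumed rule)."""
    step1_ok, step2_ok = check_fn(template_path, query_path)
    return bool(step1_ok and step2_ok)
```

With the released code available, `check_fn` would be the project's check function, for example `is_accepted(check, "template.png", "query.png")`, where both file names are placeholders.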
Dataset
In this work, we used the PaySlip dataset, which is dedicated to detecting falsifications in payslip images. We considered a subset of documents set in Arial at font size 10. The falsifications in the PaySlip dataset were carried out on digital documents, or on documents that were only very slightly altered by a print-and-scan process. We therefore printed and scanned a subset of these documents (both genuine and altered versions) in order to verify the robustness of our method to the print-and-scan and double print-and-scan processes.
The augmented PaySlip dataset can be downloaded here. We provide 62 document images printed and scanned using a TOSHIBA ColorMFP printer/scanner at 300 dpi and 600 dpi resolutions. Additionally, we re-printed and re-scanned 31 documents at 600 dpi to verify the robustness of the proposed method to the double print-and-scan process.
Here are the details of the augmented PaySlip dataset used in this work (number of document images per setting; PS stands for print-and-scan):
Resolution | Genuine | Forged |
---|---|---|
PS 300dpi | 10 | 21 |
PS 600dpi | 10 | 21 |
Double PS 600dpi | 10 | 21 |
Example of a genuine and a forged document from the PaySlip dataset
Results
On this augmented PaySlip dataset, we obtained the following accuracy for the two-class classification task (genuine vs. forged document) as a function of the print-and-scan resolution:
Resolution | Genuine | Forged |
---|---|---|
PS 300dpi | 100% | 95% |
PS 600dpi | 100% | 90% |
Double PS 600dpi | 50% | 100% |
Total mean | 83% | 95% |
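The "Total mean" row is consistent with an unweighted average of the three per-resolution accuracies. The short check below reproduces it from the table values only; the helper name `mean_accuracy` is illustrative.

```python
# Per-resolution accuracies (in percent) copied from the table above,
# in the order: PS 300dpi, PS 600dpi, Double PS 600dpi.
genuine = [100, 100, 50]
forged = [95, 90, 100]


def mean_accuracy(values):
    """Unweighted mean of a list of accuracy percentages."""
    return sum(values) / len(values)


print(f"Genuine: {mean_accuracy(genuine):.0f}%")
print(f"Forged:  {mean_accuracy(forged):.0f}%")
```

Because every setting contains the same number of genuine (10) and forged (21) documents, this unweighted mean coincides with the mean weighted by document count.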
Citation
The code and the augmented dataset may only be used for scientific purposes. They must not be republished other than by the original authors. Scientific use includes processing the data and showing it in publications and presentations. If you use them, please cite:
@InProceedings{yriarte2022two,
author = {Yriarte, F. and Puteaux, P. and Tkachenko, I.},
title = {A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances},
booktitle = {IEEE International Workshop on Information Forensics and Security},
month = {December},
year = {2022}
}
Acknowledgements
This work was funded by the project FuzzyDoc supported by the CNRS Research Group of Information, Signal, Image and Vision (CNRS GdR-ISIS).