
A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances

Overview

This page contains the materials presented in the paper A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances.

We propose a two-step method that compares a query document with a template to verify that the query document has not been tampered with. The method first reverts the geometric transformations the document underwent, then extracts the crossing numbers from the document image. A Euclidean-distance-based matching is applied to the two sets of crossing numbers, and abnormally distant point groups are flagged as potentially modified. A second step then analyzes the statistical properties of these distance values to confirm that the document has not been altered. Experiments on a database of administrative documents and tampered versions of these documents, all of which underwent a print-and-scan process, support the validity of the approach.
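The distance-based flagging described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy coordinates, and the fixed `threshold` are assumptions made for illustration, and the second, statistical step is not shown.

```python
import numpy as np

def nearest_distances(template_pts, query_pts):
    """Distance from each template point to its nearest query point (Euclidean)."""
    t = np.asarray(template_pts, dtype=float)
    q = np.asarray(query_pts, dtype=float)
    # Pairwise differences via broadcasting: shape (n_template, n_query, 2)
    diff = t[:, None, :] - q[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    return dists.min(axis=1)

def flag_distant(distances, threshold):
    """Indices of template points whose nearest match lies farther than threshold."""
    return np.flatnonzero(distances > threshold)

# Toy feature points (hypothetical): the last query point has moved far away.
template = [(10, 10), (20, 10), (30, 10), (40, 10)]
query = [(10, 11), (20, 10), (30, 9), (55, 25)]

d = nearest_distances(template, query)
suspicious = flag_distant(d, threshold=5.0)  # threshold value is an assumption
print(suspicious)
```

In this toy case only the fourth template point has no close counterpart in the query, so it is the only one flagged.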

This page provides the Python 3 implementation of the proposed method and the augmented PaySlip dataset.

Method

The method is implemented in Python with standard image processing libraries such as OpenCV, matplotlib, and scikit-image. To compare two document images, call the check(str image1Path, str image2Path) function. It returns two booleans indicating whether the comparison passed each of the two steps, and it also saves images of the distance map and the distance dispersion map.
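For readers unfamiliar with crossing numbers, the sketch below computes the classical crossing number of each pixel of a binary, one-pixel-wide skeleton: half the number of 0/1 transitions met when walking around the 8 neighbours. It is an illustration only, not the code behind check(). The helper name `crossing_numbers` is hypothetical, the input is assumed to be already skeletonized (for example with scikit-image's `skeletonize`), and which crossing-number values the authors keep as feature points is not stated on this page.

```python
import numpy as np

# The 8 neighbours in cyclic (clockwise) order, starting at the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def crossing_numbers(skeleton):
    """Crossing number of every skeleton pixel; background pixels are set to 0."""
    skel = np.asarray(skeleton, dtype=int)
    h, w = skel.shape
    padded = np.pad(skel, 1)  # zero border so edge pixels have 8 neighbours
    # One shifted copy of the image per neighbour, kept in cyclic order.
    rings = [padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] for dr, dc in _NEIGHBOURS]
    transitions = np.zeros_like(skel)
    for i in range(8):
        transitions += np.abs(rings[i] - rings[(i + 1) % 8])
    return (transitions // 2) * skel

# Toy T-shaped skeleton: a horizontal stroke with a vertical stroke below its centre.
skeleton = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

cn = crossing_numbers(skeleton)
endings = np.argwhere(cn == 1)       # stroke endings
bifurcations = np.argwhere(cn == 3)  # points where three strokes meet
print(endings.tolist(), bifurcations.tolist())
```

On this toy skeleton the three stroke ends have crossing number 1 and the junction has crossing number 3, while ordinary stroke pixels have crossing number 2.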

Dataset

In this work, we used the PaySlip dataset, which is dedicated to detecting falsifications in payslip images. We considered a subset of documents set in Arial with a font size of 10. The falsifications in the PaySlip dataset were carried out on digital documents, or on documents that were only very slightly altered by a print-and-scan process. We therefore printed and scanned a subset of these documents (both genuine and altered versions) to verify the robustness of our method to the print-and-scan and double print-and-scan processes.

The augmented PaySlip dataset can be downloaded here. We provide 62 document images printed and scanned with a TOSHIBA ColorMFP printer/scanner at 300 dpi and 600 dpi resolutions. Additionally, we re-printed and re-scanned 31 documents at 600 dpi to verify the robustness of the proposed method to a double print-and-scan process.

Here are the details of the augmented PaySlip dataset used in this work:

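| Print-and-scan (PS) setting | Genuine documents | Forged documents |
|-----------------------------|-------------------|------------------|
| PS 300dpi                   | 10                | 21               |
| PS 600dpi                   | 10                | 21               |
| Double PS 600dpi            | 10                | 21               |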

Augmented PaySlip samples

Results

For this augmented PaySlip dataset, we obtained the following accuracy results for the two-class classification task (genuine vs. forged document) as a function of the print-and-scan resolution:

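| Print-and-scan (PS) setting | Accuracy on genuine | Accuracy on forged |
|-----------------------------|---------------------|--------------------|
| PS 300dpi                   | 100%                | 95%                |
| PS 600dpi                   | 100%                | 90%                |
| Double PS 600dpi            | 50%                 | 100%               |
| Total mean                  | 83%                 | 95%                |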
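The "Total mean" row is consistent with an unweighted average of the three settings, which can be checked with a few lines. The variable and function names below are illustrative only; the percentages are copied from the table.

```python
# Per-setting accuracies (in percent) copied from the results table.
genuine = {"PS 300dpi": 100, "PS 600dpi": 100, "Double PS 600dpi": 50}
forged = {"PS 300dpi": 95, "PS 600dpi": 90, "Double PS 600dpi": 100}

def mean_accuracy(per_setting):
    """Unweighted mean of the per-setting accuracies."""
    return sum(per_setting.values()) / len(per_setting)

genuine_mean = round(mean_accuracy(genuine))  # 250 / 3 = 83.33..., rounds to 83
forged_mean = round(mean_accuracy(forged))    # 285 / 3 = 95.0
print(genuine_mean, forged_mean)
```

Because every setting contains the same number of genuine (10) and forged (21) documents, weighting by document count gives the same averages.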

Citation

The code and the augmented dataset may only be used for scientific purposes. They must not be republished by anyone other than the original authors. Scientific use includes processing the data and showing it in publications and presentations. If you use them, please cite:

@InProceedings{yriarte2022two,
    author    = {Yriarte, F. and Puteaux, P. and Tkachenko, I.},
    title     = {A Two-Step Method for Ensuring Printed Document Integrity using Crossing Number Distances},
    booktitle = {IEEE International Workshop on Information Forensics and Security},
    month     = {December},
    year      = {2022}
}

Acknowledgements

This work was funded by the project FuzzyDoc supported by the CNRS Research Group of Information, Signal, Image and Vision (CNRS GdR-ISIS).
