codice-fuzzcale demo 2
So I’ve pretty much wrapped up this personal project.
It fuzzes all possible Italian fiscal codes (codici fiscali) based on incomplete information. You can enter the information interactively.
In the example below, we don’t know the person’s age, but enter a maximum age of 20 and minimum age of 10:
All 3,378 possible birth dates are iterated over and a valid fiscal code is generated for each one.
You can also enter the data via flags and all unknown values will be fuzzed automatically. Here, we know everything except the first name (characters 4-6 in the codice fiscale):
The playback on this is slow the first time, so once it has loaded, replay the demo to see how quickly the values are generated.
Evaluation & Improvements
The project does pretty much everything I initially specced. The only feature I wanted to implement but didn’t was a ‘best-first’ heuristic when fuzzing unknown names.
Currently, the algorithm brute forces every letter combination alphabetically (AAA, AAB, etc.). It would be much better to start with likely combinations based on popular names, but in the end I decided not to implement it as it’s a lot of work for the sake of three letters, and I feel like name/surname fuzzing is a rare use case.
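To make the alphabetical brute force concrete, here is a small Go sketch of that kind of enumeration. It is not the project's code: `tripletsAlphabetical` is a hypothetical name, and it simply assumes all 26 letters are tried in every position.

```go
package main

import "fmt"

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

// tripletsAlphabetical returns every three-letter combination in
// alphabetical order: AAA, AAB, ..., ZZZ.
func tripletsAlphabetical() []string {
	out := make([]string, 0, len(alphabet)*len(alphabet)*len(alphabet))
	for _, a := range alphabet {
		for _, b := range alphabet {
			for _, c := range alphabet {
				out = append(out, string([]rune{a, b, c}))
			}
		}
	}
	return out
}

func main() {
	all := tripletsAlphabetical()
	// 26^3 combinations in total.
	fmt.Println(len(all), all[0], all[1], all[len(all)-1])
}
```

A best-first variant would only need to change the order in which the combinations are produced, for example by trying codes derived from a list of common names before falling back to this exhaustive loop.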
I found Go to be a pleasure to work with. Initially it felt restrictive as it’s statically typed and I’m more used to the free-wheeling jazz improv of Python, but it results in merciless efficiency and blistering speed, so I’m all for it. Plus, you can launch a super-cheap thread (a goroutine) with the simple go keyword. No messing around, all the complexity abstracted. Lovely.
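For anyone who hasn't seen it, this is roughly what launching goroutines looks like. The sketch below is only an illustration, not how the project divides its work: it assumes one goroutine per first letter, and `codesWithPrefix` and `countAllConcurrently` are made-up names.

```go
package main

import (
	"fmt"
	"sync"
)

// codesWithPrefix builds every three-letter code starting with first.
func codesWithPrefix(first rune) []string {
	var out []string
	for b := 'A'; b <= 'Z'; b++ {
		for c := 'A'; c <= 'Z'; c++ {
			out = append(out, string([]rune{first, b, c}))
		}
	}
	return out
}

// countAllConcurrently starts one goroutine per first letter and
// adds up how many codes each of them generated.
func countAllConcurrently() int {
	var wg sync.WaitGroup
	results := make(chan int, 26) // buffered so no worker blocks on send
	for first := 'A'; first <= 'Z'; first++ {
		wg.Add(1)
		go func(r rune) { // the go keyword starts the goroutine
			defer wg.Done()
			results <- len(codesWithPrefix(r))
		}(first)
	}
	wg.Wait()
	close(results)

	total := 0
	for n := range results {
		total += n
	}
	return total
}

func main() {
	fmt.Println(countAllConcurrently())
}
```

The `sync.WaitGroup` and the buffered channel are just one common way to wait for the workers and gather their results.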
The trickiest part of the project was dynamically fuzzing the different subsections of the fiscal code, based on which values were known and unknown (which, of course, can’t be predicted).
In the end, I used a recursive function, generateCF(), which takes a binary list as input, e.g. [0,1,0,1,0], with each element indicating whether a given value should be fuzzed or not. The function calls itself and iterates over the list, either launching the fuzzer for that subsection, or passing the user-provided value to the next iteration. At around 100 lines in length, it’s far from the most elegant solution, but it works well enough.
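The general shape of that recursion can be sketched in a few lines of Go. This is a simplified stand-in rather than the real generateCF(): the `section` type, the `generate` and `collect` functions, and the sample values are all hypothetical, and it assumes that 1 in the mask means "fuzz this section" and 0 means "use the value provided".

```go
package main

import "fmt"

// section is one part of the code: either a known value or a list of
// candidate values to try when that part is unknown.
type section struct {
	known      string
	candidates []string
}

// generate walks the sections left to right. Where mask[i] is 0 the known
// value is passed straight on; where it is 1 (assumed to mean "fuzz") the
// function recurses once per candidate.
func generate(sections []section, mask []int, i int, prefix string, emit func(string)) {
	if i == len(sections) {
		emit(prefix)
		return
	}
	if mask[i] == 0 {
		generate(sections, mask, i+1, prefix+sections[i].known, emit)
		return
	}
	for _, c := range sections[i].candidates {
		generate(sections, mask, i+1, prefix+c, emit)
	}
}

// collect gathers every generated string into a slice.
func collect(sections []section, mask []int) []string {
	var out []string
	generate(sections, mask, 0, "", func(s string) { out = append(out, s) })
	return out
}

func main() {
	// Made-up sample data: surname, name, year, month and day sections.
	sections := []section{
		{known: "RSS"},
		{candidates: []string{"MRA", "LGU"}},
		{known: "85"},
		{candidates: []string{"A", "B"}},
		{known: "14"},
	}
	for _, s := range collect(sections, []int{0, 1, 0, 1, 0}) {
		fmt.Println(s)
	}
}
```

With two candidates in each of the two fuzzed sections, this prints four partial codes. Passing a callback keeps the recursion separate from whatever is done with each result.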
In the future, I might add the heuristics and improve efficiency, but for the time being, I’m done with this project.
If you have any suggestions, comments or feedback, I would love to hear it. You can find the project on SourceHut or GitHub and my contact details are on the About page.