Decryption of Caesar Cipher using Dictionary-Based Approach

A Caesar cipher has only 25 possible non-trivial shifts, so we can decrypt the message with every one of them and pick the result that reads like English rather than random gibberish. Unix/Linux systems typically come with a dictionary file. I am using Ubuntu, where a dictionary is located at /usr/share/dict/cracklib-small and contains 54763 English words. To automate the process, I wrote a bash script named “dictionary_compare” that decrypts the message with all 25 shift values and compares the decoded words against the dictionary. Incorrect decryptions yield few or no matches, whereas in the correct decryption most, if not all, of the words are valid English words. The script counts the words found in the dictionary for each decryption attempt and selects the shift with the highest count.

#!/bin/bash
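# Print the contents of the dictionary file, or fail if it is missing.
load_dictionary() {
    local dictionary_file="/usr/share/dict/cracklib-small"
    if [[ -f "$dictionary_file" ]]; then
        cat "$dictionary_file"
    else
        # Report on stderr so the message is not captured into $dictionary.
        echo "No dictionary found" >&2
        return 1
    fi
}

# An exit inside $(...) would only leave the subshell, so check the status here.
dictionary=$(load_dictionary) || exit 1
echo "dictionary loaded"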

shift_pattern_script="./shift_pattern"

max_word_count=0
best_shift=0
best_matched_words=""
best_matched_words_with_punctuation=""

if [ $# -ne 1 ]; then
    echo "Usage: $0 <filename>" >&2
    exit 1
fi

filename=$1
if [ ! -f "$filename" ]; then
    echo "File '$filename' not found."
    exit 1
fi

echo "$filename loaded"

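for ((shift=1; shift<=25; shift++)); do
    matched_words=0
    decoded_text=""            # whole decoded text for this shift, punctuation kept
    decoded_text_no_punct=""   # whole decoded text for this shift, punctuation removed
    echo "Shift $shift"
    pattern_U=$(bash "$shift_pattern_script" -k "$shift" -u)
    pattern_L=$(bash "$shift_pattern_script" -k "$shift")
    # The "|| [[ -n ... ]]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        decoded_line1=$(echo "$line" | tr '[A-Z]' "$pattern_U")
        decoded_line2=$(echo "$decoded_line1" | tr '[a-z]' "$pattern_L")
        decoded_line3=$(echo "$decoded_line2" | tr -d '[:punct:]')
        # Keep every decoded line, not only the last one read.
        decoded_text+="$decoded_line2"$'\n'
        decoded_text_no_punct+="$decoded_line3"$'\n'
        for word in $decoded_line3; do
            if grep -q -w "$word" <<< "$dictionary"; then
                ((matched_words++))
            fi
        done
    done < "$filename"
    echo "matched_words found: $matched_words"
    if ((matched_words > max_word_count)); then
        max_word_count=$matched_words
        best_shift=$shift
        best_matched_words=${decoded_text_no_punct%$'\n'}
        best_matched_words_with_punctuation=${decoded_text%$'\n'}
    fi
done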
echo "Best shift: $best_shift"
echo "Matched words: $max_word_count"
echo "Decoded words:"
echo "$best_matched_words_with_punctuation"

Detailed explanation:

The load_dictionary function is defined to load a dictionary file. In this script, the dictionary file path is set to /usr/share/dict/cracklib-small. If the file exists, the function reads its contents using the cat command. If the file doesn’t exist, it prints an error message and exits the script with a non-zero exit code.
The dictionary variable is assigned the contents of the loaded dictionary file by calling the load_dictionary function. This variable will be used to check if words in the decoded lines exist in the dictionary.
The script checks if exactly one command-line argument (the filename) is provided. If the argument count is not equal to 1, it prints an error message and exits with a non-zero exit code.
The filename provided as the command-line argument is assigned to the filename variable. The script then checks if the file exists. If the file doesn’t exist, it prints an error message and exits with a non-zero exit code.
A loop is initiated to iterate through possible shift values from 1 to 25. Each iteration represents a different shift value.
Inside the loop, the matched_words variable is initialized to 0, which will keep track of the number of words matched for the current shift value.
The script runs shift_pattern_script (a separate script from an earlier post) twice with the -k option set to the current shift value ($shift): once with -u to generate the uppercase shift pattern and once without it to generate the lowercase pattern. The patterns are stored in the pattern_U and pattern_L variables, respectively.
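The shift_pattern script itself is not reproduced in this post. Purely as an illustration of the idea, the sketch below builds a rotated alphabet and passes it to tr. The helper names rotated_alphabet and caesar_shift are hypothetical, and the output format of the real shift_pattern script may well differ (for example, it may wrap the pattern in brackets).

```shell
#!/bin/bash
# Illustrative sketch only; this is NOT the shift_pattern script from the earlier post.

# Print the uppercase alphabet rotated left by k positions,
# e.g. k=3 gives DEFGHIJKLMNOPQRSTUVWXYZABC.
rotated_alphabet() {
    local k=$(( $1 % 26 ))
    local alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    printf '%s%s\n' "${alphabet:k}" "${alphabet:0:k}"
}

# Replace every letter of the text with the letter k positions further on,
# wrapping around at Z/z. Non-letters are left untouched.
caesar_shift() {
    local k=$1 text=$2
    local upper lower
    upper=$(rotated_alphabet "$k")
    lower=$(printf '%s' "$upper" | tr 'A-Z' 'a-z')
    printf '%s\n' "$text" | tr 'A-Za-z' "${upper}${lower}"
}

caesar_shift 19 "Zjpluapzaz bzl"   # prints: Scientists use
```

Here a single tr call handles both cases at once, whereas dictionary_compare uses two calls with separate uppercase and lowercase patterns. Either arrangement should give the same decoded text.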
The script starts reading the lines from the file specified by $filename in a while loop. For each line, it performs the following steps:
- It decodes the line by replacing uppercase letters with the corresponding letters from the pattern_U variable using the tr command, resulting in the decoded_line1.
- It further decodes decoded_line1 by replacing lowercase letters with the corresponding letters from the pattern_L variable using the tr command, resulting in the decoded_line2.
- It removes punctuation from decoded_line2 using tr -d '[:punct:]', resulting in decoded_line3.
- It iterates over each word in decoded_line3 using a for loop.
- For each word, it checks whether the word exists in the loaded dictionary by using grep with the -q (quiet) and -w (match whole words only) options. The match is case-sensitive. If the word is found in the dictionary, it increments the matched_words counter.
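Running grep once per word over the whole dictionary works, but it is slow for longer texts. One possible alternative, shown here only as a sketch and not as what dictionary_compare does, is to load the words into a bash associative array (bash 4 or later) and look each word up directly. The names dict_words, load_words and count_matches are made up for this example, and a tiny inline word list stands in for the real dictionary file.

```shell
#!/bin/bash
# Sketch only (bash 4+): exact, case-insensitive lookups in an associative array.
# dict_words, load_words and count_matches are illustrative names.
declare -A dict_words

# Read one word per line from stdin and store it lowercased as a key.
load_words() {
    local word
    while IFS= read -r word; do
        if [[ -n "$word" ]]; then
            dict_words["${word,,}"]=1
        fi
    done
}

# Print how many whitespace-separated words of "$1" are keys in dict_words.
count_matches() {
    local count=0 word key
    local -a words
    read -ra words <<< "$1"
    for word in "${words[@]}"; do
        key=${word,,}
        if [[ -n "${dict_words[$key]+found}" ]]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Tiny stand-in word list; the real file could be read with
#   load_words < /usr/share/dict/cracklib-small
load_words <<< $'scientists\nuse\nobservations\nfrom\nthe\nground'
count_matches "Scientists use observations from the ground"   # prints 6
count_matches "Zjpluapzaz bzl vizlychapvuz"                   # prints 0
```

Note that this lookup is an exact, case-insensitive match on the whole dictionary entry, so its counts can differ from those of grep -w, which is case-sensitive and also matches a word that appears inside a longer entry such as a hyphenated one.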
After processing all the lines in the file for the current shift value, the script prints the number of matched words found for that shift value.
It compares the matched_words count with the previous maximum count (max_word_count). If the current count is greater, it updates max_word_count, best_shift, best_matched_words and best_matched_words_with_punctuation with the current values.
The loop continues until all shift values from 1 to 25 are processed.
Finally, the script prints the best shift value (best_shift), the number of matched words (max_word_count), and the decoded words (best_matched_words_with_punctuation).

cipher.txt

Zjpluapzaz bzl vizlychapvuz myvt aol nyvbuk, hpy, huk zwhjl, hsvun dpao jvtwbaly tvklsz, av tvupavy huk zabkf whza, wylzlua, huk mbabyl jspthal johunl. Jspthal khah yljvykz wyvcpkl lcpklujl vm jspthal johunl rlf pukpjhavyz, zbjo hz nsvihs shuk huk vjlhu altwlyhabyl pujylhzlz; ypzpun zlh slclsz; pjl svzz ha Lhyao’z wvslz huk pu tvbuahpu nshjplyz; mylxblujf huk zlclypaf johunlz pu leayltl dlhaoly zbjo hz obyypjhulz, olhadhclz, dpskmpylz, kyvbnoaz, msvvkz, huk wyljpwpahapvu; huk jsvbk huk clnlahapvu jvcly johunlz.

Result:
./dictionary_compare cipher.txt
dictionary loaded
cipher.txt loaded
Shift 1
matched_words found: 4
Shift 2
matched_words found: 0
Shift 3
matched_words found: 0
Shift 4
matched_words found: 0
Shift 5
matched_words found: 4
Shift 6
matched_words found: 2
Shift 7
matched_words found: 2
Shift 8
matched_words found: 5
Shift 9
matched_words found: 0
Shift 10
matched_words found: 0
Shift 11
matched_words found: 0
Shift 12
matched_words found: 1
Shift 13
matched_words found: 1
Shift 14
matched_words found: 0
Shift 15
matched_words found: 0
Shift 16
matched_words found: 0
Shift 17
matched_words found: 1
Shift 18
matched_words found: 1
Shift 19
matched_words found: 69
Shift 20
matched_words found: 3
Shift 21
matched_words found: 1
Shift 22
matched_words found: 1
Shift 23
matched_words found: 3
Shift 24
matched_words found: 0
Shift 25
matched_words found: 0
Best shift: 19
Matched words: 69
Decoded words:
Scientists use observations from the ground, air, and space, along with computer models, to monitor and study past, present, and future climate change. Climate data records provide evidence of climate change key indicators, such as global land and ocean temperature increases; rising sea levels; ice loss at Earth’s poles and in mountain glaciers; frequency and severity changes in extreme weather such as hurricanes, heatwaves, wildfires, droughts, floods, and precipitation; and cloud and vegetation cover changes.
