Decryption of a Caesar Cipher Using the Highest Letter Frequency Approach

Research shows that “E” is the most commonly used letter in English, not just the most common vowel. In one analysis, “E” had a frequency of 11.51% across all words analyzed. To decrypt a Caesar cipher using the highest letter frequency approach, you can follow these steps:

  1. Determine the letter frequency distribution of the encrypted text.
  2. Identify the letter in the encrypted text that has the highest frequency. Let’s call it the “cipher letter”.
  3. Calculate the forward shift (a value from 0 to 25) that maps the cipher letter to ‘e’, the most frequently occurring letter in English.
  4. Apply the shift to each letter in the encrypted text to obtain the decrypted text.

The following Bash script implements these steps:

#!/bin/bash

shift_pattern_script="./shift_pattern"

if [ $# -ne 1 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

filename=$1
if [ ! -f "$filename" ]; then
    echo "File '$filename' not found."
    exit 1
fi

echo "$filename loaded"
declare -A letter_counts

while IFS= read -r line; do
    line=$(tr '[:upper:]' '[:lower:]' <<< "$line")  # Convert to lowercase
    for ((i=0; i<${#line}; i++)); do
        char="${line:i:1}"
        if [[ $char =~ [a-z] ]]; then
            letter_counts[$char]=$(( letter_counts[$char] + 1 ))
        fi
    done
done < "$filename"

max_count=0
max_letter=""

for letter in "${!letter_counts[@]}"; do
    count=${letter_counts[$letter]}
    if (( count > max_count )); then
        max_count=$count
        max_letter=$letter
    fi
done

echo "Letter with the highest count: $max_letter"
echo "Count: $max_count"

e_ascii=$(( $(printf '%d' "'e") - 97 ))  # Alphabet index of 'e' (0-25): ASCII 101 - 97 = 4
highest_letter_ascii=$(( $(printf '%d' "'$max_letter") - 97 ))  # Alphabet index (0-25) of the most frequent letter

shift=$(( (e_ascii - highest_letter_ascii + 26) % 26 ))

echo "Shift from the highest letter to 'e': $shift"

pattern_U=$(bash "$shift_pattern_script" -k "$shift" -u)
pattern_L=$(bash "$shift_pattern_script" -k "$shift")

echo "Decoded words:"
# Decode and print every line of the file, not only the last one
while IFS= read -r line; do
    decoded_line1=$(echo "$line" | tr '[A-Z]' "$pattern_U")           # shift uppercase letters
    decoded_line2=$(echo "$decoded_line1" | tr '[a-z]' "$pattern_L")  # shift lowercase letters
    echo "$decoded_line2"
done < "$filename"
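
The script depends on a helper called shift_pattern, which is not shown in this post. The sketch below is a guess at what such a helper might look like, not the author's version. It assumes that -k takes the forward shift, that -u selects uppercase, and that the output is a bracketed, rotated alphabet so that it lines up position by position with the '[a-z]' and '[A-Z]' sets passed to tr. It is written as a function here; the same body could live in a standalone script.

```shell
#!/bin/bash
# Hypothetical stand-in for the ./shift_pattern helper (not shown in the post).
# Prints a bracketed alphabet rotated forward by the key, for use as tr's second set.
shift_pattern() {
    local OPTIND opt key=0 upper=0
    local alphabet="abcdefghijklmnopqrstuvwxyz"
    while getopts "k:u" opt; do
        case "$opt" in
            k) key=$OPTARG ;;   # forward shift
            u) upper=1 ;;       # emit uppercase letters
            *) return 1 ;;
        esac
    done
    key=$(( (key % 26 + 26) % 26 ))                    # normalize to 0-25
    local rotated="${alphabet:key}${alphabet:0:key}"   # rotate left by key
    if (( upper )); then
        rotated=${rotated^^}                           # Bash 4+ uppercase expansion
    fi
    printf '[%s]\n' "$rotated"
}
```

With a shift of 19, `shift_pattern -k 19` prints `[tuvwxyzabcdefghijklmnopqrs]`, so tr maps a to t, b to u, and so on. The literal brackets in both sets map onto each other and leave the text unchanged.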

Detailed explanation:

  1. shift_pattern_script="./shift_pattern": This line sets the variable shift_pattern_script to the path of a script called “shift_pattern”. It’s assumed that this script exists and is located in the same directory as the current script.
  2. Input Validation:
    • if [ $# -ne 1 ]; then ...: This condition checks if the number of command-line arguments provided to the script is not equal to 1. If that’s the case, it means an incorrect number of arguments is provided, and the script prints an error message and exits.
    • filename=$1: This line assigns the first command-line argument to the variable filename.
    • if [ ! -f "$filename" ]; then ...: This condition checks whether the path in filename is missing or is not a regular file (for example, a directory). In either case, the script prints an error message and exits.
  3. Letter Frequency Count:
    • declare -A letter_counts: This line declares an associative array called letter_counts to store the count of each letter.
    • while IFS= read -r line; do ... done < "$filename": This loop reads each line from the input file specified by filename.
    • line=$(tr '[:upper:]' '[:lower:]' <<< "$line"): This line converts the line to lowercase using the tr command. It ensures that all letters are in lowercase for accurate letter counting.
    • for ((i=0; i<${#line}; i++)); do ... done: This loop iterates over each character in the line.
    • char="${line:i:1}": This line assigns the current character to the variable char.
    • if [[ $char =~ [a-z] ]]; then ... fi: This condition checks if the character is a lowercase letter. If it is, it increments the count of that letter in the letter_counts array.
  4. Finding the Letter with the Highest Count:
    • for letter in "${!letter_counts[@]}"; do ... done: This loop iterates over each letter in the letter_counts array.
    • count=${letter_counts[$letter]}: This line assigns the count of the current letter to the variable count.
    • if (( count > max_count )); then ... fi: This condition checks if the count is greater than the previous maximum count. If it is, it updates the max_count and max_letter variables with the current count and letter, respectively.
  5. Shift Calculation:
    • e_ascii=$(( $(printf '%d' "'e") - 97 )): This line takes the ASCII value of ‘e’ (101) and subtracts 97, the ASCII value of ‘a’, to get the zero-based position of ‘e’ in the alphabet (4). It assigns the result to the variable e_ascii.
    • highest_letter_ascii=$(( $(printf '%d' "'$max_letter") - 97 )): This line calculates the ASCII value of the letter with the highest count and subtracts 97 to get the corresponding value from 0 to 25. It assigns the result to the variable highest_letter_ascii.
    • shift=$(( (e_ascii - highest_letter_ascii + 26) % 26 )): This line calculates the shift value needed to decrypt the text from the highest letter to ‘e’. It adds 26 before taking the modulus to ensure a positive shift value within the range of 0-25.
  6. Shift Pattern Calculation:
    • pattern_U=$(bash "$shift_pattern_script" -k "$shift" -u): This line executes the “shift_pattern” script with the shift value as an argument and captures the output in the variable pattern_U. It uses the -u option to generate the uppercase shift pattern.
    • pattern_L=$(bash "$shift_pattern_script" -k "$shift"): This line executes the “shift_pattern” script with the shift value as an argument and captures the output in the variable pattern_L. It generates the lowercase shift pattern by default.
  7. Decoding the Text:
    • while IFS= read -r line; do ... done < "$filename": This loop reads each line from the input file specified by filename.
    • decoded_line1=$(echo "$line" | tr '[A-Z]' "$pattern_U"): This line uses the tr command to perform character replacement on the current line. It replaces uppercase letters with the corresponding letters from the uppercase shift pattern (pattern_U). The result is stored in the variable decoded_line1.
    • decoded_line2=$(echo "$decoded_line1" | tr '[a-z]' "$pattern_L"): This line uses the tr command again to replace lowercase letters in decoded_line1 with the corresponding letters from the lowercase shift pattern (pattern_L). The final result is stored in the variable decoded_line2.
  8. Printing the Decoded Words:
    • echo "Decoded words:": This line simply prints the message “Decoded words:” to indicate that the following lines will display the decoded text.
    • echo "$decoded_line2": This line prints the decoded text, which is stored in the decoded_line2 variable.
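
The character-by-character counting loop in step 3 can also be expressed as a pipeline of standard tools. This is an alternative technique, not what the script above does. The sketch below lists the top few letters instead of only the maximum, which helps on short texts where the most frequent cipher letter may not correspond to ‘e’. The function name top_letters and its optional second argument are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: list the most frequent letters in a file, highest count first.
# top_letters is a hypothetical helper name, not part of the original script.
top_letters() {
    local file=$1 n=${2:-3}            # n = how many letters to show (default 3)
    tr '[:upper:]' '[:lower:]' < "$file" \
        | LC_ALL=C grep -o '[a-z]' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "$n" \
        | awk '{ print $2, $1 }'       # print "letter count"
}
```

Calling `top_letters cipher.txt 3` prints one letter and its count per line. If the first candidate produces unreadable output, the next ones can be tried as the stand-in for ‘e’.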

cipher.txt

Zjpluapzaz bzl vizlychapvuz myvt aol nyvbuk, hpy, huk zwhjl, hsvun dpao jvtwbaly tvklsz, av tvupavy huk zabkf whza, wylzlua, huk mbabyl jspthal johunl. Jspthal khah yljvykz wyvcpkl lcpklujl vm jspthal johunl rlf pukpjhavyz, zbjo hz nsvihs shuk huk vjlhu altwlyhabyl pujylhzlz; ypzpun zlh slclsz; pjl svzz ha Lhyao’z wvslz huk pu tvbuahpu nshjplyz; mylxblujf huk zlclypaf johunlz pu leayltl dlhaoly zbjo hz obyypjhulz, olhadhclz, dpskmpylz, kyvbnoaz, msvvkz, huk wyljpwpahapvu; huk jsvbk huk clnlahapvu jvcly johunlz.

Result:

./freq_compare cipher.txt
dictionary loaded
cipher.txt loaded
Letter with the highest count: l
Count: 53
Shift from the highest letter to 'e': 19
Decoded words:
Scientists use observations from the ground, air, and space, along with computer models, to monitor and study past, present, and future climate change. Climate data records provide evidence of climate change key indicators, such as global land and ocean temperature increases; rising sea levels; ice loss at Earth’s poles and in mountain glaciers; frequency and severity changes in extreme weather such as hurricanes, heatwaves, wildfires, droughts, floods, and precipitation; and cloud and vegetation cover changes.
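
The reported shift can be checked by hand. The most frequent cipher letter ‘l’ has alphabet index 11 and ‘e’ has index 4, so the forward shift is (4 - 11 + 26) % 26 = 19. The sketch below wraps that arithmetic in a small function; the name shift_to_e is hypothetical and is used only for illustration.

```shell
#!/bin/bash
# Sketch: forward shift (0-25) that maps a lowercase cipher letter onto 'e'.
# shift_to_e is a hypothetical helper name used only for illustration.
shift_to_e() {
    local cipher_index=$(( $(printf '%d' "'$1") - 97 ))   # 'a' is ASCII 97
    local e_index=4                                       # 'e' is ASCII 101, and 101 - 97 = 4
    echo $(( (e_index - cipher_index + 26) % 26 ))
}

shift_to_e l   # 'l' has index 11, so (4 - 11 + 26) % 26 = 19
```

A decryption shift of 19 forward is the same as undoing an encryption shift of 7, since 19 + 7 = 26.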
