Research shows that “E” is the most commonly used letter in the English alphabet, with a frequency of 11.51% across the words analyzed. To decrypt a Caesar cipher using the highest letter frequency approach, you can follow these steps:
- Determine the letter frequency distribution of the encrypted text.
- Identify the letter in the encrypted text that has the highest frequency. Let’s call it the “cipher letter”.
- Calculate the forward shift from the cipher letter to ‘e’, the most frequently occurring letter in English, wrapping around the end of the alphabet (modulo 26).
- Apply the shift to each letter in the encrypted text to obtain the decrypted text.
#!/bin/bash
shift_pattern_script="./shift_pattern"
# Require exactly one argument: the file containing the encrypted text.
if [ $# -ne 1 ]; then
    echo "Usage: $0 <filename>" >&2
    exit 1
fi
filename=$1
if [ ! -f "$filename" ]; then
    echo "File '$filename' not found." >&2
    exit 1
fi
echo "$filename loaded"
declare -A letter_counts
while IFS= read -r line; do
    line=$(tr '[:upper:]' '[:lower:]' <<< "$line") # Convert to lowercase
    for ((i=0; i<${#line}; i++)); do
        char="${line:i:1}"
        if [[ $char =~ [a-z] ]]; then
            letter_counts[$char]=$(( letter_counts[$char] + 1 ))
        fi
    done
done < "$filename"
max_count=0
max_letter=""
for letter in "${!letter_counts[@]}"; do
    count=${letter_counts[$letter]}
    if (( count > max_count )); then
        max_count=$count
        max_letter=$letter
    fi
done
echo "Letter with the highest count: $max_letter"
echo "Count: $max_count"
e_ascii=$(( $(printf '%d' "'e") - 97 )) # Alphabet index of 'e' (a=0 ... z=25), which is 4
highest_letter_ascii=$(( $(printf '%d' "'$max_letter") - 97 )) # Alphabet index of the most frequent letter (0-25)
shift=$(( (e_ascii - highest_letter_ascii + 26) % 26 ))
echo "Shift from the highest letter to 'e': $shift"
pattern_U=$(bash "$shift_pattern_script" -k "$shift" -u)
pattern_L=$(bash "$shift_pattern_script" -k "$shift")
echo "Decoded words:"
# Decode and print each line as it is read, so multi-line files are fully decoded.
while IFS= read -r line; do
    decoded_line1=$(echo "$line" | tr '[A-Z]' "$pattern_U")
    decoded_line2=$(echo "$decoded_line1" | tr '[a-z]' "$pattern_L")
    echo "$decoded_line2"
done < "$filename"
Detailed explanation:
shift_pattern_script="./shift_pattern": This line sets the variable shift_pattern_script to the path of a script called “shift_pattern”. Because the path is relative, the script is assumed to exist in the directory from which the current script is run.
- Input Validation:
if [ $# -ne 1 ]; then ...: This condition checks whether the number of command-line arguments is not equal to 1. If so, an incorrect number of arguments was provided, and the script prints an error message and exits.
filename=$1: This line assigns the first command-line argument to the variable filename.
if [ ! -f "$filename" ]; then ...: This condition checks whether filename refers to an existing regular file. If it does not, the script prints an error message and exits.
- Letter Frequency Count:
declare -A letter_counts: This line declares an associative array called letter_counts to store the count of each letter.
while IFS= read -r line; do ... done < "$filename": This loop reads each line from the input file specified by filename.
line=$(tr '[:upper:]' '[:lower:]' <<< "$line"): This line converts the line to lowercase using the tr command, so that uppercase and lowercase occurrences of a letter are counted together.
for ((i=0; i<${#line}; i++)); do ... done: This loop iterates over each character in the line.
char="${line:i:1}": This line assigns the current character to the variable char.
if [[ $char =~ [a-z] ]]; then ... fi: This condition checks if the character is a lowercase letter. If it is, it increments the count of that letter in the letter_counts array.
- Finding the Letter with the Highest Count:
for letter in "${!letter_counts[@]}"; do ... done: This loop iterates over each letter in the letter_counts array.
count=${letter_counts[$letter]}: This line assigns the count of the current letter to the variable count.
if (( count > max_count )); then ... fi: This condition checks if the count is greater than the previous maximum count. If it is, it updates the max_count and max_letter variables with the current count and letter, respectively.
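The counting and maximum-finding steps can also be sketched as a pipeline of standard tools. This is a minimal sketch, not the script's own method: the function name most_frequent_letter is hypothetical, and ties are broken alphabetically, which is a choice made here (the script above keeps whichever tied letter the associative array happens to yield first).

```shell
#!/bin/bash
# most_frequent_letter FILE (hypothetical helper)
# Prints "<letter> <count>" for the most common letter in FILE.
most_frequent_letter() {
    tr '[:upper:]' '[:lower:]' < "$1" |   # fold everything to lowercase
        LC_ALL=C tr -cd 'a-z' |           # drop every byte that is not a-z
        fold -w 1 |                       # put one letter on each line
        sort | uniq -c |                  # count occurrences of each letter
        sort -b -k1,1nr -k2,2 |           # highest count first, ties alphabetical
        head -n 1 |
        awk '{ print $2, $1 }'            # reorder to "<letter> <count>"
}
```

Run against the cipher.txt sample as `most_frequent_letter cipher.txt`, it should report the same letter and count as the loop-based version, provided there is no tie for first place.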
- Shift Calculation:
e_ascii=$(( $(printf '%d' "'e") - 97 )): This line takes the ASCII value of ‘e’ (101) and subtracts 97, the ASCII value of ‘a’, to get the position of ‘e’ in the alphabet on a scale from 0 to 25 (that is, 4). It assigns the result to the variable e_ascii.
highest_letter_ascii=$(( $(printf '%d' "'$max_letter") - 97 )): This line takes the ASCII value of the letter with the highest count and subtracts 97 to get its position from 0 to 25. It assigns the result to the variable highest_letter_ascii.
shift=$(( (e_ascii - highest_letter_ascii + 26) % 26 )): This line calculates the shift value needed to move the highest-count letter to ‘e’. It adds 26 before taking the modulus to ensure a non-negative shift value within the range of 0-25.
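As a worked example of this arithmetic, the sketch below wraps the same calculation in two small functions. The names letter_index and shift_to_e are hypothetical and exist only for illustration; the formula is the one used in the script.

```shell
#!/bin/bash
# letter_index LETTER (hypothetical helper)
# Position of a lowercase letter in the alphabet: a=0 ... z=25.
letter_index() {
    echo $(( $(printf '%d' "'$1") - 97 ))
}

# shift_to_e LETTER (hypothetical helper)
# Forward shift that moves LETTER onto 'e', wrapping modulo 26.
shift_to_e() {
    local target cipher
    target=$(letter_index e)        # 4
    cipher=$(letter_index "$1")
    echo $(( (target - cipher + 26) % 26 ))
}

shift_to_e l   # l is 11, so (4 - 11 + 26) % 26 = 19
shift_to_e a   # a is 0,  so (4 - 0 + 26) % 26 = 4
shift_to_e e   # already 'e', so the shift is 0
```

Without the added 26, the first case would evaluate to -7, which is why the script adds it before taking the modulus.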
- Shift Pattern Calculation:
pattern_U=$(bash "$shift_pattern_script" -k "$shift" -u): This line executes the “shift_pattern” script with the shift value as an argument and captures the output in the variable pattern_U. It uses the -u option to generate the uppercase shift pattern.
pattern_L=$(bash "$shift_pattern_script" -k "$shift"): This line executes the “shift_pattern” script with the shift value as an argument and captures the output in the variable pattern_L. It generates the lowercase shift pattern by default.
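The “shift_pattern” script itself is not shown here, so its exact output is unknown. The sketch below is only a guess at what such a helper could produce: a rotated alphabet wrapped in brackets, so that it lines up character for character with the bracketed sets '[A-Z]' and '[a-z]' passed to tr. The function name shift_pattern_sketch and its positional arguments are assumptions and do not reproduce the real script's -k and -u interface.

```shell
#!/bin/bash
# shift_pattern_sketch SHIFT [-u] (hypothetical stand-in for ./shift_pattern)
# Prints the alphabet rotated left by SHIFT positions, wrapped in brackets.
# With -u as the second argument the uppercase alphabet is used.
shift_pattern_sketch() {
    local k=$(( $1 % 26 ))
    local alphabet=abcdefghijklmnopqrstuvwxyz
    if [ "$2" = "-u" ]; then
        alphabet=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    fi
    # Letters from position k to the end, followed by the first k letters.
    printf '[%s%s]\n' "${alphabet:k}" "${alphabet:0:k}"
}

shift_pattern_sketch 19      # [tuvwxyzabcdefghijklmnopqrs]
shift_pattern_sketch 19 -u   # [TUVWXYZABCDEFGHIJKLMNOPQRS]
```

With these patterns, tr maps ‘l’ (position 11 in the first set) to the letter at position 11 of the rotated set, which is ‘e’.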
- Decoding the Text:
while IFS= read -r line; do ... done < "$filename": This loop reads each line from the input file specified by filename.
decoded_line1=$(echo "$line" | tr '[A-Z]' "$pattern_U"): This line uses the tr command to perform character replacement on the current line. It replaces uppercase letters with the corresponding letters from the uppercase shift pattern (pattern_U). The result is stored in the variable decoded_line1.
decoded_line2=$(echo "$decoded_line1" | tr '[a-z]' "$pattern_L"): This line uses the tr command again to replace lowercase letters in decoded_line1 with the corresponding letters from the lowercase shift pattern (pattern_L). The final result is stored in the variable decoded_line2.
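The two tr passes can also be combined into one call that handles both cases and processes a whole stream rather than one line at a time. The following is a minimal, self-contained sketch under that assumption; caesar_shift is a hypothetical name, and the function builds its own rotated alphabets instead of calling the “shift_pattern” script.

```shell
#!/bin/bash
# caesar_shift SHIFT (hypothetical helper)
# Reads text on standard input and shifts every ASCII letter forward by
# SHIFT positions, preserving case. All other bytes pass through unchanged.
caesar_shift() {
    local k=$(( $1 % 26 ))
    local lower=abcdefghijklmnopqrstuvwxyz
    local upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    tr "$lower$upper" "${lower:k}${lower:0:k}${upper:k}${upper:0:k}"
}

# Decrypt with the shift found by the frequency analysis (19).
printf '%s\n' 'Zjpluapzaz bzl' | caesar_shift 19   # Scientists use

# Shifting by the complementary value (26 - 19 = 7) encrypts again.
printf '%s\n' 'Scientists use' | caesar_shift 7    # Zjpluapzaz bzl
```

A whole file can be decoded with `caesar_shift 19 < cipher.txt`, which avoids spawning two tr processes per input line.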
- Printing the Decoded Words:
echo "Decoded words:": This line simply prints the message “Decoded words:” to indicate that the following lines will display the decoded text.
echo "$decoded_line2": This line prints the decoded text, which is stored in the decoded_line2 variable.
cipher.txt
Zjpluapzaz bzl vizlychapvuz myvt aol nyvbuk, hpy, huk zwhjl, hsvun dpao jvtwbaly tvklsz, av tvupavy huk zabkf whza, wylzlua, huk mbabyl jspthal johunl. Jspthal khah yljvykz wyvcpkl lcpklujl vm jspthal johunl rlf pukpjhavyz, zbjo hz nsvihs shuk huk vjlhu altwlyhabyl pujylhzlz; ypzpun zlh slclsz; pjl svzz ha Lhyao’z wvslz huk pu tvbuahpu nshjplyz; mylxblujf huk zlclypaf johunlz pu leayltl dlhaoly zbjo hz obyypjhulz, olhadhclz, dpskmpylz, kyvbnoaz, msvvkz, huk wyljpwpahapvu; huk jsvbk huk clnlahapvu jvcly johunlz.
Result:
./freq_compare cipher.txt
dictionary loaded
cipher.txt loaded
Letter with the highest count: l
Count: 53
Shift from the highest letter to 'e': 19
Decoded words:
Scientists use observations from the ground, air, and space, along with computer models, to monitor and study past, present, and future climate change. Climate data records provide evidence of climate change key indicators, such as global land and ocean temperature increases; rising sea levels; ice loss at Earth’s poles and in mountain glaciers; frequency and severity changes in extreme weather such as hurricanes, heatwaves, wildfires, droughts, floods, and precipitation; and cloud and vegetation cover changes.