Cody

Problem 44383. Code breaker, Part III: Operation Xiangliu

You have been tasked with decoding several batches of coded messages that have been intercepted.

Based on previous intelligence that has been gathered, you can be confident that the messages were all encoded using a simple Caesar cipher (a type of substitution cipher), an example of which is the ROT13 cipher (discussed in Cody Challenge Problem 78). As in the Cody Challenge Problem, uppercase and lowercase letters are handled independently of one another, and all other characters (e.g. punctuation, numbers, accented letters) are unchanged. Unlike the Cody Challenge Problem, here the number of letters to shift is not known in advance, and will vary between (not within) batches — also, here you need to decode, not encode.

These latest activities that you are investigating have been nicknamed 'Operation Xiangliu' by your own organisation. A few decoding options are at your organisation's disposal:

  1. Test the candidate decodings against all words in a large dictionary — this could work, but it is very slow.
  2. Test the candidate decodings for the appearance of a name or phrase that is certain to appear regularly (e.g. "Operation Phoenix") — unfortunately as yet there is not enough knowledge of the activities to be sure of a suitable name or phrase to use.
  3. Test the candidate decodings against all words in a short list comprising, say, the hundred most frequently used English words.

The third option will be faster than the first option, and more reliable than the second option, so this is the approach you will be taking.

You have been careful to avoid using 'lemmatised' lists (which contain e.g. the verb "to be", rather than the various inflected forms such as "am", "is", "are") like those based on the OEC or COCA, and after setting aside another list you finally choose the list based on the BNC as the most reliable, and will use the first 100 words on that list. This list will be available for you to access as an input variable, bncWordlist, to your function. Note: (i) some entries on the list are morphemes (e.g. " n't " and " 's ") rather than words; (ii) some entries appear more than once (representing different grammatical word classes). Of course, in the original messages any capitalisation might be used.

You have been instructed that your 'certitude' (degree of confidence) in the decoding must be reported for each batch, and shall depend proportionally upon the sum of the characters for the words/morphemes in the decoded message that match words/morphemes in bncWordlist. Being able to match words/morphemes accounting for three-tenths of the characters (excluding spaces) shall correspond to 100% certitude; matching three-twentieths would be 50% certitude, and so on. Certitude shall be reported as a percentage, rounded up to the nearest integer, not greater than 100. You need to maximise your certitude for each batch by appropriate choice of the shifting parameter. If there are multiple shifts of equal certitude, report both options on different rows (arranged in order of ascending shift parameter).

Your task is to crack the codes and report back in a structure array:   (1) the shifting parameter that had been used in the encoding [as uint8] (usually scalar, but may be column vector);   (2) the decoded messages [as a cell array (containing character arrays)] (usually an array with a single row, but occasionally with multiple rows);   (3) your 'certitude' in the decoding [as uint8] (always scalar). The name of the structure array shall be s, with respective fields shift, message, and certitude.

EXAMPLE 1

Suppose the batch contained two encoded messages — "Vomftt qvstvfe, pqfo op eppst." and "Ffmt dbo ljmm, opu pomz xpvoe." (provided as character arrays within a cell array) — and a (right-shifting) ROT1 cipher had been applied. In that case A→B, B→C, ..., Y→Z, and Z→A; similarly, a→b, b→c, ..., y→z, and z→a. Thus the original messages would have been: "Unless pursued, open no doors." and "Eels can kill, not only wound." . Twelve of the 51 characters have been matched: "no", "can", "not", and "only".

The correct answer would therefore comprise:

s.shift = uint8(1)  
s.message = {'Unless pursued, open no doors.', 'Eels can kill, not only wound.'}
s.certitude = uint8(79)

EXAMPLE 2

Suppose the batch contained one encoded message — "Oa oqvvq'u cnycau dggp: "Ctu itcvkc ctvku"." (provided as a character array within a cell array) — and a (right-shifting) ROT2 cipher had been applied. In that case A→C, B→D, ..., Y→A, and Z→B; similarly, a→c, b→d, ..., y→a, and z→b. Thus the original message would have been: "My motto's always been: "Ars gratia artis"." . Eight of the 37 characters have been matched: "My", " 's ", and "been". Note carefully that: " 's " should only be matched once; "to" (in motto), "be" (in been), "at" (in gratia), "is" (in artis) and "a" (passim) should not be matched at all; and " ' " will only ever be used as an apostrophe (never as a quotation mark).

The correct answer would therefore comprise:

s.shift = uint8(2)  
s.message = {'My motto''s always been: "Ars gratia artis".'}
s.certitude = uint8(73)

If the batch of messages cannot be decoded when following the above assumptions, then return the original encoded message(s) unchanged, with a shift of zero and a certitude of zero.

Note: Many Cody solutions are hard to read and not necessarily computationally efficient. To direct attention toward efficient runtime speed of execution, timings are measured in the Test Suite, and reported back (if the submission is runnable). Although the timings are not perfectly reproducible, they do provide an indication of computational resource demand. (Try resubmitting on another day if your code runs slowly, in case it is caused by a server issue.)

To provide some extra motivation for you to get your code to run efficiently, it will fail the Test Suite if it is deemed "too slow".

----------

Previous problem: Operation Orthos. Next problem: TBA.

Solution Stats

15.79% Correct | 84.21% Incorrect
Last Solution submitted on May 07, 2019

Problem Comments

Solution Comments