Table of Contents Correspondences relating to Jeff Hill's Paper "Chaocipher: Analysis and Models"
|
I
stayed up till 3 AM this morning reading the article -- I enjoyed every
minute of it. I do have some comments and questions which I
would like to list for you.
|
Hi
Moshe, << You write on page 6 "Thus the possibility that each sector contained 1 to 3 teeth is considered first and when this does not yield a convincing fit to the graph of Figure 1, ...". Solving the equations (1) p1+p2+p3=1; (2) p1 + 2p2 + 3p3 = 2 yields the equations (3) p2 = 1 - 2p3 and (4) p1 = p3. So, given a value for p3 we can compute p1 and p2. >> That brings back memories. <g> That is the method I used for analyzing sectors having 1 to 3 and 1 to 4 teeth. << I take it you weren't able to find a good fit? >> I pulled together my notes and MathCad worksheets to get a summary of my analysis and also learned that .mcd files can be converted to .htm files. I will send Figure02.htm and image files to you later today. It should answer several of your questions, but I don't believe it is suitable as it stands to be made available on TCCH. << MM1, with steps of five, does not explain why Exhibit 1 does not repeat machine states in under nine steps, but MM2 does. >> As you will see from probability tables given in Figure02.htm, MM1 and MM2 both predict non-zero probabilities for repeats at Steps 6 through 8 and MM1 does so for Step 5 as well. While both models predict peaks and troughs with reasonable accuracy, neither completely predicts every probability found in the graph of Exhibit 1 data. There are two reasons for the discrepancies: (1) The statistics were collected without taking into account possible M2 boundaries, which undoubtedly leads to distortions if M2 periods exist as suggested in my paper. This is not an oversight, merely an attempt to glean as much information as possible from Exhibit 1 without introducing assumptions that might prove to be wrong. In 1994, of course, when the original studies were made, I knew nothing of M2 periods. (2) Byrne's device may itself cause at least some of the distortion by having a keystream that is biased in some, as yet unexplained, way. << It is plausible that a repeat could have happened within nine steps but didn't statistically. On the other hand, how would you explain repeat machine steps in the other exhibits? >> I haven't given much thought to Exhibits 2 and 3, but as far as Exhibit 4 is concerned, I believe that Byrne used an M2 period 5 letters long instead of 55. Short M2 periods increase the number of spurious "hits" that are collected when "comparisons" are made. I have run C98U simulations which support this idea. << On page 8: "These probabilities arise when a trial has two outcomes, A and B, ...". Do you have any ideas on what these outcomes could be based on in Chaocipher? Comparing the plaintext letter with the ciphertext letter, i.e., some autokey condition? >> I believe that some form of iteration (output becomes input) was used by Byrne and it is probably based on the PT/CT or M1/M2 disk alphabets. However, I have no clear idea as to how this would work. << Byrne alludes to a simple system that even a ten-year-old could learn quickly. I've wondered if it's some grouping of the A-Z alphabet. For example, A-M could be outcome 'A', while N-Z would be outcome 'B'. If it were some autokey, that would explain the extreme sensitivity to error. >> I believe that except for manually dialing the M1 disk into position based on a Key letter presented by the KY disk, which any ten-10-year-old could do, the rest of the work is handled internally by the device. For a given M1 position, there could be electrical connections and stepping motors that determine the next Key. << On page 9, your discussion of possible cryptographs is most interesting. Regarding half-rotors, have you had an opportunity to see John Savard's attempt at a half-rotor? >> I have indeed seen John's device and I'm sure it is effective as an alternative to a rotor machine. The problem is that, as it stands, it cannot replicate the statistical signature that is required. I suspect that if you start with John's machine and reconfigure so that it replicates this signature, then you will end up with C98U. <<Although we don't know what the M2 periods are, a plethora of counter cases could be indicative. <snip> Still regarding M2 blocks: if two adjacent M2 blocks *can* cause identical adjacent machines states, wouldn't we expect more such adjacent machine states in Exhibit 1 at M2 boundaries? >> This is an area where more work needs to be done to clarify the issue. I intend to explore this area and hopefully shed some light on it in a week or two. << In table 7, N=14 shows that C98U had two (2) occurrences. How odd! Is this correct? >> Yes, this is correct. The disk drive for C98U is simply a long, multi-sector "belt" with steps of 1, 2, and 4 randomly distributed over the belt. Byrne's drive mechanism seems to work more efficiently than this. << Do any of the models you mention explain how, say, there are five consecutive repeat machine states which suddenly stops matching? What keying factor would have allowed five consecutive states to be identical, but then suddenly become non-identical? >> It comes down to the fact that there are only a few possibilities for movement at each step. The existence of repeated short sequences is evidence that the 1-2-4 stepping model for Byrne's device is correct, since for combinations of 4 steps (as in Rows 22-34 and 43-55) there are only 3 to the 4th power, or 81, possible stepping sequences. In terms of C98U, this means that at various places in the drive belt there are sequences of four sectors which have exactly the same sequence of teeth. When these sequences come into alignment at just the right place in Exhibit 1, the result is four consecutive Pt/Ct identities. The run is ended because the fifth sectors in each sequence do not have the same number of teeth. The downside of this is that autokey alone is unlikely to explain the repeats, since it seems likely that there must be a hidden component, or scrambler, which takes pt/ct combinations as a signal, but has its own internal cycle to further disrupt the key stream. |
Hi Jeff, First, many thanks for your e-mail with answers to my list of questions. The goal of this letter is to update you that, after investigating C98, I believe it leads to contradictions. The conclusion is that Byrne's mechanism is not C98. This now leaves C98U. Here's what I found with C98. (*) Let's select a specific block size, say, N=55. This gives us many blocks of 55 characters each. (*) For each block of N characters we list all occurrences of identical plaintext letters with a distance of 1, 2, 3, 4, 5, or 6. We note the two corresponding ciphertext letter, c1 and c2. (*) We can derive information not only from adjacent identical plaintext letters, but even identical plaintext letters up to 6 places apart. This is because the Markov Process of 1, 2, or 4 shifts leaves a discernible pattern. (*) If the two identical plaintext letters have a distance of 1, then c1 and c2 are either 1, 2, or 4 places apart on the periodic cipher slide. (*) If the two identical plaintext letters have a distance of 2, then c1 and c2 are either 2, 3, 4, 5, 6, or 8 places apart. (*) If the two identical plaintext letters have a distance of 3, then c1 and c2 are either 3, 4, 5, 6, 7, 8, 9, 10, or 12 places apart. (*) Etc. for distances of 4, 5, and 6 (*) As a rule: -------- ----------------------------------------------------------------- Distance Possible places apart on the cipher slide 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 -------- ----------------------------------------------------------------- 1 x x x 2 x x x x x x 3 x x x x x x x x x 4 x x x x x x x x x x x x 5 x x x x x x x x x x x x x x x 6 x x x x x x x x x x x x x x x x x x In other words, an interval of <i> means the possible distances of the two ciphertext letters are spaced: i, i+1, i+2, ... 4i where 4i-1 is not included in the sequence. It's an interesting characteristic that at distance I, given steps of 1, 2, and 4, there is an interval that cannot be reached using any combination of the steps. For example, at distance 2, using 1, 2, and 4, one cannot create a sum of steps totaling 7. This is true for all distances up to 6. (*) Assuming a block size of 55 and examining line 234 of Exhibit 1 we get the following: Line 234: VETHEIRLIVESTHATTHATNATIONMIGHTLIVEWITISALTOGETHERFITTI NWBJJNPNCWSOTTOIBBMNZTKFMBMTWPRHJZGWFARDJCDCEKLLPYDUHMX Collating identical plaintext letters within a range of 1-6 for line 234 gives us: Occurrence CT Letters ---------- ---------- PT 1st 2nd Distance 1st 2nd -- ---------- -------- ---------- E 2 5 3 W J E 5 11 6 J S I 6 9 3 N C T 13 16 3 T I T 13 17 4 T B H 14 18 4 T B A 15 19 4 O M T 16 17 1 I B T 16 20 4 I N T 17 20 3 B N T 17 23 6 B K A 19 22 3 M T T 20 23 3 N K N 21 26 5 Z B I 24 28 4 F T I 28 33 5 T J I 33 37 4 J F I 33 39 6 J R I 37 39 2 F R T 38 43 5 A D T 43 47 4 D L E 46 49 3 K P T 47 53 6 L H I 52 55 3 U X T 53 54 1 H M These results can be presented as a directed graph (see the attached GIF). The important thing is to check for contradictions. There are two general contradictions I can think of: Contr-1: "x--(1)-->y" and "x--(5/6)-->y" Contr-2: "x--(1)-->y" and "y--(1/2/3/4/5)-->x" "x--(2)-->y" and "y--(1/2/3/4)-->x" "x--(3)-->y" and "y--(1/2/3)-->x" "x--(4)-->y" and "y--(1/2/3)-->x" "x--(5)-->y" and "y--(1/2)-->x" "x--(6)-->y" and "y--(1)-->x" [Ed. 19 December 2009: Mike Cowan correctly pointed out that the last three lines are incorrect. The above should read as follows: Contr-1: "x--(1)-->y" and "x--(5/6)-->y" Contr-2: "x--(1)-->y" and "y--(1/2/3/4/5)-->x" "x--(2)-->y" and "y--(1/2/3/4)-->x" "x--(3)-->y" and "y--(1/2/3)-->x" "x--(4)-->y" and "y--(1/2)-->x" "x--(5)-->y" and "y--(1)-->x" Rerunning the tests, however, produced the identical results: all block sizes from 5 to 1010 show contradictions. For some block sizes there were fewer contradictions because of the corrections, but all block sizes are still ruled out.] Contradiction #1 has two assertions that do not overlap, while Contradiction #2 looks at the interval between two CT letters and their reversal. In both these cases the ct letters "x" and "y" are to be positioned in contradictory positions on the periodic cipher slide. In the line 234 results above, there are no contradictions, and the double occurrence of "T--(4)-->B" is reassuring. Block size 55, however, does have 14 contradictory blocks. Here are a few of them (see the attached ZIP for a full printout of block sizes 5 - 110): Line 9 Occurrence CT Letters ---------- ---------- PT 1st 2nd Distance 1st 2nd -- ---------- -------- ---------- L 2 3 1 U T O 5 6 1 F K Q 8 9 1 S G O 16 20 4 V Y O 37 40 3 O J T 39 45 6 E Y E 44 47 3 G S CONTRADICTION: G--(?)-->S and S--(?)-->G R 49 52 3 H T Line 124 Occurrence CT Letters ---------- ---------- PT 1st 2nd Distance 1st 2nd -- ---------- -------- ---------- E 1 4 3 V M H 5 8 3 X Q H 5 10 5 X H H 8 10 2 Q H H 10 15 5 H W T 14 17 3 L J A 16 19 3 Q S A 19 25 6 S W N 20 23 3 M Y R 26 30 4 O H E 27 31 4 D N S 34 37 3 W D O 36 41 5 N E S 37 42 5 D Z F 44 45 1 D Q E 53 54 1 D Z CONTRADICTION: D--(1)-->Z and D--(5/6)-->Z Line 192 Occurrence CT Letters ---------- ---------- PT 1st 2nd Distance 1st 2nd -- ---------- -------- ---------- E 3 8 5 R O H 7 9 2 Q Y H 7 12 5 Q Z E 8 14 6 O T H 9 12 3 Y Z S 13 16 3 C J E 14 20 6 T R A 15 21 6 T R A 21 23 2 R O A 21 27 6 R E R 22 24 2 D A A 23 27 4 O E A 23 29 6 O D S 26 32 6 T O A 27 29 2 E D T 33 34 1 R O CONTRADICTION: R--(1)-->O and R--(5/6)-->O R 38 44 6 T F T 43 47 4 B R O 48 52 4 Z U E 50 54 4 X P These contradictions are enough to demolish block size 55. I expanded my script (see attached Perl script [change ".txt" suffix to ".pl" after downloading]) to run through all block sizes from 5 to 110, producing a brief output noting which ones have contradictions. Here are the results: $>perl ProcessC98Blocks.pl -pt raw\exhibit-1.pt.txt -ct raw\exhibit-1.ct.txt -blocksize-from 5 -blocksize-to 110 -printsummary Session options =============== Plaintext file: raw\exhibit-1.pt.txt Ciphertext file: raw\exhibit-1.ct.txt Block size (from): 5 Block size (to): 110 Print Matches: false Print Summary: true Blocksize 5: Blocksize 6: Blocksize 7: Blocksize 8: Blocksize 9: Blocksize 10: Blocksize 11: Blocksize 12: Blocksize 13: Blocksize 14: Blocksize 15: Blocksize 16: Blocksize 17: Blocksize 18: Blocksize 19: Blocksize 20: CONTRADICTION ( 2 found) Blocksize 21: Blocksize 22: CONTRADICTION ( 2 found) Blocksize 23: CONTRADICTION ( 1 found) Blocksize 24: CONTRADICTION ( 1 found) Blocksize 25: CONTRADICTION ( 2 found) Blocksize 26: CONTRADICTION ( 1 found) Blocksize 27: CONTRADICTION ( 3 found) Blocksize 28: CONTRADICTION ( 2 found) Blocksize 29: CONTRADICTION ( 2 found) Blocksize 30: CONTRADICTION ( 3 found) Blocksize 31: CONTRADICTION ( 3 found) Blocksize 32: CONTRADICTION ( 2 found) Blocksize 33: CONTRADICTION ( 5 found) Blocksize 34: CONTRADICTION ( 5 found) Blocksize 35: CONTRADICTION ( 5 found) Blocksize 36: CONTRADICTION ( 7 found) Blocksize 37: CONTRADICTION ( 3 found) Blocksize 38: CONTRADICTION ( 5 found) Blocksize 39: CONTRADICTION ( 6 found) Blocksize 40: CONTRADICTION ( 5 found) Blocksize 41: CONTRADICTION ( 6 found) Blocksize 42: CONTRADICTION ( 6 found) Blocksize 43: CONTRADICTION ( 6 found) Blocksize 44: CONTRADICTION ( 7 found) Blocksize 45: CONTRADICTION ( 7 found) Blocksize 46: CONTRADICTION ( 8 found) Blocksize 47: CONTRADICTION ( 4 found) Blocksize 48: CONTRADICTION ( 5 found) Blocksize 49: CONTRADICTION ( 8 found) Blocksize 50: CONTRADICTION ( 8 found) Blocksize 51: CONTRADICTION ( 10 found) Blocksize 52: CONTRADICTION ( 6 found) Blocksize 53: CONTRADICTION ( 11 found) Blocksize 54: CONTRADICTION ( 11 found) Blocksize 55: CONTRADICTION ( 14 found) Blocksize 56: CONTRADICTION ( 11 found) Blocksize 57: CONTRADICTION ( 9 found) Blocksize 58: CONTRADICTION ( 11 found) Blocksize 59: CONTRADICTION ( 13 found) Blocksize 60: CONTRADICTION ( 12 found) Blocksize 61: CONTRADICTION ( 12 found) (1 case(s) of multiple contradictions) Blocksize 62: CONTRADICTION ( 11 found) Blocksize 63: CONTRADICTION ( 12 found) (1 case(s) of multiple contradictions) Blocksize 64: CONTRADICTION ( 13 found) Blocksize 65: CONTRADICTION ( 13 found) Blocksize 66: CONTRADICTION ( 12 found) Blocksize 67: CONTRADICTION ( 16 found) (1 case(s) of multiple contradictions) Blocksize 68: CONTRADICTION ( 13 found) Blocksize 69: CONTRADICTION ( 14 found) (1 case(s) of multiple contradictions) Blocksize 70: CONTRADICTION ( 20 found) Blocksize 71: CONTRADICTION ( 16 found) (1 case(s) of multiple contradictions) Blocksize 72: CONTRADICTION ( 16 found) (2 case(s) of multiple contradictions) Blocksize 73: CONTRADICTION ( 16 found) (1 case(s) of multiple contradictions) Blocksize 74: CONTRADICTION ( 14 found) (1 case(s) of multiple contradictions) Blocksize 75: CONTRADICTION ( 18 found) Blocksize 76: CONTRADICTION ( 18 found) (2 case(s) of multiple contradictions) Blocksize 77: CONTRADICTION ( 13 found) Blocksize 78: CONTRADICTION ( 15 found) Blocksize 79: CONTRADICTION ( 15 found) Blocksize 80: CONTRADICTION ( 17 found) (1 case(s) of multiple contradictions) Blocksize 81: CONTRADICTION ( 16 found) (1 case(s) of multiple contradictions) Blocksize 82: CONTRADICTION ( 20 found) Blocksize 83: CONTRADICTION ( 15 found) Blocksize 84: CONTRADICTION ( 20 found) (3 case(s) of multiple contradictions) Blocksize 85: CONTRADICTION ( 20 found) (4 case(s) of multiple contradictions) Blocksize 86: CONTRADICTION ( 27 found) (1 case(s) of multiple contradictions) Blocksize 87: CONTRADICTION ( 24 found) (3 case(s) of multiple contradictions) Blocksize 88: CONTRADICTION ( 23 found) (1 case(s) of multiple contradictions) Blocksize 89: CONTRADICTION ( 26 found) (4 case(s) of multiple contradictions) Blocksize 90: CONTRADICTION ( 26 found) (1 case(s) of multiple contradictions) Blocksize 91: CONTRADICTION ( 18 found) Blocksize 92: CONTRADICTION ( 26 found) (3 case(s) of multiple contradictions) Blocksize 93: CONTRADICTION ( 24 found) (1 case(s) of multiple contradictions) Blocksize 94: CONTRADICTION ( 23 found) (2 case(s) of multiple contradictions) Blocksize 95: CONTRADICTION ( 26 found) (3 case(s) of multiple contradictions) Blocksize 96: CONTRADICTION ( 20 found) (2 case(s) of multiple contradictions) Blocksize 97: CONTRADICTION ( 23 found) (1 case(s) of multiple contradictions) Blocksize 98: CONTRADICTION ( 27 found) (3 case(s) of multiple contradictions) Blocksize 99: CONTRADICTION ( 30 found) (2 case(s) of multiple contradictions) Blocksize 100: CONTRADICTION ( 28 found) (4 case(s) of multiple contradictions) Blocksize 101: CONTRADICTION ( 27 found) (4 case(s) of multiple contradictions) Blocksize 102: CONTRADICTION ( 29 found) (6 case(s) of multiple contradictions) Blocksize 103: CONTRADICTION ( 31 found) (4 case(s) of multiple contradictions) Blocksize 104: CONTRADICTION ( 34 found) (5 case(s) of multiple contradictions) Blocksize 105: CONTRADICTION ( 30 found) (4 case(s) of multiple contradictions) Blocksize 106: CONTRADICTION ( 30 found) (3 case(s) of multiple contradictions) Blocksize 107: CONTRADICTION ( 38 found) (8 case(s) of multiple contradictions) Blocksize 108: CONTRADICTION ( 30 found) (4 case(s) of multiple contradictions) Blocksize 109: CONTRADICTION ( 32 found) (6 case(s) of multiple contradictions) Blocksize 110: CONTRADICTION ( 37 found) (4 case(s) of multiple contradictions) Finished! Assuming the pt/ct in Exhibit 1 are correct, C98 is an option only for block sizes 5-19 and 21. Unless the block size is as small as these, my conclusion is that C98 as presented in your paper is not used in Exhibit1. This is encouraging because it allows one to concentrate now on C98A and C98U. Have you reached a similar conclusion re C98? Best regards, Moshe |
Hi
Moshe, Thank you for a very interesting analysis. I will need some time to read this carefully and consider the consequences. I'm not sure that I will have it completely evaluated before next weekend. However, if I understand the main thesis of your analysis, what you seem to have invalidated is the possibility that C98 uses simple probabilities of 1/2, 1/4, and 1/4. One reason that I included both MM1 and MM2 in my paper is that this shows that there are more ways than one to interpret the Exhibit 1 repeat frequencies. As far as which models in the C98-series are still viable, I am sure, as I wrote in my paper, that C94 is not. I am also sure that C98A is not viable based on analysis similar to that in Table 7 of my paper. The main feature that gives either C98 or C98U a chance is that both divide the text up into M2 periods. However, I doubt that C98 is a model of Byrne's device even though it comes closest to matching Langen's description. My doubts stem from the fact that the M1 disk must also serve as the KY disk which, as your analysis confirms, exposes the Key transitions to direct analysis. Best regards, Jeff |
Hi
Moshe, You have proven that C98 cannot be stepping with simple probabilities of 1/2, 1/4/ and 1/4, but the question is, which do we reject, C98, M2 periods of 55 letters, or the simple stepping probabilities? After reviewing the 14 pt/ct matches that your analysis located, I believe that these can be reconciled with C98 by using a Markov model that includes steps of 3, 5, 6, and 7 steps in addition to steps of 1, 2, and 4, thereby giving up the simple probabilities. However, I prefer to keep my options open on this question and not choose between C98 and simple probabilities until further research has clarified the issue. I find after reviewing my Cryptograph simulators that the 55-letter block will have to be given up. I wrote my paper C:A&M as a concept piece and was focused on what I consider the essential clues found in Silent Years and the Chaocipher Exhibits. By not going into greater detail about the construction of the C98 series of cryptographs and the statistics generated by them, I overlooked the fact that the M2 period has to be much longer than 55 letters in order for C98 and C98U to replicate the probabilities derived from Exhibit 1. My apologies for any confusion this may have caused. Best regards, Jeff |
Hi
Moshe,
I've found your website really excellent as a source of interesting and useful information on Chaocipher and it's good to have it all in one place. In trying to get thoughts in order on how to tackle this one, I've drafted a strategy on which I'd much appreciate comments, criticisms, additions...or whatever may occur to you and other folk. This is attached. It includes one or two aspects not previously reported, that may be of interest. For me the hunt is now on for a machine that churns out the right kind of ciphertext! As Jeff Hill has commented, once we've found that, cryptanalysis to find Byrne's keys should be possible. Look forward to hearing findings on a Wheatstone-type machine when you're ready. Best, Mike. |
Hi
Moshe,
I think I'm beginning to see why this is so difficult for us but why Friedman really wasn't interested. I can see that with 2 cipher wheels the machine will use [676] different alphabets during enciphering. The order it uses these alphabets will be set by some kind of mechanism that advances one of the disks by a different amount after each encipherment -- for example in a cycle of 1,2,1,4 as Jeff wrote about -- and the other disk at less frequent intervals. With so many alphabets there are less than 10 letters enciphered with any one alphabet in the 5500 cipher letters we've been given -- and we can't find out which letters belong to each alphabet because we don't know the advancement pattern. However if we stole the machine, as the French stole Enigma, we'd be laughing. Each of the [676] cipher disk positions could be tried in little time. In a computer driven world, you could think of having daily changes to the mixed alphabets and the advancements but I doubt this was a practical proposition for Byrne's machine in the 1920's. Friedman must have thought he had better devices available in-house, and taken the easy way out to put Byrne down gently. All that doesn't help us! so on with the merry activity. . . . Best, Mike. |