c.d.f.
In statistics, cumulative distribution function: a function which gives the probability of obtaining a particular value or lower.
CFB
CFB or Ciphertext FeedBack is an operating mode for a block cipher.
CFB is closely related to OFB, and is intended to provide some of the characteristics of a stream cipher from a block cipher. CFB generally forms an autokey stream cipher. CFB is a way of using a block cipher to form a random number generator. The resulting pseudorandom confusion sequence can be combined with data as in the usual stream cipher.
CFB assumes a shift register of the block cipher block size. An IV or initial value first fills the register, and then is ciphered. Part of the result, often just a single byte, is used to cipher data, and the resulting ciphertext is also shifted into the register. The new register value is ciphered, producing another confusion value for use in stream ciphering.
One disadvantage of this, of course, is the need for a full block-wide ciphering operation, typically for each data byte ciphered. The advantage is the ability to cipher individual characters, instead of requiring accumulation into a block before processing.
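The shift-register mechanism described above can be sketched as follows. A trivial keyed byte-mixing function stands in for the real block cipher here (it is not secure and is purely illustrative); everything else follows the text: the IV fills the register, the register is ciphered, one byte of the result ciphers one data byte, and the ciphertext byte is shifted back in.

```python
BLOCK = 8  # assumed block size in bytes, for illustration

def toy_block_cipher(block: bytes, key: int) -> bytes:
    # Placeholder for a real block cipher: NOT secure.
    return bytes((b ^ ((key + i * 37) & 0xFF)) for i, b in enumerate(block))

def cfb_encrypt(plaintext: bytes, iv: bytes, key: int) -> bytes:
    register = bytearray(iv)                   # IV first fills the register
    out = bytearray()
    for p in plaintext:
        keystream = toy_block_cipher(bytes(register), key)
        c = p ^ keystream[0]                   # one result byte ciphers one data byte
        out.append(c)
        register = register[1:] + bytes([c])   # ciphertext shifts into the register
    return bytes(out)

def cfb_decrypt(ciphertext: bytes, iv: bytes, key: int) -> bytes:
    register = bytearray(iv)
    out = bytearray()
    for c in ciphertext:
        keystream = toy_block_cipher(bytes(register), key)
        out.append(c ^ keystream[0])
        register = register[1:] + bytes([c])   # same feedback as on encryption
    return bytes(out)
```

Note that deciphering feeds the received ciphertext (not the recovered plaintext) back into the register, which is why the two directions stay synchronized.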
Chain
An operation repeated in a sequence, such that each result depends upon the previous result, or an initial value. One example is the CBC operating mode.
Chaos
The unexpected ability to find numerical relationships in physical processes formerly considered random. Typically these take the form of iterative applications of fairly simple computations. In a chaotic system, even tiny changes in state eventually lead to major changes in state; this is called "sensitive dependence on initial conditions." It has been argued that every good computational random number generator is "chaotic" in this sense.
In physics, the "state" of an analog physical system cannot be fully measured, which always leaves some remaining uncertainty to be magnified on subsequent steps. And, in many cases, a physical system may be slightly affected by thermal noise and thus continue to accumulate new information into its "state."
In a computer, the state of the digital system is explicit and complete, and there is no uncertainty. No noise is accumulated. All operations are completely deterministic. This means that, in a computer, even a "chaotic" computation is completely predictable and repeatable.
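Both points can be demonstrated with the logistic map, a standard example of a chaotic iteration (the map itself is not from this entry; it is used here only to illustrate sensitive dependence and determinism):

```python
def iterate(x, r=3.99, steps=60):
    # Logistic map x' = r*x*(1-x), iterated a fixed number of steps.
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

a = iterate(0.4)           # one starting point
b = iterate(0.4 + 1e-10)   # a tiny perturbation of it
# a and b typically differ grossly after 60 steps: sensitive
# dependence on initial conditions.  Yet rerunning iterate(0.4)
# always yields exactly the same a: fully deterministic.
```

The tiny initial difference is magnified at each step until the two trajectories are unrelated, while each individual trajectory remains perfectly repeatable, exactly as the entry states.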
Chi-Square
In statistics, a goodness-of-fit test used for comparing two distributions. Mainly used on nominal and ordinal measurements. Also see: Kolmogorov-Smirnov.
In the usual case, many independent samples are counted by category or separated into value-range "bins." The reference distribution gives us the number of values to expect in each bin. Then we compute a X2 test statistic related to the difference between the distributions:
X2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )
("SQR" is the squaring function, and we require that each expectation not be zero.) Then we use a tabulation of chi-square statistic values to look up the probability that a particular X2 value or lower (in the c.d.f.) would occur by random sampling if both distributions were the same. The statistic also depends upon the "degrees of freedom," which is almost always one less than the final number of bins. See the chi-square section of the Ciphers By Ritter / JavaScript computation pages.
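The formula above translates directly into code. The bin counts below are made-up illustrative numbers, not from any real experiment:

```python
def chi_square(observed, expected):
    # X2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )
    assert all(e > 0 for e in expected)   # each expectation must be nonzero
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12,  8, 11,  9]   # counts actually found in each bin
expected = [10, 10, 10, 10]   # counts the reference distribution predicts
x2 = chi_square(observed, expected)   # 0.4 + 0.4 + 0.1 + 0.1 = 1.0
# degrees of freedom = 4 bins - 1 = 3; look up X2 = 1.0 with 3 d.f.
# in a chi-square table (or c.d.f. routine) for the probability
```

Note the total (1.0) is near the rule-of-thumb expectation of roughly one count of chi-square per bin for matching distributions, discussed further below.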
The c.d.f. percentage for a particular chi-square value is the area of the statistic distribution to the left of the statistic value; this is the probability of obtaining that statistic value or less by random selection when testing two distributions which are exactly the same. Repeated trials which randomly sample two identical distributions should produce about the same number of X2 values in each quarter of the distribution (0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we repeatedly find only very high percentage values, we can assume that we are probing different distributions. And even a single very high percentage value would be a matter of some interest.
Any statistic probability can be expressed either as the proportion of the area to the left of the statistic value (this is the "cumulative distribution function" or c.d.f.), or as the area to the right of the value (this is the "upper tail"). Using the upper tail representation for the X2 distribution can make sense because the usual chi-square test is a "one tail" test where the decision is always made on the upper tail. But the "upper tail" has an opposite "sense" to the c.d.f., where higher statistic values always produce higher percentage values. Personally, I find it helpful to describe all statistics by their c.d.f., thus avoiding the use of a wrong "polarity" when interpreting any particular statistic. While it is easy enough to convert from the c.d.f. to the complement or vice versa (just subtract from 1.0), we can base our arguments on either form, since the statistical implications are the same.
It is often unnecessary to use a statistical test if we just want to know whether a function is producing something like the expected distribution: we can look at the binned values and generally get a good idea about whether the distributions change in similar ways at similar places. A good rule-of-thumb is to expect chi-square totals similar to the number of bins, but distinctly different distributions often produce huge totals far beyond the values in any table, and computing an exact probability for such cases is simply irrelevant. On the other hand, it can be very useful to perform 20 to 40 independent experiments to look for a reasonable statistic distribution, rather than simply making a "yes / no" decision on the basis of what might turn out to be a rather unusual result.
Since we are accumulating discrete bin-counts, any fractional expectation will always differ from any actual count. For example, suppose we expect an even distribution, but have many bins and so only accumulate enough samples to observe about 1 count for every 2 bins. In this situation, the absolute best sample we could hope to see would be something like (0,1,0,1,0,1,...), which would represent an even, balanced distribution over the range. But even in this best possible case we would still be off by half a count in each and every bin, so the chi-square result would not properly characterize this best possible sequence. Accordingly, we need to accumulate enough samples so that the quantization which occurs in binning does not appreciably affect the accuracy of the result. Normally I try to expect at least 10 counts in each bin.
But when we have a reference distribution that trails off toward zero, inevitably there will be some bins with few counts. Taking more samples will just expand the range of bins, some of which will be lightly filled in any case. We can avoid quantization error by summing both the observations and expectations from multiple bins, until we get a reasonable expectation value (again, I like to see 10 counts or more). In this way, the "tails" of the distribution can be more properly (and legitimately) characterized.
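The bin-merging idea can be sketched like this; the threshold of 10 follows the text, and the trailing-off counts are invented for illustration:

```python
def merge_bins(observed, expected, min_expect=10.0):
    # Sum adjacent observations and expectations until each merged
    # bin expects at least min_expect counts (10, per the text).
    obs_out, exp_out = [], []
    o_acc = e_acc = 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expect:
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc = e_acc = 0.0
    if e_acc > 0:                 # fold any leftover tail into the last bin
        obs_out[-1] += o_acc
        exp_out[-1] += e_acc
    return obs_out, exp_out

obs = [40, 30, 15, 8, 4, 2, 1]   # a distribution trailing off toward zero
exp = [38, 29, 16, 9, 5, 2, 1]
m_obs, m_exp = merge_bins(obs, exp)
# every merged expectation is now >= 10, so the tail bins can be
# legitimately included in the chi-square sum
```

Since the same bins are merged in both the observed and expected lists, the comparison between the two distributions remains valid; only the degrees of freedom (merged bins minus one) change.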
Cipher
In general, a key-selected secret transformation between plaintext and ciphertext. Specifically, a secrecy mechanism or process which operates on individual characters or bits independent of semantic content. As opposed to a secret code, which generally operates on words, phrases or sentences, each of which may carry some amount of complete meaning. Also see: cryptography, block cipher, stream cipher, a cipher taxonomy, and substitution.
A good cipher can transform secret information into a multitude of different intermediate forms, each of which represents the original information. Any of these intermediate forms or ciphertexts can be produced by ciphering the information under a particular key value. The intent is that the original information only be exposed by one of the many possible keyed interpretations of that ciphertext. Yet the correct interpretation is available merely by deciphering under the appropriate key.
A cipher appears to reduce the protection of secret information to enciphering under some key, and then keeping that key secret. This is a great reduction of effort and potential exposure, and is much like keeping your valuables in your house, and then locking the door when you leave. But there are also similar limitations and potential problems.
With a good cipher, the resulting ciphertext can be stored, transmitted, or otherwise exposed without also exposing the secret information hidden inside. This means that ciphertext can be stored in, or transmitted through, systems which have no secrecy protection. For transmitted information, this also means that the cipher itself must be distributed in multiple places, so in general the cipher cannot be assumed to be secret. With a good cipher, only the deciphering key need be kept secret.
A Cipher Taxonomy
For the analysis of cipher operation it is useful to collect ciphers into groups based on their functioning (or intended functioning). The goal is to group ciphers which are essentially similar, so that as we gain an understanding of one cipher, we can apply that understanding to others in the same group. We thus classify not by the components which make up the cipher, but instead on the "black-box" operation of the cipher itself.
We seek to hide distinctions of size, because operation is independent of size, and because size effects are usually straightforward. We thus classify serious block ciphers as keyed simple substitution, just like newspaper amusement ciphers, despite their obvious differences in strength and construction. This allows us to compare the results from an ideal tiny cipher to those from a large cipher construction; the grouping thus can provide benchmark characteristics for measuring large cipher constructions.
We could of course treat each cipher as an entity unto itself, or relate ciphers by their dates of discovery, the tree of developments which produced them, or by known strength. But each of these criteria is more or less limited to telling us "this cipher is what it is." We already know that. What we want to know is what other ciphers function in a similar way, and then whatever is known about those ciphers. In this way, every cipher need not be an island unto itself, but instead can be judged and compared in a related community of similar techniques.
Our primary distinction is between ciphers which handle all the data at once (block ciphers), and those which handle some, then some more, then some more (stream ciphers). We thus see the usual repeated use of a block cipher as a stream meta-cipher which has the block cipher as a component. It is also possible for a stream cipher to be re-keyed or re-originate frequently, and so appear to operate on "blocks." Such a cipher, however, would not have the overall diffusion we normally associate with a block cipher, and so might usefully be regarded as a stream meta-cipher with a stream cipher component.
The goal is not to give each cipher a label, but instead to seek insight. Each cipher in a particular general class carries with it the consequences of that class. And because these groupings ignore size, we are free to generalize from the small to the large and so predict effects which may be unnoticed in full-size ciphers.
A BLOCK CIPHER
A block cipher requires the accumulation of some amount of data or multiple data elements for ciphering to complete. (Sometimes stream ciphers accumulate data for convenience, as in cylinder ciphers, which nevertheless logically cipher each character independently.)