Because we had already established the theory [29], we give an overview of this research here. Through much experience with discriminant analysis, we found four severe problems in addition to two facts [27,28]. Researchers who discriminate real data can easily find these problems; however, nobody has pointed them out. The reason is that, even when the actual research subject does not follow a normal distribution, researchers keep working in a restricted world that assumes one. After solving the four problems, we took up Problem5 as applied research of the theory and solved it within 54 days in 2015. In this paper, we introduce cancer gene diagnosis by microarrays. However, researchers must understand MP in addition to statistics and challenge the analysis of real data. We describe the five problems of discriminant theory as follows.
2.1.1 Problem1: Defect of NM Solved by Fact1
Let us consider a two-class discrimination that consists of n cases (n1 cases belong to class1 and n2 cases belong to class2) and p independent variables x. We set yi = 1 for class1 and yi = -1 for class2 for each xi. Equation (1) defines the LDF f(x), where b is the p-dimensional vector of discriminant coefficients and b0 is the intercept.
f(x) = b^t x + b0    (1)
Most researchers erroneously believe the discriminant rule is as follows: if f(xi) ≥ 0, xi belongs to class1; if f(xi) < 0, xi belongs to class2. This understanding is completely wrong. The true discriminant rule is as follows:
(1) If yi * f(xi) > 0, xi is correctly classified to class1 or class2.
(2) If yi * f(xi) < 0, xi is misclassified to class1 or class2.
(3) If yi * f(xi) = 0, we cannot decide to which class xi belongs.
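The true rule above reduces to counting the sign of yi * f(xi). A minimal Python sketch (the function name and toy scores are ours, not from the paper) returns the numbers of correct, misclassified, and undecidable cases:

```python
import numpy as np

def classify_errors(f_vals, y):
    """Count cases under the true discriminant rule.

    f_vals: discriminant scores f(x_i); y: labels, +1 for class1, -1 for class2.
    Returns (n_correct, n_misclassified, n_undecidable).
    """
    prod = y * f_vals
    correct = int(np.sum(prod > 0))      # rule (1): y_i * f(x_i) > 0
    wrong = int(np.sum(prod < 0))        # rule (2): y_i * f(x_i) < 0
    undecided = int(np.sum(prod == 0))   # rule (3): case on the hyperplane
    return correct, wrong, undecided

# toy scores: two correct, one misclassified, one on the hyperplane
f = np.array([2.0, -1.5, 0.5, 0.0])
y = np.array([1, -1, -1, 1])
print(classify_errors(f, y))  # -> (2, 1, 1)
```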
Although Problem1 had remained unresolved, the formulation of IP-OLDF revealed the relation between NMs and LDF coefficients (Fact1) and solved Problem1 [24]. NM has many defects, as follows:
(1) Different LDFs, such as Fisher's LDF [12], logistic regression [9,11], regularized discriminant analysis (RDA) [14], four OLDFs, and three SVMs, give us different NMs.
(2) If we change the discriminant hyperplane, we obtain different NMs.
(3) If k cases lie on the discriminant hyperplane (f(xi) = 0), the true NM may increase to (NM + k).
Strangely, although our claim is logical and simple, no one has pointed this out.
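Defect (3) can be seen on a toy score vector. In this sketch (our illustration, not the authors' code), the naive rule "f(xi) ≥ 0 implies class1" silently counts hyperplane cases of class1 as correct, whereas in the worst case all k undecidable cases are errors:

```python
import numpy as np

f = np.array([1.2, -0.8, 0.0, 0.0, 0.0])   # k = 3 cases sit on the hyperplane
y = np.array([1, -1, 1, -1, 1])

nm = int(np.sum(y * f < 0))   # NM over the decided cases only
k = int(np.sum(f == 0))       # undecidable cases on the hyperplane
print(nm, k, nm + k)          # worst case: all k misclassified, NM + k errors

# the naive rule "f >= 0 -> class1" hides most of these potential errors
naive_nm = int(np.sum(np.where(f >= 0, 1, -1) != y))
print(naive_nm)               # reports only 1 error instead of up to NM + k = 3
```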
On the other hand, Miyake and Shinmura developed a heuristic optimal LDF (OLDF) based on the MNM criterion after investigating the error rate of LDF [19]. However, statistical journals rejected our paper on the grounds that the MNM criterion overfitted and overestimated the sample. A medical engineering journal accepted our paper after more than four years [20]. MNM, not NM, is the most important statistic for LSD-discrimination.
2.1.2 Problem2: LSD-Discrimination
Vapnik [40] defines LSD-discrimination clearly. We modify his idea as follows:
(1) If yi * f(xi) ≥ 1, xi is correctly classified to class1 or class2.
(2) No case lies in the range -1 < f(xi) < 1.
We consider the two support vectors SV = ±1 instead of the discriminant hyperplane f(x) = 0. This definition is better than the true discriminant rule above. We first defined IP-OLDF. However, it cannot find the true optimal convex polyhedron (OCP) introduced in Fig. 1 if the data do not satisfy the general position. Thus, we developed RIP, which looks for an inner point directly. However, referees of Japanese and European journals rejected our papers, arguing that LSD-discrimination was easy and that the purpose of discriminant analysis was to discriminate overlapping data. Their arguments were not logical because they did not know the statistic MNM, which clearly distinguishes LSD (MNM = 0) from overlapping data (MNM > 0); we were the first to define LSD by MNM = 0. Moreover, nobody pointed out that microarrays are LSD, because the importance of LSD-discrimination studies is not known. This is the first reason why Problem5 remained unsolved: the fundamental knowledge of discriminant theory has been ignored until now.
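Conditions (1) and (2) can be checked directly for a given f. A minimal sketch (the function name and scores are ours) tests whether every case satisfies yi * f(xi) ≥ 1, i.e., whether no case falls strictly between the two support vectors:

```python
import numpy as np

def is_lsd(f_vals, y):
    """Check Vapnik's condition: every case satisfies y_i * f(x_i) >= 1,
    so no case falls strictly between the support vectors SV = -1 and SV = +1."""
    return bool(np.all(y * f_vals >= 1))

# all scores outside the band (-1, 1) on the correct side: the condition holds
print(is_lsd(np.array([1.5, 2.0, -1.0, -3.2]), np.array([1, 1, -1, -1])))  # -> True
# one score inside the band: this f does not separate the classes with margin 1
print(is_lsd(np.array([1.5, 0.4, -1.0, -3.2]), np.array([1, 1, -1, -1])))  # -> False
```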
2.1.3 Problem3: The Defect of Generalized Inverse Matrix
The pass/fail judgment of an exam is a self-evident LSD-discrimination. However, when Fisher's LDF discriminated the research data of the university entrance examination center in Japan, the error rate was over 20% in mathematics [26]. Furthermore, when the passing score is 90% of the total score, the quadratic discriminant function (QDF) and RDA misclassify all passing candidates. The reason is that all successful applicants answered some questions correctly, so those variables are constant within the class; this triggers the defect of the generalized inverse matrix. If we add a small random number to each constant value, we can quickly solve it. We wasted three research years on the wrong approach.
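The defect can be reproduced with a toy class in which one variable is constant. This numpy sketch (our illustration with made-up scores, not the examination data) shows the class covariance matrix losing full rank, and recovering it after the small random perturbation mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 passing candidates; question 1 is answered correctly by everyone,
# so that variable is constant and its within-class variance is zero
scores = np.column_stack([np.ones(20), rng.normal(70.0, 10.0, 20)])
cov = np.cov(scores, rowvar=False)
print(np.linalg.matrix_rank(cov))   # -> 1 : singular, QDF falls back on a
                                    #        generalized inverse

# the fix: add a small random number to the constant variable
scores[:, 0] += rng.normal(0.0, 0.01, 20)
cov2 = np.cov(scores, rowvar=False)
print(np.linalg.matrix_rank(cov2))  # -> 2 : full rank, ordinary inverse works
```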
2.1.4 Problem4: Inferential Statistics
Fisher never formulated the standard errors of the error rate and the discriminant coefficients.
Therefore, we developed the 100-fold cross-validation for small samples (Method1) and the best model for model selection, instead of the leave-one-out (LOO) method [18].
We calculate two minimum averages of error rates, M1 and M2, over the 100 training samples and 100 validation samples. Because MNM decreases monotonously (Fact2), the error rate of the full model always becomes M1; thus, M1 is not proper for model selection. M2 of RIP is better than those of the seven other LDFs, such as two other OLDFs, three SVMs, Fisher's LDF, and logistic regression, on six different types of ordinary data. This indicates that the MNM criterion is free from overfitting.
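A minimal sketch of the resampling scheme, under our assumptions (a least-squares fit stands in for Fisher's LDF for two classes, and the full sample stands in for the validation sample; in the paper, M1 and M2 are the minima over candidate models of the two mean error rates):

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ldf(X, y):
    # least-squares regression of y on (1, X); for two classes this is
    # proportional to Fisher's LDF (a stand-in, not the authors' exact code)
    A = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b

def error_rate(b, X, y):
    f = np.column_stack([np.ones(len(X)), X]) @ b
    return float(np.mean(y * f <= 0))

# two toy Gaussian classes, p = 2
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]

train_err, val_err = [], []
for _ in range(100):                        # 100 resampled training samples
    idx = rng.choice(100, 100, replace=True)
    b = fit_ldf(X[idx], y[idx])
    train_err.append(error_rate(b, X[idx], y[idx]))
    val_err.append(error_rate(b, X, y))     # validation stand-in
M1, M2 = np.mean(train_err), np.mean(val_err)
print(round(M1, 3), round(M2, 3))           # mean training / validation error
```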
Although many medical researchers validated their results for Problem5 by LOO, which was developed in the age of weak computing power, we consider that our results need no validation because the two classes are completely separable. We explain this matter later using RatioSV, the other important statistic of LSD-discrimination.
2.1.5 Problem5: Cancer Gene Analysis by Microarray
Golub et al. [15] published their paper in Science in 1999 and confessed that they had started their research about 30 years earlier. Thus, we judge that they started their study around 1970; probably, Prof. Golub is the pioneer of Problem5. Because they developed several unique methods instead of statistical methods, we think they decided that most statistical methods were useless for Problem5, except for cluster analysis. They should have complained to engineering researchers, especially discriminant theorists including us, about the consequences of this failure.
We consider that other, non-medical researchers studied this theme after around 1990, because commercial microarray equipment came into use after 1990. Probably, they did not know the NIH's decision. They published many papers about feature selection and filtering systems to separate signal from noise in high-dimensional microarrays. However, no researchers solved Problem5 entirely. In this research, we explain why they could not succeed in cancer gene diagnosis (Problem6) in addition to cancer gene analysis (Problem5). No researchers found Fact3.
This theme is Problem5. Next, we developed LINGO Program3 to solve Method2 by three OLDFs and three SVMs [34]. RIP and Revised LP-OLDF can decompose a microarray into many SMs (MNM = 0) and the noise gene subspace (MNM > 0).
This is Fact4. Only H-SVM and the two OLDFs can find Fact3; however, only RIP and Revised LP-OLDF can find Fact4. By this breakthrough, because we are free from the curse of high-dimensional data, we can propose cancer gene diagnosis by JMP. We are not specialists in this area; however, because the fact of LSD is a critical signal for cancer gene diagnosis, our results are clear to everyone. Only the cooperation of MP and statistics can succeed in cancer gene analysis and diagnosis. Notably, LINGO [22] and JMP [21] enhanced our intellectual productivity because we introduced LINDO, LINGO, SAS, and JMP into Japan and published many books and papers.
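The MNM = 0 test behind this decomposition can be sketched as a linear program in the style of Revised LP-OLDF (our reading of the criterion, not the authors' LINGO Program3): minimize the sum of slacks e_i subject to yi(b·xi + b0) ≥ 1 − e_i, e_i ≥ 0. An optimum of 0 means the gene subspace is LSD and hence an SM candidate:

```python
import numpy as np
from scipy.optimize import linprog

def lp_oldf_objective(X, y):
    """Revised-LP-OLDF-style check: minimize sum(e_i) subject to
    y_i (b.x_i + b0) >= 1 - e_i, e_i >= 0. Optimum 0 <=> MNM = 0 (LSD)."""
    n, p = X.shape
    # decision variables: b (p, free), b0 (free), e (n, >= 0)
    c = np.r_[np.zeros(p + 1), np.ones(n)]
    # constraint rewritten as: -y_i (b.x_i) - y_i b0 - e_i <= -1
    A_ub = np.hstack([-y[:, None] * X, -y[:, None], -np.eye(n)])
    bounds = [(None, None)] * (p + 1) + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(n), bounds=bounds)
    return res.fun

# 1-D toy "gene": the two classes are separable, so the objective is 0
X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
print(lp_oldf_objective(X, y))   # 0 signals LSD (an SM candidate)
```

Genes can then be removed from subspaces that pass this test and the check repeated, mimicking the decomposition into SMs and a remaining noise subspace.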