Machine intelligence is the study of applying machine learning and artificial intelligence to Big Data Analytics. While Big Data Analytics is a broader term encompassing data, storage and computation, machine intelligence is specialized to the intelligent programs that can be built for big data. Machine intelligence technologies are used for a variety of problem types, such as classification and clustering for natural language processing, and for modeling techniques such as support vector machines and neural networks. Innovations in machine intelligence span technologies, industries, enterprises and societies. Machine intelligence technologies under development include natural language processing, deep learning, predictive modeling, signal processing, computer vision, speech recognition, robotics and augmented reality. Machine intelligence is being applied to industries such as agriculture, education, finance, legal, manufacturing, medical, media, automotive and retail. Within enterprises, machine intelligence is being developed for sales, security, recruitment, marketing and fraud detection.
Computational intelligence refers to the specific types of algorithms that are useful for machine intelligence. Computational intelligence algorithms are derived from ideas in the computational sciences and engineering. The computational sciences study computer systems built from knowledge that mixes programming, mathematics and domains of application. Whereas the focus of computer science is theory and mathematics, the focus of the computational sciences is application and engineering. The data mining tasks in the computational sciences depend on the domain of application and the computational complexity of the application.
Commonly used computational intelligence techniques construct intelligent computing architectures. These architectures commonly involve heuristics built over complex networks, fuzzy logic, probabilistic inference and evolutionary computing. Furthermore, the various computing architectures are integrated in a structured manner to form hybrid architectures. A hybrid architecture models each data mining task with the best-performing technique, on the assumption that a combination of techniques is better than any single one. In the literature, hybrid architectures are also referred to by the umbrella term soft computing architectures. Here, soft computing refers to frameworks of knowledge representation and decision making that explore imprecise strategies that remain computationally tractable on real-world problems. The guiding principle of soft computing and computational intelligence is to provide human-like expertise at a low solution cost. Human-like expertise is provided by validating the computational intelligence against domain knowledge. In modern science, most application domains reason with time-varying data that is uncertain, complex and noisy. A two-step problem-solving strategy is employed to deal with such data. In the first step, a stand-alone intelligent system is built to rapidly experiment with possible developments into a prototype system. On successful completion of the first step, a second step builds a hybrid architecture that is more complex yet stable. A more complex hybrid model could, for example, use an evolutionary algorithm to train a neural network, which in turn acts as a preprocessing system for a fuzzy system that produces the final output. The modeling performance of any one data mining task within the hybrid architecture has a corresponding incremental or detrimental effect on the final performance of the entire solution.
Many machine learning problems reduce to optimization or mathematical programming problems in operations research. In optimization, the problem is to find the set of model parameters that best solve the problem over a solution search space. The optimization algorithm is designed and implemented with respect to standard objective function formulations. In optimization, objective functions typically model the information loss in the problem; hence, objective functions are also called loss functions or cost functions. Typically, the parameters found by minimizing the objective function are also normalized by regularization and relaxation functions that model prior information about the parameters with respect to the records in the data. This parametric normalization is intended to prevent the dominance of one or a few parameters over the rest of the parameters in the modeling process; controlling such parametric dominance is closely related to the bias-variance trade-off in the machine learning literature. Furthermore, constrained optimization imposes constraints on the search for solutions of the objective function.
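As an illustrative sketch of a loss function combined with a regularization term, the snippet below uses ridge regression (an L2 penalty) on a small synthetic data set; the data, penalty weight and closed-form solution are illustrative assumptions, not a method prescribed by the text.

```python
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    """Squared-error loss plus an L2 regularization term.

    The first term models the information loss (prediction error);
    the second penalizes large weights so that no single parameter
    dominates the fitted model.
    """
    residual = X @ w - y
    return residual @ residual + lam * (w @ w)

# Hypothetical data: 4 records, 2 parameters.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Closed-form ridge minimizer: (X^T X + lam*I)^{-1} X^T y
lam = 0.1
w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

The regularized minimizer achieves a lower penalized loss than any other parameter vector, which is the sense in which normalization constrains the search for parameters.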
There are many types of standard objective functions used in constrained optimization. Typical formulations include objectives and constraints that are one or more of linear, quadratic, nonlinear, convex, goal, geometric, integer, fractional, fuzzy, semi-definite, semi-infinite and stochastic [2]. Convex programming optimizes objective functions for which local search and global search arrive at the same solution. In real-world problems, convex programming is preferred as a solution technique because it is computationally tractable on a wide variety of problems.
Novel machine learning methods that use convex optimization include kernel-based ranking, graph-based clustering and structured learning. If the problem cannot be directly mapped onto a convex objective function, machine learning algorithms attempt to decompose the problem into subproblems that can be so mapped. Most convex objectives can be solved easily using standard optimization algorithms. In any case, error is introduced into the solutions found by these machine learning algorithms. The error arises either because only a finite amount of data is available to the algorithm or because the error distribution underlying the optimal solution is unknown in advance. Another source of error is the approximations made by the search technique used in the machine learning algorithm. Since search techniques cannot enumerate all possible solutions within the finite resources of the computer, they reduce the search space by finding only important solutions. The importance of a solution is then formulated as a convex optimization problem through the correct choice of objective functions. In real-world problems, machine learning algorithms require optimization algorithms that have the properties of generalization, scalability, performance, robustness, approximation, theoretically known complexity and simple but accurate implementation [2].
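The defining property of convex objectives, that local search reaches the global optimum from any starting point, can be demonstrated on a toy convex quadratic; the matrix A, vector b, step size and starting points below are illustrative choices.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

def grad(w):
    # Gradient of the convex objective f(w) = ||A w - b||^2
    return 2.0 * A.T @ (A @ w - b)

# Plain gradient descent is a purely local search; for a convex
# objective it converges to the same (global) solution regardless
# of the starting point.
solutions = []
for start in (np.zeros(2), np.array([10.0, -10.0])):
    w = start.copy()
    for _ in range(5000):
        w = w - 0.01 * grad(w)
    solutions.append(w)
```

Both runs converge to the unique solution of A w = b, here w = (2, 3), which is why convex programming is computationally attractive.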
A class of computational intelligence algorithms commonly used by both researchers and engineers for global optimization in machine learning is the evolutionary computing algorithms. Another common class of computational intelligence algorithms is the multiobjective or multicriterion optimization algorithms. In evolutionary computing, evolution is defined as a two-step process of random variation and selection over time, and this process is modeled through evolutionary algorithms.
Evolutionary algorithms are a useful optimization method when direct analytical discovery is not possible. Multiobjective optimization considers the more complicated scenario of many competing objectives [3]. A classical optimization algorithm applied to a single-objective optimization problem can find only a single solution in one simulation run. By contrast, evolutionary algorithms deal with a population of solutions in one simulation run. In evolutionary algorithms applied to multiobjective optimization, we first define the conditions for a solution to be an inferior or dominated solution, and then define the conditions for a set of solutions to form the Pareto-optimal set. In the context of classical optimization, the weighted sum approach, the perturbation method, goal programming, the Tchebycheff method and the min-max method are popular approaches to multiobjective optimization. These classical algorithms find only a single optimum solution per simulation run and also have difficulties with nonlinear, non-convex search spaces.
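The dominance and Pareto-set conditions mentioned above can be sketched directly; the two-objective minimization example below is a hypothetical illustration.

```python
def dominates(a, b):
    """True if solution a dominates b (minimization): a is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(population):
    """Return the non-dominated (Pareto-optimal) subset of a
    population of objective vectors."""
    return [p for p in population if not any(dominates(q, p) for q in population)]

# Hypothetical objective vectors (cost, weight), both to be minimized.
pop = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_set(pop)
```

Here (3, 4) is dominated by (2, 3) and (5, 5) by (1, 5), so the Pareto set keeps the three mutually non-dominated trade-offs.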
Evolutionary computing simulates the mechanisms of evolutionary learning and complex adaptation found in natural processes. To be computationally tractable, evolutionary learning systems ought to search the space of solutions efficiently and exploit domain knowledge to produce results that are intelligible. The techniques used for searching a solution include genetic mutation and recombination, symbolic reasoning and structured communication through language, and the morphology and physiology that determine an organism's behavior in nervous systems. To model complex adaptation, objects in the environment are grouped into sets of properties or concepts. Concepts are combined to form propositions, and propositions are combined to form reasons and expressions. Syntactic, semantic and preference properties of logical expressions can then reduce the candidate solutions that are searched with respect to the training examples. Such data properties are represented in computational intelligence algorithms by defining the architecture of neural networks and the rules in symbolic representations [4].
As discussed in [5], evolutionary computing algorithms can be categorized into genetic algorithms and genetic programming, evolutionary strategies and evolutionary programming, differential evolution and estimation of distribution algorithms, classifier systems and swarm intelligence. Each of these algorithms has initialization steps, operational steps and search steps that are iteratively evaluated until a termination criterion is satisfied. Machine learning techniques that have been used in evolutionary computing include case-based reasoning and reinforcement learning, clustering analysis and competitive learning, matrix decomposition methods and regression, artificial neural networks, support vector machines and probabilistic graphical models. These techniques are used in the various steps of an evolutionary computing algorithm [5], namely population initialization, fitness evaluation and selection, population reproduction and variation, algorithm adaptation and local search. They can use algorithm design techniques such as the sequential approach, the greedy approach, local search, linear programming, dynamic programming and randomized algorithms.
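The iterative structure described above (initialization, fitness evaluation and selection, reproduction and variation, termination) can be sketched as a minimal genetic algorithm; the OneMax fitness function, population size, tournament selection and operator rates are illustrative assumptions rather than choices made in the text.

```python
import random

random.seed(0)

def fitness(bits):
    # OneMax: number of ones in the bit string (a standard toy problem).
    return sum(bits)

def genetic_algorithm(n_bits=20, pop_size=30, generations=60):
    # Initialization step: a population of random bit strings.
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):  # termination: fixed generation budget
        def select():
            # Fitness evaluation and selection: binary tournament.
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            # Reproduction: one-point crossover.
            cut = random.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Variation: bit-flip mutation at rate 1/n_bits.
            child = [b ^ 1 if random.random() < 1.0 / n_bits else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_algorithm()
```

Each generation applies selection pressure toward higher fitness, so the final population concentrates near the all-ones optimum.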
Exact optimization algorithms are guaranteed to find an optimal solution. To maintain this guarantee, exact algorithms must in principle search the entire solution space. Although many optimization techniques efficiently eliminate a large number of solutions at each iteration, many real-world problems cannot be tackled by exact optimization algorithms. Because of the impracticality of exact search, heuristic search algorithms have been developed for global optimization in machine learning.
Heuristic algorithms for global optimization fall under the categories of hyper-heuristics and meta-heuristics. These optimization algorithms are yet another set of approaches for solving computational search problems; they search for a solution by designing and tuning heuristic methods. Hyper-heuristics find a solution to the optimization problem indirectly, by working on a search space of heuristics. By comparison, meta-heuristics work directly on a solution space that is the same as the search space on which the objective function is defined. Difficulties in using heuristics for search arise from the parameter and algorithm selections involved in the solution.
Hyper-heuristics attempt to find the right combination of heuristics to solve a given problem. Hyper-heuristic approaches can be categorized into low-level heuristics, high-level heuristics, greedy heuristics and meta-heuristics. Genetic programming is the most widely used hyper-heuristic technique. To build low-level heuristics, one can use mixed integer programming paradigms such as branch-and-bound and branch-and-cut, local search heuristics, and graph coloring heuristics such as largest weighted degree and saturation degree. By contrast, high-level heuristics have an evaluation mechanism that allocates high priority to the most relevant features in the data. High-level heuristics, defined on a search space of heuristics, build on optimization algorithms such as iterated local search, steepest descent and Tabu Search.
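The idea of searching a space of heuristics rather than a space of solutions can be sketched as follows; the toy objective, the three low-level move heuristics and the score-based selection rule are all hypothetical illustrations of the principle.

```python
import random

random.seed(1)

# Toy problem: minimize f(x) = (x - 37)^2 over the integers.
def f(x):
    return (x - 37) ** 2

# Hypothetical low-level heuristics: each proposes a modified solution.
heuristics = {
    "step_up": lambda x: x + 1,
    "step_down": lambda x: x - 1,
    "jump": lambda x: x + random.randint(-10, 10),
}

def hyper_heuristic(x=0, iters=200):
    # The hyper-heuristic searches the space of heuristics: it keeps a
    # score per heuristic and prefers those that recently improved the
    # solution, rather than searching the solution space directly.
    scores = {name: 1.0 for name in heuristics}
    for _ in range(iters):
        name = max(scores, key=lambda n: scores[n] + random.random())
        candidate = heuristics[name](x)
        if f(candidate) < f(x):
            x = candidate
            scores[name] += 1.0   # reward the successful heuristic
        else:
            scores[name] *= 0.9   # decay the score of a failed heuristic
    return x

result = hyper_heuristic()
```

The outer loop never inspects the solution representation itself; it only decides which low-level heuristic to apply next, which is the defining trait of a hyper-heuristic.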
By contrast, the meta-heuristic process guides and modifies the operations of problem-specific heuristics while avoiding the main disadvantage of iterative improvement, namely a local search that cannot escape local optima. The goal of meta-heuristic search is to find near-global solutions to the optimization problem. It is implemented by efficiently restarting the local search and by introducing a bias into it.
As discussed in [6], the bias can be of various types, such as descent bias (based on the objective function), memory bias (based on previously made decisions) and experience bias (based on prior performance). While meta-heuristics range from local search to machine learning processes, the underlying heuristic can be a local search. Moreover, meta-heuristics may use domain knowledge to control the search procedure. Meta-heuristics such as genetic algorithms and ant algorithms model natural processes.
Algorithms such as Iterated Local Search and Tabu Search work with single solutions in the local search procedure. Some meta-heuristic algorithms, such as Guided Local Search, change the search criteria by utilizing information collected during the search. Some meta-heuristics build algorithms with a memory of past search steps. Memoryless algorithms assume a Markov condition on the underlying probability distribution to determine the next step of the search. Iterative improvement, Simulated Annealing, Tabu Search, Guided Local Search, Variable Neighborhood Search and Iterated Local Search are among the most common meta-heuristics [6].
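Simulated Annealing, one of the memoryless meta-heuristics listed above, illustrates both the Markov condition (the next step depends only on the current solution) and the bias that lets local search escape local optima; the multimodal objective, neighbor move and cooling schedule below are illustrative assumptions.

```python
import math
import random

random.seed(2)

def simulated_annealing(f, x0, neighbor, t0=1.0, cooling=0.995, iters=2000):
    """Memoryless meta-heuristic: occasionally accepts a worse
    neighbor with probability exp(-delta / temperature), so the
    local search can escape local optima."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        y = neighbor(x)
        fy = f(y)
        if fy <= fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy            # Markov step: depends only on current x
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                  # cooling schedule reduces the bias
    return best, fbest

# Toy multimodal objective with many local minima.
objective = lambda x: x * x + 10.0 * math.sin(3.0 * x)
best, fbest = simulated_annealing(objective, 5.0,
                                  lambda x: x + random.uniform(-0.5, 0.5))
```

At high temperature the acceptance rule behaves like a restart mechanism; as the temperature falls the process degenerates into plain iterative improvement.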
Neural networks offer a structured technique for algebraically combining the input features into progressively more abstract features that eventually lead to the expected output variables. The different ways of arriving at abstract features are determined by the various possible neural network architectures. These abstract features are what give neural network learning its generalization capabilities. Neural networks store knowledge in the weights connecting neurons; the weights are determined by training a learning algorithm on given data. By contrast, fuzzy inference systems offer a framework for approximate reasoning. Fuzzy algorithms model data as a set of rules and interpolate the reasoning output as a response to new inputs. Understanding fuzzy rules in terms of domain knowledge is simple; however, learning the fuzzy rules requires complex training algorithms. Probabilistic reasoning in Bayesian belief networks updates previous estimates of the outcome by conditioning on new evidence.
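The Bayesian update just described, conditioning a prior belief on new evidence, can be sketched for a discrete hypothesis space; the fault-diagnosis numbers below are a hypothetical example, not data from the text.

```python
def bayes_update(prior, likelihood):
    """Condition a discrete belief distribution on new evidence.

    prior: dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(evidence | hypothesis)
    Returns the normalized posterior P(hypothesis | evidence).
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())  # marginal probability of the evidence
    return {h: p / z for h, p in unnormalized.items()}

# Hypothetical diagnosis: prior belief about a fault, updated after
# observing a positive sensor reading.
prior = {"fault": 0.1, "no_fault": 0.9}
likelihood = {"fault": 0.8, "no_fault": 0.2}  # P(positive reading | h)
posterior = bayes_update(prior, likelihood)
```

The evidence raises the belief in a fault from 0.1 to 0.08/0.26, roughly 0.31, which is exactly the "updating previous estimates by conditioning" that the text attributes to Bayesian belief networks.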
Evolutionary computing is an optimization method that finds candidate solutions by iterative, generative and adaptive processes built on the known samples. By contrast, classical optimization techniques such as gradient descent, conjugate gradient and quasi-Newton methods use gradients and Hessians of the objective function to search for optimum parameters. Depending on the nonlinearity of the objective function, these techniques have been adapted into algorithms for training neural networks, such as feedforward back-propagation, Hebbian learning and recurrent learning.
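As a minimal sketch of gradient-based training of a feedforward network by back-propagation, the snippet below fits a one-hidden-layer network to XOR; the architecture, learning rate, squared-error loss and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR training data: not linearly separable, so a hidden layer is needed.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for _ in range(5000):
    # Forward pass: the hidden layer builds abstract features of the input.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: gradients of the squared error via the chain rule.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient descent step on all weights and biases.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)
```

The same loop with a different update rule (e.g. a Hebbian weight change) yields the other gradient-free adaptations the text mentions; here the descent direction comes purely from the gradient of the loss.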