Communicated by Steven Nowlan

Hierarchical Mixtures of Experts and the EM Algorithm

Michael I. Jordan
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA

Robert A. Jacobs
Department of Psychology, University of Rochester, Rochester, NY 14627 USA

Abstract. We present a tree-structured architecture for supervised learning.
The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIMs). Learning is treated as a maximum likelihood problem; in particular, we present an Expectation-Maximization (EM) algorithm for adjusting the parameters of the architecture. For mixture-of-experts architectures, the EM algorithm decouples the estimation process in a manner that fits well with the modular structure of the architecture, and Jordan and Jacobs (1994) observed a significant speedup over gradient techniques.
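For a two-level tree, the model described in the abstract amounts to a conditional mixture density. In notation assumed here for illustration (not quoted from the paper):

$$
P(\mathbf{y} \mid \mathbf{x}, \theta) \;=\; \sum_i g_i(\mathbf{x}, \mathbf{v}_i) \sum_j g_{j \mid i}(\mathbf{x}, \mathbf{v}_{ij})\, P(\mathbf{y} \mid \mathbf{x}, \theta_{ij}),
$$

where the gating coefficients $g_i$ and $g_{j \mid i}$ are multinomial-logit GLIMs of the input $\mathbf{x}$ and each $P(\mathbf{y} \mid \mathbf{x}, \theta_{ij})$ is the GLIM density of expert $(i, j)$. Under this view, the E-step computes posterior probabilities over the branches of the tree for each training case, and the M-step reduces to separate weighted GLIM fits; this is the decoupling referred to above.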
We introduce the hierarchical mixture-of-experts architecture and present the likelihood function for the architecture. After describing a gradient descent algorithm, we develop a more powerful learning algorithm for the architecture that is a special case of the general Expectation-Maximization (EM) framework of Dempster et al. (1977). We also describe …
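To make the EM decomposition concrete, the following is a minimal numpy sketch of EM for a one-level mixture of linear-Gaussian experts with a softmax gating network. It is an illustration under stated assumptions, not the authors' implementation: names such as fit_mixture_of_experts and n_experts are hypothetical, and the gating network is refit here by a few gradient steps rather than the IRLS inner loop used for GLIMs in the paper.

# A minimal sketch (not the authors' code) of EM for a one-level mixture
# of linear-Gaussian experts with a softmax gating network.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_mixture_of_experts(X, y, n_experts=2, n_iters=50, gate_lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(scale=0.1, size=(n_experts, d))   # expert regression weights
    sigma2 = np.ones(n_experts)                      # expert noise variances
    V = rng.normal(scale=0.1, size=(n_experts, d))   # gating network weights

    for _ in range(n_iters):
        # E-step: posterior probability (responsibility) of each expert for each case.
        g = softmax(X @ V.T)                                   # (N, K) prior gate outputs
        mu = X @ W.T                                           # (N, K) expert means
        lik = np.exp(-0.5 * (y[:, None] - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        h = g * lik
        h /= h.sum(axis=1, keepdims=True)

        # M-step for the experts: responsibility-weighted least squares,
        # which decouples into a separate fit for each expert.
        for j in range(n_experts):
            r = h[:, j]
            A = X.T @ (r[:, None] * X) + 1e-8 * np.eye(d)
            W[j] = np.linalg.solve(A, X.T @ (r * y))
            sigma2[j] = (r * (y - X @ W[j]) ** 2).sum() / r.sum()

        # M-step for the gate: a few gradient steps that pull the gate outputs g
        # toward the responsibilities h (the paper instead uses IRLS here).
        for _ in range(10):
            g = softmax(X @ V.T)
            V += gate_lr * (h - g).T @ X / N

    return W, sigma2, V

# Example use on synthetic piecewise-linear data (bias folded into X):
rng = np.random.default_rng(1)
x1 = rng.uniform(-1.0, 1.0, size=500)
X = np.column_stack([x1, np.ones_like(x1)])
y = np.where(x1 < 0.0, -2.0 * x1, 3.0 * x1) + 0.1 * rng.normal(size=500)
W, sigma2, V = fit_mixture_of_experts(X, y)

Note how the M-step splits into independent subproblems: each expert's weighted least-squares fit depends only on its own responsibilities.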
…parameterized surfaces that are adjusted by the learning algorithm. The hierarchical mixture-of-experts (HME) architecture is shown in Figure 1. The architecture is a tree in which the gating networks sit at the nonterminals of the tree. These networks receive the vector x as input and produce scalar outputs that are a partition of unity at each point in the input space.
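As an illustration of how the gating networks combine the expert outputs in such a tree, the following sketch computes the output of a depth-two HME with linear experts and softmax gating networks; the shapes and names are assumptions made for this example, not the paper's notation.

# A minimal sketch of the forward pass of a depth-two HME with linear
# experts and softmax gating networks (shapes and names are assumptions).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hme_output(x, V_top, V_nested, W_experts):
    """x: (d,) input; V_top: (K, d) root gating weights;
    V_nested: (K, K, d) gating weights at the nonterminal nodes;
    W_experts: (K, K, d) linear expert weights.  Returns a scalar output."""
    g_top = softmax(V_top @ x)              # partition of unity at the root
    y = 0.0
    for i in range(V_top.shape[0]):
        g_i = softmax(V_nested[i] @ x)      # partition of unity at node i
        mu_i = W_experts[i] @ x             # outputs of the experts under node i
        y += g_top[i] * (g_i @ mu_i)        # blend the subtree output toward the root
    return float(y)

The same recursive blending extends to deeper trees and to vector-valued expert outputs.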
Jordan and Jacobs recently proposed an EM algorithm for the mixture-of-experts architecture of Jacobs, Jordan, Nowlan, and Hinton (1991) and the hierarchical mixture-of-experts architecture of …
The gating networks are also generalized linear. Define intermediate variables $\xi_i$ as follows:

$$
\xi_i = \mathbf{v}_i^T \mathbf{x},
$$

where $\mathbf{v}_i$ is a weight vector.
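The gating outputs can then be obtained by a softmax (normalized exponential) of these intermediate variables, which makes them the partition of unity described earlier; in the notation used here,

$$
g_i = \frac{e^{\xi_i}}{\sum_k e^{\xi_k}}.
$$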