AI Mixture Of Expert Model (MEM)
Neural Network
- A machine learning process that teaches computers to process data in a way that's similar to the human brain
- A type of deep learning that uses a layered structure of interconnected nodes (neurons) that resembles the brain
- Types
  - Convolutional Neural Networks (CNNs)
    - Good at finding patterns in images to recognize objects, classes, and categories
    - Use principles of linear algebra (matrix multiplication) to find patterns
  - Feedforward Neural Networks
    - One of the simplest types of neural networks
    - Information moves in one direction (see the sketch after this list)
      - From input nodes
      - Through hidden nodes
      - To output nodes
    - Used for tasks such as facial recognition
  - Recurrent Neural Networks (RNNs)
    - Deep learning models trained to process sequential data inputs
      - Words
      - Sentences
      - Time-series data
    - Convert them to specific sequential data outputs
  - Perceptrons
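To make the feedforward idea concrete, here is a minimal sketch in NumPy of information flowing in one direction, from input nodes through a hidden layer to an output node. The layer sizes, random weights, and sigmoid activation are illustrative assumptions, not from the original notes.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 4 input nodes, 3 hidden nodes, 1 output node.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # input -> hidden weights
b_hidden = np.zeros(3)               # hidden-layer biases
W_output = rng.normal(size=(3, 1))   # hidden -> output weights
b_output = np.zeros(1)               # output-layer bias

def feedforward(x):
    """One forward pass: information only moves input -> hidden -> output."""
    hidden = sigmoid(x @ W_hidden + b_hidden)
    output = sigmoid(hidden @ W_output + b_output)
    return output

print(feedforward(np.array([0.5, -1.2, 3.0, 0.1])))
```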
Perceptrons
- The building block of modern deep neural networks
- Single-layer neural networks that perform computations to detect features or business intelligence in input data
- A one-layer neural network that can classify things into two classes
  - Inputs are weighted
  - A static value called the 'bias' is added
  - The result is passed through an activation function that maps the output to a value between 0 and 1 (0 = not activated, 1 = activated)
    - Binary Step Function
      - Outputs 1 for all positive inputs
      - Outputs 0 otherwise
- Perceptron Learning Rule
  - The power comes from the step-by-step process in which the model learns the weights and bias via the Perceptron Learning Rule
  - Train using many examples where the answer is known, e.g. images labeled as showing breast cancer or not (see the sketch after this list)
    1. Get the prediction from the perceptron
    2. Compute the error, i.e. the difference between the correct output and the prediction
    3. Multiply each input value by the error and add it to the corresponding weight
    4. Add the error to the bias
- Work like artificial neurons to learn elements and process them
- Weights
  - Each input neuron is associated with a weight
  - Represents the strength of the connection between the input neuron and the output neuron
- Bias
  - A bias term is added to the input layer to give the perceptron additional flexibility in modeling complex patterns in the input data
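Below is a minimal sketch of a perceptron and the learning rule described above: a binary step activation, and per-example updates that add error × input to each weight and the error to the bias. The AND-gate data, learning rate, and epoch count are made-up illustrations, not part of the original notes.

```python
import numpy as np

def step(x):
    """Binary step: 1 for positive inputs, 0 otherwise."""
    return 1 if x > 0 else 0

def predict(x, weights, bias):
    """Weighted sum of inputs plus the bias, passed through the step function."""
    return step(np.dot(x, weights) + bias)

def train(examples, labels, epochs=10, lr=0.1):
    """Perceptron Learning Rule: adjust weights and bias from the error."""
    weights = np.zeros(examples.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(examples, labels):
            prediction = predict(x, weights, bias)   # 1. get the prediction
            error = target - prediction              # 2. compute the error
            weights += lr * error * x                # 3. error * input added to each weight
            bias += lr * error                       # 4. error added to the bias
    return weights, bias

# Hypothetical toy data: learn the AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train(X, y)
print([predict(x, w, b) for x in X])  # expected: [0, 0, 0, 1]
```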
Expert Network
- A collection of neural networks that are trained to specialize in specific aspects of a problem or data set
- Designed to be sparse
  - Only a few experts are active at any given time
  - Helps prevent the system from becoming overwhelmed
Mixture Of Expert Model
- A machine learning model that combines multiple expert networks into a single predictive model
  - Each expert network is trained on a different subset of the data or features
  - Their predictions are combined to produce the final output
- The model consists of several expert networks, each with its own set of parameters
- The expert networks are typically trained independently on different parts of the input data
- A gating network (also known as a mixing network) learns to weight the predictions of the individual experts and combine them into a final prediction (see the sketch below)
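Here is a minimal sketch of that structure: several small expert networks, each with its own parameters, plus a softmax gating network that weights their predictions. The use of PyTorch, the class name, and the layer sizes are illustrative assumptions, not from the original notes.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Toy dense MoE: every expert is evaluated and the gate weights their outputs."""

    def __init__(self, input_dim, output_dim, num_experts=4, hidden_dim=16):
        super().__init__()
        # Each expert is a small feedforward network with its own parameters.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, output_dim),
            )
            for _ in range(num_experts)
        ])
        # The gating (mixing) network produces one weight per expert.
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                 # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)    # (batch, num_experts, output_dim)
        # Weighted combination of the expert predictions.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)
```

In sparse variants, only the top few gate weights are kept, so just a handful of experts run for any given input, which matches the sparsity point in the Expert Network section above.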
Advantages
- Improved Performance
  - By combining multiple specialized experts, MoEs can potentially achieve better overall performance than single-network models
- Interpretability
  - The expert networks can be interpreted as representing different aspects of the input data
- Scalability
  - MoEs can be scaled up by adding more expert networks without significantly increasing the training time or computational cost
Applications
- Natural Language Processing (NLP)
- Computer Vision
- Speech Recognition
- Recommender Systems
- Medical Diagnosis
Training
- The expert networks are trained on their respective subsets of the data
- The gating network is trained to learn the optimal weights for combining the expert predictions
- The entire MoE model is trained end-to-end to minimize the overall loss function (a minimal training-loop sketch follows)
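As a rough illustration of the end-to-end step, the sketch below trains the toy MixtureOfExperts class from the earlier sketch on synthetic regression data; both the experts and the gating network receive gradients from the same overall loss. The random data, optimizer settings, and loss choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical synthetic regression data.
torch.manual_seed(0)
X = torch.randn(256, 8)
y = torch.randn(256, 1)

model = MixtureOfExperts(input_dim=8, output_dim=1)  # class from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    prediction = model(X)
    loss = loss_fn(prediction, y)   # overall loss over the combined prediction
    loss.backward()                 # gradients flow to the experts and the gating network alike
    optimizer.step()
```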
Limitations
- Training Data Requirements
  - MoEs typically require larger datasets than single-network models due to the need to train multiple experts
- Computational Costs
  - The use of multiple expert networks can increase the computational cost of inference
- Interpretability
  - While the expert networks themselves can be interpretable, the gating network can introduce additional complexity
  - Makes the overall model less interpretable
  