AI Mixture Of Expert Model (MEM) #ai #mem #mixture #of #expert #model #network #rnn #cnn #perceptron #convolutional #recurrent #weight #bias #neural #feedforward
Neural Network
- A machine learning process that teaches computers to process data in a way that's similar to the human brain
- A type of deep learning that uses a layered structure of interconnected nodes (neurons) resembling the brain
- Types
- Convolutional Neural Networks (CNNs)
- Good at finding patterns in images to recognize objects, classes, and categories
- Use principles of linear algebra (matrix multiplication) to find patterns; a small sketch follows this list
- Feedforward Neural Networks
- One of the simplest types of neural networks
- Info moves in one direction
- From input nodes
- Through hidden nodes
- To output nodes
- Used for facial recognition
- Recurrent Neural Networks (RNNs)
- Deep learning models trained to process sequential data inputs
- Words
- Sentences
- Time-series data
- Convert them to specific sequential data outputs
- Perceptrons
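
To make the CNN bullet above concrete, here is a minimal NumPy sketch of a 2D convolution, where each output value is a dot product between a kernel and an image patch. The 5x5 image and the edge kernel are illustrative assumptions, not from these notes.

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel across the image; every output value is a dot product
    # of the kernel with one image patch -- the linear algebra behind CNNs
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0]])  # responds to horizontal intensity changes
print(conv2d(image, edge_kernel))
```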
Perceptrons
- The building block of modern deep neural networks
- A single-layer neural network that performs computations to detect features in input data
- Classifies inputs into one of two classes (binary classification)
- Inputs are multiplied by weights
- Summed together with a learned offset called the bias
- Passed through an activation function that maps the result to 0 or 1 (0 = not activated, 1 = activated)
- Binary Step Function
- Outputs 1 for all positive inputs
- Outputs 0 otherwise
- Perceptron Learning Rule
- Its power comes from the step-by-step process by which the model learns the weights and bias via the Perceptron Learning Rule
- Train on many labeled examples where the correct answer is known, e.g., medical images labeled as showing breast cancer or not (see the runnable sketch at the end of this section)
1. Get the prediction from the perceptron
2. Compute the error, i.e., the difference between the correct output and the prediction
3. Multiply each input value by the error and add the result to the corresponding weight
4. Add the error to the bias
- Works like an artificial neuron, learning from its inputs and processing them into an output
- Weights
- Each input neuron is associated with a weight
- Represents the strength of the connection between the input neuron and the output neuron
- Bias
- A bias term is added to the input layer to provide the perceptron with additional flexibility in modeling complex patterns in input data
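
A minimal runnable sketch of the learning rule above, using NumPy. The AND dataset, learning rate, and epoch count are illustrative assumptions; scaling each update by a learning rate is a standard refinement not spelled out in the steps.

```python
import numpy as np

def step(z):
    # Binary step activation: 1 for positive inputs, 0 otherwise
    return 1 if z > 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])  # one weight per input feature
    b = 0.0                   # bias term
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = step(np.dot(w, xi) + b)  # 1. get the prediction
            error = target - pred           # 2. compute the error
            w += lr * error * xi            # 3. multiply inputs by the error, add to weights
            b += lr * error                 # 4. add the error to the bias
    return w, b

# Toy example: learn logical AND, a linearly separable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])  # expected: [0, 0, 0, 1]
```

Running this prints [0, 0, 0, 1], matching the AND targets: the weights and bias were learned purely from the error-driven updates.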
Expert Network
- A collection of neural networks that are trained to specialize in specific aspects of a problem or data set
- Designed to be sparse
- Only a few experts are active for any given input (see the top-k sketch below)
- Helps keep computation manageable and prevents the system from becoming overwhelmed
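
A minimal sketch of this sparsity, assuming top-k gating over softmax scores (a common implementation choice, not specified in these notes):

```python
import numpy as np

def top_k_gate(scores, k=2):
    # Keep only the k highest-scoring experts; all others stay inactive (weight 0)
    idx = np.argsort(scores)[-k:]
    gates = np.zeros_like(scores)
    exp = np.exp(scores[idx] - scores[idx].max())
    gates[idx] = exp / exp.sum()  # softmax over the selected experts only
    return gates

scores = np.array([0.1, 2.0, -1.3, 1.7])  # raw relevance score per expert
print(top_k_gate(scores))                  # only 2 of the 4 experts are active
```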
Mixture Of Expert Model
- A machine learning model that combines multiple expert networks into a single predictive model
- Each expert network is trained on a different subset of the data or features
- Their predictions are combined to produce the final output
- The model consists of several expert networks each with its own set of parameters
- The expert networks are typically trained independently on different parts of the input data
- A gating network (also known as a mixing network) learns to weight the predictions of the individual experts and combine them into a final prediction
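
A minimal NumPy sketch of the forward pass just described, with dense (non-sparse) gating for simplicity; the dimensions and the linear experts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, in_dim, out_dim = 4, 8, 3

# Each expert is an independent linear model with its own parameters
expert_W = rng.normal(size=(n_experts, in_dim, out_dim))
# The gating network maps the input to one score per expert
gate_W = rng.normal(size=(in_dim, n_experts))

def moe_forward(x):
    scores = x @ gate_W
    gates = np.exp(scores - scores.max())
    gates /= gates.sum()                # softmax: gate weights sum to 1
    expert_outputs = np.stack([x @ expert_W[i] for i in range(n_experts)])
    return gates @ expert_outputs       # final output: gate-weighted combination

x = rng.normal(size=in_dim)
print(moe_forward(x))                   # one combined prediction of shape (out_dim,)
```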
Advantages
- Improved Performance
- By combining multiple specialized experts, MoEs can potentially achieve better overall performance than single-network models
- Interpretability
- The expert networks can be interpreted as representing different aspects of the input data
- Scalability
- MoEs can be scaled up by adding more expert networks without significantly increasing the training time or computational cost, since sparse routing activates only a few experts per input
Applications
- Natural Language Processing (NLP)
- Computer Vision
- Speech Recognition
- Recommender Systems
- Medical Diagnosis
Training
- The expert networks are trained on their respective subsets of the data
- The gating network is trained to learn the optimal weights for combining the expert predictions
- The entire MoE model is trained end-to-end to minimize the overall loss function
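
A minimal end-to-end training sketch, assuming PyTorch, linear experts, and a synthetic regression task (all illustrative choices, not from the notes):

```python
import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, in_dim=8, out_dim=1, n_experts=4):
        super().__init__()
        # Several experts, each with its own parameters
        self.experts = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(n_experts))
        self.gate = nn.Linear(in_dim, n_experts)  # gating network

    def forward(self, x):
        gates = torch.softmax(self.gate(x), dim=-1)               # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, out_dim)
        return (gates.unsqueeze(-1) * outs).sum(dim=1)            # weighted combination

model = MoE()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
X = torch.randn(256, 8)
y = X.sum(dim=1, keepdim=True)  # synthetic regression target

for _ in range(200):
    loss = nn.functional.mse_loss(model(X), y)  # one overall loss
    opt.zero_grad()
    loss.backward()     # gradients flow to the experts AND the gate: end-to-end training
    opt.step()
print(loss.item())
```

Because a single loss is backpropagated through both the gate and the experts, the gating weights and expert parameters are learned jointly rather than in separate stages.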
Limitations
- Training Data Requirements
- MoEs typically require larger datasets than single-network models due to the need to train multiple experts
- Computational Costs
- The use of multiple expert networks can increase the computational cost of inference
- Interpretability
- While the expert networks themselves can be interpretable, the gating network can introduce additional complexity
- This makes the overall model less interpretable