AI Mixture Of Expert Model (MEM)
Neural Network
- A machine learning process that teaches computers to process data in a way that's similar to the human brain
- A type of deep learning that uses a layered structure of interconnected nodes (neurons) that resembles the brain
- Types
  - Convolutional Neural Networks (CNNs)
    - Good at finding patterns in images to recognize objects, classes, and categories
    - Use principles of linear algebra (matrix multiplication) to find patterns
  - Feedforward Neural Networks
    - One of the simplest types of neural networks
    - Information moves in one direction (see the sketch after this list)
      - From input nodes
      - Through hidden nodes
      - To output nodes
    - Used for tasks such as facial recognition
  - Recurrent Neural Networks (RNNs)
    - Deep learning models trained to process sequential data inputs
      - Words
      - Sentences
      - Time-series data
    - Convert them to specific sequential data outputs
  - Perceptrons
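To make the feedforward idea concrete, here is a minimal sketch in NumPy of information flowing in one direction, from input nodes through a hidden layer to an output node. The layer sizes, random weights, and sigmoid activation are illustrative assumptions, not from the original notes.

```python
import numpy as np

def sigmoid(x):
    """Squash values into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 4 input nodes, 3 hidden nodes, 1 output node.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # input -> hidden weights
b_hidden = np.zeros(3)               # hidden-layer biases
W_output = rng.normal(size=(3, 1))   # hidden -> output weights
b_output = np.zeros(1)               # output-layer bias

def feedforward(x):
    """One forward pass: information only moves input -> hidden -> output."""
    hidden = sigmoid(x @ W_hidden + b_hidden)
    output = sigmoid(hidden @ W_output + b_output)
    return output

print(feedforward(np.array([0.5, -1.2, 3.0, 0.1])))
```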
Perceptrons
- The building block of modern deep neural networks
- Single-layer neural networks that perform computations to detect features or business intelligence in input data
- A one-layer neural network that can classify things into two classes
  - Inputs are weighted
  - A static value called the 'bias' is added
  - The result is passed through an activation function that maps the output to a value between 0 and 1 (0 = not activated, 1 = activated)
    - Binary Step Function
      - Outputs 1 for all positive inputs
      - Outputs 0 otherwise
- Perceptron Learning Rule
  - The power comes from the step-by-step process in which the model learns the weights and bias via the Perceptron Learning Rule
  - Train using many examples where the answer is known, e.g. images labeled as showing breast cancer or not (see the sketch after this list)
    1. Get the prediction from the perceptron
    2. Compute the error, i.e. the difference between the correct output and the prediction
    3. Multiply each input value by the error and add it to the corresponding weight
    4. Add the error to the bias
- Work like artificial neurons to learn elements and process them
- Weights
  - Each input neuron is associated with a weight
  - Represents the strength of the connection between the input neuron and the output neuron
- Bias
  - A bias term is added to the input layer to give the perceptron additional flexibility in modeling complex patterns in the input data
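Below is a minimal sketch of a perceptron and the learning rule described above: a binary step activation, and per-example updates that add error × input to each weight and the error to the bias. The AND-gate data, learning rate, and epoch count are made-up illustrations, not part of the original notes.

```python
import numpy as np

def step(x):
    """Binary step: 1 for positive inputs, 0 otherwise."""
    return 1 if x > 0 else 0

def predict(x, weights, bias):
    """Weighted sum of inputs plus the bias, passed through the step function."""
    return step(np.dot(x, weights) + bias)

def train(examples, labels, epochs=10, lr=0.1):
    """Perceptron Learning Rule: adjust weights and bias from the error."""
    weights = np.zeros(examples.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(examples, labels):
            prediction = predict(x, weights, bias)   # 1. get the prediction
            error = target - prediction              # 2. compute the error
            weights += lr * error * x                # 3. error * input added to each weight
            bias += lr * error                       # 4. error added to the bias
    return weights, bias

# Hypothetical toy data: learn the AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train(X, y)
print([predict(x, w, b) for x in X])  # expected: [0, 0, 0, 1]
```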
Expert Network
- A collection of neural networks that are trained to specialize in specific aspects of a problem or data set
- Designed to be sparse
  - Only a few experts are active at any given time
  - Helps prevent the system from becoming overwhelmed
Mixture Of Expert Model
- A machine learning model that combines multiple expert networks into a single predictive model
  - Each expert network is trained on a different subset of the data or features
  - Their predictions are combined to produce the final output
- The model consists of several expert networks, each with its own set of parameters
- The expert networks are typically trained independently on different parts of the input data
- A gating network (also known as a mixing network) learns to weight the predictions of the individual experts and combine them into a final prediction (see the sketch below)
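Here is a minimal sketch of that structure: several small expert networks, each with its own parameters, plus a softmax gating network that weights their predictions. The use of PyTorch, the class name, and the layer sizes are illustrative assumptions, not from the original notes.

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Toy dense MoE: every expert is evaluated and the gate weights their outputs."""

    def __init__(self, input_dim, output_dim, num_experts=4, hidden_dim=16):
        super().__init__()
        # Each expert is a small feedforward network with its own parameters.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, output_dim),
            )
            for _ in range(num_experts)
        ])
        # The gating (mixing) network produces one weight per expert.
        self.gate = nn.Linear(input_dim, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                 # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)    # (batch, num_experts, output_dim)
        # Weighted combination of the expert predictions.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)
```

In sparse variants, only the top few gate weights are kept, so just a handful of experts run for any given input, which matches the sparsity point in the Expert Network section above.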
Advantages
- Improved Performance
  - By combining multiple specialized experts, MoEs can potentially achieve better overall performance than single-network models
- Interpretability
  - The expert networks can be interpreted as representing different aspects of the input data
- Scalability
  - MoEs can be scaled up by adding more expert networks without significantly increasing the training time or computational cost
Applications
- Natural Language Processing (NLP)
- Computer Vision
- Speech Recognition
- Recommender Systems
- Medical Diagnosis
Training
- The expert networks are trained on their respective subsets of the data
- The gating network is trained to learn the optimal weights for combining the expert predictions
- The entire MoE model is trained end-to-end to minimize the overall loss function (a minimal training-loop sketch follows)
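As a rough illustration of the end-to-end step, the sketch below trains the toy MixtureOfExperts class from the earlier sketch on synthetic regression data; both the experts and the gating network receive gradients from the same overall loss. The random data, optimizer settings, and loss choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical synthetic regression data.
torch.manual_seed(0)
X = torch.randn(256, 8)
y = torch.randn(256, 1)

model = MixtureOfExperts(input_dim=8, output_dim=1)  # class from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    prediction = model(X)
    loss = loss_fn(prediction, y)   # overall loss over the combined prediction
    loss.backward()                 # gradients flow to the experts and the gating network alike
    optimizer.step()
```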
Limitations
- Training Data Requirements
  - MoEs typically require larger datasets than single-network models due to the need to train multiple experts
- Computational Costs
  - The use of multiple expert networks can increase the computational cost of inference
- Interpretability
  - While the expert networks themselves can be interpretable, the gating network can introduce additional complexity
  - Makes the overall model less interpretable
  