@sudoevans
Created December 15, 2023 08:39

    ## This is my summary of fundamental machine learning concepts.
    1. **Supervised Learning:**

    - Goal: To train a model on labeled examples so that it can predict the output for new inputs.
    - Algorithms:
      - Linear Regression: Predicts a continuous outcome as a weighted sum of the input features.
      - Logistic Regression: Predicts class probabilities; used mainly for binary classification problems.
      - Decision Trees: Split the data on feature values to build a tree of if-then rules, usable for both classification and regression.
      - Support Vector Machines (SVMs): Find the decision boundary that separates the classes with the largest margin.
      - k-Nearest Neighbors (k-NN): Predicts from the labels of the k most similar training points.
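
As one concrete instance of supervised learning, k-NN can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name `knn_predict`, the toy points, and the choice of Euclidean distance with `k=3` are all assumptions made for the example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours (sorted by distance only).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy labeled data: two small groups in 2-D (illustrative values only).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(points, labels, (1.8, 1.8))
prediction_near_b = knn_predict(points, labels, (6.2, 6.4))
```

There is no training step here: the "model" is just the stored labeled data, which is why k-NN is often called a lazy learner.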


    2. **Unsupervised Learning:**

    - Goal: To find structure or patterns in unlabeled data.
    - Algorithms:
      - Clustering: Groups similar data points into clusters (for example, k-means).
      - Principal Component Analysis (PCA): Reduces data dimensionality by projecting onto the directions that preserve the most variance.
      - Anomaly Detection: Identifies unusual data points that deviate significantly from the norm.
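
Clustering can be illustrated with k-means, one common algorithm for it. The sketch below assumes Lloyd's algorithm with hand-picked starting centroids so the result is deterministic; the function name `kmeans` and the toy points are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Unlabeled toy data with two visible groups (illustrative values only).
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
# Assumption: start from the first and last points instead of a random choice.
final_centroids, cluster_ids = kmeans(data, centroids=[data[0], data[-1]])
```

Note that no labels are passed in: the grouping comes only from distances between points.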


    3. **Reinforcement Learning:**

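    - Goal: To train an agent to make decisions in an environment so as to maximize cumulative reward.
    - Algorithms:
      - Q-Learning: Off-policy method that updates action-value estimates toward the reward plus the discounted value of the best action in the next state.
      - SARSA (State-Action-Reward-State-Action): Similar to Q-Learning but on-policy; it updates toward the value of the action the agent actually takes in the next state rather than the best one.
      - Deep Q-Networks (DQN): Use a neural network to approximate action values in environments too large for a table.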
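
The difference between Q-Learning and SARSA is easiest to see in their update rules. The following is a tabular sketch under assumed values for the learning rate and discount factor; the state and action names are made up for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)


def q_learning_update(q, state, action, reward, next_state, actions):
    """Off-policy: bootstrap from the best action available in next_state."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def sarsa_update(q, state, action, reward, next_state, next_action):
    """On-policy: bootstrap from the action actually taken in next_state."""
    target = reward + GAMMA * q[(next_state, next_action)]
    q[(state, action)] += ALPHA * (target - q[(state, action)])


def fresh_table():
    # Hypothetical starting estimates: in s1, "right" looks better than "left".
    q = defaultdict(float)
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 3.0
    return q


q_table = fresh_table()
q_learning_update(q_table, "s0", "right", 1.0, "s1", actions=["left", "right"])
q_learning_value = q_table[("s0", "right")]

sarsa_table = fresh_table()
# The agent happened to explore and took "left" in s1.
sarsa_update(sarsa_table, "s0", "right", 1.0, "s1", next_action="left")
sarsa_value = sarsa_table[("s0", "right")]
```

With the same transition, Q-Learning moves its estimate further because it assumes the best next action, while SARSA accounts for the exploratory action that was actually chosen.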


    4. **Deep Learning:**

    - Goal: To train multi-layer neural networks that learn useful representations directly from large amounts of data.
    - Architectures:
      - Convolutional Neural Networks (CNNs): Effective for grid-like data such as images, and also used for speech recognition.
      - Recurrent Neural Networks (RNNs): Used for sequential data like text and time series.
      - Transformers: Attention-based architecture widely used for language processing and machine translation.
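
To give a feel for what a Transformer computes, here is a rough sketch of scaled dot-product attention for a single query, the core operation in that architecture. It is written in plain Python for readability; real models use batched tensor libraries, and the vectors below are made-up values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Illustrative 2-D vectors: the query matches the first and third keys.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
weights, output = attend(query, keys, values)
```

Positions whose keys resemble the query receive larger weights, so their values dominate the output.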


    5. **Evaluation Metrics:**

    - Accuracy: Measures the proportion of correct predictions; it can be misleading when classes are imbalanced.
    - Precision: Measures the proportion of true positives among all predicted positives.
    - Recall: Measures the proportion of true positives among all actual positives.
    - F1-score: The harmonic mean of precision and recall, combining them into a single metric.
    - Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
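
The metrics above can be computed directly from lists of true and predicted values. This is a small sketch with hypothetical function names and toy data; returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Convention assumed here: a metric with an empty denominator is 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average squared difference between predicted and actual values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy predictions: 2 true positives, 1 false positive, 1 false negative.
labels_true = [1, 1, 1, 0, 0, 0, 0, 0]
labels_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(labels_true, labels_pred)

# Toy regression values for MSE.
mse_value = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
```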


    6. **Bias and Variance:**

    - Bias: The systematic error introduced by a model due to assumptions or simplifications.
    - Variance: The error that comes from a model's sensitivity to the particular training set, so that small changes in the training data lead to large changes in predictions.
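    - Bias-Variance Tradeoff: Making a model more flexible usually lowers bias but raises variance, and vice versa; the aim is the level of complexity that minimizes total error on unseen data.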
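
For squared error, the tradeoff can be written as a decomposition. The notation here is an assumption added for illustration: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why lowering one at the cost of the other can still reduce the total.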


    7. **Overfitting and Underfitting:**

    - Overfitting: When a model fits noise in the training data, so it performs well on training data but poorly on unseen data.
    - Underfitting: When a model is too simple to capture the underlying patterns, so it performs poorly on both training and unseen data.
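
A small experiment makes the two failure modes visible by comparing training and test error. Everything below is a constructed example: the data are a line `y = 2x` with hand-written alternating noise on the training set, and the three model names are hypothetical.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` (a callable) on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Too simple: always predicts the average training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line, which matches the true structure."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Too flexible: returns the target of the nearest training point."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


# Training targets are 2x plus alternating noise of +1 / -1 (made-up data).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# Test targets follow the noise-free line y = 2x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1.0, 3.0, 5.0, 7.0, 9.0]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
```

In `results`, the memorizer has zero training error but a worse test error than the line (overfitting), while the mean model has high error on both sets (underfitting).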


    8. **Regularization:**

    - Techniques to reduce overfitting by penalizing model complexity.
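    - L1 Regularization (Lasso): Penalizes the sum of the absolute values of the coefficients; it can shrink some coefficients exactly to zero.
    - L2 Regularization (Ridge): Penalizes the sum of squared coefficients; it shrinks coefficients toward zero without eliminating them.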
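
The effect of the two penalties can be sketched for the simplest possible case: a single weight `w` with no intercept, where both problems have closed-form solutions. The function names and data are illustrative, and the loss is assumed to be the sum of squared errors plus `lam` times the penalty.

```python
import math


def l1_penalty(weights, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """Ridge penalty: lam times the sum of squared coefficients."""
    return lam * sum(w * w for w in weights)


def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for one weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * |w| (soft-thresholding)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Shrink |sxy| by lam / 2 and clip at zero, keeping the original sign.
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Toy data lying exactly on y = 2x, so the unregularized slope is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_slope(xs, ys, lam=0)
ridge_small, ridge_large = ridge_slope(xs, ys, 14), ridge_slope(xs, ys, 56)
lasso_small, lasso_large = lasso_slope(xs, ys, 14), lasso_slope(xs, ys, 56)
```

With the larger penalty the ridge weight is small but nonzero, while the lasso weight is exactly zero, which is why L1 regularization is often used for feature selection.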


    9. **Feature Engineering:**

    - The process of transforming raw data into features that are more suitable for machine learning models.
    - Techniques:
      - Feature Scaling: Rescaling features to a comparable range or distribution (for example, min-max normalization or standardization).
      - Feature Selection: Selecting the most informative features.
      - Feature Extraction: Creating new features from original features.
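
Two common forms of feature scaling are shown below as a minimal sketch. The function names and the sample values are hypothetical, and the standard deviation used is the population one (dividing by `n`), which is one of two common conventions.

```python
import math


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Illustrative feature column, e.g. heights in centimetres.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = min_max_scale(heights_cm)
z_scores = standardize(heights_cm)
```

In practice the scaling parameters (minimum, maximum, mean, standard deviation) should be computed on the training set only and then reused for validation and test data.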


    10. **Model Selection and Validation:**

    - Techniques to select the best model and avoid overfitting:
      - Cross-Validation: Trains and evaluates a model on several different train/validation splits of the data (for example, k folds) and averages the scores.
      - Train-Validation-Test Split: Divides the data into separate sets for training, validation, and testing.
      - Hyperparameter Tuning: Optimizing the model's hyperparameters, using validation performance, to improve results.
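
k-fold cross-validation can be sketched as follows. The helper names `k_fold_indices` and `cross_validate` are hypothetical, and the sketch assumes the data are already in random order, since it does not shuffle.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validate(xs, ys, fit, score, k=5):
    """Average validation score of `fit` over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(
            score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx])
        )
    return sum(scores) / len(scores)


# Illustrative model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


folds = list(k_fold_indices(10, 3))
# Constant targets, so the mean model is perfect on every fold.
average_error = cross_validate(list(range(6)), [2.0] * 6, fit_mean, mse, k=3)
```

Each sample appears in exactly one validation fold, so every data point is used for both training and evaluation across the k runs. For hyperparameter tuning, the same loop would be repeated for each candidate setting and the one with the best average score kept.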