    
  
    
    
  
  
    
  | # train_grpo.py | |
| import re | |
| import torch | |
| from datasets import load_dataset, Dataset | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| from peft import LoraConfig | |
| from trl import GRPOConfig, GRPOTrainer | |
| # Load and prep dataset | 
  
    
    
  
  
    
  | # A binary search tree (BST) is a binary tree where the value of each node is larger than or equal to the values in all the nodes in that node's left subtree and smaller than the values in all the nodes in that node's right subtree. | |
| # Write a function that, efficiently with respect to time used, checks if a given binary search tree contains a given value. | |
| # For example, for the following tree: | |
| # n1 (Value: 1, Left: null, Right: null) | |
| # n2 (Value: 2, Left: n1, Right: n3) | |
| # n3 (Value: 3, Left: null, Right: null) | |
| # Call to contains(n2, 3) should return True since a tree with root at n2 contains number 3. | 
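
One way to approach the task described above is an iterative descent from the root, which visits at most one node per tree level. This is a minimal sketch, not the gist's own solution: the `Node` dataclass and its field names are assumptions chosen to mirror the "Value / Left / Right" notation in the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type mirroring "Value / Left / Right" in the example.
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def contains(root: Optional[Node], value: int) -> bool:
    """Return True if the BST rooted at `root` holds `value`."""
    node = root
    while node is not None:
        if value == node.value:
            return True
        # Smaller values can only be in the left subtree,
        # larger values only in the right subtree.
        node = node.left if value < node.value else node.right
    return False


# The example tree from the problem statement.
n1 = Node(1)
n3 = Node(3)
n2 = Node(2, n1, n3)
print(contains(n2, 3))  # True
```

Because each step discards one subtree, the running time is proportional to the height of the tree rather than to the number of nodes.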
  
    
    
  
  
    
  | /*App usage data are kept in the following table: | |
| TABLE sessions | |
| id INTEGER PRIMARY KEY, | |
| userId INTEGER NOT NULL, | |
| duration DECIMAL NOT NULL | |
| Write a query that selects userId and average session duration for each user who has more than one session.*/ | |
| -- Example case create statement: | 
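
A query of the kind the task asks for can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `avgDuration` alias, the `average_durations` helper, and the sample rows are made up for illustration, and the `ORDER BY` is added only to make the output deterministic.

```python
import sqlite3

# Group sessions per user and keep only users with more than one session.
QUERY = """
SELECT userId, AVG(duration) AS avgDuration
FROM sessions
GROUP BY userId
HAVING COUNT(*) > 1
ORDER BY userId
"""


def average_durations(rows):
    """Load (id, userId, duration) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sessions (
               id INTEGER PRIMARY KEY,
               userId INTEGER NOT NULL,
               duration DECIMAL NOT NULL
           )"""
    )
    conn.executemany(
        "INSERT INTO sessions (id, userId, duration) VALUES (?, ?, ?)", rows
    )
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical sample data: user 1 has two sessions, user 2 has one.
sample = [(1, 1, 10), (2, 2, 18), (3, 1, 14)]
print(average_durations(sample))  # [(1, 12.0)]
```

The `HAVING` clause filters on the aggregated group, which is why the session count check cannot go in a `WHERE` clause.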
  
    
    
  
  
    
  | The first step after downloading and unzipping the dataset was to import all 8 .csv files and load each one | |
| as its own pandas data frame. Each data frame has one review per row and 4 | |
| columns (from left to right): “Review Score”, “Tail of Review URL”, “Review Title” and “Review Text”. | |
| All reviews were then combined into one large data frame to make data wrangling easier, for example when applying functions to every review. | |
| Finally, the “Review Score” and “Review Text” columns were separated out as their own variables, since these are the main | |
| objects handled by the machine learning algorithm. | |
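
The loading steps described above could look roughly like this in pandas. This is a sketch, not the original notebook code: it assumes the files have no header row, and the `load_reviews` and `split_score_and_text` helpers and the in-memory sample data are hypothetical stand-ins for the 8 real files.

```python
import io

import pandas as pd

COLUMNS = ["Review Score", "Tail of Review URL", "Review Title", "Review Text"]


def load_reviews(sources):
    """Read each CSV source into its own data frame, then stack them into one."""
    # Assumption: the files carry no header row, so column names are supplied here.
    frames = [pd.read_csv(src, header=None, names=COLUMNS) for src in sources]
    return pd.concat(frames, ignore_index=True)


def split_score_and_text(reviews):
    """Pull out the two columns used by the machine learning step."""
    return reviews["Review Score"], reviews["Review Text"]


# Two tiny in-memory "files" standing in for the real .csv files.
sources = [
    io.StringIO('5,abc,Great,"Loved it"\n1,def,Bad,"Broke fast"\n'),
    io.StringIO('3,ghi,Okay,"It works"\n'),
]
reviews = load_reviews(sources)
scores, texts = split_score_and_text(reviews)
print(len(reviews), scores.tolist())
```

With real data, `sources` would be the list of file paths; `ignore_index=True` gives the combined frame a single continuous row index.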
  
    
    
  
  
    
  | The problem: I want to label each product review as positive, negative, or neutral based on the words it uses. (Given a review, the goal is to predict the user’s attitude.) | |
| According to Wikipedia, sentiment analysis (sometimes known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. | |
| My client would be Amazon.com or another e-commerce giant that wants to know which of its products are highly liked, so that it can invest accordingly. Based on my analysis, the company could make certain products more available or recommend similar products in order to retain and grow its customer base. | |
| For negative sentiments, the client could research what drives them, especially in relation to competitors. If there is negative conversation, the client could reach out to those reviewers. | |
| With sentime | 
  
    
    
  
  
    
  | At first I was unwilling to attend tonight's meetup because it seemed to be more for people who were still exploring | |
| Data Science as a career, and I was already committed to Data Science. So maybe not tonight. | |
| However, the point of my going to these meetups was not just to hear advice, but to meet people and network with others in the Data Science space. And I was able to meet Michelle Kelsey, who is part of IBM's Watson Cognitive team. This really excited me because I was first exposed to IBM's Watson Cognitive through Serena's Watson. She was able to feed her play data into Watson, which predicted the best move(s) for her next game. | |
| The same cognitive solution was demonstrated with the CogniToys Dino, a toy that learns, remembers and responds through dialog with the | |
| user. This got the best of me. Now I want my own CogniToys Dino. I ended up meeting Michelle face-to-face as planned, got her business card and took up her offer to meet her back in the city (SF) sometime in April to discuss more | 
  
    
    
  
  
    
  | Validating A/B Test Results for Yammer | |
| Possible Causes of Increased Messages in Treatment Group | |
| 1. The metric may need to be redefined | |
| 2. Calculation errors | |
| 3. Users were not randomly assigned, which would bias the test setup | |
| 4. A confounding factor that is hard to detect but affects the test results | |
  
    
    
  
  
    
  | Last night, I got to meet a huge variety of people interested in data analytics/science! I really enjoyed meeting both people who had recently whetted their appetite for data science and people seasoned in the field, for whom writing 1,000 lines of code is a breeze and who talk about R as if they were more fluent in it than in English. A neat surprise was the presence of Math and Economics professors, who added their input to the discussion on whether Peer Assisted Learning would help raise performance levels in STEM classes offered at Sacramento State University. This talk taught me that a result from an experiment can always be questioned, and that experiment design can be reassessed even after the experiment has been completed. Therefore, even after drawing conclusions from my own experiment, I should keep an open mind for feedback. | |
| The second talk was the one I had more interest in since I have been wondering about how to pick a model that best addresses my Capstone Project problem. I had a