Author: anuj

  • Where is the data stored in a trained ML model?

    The final trained model doesn’t store the actual training data, but rather extracts patterns, relationships, and representations from it during training. How much of the training data is “stored” in a model depends on the type of model and how it’s designed, but here’s a general breakdown: Classical Machine Learning Models (e.g., Decision Trees, SVMs):…

  • Addressing Data Imbalance using GCP Native Tools

    GCP Tools for Handling Imbalanced Datasets in ML 1. AI Platform (Vertex AI) Custom Model Training: Vertex AI supports training with custom code, enabling you to implement techniques like oversampling, undersampling, or class weighting. Hyperparameter Tuning: You can tune parameters such…

  • F1 Score in Machine Learning

    Understanding the F1 Score in Machine Learning: The F1 score is a measure of a model’s predictive performance that takes both precision and recall into account. It is the harmonic mean of precision and recall, giving a balanced view of performance, especially for binary classification tasks. Precision and Recall: Precision is the proportion of…
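    The harmonic-mean definition in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration rather than the post's own code: the function name `f1_score` and the count arguments `tp`, `fp`, and `fn` (true positives, false positives, false negatives) are assumptions chosen for this example, and returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are hypothetical counts of true positives,
    false positives, and false negatives for the positive class.
    """
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Convention assumed here: F1 is 0.0 when both terms are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example counts (made up for illustration): 8 TP, 2 FP, 4 FN.
score = f1_score(tp=8, fp=2, fn=4)
print(round(score, 4))
```

    With these made-up counts, precision is 0.8 and recall is about 0.667, so the F1 score sits between the two and closer to the lower value, which is the behavior the harmonic mean is chosen for.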

  • Data catalogs, data lineage, data quality, and data observability – for big data workflows

    Data catalogs, data lineage, data quality, and data observability as they apply to big data workflows: 1. Data Catalogs Definition: A data catalog is an organized inventory of all the data assets within an organization, which includes metadata that describes the data. In big data workflows, it serves as a centralized repository where users can…

  • Machine Learning – Labeling Best Practices

    Creating labels for a machine learning dataset is a critical step, especially for supervised learning tasks where models need to learn from labeled examples. Here’s how you can approach creating labels for different types of machine learning datasets: Steps for Creating Labels 1. Understand the Problem Domain – Before creating labels, you need…

  • Keras and Regression Models

    Regression models are statistical techniques used to model and analyze the relationship between a dependent variable (also called the target or output) and one or more independent variables (also known as features or predictors). The goal of regression analysis is to predict the value of the dependent variable based on the values of the independent…

  • Using a Transformer for building conversational chatbots

    Transformers for Conversational Chatbots Transformers, particularly models like GPT (Generative Pre-trained Transformer), have revolutionized conversational chatbots with their ability to understand and generate human-like text. Here’s how you would use a Transformer for building and deploying conversational chatbots: 1. Understanding Transformers: Transformers are a type of deep learning model designed to handle sequential data by…

  • Labeling vs Predictions in ML

    In Machine Learning (ML), labeling and prediction refer to different stages in the machine learning workflow: 1. Labeling – Definition: Labeling is the process of assigning correct or known output values (called labels) to data points. These labels are used during the training phase of supervised learning models. For example, in an image classification task,…

  • Machine Learning Model Parameters and Memory Usage

    The parameters in a machine learning (ML) model directly affect memory usage because they determine the amount of data the model needs to store and process during training and inference. The more parameters a model has, the more memory it consumes. Here’s a breakdown of how this works: 1. Memory for Storing Parameters…
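    As a rough sketch of the relationship described above, parameter storage can be estimated as the parameter count multiplied by the bytes used per parameter. The function name `parameter_memory_bytes` and the default of 4 bytes (32-bit floats) are assumptions made for this illustration. The estimate covers only the stored parameters, not gradients, optimizer state, or activations, which add further memory during training.

```python
def parameter_memory_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Estimate memory needed just to store a model's parameters.

    bytes_per_param defaults to 4, assuming 32-bit floating point
    values; use 2 for 16-bit formats or 8 for 64-bit floats.
    """
    return num_params * bytes_per_param


# Hypothetical model with one million parameters.
fp32_bytes = parameter_memory_bytes(1_000_000)
fp16_bytes = parameter_memory_bytes(1_000_000, bytes_per_param=2)

print(f"float32: {fp32_bytes / 1e6:.1f} MB")
print(f"float16: {fp16_bytes / 1e6:.1f} MB")
```

    Halving the precision halves the storage estimate, which is why the numeric format matters alongside the raw parameter count.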

  • Are parameters known prior to Training an ML Model?

    The process of determining the parameters of a model is known as training. During training, the model learns from the data by iteratively adjusting its parameters to optimize a given objective function (also known as a loss function). So, no, parameters are not known…