19. Machine Learning

Fig. 19.1 Types of Machine Learning

Algorithm Family

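Machine learning algorithms can be grouped into the following families:

  1. Regression: predicts a continuous target value.

  2. Classification: assigns each input to one of a set of discrete classes.

  3. Ranking: orders items (e.g., documents) by their relevance to a query or context.

  4. Clustering: groups unlabeled data points so that similar points fall in the same cluster.

  5. Pattern detection: discovers recurring patterns, associations, or trends in data.

  6. Time Series: models and forecasts observations that are ordered in time.

  7. Anomaly detection: identifies observations that deviate markedly from normal behavior.

  8. Survival analysis: models the time until an event occurs, accounting for cases in which the event has not yet been observed (censoring).

  9. Causal analysis: estimates cause-and-effect relationships rather than mere correlations.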

Ref.: Vidya (2025), p. 66

Model Validation

  1. Validation on the training dataset

  2. Validation on a separate validation dataset (optional)

  3. Validation on the held-out test dataset
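The three stages above rely on splitting the data once, before training. The following is a minimal plain-Python sketch of such a split, assuming the rows are independent of one another (for time series, a random shuffle is not appropriate and splits should respect time order). The function name `train_val_test_split`, the default fractions, and the seed are hypothetical, illustrative choices rather than values from the source.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows, then split them into train, validation and test lists.

    Hypothetical helper: the default fractions and seed are illustrative only.
    Pass val_frac=0 to skip the (optional) validation set.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("need val_frac, test_frac >= 0 and val_frac + test_frac < 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]      # everything left over is for training
    return train, val, test


# Usage with 100 hypothetical rows.
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

With `val_frac=0` the validation list is simply empty, which matches its optional role in the list above; the test set should be touched only once, after training and any tuning are finished.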

Model Evaluation Goals

  1. Regression

    1. The primary goal is Predictive Accuracy, which assesses how accurately the regression model predicts continuous target values, ensuring that the predicted values closely match the actual values.

    2. Residual Analysis examines the residuals (the differences between actual and predicted values) to check whether the model captures the underlying patterns in the data; residuals that show no systematic structure suggest that it does.

  2. Classification

    1. Discriminative Power evaluates the model’s ability to distinguish between classes, using metrics such as accuracy or precision-recall curves.

    2. Class imbalance should be addressed to achieve balanced classification performance; with imbalanced class distributions, accuracy alone can be misleading.

  3. Ranking

    1. Relevance Ranking assesses how well the model ranks items or documents by their relevance to a user query or context.

    2. Rank Stability assesses whether rankings remain stable across different queries or situations, which indicates how reliable the ranking model is.

  4. Clustering

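    1. For clustering problems, Cluster Purity measures how homogeneous each cluster is, i.e., the extent to which it contains points of a single class (this requires ground-truth labels). More generally, points in the same cluster should be similar to one another and different from points in other clusters. The aim is high cluster purity.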

    2. Evaluate the interpretability of the clusters and whether they align with domain knowledge or expectations.

  5. Pattern detection

    1. In Pattern Mining, we assess the model’s ability to discover meaningful patterns, associations, or trends within data, ensuring it captures relevant information.

    2. Generalization evaluates how well the model generalizes patterns to new or unseen data, avoiding overfitting.

  6. Time Series

    1. For time series problems, Forecasting Accuracy evaluates how accurately a model predicts future values, using metrics such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE).

    2. Seasonal Decomposition assesses the ability to capture and interpret seasonal patterns, trends, and noise within time series data.

  7. Anomaly detection

    1. Anomaly Detection Rate measures the ability to identify anomalies effectively while minimizing false positives, often using metrics like precision and recall.

    2. Threshold Tuning adjusts the anomaly detection threshold to balance sensitivity and specificity according to the requirements of the specific application.

  8. Survival analysis

  9. Causal analysis

Ref.: Vidya (2025), pp. 68-69
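To make some of these evaluation goals concrete, here is a minimal, dependency-free Python sketch of standard metrics mentioned above: residuals, MAE and RMSE for regression and forecasting, precision and recall with a simple threshold sweep for classification and anomaly detection, and cluster purity. It is an illustration, not the procedure from Vidya (2025); all function names and sample numbers are hypothetical, and the purity function assumes ground-truth class labels are available. In practice, a library such as scikit-learn offers tested implementations of most of these metrics.

```python
import math
from collections import Counter


def residuals(y_true, y_pred):
    """Residual = actual value minus predicted value, one per observation."""
    return [t - p for t, p in zip(y_true, y_pred)]


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the residuals."""
    r = residuals(y_true, y_pred)
    return sum(abs(e) for e in r) / len(r)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: weights large errors more heavily than MAE."""
    r = residuals(y_true, y_pred)
    return math.sqrt(sum(e * e for e in r) / len(r))


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive, e.g. an anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def sweep_thresholds(scores, y_true, thresholds):
    """Threshold tuning: flag score >= threshold as anomalous, report the trade-off."""
    results = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        results.append((th, *precision_recall(y_true, y_pred)))
    return results


def purity(cluster_ids, class_labels):
    """Cluster purity: share of points that belong to their cluster's majority class."""
    members = {}
    for c, y in zip(cluster_ids, class_labels):
        members.setdefault(c, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority / len(class_labels)


# Usage with small hypothetical data.
y_true, y_pred = [3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0]
print(residuals(y_true, y_pred))        # [0.5, 0.0, -1.5, -1.0]
print(mae(y_true, y_pred))              # 0.75
print(round(rmse(y_true, y_pred), 4))   # 0.9354

scores = [0.1, 0.4, 0.35, 0.8, 0.9]     # anomaly scores from some detector
labels = [0, 0, 1, 1, 1]                # 1 = true anomaly
for th, p, r in sweep_thresholds(scores, labels, [0.3, 0.5]):
    print(f"threshold={th}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.75, recall=1.00
# threshold=0.5: precision=1.00, recall=0.67

print(round(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"]), 3))  # 0.667
```

Raising the threshold from 0.3 to 0.5 trades recall for precision, which is exactly the balance that threshold tuning is meant to set for a given application.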

How Much Data for ML Models?

Fig. 19.2 Data Needed for an ML Model (Vidya, 2025, p. 71)

Fig. 19.3 Amount of Data Needed for an ML Model (Vidya, 2025, p. 71)

Data Types

Fig. 19.4 Type of Data in Data Science (Vidya, 2025, p. 71)