Glossary

The ML Glossary Pt. 4


AI

Continuation of <a href="/post/122">Pt. 3</a>.

121. Scaling (Feature Scaling)
Scaling is the process of transforming feature values to a specific range or distribution to improve model performance and convergence. Common methods include Min-Max Scaling, x' = (x - min(x)) / (max(x) - min(x)), and Standardization, x' = (x - μ) / σ. Scaling is crucial for algorithms sensitive to feature magnitudes, such as gradient descent and SVMs.

122. Self-Attention
Self-Attention is a mechanism where each element in a sequence computes attention weights with respect to every other element, enabling context-aware representations. It is central to the Transformer architecture, allowing parallel sequence processing and capturing long-range dependencies.

123. Self-Supervised Learning
Self-Supervised Learning is a paradigm where models learn from unlabeled data by creating their own labels from the structure of the data itself. Examples include predicting missing words in a sentence (BERT) or predicting future frames in video. It bridges the gap between supervised and unsupervised learning.

124. Similarity Measures
Similarity Measures quantify how alike two data points are. Common metrics include Cosine Similarity, cos(θ) = (A·B) / (||A|| ||B||), and Euclidean Distance, d(A, B) = sqrt(Σ (a_i - b_i)²). Choosing the right measure is critical for clustering, retrieval, and nearest-neighbor methods.

125. Sliding Window Technique
The Sliding Window Technique processes data sequences by moving a fixed-size window across the dataset, extracting overlapping subsequences. It is used in time-series forecasting, object detection, and streaming data analysis.

126. Softmax Function
The Softmax Function converts a vector of raw scores into probabilities: softmax(z_i) = exp(z_i) / Σ_j exp(z_j). It is often used in the output layer of classification networks to normalize predictions over multiple classes (see the short sketch after entry 132).

127. Sparsity
Sparsity refers to the property of having many zero or near-zero elements in a dataset or in model parameters. Sparse models improve interpretability, reduce computation, and help prevent overfitting. L1 regularization encourages sparsity.

128. Spectral Clustering
Spectral Clustering is an algorithm that uses the spectrum (eigenvalues and eigenvectors) of a similarity matrix, typically via its graph Laplacian, to embed the data in a lower-dimensional space before clustering there. It is effective for non-convex cluster structures.

129. Spike-and-Slab Prior
The Spike-and-Slab Prior is a Bayesian variable-selection technique that models parameters as a mixture of a "spike" at zero (irrelevant variables) and a "slab" with a broader distribution (relevant variables). It promotes sparsity in Bayesian inference.

130. State Space Models
State Space Models describe systems that evolve over time using latent (hidden) states. They are specified by state equations and observation equations, and are commonly applied in control theory and time-series forecasting (e.g., Kalman filters).

131. Stationarity
Stationarity is a property of a time series whose statistical characteristics, such as mean, variance, and autocorrelation, remain constant over time. Many forecasting models require stationarity, which can be achieved via differencing or transformation.

132. Statistical Learning Theory
Statistical Learning Theory studies the properties of learning algorithms, focusing on generalization from finite samples. It provides bounds on model performance through concepts like the VC dimension and Rademacher complexity.
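To make a few of the formulas above concrete, here is a minimal NumPy sketch of Min-Max scaling and standardization (121), cosine similarity (124), the softmax function (126), and a single self-attention head (122). The function names, toy dimensions, and random weight matrices are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def min_max_scale(x):
    """Min-Max Scaling: x' = (x - min(x)) / (max(x) - min(x))."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: x' = (x - mu) / sigma."""
    return (x - x.mean()) / x.std()

def cosine_similarity(a, b):
    """Cosine Similarity: cos(theta) = (A·B) / (||A|| ||B||)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def softmax(z, axis=-1):
    """Softmax: exp(z_i) / sum_j exp(z_j), shifted by max(z) for numerical stability."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every position attends to every other."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise compatibility scores
    return softmax(scores, axis=-1) @ V       # context-aware representations

x = np.array([2.0, 4.0, 6.0, 8.0])
print(min_max_scale(x))                 # [0.  0.333...  0.667...  1.]
print(standardize(x).mean().round(6))   # approximately 0 after standardization
print(cosine_similarity(x, 2 * x))      # 1.0: direction matters, not magnitude

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # toy sequence of 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 8): one context vector per token
```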
133. Stemming
Stemming is the process of reducing words to their root form by removing suffixes or prefixes. For example, "running" becomes "run". It is widely used in text preprocessing for NLP tasks.

134. Stochastic Process
A Stochastic Process is a collection of random variables indexed by time or space. Examples include Markov chains and Brownian motion, both of which are relevant to modeling uncertainty in ML systems.

135. Stratified Sampling
Stratified Sampling divides the dataset into strata (homogeneous subgroups) and samples proportionally from each. It ensures balanced representation of important subgroups and is often used in classification tasks.

136. Subgradient Methods
Subgradient Methods extend gradient-based optimization to non-differentiable convex functions by using subgradients in place of true gradients. They are useful for optimizing L1-regularized objectives; a short worked sketch appears at the end of this post.

137. Survival Analysis
Survival Analysis models the time until an event of interest occurs, often incorporating censored data. Common methods include the Kaplan–Meier estimator and the Cox proportional hazards model, applied in reliability prediction and healthcare analytics.

138. Symbolic AI
Symbolic AI is an approach to AI based on explicitly representing knowledge using symbols and rules. While interpretable, it lacks the adaptability of statistical methods. Recent trends explore integrating symbolic reasoning with neural networks.

139. Synthetic Data
Synthetic Data is artificially generated data that mimics the statistical properties of real data. It is used to augment training sets, protect privacy, and test systems without exposing sensitive information.

140. Systematic Bias
Systematic Bias occurs when there is a consistent deviation in predictions due to flaws in the data, the measurement process, or model assumptions. Unlike random error, systematic bias cannot be reduced by increasing the sample size.

Continued <a href="/post/123">here</a>.
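As a worked example for Subgradient Methods (136), below is a minimal NumPy sketch of subgradient descent on an L1-regularized least-squares (lasso-style) objective. The function name, step size, iteration count, and synthetic data are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def lasso_subgradient_descent(X, y, lam=0.1, lr=0.01, n_iter=2000):
    """Minimize (1/2n) * ||Xw - y||^2 + lam * ||w||_1 with subgradient steps.

    The L1 term is non-differentiable at w_j = 0; sign(w) is a valid
    subgradient there (choosing 0 at w_j = 0), which is what np.sign gives.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        grad_smooth = X.T @ (X @ w - y) / n   # gradient of the squared-error term
        subgrad_l1 = lam * np.sign(w)         # subgradient of lam * ||w||_1
        w -= lr * (grad_smooth + subgrad_l1)
    return w

# Toy data with a sparse ground-truth weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

print(lasso_subgradient_descent(X, y).round(2))  # roughly [3, 0, -2, 0, 0]
```

Note that plain subgradient steps push irrelevant weights toward zero but rarely make them exactly zero; when exact sparsity is needed, proximal methods such as ISTA (soft thresholding) are a common alternative.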

- Shubham Anuraj, 01:01 AM, 13 Aug, 2025


