The ML Glossary Pt. 3

Continuation of Pt. 2.

81. Query: A Query is a request made to a machine learning model to retrieve predictions or insights. In information retrieval and recommendation systems, queries are used to fetch relevant results from a database or search engine based on user input.

82. Queueing Model: Queueing Models are used to analyze systems where entities queue for resources, such as servers in a network. In reinforcement learning, queueing models are often employed to simulate environments where decisions about resource allocation need to be optimized.

83. Q-Learning: Q-Learning is a model-free reinforcement learning algorithm that aims to learn the optimal policy by estimating the Q-values (state-action values). The Q-value is updated using the Bellman equation: Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') - Q(s, a)], where s is the current state, a is the action, r is the reward, s' is the next state, α is the learning rate, and γ is the discount factor.

84. Quasi-Newton Methods: Quasi-Newton Methods are optimization algorithms that approximate the Hessian matrix of second derivatives to find the minimum of a function. These methods, like BFGS (Broyden–Fletcher–Goldfarb–Shanno), converge faster than Gradient Descent on certain problems and require fewer computational resources than the full Newton's method.

85. Query Expansion: Query Expansion is a technique used in information retrieval to improve search results by adding related terms or synonyms to the original query. This is often achieved using techniques like word embeddings or thesauri.

86. Queue Length: Queue Length refers to the number of pending items in a queue at any given time. In machine learning pipelines, managing queue length is important for tasks like batch processing and streaming data handling.

87. Quorum Learning: Quorum Learning is a decentralized approach where individual agents make decisions based on local information and aggregate consensus. It is often used in distributed machine learning and multi-agent systems.

88. Radial Basis Function (RBF): A Radial Basis Function is a real-valued function whose value depends only on the distance from a center point. It is commonly used in machine learning models like RBF networks and in the kernel trick for support vector machines (SVMs). A typical RBF is the Gaussian function: φ(x, c) = exp(-γ||x - c||²), where x is the input, c is the center, and γ controls the spread.

89. Random Forest: Random Forest is an ensemble learning method that combines multiple decision trees to improve prediction accuracy and control overfitting. Each tree is trained on a random subset of data and features, and the final prediction is made by averaging (regression) or majority voting (classification). Random forests are robust, versatile, and often outperform single decision trees.

90. Random Initialization: Random Initialization refers to the process of assigning random values to model parameters (e.g., weights) at the start of training. Proper initialization is critical to break the symmetry between units (so they do not all learn the same features) and to accelerate convergence. Techniques like Xavier and He initialization are commonly used in deep learning.

91. Range: Range in machine learning describes the spread of a feature's values, calculated as the difference between the maximum and minimum values. Features with very different ranges can distort model training, so they are often normalized or scaled to ensure a uniform contribution to the model.
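To make the Q-learning update in entry 83 concrete, here is a minimal tabular sketch on a toy five-state corridor, where reaching the rightmost state earns a reward of 1. The environment, the hyperparameter values, and the random tie-breaking rule are illustrative assumptions, not part of the definition above.

```python
# Minimal tabular Q-learning sketch for the update in entry 83.
# The toy corridor environment and all hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))   # Q(s, a) table

def step(s, a):
    """Move left or right; reaching the rightmost state ends the episode with reward 1."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection, breaking ties randomly while Q(s, .) is all zero
        greedy = int(np.argmax(Q[s])) if Q[s].any() else int(rng.integers(n_actions))
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else greedy
        s_next, r, done = step(s, a)
        # Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.round(Q, 2))                 # Q(s, right) approaches gamma**(3 - s)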
92. Rank: Rank refers to the position of an item in an ordered list or the dimensionality of a matrix. In information retrieval, ranking algorithms sort items (e.g., search results) based on relevance scores. In linear algebra, matrix rank indicates the number of linearly independent rows or columns, essential in dimensionality reduction and optimization.

93. Recurrent Neural Network (RNN): A Recurrent Neural Network is a type of neural network designed to process sequential data. Unlike traditional feedforward networks, RNNs maintain a hidden state that carries information from previous inputs. This allows them to model temporal dependencies, making them ideal for tasks like language modeling and time series analysis.

94. Regularization: Regularization is a set of techniques to prevent overfitting by penalizing model complexity. Common methods include L1 regularization (Lasso), which encourages sparsity by adding λ||w||₁ to the loss function, and L2 regularization (Ridge), which adds λ||w||₂². Regularization balances bias and variance for better generalization.

95. Reinforcement Learning (RL): Reinforcement Learning is a framework where an agent learns to take actions in an environment to maximize cumulative rewards. The agent iteratively updates its policy based on the reward signals received. Algorithms include Q-Learning, SARSA, and policy gradient methods.

96. Residual Block: A Residual Block is a building block for deep neural networks, particularly ResNets (Residual Networks). It introduces skip connections, allowing the network to learn the residual mapping y = F(x) + x, where F(x) represents the learned transformation and x is the input. This architecture mitigates the vanishing gradient problem and enables the training of extremely deep networks.

97. Ridge Regression: Ridge Regression is a linear regression variant that applies L2 regularization by adding a penalty term to the loss function: Loss = ||y - Xw||² + λ||w||², where λ is the regularization parameter controlling the trade-off between fitting the data and keeping the weights small. Ridge regression reduces the impact of multicollinearity and prevents overfitting.

98. ROC Curve (Receiver Operating Characteristic): An ROC Curve is a graphical representation of a binary classifier's performance as the decision threshold varies. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR). The Area Under the Curve (AUC-ROC) is a single metric summarizing the model's ability to distinguish between classes.

99. Root Mean Squared Error (RMSE): RMSE is a commonly used metric to measure the difference between predicted and actual values in regression tasks. It is defined as RMSE = √(Σ(y_pred - y_actual)² / n), where n is the number of observations. RMSE penalizes large errors more than small ones, providing insight into model performance.

100. Robustness: Robustness refers to a model's ability to maintain performance under varying conditions, such as noisy inputs, adversarial attacks, or missing data. Robust models generalize well to unseen or perturbed data and are often evaluated using adversarial examples or stress tests.

101. Rule-Based System: A Rule-Based System is a traditional AI approach that relies on a predefined set of rules and logic to make decisions. Rules are often expressed as "if-then" statements. While interpretable, these systems lack adaptability and struggle with complex, high-dimensional data.
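As a quick illustration of ridge regression (entry 97), the sketch below solves the penalized least-squares problem in closed form, w = (XᵀX + λI)⁻¹Xᵀy, with NumPy. The synthetic data, the noise level, and the choice λ = 1.0 are assumptions made for the example.

```python
# Minimal ridge regression sketch via the closed-form solution (entry 97).
# Synthetic data and lambda = 1.0 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # 100 samples, 3 features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)  # targets with a little noise

lam = 1.0                                         # regularization strength (lambda)
d = X.shape[1]
w_ols = np.linalg.solve(X.T @ X, X.T @ y)                      # ordinary least squares
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # ridge solution

print("OLS weights:  ", np.round(w_ols, 3))
print("ridge weights:", np.round(w_ridge, 3))     # slightly shrunk toward zero
```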
102. Random Search: Random Search is a hyperparameter optimization technique that randomly samples values within a predefined range. While less systematic than grid search, it can outperform it by exploring a larger hyperparameter space more efficiently, especially for high-dimensional problems.

103. Regularization Path: A Regularization Path shows how model parameters change as the regularization strength (λ) varies. It is often visualized in Lasso or Ridge regression to identify the optimal level of regularization and understand the impact on feature selection.

104. ReLU (Rectified Linear Unit): ReLU is an activation function commonly used in neural networks, defined as f(x) = max(0, x). It introduces non-linearity while avoiding the vanishing gradient problem seen in sigmoid and tanh functions. Variants like Leaky ReLU and Parametric ReLU address the issue of "dead neurons" in standard ReLU.

105. Residual Sum of Squares (RSS): RSS is a measure of the discrepancy between observed and predicted values in regression. It is calculated as RSS = Σ(y_actual - y_predicted)². A lower RSS indicates a better fit of the model to the data.

106. Representation Learning: Representation Learning refers to the process of automatically discovering useful features from raw data. Deep learning excels at representation learning by creating hierarchical feature representations in neural networks, reducing the need for manual feature engineering.

107. Restricted Boltzmann Machine (RBM): An RBM is a generative stochastic neural network consisting of a visible layer and a hidden layer. Connections exist only between the two layers; there are no connections within a layer. RBMs are used in collaborative filtering, dimensionality reduction, and as building blocks for deep belief networks.

108. Rough Set Theory: Rough Set Theory is a mathematical framework for dealing with uncertainty and vagueness in data. It partitions data into lower and upper approximations, enabling decision-making without requiring probabilistic information. Applications include feature selection and rule induction.

109. RNN Cell: An RNN Cell is the fundamental unit of a recurrent neural network, responsible for processing sequential data. Each cell takes an input vector and a hidden state from the previous time step to compute the current hidden state, forming the basis of the network's temporal dependencies.

110. Sample Complexity: Sample Complexity refers to the number of training samples required for a learning algorithm to achieve a desired level of performance. It depends on factors such as model complexity, the learning algorithm, and the distribution of the data.

111. Sampling Bias: Sampling Bias occurs when the training data does not represent the true distribution of the population, leading to skewed model predictions. Addressing sampling bias often requires resampling techniques or reweighting methods to correct the imbalance.

112. SARSA (State-Action-Reward-State-Action): SARSA is an on-policy reinforcement learning algorithm that updates the Q-value based on the current action and the next action taken under the policy. The update rule is Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') - Q(s, a)], where α is the learning rate, γ is the discount factor, and r is the reward.
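Random search (entry 102) fits in a few lines of plain Python. In the sketch below, the `score` function is a hypothetical stand-in for a cross-validated model evaluation, and the hyperparameter names and ranges are assumptions for illustration.

```python
# Minimal random search sketch (entry 102). The objective and search ranges are
# illustrative assumptions; in practice `score` would run a model evaluation.
import math
import random

random.seed(0)

def score(learning_rate, num_trees):
    """Hypothetical validation score; higher is better."""
    return -((math.log10(learning_rate) + 2) ** 2) - ((num_trees - 300) / 100) ** 2

best_params, best_score = None, float("-inf")
for _ in range(50):                                  # 50 random trials
    params = {
        "learning_rate": 10 ** random.uniform(-4, 0),  # log-uniform in [1e-4, 1]
        "num_trees": random.randint(50, 500),
    }
    s = score(**params)
    if s > best_score:
        best_params, best_score = params, s

print(best_params, round(best_score, 3))
```

Because each trial is independent, the loop parallelizes trivially, which is part of why random search scales well to large hyperparameter spaces.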
113. Scalability: Scalability refers to a machine learning model or system's ability to handle increasing amounts of data or computational demands efficiently. Scalable systems can maintain performance by leveraging distributed computing, parallelism, or optimized algorithms.

114. Semi-Supervised Learning: Semi-Supervised Learning is a machine learning paradigm that uses a small amount of labeled data along with a large amount of unlabeled data to improve learning accuracy. Algorithms include self-training, co-training, and graph-based methods.

115. Sensitivity (True Positive Rate): Sensitivity, also known as recall, measures a classifier's ability to correctly identify positive instances. It is defined as Sensitivity = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.

116. SGD (Stochastic Gradient Descent): SGD is an optimization algorithm that updates model parameters using the gradient of the loss function with respect to a single data point (or a small batch) at each step: θ ← θ - α ∇L(θ), where θ represents the parameters, α is the learning rate, and ∇L(θ) is the gradient. It is computationally efficient and widely used in large-scale machine learning.

117. Shannon Entropy: Shannon Entropy is a measure of uncertainty or information content in a probability distribution. For a discrete random variable X with probabilities p(x), entropy is defined as H(X) = -Σ p(x) log(p(x)). Higher entropy indicates greater uncertainty, while lower entropy suggests more predictability.

118. Sigmoid Function: The Sigmoid Function is a non-linear activation function commonly used in neural networks. It maps input values to a range between 0 and 1: σ(x) = 1 / (1 + exp(-x)). However, it can suffer from vanishing gradients, which makes it less suitable for deep networks compared to ReLU.

119. Sparse Data: Sparse Data refers to datasets where most values are zero or missing. Such data is common in fields like natural language processing and recommender systems. Sparse representations like CSR (Compressed Sparse Row) or embeddings are often used to manage these datasets efficiently.

120. Support Vector Machine (SVM): An SVM is a supervised learning algorithm that finds the hyperplane maximizing the margin between data points of different classes. The optimization problem is: minimize (1/2)||w||², subject to y_i(w·x_i + b) ≥ 1 for all i. SVMs can handle non-linear data using kernel functions like the RBF and polynomial kernels.
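To ground the SGD update from entry 116, here is a minimal sketch that fits a least-squares linear regression one sample at a time. The synthetic data, learning rate, and epoch count are illustrative assumptions.

```python
# Minimal SGD sketch (entry 116): theta <- theta - alpha * grad L(theta),
# using one sample per update on synthetic linear-regression data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # 200 samples, 2 features
true_theta = np.array([2.0, -1.0])
y = X @ true_theta + rng.normal(scale=0.1, size=200)

theta = np.zeros(2)
alpha = 0.02                                         # learning rate

for epoch in range(20):
    for i in rng.permutation(len(X)):                # visit samples in random order
        error = X[i] @ theta - y[i]
        grad = 2 * error * X[i]                      # gradient of (x_i . theta - y_i)^2
        theta -= alpha * grad                        # the SGD parameter update

print("estimated theta:", np.round(theta, 3))        # close to [2.0, -1.0]
```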

Continued here.

- Shubham Anuraj, 01:23 AM, 26 Dec, 2024
