
Why Consistency and Faithfulness Matter in AI Explanations

In the growing field of explainable AI, tools like LIME and SHAP have made it possible to peek inside complex models and understand their reasoning. But just because a model can explain itself doesn’t mean every explanation is meaningful or trustworthy.

Evaluating the Quality of Explanations

Not all explanations are created equal. For an explanation to be useful in practice, it must do more than highlight inputs or display weights. It needs to behave reliably and reflect how the model actually works.

Two critical properties help assess that:

1. Consistency

A good explanation should behave consistently. That means:

  • If you train the same model on different subsets of similar data, the explanations should remain relatively stable.
  • Small changes to input data shouldn’t lead to dramatically different explanations.

Inconsistent explanations can confuse users, misrepresent what the model has learned, and signal overfitting or instability in the model itself.
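As a rough illustration, here is a minimal sketch of one way to probe consistency, assuming the scikit-learn and shap packages and a synthetic dataset standing in for real loan data (exact SHAP output shapes can vary across shap versions and model types):

```python
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for loan-application data; features are hypothetical
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)

rng = np.random.default_rng(0)
subset_a = rng.choice(len(X), size=1000, replace=False)  # "Training Set A"
subset_b = rng.choice(len(X), size=1000, replace=False)  # "Training Set B"

model_a = GradientBoostingClassifier(random_state=0).fit(X[subset_a], y[subset_a])
model_b = GradientBoostingClassifier(random_state=0).fit(X[subset_b], y[subset_b])

# Explain the same applicant with both models (SHAP values are in log-odds units here)
applicant = X[:1]
shap_a = shap.TreeExplainer(model_a).shap_values(applicant)[0]
shap_b = shap.TreeExplainer(model_b).shap_values(applicant)[0]

# A consistent model should attribute the prediction to similar features
print("Attribution correlation:", np.corrcoef(shap_a, shap_b)[0, 1])
```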

2. Faithfulness

Faithfulness asks a simple but powerful question: Do the features highlighted in the explanation actually influence the model’s prediction?

An explanation is not faithful if it attributes importance to features that, when changed or removed, don’t affect the outcome. This kind of misleading output can erode trust and create false narratives around how the model operates.

Why These Metrics Matter

In sensitive applications like healthcare, lending, or security, misleading explanations are more than just technical flaws. They can have real-world consequences.

  • Imagine a credit scoring model that cites a user’s browser history or favorite color as key decision drivers. Even if the model is technically accurate, such explanations would damage its credibility and raise ethical and legal concerns.
  • In regulated industries, explanations that fail consistency or faithfulness checks can expose organizations to compliance risks and reputational damage.

Real-World Examples

Faithfulness Test: Credit Risk Model

A faithfulness test was applied to a credit risk model used to classify applicants as “high” or “low” risk. The SHAP explanation highlighted feature A (e.g., number of bank accounts) as highly important.

To test faithfulness, this feature was removed and the model’s prediction didn’t change … at all!

What the graph shows:

  • SHAP value for “Number of Bank Accounts” was +0.25 (suggesting a major contribution).
  • But after removing it, the model’s risk prediction stayed the same, proving that this feature wasn’t actually influencing the output.

This revealed a serious problem: the model was producing unfaithful explanations. It was surfacing irrelevant features as important, likely due to correlation artifacts in the training data.
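A minimal sketch of this kind of faithfulness check is shown below, again on synthetic data with hypothetical features; the "removal" is approximated by replacing the feature with its dataset mean, since a trained model always expects a full feature vector:

```python
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

applicant = X[:1]
attributions = shap.TreeExplainer(model).shap_values(applicant)[0]
top_feature = int(np.abs(attributions).argmax())
baseline_prob = model.predict_proba(applicant)[0, 1]

# Ablate the top-attributed feature and see whether the prediction actually moves
ablated = applicant.copy()
ablated[0, top_feature] = X[:, top_feature].mean()
ablated_prob = model.predict_proba(ablated)[0, 1]

print(f"Top-attributed feature index: {top_feature}")
print(f"Risk probability before: {baseline_prob:.3f}, after ablation: {ablated_prob:.3f}")
```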

Consistency Test: Credit Risk Model

A credit scoring model was trained on two different but similar subsets of loan application data. Both versions produced the same prediction for an applicant ("high risk") but gave very different explanations.

What the graph shows:

  • In Training Set A, the top contributing feature was “Credit Utilization” (+0.3).
  • In Training Set B, it was “Employment Type” (+0.28).
  • The SHAP bar charts for the same applicant looked noticeably different, even though the final decision didn’t change.

This inconsistency raised questions about model stability: Can we trust that the model is learning the right patterns, or is it too sensitive to the training data?

Final Thoughts

As AI systems continue to make critical decisions in our lives, explainability is not a luxury; it's a necessity. Tools like LIME and SHAP offer a valuable window into how models work, but that window needs to be clear and reliable.

Metrics like consistency and faithfulness help us evaluate the strength of those explanations. Without them, we risk mistaking noise for insights, or worse, making important decisions based on misleading information.

Accuracy might get a model deployed, but consistency and faithfulness should decide whether it can be trusted. If you want to learn more about explainability in AI, please check this blog post, where I talk about how LIME and SHAP can help explain model outcomes.

Understanding Model Decisions with SHAP and LIME

What Made the Model Say That? Real Examples of Explainable AI

When people talk about artificial intelligence, especially deep learning, the conversation usually centers around accuracy and performance. How well does the model classify images? Can it outperform humans in pattern recognition? While these questions are valid, they miss a crucial piece of the puzzle: explainability.

Explainability is about understanding why an AI model makes a specific prediction. In high-stakes domains like healthcare, finance, or criminal justice, knowing the why is just as important as the what. Yet this topic is often overlooked in favor of performance benchmarks.

Why Is Explainability Hard in Deep Learning?

Classical models like decision trees (e.g., CART) offer built-in transparency. You can trace the decision path from root to leaf and understand the model’s logic. But deep learning models are different. They operate through layers of nonlinear transformations and millions of parameters. As a result, even domain experts can find their predictions opaque.

This can lead to problems:

  • Lack of trust from users or stakeholders
  • Difficulty debugging or improving models
  • Potential for hidden biases or unfair decisions

This is where explainability tools come in.

Tools That Help Open the Black Box

Two widely used frameworks for model explanation are LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations). Both aim to provide insights into which features influenced a specific prediction and by how much.

LIME in Action

LIME works by perturbing the input data and observing how the model’s predictions change. For instance, in a text classification task, LIME can highlight which words in an email led the model to flag it as spam. It does this by creating many variations of the email (e.g., removing or replacing words) and observing the output.

Loan Risk Example:

  • A model classifies a loan application as risky. We will use John as an example.
  • We want to understand why his application was labeled as risky.
  • LIME reveals that the applicant’s job status and credit utilization were the two most influential factors.

LIME reveals that the model flagged John’s loan as risky mainly due to his contract employment status and high credit utilization. Although John had no previous defaults and a moderate income, those factors were outweighed by the others in the model’s decision.
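For readers who want to see what this looks like in code, here is a minimal sketch using the lime package on a synthetic loan model; the feature names, data, and model are all hypothetical stand-ins, not the actual system described above:

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer  # assumes the lime package
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical loan features and a synthetic model standing in for the real one
feature_names = ["employment_type", "credit_utilization", "income", "prior_defaults"]
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=feature_names,
    class_names=["low risk", "high risk"],
    mode="classification",
)

# Explain a single applicant ("John") by perturbing his feature values
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
for feature, weight in explanation.as_list():
    print(f"{feature}: {weight:+.3f}")
```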

SHAP in Practice

SHAP uses concepts from cooperative game theory to assign each feature an importance value. It ensures a more consistent and mathematically grounded explanation. SHAP values can be plotted to show how each input feature pushes the prediction higher or lower.
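As a minimal, hypothetical sketch of what that looks like in code (synthetic data, a tree-based model, and made-up feature names, not an actual clinical system):

```python
import shap  # assumes the shap package is installed
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical clinical features; the real diagnostic model is not shown here
feature_names = ["age", "blood_pressure", "cholesterol", "bmi"]
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# SHAP contributions for one patient (in log-odds units for this model type)
shap_values = shap.TreeExplainer(model).shap_values(X[:1])[0]

# Positive values push the prediction toward "high risk", negative values away from it
for name, value in zip(feature_names, shap_values):
    print(f"{name}: {value:+.3f}")
```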

Medical Diagnosis Example:

  • Let’s use Maria as an example, after her information was entered into the system, she was classified as high risk by the model
  • To understand as to what factors contributed to that classification, we can use SHAP. SHAP shows that age and blood pressure significantly contributed to a high-risk prediction.
  • These insights help physicians verify if the model aligns with clinical reasoning.

Final Thoughts

The examples of Maria and John illustrate a powerful truth: even highly accurate models are incomplete without explanations. When a model labels someone as high-risk, whether for a disease or a loan default, it’s not enough to accept the outcome at face value. We need to understand why the model made that decision.

Tools like LIME and SHAP make this possible. They open up the black box and allow us to see which features mattered most, giving decision-makers the context they need to trust or challenge the model’s output.

Why Explainability Matters in Business:

  • Builds trust with stakeholders
  • Supports accountability in sensitive decisions
  • Uncovers potential biases or errors in the model
  • Aligns predictions with domain expertise

As AI becomes more embedded in real-world systems, explainability is not optional; it’s essential. It turns predictions into insights, and insights into informed, ethical decisions.

Good AI model evaluation doesn’t stop at explainability. Learn why consistency and faithfulness matter by checking out this post.


Optimizing Traffic Flow: Efficient, but Is It Safe?

Unsignalized intersections are managed without traffic lights. They rely on stop signs and right-of-way rules. These intersections are inherently riskier than signalized ones because there are no lights and safety depends on drivers actually paying attention to the stop sign, but that is a different matter altogether.

They’re common in suburban or low-traffic areas but increasingly challenged by growing traffic volumes and the emergence of Connected and Automated Vehicles (CAVs).

These intersections are friction points in modern traffic systems. And the problem often starts with one outdated rule: First-Come-First-Served (FCFS).

First-Come-First-Served

FCFS is a simple scheduling principle: vehicles cross in the order they arrive. If multiple vehicles approach, each waits for the ones ahead (yes, you are supposed to wait if the other person arrives at the stop sign before you) even if their paths don’t conflict.

Why It Falls Short

  • No spatial awareness: Vehicles wait even when their paths don’t intersect. This may not be a bad thing if your city or neighborhood has CRAZY drivers, but it is not efficient, right?
  • Ignores vehicle dynamics: No speed adjustments are used to reduce waiting time. Although you may be able to reply to a text or two? NO. Don’t text and drive!
  • Creates bottlenecks: Delays increase when vehicles arrive from different directions in quick succession. Oh well, your precious time.

In the animation above, each vehicle waits for the previous one to clear the intersection, even when there’s no collision risk. The result? Wasted time and unused intersection space. Well, that is if you only care about efficiency. Not so bad from a safety point of view.

Why FCFS Doesn’t Work for CAVs

As vehicles become more intelligent and connected, relying on a static rule like FCFS is inefficient. This is assuming that the person behind the wheel is also intelligent enough to practice caution and obey traffic rules and drives SOBER.

Modern CAVs can:

  • Share real-time location and speed data.
  • Coordinate with one another to avoid collisions.
  • Adjust their behavior dynamically.

FCFS fails to take advantage of these capabilities. It often causes unnecessary queuing, increasing delays even when safe, efficient crossings are possible through minor speed changes. Again, assuming that the drivers are all outstanding citizens with common sense, yes, this is not very efficient and there is room for improvement.

A Smarter Alternative: Conflict-Free, Real-Time Scheduling

A recent paper, “An Optimal Scheduling Model for Connected Automated Vehicles at an Unsignalized Intersection,” proposes a linear programming-based model to optimize flow at unsignalized intersections. The model is built for CAVs and focuses on minimizing average delay by scheduling optimal crossing times based on:

  • Vehicle location and direction
  • Potential conflict zones

Key Features of the Model

  • Conflict-free scheduling: Ensures no two vehicles with intersecting paths enter at the same time.
  • Rolling horizon optimization: Continuously updates schedules in real time.
  • Delay minimization: Vehicles adjust speed slightly instead of stopping.

In this visualization, vehicles coordinate seamlessly:

  • The red car enters first.
  • The gray car slows slightly to avoid a conflict.
  • The blue car times its approach to maintain flow.

No stopping. No wasted time. Just optimized motion.
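To make the optimization idea concrete, here is a toy sketch (not the paper’s actual formulation) that assigns crossing times to three vehicles with scipy’s linear programming solver, assuming a fixed crossing order and a minimum headway between vehicles whose paths conflict:

```python
import numpy as np
from scipy.optimize import linprog

# Toy setup: three vehicles with earliest possible arrival times (seconds)
arrival = np.array([0.0, 1.0, 1.5])   # red, gray, blue
headway = 2.0                         # required gap between conflicting vehicles
conflicts = [(0, 1), (1, 2)]          # pairs whose paths intersect, in crossing order

# Decision variables: crossing times t0, t1, t2.
# Minimizing sum(t_i) is equivalent to minimizing total delay sum(t_i - arrival_i).
c = np.ones(3)

# Constraints in A_ub @ t <= b_ub form
A_ub, b_ub = [], []
for i in range(3):                    # t_i >= arrival_i
    row = np.zeros(3); row[i] = -1.0
    A_ub.append(row); b_ub.append(-arrival[i])
for i, j in conflicts:                # t_j >= t_i + headway
    row = np.zeros(3); row[i] = 1.0; row[j] = -1.0
    A_ub.append(row); b_ub.append(-headway)

result = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0, None)] * 3)
print("Scheduled crossing times:", np.round(result.x, 2))
print("Individual delays:", np.round(result.x - arrival, 2))
```

The real model adds conflict-zone geometry, vehicle dynamics, and rolling-horizon updates, but the core trade-off (delay minimization subject to separation constraints) is already visible in this tiny example.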

Now that all sounds good to me. It sounds somewhat like a California Stop, if you know what I mean. But how can we trust the human to obey these more intricate optimization suggestions when people don’t even adhere to more simple rules like slowing down in a school zone? Ok, maybe a different topic. So let’s assume that these are all goody goodies behind the wheel and continue.

Performance: How the Model Compares to FCFS

According to the study’s simulations:

  • Up to 76.22% reduction in average vehicle delay compared to FCFS.
  • Real-time responsiveness using rolling optimization.
  • Faster computation than standard solvers like Gurobi, making it viable for live deployment.

The result? Smoother traffic, shorter waits, and better use of intersection capacity without traffic signals.

Rethinking the Rules of the Road

FCFS is simple but simplicity comes at a cost. In a connected, data-driven traffic ecosystem, rule-based systems like FCFS are no longer sufficient.

This study makes the case clear: real-time, model-based scheduling is the future of unsignalized intersection management. As cities move toward CAVs and smarter infrastructure, the ability to optimize traffic flow will become not just beneficial, but essential. That said, complexity also comes at a cost. If all the vehicles are autonomous and controlled by a safe, optimized, and centralized algorithmic command center, this could work. But as soon as you introduce free agency, which is not a bad thing in itself but in this context introduces a lot of risk, randomness, uncertainty, and CHAOS … one has to weigh efficiency against practicality and safety.

If these CAVs are able to enter a semi-controlled environment when they reach the perimeter of the intersection, perhaps this approach could work. This means that while they are in the grid (defined by a region leading up to the stop sign), the driver does lose some autonomy and their vehicle will be coordinated by a central command … this might be a good solution to implement.

Either way, this is an interesting study. After all, we all want to get from point A to point B in the most efficient way possible. The less time we spend behind the wheel at stop signs, the more time we have for … hopefully not scrolling TikTok. But hey, even that is better than just sitting at a stop sign, right?

[Image: The value of traditional language corpora for pretraining LLMs is plateauing, making it necessary to gather new, challenging data to improve performance on language and reasoning tasks.]

Agentic LLMs: The Evolution of AI in Reasoning and Social Interaction

The landscape of artificial intelligence is evolving every second. Large Language Models (LLMs) are evolving from passive entities into active, decision-making agents. This shift introduces agentic LLMs. Now, we have all seen people mention agentic this, agentic that, over the last few months. In essence, these systems are endowed with reasoning abilities, interfaces for real-world action, and the capacity to engage with other agents. These advancements are poised to redefine industries such as robotics, medical diagnostics, financial advising, and scientific research.

The Three Pillars of Agentic LLMs

  1. Reasoning Capabilities At the heart of agentic LLMs lies their reasoning ability. Drawing inspiration from human cognition, these systems emulate both rapid, intuitive decisions (System 1 thinking) and slower, analytical deliberation (System 2 thinking). Current research predominantly focuses on enhancing the decision-making processes of individual LLMs.
  2. Interfaces for Action Moving beyond static responses, agentic LLMs are equipped to act within real-world environments. This is achieved through the integration of interfaces that facilitate tool usage, robotic control, or web interactions. Such systems leverage grounded retrieval-augmented techniques and benefit from reinforcement learning, enabling agents to learn through interaction with their environment rather than relying solely on predefined datasets.
  3. Social Environments The third component emphasizes multi-agent interaction, allowing agents to collaborate, compete, build trust, and exhibit behaviors akin to human societies. This fosters a social environment where agents can develop collective intelligence. Concepts like theory of mind and self-reflection enhance these interactions, enabling agents to understand and anticipate the behaviors of others.

A Self-Improving Loop

The interplay between reasoning, action, and interaction creates a continuous feedback loop. As agents engage with their environment and each other, they generate new data for ongoing training and refinement. This dynamic learning process addresses the limitations of static datasets, promoting perpetual improvement.

Here, agents act in the world, generate their own experiences, and learn from the outcomes, without needing a predefined dataset. This approach, used by models from OpenAI and DeepSeek, allows LLMs to capture the full complexity of real-world scenarios, including the consequences of their own actions. Although reinforcement learning introduces challenges like training instability due to feedback loops, these can be mitigated through diverse exploration and cautious tuning. Multi-agent simulations in open-world environments may offer a more scalable and dynamic alternative for generating the diverse experiences required for stable, continual learning.
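A very rough sketch of that loop is shown below; everything here (the policy, the environment, and the reward) is a hypothetical placeholder for what would really be an LLM acting through tools or other agents:

```python
import random

# Hypothetical stand-ins: a real system would wrap an LLM policy and a real environment
def propose_action(observation, experience):
    """Reason about the observation and pick an action (placeholder policy)."""
    return random.choice(["use_tool", "ask_agent", "respond"])

def environment_step(action):
    """Execute the action in the world and return feedback (placeholder environment)."""
    return {"observation": f"result of {action}", "reward": random.random()}

experience = []  # self-generated training data
observation = "initial task description"

for step in range(10):
    action = propose_action(observation, experience)
    feedback = environment_step(action)
    # Each interaction becomes a new training example: no static dataset required
    experience.append((observation, action, feedback["reward"]))
    observation = feedback["observation"]

# In a full system, `experience` would now be used to update the policy (e.g., via RL)
print(f"Collected {len(experience)} self-generated examples")
```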

From Individual Intelligence to Collective Behavior

The multi-agent paradigm extends the focus beyond individual reasoning to explore emergent behaviors such as trust, deception, and collaboration. These dynamics are observed in human societies, and insights gained from these studies can inform discussions on artificial superintelligence by modeling how intelligent behaviors emerge from agent interactions.

Conclusion

Agentic LLMs are reshaping the understanding of machine learning and reasoning. By enabling systems to act autonomously and interact socially, researchers are advancing toward creating entities capable of adaptation, collaboration, and evolution within complex environments. The future of AI lies in harmonizing these elements (reasoning, action, and interaction) into unified intelligent agent systems that not only respond but also comprehend, decide, and evolve.

What does this mean for fine-tuning LLMs? Well, here is where it gets interesting. Unlike traditional LLM fine-tuning, which relies on static datasets curated from the internet and shaped by past human behavior, agentic LLMs can generate new training data through interaction. This marks a shift from supervised learning to a self-learning paradigm rooted in reinforcement learning.

For an in-depth take on agentic LLMs, I highly recommend reading this survey.

[Image: Bar chart showing ChatGPT and BERT agreement with researcher sentiment labels for positive, negative, and neutral categories]

How Good is Sentiment Analysis? Even Humans Don’t Always Agree

Sentiment Analysis Is Harder Than It Looks

Sentiment analysis is everywhere: from analyzing customer reviews to tracking political trends. But how accurate is it really? More importantly, how well can machines capture human emotion and nuance in text?

To answer that, I conducted a real-world comparison using a dataset of healthcare-related social media posts. I evaluated the performance of two AI models, ChatGPT and a BERT model fine-tuned on Reddit and Twitter data, against human-coded sentiment labels. I used the fine-tuned BERT model and the dataset as part of my Ph.D. dissertation, so this was already available to me. More information on the BERT model can be found in my dissertation.

The results tell an interesting story about the strengths and limitations of sentiment analysis; not just for machines, but for humans too.

ChatGPT vs. BERT: Which One Came Closer to Human Labels?

Overall, ChatGPT performed better than the BERT model:

  • ChatGPT reached 59.83% agreement with human-coded sentiments on preprocessed text (the same dataset I used for the BERT model)
  • On raw text, agreement was 58.52% (the same dataset I gave the second human coder, without any preprocessing such as lemmatization)
  • The BERT model lagged behind in both scenarios

This shows that large language models like ChatGPT are moving sentiment analysis closer to human-level understanding, but the gains are nuanced. See the image below, which shows ChatGPT vs. the trained BERT agreement levels with my coding for each Reddit post. Note that this was when I used the preprocessed dataset to generate the ChatGPT output.

Class-by-Class Comparison: Where Each Model Excels

Looking beyond overall scores, I broke down agreement across each sentiment class. Here’s what I found:

  • Neutral Sentiment: ChatGPT led with 64.76% agreement, outperforming BERT’s 44.76%.
  • Positive Sentiment: BERT did better with 66.19% vs. ChatGPT’s 41.73%.
  • Negative Sentiment: Both struggled, with BERT at 26.09% and ChatGPT at 17.39%.

These results suggest that ChatGPT handles nuance and neutrality better, while BERT tends to over-assign positivity; a common pattern in models trained on social platforms like Reddit and Twitter.
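For anyone who wants to reproduce this kind of breakdown on their own labels, here is a minimal sketch with made-up data (the dissertation dataset itself is not reproduced here):

```python
import pandas as pd

# Hypothetical labels standing in for the human-coded and model-generated sentiments
data = pd.DataFrame({
    "human":   ["positive", "neutral", "negative", "neutral", "positive", "neutral"],
    "chatgpt": ["positive", "neutral", "positive", "neutral", "neutral",  "neutral"],
})

# Overall agreement: share of posts where the model matches the human label
overall = (data["human"] == data["chatgpt"]).mean()
print(f"Overall agreement: {overall:.2%}")

# Class-by-class agreement, conditioned on the human label
matches = data.assign(match=data["human"] == data["chatgpt"])
print(matches.groupby("human")["match"].mean())
```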

But Wait … Even Humans Don’t Fully Agree

Here’s where it gets more interesting! When comparing two human coders on the same dataset, agreement was just 72.79% overall. Class-specific agreement levels were:

  • Neutral: 91.8%
  • Negative: 60%
  • Positive: Only 43.6%

This mirrors the model behavior. The subjectivity of sentiment, especially when it comes to borderline cases or ambiguous language, challenges even humans!

Why Sentiment Is So Difficult … Even for Humans

As discussed in my dissertation, sentiment classification is impacted by:

  • Ambiguous or mixed emotions in a single post
  • Sarcasm and figurative language
  • Short posts with little context
  • Different human interpretations of tone and intent

In short: Sentiment is not just about word choice; it’s about context, subtlety, and perception. I tackle this in much more depth in my dissertation. So, if you want to read more about what other researchers are saying, I suggest you refer to Chapter 5, where I talk about sentiment analysis issues, explanations, and implications.

Here is the Spiel

  • ChatGPT outperformed a Reddit-Twitter trained BERT model in both overall accuracy and especially on neutral sentiment.
  • Positive and negative sentiment remain harder to classify, for both models and humans.
  • Even human coders don’t always agree, proving that sentiment is a subjective task by nature.
  • For applications in healthcare, finance, or policy, where precision matters, sentiment analysis needs to be interpreted carefully, not blindly trusted.

Final Thoughts

AI is getting better at understanding us, but it still has blind spots. As we continue to apply sentiment analysis in real-world domains, we must account for ambiguity, human disagreement, and context. More importantly, we need to acknowledge that even “ground truth” isn’t always absolute.

Let’s keep pushing the boundaries, but with a healthy respect for the complexity of human emotion.

[Image: Diagram illustrating how a large language model (LLM) answers questions using ontology embeddings, Chain-of-Thought prompting, and Retrieval-Augmented Generation from a knowledge graph.]

Revolutionizing Data Interaction: How AI Can Comprehend Your Evolving Data Without Retraining

In the rapidly evolving landscape of enterprise AI, organizations often grapple with a common challenge: enabling large language models (LLMs) to interpret and respond to queries based on structured data, such as knowledge graphs, without necessitating frequent retraining as the data evolves.

A novel approach addresses this issue by integrating three key methodologies:

  1. Ontology embeddings: Transform structured data into formats that LLMs can process, facilitating an understanding of relationships, hierarchies, and schema definitions within the data.
  2. Chain-of-Thought prompting: Encourage LLMs to engage in step-by-step reasoning, enhancing their ability to navigate complex data structures and derive logical conclusions.
  3. Retrieval-Augmented Generation (RAG): Equip models to retrieve pertinent information from databases or knowledge graphs prior to generating responses, ensuring that outputs are both accurate and contextually relevant.

By synergizing these techniques, organizations can develop more intelligent and efficient systems for querying knowledge graphs without the need for continuous model retraining.
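As a minimal sketch of how the retrieval and prompting pieces fit together (with a toy triple store, TF-IDF retrieval standing in for real ontology embeddings, and no actual LLM call), the flow looks roughly like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical knowledge-graph triples, verbalized as text; a real system would use
# ontology embeddings that capture classes, properties, and hierarchies
triples = [
    "Parcel 42 has owner Alice",
    "Parcel 42 is located in Amsterdam",
    "Parcel 42 has an area of 350 square meters",
    "Alice is a Person",
    "Amsterdam is a Municipality",
]

query = "Who owns the parcel located in Amsterdam?"

# Retrieval step (the "R" in RAG): rank triples by similarity to the query
vectorizer = TfidfVectorizer().fit(triples + [query])
scores = cosine_similarity(vectorizer.transform([query]), vectorizer.transform(triples))[0]
top_triples = [t for _, t in sorted(zip(scores, triples), reverse=True)[:3]]

# Chain-of-Thought prompt assembly: ask the LLM to reason step by step over the context
prompt = (
    "You are answering questions over a knowledge graph.\n"
    "Relevant facts:\n" + "\n".join(f"- {t}" for t in top_triples) + "\n"
    f"Question: {query}\n"
    "Think step by step, citing the facts you use, then give the final answer."
)
print(prompt)  # this prompt would then be sent to an LLM of your choice
```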

Implementation Strategy

  • Combining Ontology Embeddings with Chain-of-Thought Prompting: This fusion allows LLMs to grasp structured knowledge and reason through it methodically, which is particularly beneficial when dealing with intricate data relationships.
  • Integrating within a RAG Framework: Traditionally used for unstructured data, RAG can be adapted to retrieve relevant segments from knowledge graphs, providing LLMs with the necessary context for informed response generation.
  • Facilitating Zero/Few-Shot Reasoning: This approach minimizes the need for retraining by utilizing well-structured prompts, enabling LLMs to generalize across various datasets and schemas effectively.

Organizational Benefits

Adopting this methodology offers several advantages:

  • Reduced Need for Retraining: Systems can adapt to evolving data without the overhead of continuous model updates.
  • Enhanced Explainability: The step-by-step reasoning process provides transparency in AI-driven decisions.
  • Improved Performance with Complex Data: The model’s ability to comprehend and navigate structured data leads to more accurate responses.
  • Adaptability to Schema Changes: The system remains resilient amidst modifications in data structures.
  • Efficient Deployment Across Domains: LLMs can be utilized across various sectors without domain-specific fine-tuning.

Practical Applications

This approach has been successfully implemented in large-scale systems, such as the Dutch national cadastral knowledge graph (Kadaster), demonstrating its viability in real-world scenarios. For instance, deploying a chatbot capable of:

  • Understanding domain-specific relationships without explicit programming.
  • Updating its knowledge base in tandem with data evolution.
  • Operating seamlessly across departments with diverse taxonomies.
  • Delivering transparent and traceable answers in critical domains.

Conclusion

By integrating ontology-aware prompting, systematic reasoning, and retrieval-enhanced generation, organizations can develop AI systems that interact with structured data more effectively. This strategy not only streamlines the process but also enhances the reliability and adaptability of AI applications in data-intensive industries. For a comprehensive exploration of this methodology, refer to Bolin Huang’s Master’s thesis.

[Image: A visual representation of a Knowledge Graph Question Answering (KGQA) framework that integrates ontology embeddings, Chain-of-Thought prompting, and Retrieval-Augmented Generation (RAG), showing the flow from user query to LLM reasoning and response generation based on structured data from a knowledge graph.]

[Image: Comparison of traditional time series models like ARIMA with foundation models like TimesFM, SigLLM, and GPT-based anomaly detection approaches]

Time Series + LLMs: Hype or Breakthrough?

Time series foundation models like UniTS and TimesFM are trained on massive, diverse datasets and show promising results in anomaly detection. Surprisingly, even general-purpose LLMs (like GPT) can detect anomalies effectively, without any domain-specific pretraining.

But here’s the reality check:

🔹 LLMs are not always superior to traditional models like ARIMA. In fact, classical statistical models still outperform them in some cases—especially when data is clean and patterns are well-understood.

🔹 Pretrained pipelines like Orion reduce the cost of training from scratch, enabling faster deployment. However, real-time efficiency remains a challenge.

🔹 SigLLM, which converts time series into text for LLM input, is innovative—but rolling window representations make it computationally expensive.

🔹 Despite limitations like context window size and slow inference, LLMs are still flexible enough to be competitive. But they don’t consistently outperform classical models across the board.
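To ground the classical-baseline point, here is a minimal sketch of ARIMA-residual anomaly detection with statsmodels on a synthetic series; the data, the model order, and the 3-sigma threshold are all arbitrary choices for illustration:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic series with one injected anomaly; real benchmarks are far messier
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 300)) + rng.normal(scale=0.1, size=300)
series[150] += 2.0  # the anomaly

# Classical baseline: fit ARIMA and flag points with unusually large residuals
results = ARIMA(series, order=(2, 0, 1)).fit()
residuals = results.resid
threshold = 3 * residuals.std()
anomalies = np.where(np.abs(residuals) > threshold)[0]
print("Flagged indices:", anomalies)
```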

👉 The bottom line: LLMs are not a silver bullet. The most effective strategy is often hybrid, combining classical statistical techniques with foundation model strengths.

Are LLMs the future of time series modeling—or just another wave of AI hype?

Let’s discuss.

#AI #TimeSeries #AnomalyDetection #LLMs #FoundationModels
📄 Thesis by Linh K. Nguyen (MIT)

[Image: Animated comparison of regression models showing bias, variance, and prediction accuracy]

🎢 Bias, Variance & the Great Regressor Showdown

Imagine throwing five regressors into the same ring, giving them the same dataset, and watching them wrestle with reality. That’s what this animation is all about: a visual deep dive into bias, variance, and model complexity—without the textbook-level headache.

The models in play

Five well-known regression models, one smooth sinusoidal target function, and a bunch of noisy data points:

  • Linear Regression – The straight-line enthusiast.
  • Decision Tree – Thinks in boxes, and sometimes forgets how to generalize.
  • Random Forest – The chill ensemble kid who smooths out the chaos.
  • XGBoost – The overachiever with a calculator and an ego.
  • KNN – Your nosy neighbor who always asks, “What are your closest friends doing?”
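If you want to recreate the showdown numerically, here is a minimal sketch using scikit-learn on a synthetic noisy sine wave; GradientBoostingRegressor stands in for XGBoost to keep the dependencies light:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Noisy sinusoidal target, mirroring the setup in the animation
rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting (XGBoost stand-in)": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

# A large train/test gap hints at variance; low scores on both hint at bias
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:38s} train R^2={model.score(X_train, y_train):.2f} "
          f"test R^2={model.score(X_test, y_test):.2f}")
```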

🎥 The Animation:


The Concepts in Play

🎯 Bias

Bias refers to the error introduced when a model makes overly simplistic assumptions about the data. In other words, it is what happens when the model is too rigid or inflexible to capture the true patterns.

Take Linear Regression for example:

“Let’s pretend everything is a straight line.”

That assumption may work in some cases, but when the data contains curves or more complex relationships, the model cannot adapt. This leads to underfitting, where the model performs poorly on both training and test data because it has failed to learn the underlying structure.


🎢 Variance

Variance measures how sensitive a model is to fluctuations or noise in the training data. A high variance model learns the data too well, including all the random quirks and outliers, which means it performs well on the training set but poorly on new data.

This is typical of models like Decision Trees and KNN:

“I will memorize your quirks and your noise.”

These models often produce excellent training scores but fall apart during testing. That gap in performance is a red flag for overfitting, where the model has essentially memorized instead of generalized.


🤹 Model Complexity

Model complexity describes how flexible a model is in fitting the data. A more complex model can capture intricate patterns and relationships, but that flexibility comes at a cost.

More complexity often means the model has a higher risk of chasing noise rather than signal. It may give impressive training performance but fail when deployed in the real world. Complex models also tend to be harder to interpret and require more data to train effectively.

So while complexity sounds appealing, it is not always the answer. Sometimes the simplest model, with fewer moving parts, delivers the most reliable results.


💡 What We Learn from the GIF

  • Linear Regression has high bias. It’s smooth but can’t capture curves.
  • Decision Tree slices the data too rigidly. Prone to overfitting.
  • Random Forest balances bias and variance quite well (💪).
  • XGBoost tries to win—but often needs careful tuning to avoid bias.
  • KNN loves to follow the data closely, sometimes too closely.

Why This Matters (a lot)

In the real world:

  • Underfitting leads to useless predictions.
  • Overfitting leads to confident but wrong predictions.
  • Balanced models win in production.

Understanding the bias-variance tradeoff helps you:

✅ Pick the right model
✅ Avoid overcomplicating
✅ Diagnose errors
✅ Not trust every “98% R² score” you see on LinkedIn


Final Thoughts

Model performance isn’t magic—it’s tradeoffs. Sometimes the simplest model wins because the data is solid. Other times, the fanciest algorithm trips on its own complexity.

📩 Which model do you think performed best here?
Hit me up with your thoughts or overfitting horror stories.


#MachineLearning #BiasVariance #RegressionModels #ModelComplexity #XGBoost #RandomForest #KNN #Overfitting #Underfitting #DataScience


Entertainment Overcoming Resistance Model (EORM)

Entertainment Education programs have been found to create awareness and behavioral changes around social and health issues. The traditional definition of E-E refers to entertainment programs that are designed to have known, prosocial effects on viewers. The wide range of definitions that describe E-E can be attributed to the variety of goals in E-E programming. Some programs are focused on informing viewers, whereas others are focused on changing attitudes or behaviors.

The Extended Elaboration Likelihood Model highlights the roles of resistance and narrative engagement in E-E. The Entertainment Overcoming Resistance Model (EORM) was thus introduced as a way to consider how E-E programs can overcome resistance. EORM states that transportation, identification, similarity, and parasocial interaction (PSI) help overcome various types of resistance to persuasion, enhancing persuasive outcomes.

The article examines how narrative transportation and character involvement reduce three forms of resistance: reactance, counterarguing, and perceived invulnerability. It compares effects between a dramatic-narrative and a nonnarrative program that highlighted the consequences of unplanned teen pregnancies. It revealed that the dramatic narrative reduced reactance, fostered parasocial interactions, and decreased perceptions of persuasive intent. Identification with characters in the narrative was found to reduce counterarguing, increasing perceived vulnerability to unplanned pregnancies. Transportation into the dramatic narrative, though, was associated with counterarguing, contrary to expectations.

Furthermore, noticing a hidden agenda to promote healthy behavior disguised as entertainment was found to arouse reactance, while a more straightforward attempt to persuade did not. This is an interesting finding and I really liked the suggestions to address such behavior. The authors suggest that E-E creators should begin by first understanding the forms of resistance that operate within their targeted audience, since there are different types of resistance. It is suggested that perceived persuasive intent be kept low for audiences with high reactance and that characters viewers can bond with be used. Another suggestion is to focus on production features that facilitate empathy when the goal is to increase perceived vulnerability, rather than developing similar characters. These suggestions can be very helpful for E-E creators who want to communicate social issues. Being too pushy or too cunning could turn the audience off to what you are trying to say, especially if it is a controversial topic for the viewer to begin with. So, using a charismatic character as a vehicle to carry the message seems to be a logical approach to me. The suggestion to focus less on the character and more on production features, when the goal is to increase perceived vulnerability, is a bit harder for me to grasp. Perhaps the idea is that, in order to increase awareness, a mere character would not suffice.

Koops van ’t Jagt et al. (2017) conducted research to find out how a Spanish fotonovela (similar to a comic book) about diabetes communicated E-E messaging to high- and low-proficiency Dutch readers. The results showed that when it comes to diabetes knowledge, readers of the fotonovela outperformed participants who were given a traditional brochure. This was true for both high- and low-proficiency readers. However, fotonovela readers did not score significantly higher than traditional brochure readers when it came to behavioral intentions. The researchers state that the limited focus on measuring behavioral intentions could be a reason for the lack of visibility into behavioral significance, and that in similar studies behavioral intention had been found to increase significantly after E-E exposure.

I think EORM could be applied to advertising efforts, especially when advertising healthcare-related products. However, if the advertisement is seen as too cunning or crafty, audiences might display reactance. So, advertising professionals would have to focus a lot on characters as well as production features to evoke empathy. In practice, this approach would be quite difficult to achieve in short 30-second to 1-minute advertisements. However, if advertising efforts were to focus on longer formats, leveraging EORM could benefit both advertisers and audiences, especially when it comes to imparting knowledge.

My question to you is: would you be interested in product marketing efforts that utilize the EORM model? Where would you think such ads would be appropriate? For example, would you be receptive if an EORM ad was playing in your Facebook feed, interrupting something you were watching, or would you rather it show up in search results when you are searching for a potential health concern? Suppose you suspect you might have arthritis and, while trying to find more information on YouTube, you happen to come across an E-E video series about a character that suffers from similar health issues. Would you prefer that piece over something that is served in your Facebook feed?

Citation

Koops van ’t Jagt, R., Hoeks, J. C. J., Duizer, E., Baron, M., Molina, G. B., Unger, J. B., & Jansen, C. J. M. (2017). Sweet Temptations: How Does Reading a Fotonovela About Diabetes Affect Dutch Adults with Different Levels of Literacy? Health Communication, 33(3), 284–290. https://doi.org/10.1080/10410236.2016.1258617


Fear Appeals

Since 1953, fear appeals theory has undergone developments to keep up with the trends of the time. In the beginning, fear was the focus of fear appeals theories; perceived threat and perceived efficacy were introduced to the equation in the ’70s and ’80s. Past research on the subject focused on conceptual and methodological issues, while recent research, including the present study, focuses on quantitative methods to analyze the fear appeals literature.

There are three main models of fear appeals theory: Drive Theories, Parallel Response Models, and SEU Models. In fact, one could say there are four if one were to consider the Extended Parallel Process Model (EPPM), which draws upon elements from the former three. Drive Theories are the oldest models used to explain fear appeal results. They suggested that the level of fear aroused by a fear appeal motivates action, and they proposed an inverted U-shaped relationship between fear and attitude change, where a moderate amount of fear was thought to produce the most attitude change. The biggest contribution of the Parallel Response Model was the introduction of cognitive processing to fear appeals theory. It introduced the idea that fear appeals produce danger-control and fear-control processes in subjects. The SEU models identify components and cognitive mediators that lead to message acceptance in fear appeals. The SEU models suggest that high threat and high efficacy produce the most message acceptance but fail to explain when and how.

I found the EPPM to be the most interesting of the models. The EPPM suggests that when people believe they can perform the recommended response against a threat, they are more motivated to consciously think of ways to remove or lessen the threat. Usually, this means adopting the methods outlined in the message to control the danger. In contrast, when people doubt the efficacy of the recommended response or their ability to perform it, they are motivated to control their fear and focus on eliminating the fear through denial.

The EPPM can be used to analyze how people have responded to the pandemic. It might also be worthwhile to pay attention to how political affiliations and media consumption choices affect responses to fear appeals. For example, the Far Right has downplayed the threat of COVID by claiming that it is no more dangerous than the common flu. They have also downplayed the effectiveness of wearing a mask to prevent the spread of the pandemic. Meanwhile, the media outlets considered liberal or Left-leaning by the Far Right have portrayed COVID as a serious threat and have promoted mask-wearing and social distancing as effective and necessary measures to tackle it.

The findings from the meta-analysis conducted by Witte and Allen (2000) suggest that high-threat fear, accompanied by equally high-efficacy messages, is the most effective. They also suggest that fear appeals without high-efficacy messages run the risk of backfiring and producing defensive responses. In light of this, do you think this is what happened with how the pandemic was portrayed and how it was received by American society? Was it portrayed as highly threatening and inevitable, making some people think that there’s not much they could do to protect themselves against it? Was there a lack of high self-efficacy messaging and an overemphasis on high threat that enabled certain Right-Wing media institutions to strengthen their narrative of COVID being nothing more than the common flu, or of mask-wearing being ineffective? Hold on to that thought, because you might want to consider the findings from the article I selected as well.

Kok et al. (2018) state that researchers have been misled in their interpretation of the effectiveness of fear appeals in promoting health behaviors. The study reviews empirical evidence to conclude that fear appeals are only effective in cases of high self-efficacy. The study uses smoking as an example to illustrate the discussion on fear appeals, with the goal of promoting political decision-making that is based on theory and evidence, and it suggests alternatives to fear appeals that can be used in health promotion messaging. The researchers suggest that fear appeals are more effective when combined with non-threatening messages that improve self-efficacy, which is in stark contrast to Witte and Allen’s (2000) recommendations.

The article provides an example of how messages related to coping tactics have garnered more attention from smokers than fear appeals. Applying the suggestions from Kok et al. (2018) to COVID responses can shed some light on the behaviors we have seen in American society. What if COVID had been presented in the media with a heavier focus on coping rather than fear appeals? Would such an approach have been more effective? I am aware that COVID is a different kind of danger compared to smoking, which can be avoided solely by choice, whereas a contagious virus is at times unavoidable. That said, the spread of COVID would have been lessened if stricter measures had been taken and if people had adhered to guidelines from the very beginning. Taiwan, despite its close proximity to the epicenter of the disease, managed to handle the pandemic much better than the West.

Citations

Kok, G., Peters, G.-J. Y., Kessels, L. T. E., ten Hoor, G. A., & Ruiter, R. A. C. (2018). Ignoring theory and misinterpreting evidence: the false belief in fear appeals. Health Psychology Review, 12(2), 111–125. https://doi.org/10.1080/17437199.2017.1415767

Witte, K., & Allen, M. (2000). A Meta-Analysis of Fear Appeals: Implications for Effective Public Health Campaigns. Health Education & Behavior, 27(5), 591–615. https://doi.org/10.1177/109019810002700506