
Image by Editor
# Introduction
For many organizations, particularly those in regulated industries or with limited technical infrastructure, Excel and its XLMiner add-in serve as the primary platform for predictive modeling and machine learning workflows.
But Excel's accessibility masks a critical gap: the difference between running models and building trustworthy analytics systems. While working on a loan approval prediction project, I discovered that Excel-based machine learning fails not because of algorithmic limitations, but because of a handful of practices that are frequently overlooked.
This article distills those hard-won experiences into five comprehensive frameworks that can elevate your Excel-based machine learning work.
# Lesson 1: Multiple Methods for Outlier Detection
Outlier handling is more art than science, and premature removal can eliminate legitimate extreme values that carry important information. In one case, all residential asset values above the 95th percentile were removed using a simple IQR calculation, under the assumption that they were errors. Later analysis revealed that the removal had discarded legitimate ultra-high-value properties, a relevant segment for large loan approvals.
The lesson: Use multiple detection methods and manual review before removal. Create a comprehensive outlier detection framework.
In a new sheet adjacent to the main data, create detection columns:
- Column A: Original Value (residential_assets_value)
- Column B: IQR Method
  =IF(A2 > QUARTILE.INC($A$2:$A$4270,3) + 1.5*(QUARTILE.INC($A$2:$A$4270,3)-QUARTILE.INC($A$2:$A$4270,1)), "Outlier_IQR", "Normal")
- Column C: 3-Sigma Method
  =IF(ABS(A2-AVERAGE($A$2:$A$4270)) > 3*STDEV($A$2:$A$4270), "Outlier_3SD", "Normal")
- Column D: Percentile Method
  =IF(A2 > PERCENTILE.INC($A$2:$A$4270,0.99), "Outlier_P99", "Normal")
- Column E: Combined Flag
  =IF(COUNTIF(B2:D2,"Outlier*")>=2, "INVESTIGATE", "OK")
- Column F: Manual Review [Notes after investigating]
- Column G: Final Decision [Keep/Remove/Transform]
This multi-method approach revealed patterns in my loan data:
- Values flagged by all three methods (IQR, 3-sigma, and percentile): likely errors
- Values flagged by IQR but not 3-sigma: legitimate extreme values in skewed distributions
- Values flagged only by the percentile method: the extreme but valid cases I almost lost
The “Manual Review” column is crucial. For each flagged observation, document findings such as: “Legitimate luxury property, verified against public records” or “Likely data entry error, value exceeds market maximum by 10x.”
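Once the Final Decision column is filled in, a small audit block keeps the impact of the cleanup visible. Here is a minimal sketch, assuming the combined flag sits in column E and the final decisions in column G, as laid out above:

Cell: Count of removals
=COUNTIF($G$2:$G$4270,"Remove")
Cell: Removal rate
=COUNTIF($G$2:$G$4270,"Remove")/COUNTA($A$2:$A$4270)
Cell: Flagged rows still awaiting review
=COUNTIFS($E$2:$E$4270,"INVESTIGATE",$G$2:$G$4270,"")

A removal rate that climbs past a few percent is usually a sign that the detection thresholds, not the data, deserve a second look.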
# Lesson 2: Always Set Random Seeds
Few experiences are more frustrating than presenting excellent model results, then being unable to reproduce those exact numbers when preparing the final report. This happened with a classification tree model: one day's validation accuracy was 97.3%, but the next day's was 96.8%. The difference looks small, but it undermines credibility and leaves the audience wondering which number is real and how far the analysis can be trusted.
The lesson: The culprit is random partitioning with no fixed seed. Most machine learning algorithms involve randomness at some stage:
- Data partitioning: which observations go into the training vs. validation vs. test sets
- Neural networks: initial weight randomization
- Some ensemble methods: random feature selection
XLMiner uses random processes to partition the data. Running the same model twice with identical parameters yields slightly different results because the training/validation split differs each time.
The solution is simple but non-obvious. When using XLMiner's partition functionality (found in most model dialogs):
- Check the box labeled “Set seed” (it is unchecked by default)
- Enter a specific integer: 12345, 42, 2024, or any memorable number
- Document this seed value in the Model Log
Now, every time the model is run with this seed, you get:
- Identical training/validation/test splits
- Identical model performance metrics
- Identical predictions for the same observations
- Perfect reproducibility
Here is an example from the loan approval dataset without a seed (three runs of the identical logistic regression):
- Run 1: Validation Accuracy = 92.4%, F1 = 0.917
- Run 2: Validation Accuracy = 91.8%, F1 = 0.923
- Run 3: Validation Accuracy = 92.1%, F1 = 0.919
And with seed=12345 (three runs of the identical logistic regression):
- Run 1: Validation Accuracy = 92.1%, F1 = 0.928
- Run 2: Validation Accuracy = 92.1%, F1 = 0.928
- Run 3: Validation Accuracy = 92.1%, F1 = 0.928
The difference matters enormously for credibility. When asked to recreate the analysis, it can be done with confidence, knowing the numbers will match.
Important caveat: The seed controls randomness in partitioning and initialization, but it doesn't make the analysis immune to other changes. If the data is modified (adding observations, altering transformations) or model parameters are adjusted, results will still differ, as they should.
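For XLMiner, the checkbox is all that is needed, but the same idea carries over to hand-built partitions in plain Excel, where RAND() recalculates on every change and cannot be seeded. A deterministic stand-in fixes that. The sketch below is a workaround built on stated assumptions, not an XLMiner feature: the large multiplier is an arbitrary hashing constant, 12345 plays the role of the seed, and the data is assumed to occupy rows 2 through 4270:

Cell: H2, pseudo-random key in [0,1), copied down (fixed for a given row and seed)
=MOD(ROW()*2654435761+12345,4294967296)/4294967296
Cell: I2, partition label, copied down (approximately 50/30/20)
=IF(H2<0.5,"Training",IF(H2<0.8,"Validation","Test"))

Because the key depends only on the row number and two fixed constants, every recalculation, and every colleague who opens the file, reproduces the same split. The trade-off: the assignment is tied to row order, so sort the data only after pasting the labels as values.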
# Lesson 3: Proper Data Partitioning: The Three-Way Split
Related to reproducibility is partition strategy. XLMiner's default settings create a 60/40 training/validation split. This seems reasonable until the question arises: where's the test set?
A typical mistake is to build a neural network, tune it based on validation performance, then report those validation metrics as final results.
The lesson: And not using a separate take a look at set, the optimization happens instantly on the info being reported, inflating efficiency estimates. The right partitioning technique makes use of three units.
1. Training Set (50% of Data)
- Where the model learns patterns
- Used to fit parameters, coefficients, or weights
- For the loan dataset: ~2,135 observations
2. Validation Set (30% of Data)
- For model selection and hyperparameter tuning
- Used to compare different models or configurations
- Helps select the best pruned tree, optimal cutoff value, or ideal neural network architecture
- For the loan dataset: ~1,280 observations
3. Test Set (20% of Data)
- The “final exam” – scored only once
- Used ONLY after all modeling decisions are complete
- Provides an unbiased estimate of real-world performance
- For the loan dataset: ~854 observations
Critical rule: Never iterate on test set performance. The moment a model is chosen because “it performs better on the test set,” that test set becomes a second validation set, and the performance estimate becomes biased.
This is my workflow now, with a sanity check sketched after the list:
- Set the seed to 12345
- Partition 50/30/20 (training/validation/test)
- Build multiple model variants, evaluating each on the validation set only
- Select the best model based on validation performance and business requirements
- Score the test set exactly once with the chosen model
- Report test set performance as the expected real-world outcome
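The sanity check belongs at the end of this workflow. Assuming the partition labels have been copied into a helper column J (XLMiner's own output layout may differ), two cells confirm the split:

Cell: Training share (should be close to 50%)
=COUNTIF($J$2:$J$4270,"Training")/COUNTA($J$2:$J$4270)
Cell: Every row assigned to exactly one set (should be TRUE)
=COUNTIF($J$2:$J$4270,"Training")+COUNTIF($J$2:$J$4270,"Validation")+COUNTIF($J$2:$J$4270,"Test")=COUNTA($J$2:$J$4270)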
Here is an example from the loan approval project:
| Model Version | Training Acc | Validation Acc | Test Acc | Selected? |
|---|---|---|---|---|
| Logistic Regression (all vars) | 90.6% | 89.2% | Not scored | No |
| Logistic Regression (stepwise) | 91.2% | 92.1% | Not scored | No |
| Classification Tree (depth=7) | 98.5% | 97.3% | Not scored | Yes |
| Classification Tree (depth=5) | 96.8% | 96.9% | Not scored | No |
| Neural Network (7 nodes) | 92.3% | 90.1% | Not scored | No |
After selecting the Classification Tree (depth=7) based on validation performance, the test set was scored exactly once: 97.4% accuracy. This test accuracy represents the expected production performance.
# Lesson 4: The Training/Validation Gap: Recognizing Overfitting Before It Hurts
The problem: At first glance, the classification tree results from the project report look promising.
Training Data Performance:
- Accuracy: 98.45%
- Precision: 99%
- Recall: 96%
- F1 Score: 98.7%
The model looked successful until the focus shifted to the validation results.
Validation Data Performance:
- Accuracy: 97.27%
- Precision: 98%
- Recall: 94%
- F1 Score: 97.3%
The difference looked small, just 1.18 percentage points of accuracy. But determining whether that gap constituted a problem required a systematic framework.
The lesson: It's essential to recognize when the model memorizes instead of learns.
The practical solution: Create an overfitting monitor. Build a simple but systematic comparison sheet that makes overfitting obvious.
Step 1: Create the Comparison Framework
Here is the model performance comparison in the “Overfitting_Monitor” sheet:
| Metric | Training | Validation | Gap | Gap % | Status |
|---|---|---|---|---|---|
| Accuracy | 98.45% | 97.27% | 1.18% | 1.20% | ✓ Good |
| Precision | 99.00% | 98.00% | 1.00% | 1.01% | ✓ Good |
| Recall | 96.27% | 94.40% | 1.87% | 1.94% | ✓ Good |
| F1 Score | 98.76% | 97.27% | 1.49% | 1.51% | ✓ Good |
| Specificity | 96.56% | 92.74% | 3.82% | 3.96% | ? Watch |
And here are the interpretation rules:
- Gap < 3%: ✅ Good – model generalizing well
- Gap 3-5%: ❓ Watch – acceptable but monitor closely
- Gap 5-10%: ⚠️ Concerning – potential overfit, consider simplification
- Gap > 10%: ❌ Problem – definite overfit, must address
And this is the detailed assessment:
- Overall Assessment: GOOD
- Reasoning: All major metrics are within a 2% gap. The specificity gap is slightly higher but still acceptable. The model appears to generalize well.
- Recommendation: Proceed with test set evaluation.
Step 2: Add the Calculation Formulas
Cell: Gap (for Accuracy)
=[@Training] - [@Validation]
Cell: Gap % (for Accuracy)
=([@Training] - [@Validation]) / [@Training]
Cell: Status (for Accuracy)
=IF([@[Gap %]]<0.03, "✓ Good",
  IF([@[Gap %]]<0.05, "? Watch",
    IF([@[Gap %]]<0.10, "⚠ Concerning", "✗ Problem")))
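To roll the per-metric statuses up into the Overall Assessment line shown above, one more cell can scan the Status column. A minimal sketch, assuming the monitor is an Excel table named OverfitTbl; note the tilde, which escapes the literal question mark because ? is a COUNTIF wildcard:

Cell: Overall Assessment
=IF(COUNTIF(OverfitTbl[Status],"✗*")>0,"PROBLEM",
  IF(COUNTIF(OverfitTbl[Status],"⚠*")>0,"CONCERNING",
    IF(COUNTIF(OverfitTbl[Status],"~?*")>0,"WATCH","GOOD")))

The worst individual status wins, which matches how the interpretation rules are meant to be read.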
Step 3: Create a Visual Overfitting Chart
Build a side-by-side bar chart comparing training vs. validation for each metric. This makes patterns instantly visible:


When the bars are close, the model generalizes well. When the training bars are much longer than the validation bars, there's overfitting.
Comparing Across Different Models
The real value comes from comparing overfitting patterns across the candidate models. Here is the comparison from the “Model_Overfitting_Comparison” sheet:
| Model | Train Acc | Val Acc | Gap | Overfitting Risk |
|---|---|---|---|---|
| Logistic Regression | 91.2% | 92.1% | -0.9% | Low (negative gap) |
| Classification Tree | 98.5% | 97.3% | 1.2% | Low |
| Neural Network (5 nodes) | 90.7% | 89.8% | 0.9% | Low |
| Neural Network (10 nodes) | 95.1% | 88.2% | 6.9% | High – reject |
| Neural Network (14 nodes) | 99.3% | 85.4% | 13.9% | Very High – reject |
Interpretation: The neural networks with 10 or more nodes are clearly overfitting. Despite the 14-node network's high training accuracy (99.3%), its validation accuracy drops to 85.4%. The model memorized training data patterns that don't generalize.
Best choice: Classification tree
- High performance (97.3% validation accuracy)
- Minimal overfitting (1.2% gap)
- Good balance of complexity and generalization
Here are some simple ways to reduce overfitting when it is observed:
- For Classification Trees: Reduce the maximum depth or increase the minimum samples per node
- For Neural Networks: Reduce the number of nodes or layers
- For Logistic Regression: Remove variables or use stepwise selection
- For All Models: Add more training data if possible
# Lesson 5: Implement Data Validation for Categorical Variables
Data entry errors are silent killers of machine learning projects. A single typo, such as “gradute” instead of “graduate”, creates a third category in what should be a binary variable. The model now encounters a feature value it has never seen during training, potentially causing errors during deployment or, worse, silently producing incorrect predictions.
The prevention: Excel's data validation feature. Here is the implementation protocol for categorical variables:
In a hidden sheet (name it “Validation_Lists”), create lists of valid values:
- For education: Create a column with the entries “Graduate” and “Not Graduate”
- For self-employment: Create a column with the entries “Yes” and “No”
- For loan status: Create a column with the entries “Approved” and “Rejected”
In the data entry sheet:
- Select the entire column for a categorical variable (e.g., the column containing education data)
- Data → Data Validation → Settings tab
- Allow: List
- Source: Navigate to the hidden validation sheet and select the appropriate list
- Error Alert tab: Style = Stop, with a clear message: “Only ‘Graduate’ or ‘Not Graduate’ are valid entries”
Now it is impossible to enter invalid values. Users see a dropdown of valid choices, eliminating typographical errors entirely.
For numeric variables with known ranges, apply similar validation to prevent impossible values:
- Credit scores: Must be between 300 and 900
- Loan term: Must be between 1 and 30 years
- Annual income: Must be greater than 0
Select the column, apply data validation, and set:
- Allow: Whole Number (or Decimal)
- Data: between
- Minimum: 300 (for credit scores)
- Maximum: 900
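Data validation only guards future entries; values that were already in the sheet are left untouched. A helper column can audit the existing rows against the same lists. A minimal sketch, assuming education values live in column C and the valid entries in Validation_Lists!$A$2:$A$3:

Cell: Audit flag, copied down the helper column
=IF(COUNTIF(Validation_Lists!$A$2:$A$3,C2)=0,"INVALID","OK")

Filtering the helper column for INVALID surfaces every “gradute”-style typo before it ever reaches a model.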
# Final Thoughts
Here is a summary of the lessons outlined in this article.


5 Excel AI Classes I Discovered the Laborious Means (click on to enlarge)
The five practices in this article (multi-method outlier detection, setting random seeds, three-way data partitioning, monitoring the training/validation gap, and enforcing data validation) share a common thread: they are all simple to implement but devastating to omit.
None of these practices require advanced statistical knowledge or complex programming. They don't demand extra software or expensive tools. Excel with XLMiner is a capable platform for accessible machine learning.
Rachel Kuznetsov has a Master's in Business Analytics and thrives on tackling complex data puzzles and seeking out fresh challenges. She is committed to making intricate data science concepts easier to understand and is exploring the many ways AI makes an impact on our lives. On her continuous quest to learn and grow, she documents her journey so others can learn alongside her. You can find her on LinkedIn.
