Saturday Intensive · 9:00 AM – 1:00 PM

Quantitative Research Methods in the Age of AI

A doctoral-level intensive bridging classical statistical methodology with AI-driven data science. By the end of this session you will be able to design a quantitative study, choose the right statistical test, run descriptive and inferential analysis in SPSS, and build a PLS-SEM model in SmartPLS.

Instructor: Dr. Teik-Kheong Tan, PhD · IEEE
Duration: 4 Hours · 4 Modules · 4 Quiz Breaks
Tools: SPSS · SmartPLS · Python · R
Level: PhD / DBA Doctoral Candidates
Module 1 · 9:20 – 10:00 AM

Research Foundations & Measurement

Before you touch SPSS or SmartPLS, you must be anchored in the philosophical foundations of what you are doing and why. Every methodological choice you make — from your sampling strategy to your statistical test — flows from these foundations.

Research Paradigms — Why Your Worldview Matters

A research paradigm is your set of beliefs about the nature of reality (ontology) and how we can know it (epistemology). Quantitative research sits firmly within positivism — the view that reality exists independently of the observer, that it can be measured objectively, and that patterns in data reveal universal truths about human behaviour.

🔭
Positivism
Reality is objective and measurable. Knowledge comes from empirical observation and measurement. This is the home of quantitative research.
🔬
Post-Positivism
Reality exists but our measurement of it is imperfect. Acknowledges measurement error — which is exactly why SEM corrects for it.
🧠
Interpretivism
Reality is socially constructed. Knowledge is subjective. This is where qualitative research lives — interviews, ethnography, grounded theory.
⚖️
Pragmatism
Use whatever works. Mixed methods researchers often sit here — combining surveys with interviews for a fuller picture.
🤖 AI Lens

Machine learning models are inherently positivist — they assume patterns in data are real and discoverable. When you use AI to analyse survey data or predict outcomes, you are operationalising positivism at scale. Understanding this alignment helps you defend your methodology to examiners who may question the use of AI tools in doctoral research.

The Four Measurement Scales — The DNA of Quantitative Data

Every variable in your study sits on one of four measurement scales. The scale determines which descriptive statistics are meaningful and which inferential tests are valid. Getting this wrong is one of the most common errors in doctoral dissertations.

Nominal (N): Categories only
Ordinal (O): Order, no distance
Interval (I): Order + distance
Ratio (R): True zero point
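Scale | Example | Mode | Median | Mean | SD | Ratio
Nominal | Gender, Country, PhD/DBA | Yes | No | No | No | No
Ordinal | Rankings, Likert* (debated) | Yes | Yes | No | No | No
Interval | Likert 1–7, Temperature °C | Yes | Yes | Yes | Yes | No
Ratio | Salary, Age, Years of experience | Yes | Yes | Yes | Yes | Yes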
⚠️ The Likert Scale Debate

Strictly speaking, a Likert scale (1–7) is ordinal — we cannot prove the distance between "3" and "4" equals the distance between "6" and "7". However, in practice, most researchers in management, marketing, and social science treat Likert scales as interval when there are 5 or more points. SmartPLS and SPSS both support this convention. Be prepared to justify this in your viva.

🔑 Key Rule

You can always move down the scale (for example, treat ratio data as interval or ordinal) but never up (you cannot treat nominal data as ratio). This single rule resolves most questions about which statistic to use.
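The key rule can be expressed as a small lookup. This is an illustrative Python sketch rather than part of the course materials; the names SCALES, MINIMUM_SCALE and can_use are hypothetical.

```python
# Scales ordered from lowest to highest level of measurement.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale at which each statistic becomes meaningful.
MINIMUM_SCALE = {
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "sd": "interval",
    "ratio_comparison": "ratio",  # statements such as "twice as much"
}


def can_use(statistic: str, scale: str) -> bool:
    """Return True if the statistic is meaningful for data on the given scale."""
    return SCALES.index(scale) >= SCALES.index(MINIMUM_SCALE[statistic])


median_on_ratio = can_use("median", "ratio")    # moving down the scale
mean_on_nominal = can_use("mean", "nominal")    # moving up the scale
print(median_on_ratio, mean_on_nominal)
```

Moving down the scale always passes the check; moving up never does.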

Quiz Break 1 — Foundations & Measurement
Dr. Tan will launch 5 questions on your screen. Take 10–12 minutes to answer. Be honest with your responses — the class results will be discussed together.
Module 2 · 10:00 – 10:45 AM

Descriptive Statistics & the Normal Distribution

Before you test any hypothesis, you must describe your data. Descriptive statistics are not just preliminary steps — they reveal the story your data is telling. A non-normal distribution, a bimodal distribution, an extreme outlier — all of these change which inferential tests you can legitimately use.

Measures of Central Tendency

Central tendency statistics tell you where the middle of your distribution sits. The choice between mean, median, and mode is not arbitrary — it depends on your scale of measurement and the shape of your distribution.

📊
Mode
The most frequently occurring value. The only measure valid for nominal data. Useful for understanding the most common category in your sample.
Valid for: Nominal, Ordinal, Interval, Ratio
📏
Median
The 50th percentile — the middle value when data is ordered. Robust to outliers. Use when your distribution is skewed.
Valid for: Ordinal, Interval, Ratio
⚖️
Mean
The arithmetic average. Uses all data points — so it is pulled by extreme values (outliers). The foundation of most parametric tests.
Mean = Σx / N
💡 Doctoral Insight

When you report descriptive statistics in your dissertation, always report both mean and median for continuous variables. A large gap between them signals skewness — which then forces you to justify whether parametric or non-parametric tests are appropriate for your inferential analysis.
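To see why the gap between mean and median matters, here is a minimal Python sketch using only the standard library. The salary figures and programme labels are hypothetical and chosen so that one extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an outlier.
salaries = [30, 32, 33, 35, 36, 38, 40, 250]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# The mode is the only measure that works for nominal data.
mode_programme = statistics.mode(["PhD", "DBA", "DBA", "PhD", "DBA"])

print(f"mean = {mean_salary:.2f}, median = {median_salary:.2f}")
print(f"most common programme = {mode_programme}")
```

In this sketch the single outlier drags the mean well above the median, which is the signal of right skew described above.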

The Four Moments of a Normal Distribution

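The shape of a distribution is summarised by four mathematical properties, the "four moments." A perfectly normal (bell-curve) distribution is fully determined by the first two (mean and standard deviation), with zero skewness and zero excess kurtosis. Your data will never be perfectly normal, but understanding how far it deviates, and in which direction, is critical for choosing the right tests.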

1️⃣
Mean — 1st Moment
Central tendency. Where the distribution is centred. The peak of the bell curve.
Σx / N
2️⃣
Standard Deviation — 2nd Moment
Spread/dispersion. How tightly or loosely values cluster around the mean.
SD = √[Σ(x-x̄)² / (N-1)]
3️⃣
Skewness — 3rd Moment
The lean or tail of the distribution. Positive skew = tail to the right. Negative skew = tail to the left.
+ve = right tail | -ve = left tail
4️⃣
Kurtosis — 4th Moment
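Tail weight, often described as peakedness vs. flatness. Leptokurtic (positive excess kurtosis) = heavy tails and a sharp peak. Platykurtic (negative excess kurtosis) = light tails and a flat top.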
Leptokurtic | Platykurtic
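The four moments can be computed by hand in a few lines. The sketch below is illustrative only: the function name four_moments and the two data sets are hypothetical, and skewness and kurtosis use the simple moment formulas, so SPSS (which applies small-sample corrections) will report slightly different values.

```python
import math


def four_moments(values):
    """Return mean, sample SD, skewness and excess kurtosis of the values."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    # Central moments computed with N in the denominator.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    # Sample SD uses N - 1, matching the SD formula in the text.
    sd = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, sd, skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(four_moments(symmetric))
print(four_moments(right_skewed))
```

The symmetric data have zero skewness, while the long right tail in the second data set produces a positive value.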
Key Areas Under the Normal Curve — The Empirical Rule
[Figure: normal curve centred on μ, with the ±1 SD band (68%) and the ±2 SD band (95%) marked]
±1 SD ≈ 68% of data
±2 SD ≈ 95% of data
±3 SD ≈ 99.7% of data
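The areas under the normal curve can be checked directly with Python's standard library. This is a minimal sketch; the helper name area_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def area_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

Because the areas depend only on the number of standard deviations, the same proportions hold for any normal distribution regardless of its mean and SD.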
🔑 When to use non-parametric tests

If your data is significantly skewed (|skewness| > 2) or has extreme kurtosis, parametric tests like t-tests and ANOVA may give you misleading results. Switch to non-parametric alternatives: Mann-Whitney U instead of independent t-test, Wilcoxon instead of paired t-test, Kruskal-Wallis instead of ANOVA. PLS-SEM is particularly valuable here because it makes no distributional assumptions.

Choosing the Right Graph — A Decision You Cannot Get Wrong

Your choice of graph is not cosmetic — it is a methodological statement. Examiners and reviewers notice when you use a histogram for categorical data or a bar chart for continuous data. Here is the rule:

📊
Bar Chart
For categorical data (nominal or ordinal). Each bar represents a category. Height shows frequency, percentage, or mean score. Use for: gender, country, employment type, programme type.
Nominal / Ordinal data
📈
Histogram
For continuous data (interval or ratio). Bars touch each other — no gaps. Shows the shape of a distribution. Use to check normality, skewness, and kurtosis visually.
Interval / Ratio data
📦
Boxplot
Shows median, interquartile range, and outliers. Excellent for comparing distributions across groups. Reveals skewness through the position of the median line within the box.
Median · IQR · Outliers
🔵
Scatterplot
For showing the relationship between two continuous variables. Each dot = one case. The pattern reveals direction (positive/negative) and strength of correlation. Used before regression.
Correlation · Regression
Chart Type | Data Type | Purpose | Never Use For
Bar Chart | Nominal / Ordinal | Frequency of categories | Continuous distributions
Histogram | Interval / Ratio | Shape of distribution | Categorical data
Boxplot | Interval / Ratio | Spread, median, outliers | Showing exact frequencies
Scatterplot | Two continuous variables | Correlation / regression | Single variable distribution
Line Graph | Interval / Ratio over time | Trends over time | Cross-sectional categorical data
💡 Dispersion Measures — Range vs Standard Deviation

The Range (max − min) is the simplest measure of dispersion: the spread between the highest and lowest values. Because it depends on only two observations, it is extremely sensitive to outliers. The Standard Deviation is more stable because it uses all values and measures the typical distance from the mean, although extreme values still inflate it. The Interquartile Range (IQR = Q3 − Q1) is paired with the median (for skewed data) just as the SD is paired with the mean (for roughly normal data).
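The following sketch shows how each dispersion measure responds to a single extreme value. It uses only the Python standard library; the scores are hypothetical, and the quartiles use the "inclusive" method of statistics.quantiles, which may differ slightly from the quartile definition SPSS applies.

```python
import statistics


def dispersion(values):
    """Return the range, sample SD and interquartile range of the values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]
with_outlier = scores + [40]   # add one extreme case

clean = dispersion(scores)
contaminated = dispersion(with_outlier)
print(clean)
print(contaminated)
```

In this sketch the IQR barely moves when the extreme case is added, while the range and the SD both grow substantially.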

Quiz Break 2 — Descriptive Statistics
5 questions on central tendency, the four moments, and the normal distribution. Dr. Tan will step out briefly — take your time and be precise.
10:45 – 11:00 AM · Break
15-minute break. Results from Quiz Break 2 will be discussed when we return. Stretch, hydrate, and come back ready for inferential statistics.
Module 3 · 11:00 – 11:45 AM

Inferential Statistics & SPSS

Inferential statistics allow you to draw conclusions about a population from a sample. They answer the fundamental doctoral question: "Is the pattern I see in my data real, or could it have occurred by chance?" The answer comes in the form of a p-value — and the test you use depends on your research question, your measurement scale, and your data distribution.

Statistical Test Decision Tree

Use this framework to choose the right test every time. Work through each question in order.

1. What is your research question type?
Difference between groups → go to Q2
Relationship between variables → go to Q4
Prediction → Regression / SEM
2. How many groups are you comparing?
2 groups → go to Q3
3+ groups → ANOVA (parametric) or Kruskal-Wallis
3. Are the two groups independent or paired?
Independent (different people) → Independent Samples T-test
Paired (same people, two time points) → Paired Samples T-test
Non-normal data → Mann-Whitney U / Wilcoxon
4. What type of relationship?
Association only → Pearson / Spearman Correlation
Categorical association → Chi-Square
Predictive → Regression
Latent variables → SEM (Module 4)
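The decision tree above can be sketched as a single function. This is an illustration under stated assumptions, not a definitive rule set: the function name and its parameters are hypothetical, and choosing Pearson for normal data and Spearman otherwise is an assumption added here, since the tree lists the two together.

```python
def choose_test(question: str, groups: int = 2, paired: bool = False,
                normal: bool = True, categorical: bool = False,
                latent: bool = False) -> str:
    """Walk the four-question decision tree and return a test name."""
    if question == "difference":
        if groups >= 3:                       # Q2: three or more groups
            return "One-way ANOVA" if normal else "Kruskal-Wallis"
        if paired:                            # Q3: same people measured twice
            return "Paired samples t-test" if normal else "Wilcoxon signed-rank"
        return "Independent samples t-test" if normal else "Mann-Whitney U"
    if question == "relationship":            # Q4: type of relationship
        if latent:
            return "SEM"
        if categorical:
            return "Chi-square"
        # Assumption: Pearson for normal data, Spearman otherwise.
        return "Pearson correlation" if normal else "Spearman correlation"
    if question == "prediction":
        return "SEM" if latent else "Regression"
    raise ValueError(f"unknown question type: {question!r}")


print(choose_test("difference", groups=3))
print(choose_test("difference", paired=True, normal=False))
print(choose_test("relationship", categorical=True))
```

Encoding the tree this way makes each branch explicit, which is the same reasoning you will need to state aloud in a viva.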

The Core Tests — ABC Company Case Study

We use the ABC Company employee dataset (Employee_data_1.sav) throughout the SPSS work. Here are the 8 research questions and the test that answers each one.

# | Research Question | Test | Why
Q1 | Gender difference in employment categories? | Chi-Square | Two categorical (nominal) variables
Q2 | Difference in custodial vs managerial salary? | Independent T-test | Two independent groups, continuous DV
Q3 | Difference between beginning and current salary? | Paired T-test | Same employees, two measurements
Q4 | Correlation: Education, Salary, Experience? | Pearson Correlation | Relationships between continuous variables
Q5–8 | Predictors of current salary? | Multiple Regression | Multiple IVs predicting one DV
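To make the logic of Q1 concrete, here is a minimal Python sketch of the chi-square test of independence computed by hand. The counts are hypothetical and are not taken from Employee_data_1.sav; the closed-form p-value used at the end is valid only when the table has 2 degrees of freedom, as a 2 × 3 table does.

```python
import math

# Hypothetical counts. Rows: gender; columns: three employment categories.
observed = [
    [40, 10, 10],
    [20, 20, 20],
]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square(observed)
# For df = 2 only, the upper-tail chi-square probability is exp(-x / 2).
p_value = math.exp(-statistic / 2) if df == 2 else None
print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
```

SPSS reports the same statistic in the Crosstabs output; the hand calculation simply shows where the number comes from.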

ANOVA — When You Have Three or More Groups

The T-test compares two groups. When you have three or more independent groups, you use ANOVA (Analysis of Variance). For example: comparing satisfaction scores across students from Pakistan, India, and UAE.

⚖️
One-Way ANOVA
Compares means across 3+ groups on a single factor. Tells you IF there is a significant difference — not WHERE. Post-hoc tests (Tukey, Bonferroni) identify which pairs differ.
F = Between-group variance / Within-group variance
🔄
Non-parametric alternative
If data is non-normal or ordinal, use Kruskal-Wallis instead of ANOVA. It is the rank-based non-parametric equivalent and does not assume normally distributed data.
Kruskal-Wallis H test
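The F ratio can be computed by hand to see what "between-group" and "within-group" variance mean. This is a minimal sketch with hypothetical satisfaction scores; the function name one_way_anova_f is illustrative.

```python
def one_way_anova_f(groups):
    """Return the F statistic with its between and within degrees of freedom."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Variation of group means around the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Variation of cases around their own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in group)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical satisfaction scores for three groups of students.
pakistan = [4, 5, 6]
india = [5, 6, 7]
uae = [7, 8, 9]

f_stat, df_between, df_within = one_way_anova_f([pakistan, india, uae])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the spread inside the groups would lead you to expect. It still does not tell you which pairs differ, which is why post-hoc tests follow.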

Regression — Predicting Outcomes

Regression moves beyond correlation to prediction. It models how much change in the dependent variable (Y) is produced by a unit change in the independent variable (X), holding all other variables constant.

🔑 The Regression Equation

Y = a + b₁X₁ + b₂X₂ + ... + ε

Where: Y = dependent variable (outcome to predict) · a = intercept (value of Y when all X = 0) · b = regression coefficient (how much Y changes per unit of X) · X = independent variable (predictor) · ε = error term (unexplained variance)

The R² value tells you what proportion of variance in Y is explained by your predictors. R² = 0.40 means your IVs explain 40% of the variance in Y.
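The intercept, the coefficient, and R² can be computed by hand for the simplest case. The sketch below fits a single predictor (Y = a + bX) rather than the multiple regression in the equation above, and the education and salary figures are hypothetical.

```python
def simple_regression(x, y):
    """Fit Y = a + bX by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # change in Y per unit change in X
    a = mean_y - b * mean_x       # value of Y when X = 0
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_residual / ss_total


# Hypothetical data: years of education and salary in thousands.
education = [8, 12, 12, 15, 16, 19]
salary = [22, 27, 30, 35, 41, 60]

intercept, coefficient, r_squared = simple_regression(education, salary)
print(f"Y = {intercept:.2f} + {coefficient:.2f}X, R² = {r_squared:.2f}")
```

R² is one minus the share of variance left unexplained, which is why it can be read as the proportion of variance in Y accounted for by the predictor.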

🤖 AI Enhancement

Statistical libraries such as Python's statsmodels and R's built-in functions can run all of these tests and produce publication-ready output in seconds. More importantly, large language models (ChatGPT, Claude) can now interpret your SPSS output: paste your results table and ask for an APA-formatted interpretation. This is a legitimate productivity tool for doctoral researchers, but you must understand the underlying logic well enough to verify the interpretation.

Interpreting the p-value — What it Actually Means

The p-value is the probability of observing your result (or something more extreme) if the null hypothesis were true. It is not the probability that your hypothesis is correct. This distinction trips up many doctoral candidates in their viva.

p < 0.05
Statistically significant at the 95% confidence level. You reject the null hypothesis. The result is unlikely to be due to chance alone.
⚠️
p < 0.01
Significant at 99% confidence. A more stringent threshold. Often required in medical and natural science research.
🔍
p > 0.05
Not significant. You fail to reject the null. This does NOT mean there is no effect — it may mean your sample was too small (insufficient power).
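One way to make the definition of the p-value tangible is an exact permutation test, sketched below. This is a different technique from the t-tests SPSS runs and is used here only because it computes "the probability of a result at least this extreme if the null were true" by direct counting. The function name and data are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = 0
    total = 0
    # Under the null, group labels are arbitrary: try every relabelling.
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        difference = abs(sum(a) / len(a) - sum(b) / len(b))
        if difference >= observed - 1e-12:   # at least as extreme as observed
            extreme += 1
        total += 1
    return extreme / total


p_separated = permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8])
p_identical = permutation_p_value([1, 2], [1, 2])
print(p_separated, p_identical)
```

The result is the share of relabellings that produce a difference at least as large as the one observed. It is not the probability that your hypothesis is correct.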
⚠️ SPSS Demo — Switch to SPSS

Dr. Tan will now switch screens to demonstrate SPSS live. Open the Employee_data_1.sav file. We will run: Descriptive Statistics → Frequencies → Crosstabulation → Chi-Square → Independent T-test → Paired T-test → Correlation → Regression. Follow along on your own laptop if you have SPSS installed.

Quiz Break 3 — Inferential Statistics
5 questions on statistical tests, p-values, and the SPSS case study. Dr. Tan is on a short break — take 12 minutes to work through these carefully.
Module 4 · 11:45 AM – 12:30 PM

Structural Equation Modelling & SmartPLS

SEM is the most powerful multivariate technique available to management researchers. It combines factor analysis and regression in a single model, allows you to work with latent (unobserved) variables, corrects for measurement error, and tests complex causal structures that regression simply cannot handle.

Why SEM? — Moving Beyond Regression

📉
Multiple Regression
Tests relationships between observed variables only. Cannot model measurement error. One dependent variable at a time. Variables must be directly measurable.
🔗
SEM Advantage
Models latent (unobservable) constructs. Corrects for measurement error. Tests multiple dependent variables simultaneously. Models mediation and moderation elegantly.

CB-SEM vs PLS-SEM — Which One for Your Research?

Criterion | CB-SEM (AMOS/LISREL) | PLS-SEM (SmartPLS)
Sample Size | Large (200+) | Small to medium (30+)
Data Distribution | Must be normal | No assumptions
Research Purpose | Theory confirmation | Prediction & exploration
Model Complexity | Can be difficult | Handles complex models well
Latent Variables | Common factors | Composite factors
Missing Data | Problematic | More robust
Best for PhD/DBA | Confirmatory, large n | Exploratory, typical n
💡 The SmartPLS Sweet Spot

Most PhD and DBA candidates in management, marketing, IS, and healthcare research use PLS-SEM because: (1) survey sample sizes of 150–300 are the norm, (2) management constructs like "trust", "satisfaction", and "intention" are inherently latent, and (3) the research is often exploratory. SmartPLS is available under academic and student licences (check the current terms for your version) and produces results directly reportable in journals like MIS Quarterly, JAMS, and IJIM.

Reflective vs Formative Measurement Models

This is one of the most conceptually important decisions in your SEM model — and one of the most common viva examination questions.

Reflective Model
Arrows point FROM the construct TO the indicators. The construct CAUSES the indicators. Indicators are interchangeable — dropping one does not change the construct's meaning.
Construct → Indicator 1, 2, 3
Formative Model
Arrows point FROM the indicators TO the construct. The indicators DEFINE the construct. Not interchangeable — each adds unique meaning.
Indicator 1, 2, 3 → Construct
[Figure: Deli99 PLS-SEM Model. Constructs: Customer Expectation (indicators expect_1, expect_2, expect_3), Customer Satisfaction, and Customer Loyalty (indicators loyal_1, loyal_2, loyal_3)]

PLS-SEM Quality Criteria — What to Report

Criterion | Where in SmartPLS | Threshold | Assesses
Outer Loadings | PLS Results → Outer Loadings | > 0.70 (≥ 0.40 exploratory) | Indicator Reliability
Composite Reliability | Quality Criteria → CR&V | > 0.70 (≥ 0.60 exploratory) | Internal Consistency
AVE | Quality Criteria → CR&V | > 0.50 | Convergent Validity
Discriminant Validity | Quality Criteria → DV | √AVE > correlations | Discriminant Validity
Path Coefficients | Bootstrapping → Path Coeff | p < 0.05 | Hypothesis Support
R² | PLS Results → R² | > 0.26 substantial | Explanatory Power
VIF | Quality Criteria → Collinearity | < 5.0 | Multicollinearity
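SmartPLS computes these criteria for you, but the arithmetic behind AVE, composite reliability, and the √AVE comparison is short enough to check by hand. The sketch below uses the standard formulas for standardised loadings; the loadings, the construct correlation, and the function names are hypothetical.

```python
import math


def ave(loadings):
    """Average variance extracted: mean of the squared standardised loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardised outer loadings."""
    sum_loadings = sum(loadings)
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return sum_loadings ** 2 / (sum_loadings ** 2 + error_variance)


def discriminant_ok(ave_a, ave_b, correlation):
    """True if the square root of each AVE exceeds the construct correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(correlation)


# Hypothetical outer loadings for two reflective constructs.
expectation = [0.82, 0.79, 0.75]
loyalty = [0.88, 0.85, 0.71]

print(f"AVE = {ave(expectation):.3f}, CR = {composite_reliability(expectation):.3f}")
print(discriminant_ok(ave(expectation), ave(loyalty), correlation=0.55))
```

With these hypothetical loadings both constructs clear the 0.50 AVE and 0.70 CR thresholds in the table above.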

Endogenous vs Exogenous Variables

➡️
Exogenous Variable
Only sends arrows to other variables — never receives them. The independent variable in your model. Has no antecedent construct in the path diagram. In regression terms: the predictor (X).
Arrows only pointing OUT
⬅️
Endogenous Variable
Receives arrows from other variables — it is influenced by at least one other construct. The dependent variable in your model. May also send arrows to other constructs (making it both a DV and IV simultaneously).
Arrows pointing IN (and possibly out)
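The arrow rule can be applied mechanically to any path diagram. In the sketch below the construct names come from the Deli99 example, but the two paths are assumed for illustration, since the actual structure of that model is not specified here; the function name is hypothetical.

```python
def classify_constructs(paths):
    """Split constructs into exogenous and endogenous from (source, target) paths."""
    sources = {source for source, _ in paths}
    targets = {target for _, target in paths}
    exogenous = sorted(sources - targets)   # send arrows, never receive them
    endogenous = sorted(targets)            # receive at least one arrow
    return exogenous, endogenous


# Assumed paths for illustration only.
paths = [
    ("Customer Expectation", "Customer Satisfaction"),
    ("Customer Satisfaction", "Customer Loyalty"),
]

exogenous, endogenous = classify_constructs(paths)
print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
```

Under these assumed paths, Customer Satisfaction is endogenous even though it also sends an arrow, which matches the "both a DV and IV" case described above.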

EFA vs CFA — Two Very Different Approaches

Both are factor analysis techniques but they serve completely different purposes. Choosing the wrong one is a common dissertation error.

Criterion | EFA (Exploratory) | CFA (Confirmatory)
Purpose | Discover factor structure from data | Test a pre-specified factor structure
Theory required? | No, data-driven | Yes, theory-driven
When to use | Developing a new scale, no prior model | Validating an existing scale or construct
Software | SPSS, R (psych package) | AMOS, R (lavaan), SmartPLS
PhD use | Scale development studies | Most management research dissertations
💡 SmartPLS Bootstrapping — Why It Matters

The PLS algorithm gives you path coefficients — but not their significance. Bootstrapping is a resampling procedure that generates t-statistics and p-values for every path in your model. SmartPLS typically uses 5,000 bootstrap subsamples. Without bootstrapping, you cannot make any claim about whether your hypotheses are supported. The bootstrapping output gives you: t-values, p-values, confidence intervals, and significance flags for every path coefficient.
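The resampling idea behind bootstrapping can be sketched in plain Python. This is a generic percentile bootstrap of a single regression slope, not SmartPLS's implementation; the data, the function names, and the seed are hypothetical, and the default of 5,000 subsamples simply mirrors the figure mentioned above.

```python
import random


def slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def bootstrap_ci(pairs, subsamples=5000, seed=42):
    """Percentile 95% confidence interval for the slope via case resampling."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(subsamples):
        # Draw a sample of the same size, with replacement.
        resample = [rng.choice(pairs) for _ in pairs]
        # Skip degenerate resamples in which every x value is identical.
        if len({x for x, _ in resample}) > 1:
            estimates.append(slope(resample))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Hypothetical data: y rises with x plus a small deterministic disturbance.
pairs = [(x, 0.6 * x + ((x * 7) % 5 - 2)) for x in range(1, 21)]

lower, upper = bootstrap_ci(pairs)
print(f"slope = {slope(pairs):.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

If the confidence interval excludes zero, the path is significant at the 5% level, which is the same reading you apply to the bootstrapped intervals in SmartPLS.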

Inner Model vs Outer Model

🔗
Inner Model (Structural Model)
The relationships BETWEEN latent constructs. This is where your hypotheses live — path coefficients, R², and predictive relevance (Q²). Assessed after validating the outer model.
Construct ↔ Construct paths
📐
Outer Model (Measurement Model)
How latent constructs are measured by their indicators. This is where you assess outer loadings, AVE, CR, and discriminant validity. Must be validated BEFORE assessing the inner model.
Construct ↔ Indicator loadings

Second-Order Constructs

A second-order construct is an abstract higher-level construct that is itself measured by multiple first-order constructs rather than by direct indicators. This is used when a concept is too complex to be captured by a single set of indicators.

🔑 Example — Service Quality as a Second-Order Construct

"Service Quality" cannot be directly measured. Instead, it is captured through three first-order constructs: Reliability (can you depend on the service?), Responsiveness (how quickly do they respond?), and Assurance (do they inspire confidence?). Each first-order construct has its own set of indicators. Service Quality → Reliability, Responsiveness, Assurance → each with 3–4 indicators. SmartPLS handles second-order constructs using the repeated indicators approach or the two-stage approach.

⚠️ SmartPLS Demo — Switch to SmartPLS

Dr. Tan will now switch screens to demonstrate SmartPLS live. We will use the Deli99.csv dataset. Steps: Import data → Build path model → Run PLS Algorithm → Assess outer loadings → Check CR and AVE → Run bootstrapping → Interpret path coefficients and significance → Check VIF for multicollinearity. Follow along on your own laptop.

Quiz Break 4 — SEM & SmartPLS
5 final questions on SEM theory, PLS vs CB-SEM, measurement models, and quality criteria. This is the hardest set — take your time. Dr. Tan will be back in 12 minutes.
Module 5 · 12:30 – 1:00 PM

Writing Up Your Results

Running the analysis is only half the work. How you write it up determines whether your examiner trusts your methodology. Here are the non-negotiable conventions for reporting quantitative results in a doctoral dissertation.

APA Reporting Rules

1
Always report effect size alongside significance
p < 0.05 tells you it's significant. Cohen's d (for t-tests) or η² (for ANOVA) tells you if it matters practically. A p-value of 0.001 with d = 0.1 is statistically significant but practically meaningless.
2
Report exact p-values, not just "p < .05"
Write p = .032, not p < .05. The exception is when p < .001, which you may report as such. APA 7th edition asks for exact p-values and drops the leading zero for statistics that cannot exceed 1.
3
For SEM: report a complete measurement model table before the structural model
Show outer loadings, CR, AVE, and discriminant validity evidence BEFORE you discuss path coefficients. Examiners check this order.
4
Connect every statistical finding back to your research question
"The independent samples t-test revealed a significant difference (t(472) = 4.32, p = .001) in current salary between custodial (M = $27,838) and managerial (M = $63,977) employees, supporting H1." Every result needs a sentence like this.
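For the first rule above, Cohen's d is simple to compute yourself when SPSS does not report it. This is a minimal sketch using the pooled sample standard deviation; the two groups of scores and the function name are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for two independent groups.
treatment = [2, 4, 6]
control = [1, 3, 5]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

Because d expresses the mean difference in standard deviation units, it can be compared across studies in a way that a p-value cannot.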
🤖 AI Writing Assistant

You can legitimately use AI to help interpret and write up your SPSS and SmartPLS output — provided you understand what you are reporting. Paste your output table into Claude or ChatGPT and ask: "Write an APA-format results paragraph for this independent samples t-test output for a doctoral dissertation." Then verify the interpretation against what you know. AI writes the prose; you provide the scholarly judgement.


Bonus Module · If time permits

Mixed Methods Research

Mixed methods research is a rapidly growing methodology in management, health, and social science doctoral research. It combines the statistical power of quantitative research with the contextual depth of qualitative research, producing findings that are both generalisable and deeply understood.

Why Mixed Methods? The Case for Both

Every methodology has blind spots. Surveys tell you what is happening and how much — but rarely why. Interviews tell you why — but you cannot generalise from 15 people to a population. Mixed methods fills both gaps.

📊
Quantitative Alone
Tells you: what, how much, how many, is there a relationship? Cannot tell you: why, how does it feel, what is the lived experience, what nuances exist?
🗣️
Qualitative Alone
Tells you: why, how, what is the lived experience. Cannot tell you: how widespread, how significant statistically, does this generalise to the population?
🔀
Mixed Methods
Triangulates both. Provides breadth (quantitative) AND depth (qualitative). Each strand validates and enriches the other. Strongest for complex management phenomena.

The Three Core Mixed Methods Designs

1️⃣
Explanatory Sequential
QUANT → QUAL. Quantitative first. Qualitative explains unexpected or puzzling quantitative results. Most common in management research.
Survey (n=300) → Interviews (n=15)
2️⃣
Exploratory Sequential
QUAL → QUANT. Qualitative first to explore and build theory. Findings inform the design of a survey instrument. Used when no suitable scale exists.
Interviews (n=20) → Survey (n=250)
3️⃣
Convergent Parallel
QUANT + QUAL simultaneously. Both strands collected independently at the same time, then merged and compared. Strongest for triangulation.
Survey + Interviews (concurrent)
Design | Order | Purpose | Best For
Explanatory Sequential | QUANT → QUAL | Qual explains quant results | Unexpected survey findings
Exploratory Sequential | QUAL → QUANT | Qual builds the survey instrument | New contexts, no existing scales
Convergent Parallel | QUANT + QUAL | Triangulation: compare and merge | Complex phenomena, time-limited
💡 Which Design for Your Research?

If your survey results are surprising and you want to understand WHY → Explanatory Sequential. If you are entering a new context where no validated scales exist and you need to develop your own instrument → Exploratory Sequential. If you have limited time and want the strongest triangulation → Convergent Parallel. Most DBA candidates use Explanatory Sequential because they already have a survey but find unexpected results that demand deeper investigation.

🔑 Triangulation — The Core Strength

Triangulation means using multiple data sources or methods to cross-validate findings. If your survey shows that trust significantly predicts loyalty (β = 0.62, p < 0.001) AND your interviews reveal that customers say "I stay because I trust them completely" — the two strands corroborate each other. This convergence strengthens your claims dramatically. If they diverge — that divergence itself becomes a finding worth explaining.

Quiz 6 — Mixed Methods Research
6 questions on mixed methods designs, triangulation, and choosing the right approach for your research.
Session Close

Open Q&A

Use the Live Quiz open-text question to submit your final questions to Dr. Tan. All questions are displayed anonymously on screen and discussed as a group.
