Saturday Intensive · 9:00 AM – 1:00 PM

Quantitative Research Methods
in the Age of AI

A doctoral-level intensive bridging classical statistical methodology with AI-driven data science. By the end of this session you will be able to design a quantitative study, choose the right statistical test, run descriptive and inferential analysis in SPSS, and build a PLS-SEM model in SmartPLS.

Instructor
Dr. Teik-Kheong Tan, PhD · IEEE

Duration
4 Hours · 5 Modules · 4 Quiz Breaks

Tools
SPSS · SmartPLS · Python · R

Level
PhD / DBA Doctoral Candidates

Module 1 · 9:20 – 10:00 AM

Research Foundations & Measurement

Before you touch SPSS or SmartPLS, you must be anchored in the philosophical foundations of what you are doing and why. Every methodological choice you make — from your sampling strategy to your statistical test — flows from these foundations.

Research Paradigms — Why Your Worldview Matters

A research paradigm is your set of beliefs about the nature of reality (ontology) and how we can know it (epistemology). Quantitative research sits firmly within positivism — the view that reality exists independently of the observer, that it can be measured objectively, and that patterns in data reveal universal truths about human behaviour.

🔭

Positivism

Reality is objective and measurable. Knowledge comes from empirical observation and measurement. This is the home of quantitative research.

🔬

Post-Positivism

Reality exists but our measurement of it is imperfect. Acknowledges measurement error — which is exactly why SEM corrects for it.

🧠

Interpretivism

Reality is socially constructed. Knowledge is subjective. This is where qualitative research lives — interviews, ethnography, grounded theory.

⚖️

Pragmatism

Use whatever works. Mixed methods researchers often sit here — combining surveys with interviews for a fuller picture.

🤖 AI Lens

Machine learning models are inherently positivist — they assume patterns in data are real and discoverable. When you use AI to analyse survey data or predict outcomes, you are operationalising positivism at scale. Understanding this alignment helps you defend your methodology to examiners who may question the use of AI tools in doctoral research.

The Four Measurement Scales — The DNA of Quantitative Data

Every variable in your study sits on one of four measurement scales. The scale determines which descriptive statistics are meaningful and which inferential tests are valid. Getting this wrong is one of the most common errors in doctoral dissertations.

Nominal
Categories only

Ordinal
Order, no distance

Interval
Order + distance

Ratio
True zero point

Scale	Example	Mode	Median	Mean	SD	Ratio
Nominal	Gender, Country, PhD/DBA	✓	—	—	—	—
Ordinal	Rankings, Likert* (debated)	✓	✓	—	—	—
Interval	Likert 1–7, Temperature °C	✓	✓	✓	✓	—
Ratio	Salary, Age, Years of experience	✓	✓	✓	✓	✓

⚠️ The Likert Scale Debate

Strictly speaking, a Likert scale (1–7) is ordinal — we cannot prove the distance between "3" and "4" equals the distance between "6" and "7". However, in practice, most researchers in management, marketing, and social science treat Likert scales as interval when there are 5 or more points. SmartPLS and SPSS both support this convention. Be prepared to justify this in your viva.

🔑 Key Rule

You can always move down the scale (treat ratio as interval) but never move up (you cannot treat nominal data as ratio). This single rule resolves most methodological choice questions about which statistic to use.
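The table and key rule above can be sketched as a small lookup in Python. This is only an illustration: the names SCALES, MINIMUM_SCALE and permitted_statistics are hypothetical, and the mapping simply restates the table's tick marks.

```python
# Measurement scales ordered from lowest to highest, as in the table above.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Statistic -> lowest scale at which the table marks it as meaningful.
MINIMUM_SCALE = {
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "sd": "interval",
    "ratio": "ratio",
}

def permitted_statistics(scale):
    """Return the statistics that are meaningful for a variable on `scale`."""
    level = SCALES.index(scale.lower())
    return [stat for stat, minimum in MINIMUM_SCALE.items()
            if SCALES.index(minimum) <= level]

print(permitted_statistics("ordinal"))
print(permitted_statistics("ratio"))
```

Because a higher scale inherits everything permitted below it, "moving down" never removes a valid statistic, while "moving up" would add statistics the data cannot support.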

⚡

Quiz Break 1 — Foundations & Measurement

Dr. Tan will launch 5 questions on your screen. Take 10–12 minutes to answer. Be honest with your responses — the class results will be discussed together.

Module 2 · 10:00 – 10:45 AM

Descriptive Statistics & the Normal Distribution

Before you test any hypothesis, you must describe your data. Descriptive statistics are not just preliminary steps — they reveal the story your data is telling. A non-normal distribution, a bimodal distribution, an extreme outlier — all of these change which inferential tests you can legitimately use.

Measures of Central Tendency

Central tendency statistics tell you where the middle of your distribution sits. The choice between mean, median, and mode is not arbitrary — it depends on your scale of measurement and the shape of your distribution.

📊

Mode

The most frequently occurring value. The only measure valid for nominal data. Useful for understanding the most common category in your sample.

Valid for: Nominal, Ordinal, Interval, Ratio

📏

Median

The 50th percentile — the middle value when data is ordered. Robust to outliers. Use when your distribution is skewed.

Valid for: Ordinal, Interval, Ratio

⚖️

Mean

The arithmetic average. Uses all data points — so it is pulled by extreme values (outliers). The foundation of most parametric tests.

Mean = Σx / N

💡 Doctoral Insight

When you report descriptive statistics in your dissertation, always report both mean and median for continuous variables. A large gap between them signals skewness — which then forces you to justify whether parametric or non-parametric tests are appropriate for your inferential analysis.
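To see the mean and median pull apart, here is a minimal Python sketch using the standard library. The salary figures are invented for illustration and are not taken from any course dataset.

```python
import statistics

# Hypothetical monthly salaries (in thousands); one extreme value creates right skew.
salaries = [3.1, 3.4, 3.4, 3.8, 4.0, 4.2, 4.5, 25.0]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)
mode_salary = statistics.mode(salaries)

print(f"mean={mean_salary:.2f}  median={median_salary:.2f}  mode={mode_salary}")

# A mean well above the median points to a right (positive) skew.
gap = mean_salary - median_salary
print(f"mean minus median = {gap:.2f}")
```

The single high earner drags the mean upward while the median barely moves, which is the gap the insight above asks you to report and explain.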

The Four Moments of a Normal Distribution

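Any distribution can be summarised by four mathematical properties, the "four moments." A normal (bell-curve) distribution is fully determined by the first two, its mean and standard deviation; its skewness and excess kurtosis are both zero. Your data will never be perfectly normal, but the third and fourth moments tell you how far it deviates, and in which direction, which is critical for choosing the right tests.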

1️⃣

Mean — 1st Moment

Central tendency. Where the distribution is centred. The peak of the bell curve.

Σx / N

2️⃣

Standard Deviation — 2nd Moment

Spread/dispersion. How tightly or loosely values cluster around the mean.

SD = √[Σ(x-x̄)² / (N-1)]

3️⃣

Skewness — 3rd Moment

The lean or tail of the distribution. Positive skew = tail to the right. Negative skew = tail to the left.

+ve = right tail | -ve = left tail

4️⃣

Kurtosis — 4th Moment

Tail heaviness, often described as peakedness vs. flatness. Leptokurtic (+ve) = heavy tails and a sharper peak. Platykurtic (-ve) = light tails and a flatter top.

Leptokurtic | Platykurtic
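The four moments can be computed by hand in a few lines of Python. This sketch uses the simple moment formulas (dividing by N for skewness and kurtosis); SPSS applies small-sample corrections, so its figures will differ slightly. The function name and the sample responses are hypothetical.

```python
import math

def describe_moments(values):
    """Return mean, sample SD, skewness and excess kurtosis for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    # Central moments (divide by n); this is the uncorrected textbook form.
    m2 = sum(d ** 2 for d in deviations) / n
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    sd = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))  # N-1 in the denominator
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a perfectly normal distribution
    return mean, sd, skewness, excess_kurtosis

# Hypothetical right-skewed responses: most values are low, one is very high.
responses = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
mean, sd, skewness, excess_kurtosis = describe_moments(responses)
print(f"mean={mean:.2f} sd={sd:.2f} skewness={skewness:.2f} kurtosis={excess_kurtosis:.2f}")
```

A positive skewness here reflects the long right tail created by the value 12.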

Key Areas Under the Normal Curve — The Empirical Rule

±1 SD ≈ 68% of data

±2 SD ≈ 95% of data

±3 SD ≈ 99.7% of data
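These areas can be checked directly with Python's standard library. The sketch below assumes a standard normal distribution (mean 0, SD 1); the helper name share_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within ±k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68%, 95% and 99.7% for k = 1, 2, 3.
    print(f"±{k} SD: {share_within(k):.1%}")
```

The same cdf call works for any mean and SD, since the proportions depend only on the number of standard deviations from the mean.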

🔑 When to use non-parametric tests

If your data is significantly skewed (|skewness| > 2) or has extreme kurtosis, parametric tests like t-tests and ANOVA may give you misleading results. Switch to non-parametric alternatives: Mann-Whitney U instead of independent t-test, Wilcoxon instead of paired t-test, Kruskal-Wallis instead of ANOVA. PLS-SEM is particularly valuable here because it makes no distributional assumptions.

Choosing the Right Graph — A Decision You Cannot Get Wrong

Your choice of graph is not cosmetic — it is a methodological statement. Examiners and reviewers notice when you use a histogram for categorical data or a bar chart for continuous data. Here is the rule:

📊

Bar Chart

For categorical data (nominal or ordinal). Each bar represents a category. Height shows frequency, percentage, or mean score. Use for: gender, country, employment type, programme type.

Nominal / Ordinal data

📈

Histogram

For continuous data (interval or ratio). Bars touch each other — no gaps. Shows the shape of a distribution. Use to check normality, skewness, and kurtosis visually.

Interval / Ratio data

📦

Boxplot

Shows median, interquartile range, and outliers. Excellent for comparing distributions across groups. Reveals skewness through the position of the median line within the box.

Median · IQR · Outliers

🔵

Scatterplot

For showing the relationship between two continuous variables. Each dot = one case. The pattern reveals direction (positive/negative) and strength of correlation. Used before regression.

Correlation · Regression

Chart Type	Data Type	Purpose	Never Use For
Bar Chart	Nominal / Ordinal	Frequency of categories	Continuous distributions
Histogram	Interval / Ratio	Shape of distribution	Categorical data
Boxplot	Interval / Ratio	Spread, median, outliers	Showing exact frequencies
Scatterplot	Two continuous variables	Correlation / regression	Single variable distribution
Line Graph	Interval / Ratio over time	Trends over time	Cross-sectional categorical data

💡 Dispersion Measures — Range vs Standard Deviation

The Range (max − min) is the simplest measure of dispersion: it tells you the spread between the highest and lowest values. However, it is extremely sensitive to outliers because it depends on only two observations. The Standard Deviation is more stable because it uses all values and measures the typical distance from the mean, although extreme values still inflate it. The Interquartile Range (IQR = Q3 − Q1) is the outlier-resistant choice, and is used with the median (for skewed data) just as SD is used with the mean (for normal data).
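A short Python sketch makes the comparison concrete by adding one extreme value to a small sample. The scores are invented, and statistics.quantiles uses its default quartile method, which may not match the quartiles SPSS reports.

```python
import statistics

def dispersion(values):
    """Return range, sample SD and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

scores = [4, 5, 5, 6, 6, 6, 7, 7, 8]
with_outlier = scores + [30]  # hypothetical extreme case or data-entry error

before = dispersion(scores)
after = dispersion(with_outlier)
for name in ("range", "sd", "iqr"):
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

The range and SD both jump when the outlier is added, while the IQR barely changes, which is why the IQR is paired with the median for skewed data.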

⚡

Quiz Break 2 — Descriptive Statistics

5 questions on central tendency, the four moments, and the normal distribution. Dr. Tan will step out briefly — take your time and be precise.

☕

10:45 – 11:00 AM · Break

15-minute break. Results from Quiz Break 2 will be discussed when we return. Stretch, hydrate, and come back ready for inferential statistics.

Module 3 · 11:00 – 11:45 AM

Inferential Statistics & SPSS

Inferential statistics allow you to draw conclusions about a population from a sample. They answer the fundamental doctoral question: "Is the pattern I see in my data real, or could it have occurred by chance?" The answer comes in the form of a p-value — and the test you use depends on your research question, your measurement scale, and your data distribution.

Statistical Test Decision Tree

Use this framework to choose the right test every time. Work through each question in order.

1. What is your research question type?

Difference between groups → go to Q2
Relationship between variables → go to Q4
Prediction → Regression / SEM

2. How many groups are you comparing?

2 groups → go to Q3
3+ groups → ANOVA (parametric) or Kruskal-Wallis

3. Are the two groups independent or paired?

Independent (different people) → Independent Samples T-test
Paired (same people, two time points) → Paired Samples T-test
Non-normal data → Mann-Whitney U / Wilcoxon

4. What type of relationship?

Association only → Pearson / Spearman Correlation
Categorical association → Chi-Square
Predictive → Regression
Latent variables → SEM (Module 4)
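The decision tree above can be written as a small Python function. This is a sketch only: the function name and arguments are hypothetical, and choosing Spearman over Pearson when data are non-normal is an added assumption, since the tree lists both without a rule.

```python
def choose_test(question, groups=2, paired=False, normal=True,
                categorical=False, latent=False):
    """Walk the four questions of the decision tree and return a test name."""
    if question == "prediction":
        return "SEM" if latent else "Regression"
    if question == "relationship":
        if categorical:
            return "Chi-Square"
        # Assumption: Spearman is used when normality cannot be defended.
        return "Pearson correlation" if normal else "Spearman correlation"
    if question == "difference":
        if groups >= 3:
            return "One-way ANOVA" if normal else "Kruskal-Wallis"
        if paired:
            return "Paired samples t-test" if normal else "Wilcoxon signed-rank"
        return "Independent samples t-test" if normal else "Mann-Whitney U"
    raise ValueError("question must be 'difference', 'relationship' or 'prediction'")

print(choose_test("difference", groups=2, paired=False))
print(choose_test("difference", groups=3, normal=False))
print(choose_test("relationship", categorical=True))
```

Writing the tree out this way forces every branch to be explicit, which is the same discipline an examiner expects in your methodology chapter.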

The Core Tests — ABC Company Case Study

We use the ABC Company employee dataset (Employee_data_1.sav) throughout SPSS. Here are the 8 research questions and which test answers each one.

#	Research Question	Test	Why
Q1	Gender difference in employment categories?	Chi-Square	Two categorical (nominal) variables
Q2	Difference in custodial vs managerial salary?	Independent T-test	Two independent groups, continuous DV
Q3	Difference between beginning and current salary?	Paired T-test	Same employees, two measurements
Q4	Correlation: Education, Salary, Experience?	Pearson Correlation	Relationships between continuous variables
Q5–8	Predictors of current salary?	Multiple Regression	Multiple IVs predicting one DV
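For a question like Q1, the chi-square statistic compares observed cell counts with the counts expected if the two variables were unrelated. The Python sketch below computes the statistic and degrees of freedom by hand; the counts are invented and are not the values in Employee_data_1.sav.

```python
def chi_square(observed):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical counts: rows = gender, columns = three employment categories.
table = [
    [40, 25, 35],
    [60, 5, 35],
]
statistic, df = chi_square(table)
print(f"chi-square = {statistic:.2f}, df = {df}")
```

SPSS reports the same statistic in its Crosstabs output, together with the p-value that this sketch leaves out.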

ANOVA — When You Have Three or More Groups

The T-test compares two groups. When you have three or more independent groups, you use ANOVA (Analysis of Variance). For example: comparing satisfaction scores across students from Pakistan, India, and UAE.

⚖️

One-Way ANOVA

Compares means across 3+ groups on a single factor. Tells you IF there is a significant difference — not WHERE. Post-hoc tests (Tukey, Bonferroni) identify which pairs differ.

F = Between-group variance / Within-group variance

🔄

Non-parametric alternative

If data is non-normal or ordinal, use Kruskal-Wallis instead of ANOVA. It is the rank-based, non-parametric equivalent and does not assume normality.

Kruskal-Wallis H test
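The F ratio for a one-way ANOVA can be worked through by hand. The Python sketch below uses invented satisfaction scores for three groups of students; the function name is hypothetical and no p-value is computed.

```python
def one_way_anova_f(groups):
    """Return the F ratio: between-group variance / within-group variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of cases
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, means) for x in group)
    ms_between = ss_between / (k - 1)   # between-group variance
    ms_within = ss_within / (n - k)     # within-group variance
    return ms_between / ms_within

# Hypothetical satisfaction scores (1-7) for students from three countries.
pakistan = [1, 2, 3]
india = [2, 3, 4]
uae = [5, 6, 7]
f_ratio = one_way_anova_f([pakistan, india, uae])
print(f"F = {f_ratio:.2f}")
```

A large F means the group means differ by more than the spread inside each group would suggest; a post-hoc test is still needed to say which pairs differ.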

Regression — Predicting Outcomes

Regression moves beyond correlation to prediction. It models how much change in the dependent variable (Y) is associated with a unit change in an independent variable (X), holding all other variables constant.

🔑 The Regression Equation

Y = a + b₁X₁ + b₂X₂ + ... + ε

Where: Y = dependent variable (outcome to predict) · a = intercept (value of Y when all X = 0) · b = regression coefficient (how much Y changes per unit of X) · X = independent variable (predictor) · ε = error term (unexplained variance)

The R² value tells you what percentage of variance in Y is explained by your predictors. R² = 0.40 means your IVs explain 40% of the variance in Y.
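To keep the arithmetic visible, the Python sketch below fits the equation with a single predictor using ordinary least squares and then computes R². The education and salary values are invented, and the function name is hypothetical; with several predictors you would use SPSS or a statistics library instead.

```python
def fit_simple_regression(x, y):
    """Return intercept a, slope b and R-squared for Y = a + bX + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # change in Y per unit change in X
    a = mean_y - b * mean_x    # predicted Y when X = 0
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total   # share of variance explained
    return a, b, r_squared

# Hypothetical data: years of education and current salary (in thousands).
education = [8, 12, 12, 15, 16, 19]
salary = [24, 28, 30, 41, 45, 60]
a, b, r_squared = fit_simple_regression(education, salary)
print(f"salary = {a:.2f} + {b:.2f} * education, R² = {r_squared:.2f}")
```

The unexplained part, 1 − R², corresponds to the error term ε in the equation.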

🤖 AI Enhancement

AI tools like Python's statsmodels and R's built-in functions can run all of these tests and produce publication-ready output in seconds. More importantly, large language models (ChatGPT, Claude) can now interpret your SPSS output — paste your results table and ask for an APA-formatted interpretation. This is a legitimate productivity tool for doctoral researchers, but you must understand the underlying logic well enough to verify the interpretation.

Interpreting the p-value — What it Actually Means

The p-value is the probability of observing your result (or something more extreme) if the null hypothesis were true. It is not the probability that your hypothesis is correct. This distinction trips up many doctoral candidates in their viva.

✅

p < 0.05

Statistically significant at the 5% level (α = 0.05). You reject the null hypothesis. The result is unlikely to be due to chance alone.

⚠️

p < 0.01

Significant at the 1% level (α = 0.01). A more stringent threshold. Often required in medical and natural science research.

🔍

p > 0.05

Not significant. You fail to reject the null. This does NOT mean there is no effect — it may mean your sample was too small (insufficient power).
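One way to make the definition of a p-value tangible is a permutation test: shuffle the group labels many times, so the null hypothesis is true by construction, and count how often the shuffled difference is at least as large as the observed one. The Python sketch below is an illustration only, not the t-test SPSS runs; the function name and the data are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=42):
    """Two-sided p-value for a difference in means, estimated by label shuffling."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign cases to groups at random (null is true)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / n_permutations

# Hypothetical salaries (in thousands) for two independent groups.
custodial = [27, 29, 30, 31, 30, 28, 32, 29]
managerial = [58, 62, 66, 70, 61, 64, 59, 67]
p_value = permutation_p_value(custodial, managerial)
print(f"p = {p_value:.4f}")
```

The result is the share of "chance-only" worlds that look at least as extreme as your data, which is exactly what the p-value claims and nothing more.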

⚠️ SPSS Demo — Switch to SPSS

Dr. Tan will now switch screens to demonstrate SPSS live. Open the Employee_data_1.sav file. We will run: Descriptive Statistics → Frequencies → Crosstabulation → Chi-Square → Independent T-test → Paired T-test → Correlation → Regression. Follow along on your own laptop if you have SPSS installed.

⚡

Quiz Break 3 — Inferential Statistics

5 questions on statistical tests, p-values, and the SPSS case study. Dr. Tan is on a short break — take 12 minutes to work through these carefully.

Module 4 · 11:45 AM – 12:30 PM

Structural Equation Modelling & SmartPLS

SEM is the most powerful multivariate technique available to management researchers. It combines factor analysis and regression in a single model, allows you to work with latent (unobserved) variables, corrects for measurement error, and tests complex causal structures that regression simply cannot handle.

Why SEM? — Moving Beyond Regression

📉

Multiple Regression

Tests relationships between observed variables only. Cannot model measurement error. One dependent variable at a time. Variables must be directly measurable.

🔗

SEM Advantage

Models latent (unobservable) constructs. Corrects for measurement error. Tests multiple dependent variables simultaneously. Models mediation and moderation elegantly.

CB-SEM vs PLS-SEM — Which One for Your Research?

Criterion	CB-SEM (AMOS/LISREL)	PLS-SEM (SmartPLS)
Sample Size	Large (200+)	Small to medium (30+)
Data Distribution	Must be normal	No assumptions
Research Purpose	Theory confirmation	Prediction & exploration
Model Complexity	Can be difficult	Handles complex models well
Latent Variables	Common factors	Composite factors
Missing Data	Problematic	More robust
Best for PhD/DBA	Confirmatory, large n	Exploratory, typical n

💡 The SmartPLS Sweet Spot

Most PhD and DBA candidates in management, marketing, IS, and healthcare research use PLS-SEM because: (1) sample sizes in surveys of 150–300 are the norm, (2) management constructs like "trust", "satisfaction", and "intention" are inherently latent, and (3) the research is often exploratory. SmartPLS v3 is free for academic use and produces results directly reportable in journals like MIS Quarterly, JAMS, and IJIM.

Reflective vs Formative Measurement Models

This is one of the most conceptually important decisions in your SEM model — and one of the most common viva examination questions.

←

Reflective Model

Arrows point FROM the construct TO the indicators. The construct CAUSES the indicators. Indicators are interchangeable — dropping one does not change the construct's meaning.

Construct → Indicator 1, 2, 3

→

Formative Model

Arrows point FROM the indicators TO the construct. The indicators DEFINE the construct. Not interchangeable — each adds unique meaning.

Indicator 1, 2, 3 → Construct

Deli99 PLS-SEM Model

[Path diagram: expect_1, expect_2, expect_3 → Customer Expectation → Customer Satisfaction → Customer Loyalty → loyal_1, loyal_2, loyal_3]

PLS-SEM Quality Criteria — What to Report

Criterion	Where in SmartPLS	Threshold	Assesses
Outer Loadings	PLS Results → Outer Loadings	> 0.70 (≥ 0.40 exploratory)	Indicator Reliability
Composite Reliability	Quality Criteria → CR&V	> 0.70 (≥ 0.60 exploratory)	Internal Consistency
AVE	Quality Criteria → CR&V	> 0.50	Convergent Validity
Discriminant Validity	Quality Criteria → DV	√AVE > correlations	Discriminant Validity
Path Coefficients	Bootstrapping → Path Coeff	p < 0.05	Hypothesis Support
R²	PLS Results → R²	> 0.26 substantial	Explanatory Power
VIF	Quality Criteria → Collinearity	< 5.0	Multicollinearity
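SmartPLS reports these criteria for you, but the arithmetic behind AVE, Composite Reliability and the √AVE check is simple. The Python sketch below assumes standardised outer loadings for reflective constructs; the loadings, the correlation and the function names are all hypothetical.

```python
import math

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised outer loadings."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)

def discriminant_validity_ok(ave_a, ave_b, correlation):
    """True if the square root of each construct's AVE exceeds their correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(correlation)

# Hypothetical outer loadings for two constructs and their correlation.
expectation = [0.82, 0.78, 0.85]
loyalty = [0.74, 0.81, 0.79]
construct_correlation = 0.55

ave_expectation = average_variance_extracted(expectation)
ave_loyalty = average_variance_extracted(loyalty)
print(f"AVE = {ave_expectation:.3f} (threshold > 0.50)")
print(f"CR  = {composite_reliability(expectation):.3f} (threshold > 0.70)")
print("Discriminant validity:",
      discriminant_validity_ok(ave_expectation, ave_loyalty, construct_correlation))
```

Working one construct through by hand is a useful viva preparation, because it shows why a single low loading pulls both AVE and CR down.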

Endogenous vs Exogenous Variables

➡️

Exogenous Variable

Only sends arrows to other variables — never receives them. The independent variable in your model. Has no antecedent construct in the path diagram. In regression terms: the predictor (X).

Arrows only pointing OUT

⬅️

Endogenous Variable

Receives arrows from other variables — it is influenced by at least one other construct. The dependent variable in your model. May also send arrows to other constructs (making it both a DV and IV simultaneously).

Arrows pointing IN (and possibly out)

EFA vs CFA — Two Very Different Approaches

Both are factor analysis techniques but they serve completely different purposes. Choosing the wrong one is a common dissertation error.

Criterion	EFA — Exploratory	CFA — Confirmatory
Purpose	Discover factor structure from data	Test a pre-specified factor structure
Theory required?	No — data-driven	Yes — theory-driven
When to use	Developing a new scale, no prior model	Validating an existing scale or construct
Software	SPSS, R (psych package)	AMOS, R (lavaan), SmartPLS
PhD use	Scale development studies	Most management research dissertations

💡 SmartPLS Bootstrapping — Why It Matters

The PLS algorithm gives you path coefficients — but not their significance. Bootstrapping is a resampling procedure that generates t-statistics and p-values for every path in your model. SmartPLS typically uses 5,000 bootstrap subsamples. Without bootstrapping, you cannot make any claim about whether your hypotheses are supported. The bootstrapping output gives you: t-values, p-values, confidence intervals, and significance flags for every path coefficient.
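The resampling idea can be sketched in plain Python. This is a simplified illustration, not SmartPLS's procedure: it bootstraps a single standardised path between two observed scores (which equals their Pearson correlation), whereas SmartPLS re-estimates the whole model on every subsample. The data and function names are hypothetical.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation; returns None if either variable has no variance."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def bootstrap_path(x, y, n_subsamples=5000, seed=1):
    """Bootstrap a single path: estimate, t-value and 95% percentile interval."""
    rng = random.Random(seed)
    n = len(x)
    estimate = pearson_r(x, y)
    resampled = []
    for _ in range(n_subsamples):
        rows = [rng.randrange(n) for _ in range(n)]  # draw cases with replacement
        r = pearson_r([x[i] for i in rows], [y[i] for i in rows])
        if r is not None:  # skip degenerate subsamples with no variance
            resampled.append(r)
    resampled.sort()
    standard_error = statistics.stdev(resampled)
    lower = resampled[int(0.025 * len(resampled))]
    upper = resampled[int(0.975 * len(resampled)) - 1]
    return {"estimate": estimate,
            "t_value": estimate / standard_error,
            "ci_95": (lower, upper)}

# Hypothetical construct scores for 12 respondents.
trust = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7]
loyalty = [1, 3, 2, 4, 5, 4, 6, 5, 7, 6, 7, 6]
result = bootstrap_path(trust, loyalty)
print(result)
```

The t-value is the original estimate divided by the spread of the bootstrap estimates, which is why more stable paths earn larger t-values.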

Inner Model vs Outer Model

🔗

Inner Model (Structural Model)

The relationships BETWEEN latent constructs. This is where your hypotheses live — path coefficients, R², and predictive relevance (Q²). Assessed after validating the outer model.

Construct ↔ Construct paths

📐

Outer Model (Measurement Model)

How latent constructs are measured by their indicators. This is where you assess outer loadings, AVE, CR, and discriminant validity. Must be validated BEFORE assessing the inner model.

Construct ↔ Indicator loadings

Second-Order Constructs

A second-order construct is an abstract higher-level construct that is itself measured by multiple first-order constructs rather than by direct indicators. This is used when a concept is too complex to be captured by a single set of indicators.

🔑 Example — Service Quality as a Second-Order Construct

"Service Quality" cannot be directly measured. Instead, it is captured through three first-order constructs: Reliability (can you depend on the service?), Responsiveness (how quickly do they respond?), and Assurance (do they inspire confidence?). Each first-order construct has its own set of indicators. Service Quality → Reliability, Responsiveness, Assurance → each with 3–4 indicators. SmartPLS handles second-order constructs using the repeated indicators approach or the two-stage approach.

⚠️ SmartPLS Demo — Switch to SmartPLS

Dr. Tan will now switch screens to demonstrate SmartPLS live. We will use the Deli99.csv dataset. Steps: Import data → Build path model → Run PLS Algorithm → Assess outer loadings → Check CR and AVE → Run bootstrapping → Interpret path coefficients and significance → Check VIF for multicollinearity. Follow along on your own laptop.

⚡

Quiz Break 4 — SEM & SmartPLS

5 final questions on SEM theory, PLS vs CB-SEM, measurement models, and quality criteria. This is the hardest set — take your time. Dr. Tan will be back in 12 minutes.

Module 5 · 12:30 – 1:00 PM

Writing Up Your Results

Running the analysis is only half the work. How you write it up determines whether your examiner trusts your methodology. Here are the non-negotiable conventions for reporting quantitative results in a doctoral dissertation.

APA Reporting Rules

Always report effect size alongside significance

p < 0.05 tells you it's significant. Cohen's d (for t-tests) or η² (for ANOVA) tells you if it matters practically. A p-value of 0.001 with d = 0.1 is statistically significant but practically meaningless.

Report exact p-values, not just "p < 0.05"

Write p = 0.032, not p < 0.05. The exception is when p < 0.001, which you may report as such. APA 7th edition requires this.

For SEM: report a complete measurement model table before the structural model

Show outer loadings, CR, AVE, and discriminant validity evidence BEFORE you discuss path coefficients. Examiners check this order.

Connect every statistical finding back to your research question

"The independent samples t-test revealed a significant difference (t(472) = 4.32, p = 0.001) in current salary between custodial (M = $27,838) and managerial (M = $63,977) employees, supporting H1." Every result needs a sentence like this.

🤖 AI Writing Assistant

You can legitimately use AI to help interpret and write up your SPSS and SmartPLS output — provided you understand what you are reporting. Paste your output table into Claude or ChatGPT and ask: "Write an APA-format results paragraph for this independent samples t-test output for a doctoral dissertation." Then verify the interpretation against what you know. AI writes the prose; you provide the scholarly judgement.
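Whatever tool drafts the prose, the effect size required by the first reporting rule is easy to verify yourself. The Python sketch below computes Cohen's d with a pooled standard deviation; the function name and the two small samples are hypothetical. Cohen's conventional benchmarks are roughly 0.2 (small), 0.5 (medium) and 0.8 (large).

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_variance)

# Hypothetical scores for two independent groups.
group_a = [2, 4, 6]
group_b = [1, 3, 5]
d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f}")
```

Reporting d next to the p-value lets the examiner see at a glance whether a significant result is also a meaningful one.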

Bonus Module · If time permits

Mixed Methods Research

Mixed methods research is the fastest-growing methodology in management, health, and social science doctoral research. It combines the statistical power of quantitative research with the contextual depth of qualitative research — producing findings that are both generalisable and deeply understood.

Why Mixed Methods? The Case for Both

Every methodology has blind spots. Surveys tell you what is happening and how much — but rarely why. Interviews tell you why — but you cannot generalise from 15 people to a population. Mixed methods fills both gaps.

📊

Quantitative Alone

Tells you: what, how much, how many, is there a relationship? Cannot tell you: why, how does it feel, what is the lived experience, what nuances exist?

🗣️

Qualitative Alone

Tells you: why, how, what is the lived experience. Cannot tell you: how widespread, how significant statistically, does this generalise to the population?

🔀

Mixed Methods

Triangulates both. Provides breadth (quantitative) AND depth (qualitative). Each strand validates and enriches the other. Strongest for complex management phenomena.

The Three Core Mixed Methods Designs

1️⃣

Explanatory Sequential

QUANT → QUAL. Quantitative first. Qualitative explains unexpected or puzzling quantitative results. Most common in management research.

Survey (n=300) → Interviews (n=15)

2️⃣

Exploratory Sequential

QUAL → QUANT. Qualitative first to explore and build theory. Findings inform the design of a survey instrument. Used when no suitable scale exists.

Interviews (n=20) → Survey (n=250)

3️⃣

Convergent Parallel

QUANT + QUAL simultaneously. Both strands collected independently at the same time, then merged and compared. Strongest for triangulation.

Survey + Interviews (concurrent)

Design	Order	Purpose	Best For
Explanatory Sequential	QUANT → QUAL	Qual explains quant results	Unexpected survey findings
Exploratory Sequential	QUAL → QUANT	Qual builds the survey instrument	New contexts, no existing scales
Convergent Parallel	QUANT + QUAL	Triangulation — compare and merge	Complex phenomena, time-limited

💡 Which Design for Your Research?

If your survey results are surprising and you want to understand WHY → Explanatory Sequential. If you are entering a new context where no validated scales exist and you need to develop your own instrument → Exploratory Sequential. If you have limited time and want the strongest triangulation → Convergent Parallel. Most DBA candidates use Explanatory Sequential because they already have a survey but find unexpected results that demand deeper investigation.

🔑 Triangulation — The Core Strength

Triangulation means using multiple data sources or methods to cross-validate findings. If your survey shows that trust significantly predicts loyalty (β = 0.62, p < 0.001) AND your interviews reveal that customers say "I stay because I trust them completely" — the two strands corroborate each other. This convergence strengthens your claims dramatically. If they diverge — that divergence itself becomes a finding worth explaining.

⚡

Quiz 6 — Mixed Methods Research

6 questions on mixed methods designs, triangulation, and choosing the right approach for your research.

Session Close

Open Q&A

Use the Live Quiz open-text question to submit your final questions to Dr. Tan. All questions are displayed anonymously on screen and discussed as a group.
