Quantitative Research Methods in the Age of AI
A doctoral-level intensive bridging classical statistical methodology with AI-driven data science. By the end of this session you will be able to design a quantitative study, choose the right statistical test, run descriptive and inferential analysis in SPSS, and build a PLS-SEM model in SmartPLS.
Research Foundations & Measurement
Before you touch SPSS or SmartPLS, you must be anchored in the philosophical foundations of what you are doing and why. Every methodological choice you make — from your sampling strategy to your statistical test — flows from these foundations.
Research Paradigms — Why Your Worldview Matters
A research paradigm is your set of beliefs about the nature of reality (ontology) and how we can know it (epistemology). Quantitative research sits firmly within positivism — the view that reality exists independently of the observer, that it can be measured objectively, and that patterns in data reveal universal truths about human behaviour.
Machine learning models are inherently positivist — they assume patterns in data are real and discoverable. When you use AI to analyse survey data or predict outcomes, you are operationalising positivism at scale. Understanding this alignment helps you defend your methodology to examiners who may question the use of AI tools in doctoral research.
The Four Measurement Scales — The DNA of Quantitative Data
Every variable in your study sits on one of four measurement scales. The scale determines which descriptive statistics are meaningful and which inferential tests are valid. Getting this wrong is one of the most common errors in doctoral dissertations.
Nominal: categories only
Ordinal: order, no distance
Interval: order + distance
Ratio: true zero point
| Scale | Example | Mode | Median | Mean | SD | Meaningful ratios |
|---|---|---|---|---|---|---|
| Nominal | Gender, Country, PhD/DBA | ✓ | — | — | — | — |
| Ordinal | Rankings, Likert* (debated) | ✓ | ✓ | — | — | — |
| Interval | Likert 1–7, Temperature °C | ✓ | ✓ | ✓ | ✓ | — |
| Ratio | Salary, Age, Years of experience | ✓ | ✓ | ✓ | ✓ | ✓ |
Strictly speaking, a Likert scale (1–7) is ordinal — we cannot prove the distance between "3" and "4" equals the distance between "6" and "7". However, in practice, most researchers in management, marketing, and social science treat Likert scales as interval when there are 5 or more points. SmartPLS and SPSS both support this convention. Be prepared to justify this in your viva.
You can always move down the scale (treat ratio as interval) but never move up (you cannot treat nominal data as ratio). This single rule resolves most methodological choice questions about which statistic to use.
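The table and the "move down, never up" rule can be written as a small lookup. This is only an illustrative sketch: the names `SCALE_ORDER`, `VALID_STATS`, `is_meaningful`, and `can_treat_as` are invented here and do not come from SPSS or SmartPLS.

```python
# Scales ordered from least to most informative (hypothetical helper names).
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that are meaningful at each scale, mirroring the table above.
VALID_STATS = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean", "sd"},
    "ratio": {"mode", "median", "mean", "sd", "ratios"},
}


def is_meaningful(stat: str, scale: str) -> bool:
    """Return True if `stat` is meaningful for data measured on `scale`."""
    return stat in VALID_STATS[scale]


def can_treat_as(actual: str, target: str) -> bool:
    """Moving down (ratio -> interval) is allowed; moving up is not."""
    return SCALE_ORDER.index(target) <= SCALE_ORDER.index(actual)


print(is_meaningful("mean", "ordinal"))    # strict view: the mean is not meaningful
print(can_treat_as("ratio", "interval"))   # moving down the scale
print(can_treat_as("nominal", "ratio"))    # moving up the scale
```

Note that the lookup encodes the strict view of ordinal data. If you adopt the Likert-as-interval convention, you would classify those items as "interval" before using it.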
Descriptive Statistics & the Normal Distribution
Before you test any hypothesis, you must describe your data. Descriptive statistics are not just preliminary steps — they reveal the story your data is telling. A non-normal distribution, a bimodal distribution, an extreme outlier — all of these change which inferential tests you can legitimately use.
Measures of Central Tendency
Central tendency statistics tell you where the middle of your distribution sits. The choice between mean, median, and mode is not arbitrary — it depends on your scale of measurement and the shape of your distribution.
When you report descriptive statistics in your dissertation, always report both mean and median for continuous variables. A large gap between them signals skewness — which then forces you to justify whether parametric or non-parametric tests are appropriate for your inferential analysis.
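To see why the gap between mean and median matters, here is a minimal sketch using Python's standard library. The salary figures are made up for illustration and are not taken from any course dataset.

```python
from statistics import mean, median

# Hypothetical salaries in thousands; two very high earners skew the data.
salaries = [30, 32, 35, 36, 38, 40, 41, 45, 120, 250]

m = mean(salaries)
md = median(salaries)

print(f"mean   = {m:.1f}")
print(f"median = {md:.1f}")
print(f"gap    = {m - md:.1f}")  # a large positive gap suggests right skew
```

Here the two extreme salaries pull the mean far above the median, which is the signal to check skewness before choosing a parametric test.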
The Four Moments of a Normal Distribution
A normal (bell-curve) distribution is fully determined by its mean and variance; its skewness and excess kurtosis are both zero. Together these four properties (mean, variance, skewness, and kurtosis) are the "four moments." Your data will never be perfectly normal, but understanding how far it deviates, and in which direction, is critical for choosing the right tests.
If your data is markedly skewed (a common rule of thumb is |skewness| > 2) or has extreme kurtosis, parametric tests like t-tests and ANOVA may give you misleading results. Switch to non-parametric alternatives: Mann-Whitney U instead of the independent t-test, the Wilcoxon signed-rank test instead of the paired t-test, Kruskal-Wallis instead of ANOVA. PLS-SEM is particularly valuable here because it makes no distributional assumptions.
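As a rough sketch of the skewness check, the code below computes skewness and excess kurtosis from central moments. It uses the simple population (divide-by-n) formulas, so the values will differ slightly from the sample-adjusted figures that statistical packages report. The function names and the example data are assumptions made for illustration.

```python
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment using the population (divide-by-n) formula."""
    mu = fmean(xs)
    return fmean([(x - mu) ** k for x in xs])


def skewness(xs):
    """Third standardised moment; 0 for a symmetric distribution."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Fourth standardised moment minus 3; 0 for a normal distribution."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def suggested_family(xs, skew_cutoff=2.0):
    """Apply the |skewness| > 2 rule of thumb from the text."""
    return "non-parametric" if abs(skewness(xs)) > skew_cutoff else "parametric"


symmetric = [1, 2, 3, 4, 5]   # hypothetical data
skewed = [1] * 20 + [100]     # hypothetical data with one extreme value

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, round(skewness(data), 2), round(excess_kurtosis(data), 2),
          suggested_family(data))
```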
Choosing the Right Graph — A Decision You Cannot Get Wrong
Your choice of graph is not cosmetic — it is a methodological statement. Examiners and reviewers notice when you use a histogram for categorical data or a bar chart for continuous data. Here is the rule:
| Chart Type | Data Type | Purpose | Never Use For |
|---|---|---|---|
| Bar Chart | Nominal / Ordinal | Frequency of categories | Continuous distributions |
| Histogram | Interval / Ratio | Shape of distribution | Categorical data |
| Boxplot | Interval / Ratio | Spread, median, outliers | Showing exact frequencies |
| Scatterplot | Two continuous variables | Correlation / regression | Single variable distribution |
| Line Graph | Interval / Ratio over time | Trends over time | Cross-sectional categorical data |
The Range (max − min) is the simplest measure of dispersion: it tells you the spread between the highest and lowest values. However, it is extremely sensitive to outliers because it depends on only two observations. The Standard Deviation is more informative because it uses all values and measures the typical distance from the mean, although it too is inflated by extreme values. The Interquartile Range (IQR = Q3 − Q1) is robust to outliers and is reported with the median (for skewed data), just as SD is reported with the mean (for approximately normal data).
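The sketch below compares the three dispersion measures on a small made-up sample, with and without one extreme value. The helper name `dispersion` is hypothetical, and the quartile convention (`method="inclusive"`) is one of several in use, so other software may report slightly different quartiles.

```python
from statistics import quantiles, stdev


def dispersion(xs):
    """Return range, sample SD, and IQR for a list of numbers."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return {"range": max(xs) - min(xs), "sd": stdev(xs), "iqr": q3 - q1}


base = [30, 32, 35, 36, 38, 40, 41, 45, 48]   # hypothetical values
with_outlier = base + [250]                   # same data plus one extreme value

d_base = dispersion(base)
d_out = dispersion(with_outlier)

for key in ("range", "sd", "iqr"):
    print(f"{key:>5}: {d_base[key]:8.2f} -> {d_out[key]:8.2f}")
```

In this example a single outlier changes the range and SD substantially while the IQR moves only a little, which is why the IQR is paired with the median for skewed data.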
Inferential Statistics & SPSS
Inferential statistics allow you to draw conclusions about a population from a sample. They answer the fundamental doctoral question: "Is the pattern I see in my data real, or could it have occurred by chance?" The answer comes in the form of a p-value — and the test you use depends on your research question, your measurement scale, and your data distribution.
Statistical Test Decision Tree
Use this framework to choose the right test every time. Work through each question in order.
The Core Tests — ABC Company Case Study
We use the ABC Company employee dataset (Employee_data_1.sav) throughout the SPSS session. Here are the 8 research questions and the test that answers each one.
| # | Research Question | Test | Why |
|---|---|---|---|
| Q1 | Gender difference in employment categories? | Chi-Square | Two categorical (nominal) variables |
| Q2 | Difference in custodial vs managerial salary? | Independent T-test | Two independent groups, continuous DV |
| Q3 | Difference between beginning and current salary? | Paired T-test | Same employees, two measurements |
| Q4 | Correlation: Education, Salary, Experience? | Pearson Correlation | Relationships between continuous variables |
| Q5–8 | Predictors of current salary? | Multiple Regression | Multiple IVs predicting one DV |
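To make the logic of the first test concrete, here is a hand-rolled Pearson chi-square for a 2×2 table. The counts are invented and are not taken from Employee_data_1.sav; the function name is hypothetical; no continuity correction is applied. The p-value shortcut used here is only valid for one degree of freedom, so larger tables would need a chi-square distribution from a statistics library.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With df = 1, chi2 is a squared standard normal, so the upper tail
    # probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows = two gender groups, columns = two job categories.
counts = [[30, 10], [10, 30]]
chi2, p = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p:.6f}")
```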
ANOVA — When You Have Three or More Groups
The T-test compares two groups. When you have three or more independent groups, you use ANOVA (Analysis of Variance). For example: comparing satisfaction scores across students from Pakistan, India, and UAE.
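The F statistic behind ANOVA is the ratio of between-group to within-group variance. The sketch below computes it by hand for three groups. The satisfaction scores are made up, and the function name is an assumption. It returns only F and its degrees of freedom, because the p-value requires an F distribution that the standard library does not provide.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical satisfaction scores for three groups of students.
pakistan = [1, 2, 3]
india = [2, 3, 4]
uae = [3, 4, 5]

f_stat, df_b, df_w = one_way_anova_f([pakistan, india, uae])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```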
Regression — Predicting Outcomes
Regression moves beyond correlation to prediction. It models how much change in the dependent variable (Y) is associated with a unit change in the independent variable (X), holding all other variables constant.
Y = a + b₁X₁ + b₂X₂ + ... + ε
Where: Y = dependent variable (outcome to predict) · a = intercept (value of Y when all X = 0) · b = regression coefficient (how much Y changes per unit of X) · X = independent variable (predictor) · ε = error term (unexplained variance)
The R² value tells you what percentage of variance in Y is explained by your predictors. R² = 0.40 means your IVs explain 40% of the variance in Y.
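As a minimal sketch of what the intercept, coefficient, and R² mean, the code below fits a regression with a single predictor by ordinary least squares. It is deliberately simplified to one X, whereas the multiple regression in the case study uses several. The data and the function name `fit_simple_ols` are hypothetical.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (a, b, r_squared) for the model Y = a + b*X + error."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b = sxy / sxx              # change in Y per unit change in X
    a = mean_y - b * mean_x    # value of Y when X = 0

    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot   # share of variance in Y explained
    return a, b, r_squared


# Hypothetical data: years of education (X) and salary in thousands (Y).
education = [8, 10, 12, 12, 14, 16, 16, 18]
salary = [22, 25, 30, 28, 38, 45, 41, 55]

a, b, r2 = fit_simple_ols(education, salary)
print(f"Y = {a:.2f} + {b:.2f} * X, R-squared = {r2:.2f}")
```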
Open-source tools like Python's statsmodels and R's built-in functions can run all of these tests and produce publication-ready output in seconds. More importantly, large language models (ChatGPT, Claude) can now interpret your SPSS output: paste your results table and ask for an APA-formatted interpretation. This is a legitimate productivity tool for doctoral researchers, but you must understand the underlying logic well enough to verify the interpretation.
Interpreting the p-value — What it Actually Means
The p-value is the probability of observing your result (or something more extreme) if the null hypothesis were true. It is not the probability that your hypothesis is correct. This distinction trips up many doctoral candidates in their viva.
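One way to see this definition in action is a permutation test, which is not the t-test SPSS runs but makes the "if the null hypothesis were true" logic explicit. Under the null, group labels are interchangeable, so we shuffle them many times and count how often the shuffled difference is at least as extreme as the observed one. The data, the function name, and the seed are all assumptions made for this sketch.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # simulate a world where labels do not matter
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical scores for two independent groups.
group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [11, 12, 13, 14, 15, 16, 17, 18]

p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
```

A small p-value here says only that such a large difference would be rare if the labels were arbitrary. It does not give the probability that your research hypothesis is true.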
Dr. Tan will now switch screens to demonstrate SPSS live. Open the Employee_data_1.sav file. We will run: Descriptive Statistics → Frequencies → Crosstabulation → Chi-Square → Independent T-test → Paired T-test → Correlation → Regression. Follow along on your own laptop if you have SPSS installed.
Structural Equation Modelling & SmartPLS
SEM is one of the most powerful multivariate techniques available to management researchers. It combines factor analysis and regression in a single model, allows you to work with latent (unobserved) variables, corrects for measurement error, and tests complex causal structures that regression simply cannot handle.
Why SEM? — Moving Beyond Regression
CB-SEM vs PLS-SEM — Which One for Your Research?
| Criterion | CB-SEM (AMOS/LISREL) | PLS-SEM (SmartPLS) |
|---|---|---|
| Sample Size | Large (200+) | Small to medium (30+) |
| Data Distribution | Must be normal | No assumptions |
| Research Purpose | Theory confirmation | Prediction & exploration |
| Model Complexity | Can be difficult | Handles complex models well |
| Latent Variables | Common factors | Composite factors |
| Missing Data | Problematic | More robust |
| Best for PhD/DBA | Confirmatory, large n | Exploratory, typical n |
Most PhD and DBA candidates in management, marketing, IS, and healthcare research use PLS-SEM because: (1) survey sample sizes of 150–300 are the norm, (2) management constructs like "trust", "satisfaction", and "intention" are inherently latent, and (3) the research is often exploratory. SmartPLS is widely used in academic research (check the current licence terms for your version, as free student licences have carried restrictions) and produces results directly reportable in journals like MIS Quarterly, JAMS, and IJIM.
Reflective vs Formative Measurement Models
This is one of the most conceptually important decisions in your SEM model — and one of the most common viva examination questions.
PLS-SEM Quality Criteria — What to Report
| Criterion | Where in SmartPLS | Threshold | Assesses |
|---|---|---|---|
| Outer Loadings | PLS Results → Outer Loadings | > 0.70 (≥ 0.40 exploratory) | Indicator Reliability |
| Composite Reliability | Quality Criteria → CR&V | > 0.70 (≥ 0.60 exploratory) | Internal Consistency |
| AVE | Quality Criteria → CR&V | > 0.50 | Convergent Validity |
| Discriminant Validity | Quality Criteria → DV | √AVE > correlations | Discriminant Validity |
| Path Coefficients | Bootstrapping → Path Coeff | p < 0.05 | Hypothesis Support |
| R² | PLS Results → R² | > 0.26 substantial | Explanatory Power |
| VIF | Quality Criteria → Collinearity | < 5.0 | Multicollinearity |
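SmartPLS computes these criteria for you, but it helps to know the arithmetic. The sketch below derives AVE and composite reliability from standardised outer loadings and applies the √AVE check from the table. The loadings and correlations are invented, and the function names are hypothetical rather than part of SmartPLS.

```python
import math


def ave(loadings):
    """Average variance extracted: mean of the squared standardised loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardised loadings."""
    sum_loadings_sq = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + error_variance)


def discriminant_validity_ok(ave_value, correlations_with_others):
    """Square root of AVE must exceed every correlation with other constructs."""
    return math.sqrt(ave_value) > max(abs(r) for r in correlations_with_others)


# Hypothetical outer loadings for one construct and its correlations
# with two other constructs in the model.
trust_loadings = [0.8, 0.8, 0.8]
trust_correlations = [0.60, 0.45]

trust_ave = ave(trust_loadings)
trust_cr = composite_reliability(trust_loadings)

print(f"AVE = {trust_ave:.2f} (threshold > 0.50)")
print(f"CR  = {trust_cr:.2f} (threshold > 0.70)")
print("Discriminant validity:", discriminant_validity_ok(trust_ave, trust_correlations))
```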
Endogenous vs Exogenous Variables
EFA vs CFA — Two Very Different Approaches
Both are factor analysis techniques but they serve completely different purposes. Choosing the wrong one is a common dissertation error.
| Criterion | EFA — Exploratory | CFA — Confirmatory |
|---|---|---|
| Purpose | Discover factor structure from data | Test a pre-specified factor structure |
| Theory required? | No — data-driven | Yes — theory-driven |
| When to use | Developing a new scale, no prior model | Validating an existing scale or construct |
| Software | SPSS, R (psych package) | AMOS, R (lavaan), SmartPLS |
| PhD use | Scale development studies | Most management research dissertations |
The PLS algorithm gives you path coefficients, but not their significance. Bootstrapping is a resampling procedure that generates t-statistics and p-values for every path in your model. A common recommendation is to use 5,000 bootstrap subsamples. Without bootstrapping, you cannot make any claim about whether your hypotheses are supported. The bootstrapping output gives you t-values, p-values, confidence intervals, and significance flags for every path coefficient.
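The resampling idea can be sketched outside SmartPLS. The code below bootstraps a simple regression slope rather than a PLS path coefficient, so it illustrates the principle and is not the SmartPLS procedure itself. The data, function names, and seed are assumptions.

```python
import random
from statistics import fmean, stdev


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def bootstrap_slope(x, y, n_boot=5000, seed=0):
    """Bootstrap standard error, t-value, and 95% percentile CI for the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimate = slope(x, y)

    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:                         # skip degenerate resamples
            continue
        boots.append(slope(xs, ys))

    se = stdev(boots)
    boots.sort()
    lower = boots[int(0.025 * len(boots))]
    upper = boots[int(0.975 * len(boots)) - 1]
    return {"estimate": estimate, "se": se, "t": estimate / se, "ci95": (lower, upper)}


# Hypothetical data: y is roughly 2 * x plus a small deterministic disturbance.
x = list(range(1, 21))
y = [2 * xi + ((7 * xi) % 5) - 2 for xi in x]

result = bootstrap_slope(x, y)
print(f"estimate = {result['estimate']:.3f}, se = {result['se']:.3f}, "
      f"t = {result['t']:.1f}, 95% CI = {result['ci95']}")
```

The t-value is the original estimate divided by the standard deviation of the bootstrap estimates, which mirrors how bootstrapped t-values for path coefficients are obtained.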
Inner Model vs Outer Model
Second-Order Constructs
A second-order construct is an abstract higher-level construct that is itself measured by multiple first-order constructs rather than by direct indicators. This is used when a concept is too complex to be captured by a single set of indicators.
"Service Quality" cannot be directly measured. Instead, it is captured through three first-order constructs: Reliability (can you depend on the service?), Responsiveness (how quickly do they respond?), and Assurance (do they inspire confidence?). Each first-order construct has its own set of indicators. Service Quality → Reliability, Responsiveness, Assurance → each with 3–4 indicators. SmartPLS handles second-order constructs using the repeated indicators approach or the two-stage approach.
Dr. Tan will now switch screens to demonstrate SmartPLS live. We will use the Deli99.csv dataset. Steps: Import data → Build path model → Run PLS Algorithm → Assess outer loadings → Check CR and AVE → Run bootstrapping → Interpret path coefficients and significance → Check VIF for multicollinearity. Follow along on your own laptop.
Writing Up Your Results
Running the analysis is only half the work. How you write it up determines whether your examiner trusts your methodology. Here are the non-negotiable conventions for reporting quantitative results in a doctoral dissertation.
APA Reporting Rules
You can legitimately use AI to help interpret and write up your SPSS and SmartPLS output — provided you understand what you are reporting. Paste your output table into Claude or ChatGPT and ask: "Write an APA-format results paragraph for this independent samples t-test output for a doctoral dissertation." Then verify the interpretation against what you know. AI writes the prose; you provide the scholarly judgement.
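A small formatter can help you check AI-written prose against the numbers in your output. This sketch assumes two common APA conventions: test statistics to two decimals, and p-values to three decimals with no leading zero and "p < .001" for very small values. The function names and the example values are hypothetical, and plain text cannot show the italics APA requires for t and p.

```python
def apa_p(p: float) -> str:
    """Format a p-value without a leading zero; report tiny values as < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t(t: float, df: int, p: float) -> str:
    """Format a t-test result as t(df) = value, p = value."""
    return f"t({df}) = {t:.2f}, {apa_p(p)}"


# Hypothetical values copied from an output table.
print(apa_t(2.5, 48, 0.016))
print(apa_t(6.12, 120, 0.0000004))
```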
Mixed Methods Research
Mixed methods research is one of the fastest-growing methodologies in management, health, and social science doctoral research. It combines the statistical power of quantitative research with the contextual depth of qualitative research, producing findings that are both generalisable and deeply understood.
Why Mixed Methods? The Case for Both
Every methodology has blind spots. Surveys tell you what is happening and how much — but rarely why. Interviews tell you why — but you cannot generalise from 15 people to a population. Mixed methods fills both gaps.
The Three Core Mixed Methods Designs
| Design | Order | Purpose | Best For |
|---|---|---|---|
| Explanatory Sequential | QUANT → QUAL | Qual explains quant results | Unexpected survey findings |
| Exploratory Sequential | QUAL → QUANT | Qual builds the survey instrument | New contexts, no existing scales |
| Convergent Parallel | QUANT + QUAL | Triangulation — compare and merge | Complex phenomena, time-limited |
If your survey results are surprising and you want to understand WHY → Explanatory Sequential. If you are entering a new context where no validated scales exist and you need to develop your own instrument → Exploratory Sequential. If you have limited time and want the strongest triangulation → Convergent Parallel. Most DBA candidates use Explanatory Sequential because they already have a survey but find unexpected results that demand deeper investigation.
Triangulation means using multiple data sources or methods to cross-validate findings. If your survey shows that trust significantly predicts loyalty (β = 0.62, p < 0.001) AND your interviews reveal that customers say "I stay because I trust them completely" — the two strands corroborate each other. This convergence strengthens your claims dramatically. If they diverge — that divergence itself becomes a finding worth explaining.
Open Q&A
Use the Live Quiz open-text question to submit your final questions to Dr. Tan. All questions are displayed anonymously on screen and discussed as a group.