A research instrument · est. 2026

Most methods capture snapshots.
Lives unfold as trajectories.

A longitudinal infrastructure for narrative identity research — structured elicitation, transparent computational analysis, and bounded interpretive synthesis. Open-source, reproducible, AI-mediated.

§ 01 The cost case

What changes when the administrator is no longer the bottleneck.

Longitudinal qualitative research is structurally expensive. Trained interviewers, scheduling, transcription, attrition management — costs scale linearly with participants × waves × hours. GNA decouples elicitation density from administrator hours.

 
Comparison: Traditional longitudinal qualitative vs. Global Narrative Atlas (GNA)

Cost per participant
  Traditional: $265 – $576 (Engstrom et al. 2014, NIA-funded, n=483, 100 weeks)
  GNA: ~$2–$8 / participant / month, token + infrastructure cost (estimate, pending production deployment)
Staff time per enrolment
  Traditional: 1.83 hours, before any interview begins
  GNA: ~0 hours, self-onboarding via the platform
In-depth interview pricing
  Traditional: $5,000 – $15,000 per 10–15 IDIs (industry 2026)
  GNA: Bounded by API spend; scales with token consumption
Availability
  Traditional: Scheduled, within interviewer working hours
  GNA: 24 / 7, any time zone, any cadence
Linguistic reach
  Traditional: Bound to interviewer fluency; typically monolingual cohorts
  GNA: Multilingual by default; code-switching preserved in transcript
Attrition risk profile
  Traditional: Scheduling fatigue compounds (PSID: ~50% sample lost over 21 yrs)
  GNA: Participant-paced engagement; no scheduling burden to reschedule
Data sovereignty
  Traditional: Held by institution; participant withdrawal complex
  GNA: Participant-owned, deletable; withdrawal cascades, encrypted at rest

Cost figures from Engstrom, Tappen & Ouslander (2014); industry pricing from Drive Research (2026); attrition figures from PSID and Carduff et al. (2015). GNA cost estimates assume current-generation LLM API pricing and a research-pilot operational footprint. The methods in the “Traditional longitudinal qualitative” column are episodic by design; extending them longitudinally requires a repeat-interview architecture and substantial researcher commitment. The cost contrast is not a one-for-one replacement claim: GNA is suited to studies where text-based reflective dialogue is the appropriate elicitation modality, and is offered as a computational extension of the longitudinal-qualitative tradition (Neale, Thomson, Bertaux, Plummer, Riessman), not as a substitute for it.
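The scaling argument can be sketched in a few lines. This is an illustration under stated assumptions, not a costing tool: the function names, the hourly rate, and the cohort sizes are hypothetical, and only the ~$2–$8 per participant per month range is taken from the comparison above (where it is itself flagged as an estimate).

```python
def traditional_cost(participants: int, waves: int, hours_per_wave: float,
                     hourly_rate: float) -> float:
    """Administrator cost that scales with participants x waves x hours."""
    return participants * waves * hours_per_wave * hourly_rate


def gna_cost_range(participants: int, months: int,
                   low: float = 2.0, high: float = 8.0) -> tuple[float, float]:
    """Token + infrastructure cost range, using the ~$2-$8 per participant
    per month estimate as default bounds."""
    return (participants * months * low, participants * months * high)


# Hypothetical study: 50 participants, 4 waves, 1.5 interviewer hours per
# wave, at an assumed $40/hour. None of these inputs come from a cited source.
print(traditional_cost(50, 4, 1.5, 40.0))  # 12000.0
print(gna_cost_range(50, 12))              # (1200.0, 4800.0)
```

The point of the sketch is structural: the first function has an hours term that grows with every added wave, while the second depends only on participant-months and per-month spend.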

The obvious objection — and the answer.

“You’re comparing $265 of trained-interviewer time to $5 of LLM tokens, but the LLM doesn’t do what the interviewer does.”

Correct, and that is the methodological claim, not a bug. What GNA measures is register, not interviewer skill. The exploratory empirical work (Vasse 2026; identical analytical pipeline applied to elicited dialogue vs. unelicited blog control) suggests that the elicited register survives, computationally, in AI-mediated text. Whether the resulting trajectories carry the same interpretive weight as a clinician-conducted life-history interview is an empirical question with only a partial answer so far: the lexical signatures Pennebaker identified in expressive-writing studies appear in elicited GNA dialogue at substantially higher rates than in unprompted personal narrative. The cost case is not “GNA replaces the interviewer.” It is “GNA is designed to elicit the register the interviewer was hired to elicit, at a cost structure that scales differently.”

§ 02 Exploratory evidence

A substantial corpus-level contrast, identical pipeline.

In exploratory empirical work, the same analytical engine was applied to elicited reflective dialogue (a self-study by the platform's developer) and to unprompted blog narrative as a contrast corpus. The contrast is striking, and consistent with four decades of expressive-writing research (Pennebaker & Francis 1996; Klein & Boals 2010). The comparison does not, however, isolate elicitation as the active variable; the two corpora differ on several confounded dimensions.

133 : 1
Range of structural-thinking signal between elicited dialogue and unelicited blog narrative.
ρ = 0.976
Spearman correlation of structural-thinking growth with time across the elicited corpus, p = 0.0003.
53,999
Posts from the Blog Authorship Corpus (100 authors, 1999–2006), analysed as negative control.

Structural-thinking language over time

[Figure: line chart comparing the Elicited and Blog control series. Y-axis: structural-thinking terms per 1,000 words (0 to 8). X-axis: months of engagement (M1 to M8).]

Structural-thinking terms per 1,000 words, by month. Elicited cohort: 20,449 messages from sustained AI-mediated dialogue (single participant). Blog control: 53,999 posts from 100 authors of the Blog Authorship Corpus. Identical analytical pipeline. Trajectory shape approximated for visual reference; canonical values in Vasse (2026).
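For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how a Spearman rank correlation between time and a monthly rate can be computed. It is not the published pipeline code: the function names are hypothetical, the monthly rates are placeholder values rather than the canonical figures in Vasse (2026), and no p-value is computed here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


months = [1, 2, 3, 4, 5, 6, 7, 8]
# Placeholder monthly rates (terms per 1,000 words); NOT the published values.
rates = [0.5, 1.1, 0.9, 2.4, 2.0, 4.2, 5.1, 6.3]
print(round(spearman_rho(months, rates), 3))
```

Because the statistic works on ranks, it measures whether the rate rises monotonically with time, not whether the growth is linear.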

§ 03 Register, shown

Same biographical material. Different register.

Drawn from the synthetic demonstration cohort. Both writers are Latin-American migrants to Spain, writing in roughly the same season. One is responding to an AI-driven question. The other is keeping a diary alone.

persona_01 — Argentine, Madrid

Elicited · Q1

Home. It’s a complicated word, isn’t it? I’ve felt it tugging at me since we moved to Madrid, but I’m not sure it’s stuck yet. Sometimes I catch myself in the narrow streets of Lavapiés, surrounded by the shouts of vendors at the rastro, and I feel a spark of something that resembles belonging. Like yesterday, watching Camila negotiate for a vintage dress while our dog, an overexcited little ball of fur, tried to steal the show by barking at every…

persona_17 — Paraguayan, Madrid suburb

Unelicited · diary

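hoy no fue un día fácil. la casa estaba un poco desordenada, pero mariana y diego estaban contentos, jugando con sus juguetes. les hice un chocolate caliente en la tarde. a ellos les gusta, siempre piden más. después, hicieron ruido y corrían por todo el piso; a veces me asustan y tengo que recordar que son pequeños. llamé a mi madre como cada semana. su voz es un abrazo, pero me duele saber que ella está enferma. me dijo que ya no puede salir tanto y…

[English translation: today was not an easy day. the house was a bit messy, but mariana and diego were happy, playing with their toys. I made them hot chocolate in the afternoon. they like it, they always ask for more. afterwards they made noise and ran all over the flat; sometimes they startle me and I have to remind myself that they are small. I called my mother, as I do every week. her voice is a hug, but it hurts to know that she is ill. she told me she can no longer go out as much and…]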

The elicited response reaches for what experiences mean. The diary entry reports what happened. Neither is more articulate. They are written from different rhetorical postures — and the register difference is what the pipeline measures.

§ 04 Architecture

Three layers, separable and transparent.

Measurement is separated from interpretation by design. Each layer is auditable and may be replaced or extended independently.

01 / Elicitation

The AI-driven protocol

A structured set of canonical questions across grounding, exploration, and integration. Adaptable to new populations. An AI assistant administers tailored follow-up probes within a bounded research protocol.

02 / Analysis

The DOL pipeline

Deterministic lexicon-based profiling: structural thinking (40-term lexicon), epistemic uncertainty (12-term lexicon), MTLD, TF-IDF topic modelling. 23 open-source Python scripts, MIT-licensed. Identical input yields identical output.
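To illustrate what deterministic lexicon-based profiling means in practice, here is a minimal sketch. It is not one of the 23 published scripts: the function names and the tokenizer are assumptions, and the term sets are hypothetical stand-ins for the actual 40-term and 12-term lexicons, which are not reproduced here.

```python
import re

# Hypothetical stand-in terms; the real lexicons live in the published pipeline.
STRUCTURAL_TERMS = frozenset({"because", "therefore", "pattern", "structure"})
UNCERTAINTY_TERMS = frozenset({"maybe", "perhaps", "unsure"})

# Runs of Unicode letters, optionally joined by one straight apostrophe.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; Unicode-aware so accented letters are kept."""
    return TOKEN_RE.findall(text.lower())


def rate_per_1k(tokens: list[str], lexicon: frozenset[str]) -> float:
    """Lexicon hits per 1,000 tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in lexicon)
    return 1000.0 * hits / len(tokens)


def profile(text: str) -> dict[str, float]:
    """Score one text against both lexicons. No randomness, no model calls."""
    tokens = tokenize(text)
    return {
        "structural_per_1k": rate_per_1k(tokens, STRUCTURAL_TERMS),
        "uncertainty_per_1k": rate_per_1k(tokens, UNCERTAINTY_TERMS),
    }


sample = "Maybe I moved because I saw a pattern in how I kept leaving."
print(profile(sample))
```

Because scoring is a pure function of the text and a fixed term list, running it twice on the same input gives the same numbers, which is the property the "identical input yields identical output" claim rests on.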

03 / Synthesis

Bounded interpretation

A schema-constrained LLM receives extracted scores — never raw text — and translates trajectories into readable reports. Explicit prohibition on speculation, therapeutic framing, or claims unsupported by the extracted signal.
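One way to enforce the "scores, never raw text" boundary is to validate the payload before it reaches the synthesis model. The sketch below is an assumption about how such a guard could look, not the platform's actual schema: the class name, field names, and payload shape are all hypothetical.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class MonthlyScores:
    """One month of extracted signal. Field names are illustrative."""
    month: int
    structural_per_1k: float
    uncertainty_per_1k: float
    mtld: float

    def __post_init__(self):
        # Reject anything that is not a plain number, so free text cannot
        # be smuggled into the synthesis payload.
        for name, value in asdict(self).items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(
                    f"{name} must be numeric, got {type(value).__name__}"
                )


def build_synthesis_payload(scores: list[MonthlyScores]) -> str:
    """Serialise scores only; the synthesis layer never sees raw text."""
    return json.dumps({"trajectory": [asdict(s) for s in scores]},
                      sort_keys=True)


# Placeholder values for illustration only.
print(build_synthesis_payload([MonthlyScores(1, 1.2, 0.4, 61.0)]))
```

Passing a string such as a participant's sentence in any field raises `TypeError` at construction time, so the check happens before any prompt is assembled.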

The platform’s published validation derives from a single self-study (author as participant, eight months). Replication on independent cohorts is in progress.

§ 05 Positionality & reflexivity

The instrument is not neutral.

The Global Narrative Atlas was shaped by Rayan B. Vasse’s own trajectory. The Fourth Culture framework on which the platform rests is in part an autoethnographic concept; the eight-month self-study is the first empirical application. The platform’s lexicons, canonical questions, and interpretive emphases all reflect this positionality. Researchers from other migration corridors, linguistic traditions, and identity formations are explicitly invited to extend, modify, or contest the lexicons and the canonical questions to fit their own contexts. The instrument is not neutral; it is offered as a starting point for collaborative refinement.

§ 06 Definitional clarity

What this is not.

The integrity of the research depends on this distinction.

Not therapy. GNA does not diagnose, treat, or claim wellbeing outcomes. It is not a substitute for clinical care.

Not a chatbot. No character, no relationship simulation, no engagement optimisation. The dialogue serves the analytical pipeline.

Not a journaling app. Journaling is unstructured and audience-less. GNA is structured, longitudinal, and analytically instrumented from the first prompt.

Not a mood tracker. The platform reads cognitive and narrative structure, not affective valence. Feeling better is not the goal; making meaning differently is the signal.

§ 07 Two ways in

An open instrument, not a finished product.

Migration researchers, narrative identity scholars, journal-keepers, and scholars working at the intersection of NLP, narrative methodology, and cultural analysis — and beyond, into any longitudinal narrative inquiry where reflective elicitation yields richer data than passive observation — are invited to extend, replicate, adapt, and challenge the platform.

The work this extends, replicates, and stands beside.

01 · Foundational Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in disclosure. Cognition & Emotion, 10. The LIWC-based finding that expressive prompts elevate causal and insight vocabulary — the lineage from which GNA’s structural-thinking lexicon descends.
02 · Replication Klein, K., & Boals, A. (2001/2010). Expressive writing can increase working memory capacity; cognitive words mediate meaning-making and post-traumatic growth. Replicated across multiple populations and languages.
03 · This platform Vasse, R. B. (2026). Eliciting the Narrative Register: A Computational Platform for Longitudinal Identity Research. Zenodo. doi:10.5281/zenodo.20142013. Positive demonstration vs. Blog Authorship Corpus negative control.
04 · Pipeline DOI Vasse, R. B. (2026a). A Computational Pipeline for Quantifying Longitudinal Cognitive Dynamics in Sustained Human–LLM Interaction. Zenodo. doi:10.5281/zenodo.18927752
05 · Theoretical lens Kommers, C., Ahnert, R., Antoniak, M., et al. (2026). Computational Hermeneutics: Evaluating Generative AI as a Cultural Technology. Frontiers in Artificial Intelligence.
06 · Cost evidence Engstrom, G. A., Tappen, R. M., & Ouslander, J. G. (2014). Costs associated with recruitment and interviewing of study participants in a diverse population of community-dwelling older adults. Nursing Research, 63(1), 63–67. doi:10.1097/NNR.0000000000000008. NIA-funded, n=483, 100 weeks; $265–$576 per participant.