Pathway · 01 · For collaborators

An open instrument for longitudinal narrative research.

The Global Narrative Atlas (GNA) is offered to the research community as transparent, reproducible, considered infrastructure. Its value will be measured not by the findings of any single deployment but by the diversity of research questions it enables across the contexts in which contemporary identity takes shape.

§ 01 The methodological problem

Longitudinal qualitative work is structurally expensive.

Trained interviewers, scheduling overhead, transcription, attrition management. Costs scale linearly with participants × waves × hours. In practice, this means n is small, intervals are wide, and depth/density trades off against budget.
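The linear scaling described above can be made concrete with a few lines of arithmetic. This is a minimal sketch, not a costing model from any cited study: the function names and every input figure are hypothetical, chosen only to show how doubling the number of waves doubles cost, or halves the sample a fixed budget can support.

```python
def interviewing_cost(participants: int, waves: int,
                      hours_per_wave: float, hourly_rate: float) -> float:
    """Staff cost under a purely linear model: participants x waves x hours x rate."""
    return participants * waves * hours_per_wave * hourly_rate


def max_participants(budget: float, waves: int,
                     hours_per_wave: float, hourly_rate: float) -> int:
    """Largest sample a fixed budget supports under the same linear model."""
    return int(budget // (waves * hours_per_wave * hourly_rate))


# Hypothetical inputs for illustration only; not figures from any study.
base = interviewing_cost(participants=30, waves=4, hours_per_wave=3.0, hourly_rate=40.0)
doubled_waves = interviewing_cost(participants=30, waves=8, hours_per_wave=3.0, hourly_rate=40.0)

# Holding the budget at the base cost, doubling the waves halves the feasible n.
n_at_fixed_budget = max_participants(base, waves=8, hours_per_wave=3.0, hourly_rate=40.0)

print(base, doubled_waves, n_at_fixed_budget)
```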

$265–$576

Per-participant cost in a longitudinal study of community-dwelling older adults (recruitment, scheduling, interviewing, scoring, log updates, incentives). Engstrom et al. 2014, NIA-funded, n = 483, 100 weeks.

1.83 hrs

Average staff time per successful enrolment, before any interview takes place. Recruitment and scheduling alone. Same study.

~50%

Cumulative attrition in the PSID panel over its first 21 years. Every wave compounds respondent burden; every reschedule loses subjects. (Lynn 2009; Deng et al. 2013.)

60–80%

Cost reduction reported for AI-moderated qualitative interviewing vs. traditional agency-led work in commercial market research (Merren, 2026). A weaker test bed than academic research, but a directional signal.

These numbers describe a constraint, not a dismissal of the longitudinal-qualitative tradition. Repeat-interview architectures — Neale, Thomson, Bertaux, Plummer, Riessman — have decades of methodological depth that GNA aims to extend computationally, not replace. What the cost figures show is that elicitation density — structured reflective exchanges per participant per unit time — is bound, in the traditional model, by administrator hours. GNA decouples that binding while preserving the reflective register qualitative interviewing was developed to produce.

§ 02 Three tracks of collaboration

Where collaboration is most welcome.

Track · 01

Apply the protocol to new populations or domains

Migration cohorts, recovery samples, clinical-outcome panels, professional-identity studies, bereavement and grief follow-ups, mentorship dyads, or any longitudinal participant pool where structured reflective elicitation is feasible. Modify lexicons. Add domain-specific markers — language-specific where relevant, but also clinical, vocational, or constructional. Adapt the canonical questions. Evaluate resulting trajectories against independent qualitative or instrumental evidence.

Narrative researchers across psychology, clinical sciences, sociology, organizational studies, and sociolinguistics.

Track · 02

Contribute longitudinal narrative

Authors, journal-keepers, and reflective writers may contribute existing longitudinal material through the platform. GNA’s value as a research instrument grows with each longitudinal corpus it can analyse. Contributors retain full ownership of their data; the pipeline is auditable.

Long-form interior writers, diarists, autobiographers.

Track · 03

Test the elicitation-dependence hypothesis

The claim that structured self-inquiry produces a register systematically distinct from casual personal narrative is offered as a working hypothesis that the platform exists to test. The Blog Authorship Corpus baseline and the exploratory empirical work published to date provide a public point of comparison against which other elicitation formats — therapy transcripts, life-history interviews, spiritual direction logs — may be contrasted.

Computational hermeneutics scholars, NLP methodologists.

§ 03 Methodological foundations

Three layers, auditable and modular.

01 / Elicitation

The AI-driven protocol

A structured set of canonical questions across three phases — grounding, exploration, integration — designed to prompt the introspective register in which identity transformation becomes computationally legible. An AI assistant administers follow-up probes tailored to participant responses; the dialogue is bounded by research protocol, with no engagement-optimisation reward function. Adaptable to new populations and cultural contexts.

02 / Analysis

The DOL pipeline

A 23-script open-source Python pipeline applying transparent lexicon-based profiling: structural thinking (40 terms), epistemic uncertainty (12 terms), vocabulary specialisation (TTR, MTLD), topic modelling (TF-IDF + truncated SVD + K-means), and dyadic alignment. Statistical validation by Spearman rank correlation against permutation null distributions. Available at github.com/RayanBVasse/DOL under MIT licence; runs locally on any standard machine.
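As a rough illustration of what lexicon-based profiling involves, the sketch below computes lexicon hits per 1,000 words and a type-token ratio (TTR) for a short passage. It is not taken from the DOL repository: the function names, the tokenisation rule, and the two mini-lexicons are assumptions made for the example, and the actual 40-term and 12-term lists are those published with the pipeline. MTLD, topic modelling, and dyadic alignment are not covered here.

```python
import re

# Hypothetical mini-lexicons for illustration only; the pipeline's real
# term lists are defined in the DOL repository, not here.
STRUCTURAL_TERMS = {"because", "therefore", "framework", "pattern", "structure"}
UNCERTAINTY_TERMS = {"maybe", "perhaps", "unsure"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphabetic tokens (an assumed, simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def hits_per_1000(tokens: list[str], lexicon: set[str]) -> float:
    """Lexicon hits normalised to a rate per 1,000 tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return 1000.0 * hits / len(tokens)


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = ("Maybe the pattern holds because the structure repeats; "
          "perhaps the pattern is real.")
tokens = tokenize(sample)

structural_rate = hits_per_1000(tokens, STRUCTURAL_TERMS)
uncertainty_rate = hits_per_1000(tokens, UNCERTAINTY_TERMS)
ttr = type_token_ratio(tokens)

print(round(structural_rate, 1), round(uncertainty_rate, 1), round(ttr, 3))
```

Because every score is a count against a published word list, a reader can recompute any figure by hand, which is the sense in which the profiling is transparent.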

03 / Synthesis

Bounded LLM interpretation

A schema-constrained prompt translates extracted scores into readable trajectory reports. The LLM receives only quantitative signals — never raw text — and is instructed to ground judgements solely in the data, with explicit prohibition on therapeutic, diagnostic, or self-help framings. The prompt is versioned in the repository; the output schema is enforced at parse time and structured failures are returned rather than free text.
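One way to picture parse-time enforcement is sketched below. The schema, field names, allowed values, and error codes are all hypothetical and are not the versioned schema in the repository. The sketch shows only the general pattern: model output is either accepted as a well-formed report or rejected with a structured failure, and free text is never passed through.

```python
import json

# Hypothetical schema for illustration; the real output schema is the one
# versioned in the repository.
REPORT_SCHEMA = {"summary": (str,), "trend": (str,), "confidence": (int, float)}
ALLOWED_TRENDS = {"increasing", "decreasing", "stable", "non-monotonic"}


def parse_report(raw: str) -> dict:
    """Return {'ok': True, 'report': ...} or a structured failure dict."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": "invalid_json", "detail": str(exc)}
    if not isinstance(data, dict):
        return {"ok": False, "error": "not_an_object", "detail": type(data).__name__}

    missing = sorted(set(REPORT_SCHEMA) - set(data))
    extra = sorted(set(data) - set(REPORT_SCHEMA))
    if missing or extra:
        return {"ok": False, "error": "field_mismatch",
                "missing": missing, "extra": extra}

    for name, expected_types in REPORT_SCHEMA.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_types):
            return {"ok": False, "error": "wrong_type", "field": name}

    if data["trend"] not in ALLOWED_TRENDS:
        return {"ok": False, "error": "invalid_value", "field": "trend"}
    return {"ok": True, "report": data}


good = parse_report(
    '{"summary": "Signal rises across windows.", "trend": "increasing", "confidence": 0.8}'
)
free_text = parse_report("The participant seems to be healing.")
bad_trend = parse_report('{"summary": "x", "trend": "healing", "confidence": 0.5}')
incomplete = parse_report('{"summary": "x"}')

print(good["ok"], free_text["error"], bad_trend["error"], incomplete["missing"])
```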

§ 04 Exploratory corpus contrast

A first-pass corpus contrast: elicited dialogue vs unelicited public writing.

The numbers below describe a contrast between two corpora, not a controlled test. The elicited corpus is a single participant’s eight-month engagement (the platform’s author). The unelicited corpus is a hundred public bloggers from a different decade, genre, and audience. The 133:1 ratio is consistent with — but does not isolate — an elicitation effect: it conflates elicitation with genre, audience, era, and participant selection. A controlled validation (same participants under both conditions, or matched cohorts randomly assigned) is the methodological next step.

20,449

Messages of elicited reflective dialogue analysed in the positive demonstration. The participant is the platform’s author; the eight-month engagement is a self-study and a proof of concept, not population-level validation. The corpus, pipeline outputs, and trajectories are openly available; replication across independent participants is the next phase.

53,999

Unelicited blog posts from 100 authors of the Blog Authorship Corpus (1999–2006), analysed through the same pipeline and used as a negative-control corpus from a different genre and era.

133 : 1

Ratio of the elicited corpus’s range of structural-thinking lexicon hits per 1,000 words to that of the unelicited corpus. Striking, but confounded: see the dimensions on which the corpora differ.
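For readers who want to see how a range ratio of this kind is formed, a minimal sketch follows. It assumes that "range" means the maximum minus the minimum of the per-window hit rate, which the text above does not spell out. The function names and the toy values are hypothetical and do not reproduce the published figures.

```python
def signal_range(rates: list[float]) -> float:
    """Spread of a series of per-window rates, taken here as max minus min
    (an assumed definition of 'range')."""
    return max(rates) - min(rates)


def range_ratio(elicited: list[float], unelicited: list[float]) -> float:
    """How many times wider the elicited series ranges than the unelicited one."""
    baseline = signal_range(unelicited)
    if baseline == 0:
        raise ValueError("unelicited series is constant; ratio is undefined")
    return signal_range(elicited) / baseline


# Toy per-window hit rates per 1,000 words; not the published data.
elicited_rates = [2.0, 5.0, 9.0, 14.0]
unelicited_rates = [1.0, 1.5, 1.25, 1.0]

ratio = range_ratio(elicited_rates, unelicited_rates)
print(ratio)
```

A ratio built this way is sensitive to a single extreme window in either series, which is one more reason to read it as descriptive rather than as an effect estimate.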

Genre, audience, era, selection, and domain all differ between the two corpora. Any could plausibly contribute to the contrast; a within-subject design would be required for a clean causal estimate.

Honest limit: the positive demonstration is a self-study. The trajectories observed are statistically structured within that single series but cannot be generalised. The architecture is offered as infrastructure for replication, not as a closed validation. Diverse-cohort applications are the open question and the next chapter of work.

§ 05 Status of evidence

What the published work has shown, and what it has not.

What the published work has shown

A single participant (the platform’s developer), engaged in sustained AI-mediated reflective dialogue over eight months, produced a measurable, monotonic increase in the structural-thinking lexicon signal, alongside a non-monotonic epistemic-uncertainty arc and progressive lexical specialisation.
The same analytical pipeline applied to a hundred Blog Authorship Corpus authors (1999–2006) produced a 133-fold smaller range of structural-thinking signal.
The contrast is consistent with the elicitation-dependence hypothesis but does not isolate elicitation from genre, audience, era, or participant selection.
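The kind of check that lies behind a claim of monotonic increase can be sketched as a Spearman rank correlation between time index and signal, compared against a permutation null. The code below is a standard-library illustration under stated assumptions, not the pipeline's own implementation: the function names, the two-sided test, the number of permutations, and the toy series are all hypothetical.

```python
import random


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = spearman(x, y)
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Toy monthly signal; not the published series.
months = [1, 2, 3, 4, 5, 6, 7, 8]
signal = [2.1, 2.4, 3.0, 3.2, 4.1, 4.5, 5.2, 6.0]

rho, p_value = permutation_p_value(months, signal)
print(round(rho, 3), p_value < 0.05)
```

Shuffling assumes the observations are exchangeable under the null. Successive windows of one person's writing are likely to be autocorrelated, so a p-value from a simple shuffle of this kind should be read with caution.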

What the published work has not shown

That elicitation produces a register shift in participants other than the platform’s developer.
That the lexicon measures correspond to cognitive structuration as defined by independent psychological instruments.
That the contrast holds in matched comparisons (same participant, both conditions; or matched cohorts, randomly assigned).
That the methodology generalises across languages, cultures, or migration corridors.

The platform’s next phase of work is intended to address these gaps through independent participant cohorts and controlled comparisons.

§ 06 Governance

A clear split: what the platform commits to, what your study is responsible for.

GNA is multi-tenant infrastructure. Most governance lives with each principal investigator (PI) and their institution. The platform commits to a narrow set of guarantees that make a PI’s study safe to host.

What the platform commits to

Inter-study data isolation (schema-scoped, Row-Level Security enforced)
No model training on participant data
Encryption at rest
Open-source analytical pipeline (auditable methodology)
Platform Terms of Service and Acceptable Use
Sample crisis-resources page linked from every dialogue session

What your study is responsible for

Institutional ethics approval (IRB or equivalent)
Study-specific consent text
Participant recruitment, safeguarding, and follow-up
Analytical interpretation and publication of results
Data retention policy per your institution’s requirements
Adapting lexicons, canonical questions, and AI-guide prompts to your context

The platform offers Technical and Cultural Council templates (drafted; currently in formation) that PIs may adopt or adapt for their own study governance. They are recommended scaffolding for study-level oversight — not platform-level commitments — and exist to help PIs structure their own ethical review pipelines.

§ 07 Publications & prior work

Cite, replicate, extend.

Manuscript · 2026

Eliciting the Narrative Register: A Computational Platform for Longitudinal Identity Research

Vasse, R. B. (2026). Introduces the Global Narrative Atlas; contrasts elicited reflective dialogue against unelicited blog narrative through an identical analytical pipeline. Self-study; replication across independent participants is the next phase. doi:10.5281/zenodo.20142013

Methods · Zenodo

A Computational Pipeline for Quantifying Longitudinal Cognitive Dynamics in Sustained Human–LLM Interaction

Vasse, R. B. (2026a). The underlying DOL methodology paper. doi:10.5281/zenodo.18927752

Companion · Zenodo

Blind Spots in AI-Based Longitudinal Psychological Inference

Vasse, R. B. (2026b). Single-subject self-study examining what AI-mediated inference detects and what it misses. doi:10.5281/zenodo.18890981

Companion · Zenodo

Measuring Within-Person Variation in Written Communication Patterns Across Social Contexts

Vasse, R. B. (2026c). Cross-context profiling; the same individual produces measurably different communicative fingerprints across relational environments. doi:10.5281/zenodo.18825307

Theory · Frontiers in AI

Computational Hermeneutics: Evaluating Generative AI as a Cultural Technology

Kommers, C., Ahnert, R., Antoniak, M., et al. (2026). The theoretical lens within which GNA’s elicitation-dependence claim is offered as a working hypothesis that the platform exists to test.

Longitudinal-qualitative lineage GNA extends, not replaces

Bertaux (life-story method); Plummer (Documents of Life); Neale (qualitative longitudinal research); Thomson (relational analysis across waves); Riessman (narrative analysis); Schütze (biographical-narrative interviewing). GNA is offered as a computational extension of this tradition’s elicitation-and-return architecture, not as a substitute for the qualitative depth it produces.

The work this extends, replicates, and stands beside.

01 · Foundational Pennebaker, J. W., & Francis, M. E. (1996). Cognitive, emotional, and language processes in disclosure. Cognition & Emotion, 10. The LIWC-based finding that expressive prompts elevate causal and insight vocabulary — the lineage from which GNA’s structural-thinking lexicon descends.

02 · Replication Klein, K., & Boals, A. (2001/2010). Expressive writing can increase working memory capacity; cognitive words mediate meaning-making and post-traumatic growth. Replicated across multiple populations and languages.

03 · This platform Vasse, R. B. (2026). Eliciting the Narrative Register: A Computational Platform for Longitudinal Identity Research. Zenodo. doi:10.5281/zenodo.20142013. Positive demonstration vs. Blog Authorship Corpus negative control.

04 · Pipeline DOI Vasse, R. B. (2026a). A Computational Pipeline for Quantifying Longitudinal Cognitive Dynamics in Sustained Human–LLM Interaction. Zenodo. doi:10.5281/zenodo.18927752

05 · Theoretical lens Kommers, C., Ahnert, R., Antoniak, M., et al. (2026). Computational Hermeneutics: Evaluating Generative AI as a Cultural Technology. Frontiers in Artificial Intelligence.

06 · Cost evidence Engstrom, G. A., Tappen, R. M., & Ouslander, J. G. (2014). Costs associated with recruitment and interviewing of study participants in a diverse population of community-dwelling older adults. Nursing Research, 63(1), 63–67. doi:10.1097/NNR.0000000000000008. NIA-funded, n=483, 100 weeks; $265–$576 per participant.