2 xtitle: coherence & propositions

observations in :schizophrenia: threads
date: 2025-07-27 22:18:07
eval output M11, normalised to obs, distance ceiling = outliers removed.

2.1 method (M11)

To compute distances we queried a corpus for matching conditions where certain (assumed) determiners appear before similar nouns. In M11 (removed…)
This distance should give us information structural evidence of how strong these noun occurences are connected, i.e. if a noun appears out of the blue mostly or if it somewhere before has been introduced to the audience. In information structure definitions this would be termed with given and new information (Prince 1981).

2.2 legende

Table 2.1: model vars
variable	explanation	values
target	corpus	obs,ref
q	condition	a,b,c,d,e,f
det	antecedent POS==DET	TRUE,FALSE
q:a	query condition	.*
q:b	query condition	this,that,those,these
q:c	query condition	the
q:d	query condition	a,an,any,some
q:e	query condition	my
q:f	query condition	his,her,their,your

2.3 anova analysis

2.3.1 anova plain, formula: [`dist_rel_obs ~ targetqdet`]

##                   Df     Sum Sq    Mean Sq    F value    Pr(>F)    
## target             1 3.6884e+09 3688441713 86368.2495 < 2.2e-16 ***
## q                  5 4.5563e+08   91126214  2133.8040 < 2.2e-16 ***
## det                1 2.8130e+07   28129559   658.6794 < 2.2e-16 ***
## target:q           5 2.6402e+06     528048    12.3647 5.105e-12 ***
## target:det         1 2.3260e+06    2325954    54.4643 1.584e-13 ***
## q:det              1 2.7700e+06    2769968    64.8613 8.044e-16 ***
## target:q:det       1 4.3320e+03       4332     0.1014    0.7501    
## Residuals    1045139 4.4634e+10      42706                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.3.2 anova of linear regression model: [`anova(summary(lmer))`]

## Type III Analysis of Variance Table with Satterthwaite's method
##                Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
## target        6094426 6094426     1  718328 185.7599 < 2.2e-16 ***
## q            42974986 8594997     5 1037780 261.9780 < 2.2e-16 ***
## det            372221  372221     1 1037864  11.3454 0.0007564 ***
## target:q      1766151  353230     5 1037779  10.7666  2.27e-10 ***
## target:det      92612   92612     1 1037867   2.8228 0.0929316 .  
## q:det          326995  326995     1 1037796   9.9669 0.0015938 ** 
## target:q:det   137650  137650     1 1037795   4.1956 0.0405288 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.3.3 linear regression coefficients, formula: [`dist_rel_obs ~ targetqdet+(1|lemma)+(1|aut_id)`]

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: eval(expr(lmeform))
##    Data: dfa
## 
## REML criterion at convergence: 13864438
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.3215 -0.4316 -0.0874  0.1283 12.4351 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  aut_id   (Intercept) 40316    200.8   
##  lemma    (Intercept) 42511    206.2   
##  Residual             32808    181.1   
## Number of obs: 1045155, groups:  aut_id, 4533; lemma, 3652
## 
## Fixed effects:
##                        Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)           2.870e+02  4.900e+00  6.838e+03  58.565  < 2e-16 ***
## targetref            -8.842e+01  3.051e+00  7.021e+04 -28.978  < 2e-16 ***
## qb                    3.566e+01  7.380e+00  1.036e+06   4.833 1.35e-06 ***
## qc                    6.937e+01  1.268e+00  1.038e+06  54.728  < 2e-16 ***
## qd                    5.704e+01  1.325e+00  1.038e+06  43.053  < 2e-16 ***
## qe                    7.522e+01  1.447e+00  1.038e+06  51.975  < 2e-16 ***
## qf                    6.010e+01  1.805e+00  1.037e+06  33.291  < 2e-16 ***
## detTRUE               3.154e+00  6.295e-01  1.042e+06   5.011 5.42e-07 ***
## targetref:qb          2.973e+01  1.105e+01  1.038e+06   2.690 0.007140 ** 
## targetref:qc          1.748e+01  6.460e+00  1.038e+06   2.706 0.006813 ** 
## targetref:qd          3.302e+01  6.156e+00  1.037e+06   5.363 8.18e-08 ***
## targetref:qe         -2.499e+01  8.090e+00  1.038e+06  -3.090 0.002005 ** 
## targetref:qf         -1.280e+02  3.760e+01  1.038e+06  -3.404 0.000663 ***
## targetref:detTRUE    -3.951e+00  1.561e+00  1.043e+06  -2.531 0.011370 *  
## qb:detTRUE            1.201e+01  7.839e+00  1.036e+06   1.532 0.125614    
## targetref:qb:detTRUE  4.436e+01  2.166e+01  1.038e+06   2.048 0.040529 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 8 columns / coefficients

2.4 plots

Figure 2.1: compare distances by corpus, normalised to obs, distance ceiling = outliers removed

Figure 2.2: mean distances over query/corpus, normalised to obs, distance ceiling = outliers removed

Table 2.2: mean/median table for M11
target	q	n	mean	median
obs	a	557479	169	77
ref	a	394838	57	25
obs	b	5904	217	120
ref	b	747	122	44
obs	c	26883	240	138
ref	c	1033	126	73
obs	d	25400	253	149
ref	d	1041	154	106
obs	e	19228	260	149
ref	e	656	93	79
obs	f	11920	211	111
ref	f	26	100	126

Figure 2.3: median distances over query/corpus, normalised to obs, distance ceiling = outliers removed

Figure 2.4: distances relation, normalised to obs, distance ceiling = outliers removed

Figure 2.5: distances normalised vs. raw

2.5 REF

References

Prince, Ellen F. 1981. “Toward a Taxonomy of Given-New Information.” In Syntax and Semantics: Vol. 14. Radical Pragmatics, edited by P. Cole, 223–55. New York: Academic Press.