


Example of SearchTool with Shoebox

 

Command: search -Itestdata.dat -TSbxTiers.lis -PQuery1.sbx

-Itestdata.dat
Inputfile with SHOEBOX data: "testdata.dat":

\ref mi1 006
\_no 00006
\tx  6a de   6i       Fami:lya;
\mb  de   6i      ke7x   -i    =k 
\egl to PREP A3(POSS) Family que.   
\fts i lo que tiene;
\fte
\com
\wo vi:0

\ref mi1 007
\_no 00007
\tx  7a de   7i       fami:lya;
\mb  7a de   7i       fami:lya 
\sgl a  de   A3(POSS) familia  
\egl to PREP A3(POSS) family   
\fts ?a de su familia?;
\fte
\com

\ref mi1 008
\_no 00008
\tx  cha7k    7it@pa7;
\mb  chaj =ak 7it  -@  -pa    -a7
\sgl cual =AN EXIST-INV-INCI.I-RLTVZR
\egl which=AN EXIST-INV-INCI.I-RLTVZR
\fts lo que tiene;
\fte
\com
\wo vtl:2v;2(p)


-TSbxTiers.lis
The tierfile "SbxTiers.lis" can look like:
FILETYPE='SHOEBOX'
>\ref
>\tx
>\egl
>\fts
>\wo
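The way search reads these two files can be sketched in Python. This is a minimal sketch, not the program's own code. It makes three assumptions:

- a tier line starts with a backslash marker followed by a space;
- every `\ref` line starts a new block, as the program reports for this file;
- each `>` line in the tierfile selects one tier for output.

The names `read_tierfile`, `split_blocks`, `select_tiers` and `tier_name` are hypothetical. Pattern matching is left out, so the sketch prints the selected tiers of every block.

```python
# Minimal sketch (assumption, not search's own code): split Shoebox data into
# blocks at each \ref line and keep only the tiers selected by a tierfile.

TIERFILE = r"""FILETYPE='SHOEBOX'
>\ref
>\tx
>\egl
>\fts
>\wo
"""

# Excerpt of testdata.dat (some tiers omitted for brevity).
DATA = r"""\ref mi1 006
\_no 00006
\tx  6a de   6i       Fami:lya;
\egl to PREP A3(POSS) Family que.
\fts i lo que tiene;
\wo vi:0

\ref mi1 007
\_no 00007
\tx  7a de   7i       fami:lya;
\egl to PREP A3(POSS) family
\fts ?a de su familia?;

\ref mi1 008
\_no 00008
\tx  cha7k    7it@pa7;
\egl which=AN EXIST-INV-INCI.I-RLTVZR
\fts lo que tiene;
\fte
\wo vtl:2v;2(p)
"""


def tier_name(line):
    """Return the tier marker at the start of a line (text before the first space)."""
    return line.split(" ", 1)[0]


def read_tierfile(text):
    """Return (filetype, list of selected tiernames) from tierfile text."""
    filetype, tiers = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("FILETYPE="):
            filetype = line.split("=", 1)[1].strip("'")
        elif line.startswith(">"):
            tiers.append(line[1:])
    return filetype, tiers


def split_blocks(text, block_start=r"\ref"):
    """Group tier lines into blocks; a new block starts at each block_start tier."""
    blocks = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # skip blank lines
        if tier_name(line) == block_start:
            blocks.append([])
        if blocks:  # ignore anything before the first block
            blocks[-1].append(line)
    return blocks


def select_tiers(block, tiers):
    """Keep only the lines whose tier is selected in the tierfile."""
    return [line for line in block if tier_name(line) in tiers]


filetype, tiers = read_tierfile(TIERFILE)
for number, block in enumerate(split_blocks(DATA), start=1):
    print(f"*** Block: {number}")
    for line in select_tiers(block, tiers):
        print(line)
```

Block 2 has no `\wo` tier, so only four of its lines are kept. The screen output below shows the same thing.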

-PQuery1.sbx
The patternfile "Query1.sbx" can look like:
#1 '\egl' 'family' I
#2 '\tx'  'fami:lya'
#3 '\fts'  '^LO.*TIENE' I
3+1*2
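A patternfile with this layout could be read as sketched below. The sketch assumes the layout shown above:

- a pattern line is `#number`, a quoted tiername, a quoted regular expression and an optional `I` flag;
- the remaining line holds the combination.

Quotes inside a pattern and the comment lines of version 3.0 are not handled. `Pattern` and `read_patternfile` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern:
    number: int
    tier: str
    regex: str
    ignore_case: bool


# #<number> '<tier>' '<regex>' [I]  (assumption: no quotes inside a field)
PATTERN_LINE = re.compile(r"#(\d+)\s+'([^']*)'\s+'([^']*)'\s*(I?)\s*$")


def read_patternfile(text):
    """Return ({number: Pattern}, combination string) from patternfile text."""
    patterns, combination = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = PATTERN_LINE.match(line)
        if match:
            number, tier, regex, flag = match.groups()
            patterns[int(number)] = Pattern(int(number), tier, regex, flag == "I")
        else:
            combination = line  # e.g. "3+1*2"
    return patterns, combination


PATTERNFILE = r"""#1 '\egl' 'family' I
#2 '\tx'  'fami:lya'
#3 '\fts'  '^LO.*TIENE' I
3+1*2
"""

patterns, combination = read_patternfile(PATTERNFILE)
for pattern in patterns.values():
    print(pattern)
print("Combination:", combination)
```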

The result on the screen will be:
********************************************************************************
*** Command: Search -Itestdata.dat -TSbxTiers.lis -PQuery1.sbx
********************************************************************************
*** Inputfiles are of type SHOEBOX.
*** Blocks start with tiername: \ref.
*** Selected tiernames for output:
*** Tiername: \ref
*** Tiername: \tx
*** Tiername: \egl
*** Tiername: \fts
*** Tiername: \wo
********************************************************************************
*** Pattern: #1 '\egl' 'family'  I
*** Pattern: #2 '\tx' 'fami:lya'
*** Pattern: #3 '\fts' '^LO.*TIENE'  I
*** Combination: 3+1*2
********************************************************************************
*** FILE: testdata.dat
*** Block: 2
\ref mi1 007
\tx  7a de   7i       fami:lya;
\egl to PREP A3(POSS) family   
\fts ?a de su familia?;
*** Block: 3
\ref mi1 008
\tx  cha7k    7it@pa7;
\egl which=AN EXIST-INV-INCI.I-RLTVZR
\fts lo que tiene;
\wo vtl:2v;2(p)
***
*** Blocks in file testdata.dat: 3
*** Block matches in file testdata.dat: 2
********************************************************************************
********************************************************************************
*** Total files read: 1
*** Total blocks read: 3
*** Total block matches: 2
********************************************************************************


Explanation of the result:
Pattern: #1 '\egl' 'family'  I     : matches block 1; Family
                                     and      block 2; family
Pattern: #2 '\tx' 'fami:lya'       : matches block 2; fami:lya
Pattern: #3 '\fts' '^LO.*TIENE'  I : matches block 3; lo que tiene

Note:
- I                 : Ignore case.
- ^ (in ^LO.*TIENE) : anchors the match to the beginning of the tier text
                      (the text after the tiername), so block 1
                      ("i lo que tiene;") does not match.
                      ^ \ $ . [ ] | ( ) * + ? have a special meaning
                      within a regular expression.
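Matching a single pattern against the blocks can be approximated with Python's `re` module. This sketch makes three assumptions:

- search's regular expressions behave like Python's for the special characters listed above;
- a pattern is searched anywhere in the tier text after the tiername, so `^` anchors at the start of that text;
- `I` corresponds to `re.IGNORECASE`.

The block contents are the `\tx`, `\egl` and `\fts` tiers of testdata.dat. `blocks_matching` is a hypothetical name.

```python
import re

# Tier text (after the tiername) of the three blocks in testdata.dat.
BLOCKS = [
    {r"\tx": "6a de   6i       Fami:lya;",
     r"\egl": "to PREP A3(POSS) Family que.",
     r"\fts": "i lo que tiene;"},
    {r"\tx": "7a de   7i       fami:lya;",
     r"\egl": "to PREP A3(POSS) family",
     r"\fts": "?a de su familia?;"},
    {r"\tx": "cha7k    7it@pa7;",
     r"\egl": "which=AN EXIST-INV-INCI.I-RLTVZR",
     r"\fts": "lo que tiene;"},
]

# (tier, regular expression, ignore case) as in Query1.sbx
PATTERNS = {
    1: (r"\egl", "family", True),
    2: (r"\tx", "fami:lya", False),
    3: (r"\fts", "^LO.*TIENE", True),
}


def blocks_matching(tier, regex, ignore_case, blocks):
    """Return the 1-based numbers of the blocks whose tier text matches regex."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(regex, flags)
    return {
        number
        for number, block in enumerate(blocks, start=1)
        if tier in block and compiled.search(block[tier])
    }


for number, (tier, regex, ignore_case) in PATTERNS.items():
    print(f"#{number}", sorted(blocks_matching(tier, regex, ignore_case, BLOCKS)))
# prints: #1 [1, 2]   #2 [2]   #3 [3]  (one per line)
```

Pattern #2 has no `I` flag, so "Fami:lya" in block 1 is not matched.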

Combination: 3+1*2 is evaluated as 3+(1*2),
             because * (= AND) has a higher priority than + (= OR).

             So 1*2     matches block 2
                3       matches block 3
                3+1*2   matches block 2,3

Combination: 3+1*2                  : matches block 2,3
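The priority rule can be sketched by treating a combination as an OR (+) of AND-terms (*). The terms are applied to the sets of block numbers that each pattern matches. The sketch handles only pattern numbers, `+` and `*`, which are the operators used in this example. It does not cover whether search also accepts parentheses or spaces in a combination. `evaluate_combination` is a hypothetical name.

```python
def evaluate_combination(combination, matches):
    """Return the blocks matching a combination such as '3+1*2'.

    matches maps each pattern number to the set of block numbers it matches.
    * (AND) binds tighter than + (OR): the string is first split on '+',
    and each resulting term is the intersection of its patterns' results.
    """
    result = set()
    for term in combination.split("+"):
        numbers = [int(n) for n in term.split("*")]
        term_blocks = set(matches[numbers[0]])  # copy, so inputs stay unchanged
        for n in numbers[1:]:
            term_blocks &= matches[n]
        result |= term_blocks
    return result


# Block numbers per pattern, taken from the explanation of the result above.
MATCHES = {1: {1, 2}, 2: {2}, 3: {3}}

for expression in ("1*2", "3", "3+1*2"):
    print(expression, sorted(evaluate_combination(expression, MATCHES)))
# prints: 1*2 [2]   3 [3]   3+1*2 [2, 3]  (one per line)
```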

4. New features in Version 3.0
==============================

A. Support for Shoebox column structure
B. Support for comment lines in tierfile, patternfile, listfile and vectorfile
C. New tierfile keywords:
  - TIERNAMES='ALL'
  - FILETYPE='CHAT'
  - OUTPUTTYPE='KWAL'
D. Support for CHAT files
E. Support for block context
F. Vectors


4.A Support for Shoebox column structure
========================================

Shoebox files have an aligned column structure, for example:
\ref mi2 003
\_no 00003
\tx  7i  ka: jatpa          ta       tuni       m@7ki;
\mb  7i  ka: jat    -pa     ta       tun  -i    m@:k7         -i
\sgl y   NEG poder  -INCI.I C3(ERG) hacer-INCD hacer_tamales-NMZR
\egl and NEG be_able-INCI.I C3(ERG) do   -INCD prepare_tamal-NMZR

With search you can specify a column context for a pair of patterns:
- you can require that the 2 patterns match in the same
  column (column context = 0).
- you can also require that, if pattern #1 matches in column X,
  pattern #2 matches in column:
  - X-1, X or X+1                 (column context = 1)
  - X-2, X-1, X, X+1 or X+2       (column context = 2)
  - ...
  - X-n up to and including X+n   (column context = n)
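Put differently, column context n accepts every pair of columns that are at most n apart. The sketch below shows that check. It assumes the columns in which each pattern matched have already been numbered, because how search numbers Shoebox columns is not described here. The column numbers in the example are made up for illustration, and `within_column_context` is a hypothetical name.

```python
def within_column_context(columns1, columns2, context):
    """True if pattern #1 matches in some column X and pattern #2 matches in a
    column between X - context and X + context (inclusive)."""
    return any(abs(x - y) <= context for x in columns1 for y in columns2)


# Hypothetical column numbers: pattern #1 matched in column 5,
# pattern #2 matched in column 7.
print(within_column_context({5}, {7}, 0))  # False: not the same column
print(within_column_context({5}, {7}, 1))  # False: 7 is outside 4..6
print(within_column_context({5}, {7}, 2))  # True: 7 is inside 3..7
```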

For example, a patternfile can now look like:
#1 '\mb'  'tun'
#2 '\sgl' 'hacer'
C(1|0|2)

The program will return all blocks where pattern #1 and pattern #2
match in the same column.

Here C(1|0|2) consists of:
C    : Column expression
(    : Opening parenthesis
1    : PatternNo (first pattern)
|    : Divider
0    : Column context
|    : Divider
2    : PatternNo (second pattern)
)    : Closing parenthesis
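A column expression with this layout can be recognised with a regular expression. The sketch assumes exactly the form shown, with no spaces inside the parentheses. The real patternfile reader may accept more. `parse_column_expression` is a hypothetical name.

```python
import re

# C(<PatternNo>|<Column context>|<PatternNo>), e.g. C(1|0|2)
COLUMN_EXPRESSION = re.compile(r"C\((\d+)\|(\d+)\|(\d+)\)")


def parse_column_expression(text):
    """Return (first pattern, column context, second pattern) for e.g. 'C(1|0|2)'."""
    match = COLUMN_EXPRESSION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a column expression: {text!r}")
    first, context, second = (int(group) for group in match.groups())
    return first, context, second


print(parse_column_expression("C(1|0|2)"))  # (1, 0, 2)
```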