Searching with PAT

  1. PAT models text as a set of suffix strings (semi-infinite strings, or "sistrings"), each beginning at an index point such as the start of a word.

    For example, indexing every word in this sentence yields the 12 strings:

    For example, indexing every word in this sentence yields the 12 strings:
    example, indexing every word in this sentence yields the 12 strings:
    indexing every word in this sentence yields the 12 strings:
    every word in this sentence yields the 12 strings:
    word in this sentence yields the 12 strings:
    in this sentence yields the 12 strings:
    this sentence yields the 12 strings:
    sentence yields the 12 strings:
    yields the 12 strings:
    the 12 strings:
    12 strings:
    strings:
    
  2. PAT's regions are user-defined "views" of the text.

    For XML, the most common regions to pre-define are those corresponding to a document's elements. However, arbitrary intervals of text can be defined to be regions as well.

    [See names of regions to be used with docs when accessing the OED from PAT.]
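The suffix-string model in item 1 can be sketched in a few lines of Python. This is a toy, not PAT's implementation. It treats a word as starting at offset 0 or right after a space, whereas PAT's actual index points are governed by the WordStarters setting in its control file. It also keeps the sistrings in a sorted array (the structure known as a PAT array or suffix array) rather than a Patricia tree. Every occurrence of a word or phrase is a prefix of some sistring, so all matches sit in one contiguous run of the sorted array. The function names are hypothetical.

```python
import bisect

def word_sistrings(text):
    """(offset, suffix) for each index point; here, each word start.

    Simplification: a word starts at offset 0 or right after a space.
    """
    starts = [i for i, ch in enumerate(text)
              if ch != " " and (i == 0 or text[i - 1] == " ")]
    return [(i, text[i:]) for i in starts]

def build_index(text):
    """Sort the sistrings lexicographically (a suffix array / PAT array)."""
    return sorted((suffix, offset) for offset, suffix in word_sistrings(text))

def prefix_search(index, prefix):
    """Offsets of all sistrings beginning with `prefix`.

    Sistrings sharing a prefix are contiguous in sorted order, so a binary
    search finds the first one and a short scan collects the rest.
    """
    keys = [suffix for suffix, _ in index]
    j = bisect.bisect_left(keys, prefix)
    hits = []
    while j < len(index) and index[j][0].startswith(prefix):
        hits.append(index[j][1])
        j += 1
    return sorted(hits)

sentence = "For example, indexing every word in this sentence yields the 12 strings:"
index = build_index(sentence)
print(len(index))                  # 12
print(prefix_search(index, "in"))  # [13, 33]  ("indexing ..." and "in this ...")
```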


Command-line interaction with PAT

pat 2e.cntl

        Pat Text Searching System, Release 3.4.2
            Copyright 1987, 1989, 1990 by 
  University of Waterloo and Open Text Systems, Inc.

>> water
  1: 48442 matches

>> pr sample.7 
192807323, ..sper.dr-</gk> water + <gk>a&lenis.dhfa&acu.goj</gk> voracious: s..
520790341, ..e took to the water, disappeared, leaving it on the low under ba..
145798504, ..nced from the water like a carp. </T></Q><Q><D>1843</D> <A>Paget..
549737948, ..4 <T>The 1929 water ski champion, Herr Pribitzer of the water-sk..
190797617, ..ngsley</A> <W>Water-Bab.</W> iii. 116 <T>Dark hovers under swirl..
549099801, ..ating-oil..of water-white and odorless qualities. </T></Q><Q><D>..
549623784, .. the maddest *Waterloo-Crackers. </T></Q><Q><D>1851</D> <A>Mayhe..

>> a..z 
  2: 60343111 matches

>> pr sample 
555709177, ..e Christopher as my owne, I will he be put unto the schoale. </T..
290164101, ..ir slangy off-colour jokes. </T></Q><Q><D>1972</D> <PSA><A>G. Bl..
 10053096, ../D> <W>Compl. Fam.-Piece</W> <sc>ii. </sc>iii. 388 <T>Amber Pear..
 97073359, ..> in <W>Cott. Hom.</W> 201 <T>&Th.e muchele delit of &th.ine swe..
 58277014, .. specially in knowledge (as the seraphim in love); a conventiona..
194517420, ..mplative, and nonverbal. </T></Q></PQP><PQP><Q><D>1957</D> <W>Wh..
408029625, ..us widths and patterns.</T></Q><Q><D>1833</D> <A>J. Bennett</A> ..
481205743, ..design of the SEAC and DYSEAC. </T></Q><Q><D>1960</D> <A>Gregory..
440450458, ..III. 558/2 <T>The domain of Sonata was for a long while almost m..
502535403, ..ed comprises..two Gatling guns, and six *torpedo tubes or torped..

>> "to be or" 
  3: 458 matches

>> pr sample.5 
454233240, .. set upright; to be or become erect.  Of hair, spines, etc.: cf...
562398537, ..</PS>, liable to be or capable of being withheld.</DEF></SE></p>..
 94031003, ..></IL>): i.e. to be (or make it) a matter of death of capital pu..
192510576, ..Of the voice: to be or to become husky. </p></DEF><QP><Q><D>1922..
407435097, ..7 <T>A Sealer to be ordeyned &amp. sworne to stryke the Cloth &a..

>> stanqts = docs "Quotation" including (stanford near univ) 
  4: stanqts = 11 matches

>> pr 
 33555823, ../Q></PQP><PQP><Q><D>1915</D> <W>Jrnl. Parasitology</W> I. 107 <T..
 77975481, .. </T></Q></EQ><Q><D>1939</D> <W>Jrnl. R. Aeronaut. Soc.</W> XLII..
108709543, ..ward. </T></Q><Q><D>1922</D> <W>Bull. 31st Ann. Reg. 1921-22</W>..
118300609, ..ford. </T></Q><Q><D>1959</D> <W>Encounter</W> July 67/1 <T>He..i..
403069715, ..hool. </T></Q><Q><D>1978</D> <W>Sci. Amer.</W> July 15/2 <T>The ..
426049944, ..logy. </T></Q><Q><D>1980</D> <W>N.Y. Times</W> 22 June <sc>iv. <..
435096033, ..</DEF><QP><EQ><Q><D>1927</D> <W>Amer. Speech</W> II. 278/1 [Stan..
481327831, ..tion. </T></Q><Q><D>1942</D> <A>L. O. Waldorf</A> <W>How to play..
493332893, ../Q></PQP><PQP><Q><D>1959</D> <W>Nation</W> 24 Jan. 62/2 <T>Other..
517358657, ..tion. </T></Q><Q><D>1984</D> <W>New Scientist</W> 3 May 46/1 <T>..
550661317, ..ints. </T></Q><Q><D>1899</D> <A>J. London</A> <W>Let.</W> 12 Sep..

>> pr.docs."Quotation" 
 33555823, ..<Q><D>1915</D> <W>Jrnl. Parasitology</W> I. 107 <T>The not-infrequent occurrence of the notorious `*black widow' spider, <i>Latrodectes mactans</i>, in the vicinity of Stanford University. </T></Q>..
 77975481, ..<Q><D>1939</D> <W>Jrnl. R. Aeronaut. Soc.</W> XLIII. 109 <T>Tests of an eight-blade contra-propeller of 32-in. diameter..were made in the wind tunnel at Stanford University. </T></Q>..
108709543, ..<Q><D>1922</D> <W>Bull. 31st Ann. Reg. 1921-22</W> (Stanford Univ.) 109 <T>As a recognition of high scholastic attainment the Bachelor's degree may be granted `with distinction' or `with great distinction'. </T></Q>..
118300609, ..<Q><D>1959</D> <W>Encounter</W> July 67/1 <T>He..is now editor-in-chief of Stanford University Press. </T></Q>..
403069715, ..<Q><D>1978</D> <W>Sci. Amer.</W> July 15/2 <T>The latter experience convinced him that his interest lay in research; he therefore went back to school, acquiring his Ph.D. from Stanford University in 1965.</T></Q>..
426049944, ..<Q><D>1980</D> <W>N.Y. Times</W> 22 June <sc>iv. </sc>8<sc>e </sc><T>In more recent years `Silicon Valley' has grown up along the peninsula from San Francisco through Stanford University to San Jose. </T></Q>..
435096033, ..<Q><D>1927</D> <W>Amer. Speech</W> II. 278/1 [Stanford Univ.] <T><i>Smoke up</i>, official warning of dangerously low standing in history. </T></Q>..
481327831, ..<Q><D>1942</D> <A>L. O. Waldorf</A> <W>How to play Football</W> ix. 112 <T>In 1940, Stanford University used the T formation with great success. </T></Q>..
493332893, ..<Q><D>1959</D> <W>Nation</W> 24 Jan. 62/2 <T>Other think-factories are Johns Hopkins University Operations Research Office... Johns Hopkins thinks for the Army.  Stanford Research Institute..does the bulk of its thinking for a variety of government agencies. </T></Q>..
517358657, ..<Q><D>1984</D> <W>New Scientist</W> 3 May 46/1 <T>At Stanford University..medical scientists no longer expose cells to UV light... They prefer to ultraviolate them.</T></Q>..
550661317, ..<Q><D>1899</D> <A>J. London</A> <W>Let.</W> 12 Sept. (1966) 54 <T>And to-morrow I start out on that postponed trip of mine to Stanford University and Mt. Hamilton, to say nothing of way points. </T></Q>..

>> pr (stanford within *stanqts) 
493333055, ..or the Army.  Stanford Research Institute..does the bulk of its ..
108709595, .. 1921-22</W> (Stanford Univ.) 109 <T>As a recognition of high sc..
435096079, ..W> II. 278/1 [Stanford Univ.] <T><i>Smoke up</i>, official warni..
 33555991, ..e vicinity of Stanford University. </T></Q><Q><D>1927</D> <W>Dai..
 77975634, ..ind tunnel at Stanford University. </T></Q><Q><D>1940</D> <W>Fli..
550661443, ..ip of mine to Stanford University and Mt. Hamilton, to say nothi..
403069891, ..is Ph.D. from Stanford University in 1965.</T></Q></QP></S6></S4..
517358710, ..ay 46/1 <T>At Stanford University..medical scientists no longer ..
118300684, ..r-in-chief of Stanford University Press. </T></Q><Q><D>1959</D> ..
426050111, ..cisco through Stanford University to San Jose. </T></Q></PQP><PQ..
481327915, ..2 <T>In 1940, Stanford University used the T formation with grea..

>> quit 
Used 0.08 cpu seconds.
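The `including` and `within` queries and the `pr` lines above can be modeled with a short Python sketch. It rests on several assumptions rather than PAT documentation:

- A region set is a sorted list of non-overlapping (start, end) offsets, with `end` treated as inclusive.
- A match set is a sorted list of offsets.
- The `pr` context window is 14 characters before the match and 50 after. This is inferred by counting characters in the sample lines.

The function names are hypothetical.

```python
import bisect

def including(regions, matches):
    """Regions containing at least one match (matches must be sorted)."""
    result = []
    for start, end in regions:
        k = bisect.bisect_left(matches, start)  # first match at or after start
        if k < len(matches) and matches[k] <= end:
            result.append((start, end))
    return result

def within(matches, regions):
    """Matches lying inside some region (regions sorted, non-overlapping)."""
    starts = [start for start, _ in regions]
    result = []
    for m in matches:
        k = bisect.bisect_right(starts, m) - 1  # last region starting at or before m
        if k >= 0 and m <= regions[k][1]:
            result.append(m)
    return result

def show_match(text, pos, before=14, after=50):
    """Format one hit like a `pr` line: right-aligned offset, then context."""
    snippet = text[max(0, pos - before):pos + after]
    return f"{pos:9d}, ..{snippet}.."

if __name__ == "__main__":
    quotations = [(0, 99), (200, 299), (400, 499)]
    hits = [50, 250, 350]
    print(including(quotations, hits))  # [(0, 99), (200, 299)]
    print(within(hits, quotations))     # [50, 250]
```

Under these assumptions, `stanqts` corresponds to `including` over the Quotation regions. `stanford within *stanqts` corresponds to `within` over the resulting 11 regions.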

PAT's API: responses with {quieton}

pat -q 2e.cntl

[sent to PAT: water]
<SSize>48442</SSize>
[sent to PAT: pr sample.7]
<PSet><Start>192807323</Start><Start>520790341</Start><Start>145798504</Start><Start>549737948</Start><Start>190797617</Start><Start>549099801</Start><Start>549623784</Start></PSet>
[sent to PAT: a..z]
<SSize>60343111</SSize>
[sent to PAT: pr sample]
<PSet><Start>555709177</Start><Start>290164101</Start><Start>10053096</Start><Start>97073359</Start><Start>58277014</Start><Start>194517420</Start><Start>408029625</Start><Start>481205743</Start><Start>440450458</Start><Start>502535403</Start></PSet>
[sent to PAT: "to be or"]
<SSize>458</SSize>
[sent to PAT: pr sample.5]
<PSet><Start>454233240</Start><Start>562398537</Start><Start>94031003</Start><Start>192510576</Start><Start>407435097</Start></PSet>
[sent to PAT: stanqts = docs "Quotation" including (stanford near univ)]
<SSize>11</SSize>
[sent to PAT: pr]
<PSet><Start>33555823</Start><Start>77975481</Start><Start>108709543</Start><Start>118300609</Start><Start>403069715</Start><Start>426049944</Start><Start>435096033</Start><Start>481327831</Start><Start>493332893</Start><Start>517358657</Start><Start>550661317</Start></PSet>
[sent to PAT: pr.docs."Quotation"]
<RSet><Start>33555823</Start><End>33556019</End><Start>77975481</Start><End>77975662</End><Start>108709543</Start><End>108709757</End><Start>118300609</Start><End>118300718</End><Start>403069715</Start><End>403069926</End><Start>426049944</Start><End>426050151</End><Start>435096033</Start><End>435096179</End><Start>481327831</Start><End>481327983</End><Start>493332893</Start><End>493333159</End><Start>517358657</Start><End>517358828</End><Start>550661317</Start><End>550661518</End></RSet>
[sent to PAT: pr (stanford within *stanqts)]
<PSet><Start>493333055</Start><Start>108709595</Start><Start>435096079</Start><Start>33555991</Start><Start>77975634</Start><Start>550661443</Start><Start>403069891</Start><Start>517358710</Start><Start>118300684</Start><Start>426050111</Start><Start>481327915</Start></PSet>
[sent to PAT: quit]
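Quiet-mode replies are small tagged fragments, so a client program can read each one with an ordinary XML parser. The Python sketch below assumes that each reply arrives as one complete string. It handles only the three element types that appear in this transcript. `parse_response` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def parse_response(reply):
    """Convert one quiet-mode reply into Python values.

    <SSize> -> int (match count)
    <PSet>  -> list of match positions
    <RSet>  -> list of (start, end) region pairs
    """
    root = ET.fromstring(reply.strip())
    if root.tag == "SSize":
        return int(root.text)
    if root.tag == "PSet":
        return [int(e.text) for e in root.findall("Start")]
    if root.tag == "RSet":
        # Start and End alternate, so pairing them in document order works.
        starts = [int(e.text) for e in root.findall("Start")]
        ends = [int(e.text) for e in root.findall("End")]
        return list(zip(starts, ends))
    raise ValueError(f"unexpected response element: {root.tag}")

print(parse_response("<SSize>458</SSize>"))  # 458
```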

PAT control file for the OED

{This is a Pat control file}
{Mode 1}
{CharMappings "" ""
 "^H " "^T "^M " "! " """ " "$ " "% " "' " "( " ") " "* " "+ "
 ", " ". " ": " "; " "= " "> " "? " "@ "
 "Aa" "Bb" "Cc" "Dd" "Ee" "Ff" "Gg" "Hh" "Ii" "Jj" "Kk" "Ll" "Mm"
 "Nn" "Oo" "Pp" "Qq" "Rr" "Ss" "Tt" "Uu" "Vv" "Ww" "Xx" "Yy" "Zz"
 "[ " "\ " "] " "^ " "_ " "` " "{ " "| " "} " "~ "}
{StopWords}
{NumberChars 572728830}
{NumberExtNodes 118897340}
# Total disk space used for tree: 475589360 chars,  83.0% of text file
# 1 entry per 4.82 characters of text
# Merging the index took 4417.42 seconds
{LongestMatch 0}
{WordStarters  " \P" "\P-" "-\P" "\P<" "\P&"}
{TextFileName "/u/fwtompa/oed/data/2e": 1..572728830(0)}
{TreeFileName "/u/fwtompa/oed/data/2e.tree"}
{DocumentFile "/u/fwtompa/oed/data/2e.docs"}
{HashValue 2056705323}
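Two details of this file can be checked with a little arithmetic and a translation table.

First, the statistics in the comments follow from the header values. Dividing NumberChars by NumberExtNodes gives the stated 4.82 characters per entry. The tree size works out to exactly 4 characters per entry, which is consistent with one 4-byte pointer per indexed sistring.

Second, the CharMappings pairs are read here as mapping the first character of each pair to the second. That reading would explain why `water` also matched `Water-Bab.` and `*Waterloo-Crackers` in the transcript.

The Python sketch models only the letter pairs and most of the printable-punctuation pairs. The control-character entries and the double-quote pair are left out. It also assumes that unlisted characters such as `-`, `<`, `&` and digits pass through unchanged.

```python
import string

# Header values copied from the 2e.cntl file above.
NUMBER_CHARS = 572_728_830      # {NumberChars}: characters of indexed text
NUMBER_EXT_NODES = 118_897_340  # {NumberExtNodes}: indexed sistrings
TREE_CHARS = 475_589_360        # "Total disk space used for tree"

chars_per_entry = NUMBER_CHARS / NUMBER_EXT_NODES     # about 4.82
tree_fraction = TREE_CHARS / NUMBER_CHARS             # about 0.830, i.e. 83.0%
tree_chars_per_entry = TREE_CHARS / NUMBER_EXT_NODES  # exactly 4.0

# Subset of {CharMappings}: upper-case letters fold to lower case and the
# listed printable punctuation maps to a space. Control-character and
# double-quote entries are omitted; unlisted characters pass through.
PUNCT_TO_SPACE = "!$%'()*+,.:;=>?@[\\]^_`{|}~"
FOLD = str.maketrans(
    string.ascii_uppercase + PUNCT_TO_SPACE,
    string.ascii_lowercase + " " * len(PUNCT_TO_SPACE),
)

def normalize(text):
    """Apply the (partial) character mapping to a piece of text."""
    return text.translate(FOLD)

print(round(chars_per_entry, 2), round(tree_fraction * 100, 1), tree_chars_per_entry)
# 4.82 83.0 4.0
print(repr(normalize("Water-Bab.")))  # 'water-bab '
```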

Overview of the PAT software components

Pat Text Search System Distribution Overview
(Version 3.4.2)

pat: the main search engine.

patbld: the main index-building engine.
patmrg: a utility that merges two indexes.

patdocs: a stand-alone utility for pre-defining a single type of region.
multidocs: a stand-alone utility for creating multiple region files at once.

Anyone who wants to build new indexes for PAT should consult the accompanying man pages, including those that describe the format of the cntl (control) and docs (region) files.


CGI scripts to search the OED

An outline of the Web support software for searching the OED with PAT operators and displaying the tagged text according to various styles.

  • lookup
    Search for entries containing a word or phrase:
    • initialize:
      • http protocol
      • choice of tag style specifications
      • settings of search string, display format, search scope, number of entries to return
      • PAT, and tags for chosen style
    • formulate appropriate PAT command to match parameters
    • make call to PAT (using pat_docs_list)
    • for each returned element within the range to be displayed
      • read the text of the entry (using pat_string)
      • convert it to HTML using tags_convert
    • prepare for follow-on query
  • tap
    Provide a Web-based interface to PAT searches on the OED:
    • initialize:
      • http protocol
      • choice of tag style specifications
      • choice of PAT regions to return
      • settings of display format, number of entries to return, sequence of search commands, region type to return
      • PAT, and tags for chosen style
    • make call to PAT (using pat_docs_list)
    • for each returned element within the range to be displayed
      • read the text of the entry (using pat_string)
      • convert it to HTML using tags_convert
    • prepare for follow-on query
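The `lookup` steps can be sketched in Python. The outline names pat_docs_list, pat_string and tags_convert but does not give their signatures, so the sketch makes several assumptions:

- It takes stand-ins for the first two functions as parameters.
- It reimplements tag conversion as a hypothetical `to_html` driven by a style table.
- It uses a hypothetical region name, `Entry`. The real region names for the OED are documented separately.

The phrase is quoted in the query, as `"to be or"` is in the transcript.

```python
import re

def formulate_command(search, scope="Entry"):
    """Build a query for `scope` regions that contain `search` as a phrase.

    "Entry" is a hypothetical region name used only for illustration.
    """
    phrase = search.replace('"', "")  # drop quotes so the phrase stays well-formed
    return f'docs "{scope}" including "{phrase}"'

def to_html(tagged, style):
    """Replace each tag with the HTML pair given by `style`; drop unknown tags."""
    def replace(m):
        closing, name = m.group(1), m.group(2)
        if name not in style:
            return ""
        open_html, close_html = style[name]
        return close_html if closing else open_html
    return re.sub(r"<(/?)([A-Za-z0-9]+)>", replace, tagged)

def lookup(search, docs_list, fetch_string, style, first=0, count=10, scope="Entry"):
    """Run one search and convert the entries in the display range to HTML.

    docs_list(command) -> [(start, end), ...] stands in for pat_docs_list;
    fetch_string(start, end) -> tagged text stands in for pat_string.
    """
    regions = docs_list(formulate_command(search, scope))
    pages = [to_html(fetch_string(start, end), style)
             for start, end in regions[first:first + count]]
    return "\n".join(pages)
```

Passing the PAT calls in as functions keeps the sketch testable without a PAT installation. A version of tap would differ mainly in taking the sequence of search commands and the region type to return from the form, instead of building the query itself.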

PAT™ is a registered trademark of Open Text Corporation, and the PAT™ Text Search System is licensed for University of Waterloo faculty, students and academic staff for teaching and research.