Document (#26421)

Author
Navarro, G.
Baeza-Yates, R.
Azevedo Arcoverde, J.M.
Title
Matchsimile : a flexible approximate matching tool for searching proper names
Source
Journal of the American Society for Information Science and Technology. 54(2003) no.1, p.3-15
Year
2003
Abstract
We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detections, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.

Similar documents (author)

  1. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 4.08
    4.082208 = sum of:
      4.082208 = product of:
        5.442944 = sum of:
          1.7510949 = weight(author_txt:navarro in 5295) [ClassicSimilarity], result of:
            1.7510949 = score(doc=5295,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              3.8312001 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=5295)
          1.7510949 = weight(author_txt:yates in 5295) [ClassicSimilarity], result of:
            1.7510949 = score(doc=5295,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              3.8312001 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=5295)
          1.9407543 = weight(author_txt:baeza in 5295) [ClassicSimilarity], result of:
            1.9407543 = score(doc=5295,freq=1.0), product of:
              0.48949558 = queryWeight, product of:
                1.0348728 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0521937 = queryNorm
              3.9648046 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.4375 = fieldNorm(doc=5295)
        0.75 = coord(3/4)
    
  2. Baeza-Yates, R.; Navarro, G.: XQL and proximal nodes (2002) 4.08
    4.082208 = sum of:
      4.082208 = product of:
        5.442944 = sum of:
          1.7510949 = weight(author_txt:navarro in 1454) [ClassicSimilarity], result of:
            1.7510949 = score(doc=1454,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              3.8312001 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=1454)
          1.7510949 = weight(author_txt:yates in 1454) [ClassicSimilarity], result of:
            1.7510949 = score(doc=1454,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              3.8312001 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=1454)
          1.9407543 = weight(author_txt:baeza in 1454) [ClassicSimilarity], result of:
            1.9407543 = score(doc=1454,freq=1.0), product of:
              0.48949558 = queryWeight, product of:
                1.0348728 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0521937 = queryNorm
              3.9648046 = fieldWeight in 1454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.4375 = fieldNorm(doc=1454)
        0.75 = coord(3/4)
    
  3. Baeza-Yates, R.A.: Introduction to data structures and algorithms related to information retrieval (1992) 2.11
    2.1096282 = sum of:
      2.1096282 = product of:
        4.2192564 = sum of:
          2.0012515 = weight(author_txt:yates in 4082) [ClassicSimilarity], result of:
            2.0012515 = score(doc=4082,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              4.3785143 = fieldWeight in 4082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.5 = fieldNorm(doc=4082)
          2.218005 = weight(author_txt:baeza in 4082) [ClassicSimilarity], result of:
            2.218005 = score(doc=4082,freq=1.0), product of:
              0.48949558 = queryWeight, product of:
                1.0348728 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0521937 = queryNorm
              4.531205 = fieldWeight in 4082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.5 = fieldNorm(doc=4082)
        0.5 = coord(2/4)
    
  4. Baeza-Yates, R.A.: String searching algorithms (1992) 2.11
    2.1096282 = sum of:
      2.1096282 = product of:
        4.2192564 = sum of:
          2.0012515 = weight(author_txt:yates in 4505) [ClassicSimilarity], result of:
            2.0012515 = score(doc=4505,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              4.3785143 = fieldWeight in 4505, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.5 = fieldNorm(doc=4505)
          2.218005 = weight(author_txt:baeza in 4505) [ClassicSimilarity], result of:
            2.218005 = score(doc=4505,freq=1.0), product of:
              0.48949558 = queryWeight, product of:
                1.0348728 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0521937 = queryNorm
              4.531205 = fieldWeight in 4505, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.5 = fieldNorm(doc=4505)
        0.5 = coord(2/4)
    
  5. Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 1.85
    1.8459246 = sum of:
      1.8459246 = product of:
        3.6918492 = sum of:
          1.7510949 = weight(author_txt:yates in 891) [ClassicSimilarity], result of:
            1.7510949 = score(doc=891,freq=1.0), product of:
              0.45706174 = queryWeight, product of:
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.0521937 = queryNorm
              3.8312001 = fieldWeight in 891, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.757029 = idf(docFreq=18, maxDocs=44421)
                0.4375 = fieldNorm(doc=891)
          1.9407543 = weight(author_txt:baeza in 891) [ClassicSimilarity], result of:
            1.9407543 = score(doc=891,freq=1.0), product of:
              0.48949558 = queryWeight, product of:
                1.0348728 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.0521937 = queryNorm
              3.9648046 = fieldWeight in 891, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.4375 = fieldNorm(doc=891)
        0.5 = coord(2/4)
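
The score trees above follow the classic TF-IDF arithmetic that Lucene labels ClassicSimilarity. The sketch below recomputes the 4.08 author score of the first entry (doc 5295) from the inputs printed in its tree. The formulas are inferred from the listed numbers (tf as the square root of freq, idf as 1 + ln(maxDocs / (docFreq + 1))), and the function and variable names are hypothetical, chosen only for this illustration.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as the listing appears to compute it:
    # idf(docFreq=18, maxDocs=44421) comes out at about 8.757029.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, mirroring the "product of" lines.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm              # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the tree for doc 5295 (author field).
MAX_DOCS = 44421
QUERY_NORM = 0.0521937
FIELD_NORM = 0.4375

# term -> (docFreq, boost); a boost of 1.0 is assumed where the tree prints none.
author_terms = {
    "navarro": (18, 1.0),
    "yates": (18, 1.0),
    "baeza": (13, 1.0348728),
}

term_scores = {
    term: term_score(1.0, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost)
    for term, (doc_freq, boost) in author_terms.items()
}

# coord(3/4): three of the four query terms matched this document.
coord = 3 / 4
total = sum(term_scores.values()) * coord

for term, value in term_scores.items():
    print(f"{term}: {value:.7f}")
print(f"total: {total:.6f}")  # the listing reports 4.082208
```

Because the sketch uses double precision while the listing shows single-precision values, the last printed digits may differ slightly from the tree.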
    

Similar documents (content)

  1. Lutz, R.; Green, S.: Data stewardship : the care and handling of named entries (1999) 0.24
    0.24365997 = sum of:
      0.24365997 = product of:
        0.7614374 = sum of:
          0.019586215 = weight(abstract_txt:searching in 710) [ClassicSimilarity], result of:
            0.019586215 = score(doc=710,freq=1.0), product of:
              0.097482674 = queryWeight, product of:
                1.2368855 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.01838722 = queryNorm
              0.20091996 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.028077947 = weight(abstract_txt:specific in 710) [ClassicSimilarity], result of:
            0.028077947 = score(doc=710,freq=2.0), product of:
              0.098369546 = queryWeight, product of:
                1.2424992 = boost
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.01838722 = queryNorm
              0.28543332 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.021804383 = weight(abstract_txt:large in 710) [ClassicSimilarity], result of:
            0.021804383 = score(doc=710,freq=1.0), product of:
              0.10471035 = queryWeight, product of:
                1.2819191 = boost
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.01838722 = queryNorm
              0.20823522 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.0399981 = weight(abstract_txt:word in 710) [ClassicSimilarity], result of:
            0.0399981 = score(doc=710,freq=1.0), product of:
              0.15691097 = queryWeight, product of:
                1.5692512 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01838722 = queryNorm
              0.25490952 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.06338828 = weight(abstract_txt:person in 710) [ClassicSimilarity], result of:
            0.06338828 = score(doc=710,freq=1.0), product of:
              0.21328798 = queryWeight, product of:
                1.8295698 = boost
                6.3401756 = idf(docFreq=212, maxDocs=44421)
                0.01838722 = queryNorm
              0.29719573 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3401756 = idf(docFreq=212, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.06946676 = weight(abstract_txt:proper in 710) [ClassicSimilarity], result of:
            0.06946676 = score(doc=710,freq=1.0), product of:
              0.22671406 = queryWeight, product of:
                1.8862752 = boost
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.01838722 = queryNorm
              0.30640694 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.024280291 = weight(abstract_txt:search in 710) [ClassicSimilarity], result of:
            0.024280291 = score(doc=710,freq=1.0), product of:
              0.14173366 = queryWeight, product of:
                2.1091979 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01838722 = queryNorm
              0.17130928 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
          0.4948354 = weight(abstract_txt:names in 710) [ClassicSimilarity], result of:
            0.4948354 = score(doc=710,freq=16.0), product of:
              0.452072 = queryWeight, product of:
                4.2115273 = boost
                5.8378363 = idf(docFreq=351, maxDocs=44421)
                0.01838722 = queryNorm
              1.0945942 = fieldWeight in 710, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                5.8378363 = idf(docFreq=351, maxDocs=44421)
                0.046875 = fieldNorm(doc=710)
        0.32 = coord(8/25)
    
  2. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.21
    0.20613268 = sum of:
      0.20613268 = product of:
        0.6441646 = sum of:
          0.061366152 = weight(abstract_txt:string in 5295) [ClassicSimilarity], result of:
            0.061366152 = score(doc=5295,freq=1.0), product of:
              0.1367552 = queryWeight, product of:
                1.0359117 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.01838722 = queryNorm
              0.44872993 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.05359793 = weight(abstract_txt:text in 5295) [ClassicSimilarity], result of:
            0.05359793 = score(doc=5295,freq=6.0), product of:
              0.086639546 = queryWeight, product of:
                1.1660681 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01838722 = queryNorm
              0.61863124 = fieldWeight in 5295, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.045232426 = weight(abstract_txt:searching in 5295) [ClassicSimilarity], result of:
            0.045232426 = score(doc=5295,freq=3.0), product of:
              0.097482674 = queryWeight, product of:
                1.2368855 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.01838722 = queryNorm
              0.46400478 = fieldWeight in 5295, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.02907251 = weight(abstract_txt:large in 5295) [ClassicSimilarity], result of:
            0.02907251 = score(doc=5295,freq=1.0), product of:
              0.10471035 = queryWeight, product of:
                1.2819191 = boost
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.01838722 = queryNorm
              0.27764696 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4423513 = idf(docFreq=1420, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.053330798 = weight(abstract_txt:word in 5295) [ClassicSimilarity], result of:
            0.053330798 = score(doc=5295,freq=1.0), product of:
              0.15691097 = queryWeight, product of:
                1.5692512 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01838722 = queryNorm
              0.33987933 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.0731447 = weight(abstract_txt:matching in 5295) [ClassicSimilarity], result of:
            0.0731447 = score(doc=5295,freq=1.0), product of:
              0.19369711 = queryWeight, product of:
                1.7435218 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.01838722 = queryNorm
              0.3776241 = fieldWeight in 5295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.04578336 = weight(abstract_txt:search in 5295) [ClassicSimilarity], result of:
            0.04578336 = score(doc=5295,freq=2.0), product of:
              0.14173366 = queryWeight, product of:
                2.1091979 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01838722 = queryNorm
              0.3230239 = fieldWeight in 5295, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
          0.28263673 = weight(abstract_txt:approximate in 5295) [ClassicSimilarity], result of:
            0.28263673 = score(doc=5295,freq=3.0), product of:
              0.33070943 = queryWeight, product of:
                2.2781856 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.01838722 = queryNorm
              0.8546377 = fieldWeight in 5295, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=5295)
        0.32 = coord(8/25)
    
  3. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.19
    0.1885783 = sum of:
      0.1885783 = product of:
        0.7857429 = sum of:
          0.122732304 = weight(abstract_txt:string in 4223) [ClassicSimilarity], result of:
            0.122732304 = score(doc=4223,freq=4.0), product of:
              0.1367552 = queryWeight, product of:
                1.0359117 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.01838722 = queryNorm
              0.89745986 = fieldWeight in 4223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
          0.067868516 = weight(abstract_txt:occurrences in 4223) [ClassicSimilarity], result of:
            0.067868516 = score(doc=4223,freq=1.0), product of:
              0.14625257 = queryWeight, product of:
                1.0712789 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.01838722 = queryNorm
              0.46405008 = fieldWeight in 4223, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
          0.030120287 = weight(abstract_txt:level in 4223) [ClassicSimilarity], result of:
            0.030120287 = score(doc=4223,freq=1.0), product of:
              0.10721132 = queryWeight, product of:
                1.2971379 = boost
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.01838722 = queryNorm
              0.28094316 = fieldWeight in 4223, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4950905 = idf(docFreq=1347, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
          0.09237164 = weight(abstract_txt:word in 4223) [ClassicSimilarity], result of:
            0.09237164 = score(doc=4223,freq=3.0), product of:
              0.15691097 = queryWeight, product of:
                1.5692512 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01838722 = queryNorm
              0.58868825 = fieldWeight in 4223, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
          0.1462894 = weight(abstract_txt:matching in 4223) [ClassicSimilarity], result of:
            0.1462894 = score(doc=4223,freq=4.0), product of:
              0.19369711 = queryWeight, product of:
                1.7435218 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.01838722 = queryNorm
              0.7552482 = fieldWeight in 4223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
          0.3263608 = weight(abstract_txt:approximate in 4223) [ClassicSimilarity], result of:
            0.3263608 = score(doc=4223,freq=4.0), product of:
              0.33070943 = queryWeight, product of:
                2.2781856 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.01838722 = queryNorm
              0.9868506 = fieldWeight in 4223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.0625 = fieldNorm(doc=4223)
        0.24 = coord(6/25)
    
  4. Mustafa, S.H.: Word-oriented approximate string matching using occurrence heuristic tables : a heuristic for searching Arabic text (2005) 0.19
    0.1870674 = sum of:
      0.1870674 = product of:
        0.66809785 = sum of:
          0.07670769 = weight(abstract_txt:string in 2715) [ClassicSimilarity], result of:
            0.07670769 = score(doc=2715,freq=1.0), product of:
              0.1367552 = queryWeight, product of:
                1.0359117 = boost
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.01838722 = queryNorm
              0.56091243 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.179679 = idf(docFreq=91, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.08483565 = weight(abstract_txt:occurrences in 2715) [ClassicSimilarity], result of:
            0.08483565 = score(doc=2715,freq=1.0), product of:
              0.14625257 = queryWeight, product of:
                1.0712789 = boost
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.01838722 = queryNorm
              0.5800626 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4248013 = idf(docFreq=71, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.027351577 = weight(abstract_txt:text in 2715) [ClassicSimilarity], result of:
            0.027351577 = score(doc=2715,freq=1.0), product of:
              0.086639546 = queryWeight, product of:
                1.1660681 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.01838722 = queryNorm
              0.3156939 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.032643694 = weight(abstract_txt:searching in 2715) [ClassicSimilarity], result of:
            0.032643694 = score(doc=2715,freq=1.0), product of:
              0.097482674 = queryWeight, product of:
                1.2368855 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.01838722 = queryNorm
              0.3348666 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.066663496 = weight(abstract_txt:word in 2715) [ClassicSimilarity], result of:
            0.066663496 = score(doc=2715,freq=1.0), product of:
              0.15691097 = queryWeight, product of:
                1.5692512 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01838722 = queryNorm
              0.42484915 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.091430865 = weight(abstract_txt:matching in 2715) [ClassicSimilarity], result of:
            0.091430865 = score(doc=2715,freq=1.0), product of:
              0.19369711 = queryWeight, product of:
                1.7435218 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.01838722 = queryNorm
              0.4720301 = fieldWeight in 2715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
          0.2884649 = weight(abstract_txt:approximate in 2715) [ClassicSimilarity], result of:
            0.2884649 = score(doc=2715,freq=2.0), product of:
              0.33070943 = queryWeight, product of:
                2.2781856 = boost
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.01838722 = queryNorm
              0.8722609 = fieldWeight in 2715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.894805 = idf(docFreq=44, maxDocs=44421)
                0.078125 = fieldNorm(doc=2715)
        0.28 = coord(7/25)
    
  5. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.18
    0.18370548 = sum of:
      0.18370548 = product of:
        0.7654395 = sum of:
          0.033090178 = weight(abstract_txt:specific in 4463) [ClassicSimilarity], result of:
            0.033090178 = score(doc=4463,freq=1.0), product of:
              0.098369546 = queryWeight, product of:
                1.2424992 = boost
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.01838722 = queryNorm
              0.3363864 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.305746 = idf(docFreq=1628, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.09427642 = weight(abstract_txt:word in 4463) [ClassicSimilarity], result of:
            0.09427642 = score(doc=4463,freq=2.0), product of:
              0.15691097 = queryWeight, product of:
                1.5692512 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.01838722 = queryNorm
              0.60082746 = fieldWeight in 4463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.06946498 = weight(abstract_txt:engine in 4463) [ClassicSimilarity], result of:
            0.06946498 = score(doc=4463,freq=1.0), product of:
              0.16127679 = queryWeight, product of:
                1.5909325 = boost
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.01838722 = queryNorm
              0.43071902 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5132036 = idf(docFreq=486, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.11577793 = weight(abstract_txt:proper in 4463) [ClassicSimilarity], result of:
            0.11577793 = score(doc=4463,freq=1.0), product of:
              0.22671406 = queryWeight, product of:
                1.8862752 = boost
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.01838722 = queryNorm
              0.51067823 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5366817 = idf(docFreq=174, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.04046715 = weight(abstract_txt:search in 4463) [ClassicSimilarity], result of:
            0.04046715 = score(doc=4463,freq=1.0), product of:
              0.14173366 = queryWeight, product of:
                2.1091979 = boost
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.01838722 = queryNorm
              0.28551546 = fieldWeight in 4463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.654598 = idf(docFreq=3123, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
          0.41236287 = weight(abstract_txt:names in 4463) [ClassicSimilarity], result of:
            0.41236287 = score(doc=4463,freq=4.0), product of:
              0.452072 = queryWeight, product of:
                4.2115273 = boost
                5.8378363 = idf(docFreq=351, maxDocs=44421)
                0.01838722 = queryNorm
              0.91216195 = fieldWeight in 4463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8378363 = idf(docFreq=351, maxDocs=44421)
                0.078125 = fieldNorm(doc=4463)
        0.24 = coord(6/25)
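
For the content-based list, each document score is the sum of its per-term scores scaled by the coord factor (matched terms divided by the 25 query terms). The sketch below redoes that aggregation for the first content entry (doc 710), using the per-term scores exactly as printed in its tree, and shows how much of the total comes from the term "names". The variable names are hypothetical and the code only replays the printed arithmetic.

```python
# Per-term scores copied from the tree for doc 710 (abstract field).
doc_710_terms = {
    "searching": 0.019586215,
    "specific": 0.028077947,
    "large": 0.021804383,
    "word": 0.0399981,
    "person": 0.06338828,
    "proper": 0.06946676,
    "search": 0.024280291,
    "names": 0.4948354,
}

QUERY_TERM_COUNT = 25  # denominator of coord(8/25) in the listing

# "sum of" line: add up the matched term scores.
raw_sum = sum(doc_710_terms.values())

# coord: fraction of query terms that matched this document.
coord = len(doc_710_terms) / QUERY_TERM_COUNT

# "product of" line: the final document score.
final_score = raw_sum * coord

# Which term contributes most, and what share of the raw sum it holds.
top_term = max(doc_710_terms, key=doc_710_terms.get)
top_share = doc_710_terms[top_term] / raw_sum

print(f"raw sum: {raw_sum:.7f}")          # the listing reports 0.7614374
print(f"coord: {coord:.2f}")              # the listing reports 0.32
print(f"final score: {final_score:.8f}")  # the listing reports 0.24365997
print(f"top term: {top_term} ({top_share:.0%} of the raw sum)")
```

The term "names" dominates here because it combines the highest boost in the query (4.2115273) with a term frequency of 16 in the matched abstract.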