Document (#37467)

Author
Gödert, W.
Title
Detecting multiword phrases in mathematical text corpora
Source
http://arxiv.org/abs/1210.0852
Year
2012
Abstract
We present an approach for detecting multiword phrases in mathematical text corpora. The method is based on characteristic features of mathematical terminology. It uses a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. Multiword groups are then detected algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, as are conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
Footnote
See also: http://hdl.handle.net/10760/17742.
Theme
Automatic indexing
Field
Mathematics
Object
Lingo
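The abstract does not spell out the detection algorithm, so the following Python sketch only illustrates the general idea of dictionary-based multiword detection. The word lists (`ADJECTIVES`, `NOUNS`, `PERSONAL_NAMES`), the function names, and the matching pattern (a run of adjectives or personal names followed by one or more nouns, at least two words long) are all hypothetical assumptions. This is not Lingo's interface or the author's rule set.

```python
import re

# Hypothetical toy dictionaries, one per word class (assumption, not Lingo's data).
ADJECTIVES = {"compact", "linear", "differential", "finite", "abelian"}
NOUNS = {"space", "spaces", "equation", "equations", "group", "groups",
         "operator", "operators", "field", "fields"}
PERSONAL_NAMES = {"hilbert", "banach", "hausdorff", "galois"}


def tag(token: str) -> str:
    """Assign a word class by dictionary lookup; unknown words become OTHER."""
    word = token.lower()
    if word in PERSONAL_NAMES:
        return "NAME"
    if word in ADJECTIVES:
        return "ADJ"
    if word in NOUNS:
        return "NOUN"
    return "OTHER"


def multiword_phrases(text: str) -> list[str]:
    """Collect token runs matching (ADJ|NAME)* NOUN+ with at least two words."""
    tokens = re.findall(r"[A-Za-z]+", text)
    tags = [tag(t) for t in tokens]
    phrases = []
    i = 0
    while i < len(tokens):
        j = i
        while j < len(tokens) and tags[j] in ("ADJ", "NAME"):  # modifiers
            j += 1
        k = j
        while k < len(tokens) and tags[k] == "NOUN":  # head noun(s)
            k += 1
        if k > j and k - i >= 2:
            phrases.append(" ".join(tokens[i:k]))
            i = k  # continue after the detected phrase
        else:
            i += 1
    return phrases


print(multiword_phrases(
    "Every compact Hausdorff space is normal; "
    "linear differential equations over a finite field."
))
```

Because the toy dictionaries list inflected forms such as `spaces` and `equations` explicitly, the sketch matches surface forms without any stemming, which loosely mirrors the dictionary-versus-stemming contrast raised in the abstract.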

Similar documents (author)

  1. Gödert, W.: Inhalte formal erschließen : Anspruch und Wirklichkeit (1984) 4.34
    4.3370585 = sum of:
      4.3370585 = weight(author_txt:gödert in 31) [ClassicSimilarity], result of:
        4.3370585 = fieldWeight in 31, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.939294 = idf(docFreq=116, maxDocs=44421)
          0.625 = fieldNorm(doc=31)
    
  2. Gödert, W.: Gegenwart und Zukunft der bibliothekarischen Sacherschließung : Gedanken unter Berücksichtigung des EDV-Einsatzes (1981) 4.34
    4.3370585 = sum of:
      4.3370585 = weight(author_txt:gödert in 165) [ClassicSimilarity], result of:
        4.3370585 = fieldWeight in 165, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.939294 = idf(docFreq=116, maxDocs=44421)
          0.625 = fieldNorm(doc=165)
    
  3. Gödert, W.: Syntax von Dokumentationssprachen im Online-Katalog (1988) 4.34
    4.3370585 = sum of:
      4.3370585 = weight(author_txt:gödert in 167) [ClassicSimilarity], result of:
        4.3370585 = fieldWeight in 167, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.939294 = idf(docFreq=116, maxDocs=44421)
          0.625 = fieldNorm(doc=167)
    
  4. Gödert, W.: Aufbereitung und Recherche von nach RSWK gebildeten Daten in der CD-ROM-Ausgabe der Deutschen Bibliographie (1990) 4.34
    4.3370585 = sum of:
      4.3370585 = weight(author_txt:gödert in 168) [ClassicSimilarity], result of:
        4.3370585 = fieldWeight in 168, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.939294 = idf(docFreq=116, maxDocs=44421)
          0.625 = fieldNorm(doc=168)
    
  5. Gödert, W.: Gestaltung sachlicher Abfragekomponenten für Online-Kataloge : Vortrag anläßlich der Tagung 'Automatisierte Sacherschließung - Status und Trends', Schloß Hofen, Lochau bei Bregenz, 17.4.-20.4.1989 (???) 4.34
    4.3370585 = sum of:
      4.3370585 = weight(author_txt:gödert in 170) [ClassicSimilarity], result of:
        4.3370585 = fieldWeight in 170, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.939294 = idf(docFreq=116, maxDocs=44421)
          0.625 = fieldNorm(doc=170)
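Each author match above is scored by Lucene's ClassicSimilarity (TF-IDF). The Python sketch below recomputes the printed numbers, assuming the standard ClassicSimilarity definitions tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)). The fieldNorm value is copied from the output rather than derived, because Lucene stores it in a compressed, lossy form. Only fieldWeight appears in these entries, presumably because a single-term, unboosted query normalizes its query weight to 1.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm taken from the explain output)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Inputs copied from the author_txt:gödert entries above.
idf = classic_idf(doc_freq=116, max_docs=44421)
weight = field_weight(freq=1.0, doc_freq=116, max_docs=44421, field_norm=0.625)
print(f"idf = {idf:.6f}, fieldWeight = {weight:.7f}")
```

This should print values matching the 6.939294 and 4.3370585 shown above. Since tf, idf, and fieldNorm are identical for all five entries, every author match receives the same rounded score of 4.34.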
    

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.30
    0.29889682 = sum of:
      0.29889682 = product of:
        0.9340526 = sum of:
          0.02343605 = weight(abstract_txt:previously in 2536) [ClassicSimilarity], result of:
            0.02343605 = score(doc=2536,freq=1.0), product of:
              0.097762436 = queryWeight, product of:
                1.0512366 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.015153716 = queryNorm
              0.2397245 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.018498775 = weight(abstract_txt:based in 2536) [ClassicSimilarity], result of:
            0.018498775 = score(doc=2536,freq=8.0), product of:
              0.052600645 = queryWeight, product of:
                1.0904983 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.015153716 = queryNorm
              0.35168344 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.0296695 = weight(abstract_txt:dictionary in 2536) [ClassicSimilarity], result of:
            0.0296695 = score(doc=2536,freq=1.0), product of:
              0.11440787 = queryWeight, product of:
                1.1372145 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.015153716 = queryNorm
              0.25933096 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.08915567 = weight(abstract_txt:detection in 2536) [ClassicSimilarity], result of:
            0.08915567 = score(doc=2536,freq=8.0), product of:
              0.11912009 = queryWeight, product of:
                1.160398 = boost
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.015153716 = queryNorm
              0.74845195 = fieldWeight in 2536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.774214 = idf(docFreq=137, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.06407854 = weight(abstract_txt:named in 2536) [ClassicSimilarity], result of:
            0.06407854 = score(doc=2536,freq=4.0), product of:
              0.12042153 = queryWeight, product of:
                1.1667197 = boost
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.015153716 = queryNorm
              0.5321186 = fieldWeight in 2536, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8111186 = idf(docFreq=132, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.013380915 = weight(abstract_txt:text in 2536) [ClassicSimilarity], result of:
            0.013380915 = score(doc=2536,freq=1.0), product of:
              0.08477145 = queryWeight, product of:
                1.3843766 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015153716 = queryNorm
              0.15784696 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.04128899 = weight(abstract_txt:method in 2536) [ClassicSimilarity], result of:
            0.04128899 = score(doc=2536,freq=5.0), product of:
              0.105073184 = queryWeight, product of:
                1.5412582 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.015153716 = queryNorm
              0.3929546 = fieldWeight in 2536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.6545441 = weight(abstract_txt:multiword in 2536) [ClassicSimilarity], result of:
            0.6545441 = score(doc=2536,freq=11.0), product of:
              0.58360344 = queryWeight, product of:
                4.4487095 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.015153716 = queryNorm
              1.1215563 = fieldWeight in 2536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.17
    0.16897345 = sum of:
      0.16897345 = product of:
        0.8448672 = sum of:
          0.015696732 = weight(abstract_txt:based in 3919) [ClassicSimilarity], result of:
            0.015696732 = score(doc=3919,freq=1.0), product of:
              0.052600645 = queryWeight, product of:
                1.0904983 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.015153716 = queryNorm
              0.2984133 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.12518637 = weight(abstract_txt:nouns in 3919) [ClassicSimilarity], result of:
            0.12518637 = score(doc=3919,freq=1.0), product of:
              0.16665292 = queryWeight, product of:
                1.3725271 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015153716 = queryNorm
              0.7511802 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.06267227 = weight(abstract_txt:method in 3919) [ClassicSimilarity], result of:
            0.06267227 = score(doc=3919,freq=2.0), product of:
              0.105073184 = queryWeight, product of:
                1.5412582 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.015153716 = queryNorm
              0.5964631 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.16766593 = weight(abstract_txt:corpora in 3919) [ClassicSimilarity], result of:
            0.16766593 = score(doc=3919,freq=1.0), product of:
              0.25512213 = queryWeight, product of:
                2.4016159 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.015153716 = queryNorm
              0.6571987 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.47364593 = weight(abstract_txt:multiword in 3919) [ClassicSimilarity], result of:
            0.47364593 = score(doc=3919,freq=1.0), product of:
              0.58360344 = queryWeight, product of:
                4.4487095 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.015153716 = queryNorm
              0.81158864 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.2 = coord(5/25)
    
  3. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.11
    0.11132311 = sum of:
      0.11132311 = product of:
        0.9276926 = sum of:
          0.10432197 = weight(abstract_txt:nouns in 1643) [ClassicSimilarity], result of:
            0.10432197 = score(doc=1643,freq=1.0), product of:
              0.16665292 = queryWeight, product of:
                1.3725271 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015153716 = queryNorm
              0.6259835 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.13972162 = weight(abstract_txt:corpora in 1643) [ClassicSimilarity], result of:
            0.13972162 = score(doc=1643,freq=1.0), product of:
              0.25512213 = queryWeight, product of:
                2.4016159 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.015153716 = queryNorm
              0.5476656 = fieldWeight in 1643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
          0.683649 = weight(abstract_txt:multiword in 1643) [ClassicSimilarity], result of:
            0.683649 = score(doc=1643,freq=3.0), product of:
              0.58360344 = queryWeight, product of:
                4.4487095 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.015153716 = queryNorm
              1.1714272 = fieldWeight in 1643, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.078125 = fieldNorm(doc=1643)
        0.12 = coord(3/25)
    
  4. Terada, A.; Tokunaga, T.; Tanaka, H.: Automatic expansion of abbreviations by using context and character information (2004) 0.11
    0.109293714 = sum of:
      0.109293714 = product of:
        0.34154287 = sum of:
          0.03749768 = weight(abstract_txt:previously in 3560) [ClassicSimilarity], result of:
            0.03749768 = score(doc=3560,freq=1.0), product of:
              0.097762436 = queryWeight, product of:
                1.0512366 = boost
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.015153716 = queryNorm
              0.3835592 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.136947 = idf(docFreq=260, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.037709996 = weight(abstract_txt:instead in 3560) [ClassicSimilarity], result of:
            0.037709996 = score(doc=3560,freq=1.0), product of:
              0.09813111 = queryWeight, product of:
                1.0532169 = boost
                6.148508 = idf(docFreq=257, maxDocs=44421)
                0.015153716 = queryNorm
              0.38428175 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.148508 = idf(docFreq=257, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.010464488 = weight(abstract_txt:based in 3560) [ClassicSimilarity], result of:
            0.010464488 = score(doc=3560,freq=1.0), product of:
              0.052600645 = queryWeight, product of:
                1.0904983 = boost
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.015153716 = queryNorm
              0.1989422 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1830752 = idf(docFreq=5005, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.047471203 = weight(abstract_txt:dictionary in 3560) [ClassicSimilarity], result of:
            0.047471203 = score(doc=3560,freq=1.0), product of:
              0.11440787 = queryWeight, product of:
                1.1372145 = boost
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.015153716 = queryNorm
              0.41492954 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6388726 = idf(docFreq=157, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.061750956 = weight(abstract_txt:dictionaries in 3560) [ClassicSimilarity], result of:
            0.061750956 = score(doc=3560,freq=1.0), product of:
              0.13633212 = queryWeight, product of:
                1.2414052 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.015153716 = queryNorm
              0.45294502 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.08345758 = weight(abstract_txt:nouns in 3560) [ClassicSimilarity], result of:
            0.08345758 = score(doc=3560,freq=1.0), product of:
              0.16665292 = queryWeight, product of:
                1.3725271 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.015153716 = queryNorm
              0.5007868 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.021409463 = weight(abstract_txt:text in 3560) [ClassicSimilarity], result of:
            0.021409463 = score(doc=3560,freq=1.0), product of:
              0.08477145 = queryWeight, product of:
                1.3843766 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015153716 = queryNorm
              0.25255513 = fieldWeight in 3560, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
          0.04178152 = weight(abstract_txt:method in 3560) [ClassicSimilarity], result of:
            0.04178152 = score(doc=3560,freq=2.0), product of:
              0.105073184 = queryWeight, product of:
                1.5412582 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.015153716 = queryNorm
              0.39764208 = fieldWeight in 3560, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0625 = fieldNorm(doc=3560)
        0.32 = coord(8/25)
    
  5. Wordhoard (n.d.) 0.10
    0.10180085 = sum of:
      0.10180085 = product of:
        0.84834045 = sum of:
          0.02676183 = weight(abstract_txt:text in 909) [ClassicSimilarity], result of:
            0.02676183 = score(doc=909,freq=1.0), product of:
              0.08477145 = queryWeight, product of:
                1.3843766 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.015153716 = queryNorm
              0.3156939 = fieldWeight in 909, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=909)
          0.2633816 = weight(abstract_txt:phrases in 909) [ClassicSimilarity], result of:
            0.2633816 = score(doc=909,freq=4.0), product of:
              0.2452502 = queryWeight, product of:
                2.3546922 = boost
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.015153716 = queryNorm
              1.0739303 = fieldWeight in 909, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8731537 = idf(docFreq=124, maxDocs=44421)
                0.078125 = fieldNorm(doc=909)
          0.558197 = weight(abstract_txt:multiword in 909) [ClassicSimilarity], result of:
            0.558197 = score(doc=909,freq=2.0), product of:
              0.58360344 = queryWeight, product of:
                4.4487095 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.015153716 = queryNorm
              0.9564663 = fieldWeight in 909, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.078125 = fieldNorm(doc=909)
        0.12 = coord(3/25)
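The content-based list uses a weighted query of 25 clauses (the denominator of each coord factor). In ClassicSimilarity each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; the summed contributions are then scaled by coord = matched clauses / total clauses. The Python sketch below recomputes the score of the first entry (doc 2536) from its eight printed term entries. The per-term boosts and queryNorm are copied as given, since queryNorm depends on the weights of all 25 clauses, most of which are not shown; the class and function names are illustrative only.

```python
import math
from dataclasses import dataclass

MAX_DOCS = 44421          # maxDocs in the explain output
QUERY_NORM = 0.015153716  # queryNorm, copied as given
TOTAL_CLAUSES = 25        # denominator of coord(8/25)


@dataclass
class TermMatch:
    term: str
    freq: float     # termFreq of the term in the matched document
    doc_freq: int   # number of indexed documents containing the term
    boost: float    # per-term query boost, copied from the output


def idf(doc_freq: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(m: TermMatch, field_norm: float) -> float:
    """queryWeight * fieldWeight for one matching term."""
    query_weight = m.boost * idf(m.doc_freq) * QUERY_NORM
    field_weight = math.sqrt(m.freq) * idf(m.doc_freq) * field_norm
    return query_weight * field_weight


def document_score(matches: list[TermMatch], field_norm: float) -> float:
    """coord(matched/total) times the sum of the per-term scores."""
    coord = len(matches) / TOTAL_CLAUSES
    return coord * sum(term_score(m, field_norm) for m in matches)


# The eight abstract_txt terms matched by doc 2536 (Nagy T., 2014).
doc_2536 = [
    TermMatch("previously", 1.0, 260, 1.0512366),
    TermMatch("based", 8.0, 5005, 1.0904983),
    TermMatch("dictionary", 1.0, 157, 1.1372145),
    TermMatch("detection", 8.0, 137, 1.160398),
    TermMatch("named", 4.0, 132, 1.1667197),
    TermMatch("text", 1.0, 2122, 1.3843766),
    TermMatch("method", 5.0, 1342, 1.5412582),
    TermMatch("multiword", 11.0, 20, 4.4487095),
]
score = document_score(doc_2536, field_norm=0.0390625)
print(f"score = {score:.8f}")
```

The result should agree with the 0.29889682 shown for the first entry up to floating-point rounding. Most of it comes from `multiword`, whose docFreq of 20 gives it the highest idf and which also carries the largest boost.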