Document (#40623)

Gil-Leiva, I.
SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules
Knowledge organization. 44(2017) no.3, p.139-162
Indexing is contextualized and a brief description is provided of some of the most used automatic indexing systems. We describe SISA, a system which uses location heuristics rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value of certain parts of the articles for indexing, such as titles, abstracts, keywords, headings, first paragraph, conclusions and references, and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and F-measure. Automatic indexing of the articles with location heuristics rules gave the best results on these evaluation measures.
Contribution to a special issue "New Trends for Knowledge Organization", guest editor: Renato Rocha Souza.
Automatic indexing
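
The abstract names recall, precision and F-measure as the evaluation measures but does not spell out the protocol. The following is a minimal sketch, assuming a set-based comparison of automatically assigned terms against a reference set of human-assigned terms for one article. The function name `evaluate_indexing` and the sample terms are hypothetical and are not taken from the article.

```python
def evaluate_indexing(assigned, reference):
    """Set-based precision, recall and F-measure for one document's index terms.

    assigned  -- terms proposed by the automatic indexer (assumed input)
    reference -- terms chosen by a human indexer (assumed gold standard)
    """
    assigned, reference = set(assigned), set(reference)
    hits = len(assigned & reference)  # terms both sides agree on
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(reference) if reference else 0.0
    # Balanced F-measure (harmonic mean); defined as 0 when nothing matches.
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure


# Hypothetical terms for illustration only, not data from the study.
p, r, f = evaluate_indexing(
    ["fruit growing", "irrigation", "soil", "citrus"],
    ["fruit growing", "citrus", "pest control"],
)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Averaging these per-article values over a test collection is one common way to compare two rule sets, though the article may aggregate differently.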

Similar documents (author)

  1. Leiva, I.G. => Gil-Leiva, I.: 5.56
    5.564393 = sum of:
      5.564393 = weight(author_txt:leiva in 98) [ClassicSimilarity], result of:
        5.564393 = fieldWeight in 98, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.4375 = fieldNorm(doc=98)
  2. Mederos, A. Leiva- => Leiva-Mederos, A.: 4.77
    4.7694798 = sum of:
      4.7694798 = weight(author_txt:leiva in 4166) [ClassicSimilarity], result of:
        4.7694798 = fieldWeight in 4166, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=4166)
  3. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7694798 = sum of:
      4.7694798 = weight(author_txt:leiva in 735) [ClassicSimilarity], result of:
        4.7694798 = fieldWeight in 735, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=735)
  4. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7694798 = sum of:
      4.7694798 = weight(author_txt:leiva in 912) [ClassicSimilarity], result of:
        4.7694798 = fieldWeight in 912, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=912)
  5. Leiva, I. Gil- => Gil-Leiva, I.: 4.77
    4.7694798 = sum of:
      4.7694798 = weight(author_txt:leiva in 1739) [ClassicSimilarity], result of:
        4.7694798 = fieldWeight in 1739, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=1739)
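
The author scores above are each a single fieldWeight, the product of tf, idf and fieldNorm. A minimal sketch that recomputes them follows. The tf form (square root of the frequency) is read off the listing; the idf form `1 + ln(maxDocs / (docFreq + 1))` is inferred from the printed values and may differ slightly in other Lucene versions. The function names are illustrative, not Lucene API.

```python
import math


def classic_tf(freq):
    # tf(freq=2.0) = 1.4142135 in the listing, i.e. sqrt(freq)
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Inferred from idf(docFreq=14, maxDocs=44421) = 8.993418
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from entries 1 and 2 of the author list.
score_doc_98 = field_weight(2.0, 14, 44421, 0.4375)
score_doc_4166 = field_weight(2.0, 14, 44421, 0.375)
print(round(score_doc_98, 4), round(score_doc_4166, 4))
```

The only factor that differs between the entries is fieldNorm, which is why the first hit (0.4375) outranks the others (0.375) for the same term and frequency.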

Similar documents (content)

  1. Kim, P.K.: ¬An automatic indexing of compound words based on mutual information for Korean text retrieval (1995) 0.15
    0.14582996 = sum of:
      0.14582996 = product of:
        0.7291498 = sum of:
          0.01223812 = weight(abstract_txt:used in 1620) [ClassicSimilarity], result of:
            0.01223812 = score(doc=1620,freq=1.0), product of:
              0.038883578 = queryWeight, product of:
                1.1028295 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.010502208 = queryNorm
              0.3147375 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.09375 = fieldNorm(doc=1620)
          0.012409239 = weight(abstract_txt:system in 1620) [ClassicSimilarity], result of:
            0.012409239 = score(doc=1620,freq=1.0), product of:
              0.039245196 = queryWeight, product of:
                1.1079458 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.010502208 = queryNorm
              0.31619766 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.09375 = fieldNorm(doc=1620)
          0.13609192 = weight(abstract_txt:automatic in 1620) [ClassicSimilarity], result of:
            0.13609192 = score(doc=1620,freq=1.0), product of:
              0.27939484 = queryWeight, product of:
                5.120296 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.010502208 = queryNorm
              0.48709533 = fieldWeight in 1620, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=1620)
          0.23965472 = weight(abstract_txt:indexing in 1620) [ClassicSimilarity], result of:
            0.23965472 = score(doc=1620,freq=4.0), product of:
              0.29380864 = queryWeight, product of:
                6.4307823 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.010502208 = queryNorm
              0.815683 = fieldWeight in 1620, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.09375 = fieldNorm(doc=1620)
          0.32875586 = weight(abstract_txt:rules in 1620) [ClassicSimilarity], result of:
            0.32875586 = score(doc=1620,freq=2.0), product of:
              0.4733533 = queryWeight, product of:
                8.604051 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.010502208 = queryNorm
              0.69452536 = fieldWeight in 1620, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.09375 = fieldNorm(doc=1620)
        0.2 = coord(5/25)
  2. Mundgod, M.B.; Prasad, A.R.D.: Automatic identification of bibliographic data elements from the title pages of documents : a heuristic approach (1996) 0.14
    0.14134693 = sum of:
      0.14134693 = product of:
        0.8834183 = sum of:
          0.021493433 = weight(abstract_txt:system in 1397) [ClassicSimilarity], result of:
            0.021493433 = score(doc=1397,freq=3.0), product of:
              0.039245196 = queryWeight, product of:
                1.1079458 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.010502208 = queryNorm
              0.5476704 = fieldWeight in 1397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.09375 = fieldNorm(doc=1397)
          0.0075453785 = weight(abstract_txt:with in 1397) [ClassicSimilarity], result of:
            0.0075453785 = score(doc=1397,freq=1.0), product of:
              0.0322434 = queryWeight, product of:
                1.2299609 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.010502208 = queryNorm
              0.23401311 = fieldWeight in 1397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.09375 = fieldNorm(doc=1397)
          0.19246304 = weight(abstract_txt:automatic in 1397) [ClassicSimilarity], result of:
            0.19246304 = score(doc=1397,freq=2.0), product of:
              0.27939484 = queryWeight, product of:
                5.120296 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.010502208 = queryNorm
              0.68885684 = fieldWeight in 1397, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.09375 = fieldNorm(doc=1397)
          0.66191643 = weight(abstract_txt:heuristics in 1397) [ClassicSimilarity], result of:
            0.66191643 = score(doc=1397,freq=3.0), product of:
              0.5233169 = queryWeight, product of:
                6.3970194 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.010502208 = queryNorm
              1.2648481 = fieldWeight in 1397, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.09375 = fieldNorm(doc=1397)
        0.16 = coord(4/25)
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.14
    0.14080015 = sum of:
      0.14080015 = product of:
        0.5866673 = sum of:
          0.024480306 = weight(abstract_txt:conclusions in 936) [ClassicSimilarity], result of:
            0.024480306 = score(doc=936,freq=1.0), product of:
              0.070179954 = queryWeight, product of:
                1.0476514 = boost
                6.3784575 = idf(docFreq=204, maxDocs=44421)
                0.010502208 = queryNorm
              0.3488219 = fieldWeight in 936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3784575 = idf(docFreq=204, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
          0.0071389037 = weight(abstract_txt:used in 936) [ClassicSimilarity], result of:
            0.0071389037 = score(doc=936,freq=1.0), product of:
              0.038883578 = queryWeight, product of:
                1.1028295 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.010502208 = queryNorm
              0.18359688 = fieldWeight in 936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
          0.006224619 = weight(abstract_txt:with in 936) [ClassicSimilarity], result of:
            0.006224619 = score(doc=936,freq=2.0), product of:
              0.0322434 = queryWeight, product of:
                1.2299609 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.010502208 = queryNorm
              0.19305095 = fieldWeight in 936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
          0.026524395 = weight(abstract_txt:scientific in 936) [ClassicSimilarity], result of:
            0.026524395 = score(doc=936,freq=2.0), product of:
              0.07403417 = queryWeight, product of:
                1.5217432 = boost
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.010502208 = queryNorm
              0.35827234 = fieldWeight in 936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6324444 = idf(docFreq=1174, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
          0.023822902 = weight(abstract_txt:best in 936) [ClassicSimilarity], result of:
            0.023822902 = score(doc=936,freq=1.0), product of:
              0.08683104 = queryWeight, product of:
                1.6480211 = boost
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.010502208 = queryNorm
              0.2743593 = fieldWeight in 936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0168557 = idf(docFreq=799, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
          0.49847615 = weight(abstract_txt:heuristics in 936) [ClassicSimilarity], result of:
            0.49847615 = score(doc=936,freq=5.0), product of:
              0.5233169 = queryWeight, product of:
                6.3970194 = boost
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.010502208 = queryNorm
              0.95253205 = fieldWeight in 936, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.7894444 = idf(docFreq=49, maxDocs=44421)
                0.0546875 = fieldNorm(doc=936)
        0.24 = coord(6/25)
  4. Panyr, J.: Information retrieval techniques in rule-based expert systems (1991) 0.13
    0.12674943 = sum of:
      0.12674943 = product of:
        0.6337471 = sum of:
          0.011538211 = weight(abstract_txt:used in 3035) [ClassicSimilarity], result of:
            0.011538211 = score(doc=3035,freq=2.0), product of:
              0.038883578 = queryWeight, product of:
                1.1028295 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.010502208 = queryNorm
              0.29673737 = fieldWeight in 3035, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.0625 = fieldNorm(doc=3035)
          0.014328956 = weight(abstract_txt:system in 3035) [ClassicSimilarity], result of:
            0.014328956 = score(doc=3035,freq=3.0), product of:
              0.039245196 = queryWeight, product of:
                1.1079458 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.010502208 = queryNorm
              0.36511362 = fieldWeight in 3035, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.0625 = fieldNorm(doc=3035)
          0.1814559 = weight(abstract_txt:automatic in 3035) [ClassicSimilarity], result of:
            0.1814559 = score(doc=3035,freq=4.0), product of:
              0.27939484 = queryWeight, product of:
                5.120296 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.010502208 = queryNorm
              0.64946043 = fieldWeight in 3035, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=3035)
          0.0798849 = weight(abstract_txt:indexing in 3035) [ClassicSimilarity], result of:
            0.0798849 = score(doc=3035,freq=1.0), product of:
              0.29380864 = queryWeight, product of:
                6.4307823 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.010502208 = queryNorm
              0.27189434 = fieldWeight in 3035, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.0625 = fieldNorm(doc=3035)
          0.3465391 = weight(abstract_txt:rules in 3035) [ClassicSimilarity], result of:
            0.3465391 = score(doc=3035,freq=5.0), product of:
              0.4733533 = queryWeight, product of:
                8.604051 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.010502208 = queryNorm
              0.732094 = fieldWeight in 3035, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.0625 = fieldNorm(doc=3035)
        0.2 = coord(5/25)
  5. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991) 0.12
    0.11622614 = sum of:
      0.11622614 = product of:
        0.5811307 = sum of:
          0.010198434 = weight(abstract_txt:used in 6680) [ClassicSimilarity], result of:
            0.010198434 = score(doc=6680,freq=1.0), product of:
              0.038883578 = queryWeight, product of:
                1.1028295 = boost
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.010502208 = queryNorm
              0.26228127 = fieldWeight in 6680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3572001 = idf(docFreq=4205, maxDocs=44421)
                0.078125 = fieldNorm(doc=6680)
          0.010341033 = weight(abstract_txt:system in 6680) [ClassicSimilarity], result of:
            0.010341033 = score(doc=6680,freq=1.0), product of:
              0.039245196 = queryWeight, product of:
                1.1079458 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.010502208 = queryNorm
              0.26349807 = fieldWeight in 6680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.078125 = fieldNorm(doc=6680)
          0.0319308 = weight(abstract_txt:provided in 6680) [ClassicSimilarity], result of:
            0.0319308 = score(doc=6680,freq=1.0), product of:
              0.08321797 = queryWeight, product of:
                1.6133695 = boost
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.010502208 = queryNorm
              0.3837008 = fieldWeight in 6680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9113703 = idf(docFreq=888, maxDocs=44421)
                0.078125 = fieldNorm(doc=6680)
          0.1412179 = weight(abstract_txt:indexing in 6680) [ClassicSimilarity], result of:
            0.1412179 = score(doc=6680,freq=2.0), product of:
              0.29380864 = queryWeight, product of:
                6.4307823 = boost
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.010502208 = queryNorm
              0.48064584 = fieldWeight in 6680, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3503094 = idf(docFreq=1557, maxDocs=44421)
                0.078125 = fieldNorm(doc=6680)
          0.3874425 = weight(abstract_txt:rules in 6680) [ClassicSimilarity], result of:
            0.3874425 = score(doc=6680,freq=4.0), product of:
              0.4733533 = queryWeight, product of:
                8.604051 = boost
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.010502208 = queryNorm
              0.81850594 = fieldWeight in 6680, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.238438 = idf(docFreq=640, maxDocs=44421)
                0.078125 = fieldNorm(doc=6680)
        0.2 = coord(5/25)
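
The content scores above combine, per matching term, a queryWeight (boost * idf * queryNorm) with a fieldWeight (tf * idf * fieldNorm), sum the products, and scale the sum by the coord factor (matched terms / query terms). A minimal sketch that recomputes the score of entry 1 (doc 1620) from the figures in the listing follows. The idf form is inferred from the printed values, and the helper names are illustrative rather than Lucene API.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.010502208  # queryNorm as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Form inferred from the printed idf values
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) copied from entry 1, doc 1620
terms_1620 = [
    ("used", 1.0, 4205, 1.1028295),
    ("system", 1.0, 4140, 1.1079458),
    ("automatic", 1.0, 668, 5.120296),
    ("indexing", 4.0, 1557, 6.4307823),
    ("rules", 2.0, 640, 8.604051),
]

field_norm_1620 = 0.09375
coord = len(terms_1620) / 25  # 5 of 25 query terms matched

raw_sum = sum(term_score(f, df, b, field_norm_1620) for _, f, df, b in terms_1620)
score_1620 = coord * raw_sum
print(round(raw_sum, 4), round(score_1620, 4))
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "heuristics" (docFreq=49) dominate the ranking in entries 2 and 3 even when few query terms match.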