Document (#33374)

Author
Can, F.
Kocberber, S.
Balcik, E.
Kaynak, C.
Ocalan, H.C.
Title
Information retrieval on Turkish texts
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp.407-421
Year
2008
Abstract
In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on retrieval performance; these include scalability issues, query and document length effects, and the use of a stopword list in indexing.
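
The simple word truncation approach named in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the function name `truncate_stem` and the default prefix length of 5 are assumptions made for illustration, since the abstract does not state which lengths were tested.

```python
def truncate_stem(word, n=5):
    """Return the first n characters of word as its stem.

    Hypothetical helper: n=5 is an illustrative default,
    not a value taken from the abstract.
    """
    # Words shorter than n are returned unchanged.
    return word[:n]


# "kitaplar" (books) and "kitap" (book) conflate to the same stem.
stems = [truncate_stem(w) for w in ["kitaplar", "kitap", "ev"]]
```

Truncation needs no dictionary or morphological analysis, which is why it serves as the simplest of the stemming options being compared.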

Similar documents (content)

  1. Ekmekcioglu, F.C.; Lynch, M.F.; Willett, P.: Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases (1995) 0.51
    0.5063446 = sum of:
      0.5063446 = product of:
        1.8083736 = sum of:
          0.05409347 = weight(abstract_txt:matching in 5865) [ClassicSimilarity], result of:
            0.05409347 = score(doc=5865,freq=1.0), product of:
              0.095497906 = queryWeight, product of:
                1.0172693 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.015537397 = queryNorm
              0.5664362 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.009731611 = weight(abstract_txt:that in 5865) [ClassicSimilarity], result of:
            0.009731611 = score(doc=5865,freq=1.0), product of:
              0.043892894 = queryWeight, product of:
                1.194529 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015537397 = queryNorm
              0.22171268 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.14358507 = weight(abstract_txt:stemming in 5865) [ClassicSimilarity], result of:
            0.14358507 = score(doc=5865,freq=2.0), product of:
              0.14530933 = queryWeight, product of:
                1.2548324 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015537397 = queryNorm
              0.98813385 = fieldWeight in 5865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.038838938 = weight(abstract_txt:document in 5865) [ClassicSimilarity], result of:
            0.038838938 = score(doc=5865,freq=1.0), product of:
              0.09647591 = queryWeight, product of:
                1.4459838 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015537397 = queryNorm
              0.40257657 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.22769943 = weight(abstract_txt:stopword in 5865) [ClassicSimilarity], result of:
            0.22769943 = score(doc=5865,freq=1.0), product of:
              0.24896517 = queryWeight, product of:
                1.642511 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.015537397 = queryNorm
              0.91458344 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          0.05152316 = weight(abstract_txt:retrieval in 5865) [ClassicSimilarity], result of:
            0.05152316 = score(doc=5865,freq=1.0), product of:
              0.15808438 = queryWeight, product of:
                2.9266343 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.015537397 = queryNorm
              0.3259219 = fieldWeight in 5865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
          1.2829019 = weight(abstract_txt:turkish in 5865) [ClassicSimilarity], result of:
            1.2829019 = score(doc=5865,freq=6.0), product of:
              0.62567616 = queryWeight, product of:
                4.509978 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.015537397 = queryNorm
              2.0504248 = fieldWeight in 5865, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.09375 = fieldNorm(doc=5865)
        0.28 = coord(7/25)
    
  2. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.38
    0.38149318 = sum of:
      0.38149318 = product of:
        1.1921662 = sum of:
          0.014507029 = weight(abstract_txt:that in 429) [ClassicSimilarity], result of:
            0.014507029 = score(doc=429,freq=5.0), product of:
              0.043892894 = queryWeight, product of:
                1.194529 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015537397 = queryNorm
              0.33050975 = fieldWeight in 429, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.095723376 = weight(abstract_txt:stemming in 429) [ClassicSimilarity], result of:
            0.095723376 = score(doc=429,freq=2.0), product of:
              0.14530933 = queryWeight, product of:
                1.2548324 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015537397 = queryNorm
              0.6587559 = fieldWeight in 429, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.024214346 = weight(abstract_txt:approach in 429) [ClassicSimilarity], result of:
            0.024214346 = score(doc=429,freq=2.0), product of:
              0.07322735 = queryWeight, product of:
                1.2597682 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.015537397 = queryNorm
              0.33067352 = fieldWeight in 429, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.05109402 = weight(abstract_txt:investigate in 429) [ClassicSimilarity], result of:
            0.05109402 = score(doc=429,freq=1.0), product of:
              0.15178011 = queryWeight, product of:
                1.8136832 = boost
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.015537397 = queryNorm
              0.33663186 = fieldWeight in 429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.38611 = idf(docFreq=552, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.074369304 = weight(abstract_txt:word in 429) [ClassicSimilarity], result of:
            0.074369304 = score(doc=429,freq=2.0), product of:
              0.15472268 = queryWeight, product of:
                1.8311797 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.015537397 = queryNorm
              0.48066196 = fieldWeight in 429, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.19958605 = weight(abstract_txt:truncation in 429) [ClassicSimilarity], result of:
            0.19958605 = score(doc=429,freq=1.0), product of:
              0.3764624 = queryWeight, product of:
                2.856372 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.015537397 = queryNorm
              0.530162 = fieldWeight in 429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.03434877 = weight(abstract_txt:retrieval in 429) [ClassicSimilarity], result of:
            0.03434877 = score(doc=429,freq=1.0), product of:
              0.15808438 = queryWeight, product of:
                2.9266343 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.015537397 = queryNorm
              0.21728125 = fieldWeight in 429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.6983233 = weight(abstract_txt:turkish in 429) [ClassicSimilarity], result of:
            0.6983233 = score(doc=429,freq=4.0), product of:
              0.62567616 = queryWeight, product of:
                4.509978 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.015537397 = queryNorm
              1.1161098 = fieldWeight in 429, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
        0.32 = coord(8/25)
    
  3. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.20
    0.20298226 = sum of:
      0.20298226 = product of:
        0.72493666 = sum of:
          0.011237095 = weight(abstract_txt:that in 1896) [ClassicSimilarity], result of:
            0.011237095 = score(doc=1896,freq=3.0), product of:
              0.043892894 = queryWeight, product of:
                1.194529 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015537397 = queryNorm
              0.25601172 = fieldWeight in 1896, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.0366177 = weight(abstract_txt:document in 1896) [ClassicSimilarity], result of:
            0.0366177 = score(doc=1896,freq=2.0), product of:
              0.09647591 = queryWeight, product of:
                1.4459838 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015537397 = queryNorm
              0.3795528 = fieldWeight in 1896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.055842277 = weight(abstract_txt:performance in 1896) [ClassicSimilarity], result of:
            0.055842277 = score(doc=1896,freq=3.0), product of:
              0.11166142 = queryWeight, product of:
                1.5556272 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.015537397 = queryNorm
              0.5001036 = fieldWeight in 1896, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.07858539 = weight(abstract_txt:query in 1896) [ClassicSimilarity], result of:
            0.07858539 = score(doc=1896,freq=5.0), product of:
              0.11826948 = queryWeight, product of:
                1.6009963 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015537397 = queryNorm
              0.6644604 = fieldWeight in 1896, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.08398828 = weight(abstract_txt:effects in 1896) [ClassicSimilarity], result of:
            0.08398828 = score(doc=1896,freq=1.0), product of:
              0.24199758 = queryWeight, product of:
                2.8048208 = boost
                5.5529995 = idf(docFreq=467, maxDocs=44421)
                0.015537397 = queryNorm
              0.34706247 = fieldWeight in 1896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5529995 = idf(docFreq=467, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.3991721 = weight(abstract_txt:truncation in 1896) [ClassicSimilarity], result of:
            0.3991721 = score(doc=1896,freq=4.0), product of:
              0.3764624 = queryWeight, product of:
                2.856372 = boost
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.015537397 = queryNorm
              1.060324 = fieldWeight in 1896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.482592 = idf(docFreq=24, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
          0.059493814 = weight(abstract_txt:retrieval in 1896) [ClassicSimilarity], result of:
            0.059493814 = score(doc=1896,freq=3.0), product of:
              0.15808438 = queryWeight, product of:
                2.9266343 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.015537397 = queryNorm
              0.37634215 = fieldWeight in 1896, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=1896)
        0.28 = coord(7/25)
    
  4. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.16
    0.16022302 = sum of:
      0.16022302 = product of:
        0.5722251 = sum of:
          0.04578553 = weight(abstract_txt:length in 41) [ClassicSimilarity], result of:
            0.04578553 = score(doc=41,freq=1.0), product of:
              0.11197223 = queryWeight, product of:
                1.1015245 = boost
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.015537397 = queryNorm
              0.40890077 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5424123 = idf(docFreq=173, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.014507029 = weight(abstract_txt:that in 41) [ClassicSimilarity], result of:
            0.014507029 = score(doc=41,freq=5.0), product of:
              0.043892894 = queryWeight, product of:
                1.194529 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015537397 = queryNorm
              0.33050975 = fieldWeight in 41, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.025892625 = weight(abstract_txt:document in 41) [ClassicSimilarity], result of:
            0.025892625 = score(doc=41,freq=1.0), product of:
              0.09647591 = queryWeight, product of:
                1.4459838 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.015537397 = queryNorm
              0.26838437 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.032240555 = weight(abstract_txt:performance in 41) [ClassicSimilarity], result of:
            0.032240555 = score(doc=41,freq=1.0), product of:
              0.11166142 = queryWeight, product of:
                1.5556272 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.015537397 = queryNorm
              0.28873494 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.07028891 = weight(abstract_txt:query in 41) [ClassicSimilarity], result of:
            0.07028891 = score(doc=41,freq=4.0), product of:
              0.11826948 = queryWeight, product of:
                1.6009963 = boost
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.015537397 = queryNorm
              0.5943115 = fieldWeight in 41, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.754492 = idf(docFreq=1039, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.03434877 = weight(abstract_txt:retrieval in 41) [ClassicSimilarity], result of:
            0.03434877 = score(doc=41,freq=1.0), product of:
              0.15808438 = queryWeight, product of:
                2.9266343 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.015537397 = queryNorm
              0.21728125 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
          0.34916165 = weight(abstract_txt:turkish in 41) [ClassicSimilarity], result of:
            0.34916165 = score(doc=41,freq=1.0), product of:
              0.62567616 = queryWeight, product of:
                4.509978 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.015537397 = queryNorm
              0.5580549 = fieldWeight in 41, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.0625 = fieldNorm(doc=41)
        0.28 = coord(7/25)
    
  5. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.14
    0.1424669 = sum of:
      0.1424669 = product of:
        0.50881034 = sum of:
          0.016219351 = weight(abstract_txt:that in 3037) [ClassicSimilarity], result of:
            0.016219351 = score(doc=3037,freq=4.0), product of:
              0.043892894 = queryWeight, product of:
                1.194529 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.015537397 = queryNorm
              0.3695211 = fieldWeight in 3037, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.14654589 = weight(abstract_txt:stemming in 3037) [ClassicSimilarity], result of:
            0.14654589 = score(doc=3037,freq=3.0), product of:
              0.14530933 = queryWeight, product of:
                1.2548324 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.015537397 = queryNorm
              1.0085099 = fieldWeight in 3037, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.037070498 = weight(abstract_txt:approach in 3037) [ClassicSimilarity], result of:
            0.037070498 = score(doc=3037,freq=3.0), product of:
              0.07322735 = queryWeight, product of:
                1.2597682 = boost
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.015537397 = queryNorm
              0.5062384 = fieldWeight in 3037, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.741144 = idf(docFreq=2864, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.16000415 = weight(abstract_txt:stemmer in 3037) [ClassicSimilarity], result of:
            0.16000415 = score(doc=3037,freq=1.0), product of:
              0.22221446 = queryWeight, product of:
                1.5517621 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015537397 = queryNorm
              0.72004384 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.04030069 = weight(abstract_txt:performance in 3037) [ClassicSimilarity], result of:
            0.04030069 = score(doc=3037,freq=1.0), product of:
              0.11166142 = queryWeight, product of:
                1.5556272 = boost
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.015537397 = queryNorm
              0.36091867 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.619759 = idf(docFreq=1189, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.0657338 = weight(abstract_txt:word in 3037) [ClassicSimilarity], result of:
            0.0657338 = score(doc=3037,freq=1.0), product of:
              0.15472268 = queryWeight, product of:
                1.8311797 = boost
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.015537397 = queryNorm
              0.42484915 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4380693 = idf(docFreq=524, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
          0.042935964 = weight(abstract_txt:retrieval in 3037) [ClassicSimilarity], result of:
            0.042935964 = score(doc=3037,freq=1.0), product of:
              0.15808438 = queryWeight, product of:
                2.9266343 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.015537397 = queryNorm
              0.27160156 = fieldWeight in 3037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3037)
        0.28 = coord(7/25)