Document (#14866)

Author
Ekmekcioglu, F.C.
Lynch, M.F.
Willett, P.
Title
Development and evaluation of conflation techniques for the implementation of a document retrieval system for Turkish text databases
Source
New review of document and text management. 1995, no.1, pp. 131-146
Year
1995
Abstract
Considers the language processing techniques necessary for the implementation of a document retrieval system for Turkish text databases. Introduces the main characteristics of the Turkish language. Discusses the development of a stopword list and the evaluation of a stemming algorithm that takes account of the language's morphological structure. A two-level description of Turkish morphology, developed at Bilkent University, Ankara, is incorporated into the morphological parser PC-KIMMO to carry out stemming in Turkish databases. Describes the evaluation of string similarity measures (n-gram matching techniques) for Turkish. Reports experiments on six different Turkish text corpora.
Theme
Computational linguistics
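
The abstract mentions n-gram matching as a string similarity measure but does not say which coefficient was used. As an illustration only, the sketch below assumes the Dice coefficient over distinct character bigrams, with no padding and with input words already normalised to one case. The function names ngrams and dice_similarity are hypothetical and are not taken from the paper.

```python
def ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of distinct character n-grams in a word.

    Assumption: the word is already case-normalised; no padding is added.
    """
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the distinct n-grams of two words."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        # Both words are shorter than n, so there is nothing to compare.
        return 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


# "kitap" (book) and "kitaplar" (books) share the bigrams ki, it, ta, ap.
similarity = dice_similarity("kitap", "kitaplar")
```

Here "kitap" has 4 distinct bigrams and "kitaplar" has 7, with 4 in common, so the coefficient is 2 * 4 / (4 + 7) = 8/11. A measure of this kind can group inflected variants of a word without a morphological parser, which is why it is a natural comparison point for stemming in a highly suffixing language.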

Similar documents (author)

  1. Lynch, C.A.: The use of heuristics in user interfaces for online information retrieval systems (1987) 4.98
    4.9773736 = sum of:
      4.9773736 = weight(author_txt:lynch in 2235) [ClassicSimilarity], result of:
        4.9773736 = fieldWeight in 2235, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.963798 = idf(docFreq=41, maxDocs=44421)
          0.625 = fieldNorm(doc=2235)
    
  2. Lynch, C.A.: The MELVYL system : looking back, looking forward (1992) 4.98
    4.9773736 = sum of:
      4.9773736 = weight(author_txt:lynch in 2251) [ClassicSimilarity], result of:
        4.9773736 = fieldWeight in 2251, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.963798 = idf(docFreq=41, maxDocs=44421)
          0.625 = fieldNorm(doc=2251)
    
  3. Lynch, M.J.: Access technology in academic libraries (1992) 4.98
    4.9773736 = sum of:
      4.9773736 = weight(author_txt:lynch in 2343) [ClassicSimilarity], result of:
        4.9773736 = fieldWeight in 2343, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.963798 = idf(docFreq=41, maxDocs=44421)
          0.625 = fieldNorm(doc=2343)
    
  4. Lynch, C.A.: Subject access in MELVYL : reducing search results to manageable size (1990) 4.98
    4.9773736 = sum of:
      4.9773736 = weight(author_txt:lynch in 2680) [ClassicSimilarity], result of:
        4.9773736 = fieldWeight in 2680, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.963798 = idf(docFreq=41, maxDocs=44421)
          0.625 = fieldNorm(doc=2680)
    
  5. Lynch, C.A.: The next generation of public access information retrieval systems for research libraries : lessons from ten years of the MELVYL system (1992) 4.98
    4.9773736 = sum of:
      4.9773736 = weight(author_txt:lynch in 2970) [ClassicSimilarity], result of:
        4.9773736 = fieldWeight in 2970, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.963798 = idf(docFreq=41, maxDocs=44421)
          0.625 = fieldNorm(doc=2970)
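
Each author match above has the same score because the tf, idf and fieldNorm factors are identical for all five records. The sketch below reproduces that arithmetic. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions inferred from the numbers in the listing; they agree with the classic Lucene TF-IDF definitions, but the function names are hypothetical and the fieldNorm value is simply copied from the listing.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing for author_txt:lynch.
author_score = field_weight(freq=1.0, doc_freq=41, max_docs=44421, field_norm=0.625)
```

With these inputs the idf comes out at about 7.9638 and the score at about 4.9774, matching the listing up to single-precision rounding.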
    

Similar documents (content)

  1. Can, F.; Kocberber, S.; Balcik, E.; Kaynak, C.; Ocalan, H.C.: Information retrieval on Turkish texts (2008) 0.39
    0.3932776 = sum of:
      0.3932776 = product of:
        1.4045628 = sum of:
          0.03240191 = weight(abstract_txt:matching in 2373) [ClassicSimilarity], result of:
            0.03240191 = score(doc=2373,freq=1.0), product of:
              0.057203107 = queryWeight, product of:
                1.0369772 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.009129999 = queryNorm
              0.5664362 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.024689838 = weight(abstract_txt:retrieval in 2373) [ClassicSimilarity], result of:
            0.024689838 = score(doc=2373,freq=4.0), product of:
              0.037876926 = queryWeight, product of:
                1.1933333 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009129999 = queryNorm
              0.6518438 = fieldWeight in 2373, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.021319559 = weight(abstract_txt:language in 2373) [ClassicSimilarity], result of:
            0.021319559 = score(doc=2373,freq=1.0), product of:
              0.054521535 = queryWeight, product of:
                1.4317211 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.009129999 = queryNorm
              0.39103007 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.032900922 = weight(abstract_txt:document in 2373) [ClassicSimilarity], result of:
            0.032900922 = score(doc=2373,freq=2.0), product of:
              0.057788927 = queryWeight, product of:
                1.4739974 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.009129999 = queryNorm
              0.5693292 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.13639164 = weight(abstract_txt:stopword in 2373) [ClassicSimilarity], result of:
            0.13639164 = score(doc=2373,freq=1.0), product of:
              0.1491298 = queryWeight, product of:
                1.674332 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.009129999 = queryNorm
              0.91458344 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          0.121632636 = weight(abstract_txt:stemming in 2373) [ClassicSimilarity], result of:
            0.121632636 = score(doc=2373,freq=1.0), product of:
              0.17408018 = queryWeight, product of:
                2.5582857 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.009129999 = queryNorm
              0.69871616 = fieldWeight in 2373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
          1.0352263 = weight(abstract_txt:turkish in 2373) [ClassicSimilarity], result of:
            1.0352263 = score(doc=2373,freq=2.0), product of:
              0.87448454 = queryWeight, product of:
                10.727153 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.009129999 = queryNorm
              1.1838132 = fieldWeight in 2373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.09375 = fieldNorm(doc=2373)
        0.28 = coord(7/25)
    
  2. Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.18
    0.18017411 = sum of:
      0.18017411 = product of:
        0.643479 = sum of:
          0.010287432 = weight(abstract_txt:retrieval in 3910) [ClassicSimilarity], result of:
            0.010287432 = score(doc=3910,freq=1.0), product of:
              0.037876926 = queryWeight, product of:
                1.1933333 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009129999 = queryNorm
              0.27160156 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.042172235 = weight(abstract_txt:corpora in 3910) [ClassicSimilarity], result of:
            0.042172235 = score(doc=3910,freq=1.0), product of:
              0.07700362 = queryWeight, product of:
                1.2031367 = boost
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.009129999 = queryNorm
              0.5476656 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.01012 = idf(docFreq=108, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.025125343 = weight(abstract_txt:language in 3910) [ClassicSimilarity], result of:
            0.025125343 = score(doc=3910,freq=2.0), product of:
              0.054521535 = queryWeight, product of:
                1.4317211 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.009129999 = queryNorm
              0.46083337 = fieldWeight in 3910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.12885356 = weight(abstract_txt:morphology in 3910) [ClassicSimilarity], result of:
            0.12885356 = score(doc=3910,freq=2.0), product of:
              0.12869085 = queryWeight, product of:
                1.5553683 = boost
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.009129999 = queryNorm
              1.0012643 = fieldWeight in 3910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.06241 = idf(docFreq=13, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.024232604 = weight(abstract_txt:text in 3910) [ClassicSimilarity], result of:
            0.024232604 = score(doc=3910,freq=1.0), product of:
              0.07675981 = queryWeight, product of:
                2.0805922 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.009129999 = queryNorm
              0.3156939 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.10136053 = weight(abstract_txt:stemming in 3910) [ClassicSimilarity], result of:
            0.10136053 = score(doc=3910,freq=1.0), product of:
              0.17408018 = queryWeight, product of:
                2.5582857 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.009129999 = queryNorm
              0.58226347 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
          0.31144732 = weight(abstract_txt:morphological in 3910) [ClassicSimilarity], result of:
            0.31144732 = score(doc=3910,freq=6.0), product of:
              0.20247716 = queryWeight, product of:
                2.759068 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.009129999 = queryNorm
              1.538185 = fieldWeight in 3910, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3910)
        0.28 = coord(7/25)
    
  3. Can, F.; Kocberber, S.; Baglioglu, O.; Kardas, S.; Ocalan, H.C.; Uyar, E.: New event detection and topic tracking in Turkish (2010) 0.18
    0.17810242 = sum of:
      0.17810242 = product of:
        1.1131401 = sum of:
          0.008229946 = weight(abstract_txt:retrieval in 429) [ClassicSimilarity], result of:
            0.008229946 = score(doc=429,freq=1.0), product of:
              0.037876926 = queryWeight, product of:
                1.1933333 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009129999 = queryNorm
              0.21728125 = fieldWeight in 429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.0142130405 = weight(abstract_txt:language in 429) [ClassicSimilarity], result of:
            0.0142130405 = score(doc=429,freq=1.0), product of:
              0.054521535 = queryWeight, product of:
                1.4317211 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.009129999 = queryNorm
              0.26068673 = fieldWeight in 429, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.11467634 = weight(abstract_txt:stemming in 429) [ClassicSimilarity], result of:
            0.11467634 = score(doc=429,freq=2.0), product of:
              0.17408018 = queryWeight, product of:
                2.5582857 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.009129999 = queryNorm
              0.6587559 = fieldWeight in 429, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
          0.9760208 = weight(abstract_txt:turkish in 429) [ClassicSimilarity], result of:
            0.9760208 = score(doc=429,freq=4.0), product of:
              0.87448454 = queryWeight, product of:
                10.727153 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.009129999 = queryNorm
              1.1161098 = fieldWeight in 429, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.0625 = fieldNorm(doc=429)
        0.16 = coord(4/25)
    
  4. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.11
    0.114019975 = sum of:
      0.114019975 = product of:
        0.4072142 = sum of:
          0.00929941 = weight(abstract_txt:system in 5395) [ClassicSimilarity], result of:
            0.00929941 = score(doc=5395,freq=2.0), product of:
              0.035650447 = queryWeight, product of:
                1.1577289 = boost
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.009129999 = queryNorm
              0.26084974 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.372775 = idf(docFreq=4140, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.0072012027 = weight(abstract_txt:retrieval in 5395) [ClassicSimilarity], result of:
            0.0072012027 = score(doc=5395,freq=1.0), product of:
              0.037876926 = queryWeight, product of:
                1.1933333 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009129999 = queryNorm
              0.1901211 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.02487282 = weight(abstract_txt:language in 5395) [ClassicSimilarity], result of:
            0.02487282 = score(doc=5395,freq=4.0), product of:
              0.054521535 = queryWeight, product of:
                1.4317211 = boost
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.009129999 = queryNorm
              0.45620176 = fieldWeight in 5395, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1709876 = idf(docFreq=1863, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.024376007 = weight(abstract_txt:implementation in 5395) [ClassicSimilarity], result of:
            0.024376007 = score(doc=5395,freq=1.0), product of:
              0.08539119 = queryWeight, product of:
                1.7917645 = boost
                5.2198906 = idf(docFreq=652, maxDocs=44421)
                0.009129999 = queryNorm
              0.28546277 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2198906 = idf(docFreq=652, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.023109822 = weight(abstract_txt:evaluation in 5395) [ClassicSimilarity], result of:
            0.023109822 = score(doc=5395,freq=1.0), product of:
              0.09433355 = queryWeight, product of:
                2.3064983 = boost
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.009129999 = queryNorm
              0.24497987 = fieldWeight in 5395, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.479632 = idf(docFreq=1368, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.100341804 = weight(abstract_txt:stemming in 5395) [ClassicSimilarity], result of:
            0.100341804 = score(doc=5395,freq=2.0), product of:
              0.17408018 = queryWeight, product of:
                2.5582857 = boost
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.009129999 = queryNorm
              0.5764114 = fieldWeight in 5395, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4529724 = idf(docFreq=69, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
          0.21801314 = weight(abstract_txt:morphological in 5395) [ClassicSimilarity], result of:
            0.21801314 = score(doc=5395,freq=6.0), product of:
              0.20247716 = queryWeight, product of:
                2.759068 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.009129999 = queryNorm
              1.0767295 = fieldWeight in 5395, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.0546875 = fieldNorm(doc=5395)
        0.28 = coord(7/25)
    
  5. Mustafa, S.H.; Al-Radaideh, Q.A.: Using n-grams for Arabic text searching (2004) 0.10
    0.102325805 = sum of:
      0.102325805 = product of:
        0.42635754 = sum of:
          0.027001588 = weight(abstract_txt:matching in 3888) [ClassicSimilarity], result of:
            0.027001588 = score(doc=3888,freq=1.0), product of:
              0.057203107 = queryWeight, product of:
                1.0369772 = boost
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.009129999 = queryNorm
              0.4720301 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0419855 = idf(docFreq=286, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
          0.014548627 = weight(abstract_txt:retrieval in 3888) [ClassicSimilarity], result of:
            0.014548627 = score(doc=3888,freq=2.0), product of:
              0.037876926 = queryWeight, product of:
                1.1933333 = boost
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.009129999 = queryNorm
              0.3841026 = fieldWeight in 3888, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4765 = idf(docFreq=3732, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
          0.1061491 = weight(abstract_txt:gram in 3888) [ClassicSimilarity], result of:
            0.1061491 = score(doc=3888,freq=3.0), product of:
              0.09879399 = queryWeight, product of:
                1.3627765 = boost
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.009129999 = queryNorm
              1.074449 = fieldWeight in 3888, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.9402676 = idf(docFreq=42, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
          0.21020007 = weight(abstract_txt:conflation in 3888) [ClassicSimilarity], result of:
            0.21020007 = score(doc=3888,freq=4.0), product of:
              0.14154525 = queryWeight, product of:
                1.6311994 = boost
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.009129999 = queryNorm
              1.4850379 = fieldWeight in 3888, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.504243 = idf(docFreq=8, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
          0.034270078 = weight(abstract_txt:text in 3888) [ClassicSimilarity], result of:
            0.034270078 = score(doc=3888,freq=2.0), product of:
              0.07675981 = queryWeight, product of:
                2.0805922 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.009129999 = queryNorm
              0.4464586 = fieldWeight in 3888, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
          0.034188103 = weight(abstract_txt:techniques in 3888) [ClassicSimilarity], result of:
            0.034188103 = score(doc=3888,freq=1.0), product of:
              0.09655701 = queryWeight, product of:
                2.333522 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.009129999 = queryNorm
              0.35407168 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.078125 = fieldNorm(doc=3888)
        0.24 = coord(6/25)
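
The content scores above combine a per-term product of queryWeight and fieldWeight with a coord factor for the fraction of query terms matched. The sketch below recomputes the first result (doc 2373) from the values shown in its listing. The formulas are assumptions inferred from those values, the helper names are hypothetical, and boost, queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.009129999  # copied from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """1 + ln(maxDocs / (docFreq + 1)), inferred from the listed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (freq, docFreq, boost) for the seven terms matched in doc 2373.
terms_2373 = [
    (1.0, 286, 1.0369772),   # matching
    (4.0, 3732, 1.1933333),  # retrieval
    (1.0, 1863, 1.4317211),  # language
    (2.0, 1647, 1.4739974),  # document
    (1.0, 6, 1.674332),      # stopword
    (1.0, 69, 2.5582857),    # stemming
    (2.0, 15, 10.727153),    # turkish
]

raw_sum = sum(term_score(f, df, b, field_norm=0.09375) for f, df, b in terms_2373)
coord = 7 / 25  # 7 of the 25 query terms matched
total = raw_sum * coord
```

The raw sum comes out at about 1.4046 and the final score at about 0.3933, in line with the listing. The term "turkish" dominates because its boost (10.727153) is several times larger than that of any other query term.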