Document (#33346)

Author
Kuo, J.-S.
Li, H.
Yang, Y.-K.
Title
Active learning for constructing transliteration lexicons from the Web
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.1, pp. 126-135
Year
2008
Abstract
This article presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study the unsupervised learning and the active learning strategies that minimize human supervision in terms of data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. We evaluate the proposed PSM and its learning algorithm through a series of systematic experiments, which show that the proposed framework is reliably effective on two independent databases.
Theme
Computational linguistics

Similar documents (author)

  1. Yang, S.C.: An interpretive and situated approach to an evaluation of Perseus digital libraries (2001) 4.48
    4.4805427 = sum of:
      4.4805427 = weight(author_txt:yang in 933) [ClassicSimilarity], result of:
        4.4805427 = fieldWeight in 933, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.168868 = idf(docFreq=92, maxDocs=44421)
          0.625 = fieldNorm(doc=933)
    
  2. Yang, K.: Information retrieval on the Web (2004) 4.48
    4.4805427 = sum of:
      4.4805427 = weight(author_txt:yang in 5278) [ClassicSimilarity], result of:
        4.4805427 = fieldWeight in 5278, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.168868 = idf(docFreq=92, maxDocs=44421)
          0.625 = fieldNorm(doc=5278)
    
  3. Yang, C.C.: Content-based image retrieval : a comparison between query by example and image browsing map approaches (2005) 4.48
    4.4805427 = sum of:
      4.4805427 = weight(author_txt:yang in 5649) [ClassicSimilarity], result of:
        4.4805427 = fieldWeight in 5649, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.168868 = idf(docFreq=92, maxDocs=44421)
          0.625 = fieldNorm(doc=5649)
    
  4. Salton, G.; Yang, C.S.: On the specification of term values in automatic indexing (1973) 3.58
    3.584434 = sum of:
      3.584434 = weight(author_txt:yang in 5475) [ClassicSimilarity], result of:
        3.584434 = fieldWeight in 5475, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.168868 = idf(docFreq=92, maxDocs=44421)
          0.5 = fieldNorm(doc=5475)
    
  5. Yang, Y.; Chute, C.G.A.: A schematic analysis of the Unified Medical Language System (1992) 3.58
    3.584434 = sum of:
      3.584434 = weight(author_txt:yang in 6444) [ClassicSimilarity], result of:
        3.584434 = fieldWeight in 6444, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.168868 = idf(docFreq=92, maxDocs=44421)
          0.5 = fieldNorm(doc=6444)
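
The author-match scores above can be checked by hand. The sketch below assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures shown in this listing. The function names are illustrative only and do not belong to any library.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the classic TF-IDF form."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the explanation trees above (author_txt:yang).
MAX_DOCS = 44421
DOC_FREQ_YANG = 92

# Entries 1 to 3 have fieldNorm 0.625; entries 4 and 5 have fieldNorm 0.5.
single_author = field_weight(1.0, DOC_FREQ_YANG, MAX_DOCS, 0.625)
two_authors = field_weight(1.0, DOC_FREQ_YANG, MAX_DOCS, 0.5)

print(round(single_author, 4))  # approximately 4.4805
print(round(two_authors, 4))    # approximately 3.5844
```

The only difference between the 4.48 and 3.58 entries is fieldNorm: the records with two authors have a longer author field, so the same single match on "yang" is weighted lower.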
    

Similar documents (content)

  1. Malo, P.; Sinha, A.; Korhonen, P.; Wallenius, J.; Takala, P.: Good debt or bad debt : detecting semantic orientations in economic texts (2014) 0.16
    0.15616837 = sum of:
      0.15616837 = product of:
        0.5577442 = sum of:
          0.013298883 = weight(abstract_txt:that in 2226) [ClassicSimilarity], result of:
            0.013298883 = score(doc=2226,freq=6.0), product of:
              0.036731634 = queryWeight, product of:
                1.135091 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.013683285 = queryNorm
              0.3620553 = fieldWeight in 2226, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.06322622 = weight(abstract_txt:lexicon in 2226) [ClassicSimilarity], result of:
            0.06322622 = score(doc=2226,freq=1.0), product of:
              0.13084938 = queryWeight, product of:
                1.2369032 = boost
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.013683285 = queryNorm
              0.4831985 = fieldWeight in 2226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.025683375 = weight(abstract_txt:framework in 2226) [ClassicSimilarity], result of:
            0.025683375 = score(doc=2226,freq=1.0), product of:
              0.09042425 = queryWeight, product of:
                1.4541438 = boost
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.013683285 = queryNorm
              0.28403196 = fieldWeight in 2226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.037866995 = weight(abstract_txt:proposed in 2226) [ClassicSimilarity], result of:
            0.037866995 = score(doc=2226,freq=2.0), product of:
              0.092970975 = queryWeight, product of:
                1.474479 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.013683285 = queryNorm
              0.4072991 = fieldWeight in 2226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.050811384 = weight(abstract_txt:algorithm in 2226) [ClassicSimilarity], result of:
            0.050811384 = score(doc=2226,freq=1.0), product of:
              0.14250305 = queryWeight, product of:
                1.8254796 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.013683285 = queryNorm
              0.35656348 = fieldWeight in 2226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.26472524 = weight(abstract_txt:lexicons in 2226) [ClassicSimilarity], result of:
            0.26472524 = score(doc=2226,freq=2.0), product of:
              0.33991507 = queryWeight, product of:
                2.8193572 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.013683285 = queryNorm
              0.7787982 = fieldWeight in 2226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
          0.102132104 = weight(abstract_txt:learning in 2226) [ClassicSimilarity], result of:
            0.102132104 = score(doc=2226,freq=1.0), product of:
              0.34459928 = queryWeight, product of:
                5.310753 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.013683285 = queryNorm
              0.29637933 = fieldWeight in 2226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=2226)
        0.28 = coord(7/25)
    
  2. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.15
    0.1481072 = sum of:
      0.1481072 = product of:
        0.52895427 = sum of:
          0.04732043 = weight(abstract_txt:constructing in 218) [ClassicSimilarity], result of:
            0.04732043 = score(doc=218,freq=1.0), product of:
              0.10786303 = queryWeight, product of:
                1.1230167 = boost
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.013683285 = queryNorm
              0.4387085 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.019336 = idf(docFreq=107, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.005429246 = weight(abstract_txt:that in 218) [ClassicSimilarity], result of:
            0.005429246 = score(doc=218,freq=1.0), product of:
              0.036731634 = queryWeight, product of:
                1.135091 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.013683285 = queryNorm
              0.14780845 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.06063135 = weight(abstract_txt:unsupervised in 218) [ClassicSimilarity], result of:
            0.06063135 = score(doc=218,freq=1.0), product of:
              0.1272443 = queryWeight, product of:
                1.219745 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.013683285 = queryNorm
              0.47649562 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.092897736 = weight(abstract_txt:labeling in 218) [ClassicSimilarity], result of:
            0.092897736 = score(doc=218,freq=2.0), product of:
              0.13422506 = queryWeight, product of:
                1.2527566 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.013683285 = queryNorm
              0.6921043 = fieldWeight in 218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.025683375 = weight(abstract_txt:framework in 218) [ClassicSimilarity], result of:
            0.025683375 = score(doc=218,freq=1.0), product of:
              0.09042425 = queryWeight, product of:
                1.4541438 = boost
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.013683285 = queryNorm
              0.28403196 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.026776008 = weight(abstract_txt:proposed in 218) [ClassicSimilarity], result of:
            0.026776008 = score(doc=218,freq=1.0), product of:
              0.092970975 = queryWeight, product of:
                1.474479 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.013683285 = queryNorm
              0.28800395 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
          0.27021614 = weight(abstract_txt:learning in 218) [ClassicSimilarity], result of:
            0.27021614 = score(doc=218,freq=7.0), product of:
              0.34459928 = queryWeight, product of:
                5.310753 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.013683285 = queryNorm
              0.78414595 = fieldWeight in 218, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=218)
        0.28 = coord(7/25)
    
  3. Xing, F.Z.; Pallucchini, F.; Cambria, E.: Cognitive-inspired domain adaptation of sentiment lexicons (2019) 0.14
    0.14329736 = sum of:
      0.14329736 = product of:
        0.7164868 = sum of:
          0.010858492 = weight(abstract_txt:that in 104) [ClassicSimilarity], result of:
            0.010858492 = score(doc=104,freq=4.0), product of:
              0.036731634 = queryWeight, product of:
                1.135091 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.013683285 = queryNorm
              0.2956169 = fieldWeight in 104, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=104)
          0.08941538 = weight(abstract_txt:lexicon in 104) [ClassicSimilarity], result of:
            0.08941538 = score(doc=104,freq=2.0), product of:
              0.13084938 = queryWeight, product of:
                1.2369032 = boost
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.013683285 = queryNorm
              0.68334585 = fieldWeight in 104, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.0625 = fieldNorm(doc=104)
          0.09739828 = weight(abstract_txt:supervision in 104) [ClassicSimilarity], result of:
            0.09739828 = score(doc=104,freq=1.0), product of:
              0.17453171 = queryWeight, product of:
                1.4285225 = boost
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.013683285 = queryNorm
              0.5580549 = fieldWeight in 104, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.928879 = idf(docFreq=15, maxDocs=44421)
                0.0625 = fieldNorm(doc=104)
          0.37437806 = weight(abstract_txt:lexicons in 104) [ClassicSimilarity], result of:
            0.37437806 = score(doc=104,freq=4.0), product of:
              0.33991507 = queryWeight, product of:
                2.8193572 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.013683285 = queryNorm
              1.101387 = fieldWeight in 104, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.0625 = fieldNorm(doc=104)
          0.1444366 = weight(abstract_txt:learning in 104) [ClassicSimilarity], result of:
            0.1444366 = score(doc=104,freq=2.0), product of:
              0.34459928 = queryWeight, product of:
                5.310753 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.013683285 = queryNorm
              0.41914365 = fieldWeight in 104, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=104)
        0.2 = coord(5/25)
    
  4. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: A Two-stage active learning method for learning to rank (2014) 0.14
    0.13628705 = sum of:
      0.13628705 = product of:
        0.5678627 = sum of:
          0.010858492 = weight(abstract_txt:that in 2184) [ClassicSimilarity], result of:
            0.010858492 = score(doc=2184,freq=4.0), product of:
              0.036731634 = queryWeight, product of:
                1.135091 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.013683285 = queryNorm
              0.2956169 = fieldWeight in 2184, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
          0.092897736 = weight(abstract_txt:labeling in 2184) [ClassicSimilarity], result of:
            0.092897736 = score(doc=2184,freq=2.0), product of:
              0.13422506 = queryWeight, product of:
                1.2527566 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.013683285 = queryNorm
              0.6921043 = fieldWeight in 2184, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
          0.08876761 = weight(abstract_txt:iteratively in 2184) [ClassicSimilarity], result of:
            0.08876761 = score(doc=2184,freq=1.0), product of:
              0.1640627 = queryWeight, product of:
                1.3850161 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.013683285 = queryNorm
              0.5410591 = fieldWeight in 2184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
          0.050811384 = weight(abstract_txt:algorithm in 2184) [ClassicSimilarity], result of:
            0.050811384 = score(doc=2184,freq=1.0), product of:
              0.14250305 = queryWeight, product of:
                1.8254796 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.013683285 = queryNorm
              0.35656348 = fieldWeight in 2184, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
          0.12026327 = weight(abstract_txt:active in 2184) [ClassicSimilarity], result of:
            0.12026327 = score(doc=2184,freq=3.0), product of:
              0.17548166 = queryWeight, product of:
                2.0257263 = boost
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.013683285 = queryNorm
              0.6853324 = fieldWeight in 2184, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
          0.20426421 = weight(abstract_txt:learning in 2184) [ClassicSimilarity], result of:
            0.20426421 = score(doc=2184,freq=4.0), product of:
              0.34459928 = queryWeight, product of:
                5.310753 = boost
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.013683285 = queryNorm
              0.59275866 = fieldWeight in 2184, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7420692 = idf(docFreq=1052, maxDocs=44421)
                0.0625 = fieldNorm(doc=2184)
        0.24 = coord(6/25)
    
  5. Engerer, V.: Control and syntagmatization : vocabulary requirements in information retrieval thesauri and natural language lexicons (2017) 0.13
    0.12794505 = sum of:
      0.12794505 = product of:
        0.53310436 = sum of:
          0.015256716 = weight(abstract_txt:knowledge in 4678) [ClassicSimilarity], result of:
            0.015256716 = score(doc=4678,freq=1.0), product of:
              0.055066083 = queryWeight, product of:
                1.1347678 = boost
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.013683285 = queryNorm
              0.27706194 = fieldWeight in 4678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5463927 = idf(docFreq=3480, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.009597642 = weight(abstract_txt:that in 4678) [ClassicSimilarity], result of:
            0.009597642 = score(doc=4678,freq=2.0), product of:
              0.036731634 = queryWeight, product of:
                1.135091 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.013683285 = queryNorm
              0.2612909 = fieldWeight in 4678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.11176922 = weight(abstract_txt:lexicon in 4678) [ClassicSimilarity], result of:
            0.11176922 = score(doc=4678,freq=2.0), product of:
              0.13084938 = queryWeight, product of:
                1.2369032 = boost
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.013683285 = queryNorm
              0.8541823 = fieldWeight in 4678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.731176 = idf(docFreq=52, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.03210422 = weight(abstract_txt:framework in 4678) [ClassicSimilarity], result of:
            0.03210422 = score(doc=4678,freq=1.0), product of:
              0.09042425 = queryWeight, product of:
                1.4541438 = boost
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.013683285 = queryNorm
              0.35503995 = fieldWeight in 4678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5445113 = idf(docFreq=1282, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.03347001 = weight(abstract_txt:proposed in 4678) [ClassicSimilarity], result of:
            0.03347001 = score(doc=4678,freq=1.0), product of:
              0.092970975 = queryWeight, product of:
                1.474479 = boost
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.013683285 = queryNorm
              0.36000493 = fieldWeight in 4678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.608063 = idf(docFreq=1203, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
          0.33090654 = weight(abstract_txt:lexicons in 4678) [ClassicSimilarity], result of:
            0.33090654 = score(doc=4678,freq=2.0), product of:
              0.33991507 = queryWeight, product of:
                2.8193572 = boost
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.013683285 = queryNorm
              0.97349775 = fieldWeight in 4678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.811096 = idf(docFreq=17, maxDocs=44421)
                0.078125 = fieldNorm(doc=4678)
        0.24 = coord(6/25)
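
The content-match scores above combine a per-term score (queryWeight times fieldWeight) with a coordination factor for the share of query terms matched. As a rough sketch under the same assumed TF-IDF formulas, the following recomputes the first entry (doc 2226) from the boost, docFreq and termFreq values in its explanation tree. All function and variable names here are hypothetical.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.013683285  # queryNorm as printed in the explanation trees


def idf(doc_freq: int) -> float:
    """Inverse document frequency, assuming the classic TF-IDF form."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matched term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float, total_query_terms: int) -> float:
    """Sum the term scores, then scale by coord = matched / total query terms."""
    total = sum(term_score(f, df, b, field_norm) for (b, df, f) in terms)
    return total * len(terms) / total_query_terms


# (boost, docFreq, termFreq) for the seven terms matched in doc 2226.
TERMS_2226 = [
    (1.135091, 11344, 6.0),   # that
    (1.2369032, 52, 1.0),     # lexicon
    (1.4541438, 1282, 1.0),   # framework
    (1.474479, 1203, 2.0),    # proposed
    (1.8254796, 401, 1.0),    # algorithm
    (2.8193572, 17, 2.0),     # lexicons
    (5.310753, 1052, 1.0),    # learning
]

# fieldNorm is 0.0625 for doc 2226; the query has 25 terms, so coord = 7/25.
print(round(document_score(TERMS_2226, 0.0625, 25), 3))  # approximately 0.156
```

Because coord scales the whole sum, a document that matches fewer of the 25 query terms can still rank well when the terms it does match are rare, as with "lexicons" (docFreq=17) in several of the entries above.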