Document (#41120)

Author
Ferreira, R.S.
Graça Pimentel, M. de
Cristo, M.
Title
¬A wikification prediction model based on the combination of latent, dyadic, and monadic features
Source
Journal of the Association for Information Science and Technology. 69(2018) no.3, pp.380-394
Year
2018
Abstract
In repositories of web documents that are semantically linked and collaboratively created, such as Wikipedia, a key problem faced by content providers is the placement of links in the articles. These links must support user navigation and provide a deeper semantic interpretation of the content. Current wikification methods exploit machine learning techniques to capture characteristics of the concepts and their associations. In previous work, we proposed a preliminary prediction model combining traditional predictors with a latent component that captures the concept graph topology by means of matrix factorization. In this work, we provide a detailed description of our method and a deeper comparison with a state-of-the-art wikification method using a sample of Wikipedia, and report a gain of up to 13% in F1 score. We also provide a comprehensive analysis of the model performance, showing the importance of the latent predictor component and of the attributes derived from the associations between the concepts. Moreover, we include an analysis that allows us to conclude that the model is resilient to ambiguity without including a disambiguation phase. Finally, we report the positive impact of selecting training samples from specific content quality classes.
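The abstract names the model's ingredients but not its equations. As a rough illustration only, the Python sketch below assumes a toy binary article-to-concept link matrix, plain SGD matrix factorization as the latent component, and a logistic combination with hand-set weights for one dyadic (pair) feature and one monadic (single-item) feature. All names (`factorize`, `latent_score`, `link_probability`), the toy matrix, and the weights are hypothetical and not taken from the paper. The standard F1 score, the metric in which the gain is reported, is included for completeness.

```python
import math
import random


def latent_score(U, V, i, j):
    """Latent (graph-topology) score for the pair (article i, concept j)."""
    return sum(a * b for a, b in zip(U[i], V[j]))


def factorize(links, k=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Plain SGD matrix factorization so that dot(U[i], V[j]) ~ links[i][j].

    Hypothetical stand-in for the latent component, not the authors' exact method.
    """
    rng = random.Random(seed)
    n, m = len(links), len(links[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                err = links[i][j] - latent_score(U, V, i, j)
                for f in range(k):
                    u, v = U[i][f], V[j][f]  # use pre-update values for both factors
                    U[i][f] += lr * (err * v - reg * u)
                    V[j][f] += lr * (err * u - reg * v)
    return U, V


def link_probability(weights, bias, latent, dyadic, monadic):
    """Logistic combination: weights[0] scales the latent score,
    the remaining weights apply to the dyadic and then the monadic features."""
    features = [latent] + list(dyadic) + list(monadic)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall over binary link decisions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: two clusters of articles/concepts that only link within their cluster.
links = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
U, V = factorize(links)
# Hypothetical feature values for the pair (0, 1): one dyadic, one monadic.
p = link_probability([4.0, 1.0, 1.0], -3.0, latent_score(U, V, 0, 1), [0.5], [0.5])
print(f"P(link 0 -> 1) = {p:.3f}")
```

In a real system the combination weights would be learned from labeled links rather than set by hand; they are fixed here only to keep the sketch short.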
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23922/full.
Theme
Hypertext
Semantic context in indexing and retrieval

Similar documents (author)

  1. Ferreira Novellino, M.S. => Salet Ferreira Novellino, M.: 0.84
    0.8400817 = sum of:
      0.8400817 = product of:
        2.520245 = sum of:
          2.520245 = weight(author_txt:ferreira in 5535) [ClassicSimilarity], result of:
            2.520245 = score(doc=5535,freq=2.0), product of:
              0.52868015 = queryWeight, product of:
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.05881519 = queryNorm
              4.7670507 = fieldWeight in 5535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.375 = fieldNorm(doc=5535)
        0.33333334 = coord(1/3)
    
  2. Ferreira, T. Duarte => Duarte Ferreira, T.: 0.84
    0.8400817 = sum of:
      0.8400817 = product of:
        2.520245 = sum of:
          2.520245 = weight(author_txt:ferreira in 5038) [ClassicSimilarity], result of:
            2.520245 = score(doc=5038,freq=2.0), product of:
              0.52868015 = queryWeight, product of:
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.05881519 = queryNorm
              4.7670507 = fieldWeight in 5038, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.375 = fieldNorm(doc=5038)
        0.33333334 = coord(1/3)
    
  3. Pimentel, M. de Graça => Graça Pimentel, M. de: 0.80
    0.7991409 = sum of:
      0.7991409 = product of:
        2.3974226 = sum of:
          2.3974226 = weight(author_txt:graça in 102) [ClassicSimilarity], result of:
            2.3974226 = score(doc=102,freq=2.0), product of:
              0.5774509 = queryWeight, product of:
                1.0451076 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05881519 = queryNorm
              4.1517344 = fieldWeight in 102, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.3125 = fieldNorm(doc=102)
        0.33333334 = coord(1/3)
    
  4. Graça Simões, M. da => Simões, M. da Graça: 0.80
    0.7991409 = sum of:
      0.7991409 = product of:
        2.3974226 = sum of:
          2.3974226 = weight(author_txt:graça in 4700) [ClassicSimilarity], result of:
            2.3974226 = score(doc=4700,freq=2.0), product of:
              0.5774509 = queryWeight, product of:
                1.0451076 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.05881519 = queryNorm
              4.1517344 = fieldWeight in 4700, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.3125 = fieldNorm(doc=4700)
        0.33333334 = coord(1/3)
    
  5. Salet Ferreira Novellino, M.: Information transfer considering the production and use contexts : information retrieval languages (1998) 0.79
    0.79203665 = sum of:
      0.79203665 = product of:
        2.3761098 = sum of:
          2.3761098 = weight(author_txt:ferreira in 147) [ClassicSimilarity], result of:
            2.3761098 = score(doc=147,freq=1.0), product of:
              0.52868015 = queryWeight, product of:
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.05881519 = queryNorm
              4.4944186 = fieldWeight in 147, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.5 = fieldNorm(doc=147)
        0.33333334 = coord(1/3)
    
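The score trees in this list are Lucene `ClassicSimilarity` (TF-IDF) explain output. The short Python sketch below recomputes three of the author-similarity scores from the numbers shown, assuming the classic formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). `queryNorm` and `fieldNorm` are copied from the trees rather than recomputed, because they depend on the whole query and on index-time field lengths. The function names are illustrative, not Lucene's API.

```python
import math


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm             # 0.52868015 in entry 1
    field_weight = classic_tf(freq) * idf * field_norm  # 4.7670507 in entry 1
    return query_weight * field_weight


def coord(overlap, max_overlap):
    """Fraction of query clauses that matched, e.g. coord(1/3)."""
    return overlap / max_overlap


MAX_DOCS = 44218
QUERY_NORM = 0.05881519  # copied from the explain trees, not recomputed

entry1 = term_score(2, 14, MAX_DOCS, QUERY_NORM, 0.375) * coord(1, 3)
entry3 = term_score(2, 9, MAX_DOCS, QUERY_NORM, 0.3125, boost=1.0451076) * coord(1, 3)
entry5 = term_score(1, 14, MAX_DOCS, QUERY_NORM, 0.5) * coord(1, 3)
print(round(entry1, 4), round(entry3, 4), round(entry5, 4))
```

Entries 1 and 5 match the same term ("ferreira") and differ only in term frequency and fieldNorm, while entry 3 matches a rarer term ("graça") whose query weight also carries a boost.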

Similar documents (content)

  1. Ding, C.H.Q.: ¬A probabilistic model for Latent Semantic Indexing (2005) 0.28
    0.2830519 = sum of:
      0.2830519 = product of:
        0.8845372 = sum of:
          0.021083564 = weight(abstract_txt:analysis in 3459) [ClassicSimilarity], result of:
            0.021083564 = score(doc=3459,freq=1.0), product of:
              0.07386514 = queryWeight, product of:
                1.0214574 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.019792687 = queryNorm
              0.2854332 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.085485116 = weight(abstract_txt:disambiguation in 3459) [ClassicSimilarity], result of:
            0.085485116 = score(doc=3459,freq=1.0), product of:
              0.14907123 = queryWeight, product of:
                1.0260829 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.019792687 = queryNorm
              0.57345146 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.05771459 = weight(abstract_txt:concepts in 3459) [ClassicSimilarity], result of:
            0.05771459 = score(doc=3459,freq=2.0), product of:
              0.114724785 = queryWeight, product of:
                1.2730021 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.019792687 = queryNorm
              0.50306994 = fieldWeight in 3459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.04206816 = weight(abstract_txt:provide in 3459) [ClassicSimilarity], result of:
            0.04206816 = score(doc=3459,freq=1.0), product of:
              0.13401176 = queryWeight, product of:
                1.6850686 = boost
                4.0180984 = idf(docFreq=2161, maxDocs=44218)
                0.019792687 = queryNorm
              0.31391394 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0180984 = idf(docFreq=2161, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.12623632 = weight(abstract_txt:associations in 3459) [ClassicSimilarity], result of:
            0.12623632 = score(doc=3459,freq=1.0), product of:
              0.2435565 = queryWeight, product of:
                1.854814 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.019792687 = queryNorm
              0.51830405 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.16042322 = weight(abstract_txt:deeper in 3459) [ClassicSimilarity], result of:
            0.16042322 = score(doc=3459,freq=1.0), product of:
              0.28575137 = queryWeight, product of:
                2.0090683 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.019792687 = queryNorm
              0.5614084 = fieldWeight in 3459, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.07745223 = weight(abstract_txt:model in 3459) [ClassicSimilarity], result of:
            0.07745223 = score(doc=3459,freq=2.0), product of:
              0.17585962 = queryWeight, product of:
                2.228941 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019792687 = queryNorm
              0.44042078 = fieldWeight in 3459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
          0.31407404 = weight(abstract_txt:latent in 3459) [ClassicSimilarity], result of:
            0.31407404 = score(doc=3459,freq=2.0), product of:
              0.4063048 = queryWeight, product of:
                2.9340813 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.019792687 = queryNorm
              0.7730011 = fieldWeight in 3459, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.078125 = fieldNorm(doc=3459)
        0.32 = coord(8/25)
    
  2. Sebastian, Y.: Literature-based discovery by learning heterogeneous bibliographic information networks (2017) 0.13
    0.12624037 = sum of:
      0.12624037 = product of:
        0.5260016 = sum of:
          0.059839576 = weight(abstract_txt:disambiguation in 535) [ClassicSimilarity], result of:
            0.059839576 = score(doc=535,freq=1.0), product of:
              0.14907123 = queryWeight, product of:
                1.0260829 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.019792687 = queryNorm
              0.401416 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.067590676 = weight(abstract_txt:method in 535) [ClassicSimilarity], result of:
            0.067590676 = score(doc=535,freq=6.0), product of:
              0.11210343 = queryWeight, product of:
                1.2583747 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019792687 = queryNorm
              0.6029314 = fieldWeight in 535, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.061524283 = weight(abstract_txt:links in 535) [ClassicSimilarity], result of:
            0.061524283 = score(doc=535,freq=2.0), product of:
              0.1518562 = queryWeight, product of:
                1.4645925 = boost
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.019792687 = queryNorm
              0.40514833 = fieldWeight in 535, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2385488 = idf(docFreq=637, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.02944771 = weight(abstract_txt:provide in 535) [ClassicSimilarity], result of:
            0.02944771 = score(doc=535,freq=1.0), product of:
              0.13401176 = queryWeight, product of:
                1.6850686 = boost
                4.0180984 = idf(docFreq=2161, maxDocs=44218)
                0.019792687 = queryNorm
              0.21973975 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0180984 = idf(docFreq=2161, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.0383369 = weight(abstract_txt:model in 535) [ClassicSimilarity], result of:
            0.0383369 = score(doc=535,freq=1.0), product of:
              0.17585962 = queryWeight, product of:
                2.228941 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019792687 = queryNorm
              0.21799716 = fieldWeight in 535, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
          0.2692624 = weight(abstract_txt:latent in 535) [ClassicSimilarity], result of:
            0.2692624 = score(doc=535,freq=3.0), product of:
              0.4063048 = queryWeight, product of:
                2.9340813 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.019792687 = queryNorm
              0.66271037 = fieldWeight in 535, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0546875 = fieldNorm(doc=535)
        0.24 = coord(6/25)
    
  3. Zielinski, K.; Nielek, R.; Wierzbicki, A.; Jatowt, A.: Computing controversy : formal model and algorithms for detecting controversy on Wikipedia and in search queries (2018) 0.11
    0.11116246 = sum of:
      0.11116246 = product of:
        0.39700878 = sum of:
          0.01889382 = weight(abstract_txt:work in 5093) [ClassicSimilarity], result of:
            0.01889382 = score(doc=5093,freq=1.0), product of:
              0.07967034 = queryWeight, product of:
                1.0608375 = boost
                3.7943997 = idf(docFreq=2703, maxDocs=44218)
                0.019792687 = queryNorm
              0.23714998 = fieldWeight in 5093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7943997 = idf(docFreq=2703, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.054621514 = weight(abstract_txt:method in 5093) [ClassicSimilarity], result of:
            0.054621514 = score(doc=5093,freq=3.0), product of:
              0.11210343 = queryWeight, product of:
                1.2583747 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019792687 = queryNorm
              0.4872421 = fieldWeight in 5093, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.046171673 = weight(abstract_txt:concepts in 5093) [ClassicSimilarity], result of:
            0.046171673 = score(doc=5093,freq=2.0), product of:
              0.114724785 = queryWeight, product of:
                1.2730021 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.019792687 = queryNorm
              0.40245596 = fieldWeight in 5093, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.07520225 = weight(abstract_txt:component in 5093) [ClassicSimilarity], result of:
            0.07520225 = score(doc=5093,freq=1.0), product of:
              0.20009553 = queryWeight, product of:
                1.6811993 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.019792687 = queryNorm
              0.37583172 = fieldWeight in 5093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.1204196 = weight(abstract_txt:wikipedia in 5093) [ClassicSimilarity], result of:
            0.1204196 = score(doc=5093,freq=2.0), product of:
              0.21737269 = queryWeight, product of:
                1.752278 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.019792687 = queryNorm
              0.5539776 = fieldWeight in 5093, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.03788634 = weight(abstract_txt:content in 5093) [ClassicSimilarity], result of:
            0.03788634 = score(doc=5093,freq=1.0), product of:
              0.1450226 = queryWeight, product of:
                1.7529277 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.019792687 = queryNorm
              0.2612444 = fieldWeight in 5093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
          0.043813597 = weight(abstract_txt:model in 5093) [ClassicSimilarity], result of:
            0.043813597 = score(doc=5093,freq=1.0), product of:
              0.17585962 = queryWeight, product of:
                2.228941 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019792687 = queryNorm
              0.24913962 = fieldWeight in 5093, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=5093)
        0.28 = coord(7/25)
    
  4. Zhao, G.; Wu, J.; Wang, D.; Li, T.: Entity disambiguation to Wikipedia using collective ranking (2016) 0.10
    0.10182227 = sum of:
      0.10182227 = product of:
        0.50911134 = sum of:
          0.14806455 = weight(abstract_txt:disambiguation in 3266) [ClassicSimilarity], result of:
            0.14806455 = score(doc=3266,freq=3.0), product of:
              0.14907123 = queryWeight, product of:
                1.0260829 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.019792687 = queryNorm
              0.99324703 = fieldWeight in 3266, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.03941968 = weight(abstract_txt:method in 3266) [ClassicSimilarity], result of:
            0.03941968 = score(doc=3266,freq=1.0), product of:
              0.11210343 = queryWeight, product of:
                1.2583747 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019792687 = queryNorm
              0.3516367 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.1064369 = weight(abstract_txt:wikipedia in 3266) [ClassicSimilarity], result of:
            0.1064369 = score(doc=3266,freq=1.0), product of:
              0.21737269 = queryWeight, product of:
                1.752278 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.019792687 = queryNorm
              0.48965168 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.16042322 = weight(abstract_txt:prediction in 3266) [ClassicSimilarity], result of:
            0.16042322 = score(doc=3266,freq=1.0), product of:
              0.28575137 = queryWeight, product of:
                2.0090683 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.019792687 = queryNorm
              0.5614084 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
          0.054766998 = weight(abstract_txt:model in 3266) [ClassicSimilarity], result of:
            0.054766998 = score(doc=3266,freq=1.0), product of:
              0.17585962 = queryWeight, product of:
                2.228941 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.019792687 = queryNorm
              0.31142452 = fieldWeight in 3266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=3266)
        0.2 = coord(5/25)
    
  5. Li, C.; Sun, A.; Datta, A.: TSDW: Two-stage word sense disambiguation using Wikipedia (2013) 0.10
    0.10164097 = sum of:
      0.10164097 = product of:
        0.63525605 = sum of:
          0.18093787 = weight(abstract_txt:disambiguation in 956) [ClassicSimilarity], result of:
            0.18093787 = score(doc=956,freq=7.0), product of:
              0.14907123 = queryWeight, product of:
                1.0260829 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.019792687 = queryNorm
              1.2137679 = fieldWeight in 956, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=956)
          0.10635204 = weight(abstract_txt:component in 956) [ClassicSimilarity], result of:
            0.10635204 = score(doc=956,freq=2.0), product of:
              0.20009553 = queryWeight, product of:
                1.6811993 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.019792687 = queryNorm
              0.5315063 = fieldWeight in 956, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=956)
          0.17029904 = weight(abstract_txt:wikipedia in 956) [ClassicSimilarity], result of:
            0.17029904 = score(doc=956,freq=4.0), product of:
              0.21737269 = queryWeight, product of:
                1.752278 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.019792687 = queryNorm
              0.7834427 = fieldWeight in 956, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0625 = fieldNorm(doc=956)
          0.17766711 = weight(abstract_txt:latent in 956) [ClassicSimilarity], result of:
            0.17766711 = score(doc=956,freq=1.0), product of:
              0.4063048 = queryWeight, product of:
                2.9340813 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.019792687 = queryNorm
              0.43727544 = fieldWeight in 956, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=956)
        0.16 = coord(4/25)
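The content-similarity trees add up several per-term clause scores and then multiply the sum by coord. The Python sketch below recomputes entry 5 (Li, Sun & Datta, doc 956), where 4 of the 25 query terms (apparently drawn from this record's abstract) match. It assumes the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), copies `queryNorm`, `fieldNorm`, and the per-term boosts from the tree, and uses hypothetical helper names rather than Lucene's API.

```python
import math


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def clause_score(freq, doc_freq, boost, field_norm, query_norm, max_docs=44218):
    """queryWeight * fieldWeight for one matching term clause."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.019792687  # copied from the trees
FIELD_NORM = 0.0625       # fieldNorm(doc=956)
TOTAL_CLAUSES = 25        # coord(4/25): 4 of 25 query terms matched

# (term, freq, docFreq, boost) for the four abstract terms matched in doc 956
matches = [
    ("disambiguation", 7, 77, 1.0260829),
    ("component", 2, 293, 1.6811993),
    ("wikipedia", 4, 227, 1.752278),
    ("latent", 1, 109, 2.9340813),
]

raw_sum = sum(clause_score(f, df, b, FIELD_NORM, QUERY_NORM) for _, f, df, b in matches)
score = raw_sum * len(matches) / TOTAL_CLAUSES
print(f"sum={raw_sum:.6f} coord={len(matches)}/{TOTAL_CLAUSES} score={score:.6f}")
```

Because coord scales by the fraction of matched query terms, a document matching many moderately rare terms (entry 1, 8 of 25) can outrank one matching fewer terms with higher frequency (entry 5, 4 of 25).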