Document (#38305)

Author
Vechtomova, O.
Title
A method for automatic extraction of multiword units representing business aspects from user reviews
Source
Journal of the Association for Information Science and Technology. 65(2014) no.7, p.1463-1477
Year
2014
Abstract
The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words representing the target category and calculates distributional similarity between the candidate and seed words. We compare three distributional similarity measures (Lin's, Weeds's, and balAPinc) and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
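The seed-based ranking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: it assumes each word is already represented as a dictionary of context features with positive association weights, uses the commonly cited form of Lin's measure, and averages similarity over the seeds, which is an assumption made here for illustration. The names `lin_similarity` and `rank_candidates` and the toy feature weights are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: context feature -> positive association weight (e.g. PMI).
Features = Dict[str, float]


def lin_similarity(u: Features, v: Features) -> float:
    """Lin-style similarity: weight mass of shared features over total weight mass."""
    shared = u.keys() & v.keys()
    numerator = sum(u[f] + v[f] for f in shared)
    denominator = sum(u.values()) + sum(v.values())
    return numerator / denominator if denominator else 0.0


def rank_candidates(seeds: Dict[str, Features],
                    candidates: Dict[str, Features]) -> List[Tuple[str, float]]:
    """Rank candidates by their mean similarity to the seed words.

    Averaging over seeds is an illustrative choice, not taken from the article.
    """
    if not seeds:
        raise ValueError("at least one seed word is required")
    scored = []
    for word, feats in candidates.items():
        mean_sim = sum(lin_similarity(feats, s) for s in seeds.values()) / len(seeds)
        scored.append((word, mean_sim))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, made-up feature vectors for a "dish name" category.
seeds = {
    "pizza": {"eat": 2.0, "order": 1.0, "delicious": 1.5},
    "soup": {"eat": 1.5, "hot": 2.0, "order": 1.0},
}
candidates = {
    "risotto": {"eat": 1.0, "delicious": 1.0, "order": 0.5},
    "waiter": {"friendly": 2.0, "order": 0.5},
}
ranked = rank_candidates(seeds, candidates)
```

In this toy data the dish-like candidate shares more context features with the seeds than the non-dish candidate, so it ranks first. Real feature vectors would come from a parsed review corpus.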
Theme
Computational linguistics

Similar documents (author)

  1. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 6.01
    6.0137663 = sum of:
      6.0137663 = weight(author_txt:vechtomova in 225) [ClassicSimilarity], result of:
        6.0137663 = fieldWeight in 225, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.625 = fieldNorm(doc=225)
    
  2. Vechtomova, O.; Karamuftuoglu, M.: Query expansion with terms selected using lexical cohesion analysis of documents (2007) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:vechtomova in 1908) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 1908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=1908)
    
  3. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:vechtomova in 1966) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 1966, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=1966)
    
  4. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:vechtomova in 3101) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 3101, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=3101)
    
  5. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 4.81
    4.811013 = sum of:
      4.811013 = weight(author_txt:vechtomova in 3733) [ClassicSimilarity], result of:
        4.811013 = fieldWeight in 3733, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.622026 = idf(docFreq=7, maxDocs=44421)
          0.5 = fieldNorm(doc=3733)
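
The author-match scores above follow the pattern shown in the explain output: fieldWeight = tf × idf × fieldNorm. A minimal sketch that recomputes them, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical, and small differences from the listed values are expected because Lucene computes in 32-bit floats.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=7, maxDocs=44421, termFreq=1.0.
idf_author = classic_idf(7, 44421)
score_first = field_weight(1.0, 7, 44421, 0.625)   # entry 1, fieldNorm 0.625
score_others = field_weight(1.0, 7, 44421, 0.5)    # entries 2 to 5, fieldNorm 0.5
```

Since tf and idf are identical for all five entries, the only factor separating the first hit from the rest is fieldNorm, which is larger for records with a shorter author field.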
    

Similar documents (content)

  1. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 0.30
    0.29680687 = sum of:
      0.29680687 = product of:
        0.92752147 = sum of:
          0.11532814 = weight(abstract_txt:candidate in 3733) [ClassicSimilarity], result of:
            0.11532814 = score(doc=3733,freq=4.0), product of:
              0.100841045 = queryWeight, product of:
                1.0816386 = boost
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.012737297 = queryNorm
              1.1436627 = fieldWeight in 3733, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.06271209 = weight(abstract_txt:likelihood in 3733) [ClassicSimilarity], result of:
            0.06271209 = score(doc=3733,freq=1.0), product of:
              0.106643565 = queryWeight, product of:
                1.1123227 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.012737297 = queryNorm
              0.58805317 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.0472476 = weight(abstract_txt:measure in 3733) [ClassicSimilarity], result of:
            0.0472476 = score(doc=3733,freq=1.0), product of:
              0.11124922 = queryWeight, product of:
                1.6066711 = boost
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.012737297 = queryNorm
              0.4247005 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4361663 = idf(docFreq=525, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.11803851 = weight(abstract_txt:target in 3733) [ClassicSimilarity], result of:
            0.11803851 = score(doc=3733,freq=2.0), product of:
              0.16257346 = queryWeight, product of:
                1.9422418 = boost
                6.571569 = idf(docFreq=168, maxDocs=44421)
                0.012737297 = queryNorm
              0.72606266 = fieldWeight in 3733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.571569 = idf(docFreq=168, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.25754744 = weight(abstract_txt:seed in 3733) [ClassicSimilarity], result of:
            0.25754744 = score(doc=3733,freq=2.0), product of:
              0.27348822 = queryWeight, product of:
                2.5191138 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.012737297 = queryNorm
              0.9417131 = fieldWeight in 3733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.08688533 = weight(abstract_txt:similarity in 3733) [ClassicSimilarity], result of:
            0.08688533 = score(doc=3733,freq=1.0), product of:
              0.19114894 = queryWeight, product of:
                2.5793486 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012737297 = queryNorm
              0.4545426 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.05355789 = weight(abstract_txt:method in 3733) [ClassicSimilarity], result of:
            0.05355789 = score(doc=3733,freq=1.0), product of:
              0.15238287 = queryWeight, product of:
                2.6592646 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012737297 = queryNorm
              0.35146925 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
          0.1862044 = weight(abstract_txt:category in 3733) [ClassicSimilarity], result of:
            0.1862044 = score(doc=3733,freq=3.0), product of:
              0.22030641 = queryWeight, product of:
                2.7690938 = boost
                6.2461467 = idf(docFreq=233, maxDocs=44421)
                0.012737297 = queryNorm
              0.8452065 = fieldWeight in 3733, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2461467 = idf(docFreq=233, maxDocs=44421)
                0.078125 = fieldNorm(doc=3733)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.21
    0.2059822 = sum of:
      0.2059822 = product of:
        1.0299109 = sum of:
          0.08379001 = weight(abstract_txt:extraction in 3919) [ClassicSimilarity], result of:
            0.08379001 = score(doc=3919,freq=1.0), product of:
              0.14433926 = queryWeight, product of:
                1.8300828 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.012737297 = queryNorm
              0.5805074 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.1042624 = weight(abstract_txt:similarity in 3919) [ClassicSimilarity], result of:
            0.1042624 = score(doc=3919,freq=1.0), product of:
              0.19114894 = queryWeight, product of:
                2.5793486 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012737297 = queryNorm
              0.5454511 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.09089075 = weight(abstract_txt:method in 3919) [ClassicSimilarity], result of:
            0.09089075 = score(doc=3919,freq=2.0), product of:
              0.15238287 = queryWeight, product of:
                2.6592646 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012737297 = queryNorm
              0.5964631 = fieldWeight in 3919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.2930296 = weight(abstract_txt:distributional in 3919) [ClassicSimilarity], result of:
            0.2930296 = score(doc=3919,freq=1.0), product of:
              0.33255538 = queryWeight, product of:
                2.7778606 = boost
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.012737297 = queryNorm
              0.88114524 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.398883 = idf(docFreq=9, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
          0.45793816 = weight(abstract_txt:multiword in 3919) [ClassicSimilarity], result of:
            0.45793816 = score(doc=3919,freq=1.0), product of:
              0.5642491 = queryWeight, product of:
                5.117159 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.012737297 = queryNorm
              0.81158864 = fieldWeight in 3919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=3919)
        0.2 = coord(5/25)
    
  3. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.20
    0.20349786 = sum of:
      0.20349786 = product of:
        0.8479078 = sum of:
          0.028832035 = weight(abstract_txt:candidate in 2536) [ClassicSimilarity], result of:
            0.028832035 = score(doc=2536,freq=1.0), product of:
              0.100841045 = queryWeight, product of:
                1.0816386 = boost
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.012737297 = queryNorm
              0.28591567 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.319441 = idf(docFreq=79, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.043297023 = weight(abstract_txt:supervised in 2536) [ClassicSimilarity], result of:
            0.043297023 = score(doc=2536,freq=2.0), product of:
              0.10495807 = queryWeight, product of:
                1.1034976 = boost
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.012737297 = queryNorm
              0.4125173 = fieldWeight in 2536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.467361 = idf(docFreq=68, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.022591868 = weight(abstract_txt:words in 2536) [ClassicSimilarity], result of:
            0.022591868 = score(doc=2536,freq=1.0), product of:
              0.10798545 = queryWeight, product of:
                1.5829278 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.012737297 = queryNorm
              0.20921215 = fieldWeight in 2536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.06047023 = weight(abstract_txt:extraction in 2536) [ClassicSimilarity], result of:
            0.06047023 = score(doc=2536,freq=3.0), product of:
              0.14433926 = queryWeight, product of:
                1.8300828 = boost
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.012737297 = queryNorm
              0.41894513 = fieldWeight in 2536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.192079 = idf(docFreq=246, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.059879545 = weight(abstract_txt:method in 2536) [ClassicSimilarity], result of:
            0.059879545 = score(doc=2536,freq=5.0), product of:
              0.15238287 = queryWeight, product of:
                2.6592646 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012737297 = queryNorm
              0.3929546 = fieldWeight in 2536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
          0.6328371 = weight(abstract_txt:multiword in 2536) [ClassicSimilarity], result of:
            0.6328371 = score(doc=2536,freq=11.0), product of:
              0.5642491 = queryWeight, product of:
                5.117159 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.012737297 = queryNorm
              1.1215563 = fieldWeight in 2536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.0390625 = fieldNorm(doc=2536)
        0.24 = coord(6/25)
    
  4. Landauer, T.K.; Foltz, P.W.; Laham, D.: An introduction to Latent Semantic Analysis (1998) 0.10
    0.10431357 = sum of:
      0.10431357 = product of:
        0.43463987 = sum of:
          0.04895741 = weight(abstract_txt:extracting in 2162) [ClassicSimilarity], result of:
            0.04895741 = score(doc=2162,freq=1.0), product of:
              0.09041617 = queryWeight, product of:
                1.0242041 = boost
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.012737297 = queryNorm
              0.5414674 = fieldWeight in 2162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.930783 = idf(docFreq=117, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
          0.07826053 = weight(abstract_txt:words in 2162) [ClassicSimilarity], result of:
            0.07826053 = score(doc=2162,freq=3.0), product of:
              0.10798545 = queryWeight, product of:
                1.5829278 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.012737297 = queryNorm
              0.72473216 = fieldWeight in 2162, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
          0.05947353 = weight(abstract_txt:representing in 2162) [ClassicSimilarity], result of:
            0.05947353 = score(doc=2162,freq=1.0), product of:
              0.1296959 = queryWeight, product of:
                1.7347689 = boost
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.012737297 = queryNorm
              0.45856133 = fieldWeight in 2162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.869585 = idf(docFreq=340, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
          0.08688533 = weight(abstract_txt:similarity in 2162) [ClassicSimilarity], result of:
            0.08688533 = score(doc=2162,freq=1.0), product of:
              0.19114894 = queryWeight, product of:
                2.5793486 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012737297 = queryNorm
              0.4545426 = fieldWeight in 2162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
          0.05355789 = weight(abstract_txt:method in 2162) [ClassicSimilarity], result of:
            0.05355789 = score(doc=2162,freq=1.0), product of:
              0.15238287 = queryWeight, product of:
                2.6592646 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012737297 = queryNorm
              0.35146925 = fieldWeight in 2162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
          0.10750517 = weight(abstract_txt:category in 2162) [ClassicSimilarity], result of:
            0.10750517 = score(doc=2162,freq=1.0), product of:
              0.22030641 = queryWeight, product of:
                2.7690938 = boost
                6.2461467 = idf(docFreq=233, maxDocs=44421)
                0.012737297 = queryNorm
              0.48798022 = fieldWeight in 2162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2461467 = idf(docFreq=233, maxDocs=44421)
                0.078125 = fieldNorm(doc=2162)
        0.24 = coord(6/25)
    
  5. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.10
    0.09512802 = sum of:
      0.09512802 = product of:
        0.79273355 = sum of:
          0.054220486 = weight(abstract_txt:words in 1466) [ClassicSimilarity], result of:
            0.054220486 = score(doc=1466,freq=1.0), product of:
              0.10798545 = queryWeight, product of:
                1.5829278 = boost
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.012737297 = queryNorm
              0.50210917 = fieldWeight in 1466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.355831 = idf(docFreq=569, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
          0.09089075 = weight(abstract_txt:method in 1466) [ClassicSimilarity], result of:
            0.09089075 = score(doc=1466,freq=2.0), product of:
              0.15238287 = queryWeight, product of:
                2.6592646 = boost
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.012737297 = queryNorm
              0.5964631 = fieldWeight in 1466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4988065 = idf(docFreq=1342, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
          0.64762235 = weight(abstract_txt:multiword in 1466) [ClassicSimilarity], result of:
            0.64762235 = score(doc=1466,freq=2.0), product of:
              0.5642491 = queryWeight, product of:
                5.117159 = boost
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.012737297 = queryNorm
              1.1477597 = fieldWeight in 1466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.656945 = idf(docFreq=20, maxDocs=44421)
                0.09375 = fieldNorm(doc=1466)
        0.12 = coord(3/25)
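
The content-similarity scores combine per-term scores with a coordination factor. Each term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm. The sum is then multiplied by coord(matched/total). The sketch below recomputes entry 5 from the figures in the listing, assuming the standard ClassicSimilarity tf and idf formulas. The helper names are hypothetical, and results may differ from the listing in the last digits because Lucene uses 32-bit floats.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.012737297  # queryNorm from the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total: int) -> float:
    """Sum of per-term scores scaled by coord = matched / total."""
    return sum(term_score(*t) for t in terms) * (matched / total)


# Entry 5 (doc=1466): (boost, docFreq, termFreq, fieldNorm) per matching term.
entry5_terms = [
    (1.5829278, 569, 1.0, 0.09375),   # words
    (2.6592646, 1342, 2.0, 0.09375),  # method
    (5.117159, 20, 2.0, 0.09375),     # multiword
]
entry5_score = document_score(entry5_terms, matched=3, total=25)
```

The coord factor explains why entry 5 scores lower than entry 3 despite a comparable raw sum: only 3 of the 25 query terms match, against 6 of 25 for entry 3.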