Document (#38665)

Author
Moohebat, M.
Raj, R.G.
Kareem, S.B.A.
Thorleuchter, D.
Title
Identifying ISI-indexed articles by their lexical usage : a text analysis approach
Source
Journal of the Association for Information Science and Technology. 66(2015) no.3, p.501-511
Year
2015
Abstract
This research creates an architecture for investigating whether lexical divergences exist between articles categorized as Institute for Scientific Information (ISI)-indexed and non-ISI-indexed and, if such a difference is found, for proposing the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K-Nearest Neighbors techniques.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23194/abstract.
Theme
Informetrics
Computational linguistics

Similar documents (content)

  1. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.12
    0.12183026 = sum of:
      0.12183026 = product of:
        0.5076261 = sum of:
          0.051766895 = weight(abstract_txt:consequently in 1393) [ClassicSimilarity], result of:
            0.051766895 = score(doc=1393,freq=1.0), product of:
              0.1357304 = queryWeight, product of:
                1.0840073 = boost
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.017953869 = queryNorm
              0.38139498 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9740796 = idf(docFreq=112, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.053419277 = weight(abstract_txt:trained in 1393) [ClassicSimilarity], result of:
            0.053419277 = score(doc=1393,freq=1.0), product of:
              0.13860357 = queryWeight, product of:
                1.0954205 = boost
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.017953869 = queryNorm
              0.38541055 = fieldWeight in 1393, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0475073 = idf(docFreq=104, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.029579127 = weight(abstract_txt:analysis in 1393) [ClassicSimilarity], result of:
            0.029579127 = score(doc=1393,freq=4.0), product of:
              0.074180424 = queryWeight, product of:
                1.1333219 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017953869 = queryNorm
              0.39874572 = fieldWeight in 1393, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.04018277 = weight(abstract_txt:techniques in 1393) [ClassicSimilarity], result of:
            0.04018277 = score(doc=1393,freq=2.0), product of:
              0.11463986 = queryWeight, product of:
                1.408888 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.017953869 = queryNorm
              0.35051307 = fieldWeight in 1393, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.07759689 = weight(abstract_txt:machine in 1393) [ClassicSimilarity], result of:
            0.07759689 = score(doc=1393,freq=3.0), product of:
              0.15530121 = queryWeight, product of:
                1.6398195 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.017953869 = queryNorm
              0.49965408 = fieldWeight in 1393, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
          0.25508118 = weight(abstract_txt:lexical in 1393) [ClassicSimilarity], result of:
            0.25508118 = score(doc=1393,freq=4.0), product of:
              0.35709336 = queryWeight, product of:
                3.0454056 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.017953869 = queryNorm
              0.7143263 = fieldWeight in 1393, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1393)
        0.24 = coord(6/25)
    
  2. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.12
    0.11718488 = sum of:
      0.11718488 = product of:
        0.48827034 = sum of:
          0.04833514 = weight(abstract_txt:vector in 1446) [ClassicSimilarity], result of:
            0.04833514 = score(doc=1446,freq=1.0), product of:
              0.11861959 = queryWeight, product of:
                1.0133789 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017953869 = queryNorm
              0.40748024 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.016902357 = weight(abstract_txt:analysis in 1446) [ClassicSimilarity], result of:
            0.016902357 = score(doc=1446,freq=1.0), product of:
              0.074180424 = queryWeight, product of:
                1.1333219 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017953869 = queryNorm
              0.2278547 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.09428349 = weight(abstract_txt:nearest in 1446) [ClassicSimilarity], result of:
            0.09428349 = score(doc=1446,freq=1.0), product of:
              0.18518461 = queryWeight, product of:
                1.2661818 = boost
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.017953869 = queryNorm
              0.50913244 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.146119 = idf(docFreq=34, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.100964986 = weight(abstract_txt:naïve in 1446) [ClassicSimilarity], result of:
            0.100964986 = score(doc=1446,freq=1.0), product of:
              0.19383326 = queryWeight, product of:
                1.2954116 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.017953869 = queryNorm
              0.52088577 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.15537567 = weight(abstract_txt:neighbors in 1446) [ClassicSimilarity], result of:
            0.15537567 = score(doc=1446,freq=1.0), product of:
              0.25836664 = queryWeight, product of:
                1.4955876 = boost
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.017953869 = queryNorm
              0.60137665 = fieldWeight in 1446, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.622026 = idf(docFreq=7, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
          0.072408676 = weight(abstract_txt:machine in 1446) [ClassicSimilarity], result of:
            0.072408676 = score(doc=1446,freq=2.0), product of:
              0.15530121 = queryWeight, product of:
                1.6398195 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.017953869 = queryNorm
              0.4662467 = fieldWeight in 1446, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0625 = fieldNorm(doc=1446)
        0.24 = coord(6/25)
    
  3. Huang, C.; Fu, T.; Chen, H.: Text-based video content classification for online video-sharing sites (2010) 0.11
    0.1119083 = sum of:
      0.1119083 = product of:
        0.4662846 = sum of:
          0.059811685 = weight(abstract_txt:vector in 439) [ClassicSimilarity], result of:
            0.059811685 = score(doc=439,freq=2.0), product of:
              0.11861959 = queryWeight, product of:
                1.0133789 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017953869 = queryNorm
              0.5042311 = fieldWeight in 439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
          0.12493779 = weight(abstract_txt:naïve in 439) [ClassicSimilarity], result of:
            0.12493779 = score(doc=439,freq=2.0), product of:
              0.19383326 = queryWeight, product of:
                1.2954116 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.017953869 = queryNorm
              0.6445632 = fieldWeight in 439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
          0.04018277 = weight(abstract_txt:techniques in 439) [ClassicSimilarity], result of:
            0.04018277 = score(doc=439,freq=2.0), product of:
              0.11463986 = queryWeight, product of:
                1.408888 = boost
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.017953869 = queryNorm
              0.35051307 = fieldWeight in 439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5321174 = idf(docFreq=1298, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
          0.06335759 = weight(abstract_txt:machine in 439) [ClassicSimilarity], result of:
            0.06335759 = score(doc=439,freq=2.0), product of:
              0.15530121 = queryWeight, product of:
                1.6398195 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.017953869 = queryNorm
              0.40796587 = fieldWeight in 439, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
          0.05045416 = weight(abstract_txt:classification in 439) [ClassicSimilarity], result of:
            0.05045416 = score(doc=439,freq=3.0), product of:
              0.13342598 = queryWeight, product of:
                1.8615489 = boost
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.017953869 = queryNorm
              0.37814343 = fieldWeight in 439, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9921594 = idf(docFreq=2228, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
          0.12754059 = weight(abstract_txt:lexical in 439) [ClassicSimilarity], result of:
            0.12754059 = score(doc=439,freq=1.0), product of:
              0.35709336 = queryWeight, product of:
                3.0454056 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.017953869 = queryNorm
              0.35716316 = fieldWeight in 439, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.0546875 = fieldNorm(doc=439)
        0.24 = coord(6/25)
    
  4. Sabourin, C.F. (Ed.): Computational lexicology and lexicography : bibliography (1994) 0.10
    0.09617037 = sum of:
      0.09617037 = product of:
        0.8014198 = sum of:
          0.042255897 = weight(abstract_txt:analysis in 485) [ClassicSimilarity], result of:
            0.042255897 = score(doc=485,freq=1.0), product of:
              0.074180424 = queryWeight, product of:
                1.1333219 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017953869 = queryNorm
              0.56963676 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.15625 = fieldNorm(doc=485)
          0.12800166 = weight(abstract_txt:machine in 485) [ClassicSimilarity], result of:
            0.12800166 = score(doc=485,freq=1.0), product of:
              0.15530121 = queryWeight, product of:
                1.6398195 = boost
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.017953869 = queryNorm
              0.8242155 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.274979 = idf(docFreq=617, maxDocs=44421)
                0.15625 = fieldNorm(doc=485)
          0.6311622 = weight(abstract_txt:lexical in 485) [ClassicSimilarity], result of:
            0.6311622 = score(doc=485,freq=3.0), product of:
              0.35709336 = queryWeight, product of:
                3.0454056 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.017953869 = queryNorm
              1.7674992 = fieldWeight in 485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.15625 = fieldNorm(doc=485)
        0.12 = coord(3/25)
    
  5. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.09
    0.09007564 = sum of:
      0.09007564 = product of:
        0.7506304 = sum of:
          0.13703248 = weight(abstract_txt:syntactical in 219) [ClassicSimilarity], result of:
            0.13703248 = score(doc=219,freq=1.0), product of:
              0.2047655 = queryWeight, product of:
                1.3314413 = boost
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.017953869 = queryNorm
              0.66921663 = fieldWeight in 219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.565973 = idf(docFreq=22, maxDocs=44421)
                0.078125 = fieldNorm(doc=219)
          0.40741348 = weight(abstract_txt:lexical in 219) [ClassicSimilarity], result of:
            0.40741348 = score(doc=219,freq=5.0), product of:
              0.35709336 = queryWeight, product of:
                3.0454056 = boost
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.017953869 = queryNorm
              1.1409159 = fieldWeight in 219, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.5309834 = idf(docFreq=175, maxDocs=44421)
                0.078125 = fieldNorm(doc=219)
          0.20618437 = weight(abstract_txt:articles in 219) [ClassicSimilarity], result of:
            0.20618437 = score(doc=219,freq=3.0), product of:
              0.31878254 = queryWeight, product of:
                3.7147183 = boost
                4.7798095 = idf(docFreq=1013, maxDocs=44421)
                0.017953869 = queryNorm
              0.6467869 = fieldWeight in 219, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7798095 = idf(docFreq=1013, maxDocs=44421)
                0.078125 = fieldNorm(doc=219)
        0.12 = coord(3/25)
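
The score trees above follow the usual Lucene ClassicSimilarity breakdown: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the sum is scaled by coord (matched query terms / total query terms). The following is a minimal sketch that recomputes the score of result 1 (doc 1393) from the raw inputs shown in its tree. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and variable names are illustrative only and are not part of the catalog software.

```python
import math

# Values copied from the explain tree of result 1 (doc 1393).
MAX_DOCS = 44421
QUERY_NORM = 0.017953869
FIELD_NORM_1393 = 0.0546875

# (term, freq in doc, docFreq in index, boost) for each matching term.
TERMS_1393 = [
    ("consequently", 1.0, 112, 1.0840073),
    ("trained", 1.0, 104, 1.0954205),
    ("analysis", 4.0, 3151, 1.1333219),
    ("techniques", 2.0, 1298, 1.408888),
    ("machine", 3.0, 617, 1.6398195),
    ("lexical", 4.0, 175, 3.0454056),
]


def idf(doc_freq, max_docs):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm):
    """queryWeight * fieldWeight for one term (hypothetical helper)."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms):
    """Sum of term scores scaled by coord = matched / total query terms."""
    coord = len(terms) / total_query_terms
    return coord * sum(
        term_score(freq, doc_freq, boost, field_norm)
        for _, freq, doc_freq, boost in terms
    )


score_1393 = document_score(TERMS_1393, FIELD_NORM_1393, total_query_terms=25)
print(f"recomputed score for doc 1393: {score_1393:.6f}")
```

The recomputed value should agree with the listed 0.12183026 up to small floating-point differences, since Lucene evaluates these factors in single precision. Substituting the term list, fieldNorm, and coord of any other result should reproduce its score in the same way.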