Document (#43746)

Author
Pech, G.
Delgado, C.
Sorella, S.P.
Title
Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics
Source
Journal of the Association for Information Science and Technology. 73(2022) no.11, p.1513-1528
Year
2022
Abstract
Classifying papers according to fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may neither correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and KeyWords Plus) and the help of experts to identify the existing subfields and the journals exclusive to each subfield. These "exclusive journals" are critical for obtaining, through a pattern detection procedure that uses machine learning techniques (from the NVivo software), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify the subfield to which each paper most likely belongs. This study can help support science policy-makers and funding and research institutions (via more accurate academic performance evaluations), support editors in redefining the scopes of journals, and support popular databases in refining their categories.
Content
https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24655.
Theme
Automatic classification
Field
Physics

Similar documents (author)

  1. Delgado, Y. Hidalgo- => Hidalgo-Delgado, Y.: 5.04
    5.0403857 = sum of:
      5.0403857 = weight(author_txt:delgado in 4705) [ClassicSimilarity], result of:
        5.0403857 = fieldWeight in 4705, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=4705)
    
  2. Quirós, L. Delgado- => Delgado-Quirós, L.: 5.04
    5.0403857 = sum of:
      5.0403857 = weight(author_txt:delgado in 1841) [ClassicSimilarity], result of:
        5.0403857 = fieldWeight in 1841, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.375 = fieldNorm(doc=1841)
    
  3. Thelwall, M.; Delgado, M.M.: Arts and humanities research evaluation : no metrics please, just data (2015) 4.75
    4.7521214 = sum of:
      4.7521214 = weight(author_txt:delgado in 3313) [ClassicSimilarity], result of:
        4.7521214 = fieldWeight in 3313, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.5 = fieldNorm(doc=3313)
    
  4. Montalvo, S.; Martínez, R.; Fresno, V.; Delgado, A.: Exploiting named entities for bilingual news clustering (2015) 2.97
    2.9700758 = sum of:
      2.9700758 = weight(author_txt:delgado in 2642) [ClassicSimilarity], result of:
        2.9700758 = fieldWeight in 2642, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.3125 = fieldNorm(doc=2642)
    
  5. Delgado, A.D.; Martínez, R.; Montalvo, S.; Fresno, V.: Person name disambiguation in the Web using adaptive threshold clustering (2017) 2.97
    2.9700758 = sum of:
      2.9700758 = weight(author_txt:delgado in 4694) [ClassicSimilarity], result of:
        2.9700758 = fieldWeight in 4694, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.504243 = idf(docFreq=8, maxDocs=44421)
          0.3125 = fieldNorm(doc=4694)
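
The author scores above can be reproduced from the figures in the explain trees. The following is a minimal sketch, assuming the classic TF-IDF definitions that match the numbers shown (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and are not part of any library API; the fieldNorm values are copied from the listing rather than recomputed.

```python
import math


def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term count.
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=8, maxDocs=44421 for author_txt:delgado.
score_hit1 = field_weight(2.0, 8, 44421, 0.375)    # doc 4705
score_hit3 = field_weight(1.0, 8, 44421, 0.5)      # doc 3313
score_hit4 = field_weight(1.0, 8, 44421, 0.3125)   # doc 2642

print(round(score_hit1, 2), round(score_hit3, 2), round(score_hit4, 2))
```

Rounded to two decimals, the three values agree with the 5.04, 4.75 and 2.97 shown next to the entries; small differences in later digits are expected because the listing uses single-precision arithmetic.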
    

Similar documents (content)

  1. Zhang, J.; Yu, Q.; Zheng, F.; Long, C.; Lu, Z.; Duan, Z.: Comparing keywords plus of WOS and author keywords : a case study of patient adherence research (2016) 0.20
    0.20226015 = sum of:
      0.20226015 = product of:
        0.84275067 = sum of:
          0.008227799 = weight(abstract_txt:that in 3857) [ClassicSimilarity], result of:
            0.008227799 = score(doc=3857,freq=1.0), product of:
              0.04453223 = queryWeight, product of:
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018830212 = queryNorm
              0.18476056 = fieldWeight in 3857, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
          0.055748492 = weight(abstract_txt:fields in 3857) [ClassicSimilarity], result of:
            0.055748492 = score(doc=3857,freq=2.0), product of:
              0.100450955 = queryWeight, product of:
                1.0620008 = boost
                5.0231256 = idf(docFreq=794, maxDocs=44421)
                0.018830212 = queryNorm
              0.5549822 = fieldWeight in 3857, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0231256 = idf(docFreq=794, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
          0.045401447 = weight(abstract_txt:papers in 3857) [ClassicSimilarity], result of:
            0.045401447 = score(doc=3857,freq=1.0), product of:
              0.11037103 = queryWeight, product of:
                1.1132054 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.018830212 = queryNorm
              0.41135293 = fieldWeight in 3857, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
          0.0598892 = weight(abstract_txt:popular in 3857) [ClassicSimilarity], result of:
            0.0598892 = score(doc=3857,freq=1.0), product of:
              0.13275197 = queryWeight, product of:
                1.2208668 = boost
                5.7745414 = idf(docFreq=374, maxDocs=44421)
                0.018830212 = queryNorm
              0.45113605 = fieldWeight in 3857, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7745414 = idf(docFreq=374, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
          0.22687668 = weight(abstract_txt:plus in 3857) [ClassicSimilarity], result of:
            0.22687668 = score(doc=3857,freq=7.0), product of:
              0.16864334 = queryWeight, product of:
                1.3760442 = boost
                6.5085106 = idf(docFreq=179, maxDocs=44421)
                0.018830212 = queryNorm
              1.3453047 = fieldWeight in 3857, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.5085106 = idf(docFreq=179, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
          0.44660708 = weight(abstract_txt:keywords in 3857) [ClassicSimilarity], result of:
            0.44660708 = score(doc=3857,freq=11.0), product of:
              0.2870592 = queryWeight, product of:
                2.538917 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.018830212 = queryNorm
              1.5558014 = fieldWeight in 3857, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.078125 = fieldNorm(doc=3857)
        0.24 = coord(6/25)
    
  2. Gwak, J.H.; Sohn, S.J.: A novel approach to explore patent development paths for subfield technologies (2018) 0.15
    0.15085989 = sum of:
      0.15085989 = product of:
        0.9428743 = sum of:
          0.008227799 = weight(abstract_txt:that in 120) [ClassicSimilarity], result of:
            0.008227799 = score(doc=120,freq=1.0), product of:
              0.04453223 = queryWeight, product of:
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018830212 = queryNorm
              0.18476056 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=120)
          0.04343039 = weight(abstract_txt:each in 120) [ClassicSimilarity], result of:
            0.04343039 = score(doc=120,freq=1.0), product of:
              0.13500436 = queryWeight, product of:
                1.7411519 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.018830212 = queryNorm
              0.32169622 = fieldWeight in 120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.078125 = fieldNorm(doc=120)
          0.4343702 = weight(abstract_txt:subfields in 120) [ClassicSimilarity], result of:
            0.4343702 = score(doc=120,freq=4.0), product of:
              0.35869533 = queryWeight, product of:
                2.457855 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.018830212 = queryNorm
              1.2109725 = fieldWeight in 120, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.078125 = fieldNorm(doc=120)
          0.4568459 = weight(abstract_txt:subfield in 120) [ClassicSimilarity], result of:
            0.4568459 = score(doc=120,freq=2.0), product of:
              0.5144247 = queryWeight, product of:
                3.3987849 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.018830212 = queryNorm
              0.88807154 = fieldWeight in 120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=120)
        0.16 = coord(4/25)
    
  3. Rorissa, A.; Yuan, X.: Visualizing and mapping the intellectual structure of information retrieval (2012) 0.14
    0.14160721 = sum of:
      0.14160721 = product of:
        0.70803607 = sum of:
          0.008227799 = weight(abstract_txt:that in 3744) [ClassicSimilarity], result of:
            0.008227799 = score(doc=3744,freq=1.0), product of:
              0.04453223 = queryWeight, product of:
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018830212 = queryNorm
              0.18476056 = fieldWeight in 3744, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=3744)
          0.045401447 = weight(abstract_txt:papers in 3744) [ClassicSimilarity], result of:
            0.045401447 = score(doc=3744,freq=1.0), product of:
              0.11037103 = queryWeight, product of:
                1.1132054 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.018830212 = queryNorm
              0.41135293 = fieldWeight in 3744, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.078125 = fieldNorm(doc=3744)
          0.062903866 = weight(abstract_txt:journals in 3744) [ClassicSimilarity], result of:
            0.062903866 = score(doc=3744,freq=1.0), product of:
              0.15702084 = queryWeight, product of:
                1.6261929 = boost
                5.1277876 = idf(docFreq=715, maxDocs=44421)
                0.018830212 = queryNorm
              0.40060842 = fieldWeight in 3744, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1277876 = idf(docFreq=715, maxDocs=44421)
                0.078125 = fieldNorm(doc=3744)
          0.13465708 = weight(abstract_txt:keywords in 3744) [ClassicSimilarity], result of:
            0.13465708 = score(doc=3744,freq=1.0), product of:
              0.2870592 = queryWeight, product of:
                2.538917 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.018830212 = queryNorm
              0.4690917 = fieldWeight in 3744, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.078125 = fieldNorm(doc=3744)
          0.4568459 = weight(abstract_txt:subfield in 3744) [ClassicSimilarity], result of:
            0.4568459 = score(doc=3744,freq=2.0), product of:
              0.5144247 = queryWeight, product of:
                3.3987849 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.018830212 = queryNorm
              0.88807154 = fieldWeight in 3744, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.078125 = fieldNorm(doc=3744)
        0.2 = coord(5/25)
    
  4. Chen, Y.-N.; Ke, H.-R.: A study on mental models of taggers and experts for article indexing based on analysis of keyword usage (2014) 0.14
    0.13954143 = sum of:
      0.13954143 = product of:
        0.5814226 = sum of:
          0.06775689 = weight(abstract_txt:popular in 2334) [ClassicSimilarity], result of:
            0.06775689 = score(doc=2334,freq=2.0), product of:
              0.13275197 = queryWeight, product of:
                1.2208668 = boost
                5.7745414 = idf(docFreq=374, maxDocs=44421)
                0.018830212 = queryNorm
              0.51040214 = fieldWeight in 2334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7745414 = idf(docFreq=374, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
          0.11694606 = weight(abstract_txt:experts in 2334) [ClassicSimilarity], result of:
            0.11694606 = score(doc=2334,freq=5.0), product of:
              0.1407394 = queryWeight, product of:
                1.2570589 = boost
                5.9457254 = idf(docFreq=315, maxDocs=44421)
                0.018830212 = queryNorm
              0.8309404 = fieldWeight in 2334, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.9457254 = idf(docFreq=315, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
          0.12077471 = weight(abstract_txt:pattern in 2334) [ClassicSimilarity], result of:
            0.12077471 = score(doc=2334,freq=4.0), product of:
              0.15489806 = queryWeight, product of:
                1.3187752 = boost
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.018830212 = queryNorm
              0.77970445 = fieldWeight in 2334, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
          0.050323095 = weight(abstract_txt:journals in 2334) [ClassicSimilarity], result of:
            0.050323095 = score(doc=2334,freq=1.0), product of:
              0.15702084 = queryWeight, product of:
                1.6261929 = boost
                5.1277876 = idf(docFreq=715, maxDocs=44421)
                0.018830212 = queryNorm
              0.32048672 = fieldWeight in 2334, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1277876 = idf(docFreq=715, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
          0.07327477 = weight(abstract_txt:categories in 2334) [ClassicSimilarity], result of:
            0.07327477 = score(doc=2334,freq=2.0), product of:
              0.16010518 = queryWeight, product of:
                1.6420867 = boost
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.018830212 = queryNorm
              0.45766646 = fieldWeight in 2334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.177905 = idf(docFreq=680, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
          0.1523471 = weight(abstract_txt:keywords in 2334) [ClassicSimilarity], result of:
            0.1523471 = score(doc=2334,freq=2.0), product of:
              0.2870592 = queryWeight, product of:
                2.538917 = boost
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.018830212 = queryNorm
              0.5307167 = fieldWeight in 2334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.004374 = idf(docFreq=297, maxDocs=44421)
                0.0625 = fieldNorm(doc=2334)
        0.24 = coord(6/25)
    
  5. Hjoerland, B.: Citation analysis : a social and dynamic approach to knowledge organization (2013) 0.13
    0.1330662 = sum of:
      0.1330662 = product of:
        0.5544425 = sum of:
          0.0065822396 = weight(abstract_txt:that in 3710) [ClassicSimilarity], result of:
            0.0065822396 = score(doc=3710,freq=1.0), product of:
              0.04453223 = queryWeight, product of:
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.018830212 = queryNorm
              0.14780845 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
          0.036321156 = weight(abstract_txt:papers in 3710) [ClassicSimilarity], result of:
            0.036321156 = score(doc=3710,freq=1.0), product of:
              0.11037103 = queryWeight, product of:
                1.1132054 = boost
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.018830212 = queryNorm
              0.32908234 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2653174 = idf(docFreq=623, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
          0.044615667 = weight(abstract_txt:identify in 3710) [ClassicSimilarity], result of:
            0.044615667 = score(doc=3710,freq=1.0), product of:
              0.14491189 = queryWeight, product of:
                1.5622315 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.018830212 = queryNorm
              0.30788136 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
          0.03474431 = weight(abstract_txt:each in 3710) [ClassicSimilarity], result of:
            0.03474431 = score(doc=3710,freq=1.0), product of:
              0.13500436 = queryWeight, product of:
                1.7411519 = boost
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.018830212 = queryNorm
              0.25735697 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1177115 = idf(docFreq=1965, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
          0.17374808 = weight(abstract_txt:subfields in 3710) [ClassicSimilarity], result of:
            0.17374808 = score(doc=3710,freq=1.0), product of:
              0.35869533 = queryWeight, product of:
                2.457855 = boost
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.018830212 = queryNorm
              0.484389 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.750224 = idf(docFreq=51, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
          0.25843108 = weight(abstract_txt:subfield in 3710) [ClassicSimilarity], result of:
            0.25843108 = score(doc=3710,freq=1.0), product of:
              0.5144247 = queryWeight, product of:
                3.3987849 = boost
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.018830212 = queryNorm
              0.5023691 = fieldWeight in 3710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.037906 = idf(docFreq=38, maxDocs=44421)
                0.0625 = fieldNorm(doc=3710)
        0.24 = coord(6/25)
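
The content scores combine a per-term product of queryWeight and fieldWeight with a coordination factor for the share of query terms matched. The sketch below recomputes the score of the first content hit (doc 3857) under the same assumed TF-IDF definitions (tf as a square root, idf as 1 + ln(maxDocs / (docFreq + 1))). The queryNorm, boost, frequency and fieldNorm values are copied from the listing, not derived; the term "that" shows no boost line, so a boost of 1.0 is assumed for it. All names are illustrative.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.018830212   # copied from the listing
FIELD_NORM = 0.078125      # fieldNorm(doc=3857)


def idf(doc_freq: int) -> float:
    # Inverse document frequency in the form that reproduces the listing.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int) -> float:
    # score = queryWeight * fieldWeight
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, boost, freq in doc 3857, docFreq), all taken from the explain tree.
matched_terms = [
    ("that", 1.0, 1.0, 11344),       # no boost shown; 1.0 assumed
    ("fields", 1.0620008, 2.0, 794),
    ("papers", 1.1132054, 1.0, 623),
    ("popular", 1.2208668, 1.0, 374),
    ("plus", 1.3760442, 7.0, 179),
    ("keywords", 2.538917, 11.0, 297),
]

raw_sum = sum(term_score(boost, freq, df) for _, boost, freq, df in matched_terms)
coord = len(matched_terms) / 25   # coord(6/25): 6 of 25 query terms matched
total = raw_sum * coord

print(round(raw_sum, 4), round(total, 4))
```

The result should agree with the 0.84275067 sum and the 0.20226015 final score shown for that entry, up to single-precision rounding. The coord factor explains why entry 2, with a higher raw sum (0.9428743) but only 4 of 25 terms matched, ranks below entry 1.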