Document (#42033)

Author
Grün, S.
Poley, C.
Title
Statistische Analysen von Semantic Entities aus Metadaten- und Volltextbeständen von German Medical Science
Source
GMS Medizin-Bibliothek-Information. 17(2017) no.3, pp.1-5
Year
2017
Abstract
This paper analyzes the information content of metadata and full texts of English-language articles in German Medical Science (GMS). The study compares the semantic entities used to enrich GMS metadata (titles and abstracts) with those used to enrich GMS full texts, in order to test whether using full texts adds information value. The semantic entities were compared and evaluated statistically, using measures of descriptive statistics gathered for this purpose. In addition to the ratios of the measures of central tendency and dispersion, we computed the overlaps and complements of the entity sets. The results show a distinct increase in information when full texts are added. On average, metadata contain 25 different entities and full texts contain 215. 89% of the concepts in the metadata are also represented in the full texts; hence, 11% of the metadata concepts are found in the metadata only. In summary, the results show that adding full texts increases the informational value, e.g. for information retrieval processes.
Theme
Metadata
Field
Medicine
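
The overlap and complement figures quoted in the abstract (89% of metadata concepts also found in the full text, 11% found in the metadata only) can be illustrated with plain set operations. The following is a minimal sketch, not the authors' method: the function name `overlap_and_complement` and the sample entity sets are hypothetical and chosen only for illustration.

```python
def overlap_and_complement(metadata_entities, fulltext_entities):
    """Return (overlap share, metadata-only share) relative to the metadata set.

    Both arguments are sets of entity identifiers (hypothetical input format).
    """
    total = len(metadata_entities)
    if total == 0:
        # No metadata entities: both shares are defined as 0.0 here.
        return 0.0, 0.0
    shared = metadata_entities & fulltext_entities          # overlap
    metadata_only = metadata_entities - fulltext_entities   # complement
    return len(shared) / total, len(metadata_only) / total


# Hypothetical example data, not taken from the study.
sample_metadata = {"asthma", "child", "therapy", "guideline"}
sample_fulltext = {"asthma", "child", "therapy", "inhaler", "dosage"}

overlap_share, metadata_only_share = overlap_and_complement(
    sample_metadata, sample_fulltext
)
print(overlap_share, metadata_only_share)
```

The two shares always sum to 1 for a non-empty metadata set, which is why the abstract can derive the 11% figure directly from the 89% figure.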

Similar documents (content)

  1. Chen, S.-J.: Semantic enrichment of linked personal authority data : a case study of elites in late imperial China (2019) 0.19
    0.18845478 = sum of:
      0.18845478 = product of:
        0.78522825 = sum of:
          0.0121374475 = weight(abstract_txt:results in 642) [ClassicSimilarity], result of:
            0.0121374475 = score(doc=642,freq=1.0), product of:
              0.055826083 = queryWeight, product of:
                1.0157464 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.01579944 = queryNorm
              0.21741535 = fieldWeight in 642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
          0.039100353 = weight(abstract_txt:addition in 642) [ClassicSimilarity], result of:
            0.039100353 = score(doc=642,freq=1.0), product of:
              0.12176974 = queryWeight, product of:
                1.5001559 = boost
                5.137612 = idf(docFreq=708, maxDocs=44421)
                0.01579944 = queryNorm
              0.32110074 = fieldWeight in 642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.137612 = idf(docFreq=708, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
          0.077492714 = weight(abstract_txt:semantic in 642) [ClassicSimilarity], result of:
            0.077492714 = score(doc=642,freq=4.0), product of:
              0.13854896 = queryWeight, product of:
                1.95981 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.01579944 = queryNorm
              0.55931646 = fieldWeight in 642, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
          0.2802161 = weight(abstract_txt:entities in 642) [ClassicSimilarity], result of:
            0.2802161 = score(doc=642,freq=6.0), product of:
              0.31383923 = queryWeight, product of:
                3.4059267 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.01579944 = queryNorm
              0.8928651 = fieldWeight in 642, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
          0.12055165 = weight(abstract_txt:full in 642) [ClassicSimilarity], result of:
            0.12055165 = score(doc=642,freq=1.0), product of:
              0.39164302 = queryWeight, product of:
                5.03322 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.01579944 = queryNorm
              0.30781004 = fieldWeight in 642, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
          0.25573 = weight(abstract_txt:texts in 642) [ClassicSimilarity], result of:
            0.25573 = score(doc=642,freq=2.0), product of:
              0.5131993 = queryWeight, product of:
                5.7616086 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.01579944 = queryNorm
              0.49830544 = fieldWeight in 642, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.0625 = fieldNorm(doc=642)
        0.24 = coord(6/25)
    
  2. Kragelj, M.; Borstnar, M.K.: Automatic classification of older electronic texts into the Universal Decimal Classification-UDC (2021) 0.17
    0.16866285 = sum of:
      0.16866285 = product of:
        0.7027619 = sum of:
          0.010620266 = weight(abstract_txt:results in 1176) [ClassicSimilarity], result of:
            0.010620266 = score(doc=1176,freq=1.0), product of:
              0.055826083 = queryWeight, product of:
                1.0157464 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.01579944 = queryNorm
              0.19023843 = fieldWeight in 1176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
          0.014404062 = weight(abstract_txt:science in 1176) [ClassicSimilarity], result of:
            0.014404062 = score(doc=1176,freq=1.0), product of:
              0.06840222 = queryWeight, product of:
                1.1243508 = boost
                3.850585 = idf(docFreq=2567, maxDocs=44421)
                0.01579944 = queryNorm
              0.21057886 = fieldWeight in 1176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.850585 = idf(docFreq=2567, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
          0.021070063 = weight(abstract_txt:value in 1176) [ClassicSimilarity], result of:
            0.021070063 = score(doc=1176,freq=1.0), product of:
              0.0881436 = queryWeight, product of:
                1.2763274 = boost
                4.3710623 = idf(docFreq=1525, maxDocs=44421)
                0.01579944 = queryNorm
              0.23904246 = fieldWeight in 1176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3710623 = idf(docFreq=1525, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
          0.007141482 = weight(abstract_txt:information in 1176) [ClassicSimilarity], result of:
            0.007141482 = score(doc=1176,freq=1.0), product of:
              0.053986162 = queryWeight, product of:
                1.4126121 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.01579944 = queryNorm
              0.13228357 = fieldWeight in 1176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
          0.14917505 = weight(abstract_txt:full in 1176) [ClassicSimilarity], result of:
            0.14917505 = score(doc=1176,freq=2.0), product of:
              0.39164302 = queryWeight, product of:
                5.03322 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.01579944 = queryNorm
              0.38089547 = fieldWeight in 1176, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
          0.50035095 = weight(abstract_txt:texts in 1176) [ClassicSimilarity], result of:
            0.50035095 = score(doc=1176,freq=10.0), product of:
              0.5131993 = queryWeight, product of:
                5.7616086 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.01579944 = queryNorm
              0.9749642 = fieldWeight in 1176, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.0546875 = fieldNorm(doc=1176)
        0.24 = coord(6/25)
    
  3. Chen, S.-J.: Semantic enrichment of linked archival materials (2019) 0.16
    0.15656438 = sum of:
      0.15656438 = product of:
        0.4892637 = sum of:
          0.0121374475 = weight(abstract_txt:results in 488) [ClassicSimilarity], result of:
            0.0121374475 = score(doc=488,freq=1.0), product of:
              0.055826083 = queryWeight, product of:
                1.0157464 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.01579944 = queryNorm
              0.21741535 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.05712148 = weight(abstract_txt:enrich in 488) [ClassicSimilarity], result of:
            0.05712148 = score(doc=488,freq=1.0), product of:
              0.124434814 = queryWeight, product of:
                1.0723157 = boost
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.01579944 = queryNorm
              0.45904744 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.344759 = idf(docFreq=77, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.024080073 = weight(abstract_txt:value in 488) [ClassicSimilarity], result of:
            0.024080073 = score(doc=488,freq=1.0), product of:
              0.0881436 = queryWeight, product of:
                1.2763274 = boost
                4.3710623 = idf(docFreq=1525, maxDocs=44421)
                0.01579944 = queryNorm
              0.2731914 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3710623 = idf(docFreq=1525, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.008161694 = weight(abstract_txt:information in 488) [ClassicSimilarity], result of:
            0.008161694 = score(doc=488,freq=1.0), product of:
              0.053986162 = queryWeight, product of:
                1.4126121 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.01579944 = queryNorm
              0.15118122 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.064153485 = weight(abstract_txt:added in 488) [ClassicSimilarity], result of:
            0.064153485 = score(doc=488,freq=1.0), product of:
              0.16939442 = queryWeight, product of:
                1.7693602 = boost
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.01579944 = queryNorm
              0.37872255 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.059561 = idf(docFreq=281, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.06711065 = weight(abstract_txt:semantic in 488) [ClassicSimilarity], result of:
            0.06711065 = score(doc=488,freq=3.0), product of:
              0.13854896 = queryWeight, product of:
                1.95981 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.01579944 = queryNorm
              0.48438224 = fieldWeight in 488, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.11439774 = weight(abstract_txt:entities in 488) [ClassicSimilarity], result of:
            0.11439774 = score(doc=488,freq=1.0), product of:
              0.31383923 = queryWeight, product of:
                3.4059267 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.01579944 = queryNorm
              0.36451066 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
          0.14210115 = weight(abstract_txt:metadata in 488) [ClassicSimilarity], result of:
            0.14210115 = score(doc=488,freq=2.0), product of:
              0.32949418 = queryWeight, product of:
                4.274164 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.01579944 = queryNorm
              0.4312706 = fieldWeight in 488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=488)
        0.32 = coord(8/25)
    
  4. Mai, F.; Galke, L.; Scherp, A.: Using deep learning for title-based semantic subject indexing to reach competitive performance to full-text (2018) 0.15
    0.14770368 = sum of:
      0.14770368 = product of:
        0.73851836 = sum of:
          0.010620266 = weight(abstract_txt:results in 93) [ClassicSimilarity], result of:
            0.010620266 = score(doc=93,freq=1.0), product of:
              0.055826083 = queryWeight, product of:
                1.0157464 = boost
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.01579944 = queryNorm
              0.19023843 = fieldWeight in 93, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4786456 = idf(docFreq=3724, maxDocs=44421)
                0.0546875 = fieldNorm(doc=93)
          0.0494759 = weight(abstract_txt:medical in 93) [ClassicSimilarity], result of:
            0.0494759 = score(doc=93,freq=1.0), product of:
              0.15571938 = queryWeight, product of:
                1.6964382 = boost
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.01579944 = queryNorm
              0.31772473 = fieldWeight in 93, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8098235 = idf(docFreq=361, maxDocs=44421)
                0.0546875 = fieldNorm(doc=93)
          0.08792061 = weight(abstract_txt:metadata in 93) [ClassicSimilarity], result of:
            0.08792061 = score(doc=93,freq=1.0), product of:
              0.32949418 = queryWeight, product of:
                4.274164 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.01579944 = queryNorm
              0.2668351 = fieldWeight in 93, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0546875 = fieldNorm(doc=93)
          0.3164481 = weight(abstract_txt:full in 93) [ClassicSimilarity], result of:
            0.3164481 = score(doc=93,freq=9.0), product of:
              0.39164302 = queryWeight, product of:
                5.03322 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.01579944 = queryNorm
              0.80800134 = fieldWeight in 93, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.0546875 = fieldNorm(doc=93)
          0.2740535 = weight(abstract_txt:texts in 93) [ClassicSimilarity], result of:
            0.2740535 = score(doc=93,freq=3.0), product of:
              0.5131993 = queryWeight, product of:
                5.7616086 = boost
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.01579944 = queryNorm
              0.5340099 = fieldWeight in 93, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6376824 = idf(docFreq=429, maxDocs=44421)
                0.0546875 = fieldNorm(doc=93)
        0.2 = coord(5/25)
    
  5. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.14
    0.13818459 = sum of:
      0.13818459 = product of:
        0.5757691 = sum of:
          0.0798571 = weight(abstract_txt:computed in 400) [ClassicSimilarity], result of:
            0.0798571 = score(doc=400,freq=1.0), product of:
              0.13407402 = queryWeight, product of:
                1.113074 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.01579944 = queryNorm
              0.59561956 = fieldWeight in 400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
          0.030722704 = weight(abstract_txt:show in 400) [ClassicSimilarity], result of:
            0.030722704 = score(doc=400,freq=1.0), product of:
              0.08935493 = queryWeight, product of:
                1.2850676 = boost
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.01579944 = queryNorm
              0.34382772 = fieldWeight in 400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.400995 = idf(docFreq=1480, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
          0.048875444 = weight(abstract_txt:addition in 400) [ClassicSimilarity], result of:
            0.048875444 = score(doc=400,freq=1.0), product of:
              0.12176974 = queryWeight, product of:
                1.5001559 = boost
                5.137612 = idf(docFreq=708, maxDocs=44421)
                0.01579944 = queryNorm
              0.40137592 = fieldWeight in 400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.137612 = idf(docFreq=708, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
          0.100140974 = weight(abstract_txt:increases in 400) [ClassicSimilarity], result of:
            0.100140974 = score(doc=400,freq=1.0), product of:
              0.19643557 = queryWeight, product of:
                1.9053588 = boost
                6.5253177 = idf(docFreq=176, maxDocs=44421)
                0.01579944 = queryNorm
              0.5097904 = fieldWeight in 400, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5253177 = idf(docFreq=176, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
          0.06849453 = weight(abstract_txt:semantic in 400) [ClassicSimilarity], result of:
            0.06849453 = score(doc=400,freq=2.0), product of:
              0.13854896 = queryWeight, product of:
                1.95981 = boost
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.01579944 = queryNorm
              0.49437058 = fieldWeight in 400, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4745317 = idf(docFreq=1375, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
          0.24767837 = weight(abstract_txt:entities in 400) [ClassicSimilarity], result of:
            0.24767837 = score(doc=400,freq=3.0), product of:
              0.31383923 = queryWeight, product of:
                3.4059267 = boost
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.01579944 = queryNorm
              0.7891887 = fieldWeight in 400, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8321705 = idf(docFreq=353, maxDocs=44421)
                0.078125 = fieldNorm(doc=400)
        0.24 = coord(6/25)
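
The score breakdowns above follow the classic Lucene TF-IDF pattern visible in the listing: each term contributes `queryWeight * fieldWeight`, the contributions are summed, and the sum is multiplied by the coordination factor. The sketch below recomputes the score of the first similar document (doc 642) from the values shown in its breakdown. It assumes `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which match the printed numbers; the function and variable names are illustrative and not part of any library API.

```python
import math

MAX_DOCS = 44421          # maxDocs from the listing
QUERY_NORM = 0.01579944   # queryNorm from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; reproduces e.g. idf(docFreq=3724) = 3.4786456.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed formula; reproduces e.g. tf(freq=4.0) = 2.0.
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    # coord(matched/total) scales the summed term scores.
    coord = matched / total_clauses
    return coord * sum(
        term_score(freq, doc_freq, boost, field_norm)
        for _term, freq, doc_freq, boost in terms
    )


# (term, freq, docFreq, boost) copied from the breakdown for doc 642.
TERMS_DOC_642 = [
    ("results", 1.0, 3724, 1.0157464),
    ("addition", 1.0, 708, 1.5001559),
    ("semantic", 4.0, 1375, 1.95981),
    ("entities", 6.0, 353, 3.4059267),
    ("full", 1.0, 876, 5.03322),
    ("texts", 2.0, 429, 5.7616086),
]

score_642 = document_score(
    TERMS_DOC_642, field_norm=0.0625, matched=6, total_clauses=25
)
print(score_642)  # should be close to the 0.18845478 reported for doc 642
```

Small deviations in the last digits are expected, because the listing was produced with single-precision floats while Python uses double precision.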