Document (#42133)

Author
Hubain, R.
Wilde, M. De
Hooland, S. van
Title
Automated SKOS vocabulary design for the biopharmaceutical industry
Source
Cataloging and classification quarterly. 54(2016) no.7, pp.403-417
Year
2016
Abstract
Ensuring quick and consistent access to large collections of unstructured documents is one of the biggest challenges facing knowledge-intensive organizations. Designing specific vocabularies to index and retrieve documents is often deemed too expensive, full-text search being preferred despite its known limitations. However, the process of creating controlled vocabularies can be partly automated thanks to natural language processing and machine learning techniques. With a case study from the biopharmaceutical industry, we demonstrate how small organizations can use an automated workflow in order to create a controlled vocabulary to index unstructured documents in a semantically meaningful way.
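The abstract describes partly automating controlled-vocabulary creation. As a rough illustration only (this record does not describe the authors' actual workflow), the sketch below uses plain frequency counting as a stand-in for the NLP and machine-learning steps: it extracts recurring stopword-free unigrams and bigrams from a few documents and serializes them as SKOS concepts in Turtle. The stopword list, the `min_freq` threshold and the base URI `http://example.org/vocab/` are hypothetical choices; a real pipeline would add proper tokenization, term ranking and human review.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "for", "is", "are", "on", "with"}


def candidate_terms(docs, min_freq=2):
    """Count stopword-free unigrams and bigrams; keep those seen at least min_freq times."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok in STOPWORDS:
                continue
            counts[tok] += 1
            # A bigram is kept only if neither word is a stopword.
            if i + 1 < len(tokens) and tokens[i + 1] not in STOPWORDS:
                counts[f"{tok} {tokens[i + 1]}"] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_freq]


def to_skos_turtle(terms, base="http://example.org/vocab/", lang="en"):
    """Serialize candidate terms as skos:Concept resources in one skos:ConceptScheme."""
    scheme = f"<{base}scheme>"
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"{scheme} a skos:ConceptScheme .",
    ]
    for term, _count in terms:
        # Terms contain only a-z and spaces, so a hyphenated slug is a safe IRI segment.
        slug = term.replace(" ", "-")
        lines.append(
            f'<{base}concept/{slug}> a skos:Concept ; '
            f'skos:prefLabel "{term}"@{lang} ; skos:inScheme {scheme} .'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        "Protein purification protocol for cell culture.",
        "Cell culture media and protein purification.",
        "Quality control of cell culture batches.",
    ]
    print(to_skos_turtle(candidate_terms(sample)))
```

Frequency counting keeps the example dependency-free; it only finds recurring phrases and says nothing about broader/narrower relations, which a real SKOS design would still need to add.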
Content
See: https://doi.org/10.1080/01639374.2016.1201560
Theme
Semantic interoperability
Field
Pharmacy
Object
SKOS
Area
Information economy

Similar documents (author)

  1. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Wa, R.Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 3.43
  2. Wilde, D.U.: Generation and use of machine-readable data bases (1976) 2.01
  3. Wilde, E.: Semantische Interoperabilität von XML Schemas (2005) 2.01
  4. Hooland, S. van; Verborgh, R.: Linked data for libraries, archives and museums : how to clean, link, and publish your metadata (2014) 1.60
  5. Hooland, S. van; Bontemps, Y.; Kaufman, S.: Answering the call for more accountability : applying data profiling to museum metadata (2008) 1.37
    
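The score printed after each similar-document entry is a Lucene ClassicSimilarity (TF-IDF) relevance score. The sketch below recomputes such a score from its factors, assuming Lucene's documented ClassicSimilarity formulas: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), a per-term score of queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and a coordination factor (matched query terms / total query terms) applied to the sum. queryNorm and fieldNorm are passed in as given values rather than recomputed, and the function names are illustrative.

```python
import math


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, matched, total_query_terms):
    """Sum of matching term scores, scaled by the coord factor matched/total."""
    return sum(term_scores) * matched / total_query_terms


if __name__ == "__main__":
    # Factors Lucene reported for the "Wilde, D.U." entry: one of two author
    # terms matched (author_txt:wilde, docFreq 8 of 44421 documents).
    wilde = term_score(1, 8, 44421, query_norm=0.07123068, field_norm=0.625)
    print(round(doc_score([wilde], matched=1, total_query_terms=2), 2))
```

For the "Wilde, D.U." entry this gives a term score of about 4.0214, halved by the coordination factor to about 2.01, the figure shown in the list. The coord step reflects older Lucene versions; newer releases dropped it.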

Similar documents (content)

  1. Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.12
  2. Angjeli, A.; Isaac, A.: Semantic web and vocabularies interoperability : an experiment with illuminations collections (2008) 0.11
  3. Harpring, P.: Introduction to controlled vocabularies : terminology for art, architecture, and other cultural works (2010) 0.10
  4. Vatant, B.; Dunsire, G.: Use case vocabulary merging (2010) 0.10
  5. Wang, J.: Automatic thesaurus development : term extraction from title metadata (2006) 0.10