Document (#38058)

Author
Sojka, P.
Lee, M.
Rehurek, R.
Hatlapatka, R.
Kucbel, M.
Bouche, T.
Goutorbe, C.
Anghelache, R.
Wojciechowski, K.
Title
Toolset for entity and semantic associations : Final Release
Issue
Revision: 1.0 as of 8th February 2013.
Source
https://wiki.eudml.eu/eudml-w/images/D8.4-v1.0.pdf
Year
2013
Abstract
In this document we describe the final release of the toolset for entity and semantic associations, which integrates two versions (language-dependent and language-independent) of Unsupervised Document Similarity, implemented by MU using the gensim tool, and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind the decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, where they deliver the functionality needed for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
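The abstract names Citation Indexing, Resolution and Matching (UJF/CMD) but does not describe how a raw citation is resolved to a known record. As a rough illustration only, the following Python sketch resolves a free-text citation to the closest entry of a small catalog by fuzzy string similarity after normalization (lowercasing, accent and punctuation stripping). The catalog contents, the helper names `normalize` and `resolve_citation`, and the 0.6 threshold are hypothetical and are not taken from the UJF/CMD tool.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks so that e.g. "Řehůřek" becomes "Rehurek".
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def resolve_citation(raw, catalog, threshold=0.6):
    """Return (record_id, score) for the best match, or (None, score) below threshold."""
    query = normalize(raw)
    best_id, best_score = None, 0.0
    for record_id, reference in catalog.items():
        # Ratio in [0, 1]; 1.0 means the normalized strings are identical.
        score = SequenceMatcher(None, query, normalize(reference)).ratio()
        if score > best_score:
            best_id, best_score = record_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical catalog of known records (identifier -> reference string).
catalog = {
    "rec-1": "Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010)",
    "rec-2": "Sojka, P.; Liska, M.: The art of mathematics retrieval (2011)",
}

match, score = resolve_citation(
    "R. Řehůřek and P. Sojka, Software Framework for Topic Modelling with Large Corpora, 2010",
    catalog,
)
print(match, round(score, 2))
```

A production matcher would typically compare structured fields (authors, title, year, venue) separately rather than whole strings; the single-string ratio is used here only to keep the sketch short.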
Content
See also: https://is.muni.cz/repo/1076213/en/Lee-Sojka-Rehurek-Bolikowski/Toolset-for-Entity-and-Semantic-Associations-Initial-Release-Deliverable-82-of-project-EuDML?lang=en.
Theme
Automatic classification
Field
Mathematics
Object
GENSIM
Latent Semantic Indexing
Zentralblatt für Mathematik
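The subject terms list gensim and Latent Semantic Indexing. The plain-Python sketch below illustrates only the general idea behind unsupervised vector-space document similarity: each document becomes a TF-IDF weighted bag-of-words vector, and other documents are ranked by cosine similarity to it. It deliberately omits the LSI step, a truncated SVD that projects these vectors into a low-dimensional latent space (in gensim this is provided by `gensim.models.LsiModel`). The toy corpus, the whitespace tokenizer and the helper names are assumptions for illustration, not the deliverable's actual pipeline.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption); a language-dependent
    # pipeline could add stemming or stop-word removal here.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dict: term -> weight), one per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Terms occurring in every document get weight log(1) = 0.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(docs, query_index):
    """Rank all other documents by cosine similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [(i, cosine(query, vec)) for i, vec in enumerate(vectors) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus.
docs = [
    "topic modelling of large mathematical corpora",
    "latent semantic indexing for topic modelling",
    "citation matching in digital libraries",
]
ranking = most_similar(docs, 0)
print(ranking)
```

Here the second document ranks first because it shares "topic" and "modelling" with the query document, while the third shares no terms and scores zero. An LSI projection would additionally let documents score as similar when they use related but non-identical vocabulary.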

Similar documents (author)

  1. Sojka, P.: Exploiting semantic annotations in math information retrieval (2012) 6.19
  2. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 4.95
  3. Líska, M.; Sojka, P.: MIaS 1.5 (2014) 4.95
  4. Sojka, P.; Liska, M.: The art of mathematics retrieval (2011) 4.95

Similar documents (content)

  1. Kim, J.-M.; Shin, H.; Kim, H.-J.: Schema and constraints-based matching and merging of Topic Maps (2007) 0.17
  2. Steinberger, J.; Poesio, M.; Kabadjov, M.A.; Jezek, K.: Two uses of anaphora resolution in summarization (2007) 0.11
  3. Shakir, H.S.; Nagao, M.: Context-sensitive processing of semantic queries in an image database system (1996) 0.11
  4. Vani, K.; Gupta, D.: Integrating syntax-semantic-based text analysis with structural and citation information for scientific plagiarism detection (2018) 0.10
  5. Gipp, B.; Meuschke, N.; Breitinger, C.: Citation-based plagiarism detection : practicability on a large-scale scientific corpus (2014) 0.10