Document (#40450)

Zanibbi, R.
Yuan, B.
Keyword and image-based retrieval for mathematical expressions
Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) for the keyword-based approach was higher (keyword: µ = 84.0, σ = 19.0; image-based: µ = 32.0, σ = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
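
The expression-level TF-IDF idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer (`TOKEN`), the smoothed idf, the cosine ranking, and the three-expression corpus are all assumptions chosen for illustration. The one point taken from the abstract is that each expression, rather than each document, is the unit over which term statistics are computed.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: LaTeX commands, single letters/digits, other symbols.
TOKEN = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9]")

def tokenize(latex):
    return TOKEN.findall(latex)

def build_index(expressions):
    """Treat every expression as its own 'document' for df/idf purposes."""
    docs = [Counter(tokenize(e)) for e in expressions]
    df = Counter(tok for d in docs for tok in d)
    n = len(docs)
    # Assumed idf variant: ln(N / df) + 1, so shared tokens keep some weight.
    idf = {tok: math.log(n / count) + 1.0 for tok, count in df.items()}
    return docs, idf

def vectorize(counts, idf):
    # Tokens unseen in the collection are dropped from the query vector.
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, expressions):
    docs, idf = build_index(expressions)
    q = vectorize(Counter(tokenize(query)), idf)
    scored = [(cosine(q, vectorize(d, idf)), e) for d, e in zip(docs, expressions)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy collection, invented for this example.
corpus = [r"\frac{a}{b}", r"x^2 + y^2", r"\sqrt{x}"]
results = search(r"\frac{x}{y}", corpus)
for score, expr in results:
    print(f"{score:.3f}  {expr}")
```

In this toy collection the fraction query ranks `\frac{a}{b}` first, because the rare `\frac` token and the repeated braces dominate the match.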

Similar documents (author)

  1. Yuan, W.: End-user searching behavior in information retrieval : a longitudinal study (1997) 5.18
    5.184806 = sum of:
      5.184806 = weight(author_txt:yuan in 394) [ClassicSimilarity], result of:
        5.184806 = fieldWeight in 394, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.29569 = idf(docFreq=29, maxDocs=44218)
          0.625 = fieldNorm(doc=394)
  2. Yuan, W.; Meadow, C.T.: A study of the use of variables in information retrieval user studies (1999) 4.15
    4.147845 = sum of:
      4.147845 = weight(author_txt:yuan in 2943) [ClassicSimilarity], result of:
        4.147845 = fieldWeight in 2943, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.29569 = idf(docFreq=29, maxDocs=44218)
          0.5 = fieldNorm(doc=2943)
  3. Jin, Z.; Yuan, C.: On the ambiguity of information retrieval for visualization (1998) 4.15
    4.147845 = sum of:
      4.147845 = weight(author_txt:yuan in 3216) [ClassicSimilarity], result of:
        4.147845 = fieldWeight in 3216, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.29569 = idf(docFreq=29, maxDocs=44218)
          0.5 = fieldNorm(doc=3216)
  4. Yuan, X.; Belkin, N.J.: Investigating information retrieval support techniques for different information-seeking strategies (2010) 4.15
    4.147845 = sum of:
      4.147845 = weight(author_txt:yuan in 3699) [ClassicSimilarity], result of:
        4.147845 = fieldWeight in 3699, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.29569 = idf(docFreq=29, maxDocs=44218)
          0.5 = fieldNorm(doc=3699)
  5. Yuan, X.; Belkin, N.J.: Evaluating an integrated system supporting multiple information-seeking strategies (2010) 4.15
    4.147845 = sum of:
      4.147845 = weight(author_txt:yuan in 3992) [ClassicSimilarity], result of:
        4.147845 = fieldWeight in 3992, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.29569 = idf(docFreq=29, maxDocs=44218)
          0.5 = fieldNorm(doc=3992)
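
The author scores above can be recomputed by hand. The sketch below assumes the formulas documented for Lucene's ClassicSimilarity, which is the similarity named in the listing: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical, and the results may differ from the listing in the last digits because Lucene works in single precision.

```python
import math

def classic_idf(doc_freq, max_docs):
    """idf as documented for ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """tf as documented for ClassicSimilarity: square root of the term frequency."""
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explanation tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing for the term author_txt:yuan.
idf_yuan = classic_idf(doc_freq=29, max_docs=44218)
score_doc_394 = field_weight(1.0, 29, 44218, field_norm=0.625)   # listing: 5.184806
score_doc_2943 = field_weight(1.0, 29, 44218, field_norm=0.5)    # listing: 4.147845

print(idf_yuan, score_doc_394, score_doc_2943)
```

Since every hit matches the single term `yuan` exactly once, the ranking among these five records is decided by fieldNorm alone: 0.625 for the first record against 0.5 for the rest.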

Similar documents (content)

  1. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.22
    0.2244592 = sum of:
      0.2244592 = product of:
        0.9352467 = sum of:
          0.019063236 = weight(abstract_txt:approach in 5499) [ClassicSimilarity], result of:
            0.019063236 = score(doc=5499,freq=2.0), product of:
              0.05758532 = queryWeight, product of:
                1.0222747 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.015040224 = queryNorm
              0.33104333 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
          0.1842198 = weight(abstract_txt:latex in 5499) [ClassicSimilarity], result of:
            0.1842198 = score(doc=5499,freq=3.0), product of:
              0.18114701 = queryWeight, product of:
                1.2820717 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.015040224 = queryNorm
              1.016963 = fieldWeight in 5499, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
          0.14688352 = weight(abstract_txt:mathematical in 5499) [ClassicSimilarity], result of:
            0.14688352 = score(doc=5499,freq=5.0), product of:
              0.16551958 = queryWeight, product of:
                1.7331511 = boost
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.015040224 = queryNorm
              0.8874087 = fieldWeight in 5499, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.3497796 = idf(docFreq=209, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
          0.02131298 = weight(abstract_txt:using in 5499) [ClassicSimilarity], result of:
            0.02131298 = score(doc=5499,freq=1.0), product of:
              0.09846838 = queryWeight, product of:
                1.8904932 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.015040224 = queryNorm
              0.21644491 = fieldWeight in 5499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
          0.14611694 = weight(abstract_txt:expression in 5499) [ClassicSimilarity], result of:
            0.14611694 = score(doc=5499,freq=1.0), product of:
              0.35535935 = queryWeight, product of:
                3.5913737 = boost
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.015040224 = queryNorm
              0.41118082 = fieldWeight in 5499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
          0.41765022 = weight(abstract_txt:expressions in 5499) [ClassicSimilarity], result of:
            0.41765022 = score(doc=5499,freq=3.0), product of:
              0.5680665 = queryWeight, product of:
                5.5612435 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.015040224 = queryNorm
              0.73521364 = fieldWeight in 5499, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0625 = fieldNorm(doc=5499)
        0.24 = coord(6/25)
  2. Yoon, J.W.; Chung, E.K.: Understanding image needs in daily life by analyzing questions in a social Q&A site (2011) 0.19
    0.1895331 = sum of:
      0.1895331 = product of:
        0.6769039 = sum of:
          0.013479745 = weight(abstract_txt:approach in 4922) [ClassicSimilarity], result of:
            0.013479745 = score(doc=4922,freq=1.0), product of:
              0.05758532 = queryWeight, product of:
                1.0222747 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.015040224 = queryNorm
              0.234083 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.01615177 = weight(abstract_txt:retrieval in 4922) [ClassicSimilarity], result of:
            0.01615177 = score(doc=4922,freq=1.0), product of:
              0.07436487 = queryWeight, product of:
                1.4227915 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.015040224 = queryNorm
              0.21719621 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.04476074 = weight(abstract_txt:components in 4922) [ClassicSimilarity], result of:
            0.04476074 = score(doc=4922,freq=1.0), product of:
              0.12817073 = queryWeight, product of:
                1.5251275 = boost
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.015040224 = queryNorm
              0.3492275 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.051250167 = weight(abstract_txt:queries in 4922) [ClassicSimilarity], result of:
            0.051250167 = score(doc=4922,freq=1.0), product of:
              0.16057748 = queryWeight, product of:
                2.0907383 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.015040224 = queryNorm
              0.31916162 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.16897382 = weight(abstract_txt:image in 4922) [ClassicSimilarity], result of:
            0.16897382 = score(doc=4922,freq=8.0), product of:
              0.1778569 = queryWeight, product of:
                2.2003548 = boost
                5.374322 = idf(docFreq=556, maxDocs=44218)
                0.015040224 = queryNorm
              0.9500549 = fieldWeight in 4922, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.374322 = idf(docFreq=556, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.14115721 = weight(abstract_txt:keyword in 4922) [ClassicSimilarity], result of:
            0.14115721 = score(doc=4922,freq=1.0), product of:
              0.3740871 = queryWeight, product of:
                4.119724 = boost
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.015040224 = queryNorm
              0.3773378 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.037405 = idf(docFreq=286, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
          0.24113047 = weight(abstract_txt:expressions in 4922) [ClassicSimilarity], result of:
            0.24113047 = score(doc=4922,freq=1.0), product of:
              0.5680665 = queryWeight, product of:
                5.5612435 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.015040224 = queryNorm
              0.4244758 = fieldWeight in 4922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0625 = fieldNorm(doc=4922)
        0.28 = coord(7/25)
  3. D'Ambrosio, D.M.: Conceptualizing metadata via repertory grids : exploring a method for the development of domain-specific systems for knowledge organization (2007) 0.13
    0.12780926 = sum of:
      0.12780926 = product of:
        0.79880786 = sum of:
          0.022842048 = weight(abstract_txt:retrieval in 662) [ClassicSimilarity], result of:
            0.022842048 = score(doc=662,freq=2.0), product of:
              0.07436487 = queryWeight, product of:
                1.4227915 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.015040224 = queryNorm
              0.3071618 = fieldWeight in 662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=662)
          0.030141104 = weight(abstract_txt:using in 662) [ClassicSimilarity], result of:
            0.030141104 = score(doc=662,freq=2.0), product of:
              0.09846838 = queryWeight, product of:
                1.8904932 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.015040224 = queryNorm
              0.30609933 = fieldWeight in 662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.0625 = fieldNorm(doc=662)
          0.20664057 = weight(abstract_txt:expression in 662) [ClassicSimilarity], result of:
            0.20664057 = score(doc=662,freq=2.0), product of:
              0.35535935 = queryWeight, product of:
                3.5913737 = boost
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.015040224 = queryNorm
              0.5814975 = fieldWeight in 662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.0625 = fieldNorm(doc=662)
          0.53918415 = weight(abstract_txt:expressions in 662) [ClassicSimilarity], result of:
            0.53918415 = score(doc=662,freq=5.0), product of:
              0.5680665 = queryWeight, product of:
                5.5612435 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.015040224 = queryNorm
              0.94915676 = fieldWeight in 662, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0625 = fieldNorm(doc=662)
        0.16 = coord(4/25)
  4. Eerola, J.; Vakkari, P.: How a general and a specific thesaurus cover expressions in patients' questions and physicians' answers (2008) 0.13
    0.12630492 = sum of:
      0.12630492 = product of:
        0.7894058 = sum of:
          0.01684968 = weight(abstract_txt:approach in 1732) [ClassicSimilarity], result of:
            0.01684968 = score(doc=1732,freq=1.0), product of:
              0.05758532 = queryWeight, product of:
                1.0222747 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.015040224 = queryNorm
              0.29260373 = fieldWeight in 1732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=1732)
          0.06784717 = weight(abstract_txt:matched in 1732) [ClassicSimilarity], result of:
            0.06784717 = score(doc=1732,freq=1.0), product of:
              0.11568094 = queryWeight, product of:
                1.024537 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.015040224 = queryNorm
              0.58650255 = fieldWeight in 1732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.078125 = fieldNorm(doc=1732)
          0.18264619 = weight(abstract_txt:expression in 1732) [ClassicSimilarity], result of:
            0.18264619 = score(doc=1732,freq=1.0), product of:
              0.35535935 = queryWeight, product of:
                3.5913737 = boost
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.015040224 = queryNorm
              0.51397604 = fieldWeight in 1732, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.078125 = fieldNorm(doc=1732)
          0.5220628 = weight(abstract_txt:expressions in 1732) [ClassicSimilarity], result of:
            0.5220628 = score(doc=1732,freq=3.0), product of:
              0.5680665 = queryWeight, product of:
                5.5612435 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.015040224 = queryNorm
              0.9190171 = fieldWeight in 1732, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.078125 = fieldNorm(doc=1732)
        0.16 = coord(4/25)
  5. Corridoni, J.M.; Bimbo, A. del; Vicario, E.: Image retrieval by color semantics with incomplete knowledge (1998) 0.13
    0.12532212 = sum of:
      0.12532212 = product of:
        0.5221755 = sum of:
          0.046696857 = weight(abstract_txt:level in 594) [ClassicSimilarity], result of:
            0.046696857 = score(doc=594,freq=4.0), product of:
              0.083054364 = queryWeight, product of:
                1.2277019 = boost
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.015040224 = queryNorm
              0.5622445 = fieldWeight in 594, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.027975682 = weight(abstract_txt:retrieval in 594) [ClassicSimilarity], result of:
            0.027975682 = score(doc=594,freq=3.0), product of:
              0.07436487 = queryWeight, product of:
                1.4227915 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.015040224 = queryNorm
              0.37619486 = fieldWeight in 594, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.043276582 = weight(abstract_txt:precision in 594) [ClassicSimilarity], result of:
            0.043276582 = score(doc=594,freq=1.0), product of:
              0.12532161 = queryWeight, product of:
                1.5080812 = boost
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.015040224 = queryNorm
              0.34532416 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5251865 = idf(docFreq=478, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.051250167 = weight(abstract_txt:queries in 594) [ClassicSimilarity], result of:
            0.051250167 = score(doc=594,freq=1.0), product of:
              0.16057748 = queryWeight, product of:
                2.0907383 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.015040224 = queryNorm
              0.31916162 = fieldWeight in 594, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.14633563 = weight(abstract_txt:image in 594) [ClassicSimilarity], result of:
            0.14633563 = score(doc=594,freq=6.0), product of:
              0.1778569 = queryWeight, product of:
                2.2003548 = boost
                5.374322 = idf(docFreq=556, maxDocs=44218)
                0.015040224 = queryNorm
              0.82277167 = fieldWeight in 594, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.374322 = idf(docFreq=556, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
          0.20664057 = weight(abstract_txt:expression in 594) [ClassicSimilarity], result of:
            0.20664057 = score(doc=594,freq=2.0), product of:
              0.35535935 = queryWeight, product of:
                3.5913737 = boost
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.015040224 = queryNorm
              0.5814975 = fieldWeight in 594, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.578893 = idf(docFreq=166, maxDocs=44218)
                0.0625 = fieldNorm(doc=594)
        0.24 = coord(6/25)
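
The content scores combine several matching terms. As a minimal sketch under the same ClassicSimilarity assumptions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), each term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord, the fraction of query terms that matched. The example reproduces hit 3 (doc 662); the helper names are hypothetical, and the boosts, frequencies, and norms are copied from the listing.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.015040224  # copied from the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# (term, boost, docFreq, freq) for doc 662, all with fieldNorm 0.0625.
doc_662_terms = [
    ("retrieval",   1.4227915, 3720, 2.0),
    ("using",       1.8904932, 3765, 2.0),
    ("expression",  3.5913737,  166, 2.0),
    ("expressions", 5.5612435,  134, 5.0),
]

term_sum = sum(term_score(b, df, f, 0.0625) for _, b, df, f in doc_662_terms)
coord = len(doc_662_terms) / 25          # 4 of 25 query terms matched
total_662 = term_sum * coord             # listing: 0.12780926

print(term_sum, coord, total_662)
```

The coord factor explains why hit 2, with the lowest raw sum of the first three hits (0.6769039), still outranks hit 3: it matches 7 of 25 query terms against 4 of 25.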