Document (#40896)

Author
Neumann, M.
Steinberg, J.
Schaer, P.
Title
Web-scraping for non-programmers : introducing OXPath for digital library metadata harvesting
Source
Code4Lib journal. Issue 38(2017), [http://journal.code4lib.org]
Year
2017
Abstract
Building up new collections for digital libraries is a demanding task. Available data sets have to be extracted, which is usually done with the help of software developers because it involves custom data handlers or conversion scripts. In cases where the desired data is only available on the data provider's website, custom web scrapers are needed. This may be the case for small to medium-sized publishers, research institutes, or funding agencies. Data curation is typically done by people with a library and information science background, who are usually proficient with XML technologies but are not full-stack programmers. We therefore present a web scraping tool that does not require digital library curators to program custom web scrapers from scratch: the open-source tool OXPath, an extension of XPath that lets the user define the data to be extracted from websites in a declarative way. Using one of our own use cases as an example, we walk through the process of creating an OXPath wrapper for metadata harvesting and point out some practical things to consider when creating a web scraper with OXPath. We also present a syntax highlighting plugin for the popular text editor Atom, which we developed to further support OXPath users and to simplify the authoring process.
Content
Cf.: http://journal.code4lib.org/articles/13007.
Theme
Metadata

Similar documents (author)

  1. Schaer, P.: Integration von Open-Access-Repositorien in Fachportale (2010) 2.10
    2.0986493 = sum of:
      2.0986493 = product of:
        4.1972985 = sum of:
          4.1972985 = weight(author_txt:schaer in 3320) [ClassicSimilarity], result of:
            4.1972985 = score(doc=3320,freq=1.0), product of:
              0.7572716 = queryWeight, product of:
                1.0768023 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0793008 = queryNorm
              5.5426593 = fieldWeight in 3320, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.625 = fieldNorm(doc=3320)
        0.5 = coord(1/2)
    
  2. Schaer, P.: Sprachmodelle und neuronale Netze im Information Retrieval (2023) 2.10
    2.0986493 = sum of:
      2.0986493 = product of:
        4.1972985 = sum of:
          4.1972985 = weight(author_txt:schaer in 1800) [ClassicSimilarity], result of:
            4.1972985 = score(doc=1800,freq=1.0), product of:
              0.7572716 = queryWeight, product of:
                1.0768023 = boost
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.0793008 = queryNorm
              5.5426593 = fieldWeight in 1800, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.868255 = idf(docFreq=16, maxDocs=44421)
                0.625 = fieldNorm(doc=1800)
        0.5 = coord(1/2)
    
  3. Neumann, M.: HAL: Hyperspace Analogue to Language (2012) 1.68
    1.6808618 = sum of:
      1.6808618 = product of:
        3.3617237 = sum of:
          3.3617237 = weight(author_txt:neumann in 965) [ClassicSimilarity], result of:
            3.3617237 = score(doc=965,freq=1.0), product of:
              0.65310013 = queryWeight, product of:
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0793008 = queryNorm
              5.1473327 = fieldWeight in 965, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.625 = fieldNorm(doc=965)
        0.5 = coord(1/2)
    
  4. Neumann, G.: Studienanleitung für das Lehrgebiet alphabetische Katalogisierung (1986) 1.68
    1.6808618 = sum of:
      1.6808618 = product of:
        3.3617237 = sum of:
          3.3617237 = weight(author_txt:neumann in 6010) [ClassicSimilarity], result of:
            3.3617237 = score(doc=6010,freq=1.0), product of:
              0.65310013 = queryWeight, product of:
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0793008 = queryNorm
              5.1473327 = fieldWeight in 6010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.625 = fieldNorm(doc=6010)
        0.5 = coord(1/2)
    
  5. Neumann, G.: ¬Das *¬ISBC# erschließt Btx für die Schulbibliotheken (1994) 1.68
    1.6808618 = sum of:
      1.6808618 = product of:
        3.3617237 = sum of:
          3.3617237 = weight(author_txt:neumann in 284) [ClassicSimilarity], result of:
            3.3617237 = score(doc=284,freq=1.0), product of:
              0.65310013 = queryWeight, product of:
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.0793008 = queryNorm
              5.1473327 = fieldWeight in 284, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.235732 = idf(docFreq=31, maxDocs=44421)
                0.625 = fieldNorm(doc=284)
        0.5 = coord(1/2)
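
The score breakdowns above can be reproduced by hand. The sketch below is a minimal illustration, not the search engine's actual code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the tf and idf values printed in the trees. The boost, queryNorm, and fieldNorm values are copied from the record rather than derived, and the function names are hypothetical.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed idf formula; it matches the idf values shown in the trees above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each "product of" node.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    # Sum of the matching term scores, scaled by coord(matched/total).
    return sum(term_scores) * (matched / total)


# Entry 1 of the author list: author_txt:schaer in doc 3320.
schaer = document_score(
    [term_score(freq=1.0, doc_freq=16, max_docs=44421,
                query_norm=0.0793008, field_norm=0.625, boost=1.0768023)],
    matched=1, total=2,
)

# Entry 3 of the author list: author_txt:neumann in doc 965 (no boost shown).
neumann = document_score(
    [term_score(freq=1.0, doc_freq=31, max_docs=44421,
                query_norm=0.0793008, field_norm=0.625)],
    matched=1, total=2,
)

# Compare the recomputed values with the scores recorded in the trees.
print(f"schaer:  computed {schaer:.7f}, recorded 2.0986493")
print(f"neumann: computed {neumann:.7f}, recorded 1.6808618")
```

Small differences in the last digits are to be expected, because the recorded values appear to be single-precision floats while Python computes in double precision.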
    

Similar documents (content)

  1. Harlow, C.: Data munging tools in Preparation for RDF : Catmandu and LODRefine (2015) 0.16
    0.16127256 = sum of:
      0.16127256 = product of:
        0.50397676 = sum of:
          0.025388693 = weight(abstract_txt:library in 3277) [ClassicSimilarity], result of:
            0.025388693 = score(doc=3277,freq=3.0), product of:
              0.073546305 = queryWeight, product of:
                1.1050844 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020870198 = queryNorm
              0.34520692 = fieldWeight in 3277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.060631327 = weight(abstract_txt:metadata in 3277) [ClassicSimilarity], result of:
            0.060631327 = score(doc=3277,freq=3.0), product of:
              0.11478935 = queryWeight, product of:
                1.1272498 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020870198 = queryNorm
              0.52819645 = fieldWeight in 3277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.05166437 = weight(abstract_txt:tool in 3277) [ClassicSimilarity], result of:
            0.05166437 = score(doc=3277,freq=2.0), product of:
              0.118103124 = queryWeight, product of:
                1.143405 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.020870198 = queryNorm
              0.43745136 = fieldWeight in 3277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.016235633 = weight(abstract_txt:with in 3277) [ClassicSimilarity], result of:
            0.016235633 = score(doc=3277,freq=3.0), product of:
              0.060084112 = queryWeight, product of:
                1.1533582 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020870198 = queryNorm
              0.27021506 = fieldWeight in 3277, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.054402284 = weight(abstract_txt:cases in 3277) [ClassicSimilarity], result of:
            0.054402284 = score(doc=3277,freq=1.0), product of:
              0.1540123 = queryWeight, product of:
                1.305711 = boost
                5.6517344 = idf(docFreq=423, maxDocs=44421)
                0.020870198 = queryNorm
              0.3532334 = fieldWeight in 3277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6517344 = idf(docFreq=423, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.014092389 = weight(abstract_txt:that in 3277) [ClassicSimilarity], result of:
            0.014092389 = score(doc=3277,freq=2.0), product of:
              0.067417145 = queryWeight, product of:
                1.3659178 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.020870198 = queryNorm
              0.20903271 = fieldWeight in 3277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.18139257 = weight(abstract_txt:programmers in 3277) [ClassicSimilarity], result of:
            0.18139257 = score(doc=3277,freq=1.0), product of:
              0.3437349 = queryWeight, product of:
                1.9506582 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.020870198 = queryNorm
              0.5277107 = fieldWeight in 3277, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
          0.100169465 = weight(abstract_txt:data in 3277) [ClassicSimilarity], result of:
            0.100169465 = score(doc=3277,freq=9.0), product of:
              0.1604207 = queryWeight, product of:
                2.3081298 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020870198 = queryNorm
              0.6244173 = fieldWeight in 3277, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=3277)
        0.32 = coord(8/25)
    
  2. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.15
    0.14828244 = sum of:
      0.14828244 = product of:
        0.92676526 = sum of:
          0.016403882 = weight(abstract_txt:with in 3585) [ClassicSimilarity], result of:
            0.016403882 = score(doc=3585,freq=1.0), product of:
              0.060084112 = queryWeight, product of:
                1.1533582 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020870198 = queryNorm
              0.2730153 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.017438442 = weight(abstract_txt:that in 3585) [ClassicSimilarity], result of:
            0.017438442 = score(doc=3585,freq=1.0), product of:
              0.067417145 = queryWeight, product of:
                1.3659178 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.020870198 = queryNorm
              0.2586648 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.317437 = weight(abstract_txt:programmers in 3585) [ClassicSimilarity], result of:
            0.317437 = score(doc=3585,freq=1.0), product of:
              0.3437349 = queryWeight, product of:
                1.9506582 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.020870198 = queryNorm
              0.9234937 = fieldWeight in 3585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
          0.57548594 = weight(abstract_txt:custom in 3585) [ClassicSimilarity], result of:
            0.57548594 = score(doc=3585,freq=2.0), product of:
              0.4643322 = queryWeight, product of:
                2.776703 = boost
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.020870198 = queryNorm
              1.239384 = fieldWeight in 3585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.0125885 = idf(docFreq=39, maxDocs=44421)
                0.109375 = fieldNorm(doc=3585)
        0.16 = coord(4/25)
    
  3. Güven, S.; Feiner, S.: ¬A hypermedia authoring tool for augmented and virtual reality (2003) 0.14
    0.13508923 = sum of:
      0.13508923 = product of:
        0.48246154 = sum of:
          0.044473495 = weight(abstract_txt:task in 935) [ClassicSimilarity], result of:
            0.044473495 = score(doc=935,freq=1.0), product of:
              0.11603922 = queryWeight, product of:
                1.1333702 = boost
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.020870198 = queryNorm
              0.38326263 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9057617 = idf(docFreq=893, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.045665286 = weight(abstract_txt:tool in 935) [ClassicSimilarity], result of:
            0.045665286 = score(doc=935,freq=1.0), product of:
              0.118103124 = queryWeight, product of:
                1.143405 = boost
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.020870198 = queryNorm
              0.38665605 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9491973 = idf(docFreq=855, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.016570423 = weight(abstract_txt:with in 935) [ClassicSimilarity], result of:
            0.016570423 = score(doc=935,freq=2.0), product of:
              0.060084112 = queryWeight, product of:
                1.1533582 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020870198 = queryNorm
              0.2757871 = fieldWeight in 935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.0901799 = weight(abstract_txt:creating in 935) [ClassicSimilarity], result of:
            0.0901799 = score(doc=935,freq=2.0), product of:
              0.14754815 = queryWeight, product of:
                1.278016 = boost
                5.531857 = idf(docFreq=477, maxDocs=44421)
                0.020870198 = queryNorm
              0.6111896 = fieldWeight in 935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.531857 = idf(docFreq=477, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.012456029 = weight(abstract_txt:that in 935) [ClassicSimilarity], result of:
            0.012456029 = score(doc=935,freq=1.0), product of:
              0.067417145 = queryWeight, product of:
                1.3659178 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.020870198 = queryNorm
              0.18476056 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.04637569 = weight(abstract_txt:present in 935) [ClassicSimilarity], result of:
            0.04637569 = score(doc=935,freq=1.0), product of:
              0.13659284 = queryWeight, product of:
                1.5060139 = boost
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.020870198 = queryNorm
              0.3395177 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
          0.22674072 = weight(abstract_txt:programmers in 935) [ClassicSimilarity], result of:
            0.22674072 = score(doc=935,freq=1.0), product of:
              0.3437349 = queryWeight, product of:
                1.9506582 = boost
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.020870198 = queryNorm
              0.65963835 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.443371 = idf(docFreq=25, maxDocs=44421)
                0.078125 = fieldNorm(doc=935)
        0.28 = coord(7/25)
    
  4. Foulonneau, M.: Information redundancy across metadata collections (2007) 0.13
    0.13316204 = sum of:
      0.13316204 = product of:
        0.41613138 = sum of:
          0.02072978 = weight(abstract_txt:library in 1915) [ClassicSimilarity], result of:
            0.02072978 = score(doc=1915,freq=2.0), product of:
              0.073546305 = queryWeight, product of:
                1.1050844 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020870198 = queryNorm
              0.28186026 = fieldWeight in 1915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.11610016 = weight(abstract_txt:metadata in 1915) [ClassicSimilarity], result of:
            0.11610016 = score(doc=1915,freq=11.0), product of:
              0.11478935 = queryWeight, product of:
                1.1272498 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020870198 = queryNorm
              1.0114193 = fieldWeight in 1915, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.009373646 = weight(abstract_txt:with in 1915) [ClassicSimilarity], result of:
            0.009373646 = score(doc=1915,freq=1.0), product of:
              0.060084112 = queryWeight, product of:
                1.1533582 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020870198 = queryNorm
              0.15600874 = fieldWeight in 1915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.014092389 = weight(abstract_txt:that in 1915) [ClassicSimilarity], result of:
            0.014092389 = score(doc=1915,freq=2.0), product of:
              0.067417145 = queryWeight, product of:
                1.3659178 = boost
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.020870198 = queryNorm
              0.20903271 = fieldWeight in 1915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3649352 = idf(docFreq=11344, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.051805876 = weight(abstract_txt:digital in 1915) [ClassicSimilarity], result of:
            0.051805876 = score(doc=1915,freq=2.0), product of:
              0.13544108 = queryWeight, product of:
                1.4996511 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.020870198 = queryNorm
              0.38249752 = fieldWeight in 1915, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.03710055 = weight(abstract_txt:present in 1915) [ClassicSimilarity], result of:
            0.03710055 = score(doc=1915,freq=1.0), product of:
              0.13659284 = queryWeight, product of:
                1.5060139 = boost
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.020870198 = queryNorm
              0.27161416 = fieldWeight in 1915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3458266 = idf(docFreq=1564, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.13353916 = weight(abstract_txt:harvesting in 1915) [ClassicSimilarity], result of:
            0.13353916 = score(doc=1915,freq=1.0), product of:
              0.28025264 = queryWeight, product of:
                1.7613442 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.020870198 = queryNorm
              0.47649562 = fieldWeight in 1915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
          0.033389818 = weight(abstract_txt:data in 1915) [ClassicSimilarity], result of:
            0.033389818 = score(doc=1915,freq=1.0), product of:
              0.1604207 = queryWeight, product of:
                2.3081298 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020870198 = queryNorm
              0.20813909 = fieldWeight in 1915, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=1915)
        0.32 = coord(8/25)
    
  5. Shreeves, S.L.; Kaczmarek, J.S.; Cole, T.W.: Harvesting cultural heritage metadata using OAI Protocol (2003) 0.12
    0.120948754 = sum of:
      0.120948754 = product of:
        0.50395316 = sum of:
          0.025912225 = weight(abstract_txt:library in 5775) [ClassicSimilarity], result of:
            0.025912225 = score(doc=5775,freq=2.0), product of:
              0.073546305 = queryWeight, product of:
                1.1050844 = boost
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.020870198 = queryNorm
              0.35232532 = fieldWeight in 5775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.188885 = idf(docFreq=4976, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.12376318 = weight(abstract_txt:metadata in 5775) [ClassicSimilarity], result of:
            0.12376318 = score(doc=5775,freq=8.0), product of:
              0.11478935 = queryWeight, product of:
                1.1272498 = boost
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.020870198 = queryNorm
              1.0781765 = fieldWeight in 5775, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.87927 = idf(docFreq=917, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.011717058 = weight(abstract_txt:with in 5775) [ClassicSimilarity], result of:
            0.011717058 = score(doc=5775,freq=1.0), product of:
              0.060084112 = queryWeight, product of:
                1.1533582 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.020870198 = queryNorm
              0.19501092 = fieldWeight in 5775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.06475735 = weight(abstract_txt:digital in 5775) [ClassicSimilarity], result of:
            0.06475735 = score(doc=5775,freq=2.0), product of:
              0.13544108 = queryWeight, product of:
                1.4996511 = boost
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.020870198 = queryNorm
              0.4781219 = fieldWeight in 5775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3274655 = idf(docFreq=1593, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.2360661 = weight(abstract_txt:harvesting in 5775) [ClassicSimilarity], result of:
            0.2360661 = score(doc=5775,freq=2.0), product of:
              0.28025264 = queryWeight, product of:
                1.7613442 = boost
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.020870198 = queryNorm
              0.8423332 = fieldWeight in 5775, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.62393 = idf(docFreq=58, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
          0.041737273 = weight(abstract_txt:data in 5775) [ClassicSimilarity], result of:
            0.041737273 = score(doc=5775,freq=1.0), product of:
              0.1604207 = queryWeight, product of:
                2.3081298 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.020870198 = queryNorm
              0.26017386 = fieldWeight in 5775, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=5775)
        0.24 = coord(6/25)