Document (#37092)

Author
Berry, M.W.
Esau, R.
Kiefer, B.
Title
The use of text mining techniques in electronic discovery for legal matters
Source
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Imprint
Hershey, PA : IGI Publishing
Year
2012
Pages
pp. 174-190
Abstract
Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.
Footnote
Cf.: http://www.igi-global.com/book/next-generation-search-engines/64425.
Theme
Data Mining

Similar documents (author)

  1. Berry, J.: CD-ROM: the medium for the moment (1992) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:berry in 3634) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 3634, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=3634)
    
  2. Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:berry in 3794) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 3794, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=3794)
    
  3. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:berry in 777) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 777, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=777)
    
  4. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:berry in 1007) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 1007, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=1007)
    
  5. Martin, D.I.; Berry, M.W.: Latent Semantic Indexing (2009) 4.61
    4.6082807 = sum of:
      4.6082807 = weight(author_txt:berry in 821) [ClassicSimilarity], result of:
        4.6082807 = fieldWeight in 821, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.5 = fieldNorm(doc=821)
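
The author-match breakdowns above each reduce to a single product: fieldWeight = tf * idf * fieldNorm. The following is a minimal Python sketch of that arithmetic, not Lucene's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the values printed in this listing. The function names are hypothetical and chosen only for illustration.

```python
import math

def tf(freq):
    # Term-frequency factor: square root of the raw term count (assumed form).
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # Inverse document frequency (assumed form, matches the listing's values).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # Product shown in the listing as "fieldWeight in <doc>".
    return tf(freq) * idf(doc_freq, max_docs) * field_norm

# Inputs copied from the listing: term "berry", docFreq=11, maxDocs=44421.
w1 = field_weight(1.0, 11, 44421, 0.625)  # first hit, fieldNorm=0.625
w2 = field_weight(1.0, 11, 44421, 0.5)    # hits 2 to 5, fieldNorm=0.5
print(w1, w2)
```

Only fieldNorm differs between the first hit and the rest, which explains why the first hit ranks higher even though the term matches once in every record. Small differences in the last digits are expected, since the listing's values appear to be single-precision.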
    

Similar documents (content)

  1. Mining text data (2012) 0.26
    0.2626178 = sum of:
      0.2626178 = product of:
        0.72949386 = sum of:
          0.05672212 = weight(abstract_txt:chapter in 1362) [ClassicSimilarity], result of:
            0.05672212 = score(doc=1362,freq=1.0), product of:
              0.14335465 = queryWeight, product of:
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.022643898 = queryNorm
              0.39567685 = fieldWeight in 1362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.025360743 = weight(abstract_txt:have in 1362) [ClassicSimilarity], result of:
            0.025360743 = score(doc=1362,freq=3.0), product of:
              0.073224165 = queryWeight, product of:
                1.0107327 = boost
                3.199388 = idf(docFreq=4924, maxDocs=44421)
                0.022643898 = queryNorm
              0.3463439 = fieldWeight in 1362, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.199388 = idf(docFreq=4924, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.040448006 = weight(abstract_txt:data in 1362) [ClassicSimilarity], result of:
            0.040448006 = score(doc=1362,freq=6.0), product of:
              0.07933555 = queryWeight, product of:
                1.0520661 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.022643898 = queryNorm
              0.5098346 = fieldWeight in 1362, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.021111758 = weight(abstract_txt:been in 1362) [ClassicSimilarity], result of:
            0.021111758 = score(doc=1362,freq=1.0), product of:
              0.09345512 = queryWeight, product of:
                1.1418542 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.022643898 = queryNorm
              0.22590263 = fieldWeight in 1362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.07226113 = weight(abstract_txt:text in 1362) [ClassicSimilarity], result of:
            0.07226113 = score(doc=1362,freq=6.0), product of:
              0.11680809 = queryWeight, product of:
                1.2765727 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.022643898 = queryNorm
              0.61863124 = fieldWeight in 1362, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.044515382 = weight(abstract_txt:process in 1362) [ClassicSimilarity], result of:
            0.044515382 = score(doc=1362,freq=1.0), product of:
              0.17591006 = queryWeight, product of:
                1.918669 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.022643898 = queryNorm
              0.25305763 = fieldWeight in 1362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.10090138 = weight(abstract_txt:volume in 1362) [ClassicSimilarity], result of:
            0.10090138 = score(doc=1362,freq=1.0), product of:
              0.26516625 = queryWeight, product of:
                1.9233938 = boost
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.022643898 = queryNorm
              0.3805212 = fieldWeight in 1362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.3153608 = weight(abstract_txt:mining in 1362) [ClassicSimilarity], result of:
            0.3153608 = score(doc=1362,freq=9.0), product of:
              0.2725071 = queryWeight, product of:
                1.9498357 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.022643898 = queryNorm
              1.1572572 = fieldWeight in 1362, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
          0.052812565 = weight(abstract_txt:electronic in 1362) [ClassicSimilarity], result of:
            0.052812565 = score(doc=1362,freq=1.0), product of:
              0.1971403 = queryWeight, product of:
                2.031152 = boost
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.022643898 = queryNorm
              0.26789328 = fieldWeight in 1362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2862926 = idf(docFreq=1660, maxDocs=44421)
                0.0625 = fieldNorm(doc=1362)
        0.36 = coord(9/25)
    
  2. Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.16
    0.15523551 = sum of:
      0.15523551 = product of:
        0.64681464 = sum of:
          0.014642032 = weight(abstract_txt:have in 19) [ClassicSimilarity], result of:
            0.014642032 = score(doc=19,freq=1.0), product of:
              0.073224165 = queryWeight, product of:
                1.0107327 = boost
                3.199388 = idf(docFreq=4924, maxDocs=44421)
                0.022643898 = queryNorm
              0.19996175 = fieldWeight in 19, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.199388 = idf(docFreq=4924, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
          0.01651283 = weight(abstract_txt:data in 19) [ClassicSimilarity], result of:
            0.01651283 = score(doc=19,freq=1.0), product of:
              0.07933555 = queryWeight, product of:
                1.0520661 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.022643898 = queryNorm
              0.20813909 = fieldWeight in 19, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
          0.09328872 = weight(abstract_txt:text in 19) [ClassicSimilarity], result of:
            0.09328872 = score(doc=19,freq=10.0), product of:
              0.11680809 = queryWeight, product of:
                1.2765727 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.022643898 = queryNorm
              0.7986495 = fieldWeight in 19, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
          0.053557172 = weight(abstract_txt:relevance in 19) [ClassicSimilarity], result of:
            0.053557172 = score(doc=19,freq=1.0), product of:
              0.17383288 = queryWeight, product of:
                1.55731 = boost
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.022643898 = queryNorm
              0.30809575 = fieldWeight in 19, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.929532 = idf(docFreq=872, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
          0.3153608 = weight(abstract_txt:mining in 19) [ClassicSimilarity], result of:
            0.3153608 = score(doc=19,freq=9.0), product of:
              0.2725071 = queryWeight, product of:
                1.9498357 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.022643898 = queryNorm
              1.1572572 = fieldWeight in 19, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
          0.15345308 = weight(abstract_txt:legal in 19) [ClassicSimilarity], result of:
            0.15345308 = score(doc=19,freq=2.0), product of:
              0.27833036 = queryWeight, product of:
                1.9705586 = boost
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.022643898 = queryNorm
              0.5513343 = fieldWeight in 19, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.0625 = fieldNorm(doc=19)
        0.24 = coord(6/25)
    
  3. Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.14
    0.14146969 = sum of:
      0.14146969 = product of:
        0.7073484 = sum of:
          0.12876135 = weight(abstract_txt:judgments in 1521) [ClassicSimilarity], result of:
            0.12876135 = score(doc=1521,freq=2.0), product of:
              0.16936247 = queryWeight, product of:
                1.0869328 = boost
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.022643898 = queryNorm
              0.76027083 = fieldWeight in 1521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.881186 = idf(docFreq=123, maxDocs=44421)
                0.078125 = fieldNorm(doc=1521)
          0.037320666 = weight(abstract_txt:been in 1521) [ClassicSimilarity], result of:
            0.037320666 = score(doc=1521,freq=2.0), product of:
              0.09345512 = queryWeight, product of:
                1.1418542 = boost
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.022643898 = queryNorm
              0.3993432 = fieldWeight in 1521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.614442 = idf(docFreq=3251, maxDocs=44421)
                0.078125 = fieldNorm(doc=1521)
          0.052149978 = weight(abstract_txt:text in 1521) [ClassicSimilarity], result of:
            0.052149978 = score(doc=1521,freq=2.0), product of:
              0.11680809 = queryWeight, product of:
                1.2765727 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.022643898 = queryNorm
              0.4464586 = fieldWeight in 1521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=1521)
          0.18582813 = weight(abstract_txt:mining in 1521) [ClassicSimilarity], result of:
            0.18582813 = score(doc=1521,freq=2.0), product of:
              0.2725071 = queryWeight, product of:
                1.9498357 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.022643898 = queryNorm
              0.68192035 = fieldWeight in 1521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=1521)
          0.30328828 = weight(abstract_txt:legal in 1521) [ClassicSimilarity], result of:
            0.30328828 = score(doc=1521,freq=5.0), product of:
              0.27833036 = queryWeight, product of:
                1.9705586 = boost
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.022643898 = queryNorm
              1.0896702 = fieldWeight in 1521, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2376356 = idf(docFreq=235, maxDocs=44421)
                0.078125 = fieldNorm(doc=1521)
        0.2 = coord(5/25)
    
  4. Benoit, G.: Data mining (2002) 0.13
    0.13043466 = sum of:
      0.13043466 = product of:
        0.5434778 = sum of:
          0.05672212 = weight(abstract_txt:chapter in 5296) [ClassicSimilarity], result of:
            0.05672212 = score(doc=5296,freq=1.0), product of:
              0.14335465 = queryWeight, product of:
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.022643898 = queryNorm
              0.39567685 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3308296 = idf(docFreq=214, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
          0.040448006 = weight(abstract_txt:data in 5296) [ClassicSimilarity], result of:
            0.040448006 = score(doc=5296,freq=6.0), product of:
              0.07933555 = queryWeight, product of:
                1.0520661 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.022643898 = queryNorm
              0.5098346 = fieldWeight in 5296, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
          0.029500483 = weight(abstract_txt:text in 5296) [ClassicSimilarity], result of:
            0.029500483 = score(doc=5296,freq=1.0), product of:
              0.11680809 = queryWeight, product of:
                1.2765727 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.022643898 = queryNorm
              0.25255513 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
          0.1372357 = weight(abstract_txt:discovery in 5296) [ClassicSimilarity], result of:
            0.1372357 = score(doc=5296,freq=3.0), product of:
              0.22569633 = queryWeight, product of:
                1.7744809 = boost
                5.616968 = idf(docFreq=438, maxDocs=44421)
                0.022643898 = queryNorm
              0.60805464 = fieldWeight in 5296, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.616968 = idf(docFreq=438, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
          0.044515382 = weight(abstract_txt:process in 5296) [ClassicSimilarity], result of:
            0.044515382 = score(doc=5296,freq=1.0), product of:
              0.17591006 = queryWeight, product of:
                1.918669 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.022643898 = queryNorm
              0.25305763 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
          0.23505607 = weight(abstract_txt:mining in 5296) [ClassicSimilarity], result of:
            0.23505607 = score(doc=5296,freq=5.0), product of:
              0.2725071 = queryWeight, product of:
                1.9498357 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.022643898 = queryNorm
              0.8625686 = fieldWeight in 5296, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.0625 = fieldNorm(doc=5296)
        0.24 = coord(6/25)
    
  5. Principles of data mining and knowledge discovery (1998) 0.12
    0.119032405 = sum of:
      0.119032405 = product of:
        0.59516203 = sum of:
          0.028897451 = weight(abstract_txt:data in 4822) [ClassicSimilarity], result of:
            0.028897451 = score(doc=4822,freq=1.0), product of:
              0.07933555 = queryWeight, product of:
                1.0520661 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.022643898 = queryNorm
              0.36424342 = fieldWeight in 4822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.109375 = fieldNorm(doc=4822)
          0.051625844 = weight(abstract_txt:text in 4822) [ClassicSimilarity], result of:
            0.051625844 = score(doc=4822,freq=1.0), product of:
              0.11680809 = queryWeight, product of:
                1.2765727 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.022643898 = queryNorm
              0.44197148 = fieldWeight in 4822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.109375 = fieldNorm(doc=4822)
          0.07790192 = weight(abstract_txt:process in 4822) [ClassicSimilarity], result of:
            0.07790192 = score(doc=4822,freq=1.0), product of:
              0.17591006 = queryWeight, product of:
                1.918669 = boost
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.022643898 = queryNorm
              0.44285086 = fieldWeight in 4822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.048922 = idf(docFreq=2105, maxDocs=44421)
                0.109375 = fieldNorm(doc=4822)
          0.17657742 = weight(abstract_txt:volume in 4822) [ClassicSimilarity], result of:
            0.17657742 = score(doc=4822,freq=1.0), product of:
              0.26516625 = queryWeight, product of:
                1.9233938 = boost
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.022643898 = queryNorm
              0.6659121 = fieldWeight in 4822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0883393 = idf(docFreq=273, maxDocs=44421)
                0.109375 = fieldNorm(doc=4822)
          0.2601594 = weight(abstract_txt:mining in 4822) [ClassicSimilarity], result of:
            0.2601594 = score(doc=4822,freq=2.0), product of:
              0.2725071 = queryWeight, product of:
                1.9498357 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.022643898 = queryNorm
              0.9546885 = fieldWeight in 4822, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.109375 = fieldNorm(doc=4822)
        0.2 = coord(5/25)
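
The content-match breakdowns above combine three pieces: a per-term score (queryWeight * fieldWeight), a sum over the matching terms, and a coord factor (matched query terms / total query terms). The sketch below recomputes the first hit (doc=1362) from the inputs printed in its breakdown. It is an illustration under stated assumptions, not Lucene's implementation: it assumes tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and queryWeight = boost * idf * queryNorm, and all function and variable names are hypothetical.

```python
import math

QUERY_NORM = 0.022643898  # queryNorm, copied from the listing
MAX_DOCS = 44421          # maxDocs, copied from the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # Inverse document frequency (assumed form, matches the listing's values).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight for one matching term.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (term, freq, docFreq, boost) for doc=1362, copied from the listing.
# "chapter" has no boost line in the listing, so a boost of 1.0 is assumed.
terms_1362 = [
    ("chapter",    1.0,  214, 1.0),
    ("have",       3.0, 4924, 1.0107327),
    ("data",       6.0, 4320, 1.0520661),
    ("been",       1.0, 3251, 1.1418542),
    ("text",       6.0, 2122, 1.2765727),
    ("process",    1.0, 2105, 1.918669),
    ("volume",     1.0,  273, 1.9233938),
    ("mining",     9.0,  251, 1.9498357),
    ("electronic", 1.0, 1660, 2.031152),
]

FIELD_NORM = 0.0625  # fieldNorm(doc=1362)

term_sum = sum(term_score(f, df, FIELD_NORM, b) for _, f, df, b in terms_1362)
coord = len(terms_1362) / 25   # 9 of the 25 query terms matched
final_score = term_sum * coord
print(term_sum, final_score)
```

The coord factor is why a record that matches few query terms scores low even when individual term weights are high: the first hit keeps 36% of its summed weight, while the hits matching only five terms keep 20%.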