Document (#37092)

Author
Berry, M.W.
Esau, R.
Kiefer, B.
Title
¬The use of text mining techniques in electronic discovery for legal matters
Source
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Imprint
Hershey, PA : IGI Publishing
Year
2012
Pages
pp. 174-190
Abstract
Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.
Footnote
Cf.: http://www.igi-global.com/book/next-generation-search-engines/64425.
Theme
Data Mining

Similar documents (author)

  1. Berry, J.: CD-ROM: the medium for the moment (1992) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:berry in 3635) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 3635, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=3635)
    
  2. Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:berry in 3726) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 3726, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=3726)
    
  3. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:berry in 5777) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 5777, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=5777)
    
  4. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:berry in 7) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 7, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=7)
    
  5. Martin, D.I.; Berry, M.W.: Latent Semantic Indexing (2009) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:berry in 3834) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 3834, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=3834)
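
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF form tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which matches the values listed here; the function names are illustrative and not part of any Lucene API.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the
    # idf values shown in the listing (an assumption, see lead-in).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1: author_txt:berry in doc 3635 (freq=1, docFreq=11, fieldNorm=0.625)
score_first = field_weight(1.0, 11, 44218, 0.625)
# Entries 2-5 differ only in fieldNorm (0.5)
score_others = field_weight(1.0, 11, 44218, 0.5)
print(round(score_first, 4), round(score_others, 4))
```

With the inputs from entry 1 this gives roughly 5.7575, and with fieldNorm 0.5 roughly 4.6060, in line with the listed scores. The only factor separating entry 1 from the rest is the field norm.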
    

Similar documents (content)

  1. Mining text data (2012) 0.26
    0.26278168 = sum of:
      0.26278168 = product of:
        0.72994906 = sum of:
          0.0566441 = weight(abstract_txt:chapter in 362) [ClassicSimilarity], result of:
            0.0566441 = score(doc=362,freq=1.0), product of:
              0.14315563 = queryWeight, product of:
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.02261217 = queryNorm
              0.39568195 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.025448889 = weight(abstract_txt:have in 362) [ClassicSimilarity], result of:
            0.025448889 = score(doc=362,freq=3.0), product of:
              0.073359124 = queryWeight, product of:
                1.0123667 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.02261217 = queryNorm
              0.3469083 = fieldWeight in 362, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.04061399 = weight(abstract_txt:data in 362) [ClassicSimilarity], result of:
            0.04061399 = score(doc=362,freq=6.0), product of:
              0.07951493 = queryWeight, product of:
                1.0539867 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.02261217 = queryNorm
              0.5107719 = fieldWeight in 362, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.021136845 = weight(abstract_txt:been in 362) [ClassicSimilarity], result of:
            0.021136845 = score(doc=362,freq=1.0), product of:
              0.093485035 = queryWeight, product of:
                1.1428305 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.02261217 = queryNorm
              0.22609869 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.072318956 = weight(abstract_txt:text in 362) [ClassicSimilarity], result of:
            0.072318956 = score(doc=362,freq=6.0), product of:
              0.116815284 = queryWeight, product of:
                1.2774991 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02261217 = queryNorm
              0.6190881 = fieldWeight in 362, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.04452129 = weight(abstract_txt:process in 362) [ClassicSimilarity], result of:
            0.04452129 = score(doc=362,freq=1.0), product of:
              0.17584266 = queryWeight, product of:
                1.9196343 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.02261217 = queryNorm
              0.25318822 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.10126229 = weight(abstract_txt:volume in 362) [ClassicSimilarity], result of:
            0.10126229 = score(doc=362,freq=1.0), product of:
              0.26567283 = queryWeight, product of:
                1.926568 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.02261217 = queryNorm
              0.3811541 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.31543374 = weight(abstract_txt:mining in 362) [ClassicSimilarity], result of:
            0.31543374 = score(doc=362,freq=9.0), product of:
              0.2724206 = queryWeight, product of:
                1.9508808 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02261217 = queryNorm
              1.1578925 = fieldWeight in 362, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.052568994 = weight(abstract_txt:electronic in 362) [ClassicSimilarity], result of:
            0.052568994 = score(doc=362,freq=1.0), product of:
              0.19644102 = queryWeight, product of:
                2.0289555 = boost
                4.281712 = idf(docFreq=1660, maxDocs=44218)
                0.02261217 = queryNorm
              0.267607 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.281712 = idf(docFreq=1660, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
        0.36 = coord(9/25)
    
  2. Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.16
    0.15585731 = sum of:
      0.15585731 = product of:
        0.6494055 = sum of:
          0.014692924 = weight(abstract_txt:have in 4019) [ClassicSimilarity], result of:
            0.014692924 = score(doc=4019,freq=1.0), product of:
              0.073359124 = queryWeight, product of:
                1.0123667 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.02261217 = queryNorm
              0.20028761 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.016580591 = weight(abstract_txt:data in 4019) [ClassicSimilarity], result of:
            0.016580591 = score(doc=4019,freq=1.0), product of:
              0.07951493 = queryWeight, product of:
                1.0539867 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.02261217 = queryNorm
              0.20852174 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.09336337 = weight(abstract_txt:text in 4019) [ClassicSimilarity], result of:
            0.09336337 = score(doc=4019,freq=10.0), product of:
              0.116815284 = queryWeight, product of:
                1.2774991 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02261217 = queryNorm
              0.79923934 = fieldWeight in 4019, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.053556856 = weight(abstract_txt:relevance in 4019) [ClassicSimilarity], result of:
            0.053556856 = score(doc=4019,freq=1.0), product of:
              0.17375022 = queryWeight, product of:
                1.5580215 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.02261217 = queryNorm
              0.3082405 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.31543374 = weight(abstract_txt:mining in 4019) [ClassicSimilarity], result of:
            0.31543374 = score(doc=4019,freq=9.0), product of:
              0.2724206 = queryWeight, product of:
                1.9508808 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02261217 = queryNorm
              1.1578925 = fieldWeight in 4019, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.155778 = weight(abstract_txt:legal in 4019) [ClassicSimilarity], result of:
            0.155778 = score(doc=4019,freq=2.0), product of:
              0.281002 = queryWeight, product of:
                1.9813696 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.02261217 = queryNorm
              0.5543662 = fieldWeight in 4019, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
        0.24 = coord(6/25)
    
  3. Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: ¬A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.14
    0.14250927 = sum of:
      0.14250927 = product of:
        0.7125463 = sum of:
          0.12923513 = weight(abstract_txt:judgments in 521) [ClassicSimilarity], result of:
            0.12923513 = score(doc=521,freq=2.0), product of:
              0.16969761 = queryWeight, product of:
                1.0887637 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.02261217 = queryNorm
              0.76156133 = fieldWeight in 521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.078125 = fieldNorm(doc=521)
          0.037365016 = weight(abstract_txt:been in 521) [ClassicSimilarity], result of:
            0.037365016 = score(doc=521,freq=2.0), product of:
              0.093485035 = queryWeight, product of:
                1.1428305 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.02261217 = queryNorm
              0.3996898 = fieldWeight in 521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.078125 = fieldNorm(doc=521)
          0.052191712 = weight(abstract_txt:text in 521) [ClassicSimilarity], result of:
            0.052191712 = score(doc=521,freq=2.0), product of:
              0.116815284 = queryWeight, product of:
                1.2774991 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02261217 = queryNorm
              0.44678837 = fieldWeight in 521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=521)
          0.18587111 = weight(abstract_txt:mining in 521) [ClassicSimilarity], result of:
            0.18587111 = score(doc=521,freq=2.0), product of:
              0.2724206 = queryWeight, product of:
                1.9508808 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02261217 = queryNorm
              0.68229467 = fieldWeight in 521, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=521)
          0.30788332 = weight(abstract_txt:legal in 521) [ClassicSimilarity], result of:
            0.30788332 = score(doc=521,freq=5.0), product of:
              0.281002 = queryWeight, product of:
                1.9813696 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.02261217 = queryNorm
              1.0956624 = fieldWeight in 521, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.078125 = fieldNorm(doc=521)
        0.2 = coord(5/25)
    
  4. Benoit, G.: Data mining (2002) 0.13
    0.13046934 = sum of:
      0.13046934 = product of:
        0.54362226 = sum of:
          0.0566441 = weight(abstract_txt:chapter in 4296) [ClassicSimilarity], result of:
            0.0566441 = score(doc=4296,freq=1.0), product of:
              0.14315563 = queryWeight, product of:
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.02261217 = queryNorm
              0.39568195 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.330911 = idf(docFreq=213, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
          0.04061399 = weight(abstract_txt:data in 4296) [ClassicSimilarity], result of:
            0.04061399 = score(doc=4296,freq=6.0), product of:
              0.07951493 = queryWeight, product of:
                1.0539867 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.02261217 = queryNorm
              0.5107719 = fieldWeight in 4296, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
          0.02952409 = weight(abstract_txt:text in 4296) [ClassicSimilarity], result of:
            0.02952409 = score(doc=4296,freq=1.0), product of:
              0.116815284 = queryWeight, product of:
                1.2774991 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02261217 = queryNorm
              0.25274166 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
          0.13720836 = weight(abstract_txt:discovery in 4296) [ClassicSimilarity], result of:
            0.13720836 = score(doc=4296,freq=3.0), product of:
              0.22555993 = queryWeight, product of:
                1.7751774 = boost
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.02261217 = queryNorm
              0.6083011 = fieldWeight in 4296, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.619245 = idf(docFreq=435, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
          0.04452129 = weight(abstract_txt:process in 4296) [ClassicSimilarity], result of:
            0.04452129 = score(doc=4296,freq=1.0), product of:
              0.17584266 = queryWeight, product of:
                1.9196343 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.02261217 = queryNorm
              0.25318822 = fieldWeight in 4296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
          0.23511043 = weight(abstract_txt:mining in 4296) [ClassicSimilarity], result of:
            0.23511043 = score(doc=4296,freq=5.0), product of:
              0.2724206 = queryWeight, product of:
                1.9508808 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02261217 = queryNorm
              0.8630421 = fieldWeight in 4296, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=4296)
        0.24 = coord(6/25)
    
  5. Principles of data mining and knowledge discovery (1998) 0.12
    0.11920481 = sum of:
      0.11920481 = product of:
        0.59602404 = sum of:
          0.029016035 = weight(abstract_txt:data in 3822) [ClassicSimilarity], result of:
            0.029016035 = score(doc=3822,freq=1.0), product of:
              0.07951493 = queryWeight, product of:
                1.0539867 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.02261217 = queryNorm
              0.36491305 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.109375 = fieldNorm(doc=3822)
          0.051667154 = weight(abstract_txt:text in 3822) [ClassicSimilarity], result of:
            0.051667154 = score(doc=3822,freq=1.0), product of:
              0.116815284 = queryWeight, product of:
                1.2774991 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.02261217 = queryNorm
              0.4422979 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=3822)
          0.077912256 = weight(abstract_txt:process in 3822) [ClassicSimilarity], result of:
            0.077912256 = score(doc=3822,freq=1.0), product of:
              0.17584266 = queryWeight, product of:
                1.9196343 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.02261217 = queryNorm
              0.44307938 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.109375 = fieldNorm(doc=3822)
          0.177209 = weight(abstract_txt:volume in 3822) [ClassicSimilarity], result of:
            0.177209 = score(doc=3822,freq=1.0), product of:
              0.26567283 = queryWeight, product of:
                1.926568 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.02261217 = queryNorm
              0.66701967 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.109375 = fieldNorm(doc=3822)
          0.26021954 = weight(abstract_txt:mining in 3822) [ClassicSimilarity], result of:
            0.26021954 = score(doc=3822,freq=2.0), product of:
              0.2724206 = queryWeight, product of:
                1.9508808 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.02261217 = queryNorm
              0.95521253 = fieldWeight in 3822, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.109375 = fieldNorm(doc=3822)
        0.2 = coord(5/25)
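
The content scores combine more factors than the author scores: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matched terms / query terms). The sketch below recomputes entry 5 (doc 3822) from the numbers in the listing. It assumes the same tf and idf forms that reproduce the listed values; the helper names are hypothetical, not Lucene calls.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.02261217  # taken from the listing

def idf(doc_freq: int) -> float:
    # Assumed form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# (term, freq, docFreq, boost) for doc 3822, copied from the listing.
matches = [
    ("data",    1.0, 4274, 1.0539867),
    ("text",    1.0, 2106, 1.2774991),
    ("process", 1.0, 2091, 1.9196343),
    ("volume",  1.0,  269, 1.926568),
    ("mining",  2.0,  249, 1.9508808),
]
field_norm = 0.109375
coord = len(matches) / 25  # 5 of 25 query terms matched

total = coord * sum(term_score(f, df, b, field_norm) for _, f, df, b in matches)
print(round(total, 4))
```

The result should land close to the listed 0.1192. Small differences in the last digits are expected, since the listing was presumably produced with single-precision arithmetic.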