Document (#33999)

Author
Leroy, G.
Miller, T.
Rosemblat, G.
Browne, A.
Title
¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.9, pp. 1409-1419
Year
2008
Abstract
Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
Theme
Automatic classification
Field
Medicine

Similar documents (author)

  1. Rosemblat, G.; Graham, L.: Cross-language search in a monolingual health information system : flexible designs and lexical processes (2006) 0.72
    0.72356004 = sum of:
      0.72356004 = product of:
        2.8942401 = sum of:
          2.8942401 = weight(author_txt:rosemblat in 1241) [ClassicSimilarity], result of:
            2.8942401 = score(doc=1241,freq=1.0), product of:
              0.5841222 = queryWeight, product of:
                1.3884745 = boost
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.042452663 = queryNorm
              4.954854 = fieldWeight in 1241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.909708 = idf(docFreq=5, maxDocs=44421)
                0.5 = fieldNorm(doc=1241)
        0.25 = coord(1/4)
    
  2. Browne, G.: Scope notes for LISA subject headings (1992) 0.71
    0.70883346 = sum of:
      0.70883346 = product of:
        2.8353338 = sum of:
          2.8353338 = weight(author_txt:browne in 1498) [ClassicSimilarity], result of:
            2.8353338 = score(doc=1498,freq=1.0), product of:
              0.49652764 = queryWeight, product of:
                1.2801409 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.042452663 = queryNorm
              5.7103243 = fieldWeight in 1498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.625 = fieldNorm(doc=1498)
        0.25 = coord(1/4)
    
  3. Browne, G.: Professional liability of indexers (1996) 0.71
    0.70883346 = sum of:
      0.70883346 = product of:
        2.8353338 = sum of:
          2.8353338 = weight(author_txt:browne in 4643) [ClassicSimilarity], result of:
            2.8353338 = score(doc=4643,freq=1.0), product of:
              0.49652764 = queryWeight, product of:
                1.2801409 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.042452663 = queryNorm
              5.7103243 = fieldWeight in 4643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.625 = fieldNorm(doc=4643)
        0.25 = coord(1/4)
    
  4. Browne, G.: ¬The definite article : acknowledging The in index entries (2001) 0.71
    0.70883346 = sum of:
      0.70883346 = product of:
        2.8353338 = sum of:
          2.8353338 = weight(author_txt:browne in 1012) [ClassicSimilarity], result of:
            2.8353338 = score(doc=1012,freq=1.0), product of:
              0.49652764 = queryWeight, product of:
                1.2801409 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.042452663 = queryNorm
              5.7103243 = fieldWeight in 1012, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.625 = fieldNorm(doc=1012)
        0.25 = coord(1/4)
    
  5. Browne, G.: Changes in website indexing (2007) 0.71
    0.70883346 = sum of:
      0.70883346 = product of:
        2.8353338 = sum of:
          2.8353338 = weight(author_txt:browne in 1747) [ClassicSimilarity], result of:
            2.8353338 = score(doc=1747,freq=1.0), product of:
              0.49652764 = queryWeight, product of:
                1.2801409 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.042452663 = queryNorm
              5.7103243 = fieldWeight in 1747, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.625 = fieldNorm(doc=1747)
        0.25 = coord(1/4)
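
The author-match trees above each reduce to a single-term product. As a rough sketch only, the following Python reproduces the arithmetic for the first entry (doc 1241), assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical helpers, not part of any library; the numeric inputs are copied from the tree.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    term_idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values taken from the explain tree for author_txt:rosemblat in doc 1241.
raw_score = classic_term_score(
    freq=1.0,
    doc_freq=5,
    max_docs=44421,
    boost=1.3884745,
    query_norm=0.042452663,
    field_norm=0.5,
)

# coord(1/4): one of the four author terms in the query matched.
final_score = raw_score * (1 / 4)

print(f"raw={raw_score:.5f} final={final_score:.5f}")
```

Lucene computes these values in 32-bit floats, so the result should agree with the listed 0.72356004 only up to small rounding differences.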
    

Similar documents (content)

  1. Collins-Thompson, K.; Callan, J.: Predicting reading difficulty with statistical language models (2005) 0.28
    0.27930123 = sum of:
      0.27930123 = product of:
        0.8728163 = sum of:
          0.023559302 = weight(abstract_txt:text in 5579) [ClassicSimilarity], result of:
            0.023559302 = score(doc=5579,freq=3.0), product of:
              0.053857427 = queryWeight, product of:
                1.0348437 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012879372 = queryNorm
              0.4374383 = fieldWeight in 5579, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.016323183 = weight(abstract_txt:document in 5579) [ClassicSimilarity], result of:
            0.016323183 = score(doc=5579,freq=1.0), product of:
              0.06082017 = queryWeight, product of:
                1.0997039 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012879372 = queryNorm
              0.26838437 = fieldWeight in 5579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.0058352137 = weight(abstract_txt:information in 5579) [ClassicSimilarity], result of:
            0.0058352137 = score(doc=5579,freq=1.0), product of:
              0.038597476 = queryWeight, product of:
                1.238929 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012879372 = queryNorm
              0.15118122 = fieldWeight in 5579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.04131876 = weight(abstract_txt:levels in 5579) [ClassicSimilarity], result of:
            0.04131876 = score(doc=5579,freq=2.0), product of:
              0.08966011 = queryWeight, product of:
                1.3352162 = boost
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.012879372 = queryNorm
              0.46083772 = fieldWeight in 5579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.13996449 = weight(abstract_txt:grade in 5579) [ClassicSimilarity], result of:
            0.13996449 = score(doc=5579,freq=2.0), product of:
              0.20223035 = queryWeight, product of:
                2.0052805 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.012879372 = queryNorm
              0.6921043 = fieldWeight in 5579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.07702928 = weight(abstract_txt:pages in 5579) [ClassicSimilarity], result of:
            0.07702928 = score(doc=5579,freq=2.0), product of:
              0.15546599 = queryWeight, product of:
                2.1533532 = boost
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.012879372 = queryNorm
              0.49547353 = fieldWeight in 5579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6056433 = idf(docFreq=443, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.15692773 = weight(abstract_txt:classifier in 5579) [ClassicSimilarity], result of:
            0.15692773 = score(doc=5579,freq=1.0), product of:
              0.34646088 = queryWeight, product of:
                3.7118812 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.012879372 = queryNorm
              0.45294502 = fieldWeight in 5579, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
          0.41185835 = weight(abstract_txt:readability in 5579) [ClassicSimilarity], result of:
            0.41185835 = score(doc=5579,freq=2.0), product of:
              0.56361127 = queryWeight, product of:
                5.2931204 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012879372 = queryNorm
              0.73074895 = fieldWeight in 5579, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=5579)
        0.32 = coord(8/25)
    
  2. Denning, J.; Pera, M.S.; Ng, Y.-K.: ¬A readability level prediction tool for K-12 books (2016) 0.24
    0.24250332 = sum of:
      0.24250332 = product of:
        1.2125165 = sum of:
          0.01360197 = weight(abstract_txt:text in 3772) [ClassicSimilarity], result of:
            0.01360197 = score(doc=3772,freq=1.0), product of:
              0.053857427 = queryWeight, product of:
                1.0348437 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012879372 = queryNorm
              0.25255513 = fieldWeight in 3772, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3772)
          0.04131876 = weight(abstract_txt:levels in 3772) [ClassicSimilarity], result of:
            0.04131876 = score(doc=3772,freq=2.0), product of:
              0.08966011 = queryWeight, product of:
                1.3352162 = boost
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.012879372 = queryNorm
              0.46083772 = fieldWeight in 3772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.0625 = fieldNorm(doc=3772)
          0.13996449 = weight(abstract_txt:grade in 3772) [ClassicSimilarity], result of:
            0.13996449 = score(doc=3772,freq=2.0), product of:
              0.20223035 = queryWeight, product of:
                2.0052805 = boost
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.012879372 = queryNorm
              0.6921043 = fieldWeight in 3772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8302665 = idf(docFreq=47, maxDocs=44421)
                0.0625 = fieldNorm(doc=3772)
          0.24711499 = weight(abstract_txt:formulas in 3772) [ClassicSimilarity], result of:
            0.24711499 = score(doc=3772,freq=2.0), product of:
              0.33816674 = queryWeight, product of:
                3.1758723 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012879372 = queryNorm
              0.73074895 = fieldWeight in 3772, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=3772)
          0.7705164 = weight(abstract_txt:readability in 3772) [ClassicSimilarity], result of:
            0.7705164 = score(doc=3772,freq=7.0), product of:
              0.56361127 = queryWeight, product of:
                5.2931204 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012879372 = queryNorm
              1.3671061 = fieldWeight in 3772, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0625 = fieldNorm(doc=3772)
        0.2 = coord(5/25)
    
  3. Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.20
    0.19778454 = sum of:
      0.19778454 = product of:
        0.8241023 = sum of:
          0.033317886 = weight(abstract_txt:text in 3804) [ClassicSimilarity], result of:
            0.033317886 = score(doc=3804,freq=6.0), product of:
              0.053857427 = queryWeight, product of:
                1.0348437 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012879372 = queryNorm
              0.61863124 = fieldWeight in 3804, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.023084467 = weight(abstract_txt:document in 3804) [ClassicSimilarity], result of:
            0.023084467 = score(doc=3804,freq=2.0), product of:
              0.06082017 = queryWeight, product of:
                1.0997039 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012879372 = queryNorm
              0.3795528 = fieldWeight in 3804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.16876239 = weight(abstract_txt:naïve in 3804) [ClassicSimilarity], result of:
            0.16876239 = score(doc=3804,freq=2.0), product of:
              0.22909635 = queryWeight, product of:
                2.1343274 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.012879372 = queryNorm
              0.7366437 = fieldWeight in 3804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.1805215 = weight(abstract_txt:bayes in 3804) [ClassicSimilarity], result of:
            0.1805215 = score(doc=3804,freq=2.0), product of:
              0.23961851 = queryWeight, product of:
                2.182791 = boost
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.012879372 = queryNorm
              0.75337046 = fieldWeight in 3804, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.523414 = idf(docFreq=23, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.067514986 = weight(abstract_txt:difficult in 3804) [ClassicSimilarity], result of:
            0.067514986 = score(doc=3804,freq=1.0), product of:
              0.1974488 = queryWeight, product of:
                2.8021684 = boost
                5.4709864 = idf(docFreq=507, maxDocs=44421)
                0.012879372 = queryNorm
              0.34193665 = fieldWeight in 3804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4709864 = idf(docFreq=507, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
          0.35090107 = weight(abstract_txt:classifier in 3804) [ClassicSimilarity], result of:
            0.35090107 = score(doc=3804,freq=5.0), product of:
              0.34646088 = queryWeight, product of:
                3.7118812 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.012879372 = queryNorm
              1.0128158 = fieldWeight in 3804, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=3804)
        0.24 = coord(6/25)
    
  4. Azpiazu, I.M.; Soledad Pera, M.: Is cross-lingual readability assessment possible? (2020) 0.11
    0.11456811 = sum of:
      0.11456811 = product of:
        0.7160507 = sum of:
          0.007220708 = weight(abstract_txt:information in 868) [ClassicSimilarity], result of:
            0.007220708 = score(doc=868,freq=2.0), product of:
              0.038597476 = queryWeight, product of:
                1.238929 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012879372 = queryNorm
              0.18707721 = fieldWeight in 868, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0546875 = fieldNorm(doc=868)
          0.02556468 = weight(abstract_txt:levels in 868) [ClassicSimilarity], result of:
            0.02556468 = score(doc=868,freq=1.0), product of:
              0.08966011 = queryWeight, product of:
                1.3352162 = boost
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.012879372 = queryNorm
              0.2851288 = fieldWeight in 868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2137837 = idf(docFreq=656, maxDocs=44421)
                0.0546875 = fieldNorm(doc=868)
          0.059075613 = weight(abstract_txt:difficult in 868) [ClassicSimilarity], result of:
            0.059075613 = score(doc=868,freq=1.0), product of:
              0.1974488 = queryWeight, product of:
                2.8021684 = boost
                5.4709864 = idf(docFreq=507, maxDocs=44421)
                0.012879372 = queryNorm
              0.29919457 = fieldWeight in 868, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4709864 = idf(docFreq=507, maxDocs=44421)
                0.0546875 = fieldNorm(doc=868)
          0.6241897 = weight(abstract_txt:readability in 868) [ClassicSimilarity], result of:
            0.6241897 = score(doc=868,freq=6.0), product of:
              0.56361127 = queryWeight, product of:
                5.2931204 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012879372 = queryNorm
              1.1074826 = fieldWeight in 868, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.0546875 = fieldNorm(doc=868)
        0.16 = coord(4/25)
    
  5. Lantz, C.: Evaluating the readability of instructional visuals (1996) 0.10
    0.10208996 = sum of:
      0.10208996 = product of:
        0.8507497 = sum of:
          0.029449126 = weight(abstract_txt:text in 549) [ClassicSimilarity], result of:
            0.029449126 = score(doc=549,freq=3.0), product of:
              0.053857427 = queryWeight, product of:
                1.0348437 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.012879372 = queryNorm
              0.5467979 = fieldWeight in 549, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=549)
          0.007294017 = weight(abstract_txt:information in 549) [ClassicSimilarity], result of:
            0.007294017 = score(doc=549,freq=1.0), product of:
              0.038597476 = queryWeight, product of:
                1.238929 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012879372 = queryNorm
              0.18897653 = fieldWeight in 549, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=549)
          0.8140065 = weight(abstract_txt:readability in 549) [ClassicSimilarity], result of:
            0.8140065 = score(doc=549,freq=5.0), product of:
              0.56361127 = queryWeight, product of:
                5.2931204 = boost
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.012879372 = queryNorm
              1.4442694 = fieldWeight in 549, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.267481 = idf(docFreq=30, maxDocs=44421)
                0.078125 = fieldNorm(doc=549)
        0.12 = coord(3/25)
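
The content-match trees sum several per-term scores and then scale by the coord factor (matched terms over query terms). A minimal sketch, again assuming the standard ClassicSimilarity formulas, recomputes the last entry (doc 549, three matching terms out of 25). The helper names and the tuple layout are illustrative assumptions; the numbers come from the tree.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.012879372  # queryNorm shown in the content-similarity trees

def term_score(freq, doc_freq, boost, field_norm):
    # Assumed ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def document_score(matched_terms, total_query_terms, field_norm):
    # Sum the per-term scores, then apply coord = matched / total query terms.
    total = sum(
        term_score(freq, doc_freq, boost, field_norm)
        for (freq, doc_freq, boost) in matched_terms
    )
    coord = len(matched_terms) / total_query_terms
    return total, total * coord

# (freq, docFreq, boost) for the terms matched in doc 549.
lantz_terms = [
    (3.0, 2122, 1.0348437),   # abstract_txt:text
    (1.0, 10748, 1.238929),   # abstract_txt:information
    (5.0, 30, 5.2931204),     # abstract_txt:readability
]

term_sum, score = document_score(lantz_terms, total_query_terms=25, field_norm=0.078125)
print(f"sum={term_sum:.5f} score={score:.5f}")
```

The rare, heavily boosted term "readability" contributes almost all of the sum, which is why documents matching only a few query terms can still appear in this list.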