Document (#28326)

Author
Goren-Bar, D.
Kuflik, T.
Title
Supporting user-subjective categorization with self-organizing maps and learning vector quantization
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.4, p.345-355
Year
2005
Abstract
Today, most document categorization in organizations is done manually. We save at work hundreds of files and e-mail messages in folders every day. While automatic document categorization has been widely studied, much challenging research still remains to support user-subjective categorization. This study evaluates and compares the application of self-organizing maps (SOMs) and learning vector quantization (LVQ) with automatic document classification, using a set of documents from an organization, in a specific domain, manually classified by a domain expert. After running the SOM and LVQ we requested the user to reclassify documents that were misclassified by the system. Results show that despite the subjective nature of human categorization, automatic document categorization methods correlate well with subjective, personal categorization, and the LVQ method outperforms the SOM. The reclassification process revealed an interesting pattern: About 40% of the documents were classified according to their original categorization, about 35% according to the system's categorization (the users changed the original categorization), and the remainder received a different (new) categorization. Based on these results we conclude that automatic support for subjective categorization is feasible; however, an exact match is probably impossible due to the users' changing categorization behavior.
Object
SOM

Similar documents (author)

  1. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 3.72
    3.7161405 = sum of:
      3.7161405 = weight(author_txt:kuflik in 2234) [ClassicSimilarity], result of:
        3.7161405 = fieldWeight in 2234, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.375 = fieldNorm(doc=2234)
    
  2. Minkov, E.; Kahanov, K.; Kuflik, T.: Graph-based recommendation integrating rating history and domain knowledge : application to on-site guidance of museum visitors (2017) 3.72
    3.7161405 = sum of:
      3.7161405 = weight(author_txt:kuflik in 4756) [ClassicSimilarity], result of:
        3.7161405 = fieldWeight in 4756, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.375 = fieldNorm(doc=4756)
    
  3. Shapira, B.; Elovici, Y.; Meshiach, A.; Kuflik, T.: PRAW-A PRivAcy model for the Web (2005) 3.10
    3.0967836 = sum of:
      3.0967836 = weight(author_txt:kuflik in 4309) [ClassicSimilarity], result of:
        3.0967836 = fieldWeight in 4309, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.3125 = fieldNorm(doc=4309)
    
  4. Paramita, M.L.; Kasinidou, M.; Kleanthous, S.; Rosso, P.; Kuflik, T.; Hopfgartner, F.: Towards improving user awareness of search engine biases : a participatory design approach (2024) 2.48
    2.477427 = sum of:
      2.477427 = weight(author_txt:kuflik in 2274) [ClassicSimilarity], result of:
        2.477427 = fieldWeight in 2274, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.25 = fieldNorm(doc=2274)
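
The author scores above can be reproduced by hand. The sketch below assumes the usual ClassicSimilarity definitions, tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of the catalog software. It recomputes the fieldWeight reported for the first entry (doc 2234).

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(term_freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    tf = math.sqrt(term_freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the explain tree for author_txt:kuflik in doc 2234
idf_kuflik = classic_idf(doc_freq=5, max_docs=44421)
weight_2234 = field_weight(term_freq=1.0, doc_freq=5, max_docs=44421,
                           field_norm=0.375)
print(round(idf_kuflik, 4), round(weight_2234, 4))
```

Because tf and idf are identical for all four entries, the ranking in this list depends only on fieldNorm (0.375, 0.3125, 0.25), which in this scoring scheme is typically smaller for longer author fields.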
    

Similar documents (content)

  1. Roitblat, H.L.; Kershaw, A.; Oot, P.: Document categorization in legal electronic discovery : computer classification vs. manual review (2009) 0.22
    0.22026233 = sum of:
      0.22026233 = product of:
        0.9177597 = sum of:
          0.045144066 = weight(abstract_txt:requested in 301) [ClassicSimilarity], result of:
            0.045144066 = score(doc=301,freq=1.0), product of:
              0.09069856 = queryWeight, product of:
                1.035048 = boost
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.011003217 = queryNorm
              0.49773738 = fieldWeight in 301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.963798 = idf(docFreq=41, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
          0.014949179 = weight(abstract_txt:support in 301) [ClassicSimilarity], result of:
            0.014949179 = score(doc=301,freq=1.0), product of:
              0.05469591 = queryWeight, product of:
                1.1367179 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.011003217 = queryNorm
              0.2733144 = fieldWeight in 301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
          0.028187636 = weight(abstract_txt:original in 301) [ClassicSimilarity], result of:
            0.028187636 = score(doc=301,freq=1.0), product of:
              0.08347999 = queryWeight, product of:
                1.4043213 = boost
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.011003217 = queryNorm
              0.3376574 = fieldWeight in 301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
          0.04604459 = weight(abstract_txt:documents in 301) [ClassicSimilarity], result of:
            0.04604459 = score(doc=301,freq=6.0), product of:
              0.07294167 = queryWeight, product of:
                1.6077139 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.011003217 = queryNorm
              0.6312522 = fieldWeight in 301, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
          0.040035617 = weight(abstract_txt:document in 301) [ClassicSimilarity], result of:
            0.040035617 = score(doc=301,freq=2.0), product of:
              0.10548101 = queryWeight, product of:
                2.2324278 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011003217 = queryNorm
              0.3795528 = fieldWeight in 301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
          0.7433986 = weight(abstract_txt:categorization in 301) [ClassicSimilarity], result of:
            0.7433986 = score(doc=301,freq=5.0), product of:
              0.8072454 = queryWeight, product of:
                11.133577 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011003217 = queryNorm
              0.92090786 = fieldWeight in 301, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0625 = fieldNorm(doc=301)
        0.24 = coord(6/25)
    
  2. Westman, S.; Laine-Hernandez, M.; Oittinen, P.: Development and evaluation of a multifaceted magazine image categorization model (2011) 0.17
    0.16948266 = sum of:
      0.16948266 = product of:
        0.8474133 = sum of:
          0.0107328165 = weight(abstract_txt:about in 193) [ClassicSimilarity], result of:
            0.0107328165 = score(doc=193,freq=1.0), product of:
              0.043854997 = queryWeight, product of:
                1.0178524 = boost
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.011003217 = queryNorm
              0.24473417 = fieldWeight in 193, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.0625 = fieldNorm(doc=193)
          0.014949179 = weight(abstract_txt:support in 193) [ClassicSimilarity], result of:
            0.014949179 = score(doc=193,freq=1.0), product of:
              0.05469591 = queryWeight, product of:
                1.1367179 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.011003217 = queryNorm
              0.2733144 = fieldWeight in 193, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.0625 = fieldNorm(doc=193)
          0.028187636 = weight(abstract_txt:original in 193) [ClassicSimilarity], result of:
            0.028187636 = score(doc=193,freq=1.0), product of:
              0.08347999 = queryWeight, product of:
                1.4043213 = boost
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.011003217 = queryNorm
              0.3376574 = fieldWeight in 193, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4025183 = idf(docFreq=543, maxDocs=44421)
                0.0625 = fieldNorm(doc=193)
          0.050145034 = weight(abstract_txt:automatic in 193) [ClassicSimilarity], result of:
            0.050145034 = score(doc=193,freq=1.0), product of:
              0.1544206 = queryWeight, product of:
                2.701114 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.011003217 = queryNorm
              0.32473022 = fieldWeight in 193, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=193)
          0.7433986 = weight(abstract_txt:categorization in 193) [ClassicSimilarity], result of:
            0.7433986 = score(doc=193,freq=5.0), product of:
              0.8072454 = queryWeight, product of:
                11.133577 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011003217 = queryNorm
              0.92090786 = fieldWeight in 193, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0625 = fieldNorm(doc=193)
        0.2 = coord(5/25)
    
  3. Sah, M.; Wade, V.: Personalized concept-based search on the Linked Open Data (2015) 0.16
    0.16269785 = sum of:
      0.16269785 = product of:
        0.81348926 = sum of:
          0.018498667 = weight(abstract_txt:support in 3511) [ClassicSimilarity], result of:
            0.018498667 = score(doc=3511,freq=2.0), product of:
              0.05469591 = queryWeight, product of:
                1.1367179 = boost
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.011003217 = queryNorm
              0.3382093 = fieldWeight in 3511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.37303 = idf(docFreq=1522, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.016540464 = weight(abstract_txt:domain in 3511) [ClassicSimilarity], result of:
            0.016540464 = score(doc=3511,freq=1.0), product of:
              0.0639592 = queryWeight, product of:
                1.229212 = boost
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.011003217 = queryNorm
              0.2586096 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.02254497 = weight(abstract_txt:according in 3511) [ClassicSimilarity], result of:
            0.02254497 = score(doc=3511,freq=1.0), product of:
              0.07862688 = queryWeight, product of:
                1.36289 = boost
                5.2431293 = idf(docFreq=637, maxDocs=44421)
                0.011003217 = queryNorm
              0.28673363 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2431293 = idf(docFreq=637, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.043346852 = weight(abstract_txt:vector in 3511) [ClassicSimilarity], result of:
            0.043346852 = score(doc=3511,freq=1.0), product of:
              0.12157463 = queryWeight, product of:
                1.6947154 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.011003217 = queryNorm
              0.3565452 = fieldWeight in 3511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
          0.7125583 = weight(abstract_txt:categorization in 3511) [ClassicSimilarity], result of:
            0.7125583 = score(doc=3511,freq=6.0), product of:
              0.8072454 = queryWeight, product of:
                11.133577 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011003217 = queryNorm
              0.8827035 = fieldWeight in 3511, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0546875 = fieldNorm(doc=3511)
        0.2 = coord(5/25)
    
  4. Kim, J.-H.; Choi, K.-S.: Patent document categorization based on semantic structural information (2007) 0.16
    0.1625322 = sum of:
      0.1625322 = product of:
        1.0158262 = sum of:
          0.04604459 = weight(abstract_txt:documents in 1933) [ClassicSimilarity], result of:
            0.04604459 = score(doc=1933,freq=6.0), product of:
              0.07294167 = queryWeight, product of:
                1.6077139 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.011003217 = queryNorm
              0.6312522 = fieldWeight in 1933, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=1933)
          0.040035617 = weight(abstract_txt:document in 1933) [ClassicSimilarity], result of:
            0.040035617 = score(doc=1933,freq=2.0), product of:
              0.10548101 = queryWeight, product of:
                2.2324278 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011003217 = queryNorm
              0.3795528 = fieldWeight in 1933, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1933)
          0.050145034 = weight(abstract_txt:automatic in 1933) [ClassicSimilarity], result of:
            0.050145034 = score(doc=1933,freq=1.0), product of:
              0.1544206 = queryWeight, product of:
                2.701114 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.011003217 = queryNorm
              0.32473022 = fieldWeight in 1933, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=1933)
          0.879601 = weight(abstract_txt:categorization in 1933) [ClassicSimilarity], result of:
            0.879601 = score(doc=1933,freq=7.0), product of:
              0.8072454 = queryWeight, product of:
                11.133577 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011003217 = queryNorm
              1.0896327 = fieldWeight in 1933, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0625 = fieldNorm(doc=1933)
        0.16 = coord(4/25)
    
  5. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.16
    0.16249385 = sum of:
      0.16249385 = product of:
        0.81246924 = sum of:
          0.0107328165 = weight(abstract_txt:about in 287) [ClassicSimilarity], result of:
            0.0107328165 = score(doc=287,freq=1.0), product of:
              0.043854997 = queryWeight, product of:
                1.0178524 = boost
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.011003217 = queryNorm
              0.24473417 = fieldWeight in 287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9157467 = idf(docFreq=2405, maxDocs=44421)
                0.0625 = fieldNorm(doc=287)
          0.03759525 = weight(abstract_txt:documents in 287) [ClassicSimilarity], result of:
            0.03759525 = score(doc=287,freq=4.0), product of:
              0.07294167 = queryWeight, product of:
                1.6077139 = boost
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.011003217 = queryNorm
              0.51541525 = fieldWeight in 287, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.123322 = idf(docFreq=1954, maxDocs=44421)
                0.0625 = fieldNorm(doc=287)
          0.028309455 = weight(abstract_txt:document in 287) [ClassicSimilarity], result of:
            0.028309455 = score(doc=287,freq=1.0), product of:
              0.10548101 = queryWeight, product of:
                2.2324278 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.011003217 = queryNorm
              0.26838437 = fieldWeight in 287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=287)
          0.07091579 = weight(abstract_txt:automatic in 287) [ClassicSimilarity], result of:
            0.07091579 = score(doc=287,freq=2.0), product of:
              0.1544206 = queryWeight, product of:
                2.701114 = boost
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.011003217 = queryNorm
              0.45923787 = fieldWeight in 287, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1956835 = idf(docFreq=668, maxDocs=44421)
                0.0625 = fieldNorm(doc=287)
          0.6649159 = weight(abstract_txt:categorization in 287) [ClassicSimilarity], result of:
            0.6649159 = score(doc=287,freq=4.0), product of:
              0.8072454 = queryWeight, product of:
                11.133577 = boost
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.011003217 = queryNorm
              0.823685 = fieldWeight in 287, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.58948 = idf(docFreq=165, maxDocs=44421)
                0.0625 = fieldNorm(doc=287)
        0.2 = coord(5/25)
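
The content scores combine several term scores with a coordination factor. As a rough sketch, assuming each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (sqrt(freq) * idf * fieldNorm) and the sum is scaled by coord (matched terms / query terms), the first entry (doc 301) can be recomputed from the figures listed in its explain tree. The helper name `term_score` is hypothetical.

```python
import math

QUERY_NORM = 0.011003217   # queryNorm from the explain tree
FIELD_NORM = 0.0625        # fieldNorm(doc=301)

def term_score(boost: float, idf: float, freq: float,
               query_norm: float = QUERY_NORM,
               field_norm: float = FIELD_NORM) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    fld_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * fld_weight

# (boost, idf, freq) for the six query terms matched in doc 301
terms_301 = {
    "requested":      (1.035048, 7.963798, 1.0),
    "support":        (1.1367179, 4.37303, 1.0),
    "original":       (1.4043213, 5.4025183, 1.0),
    "documents":      (1.6077139, 4.123322, 6.0),
    "document":       (2.2324278, 4.29415, 2.0),
    "categorization": (11.133577, 6.58948, 5.0),
}

term_sum = sum(term_score(*params) for params in terms_301.values())
coord = len(terms_301) / 25   # 6 of the 25 query terms matched
score_301 = coord * term_sum
print(round(term_sum, 4), round(score_301, 4))
```

In this breakdown the term "categorization" contributes about 0.74 of the 0.92 pre-coord sum, which is why all five listed documents are dominated by that single term.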