Document (#25130)

Author
Polanco, X.
François, C.
Title
Data clustering and cluster mapping or visualization in text processing and mining
Source
Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al.
Imprint
Würzburg : Ergon
Year
2000
Pages
pp. 359-365
Series
Advances in knowledge organization; vol.7
Abstract
This paper focuses on the cooperative use of text data clustering and cluster mapping as visualization-based analysis tools. Although we present a generic approach to text processing and mining, we concentrate on the two middle steps of the process: data clustering and cluster mapping. In the data clustering step, we use the axial k-means (AKM) algorithm, an unsupervised, iterative partitioning, winner-take-all (WTA) method that produces overlapping clusters. In the cluster mapping step, we use a nonlinear multilayer perceptron (MLP) with two hidden layers. Finally, the map is proposed as an analysis device rather than a visualization device. It allows the analyst to evaluate the relative positions of the clusters, which are indicators of themes induced from the data themselves.

Similar documents (author)

  1. Polanco, X.: Extraction et modélisation des connaissances : une approche et ses technologies (EMCAT) (1999) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:polanco in 241) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 241, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=241)
    
  2. Polanco, X.: Clusters, graphs, and networks for analyzing Internet-Web-supported communication within a virtual community (2003) 6.19
    6.1935673 = sum of:
      6.1935673 = weight(author_txt:polanco in 3737) [ClassicSimilarity], result of:
        6.1935673 = fieldWeight in 3737, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.625 = fieldNorm(doc=3737)
    
  3. Grivel, L.; Mutschke, P.; Polanco, X.: Thematic mapping on bibliographic databases by cluster analysis : a description of the SDOC environment with SOLIS (1995) 3.72
    3.7161405 = sum of:
      3.7161405 = weight(author_txt:polanco in 1968) [ClassicSimilarity], result of:
        3.7161405 = fieldWeight in 1968, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.375 = fieldNorm(doc=1968)
    
  4. Polanco, X.; François, C.; Aly Ould Louly, M.: ¬An artificial neural network perspective on knowledge representation from databases : the use of a multilayer perceptron for data clusters cartography (1998) 3.10
    3.0967836 = sum of:
      3.0967836 = weight(author_txt:polanco in 1072) [ClassicSimilarity], result of:
        3.0967836 = fieldWeight in 1072, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.909708 = idf(docFreq=5, maxDocs=44421)
          0.3125 = fieldNorm(doc=1072)
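
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the values listed here. The function names are illustrative and are not part of any Lucene API.

```python
import math

def classic_tf(freq: float) -> float:
    # Assumed ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity inverse document frequency (older Lucene form).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# First author match (doc 241): freq=1, docFreq=5, maxDocs=44421, fieldNorm=0.625
idf = classic_idf(5, 44421)
w = field_weight(1.0, 5, 44421, 0.625)
print(f"idf={idf:.4f} fieldWeight={w:.4f}")
```

The four author matches share the same tf and idf, so their ranking differs only through fieldNorm (0.625, 0.625, 0.375, 0.3125), which in this similarity model shrinks as the author field gets longer. Small differences in the last digits are expected because Lucene computes with 32-bit floats.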
    

Similar documents (content)

  1. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.25
    0.24668272 = sum of:
      0.24668272 = product of:
        0.8810097 = sum of:
          0.045435034 = weight(abstract_txt:processing in 451) [ClassicSimilarity], result of:
            0.045435034 = score(doc=451,freq=1.0), product of:
              0.1180859 = queryWeight, product of:
                1.3502756 = boost
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.017757133 = queryNorm
              0.38476256 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9249606 = idf(docFreq=876, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.027644435 = weight(abstract_txt:analysis in 451) [ClassicSimilarity], result of:
            0.027644435 = score(doc=451,freq=1.0), product of:
              0.09705987 = queryWeight, product of:
                1.4993012 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017757133 = queryNorm
              0.28481838 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.03764457 = weight(abstract_txt:text in 451) [ClassicSimilarity], result of:
            0.03764457 = score(doc=451,freq=1.0), product of:
              0.11924389 = queryWeight, product of:
                1.661832 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017757133 = queryNorm
              0.3156939 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.08942696 = weight(abstract_txt:mining in 451) [ClassicSimilarity], result of:
            0.08942696 = score(doc=451,freq=1.0), product of:
              0.1854598 = queryWeight, product of:
                1.6921868 = boost
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.017757133 = queryNorm
              0.48219052 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1720386 = idf(docFreq=251, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.04966592 = weight(abstract_txt:data in 451) [ClassicSimilarity], result of:
            0.04966592 = score(doc=451,freq=2.0), product of:
              0.13498323 = queryWeight, product of:
                2.2826185 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017757133 = queryNorm
              0.3679414 = fieldWeight in 451, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.14667882 = weight(abstract_txt:mapping in 451) [ClassicSimilarity], result of:
            0.14667882 = score(doc=451,freq=1.0), product of:
              0.32498184 = queryWeight, product of:
                3.1678743 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.017757133 = queryNorm
              0.45134467 = fieldWeight in 451, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
          0.48451394 = weight(abstract_txt:clustering in 451) [ClassicSimilarity], result of:
            0.48451394 = score(doc=451,freq=7.0), product of:
              0.37680703 = queryWeight, product of:
                3.411127 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017757133 = queryNorm
              1.285841 = fieldWeight in 451, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=451)
        0.28 = coord(7/25)
    
  2. Chen, C.; Ibekwe-SanJuan, F.; Hou, J.: ¬The structure and dynamics of cocitation clusters : a multiple-perspective cocitation analysis (2010) 0.20
    0.20125565 = sum of:
      0.20125565 = product of:
        0.83856523 = sum of:
          0.05528887 = weight(abstract_txt:analysis in 578) [ClassicSimilarity], result of:
            0.05528887 = score(doc=578,freq=4.0), product of:
              0.09705987 = queryWeight, product of:
                1.4993012 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017757133 = queryNorm
              0.56963676 = fieldWeight in 578, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
          0.03764457 = weight(abstract_txt:text in 578) [ClassicSimilarity], result of:
            0.03764457 = score(doc=578,freq=1.0), product of:
              0.11924389 = queryWeight, product of:
                1.661832 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017757133 = queryNorm
              0.3156939 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
          0.15102838 = weight(abstract_txt:cluster in 578) [ClassicSimilarity], result of:
            0.15102838 = score(doc=578,freq=2.0), product of:
              0.20875321 = queryWeight, product of:
                1.7953123 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.017757133 = queryNorm
              0.7234781 = fieldWeight in 578, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
          0.13762355 = weight(abstract_txt:visualization in 578) [ClassicSimilarity], result of:
            0.13762355 = score(doc=578,freq=1.0), product of:
              0.28298476 = queryWeight, product of:
                2.5600626 = boost
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.017757133 = queryNorm
              0.48632845 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
          0.27385083 = weight(abstract_txt:clusters in 578) [ClassicSimilarity], result of:
            0.27385083 = score(doc=578,freq=3.0), product of:
              0.31041083 = queryWeight, product of:
                2.681251 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017757133 = queryNorm
              0.88222057 = fieldWeight in 578, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
          0.18312906 = weight(abstract_txt:clustering in 578) [ClassicSimilarity], result of:
            0.18312906 = score(doc=578,freq=1.0), product of:
              0.37680703 = queryWeight, product of:
                3.411127 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017757133 = queryNorm
              0.48600224 = fieldWeight in 578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=578)
        0.24 = coord(6/25)
    
  3. Small, H.: ¬A general framework for creating large scale maps of science in two or three dimensions : the SciViz system (1998) 0.18
    0.18498434 = sum of:
      0.18498434 = product of:
        0.9249217 = sum of:
          0.04916675 = weight(abstract_txt:data in 2039) [ClassicSimilarity], result of:
            0.04916675 = score(doc=2039,freq=1.0), product of:
              0.13498323 = queryWeight, product of:
                2.2826185 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017757133 = queryNorm
              0.36424342 = fieldWeight in 2039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.109375 = fieldNorm(doc=2039)
          0.19267295 = weight(abstract_txt:visualization in 2039) [ClassicSimilarity], result of:
            0.19267295 = score(doc=2039,freq=1.0), product of:
              0.28298476 = queryWeight, product of:
                2.5600626 = boost
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.017757133 = queryNorm
              0.6808598 = fieldWeight in 2039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.225004 = idf(docFreq=238, maxDocs=44421)
                0.109375 = fieldNorm(doc=2039)
          0.22135098 = weight(abstract_txt:clusters in 2039) [ClassicSimilarity], result of:
            0.22135098 = score(doc=2039,freq=1.0), product of:
              0.31041083 = queryWeight, product of:
                2.681251 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017757133 = queryNorm
              0.7130904 = fieldWeight in 2039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.109375 = fieldNorm(doc=2039)
          0.20535035 = weight(abstract_txt:mapping in 2039) [ClassicSimilarity], result of:
            0.20535035 = score(doc=2039,freq=1.0), product of:
              0.32498184 = queryWeight, product of:
                3.1678743 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.017757133 = queryNorm
              0.63188255 = fieldWeight in 2039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.109375 = fieldNorm(doc=2039)
          0.25638068 = weight(abstract_txt:clustering in 2039) [ClassicSimilarity], result of:
            0.25638068 = score(doc=2039,freq=1.0), product of:
              0.37680703 = queryWeight, product of:
                3.411127 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017757133 = queryNorm
              0.6804031 = fieldWeight in 2039, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.109375 = fieldNorm(doc=2039)
        0.2 = coord(5/25)
    
  4. Janssens, F.; Leta, J.; Glänzel, W.; Moor, B. de: Towards mapping library and information science (2006) 0.18
    0.17827722 = sum of:
      0.17827722 = product of:
        0.7428218 = sum of:
          0.038305253 = weight(abstract_txt:analysis in 1992) [ClassicSimilarity], result of:
            0.038305253 = score(doc=1992,freq=3.0), product of:
              0.09705987 = queryWeight, product of:
                1.4993012 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017757133 = queryNorm
              0.3946559 = fieldWeight in 1992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
          0.052161846 = weight(abstract_txt:text in 1992) [ClassicSimilarity], result of:
            0.052161846 = score(doc=1992,freq=3.0), product of:
              0.11924389 = queryWeight, product of:
                1.661832 = boost
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.017757133 = queryNorm
              0.4374383 = fieldWeight in 1992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.040882 = idf(docFreq=2122, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
          0.12082269 = weight(abstract_txt:cluster in 1992) [ClassicSimilarity], result of:
            0.12082269 = score(doc=1992,freq=2.0), product of:
              0.20875321 = queryWeight, product of:
                1.7953123 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.017757133 = queryNorm
              0.57878244 = fieldWeight in 1992, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
          0.21908066 = weight(abstract_txt:clusters in 1992) [ClassicSimilarity], result of:
            0.21908066 = score(doc=1992,freq=3.0), product of:
              0.31041083 = queryWeight, product of:
                2.681251 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017757133 = queryNorm
              0.70577645 = fieldWeight in 1992, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
          0.16594814 = weight(abstract_txt:mapping in 1992) [ClassicSimilarity], result of:
            0.16594814 = score(doc=1992,freq=2.0), product of:
              0.32498184 = queryWeight, product of:
                3.1678743 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.017757133 = queryNorm
              0.5106382 = fieldWeight in 1992, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
          0.14650324 = weight(abstract_txt:clustering in 1992) [ClassicSimilarity], result of:
            0.14650324 = score(doc=1992,freq=1.0), product of:
              0.37680703 = queryWeight, product of:
                3.411127 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017757133 = queryNorm
              0.38880178 = fieldWeight in 1992, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.0625 = fieldNorm(doc=1992)
        0.24 = coord(6/25)
    
  5. Gamber, T.; Friedrich-Nishio, M.; Grupp, H.: Science and technology in standardization : a statistical analysis of merging knowledge structures (2008) 0.16
    0.16128463 = sum of:
      0.16128463 = product of:
        0.6720193 = sum of:
          0.027644435 = weight(abstract_txt:analysis in 3260) [ClassicSimilarity], result of:
            0.027644435 = score(doc=3260,freq=1.0), product of:
              0.09705987 = queryWeight, product of:
                1.4993012 = boost
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.017757133 = queryNorm
              0.28481838 = fieldWeight in 3260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6456752 = idf(docFreq=3151, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
          0.10679318 = weight(abstract_txt:cluster in 3260) [ClassicSimilarity], result of:
            0.10679318 = score(doc=3260,freq=1.0), product of:
              0.20875321 = queryWeight, product of:
                1.7953123 = boost
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.017757133 = queryNorm
              0.51157624 = fieldWeight in 3260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.548176 = idf(docFreq=172, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
          0.04966592 = weight(abstract_txt:data in 3260) [ClassicSimilarity], result of:
            0.04966592 = score(doc=3260,freq=2.0), product of:
              0.13498323 = queryWeight, product of:
                2.2826185 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.017757133 = queryNorm
              0.3679414 = fieldWeight in 3260, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
          0.15810785 = weight(abstract_txt:clusters in 3260) [ClassicSimilarity], result of:
            0.15810785 = score(doc=3260,freq=1.0), product of:
              0.31041083 = queryWeight, product of:
                2.681251 = boost
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.017757133 = queryNorm
              0.5093503 = fieldWeight in 3260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.519684 = idf(docFreq=177, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
          0.14667882 = weight(abstract_txt:mapping in 3260) [ClassicSimilarity], result of:
            0.14667882 = score(doc=3260,freq=1.0), product of:
              0.32498184 = queryWeight, product of:
                3.1678743 = boost
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.017757133 = queryNorm
              0.45134467 = fieldWeight in 3260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7772117 = idf(docFreq=373, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
          0.18312906 = weight(abstract_txt:clustering in 3260) [ClassicSimilarity], result of:
            0.18312906 = score(doc=3260,freq=1.0), product of:
              0.37680703 = queryWeight, product of:
                3.411127 = boost
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.017757133 = queryNorm
              0.48600224 = fieldWeight in 3260, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2208285 = idf(docFreq=239, maxDocs=44421)
                0.078125 = fieldNorm(doc=3260)
        0.24 = coord(6/25)
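
The content scores combine one more layer: each matching term contributes queryWeight * fieldWeight, and the sum is scaled by coord (matched terms / query terms). The sketch below recomputes the score of the first content match (doc 451) from the boosts, document frequencies and term frequencies shown in its explain tree. It assumes the same classic TF-IDF formulas as the listing; the helper names are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.017757133   # queryNorm as printed in the explain output
QUERY_TERMS = 25           # denominator of coord(7/25)

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed classic idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    fw = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * fw

# (boost, docFreq, freq) per matched abstract term, copied from the doc 451 entry
matches = [
    (1.3502756, 876, 1.0),    # processing
    (1.4993012, 3151, 1.0),   # analysis
    (1.661832, 2122, 1.0),    # text
    (1.6921868, 251, 1.0),    # mining
    (2.2826185, 4320, 2.0),   # data
    (3.1678743, 373, 1.0),    # mapping
    (3.411127, 239, 7.0),     # clustering
]
FIELD_NORM = 0.078125

total = sum(term_score(b, df, f, FIELD_NORM) for b, df, f in matches)
coord = len(matches) / QUERY_TERMS
score = coord * total
print(f"sum={total:.4f} coord={coord:.2f} score={score:.4f}")
```

The term "clustering" dominates the sum because it has both the highest boost and a term frequency of 7, which explains why doc 451 ranks first despite matching only 7 of the 25 query terms.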