Document (#39850)

Author
Scherer Auberson, K.
Title
Counteracting concept drift in natural language classifiers : proposal for an automated method
Imprint
Chur : Hochschule für Technik und Wirtschaft / Arbeitsbereich Informationswissenschaft
Year
2018
Pages
VIII, 88 pp.
Series
Churer Schriften zur Informationswissenschaft / Arbeitsbereich Informationswissenschaft; Schrift 98
Abstract
Natural language classifiers increasingly help companies cope with the flood of text data. Once trained, however, these classifiers lose their usefulness over time. They remain static while the underlying domain of the text data changes, so their accuracy declines because of a phenomenon known as concept drift. The question is whether concept drift can be reliably detected from a classifier's output and, if so, whether it can be counteracted by retraining the classifier. A proof-of-concept system implementation is presented in which the classifier's confidence measure is used to detect concept drift. The classifier is then retrained iteratively: samples with a low confidence measure are selected, corrected, and used in the training set of the next iteration. The performance of the classifier is measured over time, and the performance of the system is observed. Based on this, recommendations are given that may prove useful when implementing such systems.
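The loop outlined in the abstract (detect drift from the classifier's confidence, then pick low-confidence samples for correction) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the mean-confidence drift test, and the threshold values are all hypothetical.

```python
from statistics import mean


def drift_suspected(confidences, baseline_mean, tolerance=0.1):
    """Flag possible concept drift (assumed rule, not from the thesis):
    the mean confidence has dropped more than `tolerance` below a baseline."""
    return mean(confidences) < baseline_mean - tolerance


def select_for_review(samples, confidences, threshold=0.5):
    """Return the samples whose confidence is below `threshold`,
    lowest confidence first, as candidates for manual correction."""
    ranked = sorted(zip(confidences, samples), key=lambda pair: pair[0])
    return [sample for confidence, sample in ranked if confidence < threshold]


if __name__ == "__main__":
    texts = ["a", "b", "c"]          # hypothetical classified samples
    scores = [0.9, 0.2, 0.4]         # hypothetical confidence values
    print(drift_suspected(scores, baseline_mean=0.9))
    print(select_for_review(texts, scores))
```

The corrected samples would then be added to the training set before the next retraining iteration, as the abstract describes.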
Content
This publication originated as a thesis for the Master of Science FHO in Business Administration, Major Information and Data Management.
See: https://www.htwchur.ch/fileadmin/htw_chur/angewandte_zukunftstechnologien/SII/churer_schriften/CSI_98_Counteracting_Concept_Drift_in_Natural_Language_Classifiers.pdf.
Theme
Computational linguistics

Similar documents (author)

  1. Scherer, A.: Neuronale Netze : Grundlagen und Anwendungen (1995) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:scherer in 1966) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 1966, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=1966)
    
  2. Scherer, H.: Zwerge auf den Schultern von Riesen : das Schachbuch Alfons' des Weisen: Beispiel früher Fachkommunikation (1996) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:scherer in 3594) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 3594, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=3594)
    
  3. Scherer, A.: Intranet : Kommunikation im Unternehmen (1998) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:scherer in 5906) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 5906, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=5906)
    
  4. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:scherer in 283) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 283, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=283)
    
  5. Scherer, B.: ¬Die Pandemie ist kein Überfall von Außerirdischen (2020) 5.76
    5.7603507 = sum of:
      5.7603507 = weight(author_txt:scherer in 706) [ClassicSimilarity], result of:
        5.7603507 = fieldWeight in 706, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.216561 = idf(docFreq=11, maxDocs=44421)
          0.625 = fieldNorm(doc=706)
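
The author-match scores above all follow the same arithmetic: fieldWeight = tf * idf * fieldNorm. The sketch below reproduces it, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the values shown in the listing. The function names are illustrative and not part of any library.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency; assumed form that matches the listing:
    1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # Values taken from the listing: termFreq=1.0, docFreq=11,
    # maxDocs=44421, fieldNorm=0.625.
    print(classic_idf(11, 44421))               # close to 9.216561
    print(field_weight(1.0, 11, 44421, 0.625))  # close to 5.7603507
```

Because every author hit has the same term frequency, document frequency, and field norm, all five entries receive the same score of about 5.76.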
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.10
    0.09701171 = sum of:
      0.09701171 = product of:
        0.8084309 = sum of:
          0.05210167 = weight(abstract_txt:concept in 3697) [ClassicSimilarity], result of:
            0.05210167 = score(doc=3697,freq=4.0), product of:
              0.0740215 = queryWeight, product of:
                1.0463551 = boost
                4.5047812 = idf(docFreq=1334, maxDocs=44421)
                0.015703812 = queryNorm
              0.7038721 = fieldWeight in 3697, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5047812 = idf(docFreq=1334, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.23009291 = weight(abstract_txt:classifier in 3697) [ClassicSimilarity], result of:
            0.23009291 = score(doc=3697,freq=2.0), product of:
              0.28736424 = queryWeight, product of:
                2.5250044 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.015703812 = queryNorm
              0.80070126 = fieldWeight in 3697, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
          0.52623636 = weight(abstract_txt:classifiers in 3697) [ClassicSimilarity], result of:
            0.52623636 = score(doc=3697,freq=3.0), product of:
              0.51665854 = queryWeight, product of:
                4.370911 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.015703812 = queryNorm
              1.018538 = fieldWeight in 3697, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.078125 = fieldNorm(doc=3697)
        0.12 = coord(3/25)
    
  2. Fabian, C.; Haller, K.: ¬Der Image-Katalog als alternatives Modell der Konversion : Die Konversion des Alphabetischen Katalogs 1953-1981 der Bayerischen Staatsbibliothek (1998) 0.08
    0.07533594 = sum of:
      0.07533594 = product of:
        0.47084966 = sum of:
          0.03866202 = weight(abstract_txt:eines in 1865) [ClassicSimilarity], result of:
            0.03866202 = score(doc=1865,freq=2.0), product of:
              0.076440446 = queryWeight, product of:
                1.0633146 = boost
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.015703812 = queryNorm
              0.5057796 = fieldWeight in 1865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.078125 = fieldNorm(doc=1865)
          0.027410569 = weight(abstract_txt:aber in 1865) [ClassicSimilarity], result of:
            0.027410569 = score(doc=1865,freq=1.0), product of:
              0.076575324 = queryWeight, product of:
                1.0642523 = boost
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.015703812 = queryNorm
              0.35795563 = fieldWeight in 1865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.078125 = fieldNorm(doc=1865)
          0.030604675 = weight(abstract_txt:wird in 1865) [ClassicSimilarity], result of:
            0.030604675 = score(doc=1865,freq=1.0), product of:
              0.103835374 = queryWeight, product of:
                1.7526202 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015703812 = queryNorm
              0.2947423 = fieldWeight in 1865, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.078125 = fieldNorm(doc=1865)
          0.3741724 = weight(abstract_txt:textdaten in 1865) [ClassicSimilarity], result of:
            0.3741724 = score(doc=1865,freq=2.0), product of:
              0.34714797 = queryWeight, product of:
                2.2659874 = boost
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.015703812 = queryNorm
              1.077847 = fieldWeight in 1865, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.755557 = idf(docFreq=6, maxDocs=44421)
                0.078125 = fieldNorm(doc=1865)
        0.16 = coord(4/25)
    
  3. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.06
    0.062355425 = sum of:
      0.062355425 = product of:
        0.77944285 = sum of:
          0.18407433 = weight(abstract_txt:classifier in 2107) [ClassicSimilarity], result of:
            0.18407433 = score(doc=2107,freq=2.0), product of:
              0.28736424 = queryWeight, product of:
                2.5250044 = boost
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.015703812 = queryNorm
              0.640561 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.2471204 = idf(docFreq=85, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
          0.5953685 = weight(abstract_txt:classifiers in 2107) [ClassicSimilarity], result of:
            0.5953685 = score(doc=2107,freq=6.0), product of:
              0.51665854 = queryWeight, product of:
                4.370911 = boost
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.015703812 = queryNorm
              1.1523442 = fieldWeight in 2107, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5270805 = idf(docFreq=64, maxDocs=44421)
                0.0625 = fieldNorm(doc=2107)
        0.08 = coord(2/25)
    
  4. Sprachtechnologie, mobile Kommunikation und linguistische Ressourcen : Beiträge zur GLDV Tagung 2005 in Bonn (2005) 0.06
    0.06014943 = sum of:
      0.06014943 = product of:
        0.37593395 = sum of:
          0.03289268 = weight(abstract_txt:aber in 4578) [ClassicSimilarity], result of:
            0.03289268 = score(doc=4578,freq=1.0), product of:
              0.076575324 = queryWeight, product of:
                1.0642523 = boost
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.015703812 = queryNorm
              0.42954674 = fieldWeight in 4578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.09375 = fieldNorm(doc=4578)
          0.03859089 = weight(abstract_txt:ihre in 4578) [ClassicSimilarity], result of:
            0.03859089 = score(doc=4578,freq=1.0), product of:
              0.08518161 = queryWeight, product of:
                1.1224657 = boost
                4.8324533 = idf(docFreq=961, maxDocs=44421)
                0.015703812 = queryNorm
              0.4530425 = fieldWeight in 4578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8324533 = idf(docFreq=961, maxDocs=44421)
                0.09375 = fieldNorm(doc=4578)
          0.03672561 = weight(abstract_txt:wird in 4578) [ClassicSimilarity], result of:
            0.03672561 = score(doc=4578,freq=1.0), product of:
              0.103835374 = queryWeight, product of:
                1.7526202 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015703812 = queryNorm
              0.35369074 = fieldWeight in 4578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.09375 = fieldNorm(doc=4578)
          0.26772475 = weight(abstract_txt:trainiert in 4578) [ClassicSimilarity], result of:
            0.26772475 = score(doc=4578,freq=1.0), product of:
              0.30984774 = queryWeight, product of:
                2.1407912 = boost
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.015703812 = queryNorm
              0.86405265 = fieldWeight in 4578, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.216561 = idf(docFreq=11, maxDocs=44421)
                0.09375 = fieldNorm(doc=4578)
        0.16 = coord(4/25)
    
  5. Oberhauser, O.: ¬Die Dewey Decimal Classification im Österreichischen Verbundkatalog : Status und Perspektiven (2009) 0.06
    0.05537901 = sum of:
      0.05537901 = product of:
        0.27689505 = sum of:
          0.027338179 = weight(abstract_txt:eines in 3922) [ClassicSimilarity], result of:
            0.027338179 = score(doc=3922,freq=1.0), product of:
              0.076440446 = queryWeight, product of:
                1.0633146 = boost
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.015703812 = queryNorm
              0.35764024 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.577795 = idf(docFreq=1240, maxDocs=44421)
                0.078125 = fieldNorm(doc=3922)
          0.027410569 = weight(abstract_txt:aber in 3922) [ClassicSimilarity], result of:
            0.027410569 = score(doc=3922,freq=1.0), product of:
              0.076575324 = queryWeight, product of:
                1.0642523 = boost
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.015703812 = queryNorm
              0.35795563 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.581832 = idf(docFreq=1235, maxDocs=44421)
                0.078125 = fieldNorm(doc=3922)
          0.08510332 = weight(abstract_txt:verwendet in 3922) [ClassicSimilarity], result of:
            0.08510332 = score(doc=3922,freq=1.0), product of:
              0.16297005 = queryWeight, product of:
                1.5525802 = boost
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.015703812 = queryNorm
              0.5222022 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.684188 = idf(docFreq=150, maxDocs=44421)
                0.078125 = fieldNorm(doc=3922)
          0.1064383 = weight(abstract_txt:implementierung in 3922) [ClassicSimilarity], result of:
            0.1064383 = score(doc=3922,freq=1.0), product of:
              0.18918009 = queryWeight, product of:
                1.6727763 = boost
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.015703812 = queryNorm
              0.5626295 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.201658 = idf(docFreq=89, maxDocs=44421)
                0.078125 = fieldNorm(doc=3922)
          0.030604675 = weight(abstract_txt:wird in 3922) [ClassicSimilarity], result of:
            0.030604675 = score(doc=3922,freq=1.0), product of:
              0.103835374 = queryWeight, product of:
                1.7526202 = boost
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.015703812 = queryNorm
              0.2947423 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7727013 = idf(docFreq=2775, maxDocs=44421)
                0.078125 = fieldNorm(doc=3922)
        0.2 = coord(5/25)
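
The content-match scores above combine several terms. Each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm; the sum is then multiplied by the coord factor (matched terms / query terms). The sketch below recomputes the first hit (doc 3697) from the numbers in the listing. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)), which agrees with the displayed values, and its function names are illustrative only.

```python
import math

MAX_DOCS = 44421
QUERY_NORM = 0.015703812


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms):
    """Sum the term scores, then apply coord = matched / total query terms."""
    coord = len(terms) / total_query_terms
    return coord * sum(term_score(b, df, f, field_norm) for b, df, f in terms)


if __name__ == "__main__":
    # (boost, docFreq, termFreq) for doc 3697, copied from the listing:
    # "concept", "classifier", "classifiers".
    terms_3697 = [
        (1.0463551, 1334, 4.0),
        (2.5250044, 85, 2.0),
        (4.370911, 64, 3.0),
    ]
    print(document_score(terms_3697, field_norm=0.078125, total_query_terms=25))
    # close to 0.09701171, the score shown for the first content hit
```

The coord factor explains why a document matching only two or three of the 25 query terms scores low even when the individual term weights are high.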