Document (#36630)

Author
Ku, C.-H.
Leroy, G.
Title
¬A crime reports analysis system to identify related crimes
Source
Journal of the American Society for Information Science and Technology. 62(2011) no.8, pp. 1533-1547
Year
2011
Abstract
The popularity of online and anonymous options to report crimes, such as tips websites and text messaging, has led to an increasing amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining information to solve crimes is a time-consuming task. In response, we are developing entity and document similarity algorithms to automatically identify overlapping and complementary information. These are essential components for systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared the new algorithm and four generic algorithms with a gold standard. The strongest correlation with the gold standard, r = 0.710, was found with our entity similarity algorithm. To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in how much importance is assigned to different entity types. Cosine similarity is then used as a baseline comparison to evaluate the performance of the document similarity algorithms for accuracy in recognizing reports describing the same crime and distinguishing them from reports on different crimes. The best version achieved 92% accuracy.
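
The abstract names Jaccard and Dice coefficients without giving formulas. As a rough illustration only, the sketch below implements the textbook set-based definitions of both. It is not the authors' algorithm: the hierarchical lexicon and the entity-type weighting are omitted, and the function names and the two sample reports are hypothetical.

```python
def jaccard(a, b):
    """Textbook Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def dice(a, b):
    """Textbook Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical entities extracted from two witness reports.
report_a = {"red", "car", "knife", "man"}
report_b = {"red", "truck", "knife", "man"}

jaccard_ab = jaccard(report_a, report_b)  # 3 shared / 5 distinct = 0.6
dice_ab = dice(report_a, report_b)        # 2 * 3 / (4 + 4) = 0.75
```

Dice never scores a pair lower than Jaccard does, so thresholds tuned for one coefficient do not transfer directly to the other.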

Similar documents (author)

  1. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 4.88
    4.8777785 = sum of:
      4.8777785 = weight(author_txt:leroy in 259) [ClassicSimilarity], result of:
        4.8777785 = fieldWeight in 259, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.5 = fieldNorm(doc=259)
    
  2. Leroy, S.Y.; Thomas, S.L.: Impact of Web access on cataloging (2004) 4.88
    4.8777785 = sum of:
      4.8777785 = weight(author_txt:leroy in 656) [ClassicSimilarity], result of:
        4.8777785 = fieldWeight in 656, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.5 = fieldNorm(doc=656)
    
  3. Kauchak, D.; Leroy, G.; Hogue, A.: Measuring text difficulty using parse-tree frequency (2017) 3.66
    3.6583338 = sum of:
      3.6583338 = weight(author_txt:leroy in 4786) [ClassicSimilarity], result of:
        3.6583338 = fieldWeight in 4786, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.375 = fieldNorm(doc=4786)
    
  4. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 3.05
    3.0486116 = sum of:
      3.0486116 = weight(author_txt:leroy in 2998) [ClassicSimilarity], result of:
        3.0486116 = fieldWeight in 2998, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.3125 = fieldNorm(doc=2998)
    
  5. Thirion, B.; Leroy, J.P.; Baudic, F.; Douyère, M.; Piot, J.; Darmoni, S.J.: SDI selecting, describing, and indexing : did you mean automatically? (2001) 2.44
    2.4388893 = sum of:
      2.4388893 = weight(author_txt:leroy in 198) [ClassicSimilarity], result of:
        2.4388893 = fieldWeight in 198, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.755557 = idf(docFreq=6, maxDocs=44421)
          0.25 = fieldNorm(doc=198)
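
The author scores above are each a single fieldWeight, the product of tf, idf, and fieldNorm. The sketch below reproduces that arithmetic. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the form that matches the printed figures; the function names are illustrative and not part of any library API.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency in the form that matches the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as shown in each explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 259): freq=1, docFreq=6, maxDocs=44421, fieldNorm=0.5
weight_259 = field_weight(1.0, 6, 44421, 0.5)

# Entry 5 (doc 198): same term statistics, fieldNorm=0.25
weight_198 = field_weight(1.0, 6, 44421, 0.25)
```

Because tf and idf are identical for all five entries, the ranking is decided by fieldNorm alone. fieldNorm shrinks as the author field grows longer, so records with fewer authors rank higher.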
    

Similar documents (content)

  1. Ellis, D.; Furner-Hines, J.; Willett, P.: Measuring the degree of similarity between objects in text retrieval systems (1993) 0.19
    0.19191289 = sum of:
      0.19191289 = product of:
        0.95956445 = sum of:
          0.16586325 = weight(abstract_txt:coefficients in 6715) [ClassicSimilarity], result of:
            0.16586325 = score(doc=6715,freq=5.0), product of:
              0.11344566 = queryWeight, product of:
                1.0477431 = boost
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.012937367 = queryNorm
              1.4620501 = fieldWeight in 6715, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.369263 = idf(docFreq=27, maxDocs=44421)
                0.078125 = fieldNorm(doc=6715)
          0.0071633644 = weight(abstract_txt:information in 6715) [ClassicSimilarity], result of:
            0.0071633644 = score(doc=6715,freq=1.0), product of:
              0.037906107 = queryWeight, product of:
                1.2112825 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012937367 = queryNorm
              0.18897653 = fieldWeight in 6715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.078125 = fieldNorm(doc=6715)
          0.00787173 = weight(abstract_txt:with in 6715) [ClassicSimilarity], result of:
            0.00787173 = score(doc=6715,freq=1.0), product of:
              0.040365588 = queryWeight, product of:
                1.2499611 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012937367 = queryNorm
              0.19501092 = fieldWeight in 6715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.078125 = fieldNorm(doc=6715)
          0.06941539 = weight(abstract_txt:document in 6715) [ClassicSimilarity], result of:
            0.06941539 = score(doc=6715,freq=3.0), product of:
              0.11946149 = queryWeight, product of:
                2.1503286 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012937367 = queryNorm
              0.5810692 = fieldWeight in 6715, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.078125 = fieldNorm(doc=6715)
          0.7092507 = weight(abstract_txt:similarity in 6715) [ClassicSimilarity], result of:
            0.7092507 = score(doc=6715,freq=10.0), product of:
              0.49342954 = queryWeight, product of:
                6.555332 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012937367 = queryNorm
              1.43739 = fieldWeight in 6715, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.078125 = fieldNorm(doc=6715)
        0.2 = coord(5/25)
    
  2. Hook, P.A.: Using course-subject Co-occurrence (CSCO) to reveal the structure of an academic discipline : a framework to evaluate different inputs of a domain map (2017) 0.18
    0.18297035 = sum of:
      0.18297035 = product of:
        0.6534655 = sum of:
          0.021408847 = weight(abstract_txt:standard in 4324) [ClassicSimilarity], result of:
            0.021408847 = score(doc=4324,freq=1.0), product of:
              0.07243637 = queryWeight, product of:
                1.1840068 = boost
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.012937367 = queryNorm
              0.29555383 = fieldWeight in 4324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7288613 = idf(docFreq=1066, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.0062973844 = weight(abstract_txt:with in 4324) [ClassicSimilarity], result of:
            0.0062973844 = score(doc=4324,freq=1.0), product of:
              0.040365588 = queryWeight, product of:
                1.2499611 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012937367 = queryNorm
              0.15600874 = fieldWeight in 4324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.13856915 = weight(abstract_txt:gold in 4324) [ClassicSimilarity], result of:
            0.13856915 = score(doc=4324,freq=2.0), product of:
              0.19967736 = queryWeight, product of:
                1.9658043 = boost
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.012937367 = queryNorm
              0.6939652 = fieldWeight in 4324, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.85132 = idf(docFreq=46, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.04562855 = weight(abstract_txt:evaluate in 4324) [ClassicSimilarity], result of:
            0.04562855 = score(doc=4324,freq=1.0), product of:
              0.13732493 = queryWeight, product of:
                1.9966235 = boost
                5.316273 = idf(docFreq=592, maxDocs=44421)
                0.012937367 = queryNorm
              0.33226708 = fieldWeight in 4324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.316273 = idf(docFreq=592, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.056240793 = weight(abstract_txt:algorithms in 4324) [ClassicSimilarity], result of:
            0.056240793 = score(doc=4324,freq=1.0), product of:
              0.15786743 = queryWeight, product of:
                2.140759 = boost
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.012937367 = queryNorm
              0.3562533 = fieldWeight in 4324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7000527 = idf(docFreq=403, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.13157158 = weight(abstract_txt:algorithm in 4324) [ClassicSimilarity], result of:
            0.13157158 = score(doc=4324,freq=1.0), product of:
              0.36899903 = queryWeight, product of:
                4.999453 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012937367 = queryNorm
              0.35656348 = fieldWeight in 4324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
          0.25374922 = weight(abstract_txt:similarity in 4324) [ClassicSimilarity], result of:
            0.25374922 = score(doc=4324,freq=2.0), product of:
              0.49342954 = queryWeight, product of:
                6.555332 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012937367 = queryNorm
              0.51425624 = fieldWeight in 4324, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=4324)
        0.28 = coord(7/25)
    
  3. Wu, T.; Pottenger, W.M.: ¬A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.16
    0.16310334 = sum of:
      0.16310334 = product of:
        0.67959726 = sum of:
          0.011461383 = weight(abstract_txt:information in 4237) [ClassicSimilarity], result of:
            0.011461383 = score(doc=4237,freq=4.0), product of:
              0.037906107 = queryWeight, product of:
                1.2112825 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012937367 = queryNorm
              0.30236244 = fieldWeight in 4237, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.024201019 = weight(abstract_txt:identify in 4237) [ClassicSimilarity], result of:
            0.024201019 = score(doc=4237,freq=1.0), product of:
              0.07860502 = queryWeight, product of:
                1.2333916 = boost
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.012937367 = queryNorm
              0.30788136 = fieldWeight in 4237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9261017 = idf(docFreq=875, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.008905846 = weight(abstract_txt:with in 4237) [ClassicSimilarity], result of:
            0.008905846 = score(doc=4237,freq=2.0), product of:
              0.040365588 = queryWeight, product of:
                1.2499611 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012937367 = queryNorm
              0.22062966 = fieldWeight in 4237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.055313505 = weight(abstract_txt:reports in 4237) [ClassicSimilarity], result of:
            0.055313505 = score(doc=4237,freq=2.0), product of:
              0.13638982 = queryWeight, product of:
                2.2976391 = boost
                4.5883255 = idf(docFreq=1227, maxDocs=44421)
                0.012937367 = queryNorm
              0.4055545 = fieldWeight in 4237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5883255 = idf(docFreq=1227, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.23160984 = weight(abstract_txt:crime in 4237) [ClassicSimilarity], result of:
            0.23160984 = score(doc=4237,freq=1.0), product of:
              0.4055984 = queryWeight, product of:
                3.4313862 = boost
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.012937367 = queryNorm
              0.5710324 = fieldWeight in 4237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.1365185 = idf(docFreq=12, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
          0.34810567 = weight(abstract_txt:algorithm in 4237) [ClassicSimilarity], result of:
            0.34810567 = score(doc=4237,freq=7.0), product of:
              0.36899903 = queryWeight, product of:
                4.999453 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012937367 = queryNorm
              0.94337827 = fieldWeight in 4237, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=4237)
        0.24 = coord(6/25)
    
  4. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.16
    0.16190101 = sum of:
      0.16190101 = product of:
        0.67458755 = sum of:
          0.009925849 = weight(abstract_txt:information in 1664) [ClassicSimilarity], result of:
            0.009925849 = score(doc=1664,freq=3.0), product of:
              0.037906107 = queryWeight, product of:
                1.2112825 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012937367 = queryNorm
              0.26185355 = fieldWeight in 1664, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
          0.0062973844 = weight(abstract_txt:with in 1664) [ClassicSimilarity], result of:
            0.0062973844 = score(doc=1664,freq=1.0), product of:
              0.040365588 = queryWeight, product of:
                1.2499611 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012937367 = queryNorm
              0.15600874 = fieldWeight in 1664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
          0.039112557 = weight(abstract_txt:reports in 1664) [ClassicSimilarity], result of:
            0.039112557 = score(doc=1664,freq=1.0), product of:
              0.13638982 = queryWeight, product of:
                2.2976391 = boost
                4.5883255 = idf(docFreq=1227, maxDocs=44421)
                0.012937367 = queryNorm
              0.28677034 = fieldWeight in 1664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5883255 = idf(docFreq=1227, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
          0.21193528 = weight(abstract_txt:entity in 1664) [ClassicSimilarity], result of:
            0.21193528 = score(doc=1664,freq=2.0), product of:
              0.38229072 = queryWeight, product of:
                4.71122 = boost
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.012937367 = queryNorm
              0.5543825 = fieldWeight in 1664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.272122 = idf(docFreq=227, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
          0.22788866 = weight(abstract_txt:algorithm in 1664) [ClassicSimilarity], result of:
            0.22788866 = score(doc=1664,freq=3.0), product of:
              0.36899903 = queryWeight, product of:
                4.999453 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012937367 = queryNorm
              0.6175861 = fieldWeight in 1664, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
          0.1794278 = weight(abstract_txt:similarity in 1664) [ClassicSimilarity], result of:
            0.1794278 = score(doc=1664,freq=1.0), product of:
              0.49342954 = queryWeight, product of:
                6.555332 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012937367 = queryNorm
              0.36363408 = fieldWeight in 1664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=1664)
        0.24 = coord(6/25)
    
  5. Chinenyanga, T.T.; Kushmerick, N.: ¬An expressive and efficient language for XML information retrieval (2002) 0.16
    0.1601025 = sum of:
      0.1601025 = product of:
        0.66709375 = sum of:
          0.0057306914 = weight(abstract_txt:information in 1462) [ClassicSimilarity], result of:
            0.0057306914 = score(doc=1462,freq=1.0), product of:
              0.037906107 = queryWeight, product of:
                1.2112825 = boost
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.012937367 = queryNorm
              0.15118122 = fieldWeight in 1462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4188995 = idf(docFreq=10748, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
          0.012594769 = weight(abstract_txt:with in 1462) [ClassicSimilarity], result of:
            0.012594769 = score(doc=1462,freq=4.0), product of:
              0.040365588 = queryWeight, product of:
                1.2499611 = boost
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.012937367 = queryNorm
              0.31201747 = fieldWeight in 1462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4961398 = idf(docFreq=9949, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
          0.04562855 = weight(abstract_txt:evaluate in 1462) [ClassicSimilarity], result of:
            0.04562855 = score(doc=1462,freq=1.0), product of:
              0.13732493 = queryWeight, product of:
                1.9966235 = boost
                5.316273 = idf(docFreq=592, maxDocs=44421)
                0.012937367 = queryNorm
              0.33226708 = fieldWeight in 1462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.316273 = idf(docFreq=592, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
          0.032061595 = weight(abstract_txt:document in 1462) [ClassicSimilarity], result of:
            0.032061595 = score(doc=1462,freq=1.0), product of:
              0.11946149 = queryWeight, product of:
                2.1503286 = boost
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.012937367 = queryNorm
              0.26838437 = fieldWeight in 1462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29415 = idf(docFreq=1647, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
          0.13157158 = weight(abstract_txt:algorithm in 1462) [ClassicSimilarity], result of:
            0.13157158 = score(doc=1462,freq=1.0), product of:
              0.36899903 = queryWeight, product of:
                4.999453 = boost
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.012937367 = queryNorm
              0.35656348 = fieldWeight in 1462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7050157 = idf(docFreq=401, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
          0.43950656 = weight(abstract_txt:similarity in 1462) [ClassicSimilarity], result of:
            0.43950656 = score(doc=1462,freq=6.0), product of:
              0.49342954 = queryWeight, product of:
                6.555332 = boost
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.012937367 = queryNorm
              0.890718 = fieldWeight in 1462, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8181453 = idf(docFreq=358, maxDocs=44421)
                0.0625 = fieldNorm(doc=1462)
        0.24 = coord(6/25)
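
Each content score above is a sum of per-term scores, with each term score being queryWeight times fieldWeight, scaled by a coord factor for the fraction of the 25 query terms that matched. The sketch below recomputes the score of entry 1 (doc 6715) from the values printed in its explain tree. The function names and tuple layout are illustrative assumptions, not a Lucene API.

```python
import math

QUERY_NORM = 0.012937367  # queryNorm as printed in the listing


def term_score(freq, boost, idf, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, total_query_terms):
    """Sum of term scores scaled by coord = matched terms / query terms."""
    coord = len(terms) / total_query_terms
    return coord * sum(term_score(*t) for t in terms)


# (freq, boost, idf, fieldNorm) for the five terms matched in doc 6715.
terms_6715 = [
    (5.0, 1.0477431, 8.369263, 0.078125),    # coefficients
    (1.0, 1.2112825, 2.4188995, 0.078125),   # information
    (1.0, 1.2499611, 2.4961398, 0.078125),   # with
    (3.0, 2.1503286, 4.29415, 0.078125),     # document
    (10.0, 6.555332, 5.8181453, 0.078125),   # similarity
]

score_6715 = document_score(terms_6715, total_query_terms=25)
```

The result is approximately 0.1919, in line with the 0.19191289 shown for entry 1. Note that idf enters each term score twice, once through queryWeight and once through fieldWeight, which is why rare terms such as "similarity" and "crime" dominate these rankings.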