Document (#40051)

Author
Arora, S.K.
Li, Y.
Youtie, J.
Shapira, P.
Title
Using the Wayback Machine to mine websites in the social sciences : a methodological resource
Source
Journal of the Association for Information Science and Technology. 67(2016) no.8, pp.1904-1915
Year
2016
Abstract
Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23503/abstract.
Theme
Informetrics
Field
Social sciences

Similar documents (author)

  1. Shapira, B.: Hypertext browsing : a new model for information filtering based on user profiles and data clustering (1996) 5.62
    5.620886 = sum of:
      5.620886 = weight(author_txt:shapira in 4779) [ClassicSimilarity], result of:
        5.620886 = fieldWeight in 4779, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.625 = fieldNorm(doc=4779)
    
  2. Shapira, B.; Zabar, B.: Personalized search : integrating collaboration and social networks (2011) 4.50
    4.496709 = sum of:
      4.496709 = weight(author_txt:shapira in 140) [ClassicSimilarity], result of:
        4.496709 = fieldWeight in 140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.5 = fieldNorm(doc=140)
    
  3. Shapira, B.; Shoval, P.; Hanani, U.: Stereotypes in information filtering systems (1997) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 1157) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 1157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=1157)
    
  4. Shapira, B.; Kantor, P.B.; Melamed, B.: The effect of extrinsic motivation on user behavior in a collaborative information finding system (2001) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 525) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 525, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=525)
    
  5. Kuflik, T.; Shapira, B.; Shoval, P.: Stereotype-based versus personal-based filtering rules in information filtering systems (2003) 3.37
    3.3725317 = sum of:
      3.3725317 = weight(author_txt:shapira in 2234) [ClassicSimilarity], result of:
        3.3725317 = fieldWeight in 2234, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.993418 = idf(docFreq=14, maxDocs=44421)
          0.375 = fieldNorm(doc=2234)
    

Similar documents (content)

  1. Vaughan, L.: Uncovering information from social media hyperlinks (2016) 0.11
    0.11437695 = sum of:
      0.11437695 = product of:
        0.4084891 = sum of:
          0.042616926 = weight(abstract_txt:types in 3892) [ClassicSimilarity], result of:
            0.042616926 = score(doc=3892,freq=3.0), product of:
              0.0881673 = queryWeight, product of:
                1.0670303 = boost
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.018505331 = queryNorm
              0.4833643 = fieldWeight in 3892, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.013076748 = weight(abstract_txt:research in 3892) [ClassicSimilarity], result of:
            0.013076748 = score(doc=3892,freq=1.0), product of:
              0.06622014 = queryWeight, product of:
                1.1325663 = boost
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.018505331 = queryNorm
              0.19747387 = fieldWeight in 3892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.037449513 = weight(abstract_txt:developing in 3892) [ClassicSimilarity], result of:
            0.037449513 = score(doc=3892,freq=1.0), product of:
              0.11666054 = queryWeight, product of:
                1.2273967 = boost
                5.136203 = idf(docFreq=709, maxDocs=44421)
                0.018505331 = queryNorm
              0.32101268 = fieldWeight in 3892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.136203 = idf(docFreq=709, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.06862962 = weight(abstract_txt:methodological in 3892) [ClassicSimilarity], result of:
            0.06862962 = score(doc=3892,freq=1.0), product of:
              0.17470324 = queryWeight, product of:
                1.5020121 = boost
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.018505331 = queryNorm
              0.39283544 = fieldWeight in 3892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.285367 = idf(docFreq=224, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.061569564 = weight(abstract_txt:social in 3892) [ClassicSimilarity], result of:
            0.061569564 = score(doc=3892,freq=4.0), product of:
              0.117187425 = queryWeight, product of:
                1.5066386 = boost
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.018505331 = queryNorm
              0.52539396 = fieldWeight in 3892, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.07655998 = weight(abstract_txt:data in 3892) [ClassicSimilarity], result of:
            0.07655998 = score(doc=3892,freq=9.0), product of:
              0.122610286 = queryWeight, product of:
                1.9895571 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.018505331 = queryNorm
              0.6244173 = fieldWeight in 3892, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
          0.10858674 = weight(abstract_txt:websites in 3892) [ClassicSimilarity], result of:
            0.10858674 = score(doc=3892,freq=1.0), product of:
              0.2715448 = queryWeight, product of:
                2.2934504 = boost
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.018505331 = queryNorm
              0.39988518 = fieldWeight in 3892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.0625 = fieldNorm(doc=3892)
        0.28 = coord(7/25)
    
  2. Borrego, Á.: Measuring compliance with a Spanish Government open access mandate (2016) 0.10
    0.10150633 = sum of:
      0.10150633 = product of:
        0.50753164 = sum of:
          0.02264959 = weight(abstract_txt:research in 3841) [ClassicSimilarity], result of:
            0.02264959 = score(doc=3841,freq=3.0), product of:
              0.06622014 = queryWeight, product of:
                1.1325663 = boost
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.018505331 = queryNorm
              0.34203476 = fieldWeight in 3841, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.0625 = fieldNorm(doc=3841)
          0.030784782 = weight(abstract_txt:social in 3841) [ClassicSimilarity], result of:
            0.030784782 = score(doc=3841,freq=1.0), product of:
              0.117187425 = queryWeight, product of:
                1.5066386 = boost
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.018505331 = queryNorm
              0.26269698 = fieldWeight in 3841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.0625 = fieldNorm(doc=3841)
          0.025519993 = weight(abstract_txt:data in 3841) [ClassicSimilarity], result of:
            0.025519993 = score(doc=3841,freq=1.0), product of:
              0.122610286 = queryWeight, product of:
                1.9895571 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.018505331 = queryNorm
              0.20813909 = fieldWeight in 3841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=3841)
          0.31999052 = weight(abstract_txt:green in 3841) [ClassicSimilarity], result of:
            0.31999052 = score(doc=3841,freq=4.0), product of:
              0.30715996 = queryWeight, product of:
                1.9916145 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.018505331 = queryNorm
              1.0417715 = fieldWeight in 3841, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0625 = fieldNorm(doc=3841)
          0.10858674 = weight(abstract_txt:websites in 3841) [ClassicSimilarity], result of:
            0.10858674 = score(doc=3841,freq=1.0), product of:
              0.2715448 = queryWeight, product of:
                2.2934504 = boost
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.018505331 = queryNorm
              0.39988518 = fieldWeight in 3841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.0625 = fieldNorm(doc=3841)
        0.2 = coord(5/25)
    
  3. Li, W.; Xiong, B.; Yang, C.: A roadmap to achieving a healthier information ecosystem through GDPR implementation and privacy compliance technologies (2024) 0.10
    0.095858924 = sum of:
      0.095858924 = product of:
        0.4792946 = sum of:
          0.013076748 = weight(abstract_txt:research in 2365) [ClassicSimilarity], result of:
            0.013076748 = score(doc=2365,freq=1.0), product of:
              0.06622014 = queryWeight, product of:
                1.1325663 = boost
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.018505331 = queryNorm
              0.19747387 = fieldWeight in 2365, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.159582 = idf(docFreq=5124, maxDocs=44421)
                0.0625 = fieldNorm(doc=2365)
          0.13164788 = weight(abstract_txt:industries in 2365) [ClassicSimilarity], result of:
            0.13164788 = score(doc=2365,freq=1.0), product of:
              0.26971337 = queryWeight, product of:
                1.8662689 = boost
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.018505331 = queryNorm
              0.48810294 = fieldWeight in 2365, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.809647 = idf(docFreq=48, maxDocs=44421)
                0.0625 = fieldNorm(doc=2365)
          0.03609072 = weight(abstract_txt:data in 2365) [ClassicSimilarity], result of:
            0.03609072 = score(doc=2365,freq=2.0), product of:
              0.122610286 = queryWeight, product of:
                1.9895571 = boost
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.018505331 = queryNorm
              0.29435313 = fieldWeight in 2365, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3302255 = idf(docFreq=4320, maxDocs=44421)
                0.0625 = fieldNorm(doc=2365)
          0.18807775 = weight(abstract_txt:websites in 2365) [ClassicSimilarity], result of:
            0.18807775 = score(doc=2365,freq=3.0), product of:
              0.2715448 = queryWeight, product of:
                2.2934504 = boost
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.018505331 = queryNorm
              0.6926214 = fieldWeight in 2365, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.0625 = fieldNorm(doc=2365)
          0.11040151 = weight(abstract_txt:website in 2365) [ClassicSimilarity], result of:
            0.11040151 = score(doc=2365,freq=1.0), product of:
              0.2745619 = queryWeight, product of:
                2.3061564 = boost
                6.4336095 = idf(docFreq=193, maxDocs=44421)
                0.018505331 = queryNorm
              0.4021006 = fieldWeight in 2365, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4336095 = idf(docFreq=193, maxDocs=44421)
                0.0625 = fieldNorm(doc=2365)
        0.2 = coord(5/25)
    
  4. Xu, C.; Zhang, Q.: The dominant factor of social tags for users' decision behavior on e-commerce websites : color or text (2019) 0.09
    0.09039117 = sum of:
      0.09039117 = product of:
        0.5649448 = sum of:
          0.030784782 = weight(abstract_txt:social in 359) [ClassicSimilarity], result of:
            0.030784782 = score(doc=359,freq=1.0), product of:
              0.117187425 = queryWeight, product of:
                1.5066386 = boost
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.018505331 = queryNorm
              0.26269698 = fieldWeight in 359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.0625 = fieldNorm(doc=359)
          0.0228349 = weight(abstract_txt:using in 359) [ClassicSimilarity], result of:
            0.0228349 = score(doc=359,freq=1.0), product of:
              0.1056905 = queryWeight, product of:
                1.6521746 = boost
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.018505331 = queryNorm
              0.21605442 = fieldWeight in 359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4568708 = idf(docFreq=3806, maxDocs=44421)
                0.0625 = fieldNorm(doc=359)
          0.35776028 = weight(abstract_txt:green in 359) [ClassicSimilarity], result of:
            0.35776028 = score(doc=359,freq=5.0), product of:
              0.30715996 = queryWeight, product of:
                1.9916145 = boost
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.018505331 = queryNorm
              1.164736 = fieldWeight in 359, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.334172 = idf(docFreq=28, maxDocs=44421)
                0.0625 = fieldNorm(doc=359)
          0.15356484 = weight(abstract_txt:websites in 359) [ClassicSimilarity], result of:
            0.15356484 = score(doc=359,freq=2.0), product of:
              0.2715448 = queryWeight, product of:
                2.2934504 = boost
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.018505331 = queryNorm
              0.565523 = fieldWeight in 359, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.0625 = fieldNorm(doc=359)
        0.16 = coord(4/25)
    
  5. Khoo, C.S.G.; Zhang, D.; Wang, M.; Yun, X.J.: Subject organization in three types of information resources : an exploratory study (2012) 0.08
    0.080156855 = sum of:
      0.080156855 = product of:
        0.40078425 = sum of:
          0.034796577 = weight(abstract_txt:types in 1831) [ClassicSimilarity], result of:
            0.034796577 = score(doc=1831,freq=2.0), product of:
              0.0881673 = queryWeight, product of:
                1.0670303 = boost
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.018505331 = queryNorm
              0.39466533 = fieldWeight in 1831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4651284 = idf(docFreq=1388, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.059384897 = weight(abstract_txt:resource in 1831) [ClassicSimilarity], result of:
            0.059384897 = score(doc=1831,freq=3.0), product of:
              0.10999429 = queryWeight, product of:
                1.1918128 = boost
                4.987297 = idf(docFreq=823, maxDocs=44421)
                0.018505331 = queryNorm
              0.5398907 = fieldWeight in 1831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.987297 = idf(docFreq=823, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.08774025 = weight(abstract_txt:step in 1831) [ClassicSimilarity], result of:
            0.08774025 = score(doc=1831,freq=2.0), product of:
              0.1633362 = queryWeight, product of:
                1.4523263 = boost
                6.0774503 = idf(docFreq=276, maxDocs=44421)
                0.018505331 = queryNorm
              0.5371758 = fieldWeight in 1831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0774503 = idf(docFreq=276, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.030784782 = weight(abstract_txt:social in 1831) [ClassicSimilarity], result of:
            0.030784782 = score(doc=1831,freq=1.0), product of:
              0.117187425 = queryWeight, product of:
                1.5066386 = boost
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.018505331 = queryNorm
              0.26269698 = fieldWeight in 1831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2031517 = idf(docFreq=1804, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
          0.18807775 = weight(abstract_txt:websites in 1831) [ClassicSimilarity], result of:
            0.18807775 = score(doc=1831,freq=3.0), product of:
              0.2715448 = queryWeight, product of:
                2.2934504 = boost
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.018505331 = queryNorm
              0.6926214 = fieldWeight in 1831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.398163 = idf(docFreq=200, maxDocs=44421)
                0.0625 = fieldNorm(doc=1831)
        0.2 = coord(5/25)
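
Note on the score breakdowns

The listings above are labeled ClassicSimilarity, Lucene's TF-IDF scoring. The arithmetic they show can be reproduced with a minimal sketch like the one below. The function names are illustrative and are not part of any Lucene API. The values for boost, queryNorm, and fieldNorm are taken from the listings as given inputs, not derived here. Lucene computes these values in 32-bit floats, so the last digits may differ slightly from the Python results.

```python
import math


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm is read from the listing)."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


def query_weight(boost: float, doc_freq: int, max_docs: int, query_norm: float) -> float:
    """queryWeight = boost * idf * queryNorm (boost and queryNorm are read from the listing)."""
    return boost * idf(doc_freq, max_docs) * query_norm


def term_score(boost: float, query_norm: float, freq: float,
               doc_freq: int, max_docs: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    return (query_weight(boost, doc_freq, max_docs, query_norm)
            * field_weight(freq, doc_freq, max_docs, field_norm))


def coord(overlap: int, max_overlap: int) -> float:
    """Coordination factor: fraction of query terms that matched."""
    return overlap / max_overlap


if __name__ == "__main__":
    # Author match, entry 1: only a fieldWeight is involved (listing: 5.620886).
    print(field_weight(freq=1.0, doc_freq=14, max_docs=44421, field_norm=0.625))

    # Content match, entry 1, term "websites" (listing: 0.10858674).
    print(term_score(boost=2.2934504, query_norm=0.018505331, freq=1.0,
                     doc_freq=200, max_docs=44421, field_norm=0.0625))

    # Final score of content entry 1: coord(7/25) times the sum of term scores
    # (listing: 0.4084891 summed, 0.11437695 after coord).
    print(coord(7, 25) * 0.4084891)
```

Read this way, the author-based scores differ only through fieldNorm (0.625, 0.5, 0.375), since tf and idf are identical for every "shapira" match. The content-based scores are additionally scaled by coord, which penalizes documents that match few of the 25 query terms.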