Web Collection Development Guidelines


The University of Texas at San Antonio Special Collections Department began partnering with the Internet Archive's Archive-It Program in September 2009 to preserve web content that is of enduring value to both South Texas and the University of Texas at San Antonio. The Archive-It Program allows UTSA's Special Collections Department to capture relevant web content and ensure its long-term access through the Internet Archive's website. The Archive-It Program selectively crawls either web domains or individual web pages, takes a snapshot of each page, and stores a copy in the Internet Archive. The archived content is then made publicly accessible from both the University of Texas at San Antonio Special Collections' homepage and the Archive-It partner page.


The University of Texas at San Antonio Special Collections Department identifies either a web domain (e.g. http://www.utsa.edu) or a particular web site (e.g. http://www.utsa.edu) to archive, and then periodically makes a copy of the web site to ensure long-term access to its content. Depending on how frequently the web content changes, the web crawler may be set to crawl a particular web site twice daily, daily, weekly, monthly, quarterly, semi-annually, annually, or a single time. Since most web sites either self-archive a majority of their content or change their content only periodically, UTSA crawls most web sites on either a semi-annual or annual basis. After the web site or web domain is crawled by Archive-It, the web content is described and indexed by Special Collections staff and made available on the Archive-It web site.
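The crawl frequencies described above can be illustrated with a short Python sketch that works out when a site would next be due for a crawl. This is only an illustration under stated assumptions, not how Archive-It itself schedules crawls: the frequency labels, the INTERVALS table, and the function names are all hypothetical.

```python
import calendar
from datetime import date, timedelta

# Hypothetical labels for the crawl frequencies listed above, mapped to
# (months, days) intervals. "twice daily" is left out because it needs
# time-of-day resolution; "once" is handled separately below.
INTERVALS = {
    "daily": (0, 1),
    "weekly": (0, 7),
    "monthly": (1, 0),
    "quarterly": (3, 0),
    "semi_annual": (6, 0),
    "annual": (12, 0),
}


def add_months(start, months):
    """Return the date `months` later, clamped to the end of the month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_crawl(last_crawl, frequency):
    """Return the next due date, or None for a one-time crawl."""
    if frequency == "once":
        return None
    months, days = INTERVALS[frequency]
    return add_months(last_crawl, months) + timedelta(days=days)


# A site crawled on 1 September 2009 on a semi-annual schedule.
print(next_crawl(date(2009, 9, 1), "semi_annual"))
```

Under these assumptions, a site on the semi-annual schedule is revisited six calendar months after its last crawl, and a one-time crawl has no follow-up date.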

Types of Web Content Collected:

  • Official University of Texas at San Antonio web sites
  • University-affiliated web sites (student organization web sites, Facebook, Twitter, Flickr pages created by organizations at the University of Texas at San Antonio)
  • Web sites relevant to Texas and the Southwest (Border Studies, Gender Studies, South Texas and San Antonio History)
  • Web pages (HTML), photographs (JPEGs and TIFFs), embedded video (MPEGs), embedded audio (WAV, MP3), and PDFs
  • Publicly available web content that is not password protected

Types of Web Content Not Collected:

  • Web sites created by individual students
  • Password-protected sites
  • Databases
  • Calendars
  • JavaScript
  • Streaming audio/video
  • YouTube videos
  • Web sites that have robots.txt exclusion requests
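As a rough illustration of how two of the rules above could be checked before a URL is nominated for crawling, the following Python sketch tests a URL against a site's robots.txt rules and against the file types listed as collected. It is a minimal sketch, not the Archive-It crawler's actual logic: the function name, the "example-crawler" user agent, and the extension list (including the choice to treat extensionless paths as HTML pages) are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Assumed mapping of the collected content types to file extensions.
# The empty string covers paths with no extension, treated here as pages.
COLLECTED_EXTENSIONS = {
    "", ".html", ".htm",            # web pages
    ".jpg", ".jpeg", ".tif", ".tiff",  # photographs
    ".mpg", ".mpeg",                # embedded video
    ".wav", ".mp3",                 # embedded audio
    ".pdf",
}


def is_collectable(url, robots_lines, user_agent="example-crawler"):
    """Return True if robots.txt allows the URL and its type is collected.

    `robots_lines` is the site's robots.txt content as a list of lines,
    passed in directly so the check needs no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(user_agent, url):
        return False  # the site has asked crawlers to stay out
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return extension in COLLECTED_EXTENSIONS


# Hypothetical robots.txt rules, used only for illustration.
robots = ["User-agent: *", "Disallow: /private/"]
print(is_collectable("http://www.utsa.edu/index.html", robots))
print(is_collectable("http://www.utsa.edu/private/report.pdf", robots))
```

The other exclusions in the list, such as password-protected sites, databases, and streaming media, cannot be detected from the URL alone and would need a separate review.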


UTSA does not claim copyright to any of the materials within the archive. It is the sole responsibility of the user to determine the copyright status of archival collections before publishing materials.

Frequently Asked Questions (FAQs)