Planning and Managing Digital Library & Archive Projects
Metropolitan New York Library Council ~ March 23, 2011 Dr. Anthony Cocciolo ~ Assistant Professor Pratt Institute ~ School of Information and Library Science
Workshop Schedule
  • 10am – 1pm: Introduction & Workshop Overview; Developing a Strategy for Success; Managing Digital Assets: Born-digital and conversion
  • 1pm – 2pm: Lunch!
  • 2pm – 4pm: Creating an Infrastructure: Technical, Organizational and Resources; Evaluating your Project
 
 
 
 
 
 
 
 
 
 
What is a Digital Library?
A focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection. (Witten, Bainbridge and Nichols, 2010)
 
 
 
Digital Archives
 
 
 
 
 
 
 
Geostoryteller
 
 
 
 
 
Introductions
  • Name
  • What are you currently up to? (Student; working as a librarian, archivist, etc. at X institution; looking for work)
  • Why are you interested in this class? (Starting a digital library, my boss made me, etc.)
Planning & Managing  Digital Library & Archive Projects Developing a Strategy for Success
 
Digital Libraries and Archives are  Socio-technical systems.
 
 
 
 
 
 
 
 
Setting an agenda for a Digital Library/Archive Project
  • Trends in information use
      • If it’s not easy to get at...
      • Social media, social nature of information
  • Community needs assessment
      • Survey: make it representative
      • Focus groups, interviews
      • Problems with…
  • Use your institution's creativity; hold a design event.
Sample Size Calculator
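The calculator shown on this slide is not reproduced in the transcript. As a minimal sketch, assuming it follows the standard formula for estimating a proportion (Cochran's formula with a finite population correction), a representative survey sample size could be estimated as follows. The function name, defaults, and population figures are illustrative assumptions, not taken from the workshop.

```python
import math

# z-scores of the standard normal distribution for common confidence levels
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def sample_size(population, margin_of_error=0.05, confidence=0.95, proportion=0.5):
    """Estimate how many survey responses are needed (hypothetical helper).

    proportion=0.5 is the most conservative choice: it gives the largest sample.
    """
    z = Z_SCORES[confidence]
    # Cochran's formula for a very large population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction for a community of known size
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Example: a campus community of 2,000 people (made-up figure)
    print(sample_size(2000))
```

Note that the required sample grows slowly with population size, so a small institution still needs a fairly large share of its community to respond.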
Design Event
  • Have someone (or several people) facilitate the event and be responsible for moving it forward.
  • Schedule a 2.5 to 4 hour event, with a working lunch in the middle.
  • Assemble various stakeholders from across the institution.
  • Provide background information.
  • Divide into groups with members of diverse backgrounds.
  • Run an icebreaker activity and warm-up activities (looking at good and bad digital libraries with targeted questions), then design the digital library user experience using simple materials (markers, etc.).
  • Present out to the group as a whole.
 
 
 
 
[Wireframe: PocketKnowledge (Teachers College, Columbia University) community page for “Money Class,” showing search; browsing by communities, tags, authors, and uploaders; pocket actions (my pocket, add to pocket, create community pocket, browse all pockets); view, sort, and role filters; sub-communities with item counts and “Intersect with” links; comments; RSS; and a document listing.]
 
 
 
 
A good strategy should…
  • Be focused on your users and how it will benefit them.
  • Focusing on the needs of the collection, divorced from this factor, could lead you to a product with no users.
  • Grant funders: the worst thing is to create something that just sits there (no impact, low use).
  • How will this digital project impact your community?
On Strategy
  • What will community members learn from this project?
  • How will you know if they have learned something from your project?
  • Why would someone be intrinsically motivated to use your digital library?
  • How will your project advance specific learning outcomes (class goals) or more general learning outcomes (critical thinking, literacies)?
Talking Strategy
  • Get into groups of 4.
  • Pick a digital project you have worked on or are hoping to start working on.
  • What is your strategy for success?
  • Who is your community? How will it impact your community?
  • What will individuals learn from using it?
  • Why is it an important project?
  • Why do you think your strategy is a good one?
  • How will you know if it is successful?
Planning & Managing  Digital Library & Archive Projects Managing Digital Assets: Born-digital and conversion
Living in a hybrid world
Two paradigms:
  • Digitizing artifacts paradigm
      • History / old stuff
      • Finite: not something that will go on forever (although to some degree we will always discover old objects; archaeology)
  • Capturing digital material paradigm
  • Bizarre middle ground
 
 
Born digital
  • Does the person own the material they are giving to you?
  • Is it copyrighted? How about Creative Commons licensing?
  • Terms of use: what will the creator allow you to do with it?
  • Formats: do you have the best copy?
  • Who will create metadata for it?
Digital Conversion
Can you digitize? Who can you make that digitization available to?
  • Legal
      • Preservation: if it is falling apart (e.g., audio, film)
      • Public domain: copyright generally lasts for the life of the author + 70 years, after which the work enters the public domain
      • International publication
      • Only make available to your community
      • DMCA
  • Litigious persons: Dance Project
  • Ethical: LHA project
Making Digital Images
  • Create digital masters
      • Can create a variety of derivatives from the master for access needs
  • What scanning settings to choose?
      • Use the Cornell approach (using Quality Index)
      • Choose an already developed standard for the type of visual media
 
 
Bitonal: ppi = 3 × QI / (0.039 × h)
Color/Gray: ppi = 2 × QI / (0.039 × h)
QI: barely legible (3.0), marginal (3.6), good (5.0), and excellent (8.0); h is the height in mm of the smallest detail
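The Quality Index formulas above can be turned into a small calculator. This is a sketch only: the function name and the example detail height are assumptions for illustration, while the constants (3 for bitonal, 2 for color/gray, 0.039 to convert millimeters to inches) and the QI benchmarks come from the slide.

```python
# Quality Index benchmarks quoted on the slide
QI_LEVELS = {
    "barely legible": 3.0,
    "marginal": 3.6,
    "good": 5.0,
    "excellent": 8.0,
}


def required_ppi(qi, detail_height_mm, bitonal=True):
    """Scanning resolution (pixels per inch) needed to reach a target QI.

    detail_height_mm is h, the height in mm of the smallest detail
    that must stay legible (for text, typically a lowercase letter).
    """
    if detail_height_mm <= 0:
        raise ValueError("detail height must be positive")
    factor = 3 if bitonal else 2  # 3 for bitonal, 2 for color/gray
    return factor * qi / (0.039 * detail_height_mm)


if __name__ == "__main__":
    # Hypothetical document whose smallest character is 1 mm high
    for label, qi in QI_LEVELS.items():
        bitonal = required_ppi(qi, 1.0, bitonal=True)
        gray = required_ppi(qi, 1.0, bitonal=False)
        print(f"{label}: bitonal {bitonal:.0f} ppi, color/gray {gray:.0f} ppi")
```

Smaller details push the required resolution up quickly, which is why the smallest significant detail, not the page size, drives the scanning settings.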
 
 
 
 
 
 
Some problems
  • Would not be a problem if this were a derivative of a digital master.
  • Uses the Arial font, which was not invented until 1982 (the document is from 1906).
  • Lost page numbers.
  • Headers and footers? These usually include a bit of citation information.
  • Formatting is not faithful to the original.
  • Other info? Advertisements?
  • Loses any traces of how this was bound as a book (the context in which it was used).
  • Makes you start to question the authenticity, especially if the PDF gets disconnected from the rest of the collection (e.g., this PDF was “discovered”).
  • Would a historian want to use this?
  • Human error and computer error in changing the image to digital text.
  • CS way of thinking: but all the data is there!
 
 
 
Digitizing Audio
  • The minimum: 44.1 kHz, 16-bit, stereo (2-channel)
  • More info in the Sound Directions book (web reference)
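For storage planning, the minimum settings above translate directly into file sizes. The sketch below assumes uncompressed PCM audio and ignores the small file header; the function name and the one-hour example are illustrative assumptions.

```python
def pcm_bytes(duration_seconds, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Approximate size in bytes of uncompressed PCM audio (header ignored)."""
    bytes_per_sample = bit_depth // 8
    return int(duration_seconds * sample_rate_hz * bytes_per_sample * channels)


if __name__ == "__main__":
    one_hour = pcm_bytes(60 * 60)
    # Size of one hour of audio at the minimum settings, in megabytes (10**6 bytes)
    print(f"{one_hour / 1_000_000:.0f} MB per hour")
```

Raising the sample rate or bit depth scales the size proportionally, which is worth knowing before choosing settings above the minimum.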
Metadata
DACS → EAD, MARC, other output formats
 
Dublin Core
  1. TITLE
  2. CREATOR
  3. SUBJECT
  4. DESCRIPTION
  5. PUBLISHER
  6. CONTRIBUTORS
  7. DATE
  8. TYPE
  9. FORMAT
  10. IDENTIFIER
  11. SOURCE
  12. LANGUAGE
  13. RELATION
  14. COVERAGE
  15. RIGHTS MANAGEMENT
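As an illustration of how these fifteen elements look in practice, here is a minimal sketch that serializes a record as XML using the Dublin Core Metadata Element Set 1.1 namespace. In that element set the slide's CONTRIBUTORS and RIGHTS MANAGEMENT are named `contributor` and `rights`. The wrapper element `record`, the helper name, and all sample values are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen element names as defined in DCMES 1.1
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_record(fields):
    """Build an XML string from a dict of Dublin Core element -> value."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Made-up sample item
example = dublin_core_record({
    "title": "Oral history interview",
    "creator": "Firstname Lastname",
    "date": "2011-03-23",
    "format": "audio/wav",
    "language": "en",
})

if __name__ == "__main__":
    print(example)
```

Rejecting unknown element names keeps records consistent, which matters when several people are creating metadata.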
Computer-generated metadata
  • Determining the language of a digital document is very accurate (99+% correct)
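To give a feel for why language is an easy property to generate automatically, here is a deliberately simple sketch that guesses a language by counting common function words. It is a toy heuristic with tiny, hand-picked word lists (an assumption for illustration), not the method behind the accuracy figure above; production tools typically rely on much richer statistical models.

```python
import re

# Tiny illustrative stopword lists; real tools use far larger models
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es"},
    "fr": {"le", "la", "de", "et", "les", "des", "est", "que"},
}


def guess_language(text):
    """Return the language code whose stopwords appear most often, or None."""
    # Split into runs of letters (Unicode-aware, so accented words stay whole)
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("The history of the library is in the archive"))
```

The guessed code could then populate the LANGUAGE element of a metadata record, with a person reviewing only the items where no guess is returned.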
Most Digital Libraries are run on a CMS
  • A CMS (content management system) is the user interface for the database management system (like MySQL), making the database user-friendly and appropriate for the website’s function.
  • Usually has a public side and a staff side, with varying degrees of control over the CMS.
  • YouTube is a big CMS.
  • A CMS runs on one or more servers.
Server
  • Running an OS, such as Linux, Mac OS X Server, or Windows Server 2008. Dif.
  • Database server: like MySQL, Oracle
  • Content management system: like Omeka, DSpace
  • File system: containing digital files (.wav, .pdf, etc.)
  • Switches and routers, connected to Internet Service Providers or other Wide Area Networks, academic networks
  • Internet (same thing as the other blob below)
CMS Infrastructure: LAMP
  • Linux: the operating system; like Windows or Mac OS X, except good for web servers
  • Apache: the web server; responds to HTTP requests
      • The Microsoft equivalent is IIS (Internet Information Services).
      • Apache is run mostly on Linux and Mac servers, and occasionally on Windows.
  • MySQL: the relational database management system
  • PHP: the programming language that the CMS is written in
  • Contrast with WAMP; server vs. personal computer
 
 
Outsourcing
  • Create a detailed projected timeline
      • Know what date you can expect each deliverable.
      • Don’t let the timeline slip; hold the vendor accountable for it, and ask for discounts if the vendor slips from the timeline.
  • Create a detailed budget
      • Itemize each component
Handout example
 
Planning & Managing  Digital Library & Archive Projects Creating an infrastructure: Technical, Organizational & Resource
Hollywood
  • Fewer than half of the feature films made before 1950 have survived.
  • Fewer than 20% survive from the 1920s.
 
One of the biggest movies of 1954. Nominated for 6 Academy Awards, winner of 2. Winner of 2 Golden Globes.
Archival Masters
With the advent of TV and the ability to re-broadcast movies on TV, followed by the advent of VHS players, Hollywood began to realize that there was a monetary incentive to keep archival masters so the film could be reproduced onto different media (TV, VHS tape, DVD).
 
Film Preservation
  • “Film in the Freezer,” “Store and Ignore”
  • Private vaults
 
 
Long-term access
  • Hollywood: wants to ensure archival masters last for at least 100 years.
  • Most libraries and archives strive for something like “eternal” access.
 
Challenge
  • There is no hardware or software that can ensure long-term access alone; the media will break down in anywhere from 5 to 10 years.
  • “Store and ignore,” while concentrating on environmental conditions (like humidity and temperature), will not work.
  • For example, magnetic hard drives cannot be stored on a shelf for long periods of time. This is because the internal lubrication will be affected by “stiction,” where internal components lock up.
  • Magnetic hard drives should be powered on and spinning, but they still have a limited operational lifetime.
 
Doing Digital Preservation
  • Permanence in the digital sense means an ongoing and systematic preservation process; an active management approach is required.
  • It is more like maintaining a car than putting a book on a shelf.
Implications (1)
  • That means that the data will be migrated on a schedule.
  • Factor migration time (labor) and costs into budgets and strategic plans.
  • Should be talking in terms of $/TB/year.
  • Labor and electricity costs should be factored in, not just media costs.
  • Should be including backups and the other multiple copies you will be making.
  • Example last week was misleading; must always factor in time.
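The $/TB/year framing above can be sketched as a simple calculation that includes copies, labor, and electricity rather than media alone. The cost model, parameter names, and every dollar figure below are hypothetical assumptions for illustration; substitute your institution's own numbers.

```python
def cost_per_tb_per_year(
    tb_stored,
    copies,
    media_cost_per_tb,
    media_lifetime_years,
    labor_hours_per_year,
    labor_rate,
    electricity_per_year,
):
    """Annual cost of keeping one terabyte of content, in dollars (a sketch)."""
    # Media is replaced (migrated) once per lifetime, for every copy kept
    media_per_year = tb_stored * copies * media_cost_per_tb / media_lifetime_years
    # Staff time spent on migration, monitoring, and backups
    labor_per_year = labor_hours_per_year * labor_rate
    total_per_year = media_per_year + labor_per_year + electricity_per_year
    return total_per_year / tb_stored


# Made-up example: 10 TB of content kept as 3 copies
example_cost = cost_per_tb_per_year(
    tb_stored=10,
    copies=3,
    media_cost_per_tb=100,
    media_lifetime_years=5,
    labor_hours_per_year=40,
    labor_rate=50,
    electricity_per_year=400,
)

if __name__ == "__main__":
    print(f"${example_cost:.2f} per TB per year")
```

In this made-up example, labor dominates the total, which is the point of the slide: media prices alone understate the cost of preservation.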
Implications (2)
  • Media (CDs, DVDs, Blu-rays, Gold DVDs) or hard drives on a shelf or under a desk are not a good digital archive strategy. If you see this, know that it is bad practice, and work to change it.
  • A (Trusted) Digital Repository that is (almost) always powered, redundant, and backed up is the best strategy.
Implications (3)
  • Heavy use is one of the best defenses against digital loss. Patrons will notice if something is amiss.
  • This is the complete opposite of physical preservation.
Managing Digital Content
  • Physical media is almost never an appropriate digital preservation strategy.
  • Most commercial sites aren’t either.
 
 
 
 
Trusted Digital Repository
You can make your own Trusted Digital Repository or join a group that has one.
Organizational Infrastructure
  • Policy framework
  • Mission statement
  • Financial sustainability/framework (Columbia example)
  • Organizational viability
  • Have a succession plan
Technology
  • Redundant hard disks
  • Backup, move to offsite, security
  • Physical security, staff w/security
  • Physical environment (air conditioning, above 80 deg F, redundant)
  • Electricity (UPS, backup generator, surge, voltage regulator); power is always on.
  • Piggyback on what IT is already doing, if they are running an enterprise records management system (e.g., Banner, PeopleSoft, Datatel).
 
 
Planning & Managing Digital Library & Archive Projects Evaluating your Project
On Evaluating
  • Evaluation is usually started after something has been completed or has had time to be used.
  • Used to inform decisions (replication, discontinuation, refinements, more investment, etc.)
  • An alternative is to do mini-evaluations with the user community as you develop. This can be a challenge if you don’t have a user community yet (e.g., have your mom try it out).
  • Evaluation is not the same as usability.
Evaluation Methods
  • Quantitative: analysis of numerical data (surveys, logs)
      • Criticized for not getting at what people really think
  • Qualitative: analysis of words (e.g., interview transcripts), pictures, objects
      • Criticized for being biased, not representative
  • Mixed methods: depending on the decisions you are trying to make, you may want to triangulate (use multiple methods to get at what you are looking for).
      • Example: survey, focus groups, and transaction log analysis.
      • Of course, the ability to do all of that is subject to budget and time constraints.
Sampling
  • Whichever method you use, sampling is important.
  • Get a representative sample that accurately represents the entire population.
  • Sampling is not important where you capture 100% of the data, such as in transaction log analysis.
Qualitative Methods
  • You can reduce interpretive bias by using formal qualitative data analysis methods.
  • Use independent coders of transcripts to see the extent to which your interpretations coincide.
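One common way to quantify how far two independent coders' interpretations coincide is percent agreement together with Cohen's kappa, which corrects for agreement expected by chance. The slides do not name a specific statistic, so treat this as one possible approach; the code labels and the ten coded segments are made up for illustration.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of transcript segments given the same code by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of segments")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Probability that both coders pick the same code purely by chance
    expected = sum(
        counts_a[code] * counts_b[code] for code in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to ten interview segments
coder_a = ["access"] * 5 + ["cost"] * 5
coder_b = ["access"] * 4 + ["cost"] * 5 + ["access"]

if __name__ == "__main__":
    print(f"agreement: {percent_agreement(coder_a, coder_b):.2f}")
    print(f"kappa: {cohens_kappa(coder_a, coder_b):.2f}")
```

Low agreement is a signal to refine the code definitions and recode, rather than a reason to discard the qualitative data.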
Compare alongside past projects
Thank you. Anthony Cocciolo [email_address]


Editor's Notes

  • #8 Transforming a very old and venerable academic library
  • #9 - Packing it up
  • #10 Into something more interactive, collaborative, and engaging for the students, faculty and staff at the college. If you’ve been there recently you will know what I mean. A big part of it is moving large swaths of functions to digital.
  • #13 Merged institutional repository, digital library, digital archive. Social archive: institutional repository for current work of faculty, staff, students. Web 2.0 design patterns. Digitization of archives for prior work; be able to distinguish self-archiving from library archiving efforts.
  • #67 Stroke
  • #113 Does not deplete the more it gets used