mfioretti: archival*

Bookmarks on this page are managed by an admin user.

21 bookmark(s)

  1. What about the actual functioning of the application: What tweets are displayed to whom in what order? Every major social-networking service uses opaque algorithms to shape what data people see. Why does Facebook show you this story and not that one? No one knows, possibly not even the company’s engineers. Outsiders know basically nothing about the specific choices these algorithms make. Journalists and scholars have built up some inferences about the general features of these systems, but our understanding is severely limited. So, even if the LOC has the database of tweets, they still wouldn’t have Twitter.

    In a new paper, “Stewardship in the ‘Age of Algorithms,’” Clifford Lynch, the director of the Coalition for Networked Information, argues that the paradigm for preserving digital artifacts is not up to the challenge of preserving what happens on social networks.

    Over the last 40 years, archivists have begun to gather more digital objects—web pages, PDFs, databases, all kinds of software. There is more data about more people than ever before. However, the cultural institutions dedicated to preserving the memory of what it was to be alive in our time, including our hours on the internet, may actually be capturing less usable information than in previous eras.

    “We always used to think for historians working 100 years from now: We need to preserve the bits (the files) and emulate the computing environment to show what people saw a hundred years ago,” said Dan Cohen, a professor at Northeastern University and the former head of the Digital Public Library of America. “Save the HTML and save what a browser was and what Windows 98 was and what an Intel chip was. That was the model for preservation for a decade or more.”

    Which makes sense: If you want to understand how WordPerfect, an old word processor, functioned, then you just need that software and some way of running it.

    But if you want to document the experience of using Facebook five years ago or even two weeks ago ... how do you do it?

    The truth is, right now, you can’t. No one (outside Facebook, at least) has preserved the functioning of the application. And worse, there is no thing that can be squirreled away for future historians to figure out. “The existing models and conceptual frameworks of preserving some kind of ‘canonical’ digital artifacts are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances,” Lynch writes.

    Nick Seaver of Tufts University, a researcher in the emerging field of “algorithm studies,” wrote a broader summary of the issues with trying to figure out what is happening on the internet. He ticks off the problems of trying to pin down—or in our case, archive—how these web services work. One, they’re always testing out new versions. So there isn’t one Google or one Bing, but “10 million different permutations of Bing.” Two, as a result of that testing and their own internal decision-making, “You can’t log into the same Facebook twice.” It’s constantly changing in big and small ways. Three, the number of inputs and complex interactions between them simply makes these large-scale systems very difficult to understand, even if we have access to outputs and some knowledge of inputs.

    “What we recognize or ‘discover’ when critically approaching algorithms from the outside is often partial, temporary, and contingent,” Seaver concludes.

    The world as we experience it seems to be growing more opaque. More of life now takes place on digital platforms that are different for everyone, closed to inspection, and massively technically complex. What we don't know about our own experience now means that historians of the future will know even less. Maybe this era will be a new dark age, as resistant to analysis then as it has become now.

    If we do want our era to be legible to future generations, our “memory organizations”, as Lynch calls them, must take radical steps to probe and document social networks like Facebook. Lynch suggests creating persistent, socially embedded bots that exist to capture a realistic and demographically broad set of experiences on these platforms. Alternatively, archivists could recruit actual humans to opt in to having their experiences recorded, as ProPublica has done with political advertising on Facebook.
  2. Greer’s archive includes floppy disks, tape cassettes and CD-ROMs, once cutting-edge technologies that are now obsolete. They are vulnerable to decay and disintegration, leftovers from the unrelenting tide of technological advancement. They will last mere decades, unlike the paper records, which could survive for hundreds of years.

    Buchanan and her team are now working out how to access, catalogue and preserve the thousands of files on these disks, some of them last opened in the 1980s. “We don’t really know what’s going to unfold,” Buchanan says.

    The Greer archivists are facing a challenge that extends far beyond the scope of their collection. Out of this process come enormous questions about the fate of records that are “born digital”, meaning they didn’t start out in paper form. Record-keepers around the world are worried about information born of zeroes and ones – binary code, the building blocks of any digital file.

    “Archives are the paydirt of history. Everything else is opinion” – Germaine Greer

    Like floppy disks of the past, information stored on USB sticks, on shared drives or in the cloud is so easily lost, changed or corrupted that we risk losing decades of knowledge if we do not figure out how to manage it properly.

    Though the problem applies to everyone – from classic video-game enthusiasts to people who keep photos on smartphones – it is particularly pressing for universities and other institutions responsible for the creation and preservation of knowledge.
  3. Getting the codec parameters right

    VHS is analogue and exhibits quite a lot of noise, and noise is not handled well by modern compressors/codecs like Xvid, H.264, etc. Of course, you can store the v4l output uncompressed, raw, but prepare yourself for roughly 60-100GB of data for an hour of video. Thus we need to figure out a way to compress the digitised video while keeping the original quality as good as possible. Capturing TV signals in real time is quite a challenge in terms of CPU load, so all capture/codec settings are a trade-off between compression quality and CPU speed.
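    That 60-100GB estimate is easy to sanity-check. A rough calculation (my assumptions: the 720x576 PAL frame at 25 fps used in the commands in this post, and 2 bytes per pixel for packed YUV 4:2:2, a common raw capture format) lands inside that range:

```python
# Rough uncompressed data rate for PAL capture.
# Assumed parameters: 720x576 frame, 25 fps, 2 bytes/pixel (packed YUV 4:2:2).
width, height, fps = 720, 576, 25
bytes_per_pixel = 2

bytes_per_second = width * height * bytes_per_pixel * fps
gb_per_hour = bytes_per_second * 3600 / 1e9

print(round(gb_per_hour, 1))  # 74.6 -> roughly 75 GB per hour of video
```

    A different pixel format (e.g. 12-bit YUV 4:2:0) would shift the figure, which is why the estimate is quoted as a range.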

    If you have a big hard drive, or want to capture only a short segment of video, I would suggest a two-step setup for best results and fewer problems on older systems. First, capture uncompressed to a temp file, then run mencoder with a compression/codec combo that your machine normally couldn’t handle in real time. CLI commands for that, to get you started:

    mencoder tv:// -tv channel=0:driver=v4l2:device=/dev/video0:normid=5:input=0:width=720:height=576:norm=PAL:fps=25:alsa:adevice=hw.1:forceaudio:brightness=0:contrast=0:hue=0:saturation=0:buffersize=128 -oac pcm -ovc copy -endpos 00:15:00 -o VHS1raw.avi

    This is for the “first pass”, a raw copy of the audio and video input. Note the “hw.1” part: it is the ID of the stk1160’s audio device, as reported by cat /proc/asound/cards. It may change between boots/USB reconnects, depending on the other sound cards in your system. Next is the buffersize: normally mencoder should adjust the buffer automatically, but forcing a value here seems to do no harm. Instead of -oac copy I use -oac pcm, which is more or less equivalent, but I once got a strange error about frame sizes with “copy” and never saw it again with “pcm”, which seems to mux the stk1160’s 16-bit little-endian output more cleanly. -endpos tells mencoder to stop after 15 minutes, which is all we want to grab here. The command should give you a CPU usage in the low single digits, and quite a lot of work for your hard drive.
    After that, it’s time for a “second pass”. This is not a second pass in the sense the term is used in video compression; it is simply a second command, the real transcode pass:

    mencoder -ovc xvid -xvidencopts fixed_quant=4 -oac mp3lame -lameopts cbr:br=128 -ofps 25 -o VHS1.avi VHS1raw.avi

    This second command is the real one. We encode to Xvid here, with standard settings and a “quality target” of 4, which means a variable bitrate that aims for a certain quality level. In my visual comparisons, 4 came out as a very good setting, with video bitrates around 2500-3500 kbit/s. Later on, we’ll see that 5 is a bit faster to compress and still OK. For the audio, the command uses MP3 at a constant bitrate of 128 kbit/s. We could downsample the original 48 kHz raw audio to 44.1 kHz to shave off some more bits, but a variable bitrate might be a more useful way to achieve that.
    On a low-range Intel Core2Duo, I get transcoding framerates between 12-14 fps, so transcoding these 15 minutes would take half an hour.
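    The half-hour figure follows directly from the frame count: 15 minutes of 25 fps video transcoded at roughly half real-time speed takes roughly twice as long as the capture. A quick sketch (12.5 fps is my midpoint of the observed 12-14 fps):

```python
# Estimate transcode wall-clock time from capture length and encoder speed.
capture_minutes = 15
capture_fps = 25      # PAL frame rate
encode_fps = 12.5     # assumed midpoint of the observed 12-14 fps

total_frames = capture_minutes * 60 * capture_fps  # 22500 frames
encode_minutes = total_frames / encode_fps / 60

print(encode_minutes)  # 30.0 -> about twice real time
```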

    Note that the grabbed video will have square pixels while the original TV/VHS video has non-square pixels. So adjust your player to present the video with an aspect ratio of 4:3. Otherwise the video played back on a desktop computer will be slightly compressed horizontally. Sadly, embedding this info into the file so that players adjust this automatically doesn’t work reliably.
  4. Digital evidence storage for legal matters is a common practice. As the use of Solid State Drives (SSD) in consumer and enterprise computers has increased, so too has the number of SSDs in storage increased. When most, if not all, of the drives in storage were mechanical, there was little chance of silent data corruption as long as the environment in the storage enclosure maintained reasonable thresholds. The same is not true for SSDs.

    A stored SSD, without power, can start to lose data in as little as a single week on the shelf.

    SSDs have a shelf life. They need consistent access to a power source in order for them to not lose data over time. There are a number of factors that influence the non-powered retention period that an SSD has before potential data loss. These factors include amount of use the drive has already experienced, the temperature of the storage environment, and the materials that comprise the memory chips in the drive.

    The Joint Electron Device Engineering Council (JEDEC) defines standards for the microelectronics industry, including standards for SSDs. One of those standards is an endurance rating; part of that rating is the requirement that an SSD retain data with the power off for a set period, which depends on its application class.

    For client application SSDs, the powered-off retention period standard is one year while enterprise application SSDs have a powered-off retention period of three months. These retention periods can vary greatly depending on the temperature of the storage area that houses SSDs.

    In a presentation by Alvin Cox on JEDEC's website titled "JEDEC SSD Specifications Explained" (PDF), graphs on slide 27 show that for every 5 degrees C (9 degrees F) rise in the temperature at which an SSD is stored, the retention period is approximately halved. For example, a client application SSD stored at 25 degrees C (77 degrees F) should last about 2 years on the shelf under optimal conditions. If that temperature goes up 5 degrees C, the storage standard drops to 1 year.

    The standards change dramatically when you consider JEDEC's standards for enterprise class drives. The storage standard for this class of drive at the same operating temperature as the consumer class drive drops from 2 years under optimal conditions to 20 weeks. Five degrees of temperature rise in the storage environment drops the data retention period to 10 weeks. Overall, JEDEC lists a 3-month period of data retention as the standard for enterprise class drives.
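    The halving rule is easy to model. A small sketch, using the baselines quoted above (roughly 2 years, i.e. 104 weeks, for client drives and 20 weeks for enterprise drives at 25 degrees C; the exponential form is my assumption of how "halved per 5 degrees C" extrapolates):

```python
def retention_weeks(base_weeks, storage_temp_c, base_temp_c=25.0):
    """Powered-off retention, halving for every 5 degrees C above the baseline."""
    return base_weeks * 0.5 ** ((storage_temp_c - base_temp_c) / 5.0)

# Client-class drive: ~2 years (104 weeks) at 25 C, ~1 year at 30 C.
print(retention_weeks(104, 25))  # 104.0
print(retention_weeks(104, 30))  # 52.0

# Enterprise-class drive: 20 weeks at 25 C, 10 weeks at 30 C.
print(retention_weeks(20, 30))   # 10.0
```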
  5. "We save it as a picture as it's longer life than a file. You don't rely on PowerPoint or Word. In 50 years they can still just look at it,"
  6. One of the most important decisions you face when scanning anything with your scanner is choosing what dpi (“dots per inch”) to scan with. And specifically for this post, what is the best dpi to use when scanning and archiving your 8×10″ and smaller paper photographic prints – which for most people, make up the majority of our pre-digital collection.

    Making this decision was very challenging for me and certainly a huge part of my 8-year delay. The reason is that dpi is the critical variable in a fairly simple mathematical equation that determines several important outcomes for your digital images:
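    The equation in question is presumably the basic pixel-dimension relation: output pixels = print inches × dpi, from which file size and maximum reprint size follow. A sketch with example numbers of my own choosing:

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel dimensions and megapixel count for a print scanned at a given dpi."""
    w_px = round(width_in * dpi)
    h_px = round(height_in * dpi)
    return w_px, h_px, w_px * h_px / 1e6

# An 8x10 inch print at 600 dpi:
print(scan_dimensions(8, 10, 600))  # (4800, 6000, 28.8) -> a 28.8-megapixel image
```

    Doubling the dpi quadruples the pixel count, which is why this one variable dominates storage and quality outcomes.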
  7. That is a great question. And you’re right, up until now I have not covered what I feel is the best file format(s) to save scanned photos with. But, as you astutely noticed, I did sort of allude to my personal choice in a couple of my posts, especially in some of the images I used in my 3-part “naming convention” series you brought up, What Everyone Ought to Know When Naming Your Scanned Photos.

    I think your question actually deserves a slightly more complex answer than I could normally get away with. Had you simply asked, “Which do you prefer for scanning photos, the TIFF or PNG format?”, I would feel comfortable quickly answering that, in my humble opinion, the TIFF format is far superior for the purpose of scanning photos. But since you brought up your interest in “archiving” your photographs, I want to elaborate a bit more and explain why our personal goals for scanning need to be considered when making the final decision about which file format to save our master image files in.
  8. If TheKhanly truly made out like a bandit, netting $9.35 per ad per thousand views, and if each listener stuck out all 14 ads, TheKhanly made around $175,000 in two years.

    No matter how much or little he or she generated, in all likelihood TheKhanly, who could not be reached for comment, has made far more money off Follow The Leader than a weed dealer does off an ounce of kush, or a sex worker off a common trick.

    For that matter, uploading an album to a website requires arguably less savvy and effort than dealing drugs or prostituting. TheKhanly theoretically could be making bank off the least taxing form of counterfeiting possible, appealing to a guaranteed audience of dermatologists and schoolteachers and Target clerks who only need to type “korn leader” into a search box.

    Finding this material is made easy by Google’s omnipresence, which brings us back to Google’s mission.

    "Look at Google's name," Steven Levy, author of In The Plex: How Google Thinks, Works, and Shapes Our Lives, told me. "It’s a really big number. Google all along has been about operating on a scale that was tough to imagine before the internet age.”

    While Google’s early competitors like Altavista and Yahoo may have included little perks like collecting news or weather, Google has turned into an aggregator of everything from merchandise prices to metrics for linguistic trends.

    The founder of Network Awesome, Jason Forrest, considers the site’s curatorial effort an antidote to “your Buzzfeeds and Mashables, which get paid to focus on this very lowbrow mainstream."

    The easy access to David Lynch’s television commercials, a compendium of videos from Chicago’s drill scene, and a PBS documentary on Carl Jung, for example, validates Forrest’s claim that the site uses similar mechanisms as those clickbait powerhouses to “supply a never-ending stream of inspirations.” YouTube’s complicity in this stream cannot be overstated, as Network Awesome is, at the end of the day, a mechanism for comprehending the multitudes contained by the archive.

    And the very need for an entity like Network Awesome says a great deal about how YouTube is handling its librarian duties. Searching for Beyonce’s “Single Ladies” several years after it was a hit gives you the sense that YouTube is less like the Library of Congress or Alexandria and more like a hoarder’s house where the plastic plates from the Labor Day barbecue are piled on top of the good china.

    You will find the official "Single Ladies" music video, several “lyric videos” boasting audio of varying quality, smartphone videos of the song performed live, parodies, and a cappella covers. Google and YouTube are perhaps not archiving entities with a mission to preserve, but rather ones that hoard information simply because they can, suffering from what the late Jacques Derrida would call “archive fever.”

    having your music listened to at the same place where people stream fail videos and ‘I like turtles’, it really makes music seem like trash, just junk you click on and forget about.”

    This viewpoint might ring a bit extreme—especially if you’re in the camp that believes the ability to jump from a remix of a girl getting hit with a shovel to a Laurie Spiegel composition is somehow kind of beautiful—but it does raise the question of worth as human creative energy morphs into, simply, a piece of content.
  9. Imagine in the not-too-distant future, your entire genome is on archival storage and accessed by your doctors for critical medical decisions. You'd want that data to be safe from hackers and data corruption, wouldn't you? Oh, and it would need to be error-free and accessible for about a hundred years too. The problem is, we currently don't have the data integrity, security and format migration standards to ensure that, according to Henry Newman at Enterprise Storage Forum. Newman calls for standards groups to add new features like collision-proof hash to archive interfaces and software.

    'It will not be long until your genome is tracked from birth to death. I am sure we do not want to have genome objects hacked or changed via silent corruption, yet this data will need to be kept maybe a hundred or more years through a huge number of technology changes. The big problem with archiving data today is not really the media, though that too is a problem. The big problem is the software that is needed and the standards that do not yet exist to manage and control long-term data
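    Newman's point about collision-proof hashes can be illustrated with a fixity check, a standard archival practice: record a cryptographic digest when an object enters the archive and recompute it on every access to detect silent corruption. A minimal sketch using SHA-256 (my choice of algorithm; the article does not name one):

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Hex digest recorded at ingest time, stored alongside the archived object."""
    return hashlib.sha256(data).hexdigest()

def verify_fixity(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest on access; a mismatch signals silent corruption."""
    return sha256_digest(data) == recorded_digest

archived_object = b"ACGTACGT"  # placeholder stand-in for a stored genome record
digest = sha256_digest(archived_object)

print(verify_fixity(archived_object, digest))  # True: object intact
print(verify_fixity(b"ACGTACGA", digest))      # False: a single changed byte is caught
```

    The open problems Newman describes are less about the hash itself than about standardizing where such digests live and how they survive format migrations over a century.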
  10. Rome - According to Niilo Jääskinen, Advocate General of the Court of Justice of the European Union, a member state may authorize libraries to "digitize, without the consent of the copyright holders, works held in their own collections, in order to make them available at electronic reading points": but only under certain conditions.

    At stake is the interpretation of the European copyright directive EUCD (2001/29/EC), which establishes that member states must grant authors the exclusive right to authorize or prohibit the reproduction and communication to the public of their works, but which also allows them to provide for exceptions and limitations to that right.

    Such exceptions apply, in particular, where the use of a work, such as one held in a library's catalogue, is not subject to purchase or licensing terms, and "where the use has as its purpose communication or making available, to individual members of the public, for the purpose of research or private study, by dedicated terminals on the premises of the establishments in question".


Page 1 of 3 - Online Bookmarks of M. Fioretti: Tags: archival

About - Propulsed by SemanticScuttle