mfioretti: archiving

21 bookmark(s)

  1. What about the actual functioning of the application: What tweets are displayed to whom in what order? Every major social-networking service uses opaque algorithms to shape what data people see. Why does Facebook show you this story and not that one? No one knows, possibly not even the company’s engineers. Outsiders know basically nothing about the specific choices these algorithms make. Journalists and scholars have built up some inferences about the general features of these systems, but our understanding is severely limited. So, even if the Library of Congress has the database of tweets, it still wouldn’t have Twitter.

    In a new paper, “Stewardship in the ‘Age of Algorithms,’” Clifford Lynch, the director of the Coalition for Networked Information, argues that the paradigm for preserving digital artifacts is not up to the challenge of preserving what happens on social networks.

    Over the last 40 years, archivists have begun to gather more digital objects—web pages, PDFs, databases, all kinds of software. Yet even though there is more data about more people than ever before, the cultural institutions dedicated to preserving the memory of what it was to be alive in our time, including our hours on the internet, may actually be capturing less usable information than in previous eras.

    “We always used to think for historians working 100 years from now: We need to preserve the bits (the files) and emulate the computing environment to show what people saw a hundred years ago,” said Dan Cohen, a professor at Northeastern University and the former head of the Digital Public Library of America. “Save the HTML and save what a browser was and what Windows 98 was and what an Intel chip was. That was the model for preservation for a decade or more.”

    Which makes sense: If you want to understand how WordPerfect, an old word processor, functioned, then you just need that software and some way of running it.

    But if you want to document the experience of using Facebook five years ago or even two weeks ago ... how do you do it?

    The truth is, right now, you can’t. No one (outside Facebook, at least) has preserved the functioning of the application. And worse, there is no thing that can be squirreled away for future historians to figure out. “The existing models and conceptual frameworks of preserving some kind of ‘canonical’ digital artifacts are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances,” Lynch writes.

    Nick Seaver of Tufts University, a researcher in the emerging field of “algorithm studies,” wrote a broader summary of the issues with trying to figure out what is happening on the internet. He ticks off the problems of trying to pin down—or in our case, archive—how these web services work. One, they’re always testing out new versions. So there isn’t one Google or one Bing, but “10 million different permutations of Bing.” Two, as a result of that testing and their own internal decision-making, “You can’t log into the same Facebook twice.” It’s constantly changing in big and small ways. Three, the number of inputs and complex interactions between them simply makes these large-scale systems very difficult to understand, even if we have access to outputs and some knowledge of inputs.

    “What we recognize or ‘discover’ when critically approaching algorithms from the outside is often partial, temporary, and contingent,” Seaver concludes.

    The world as we experience it seems to be growing more opaque. More of life now takes place on digital platforms that are different for everyone, closed to inspection, and massively technically complex. What we cannot know about our current experience will echo through time, leaving the historians of the future knowing even less. Maybe this era will be a new dark age, as resistant to analysis then as it has become now.

    If we do want our era to be legible to future generations, our “memory organizations,” as Lynch calls them, must take radical steps to probe and document social networks like Facebook. Lynch suggests creating persistent, socially embedded bots that exist to capture a realistic and demographically broad set of experiences on these platforms. Or, alternatively, archivists could recruit actual humans to opt in to having their experiences recorded, as ProPublica has done with political advertising on Facebook.
    https://www.theatlantic.com/technolog...ans-to-understand-our-internet/547463
  2. Tapes (VHS, Hi8, Etc.)
    Analog-to-digital video converters are the most common tools for digitizing your old tapes. In fact, your DV camera may have conversion capabilities. If you can input composite audio and video or S-Video to your DV camera, you can use it to digitize just about any analog source. All you'll need to get the footage onto your computer is software that can handle a DV stream. Pretty much any video editing software made in the last decade can capture DV, so you won't be hard-pressed to find something: Apple's iMovie and Final Cut Express/Pro (Mac), Windows Movie Maker (Windows), and Kino (Linux) are just a few examples. If you don't have a DV camera, you can also use a TV tuner card with composite input or a DV bridge made specifically for converting analog video. For more information, Videohelp.org covers the analog-to-digital conversion process in greater detail.

    While newer video recording formats tend to avoid or minimize quality degradation through use, VHS tapes do not offer that luxury. When digitizing especially worn-out VHS tapes, you may find that the digital signal cuts out over something as minor as a little jitter, caused by a break in the timecode on the tape. The simplest way around this is a high-quality, professional VHS deck with a time-base corrector, which generates the timecode itself rather than relying on the tape and so prevents the jitter. While these decks were once fairly expensive, you can now find them used online for a reasonable price.

    Most analog video formats can be digitized by utilizing a converter, but in the case of Hi8 tapes you have another option. Sony created Digital8 camcorders that have the ability to digitize Hi8 tapes in-camera and output a DV signal.

    Though the suggestion so far has been to save video in the DV codec, DV is only lightly compressed (roughly 13 GB per hour of footage), so a large collection will demand a significant amount of disk space. If you're comfortable with more aggressive compression, encoding your newly digitized videos in MPEG-4 or H.264 will save a significant amount of space. Encoding at a video data rate of around 2 Mbps and an audio data rate of 192 kbps should give you a much smaller file with a negligible loss of quality.
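
    As a rough illustration of that last step, here is a minimal Python sketch that shells out to ffmpeg to re-encode a captured DV file at the bitrates suggested above (assuming ffmpeg with libx264 is installed; the file names are placeholders):

        import subprocess

        # Re-encode a captured DV file to H.264 video and AAC audio at roughly
        # the bitrates suggested above: ~2 Mbps video, 192 kbps audio.
        subprocess.run([
            "ffmpeg", "-i", "capture.dv",
            "-c:v", "libx264", "-b:v", "2M",
            "-c:a", "aac", "-b:a", "192k",
            "capture.mp4",
        ], check=True)

    At those rates an hour of footage works out to about 1 GB ((2,000 + 192) kbps × 3,600 s ≈ 7.9 gigabits ≈ 986 megabytes), compared with roughly 13 GB for the same hour kept in DV.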
    http://lifehacker.com/5557695/the-ste...p-guide-to-digitizing-your-life#video
    by M. Fioretti (2016-03-17)
  3. “Music today has become wallpaper,” says Trevor Jackson, a hyphenate resident of the graphic design and electronic music worlds. He’s got a point: thanks to Spotify, Pandora, Songza, Rdio, Beats, Shazam, and an untold number of other streaming upstarts out there, music is more accessible, and therefore ubiquitous, than ever before.

    There are merits and evils to that (some of which have been debated on WIRED in recent months), but one con in particular has been irking Jackson: the format through which we get our music. These days, music—a powerful art form that gets created with many different magnificent machines that have been improved upon over the ages—is really just a bunch of files accessed through a screen. For Jackson’s generation, he says, that’s not as satisfying: “Unless something physically exists, unless I can touch it, it doesn’t have the same significance. I like records, I like noises, creases, tears, things you can’t have digitally.”
    http://www.wired.com/2015/01/argument...ds-8-track-tapes/?mbid=social_twitter
  4. The average life of a Web page is about a hundred days. Strelkov’s “We just downed a plane” post lasted barely two hours. It might seem, and it often feels, as though stuff on the Web lasts forever, for better and frequently for worse: the embarrassing photograph, the regretted blog (more usually regrettable not in the way the slaughter of civilians is regrettable but in the way that bad hair is regrettable). No one believes any longer, if anyone ever did, that “if it’s on the Web it must be true,” but a lot of people do believe that if it’s on the Web it will stay on the Web. Chances are, though, that it actually won’t. In 2006, David Cameron gave a speech in which he said that Google was democratizing the world, because “making more information available to more people” was providing “the power for anyone to hold to account those who in the past might have had a monopoly of power.” Seven years later, Britain’s Conservative Party scrubbed from its Web site ten years’ worth of Tory speeches, including that one. Last year, BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.

    Web pages don’t have to be deliberately deleted to disappear. Sites hosted by corporations tend to die with their hosts. When MySpace, GeoCities, and Friendster were reconfigured or sold, millions of accounts vanished.
    http://www.newyorker.com/magazine/2015/01/26/cobweb
  5. Twitpic is just the latest example of a website filled with user uploaded content getting shut down before any archives are made.

    The Archive Team collective started in 2009. Soon after, Yahoo! announced it was shutting down Geocities, a service allowing Yahoo users to host their own website for free.

    While a lot of websites hosted on the Geocities platform are by today’s standards old and ugly, they give a glimpse into the early days of the Internet, when millions of people suddenly discovered it.

    “What we were facing, you see, was the wholesale destruction of the still-rare combination of words and digital heritage, the erasing and silencing of hundreds of thousands of voices, voices that represented the dawn of what one might call ‘regular people’ joining the World Wide Web,” wrote the collective at the time.

    “A surprising amount of people came forward to help, everyone from coders and archivists through designers and supporters,” said Jason Scott, the collective’s founder.

    “Over the last five years, we’ve been involved in hundreds of smaller projects and dozens of larger ones to provide at least some record of websites that are going away.”

    In 2010 the Archive Team managed to grab 10 terabytes (TB) of data from Friendster, a precursor to Facebook created in 2002.

    More recently they retrieved 8 TB from Google Reader accounts when Google decided to shut down the RSS feed reader.

    Keeping archives is tremendously important, says Marie-Pierre Aubé, Director of Records Management and Archives at Concordia University in Montreal. “It explains where we’re coming from and how our society works.”

    But as the amount of data generated every day has increased exponentially, archivists are finding themselves constrained by the resources available.

    “In one day we’re creating as much data as was created in one year during the 1900s,” Aubé says.

    As the Archive Team celebrates its fifth birthday, still only a minority of companies are proactive about preservation or allow their users to export their own archives.
    http://www.theglobeandmail.com/techno...medium=twitter&utm_source=twitterfeed
  6. Imagine in the not-too-distant future, your entire genome is on archival storage and accessed by your doctors for critical medical decisions. You'd want that data to be safe from hackers and data corruption, wouldn't you? Oh, and it would need to be error-free and accessible for about a hundred years too. The problem is, we currently don't have the data integrity, security and format migration standards to ensure that, according to Henry Newman at Enterprise Storage Forum. Newman calls for standards groups to add new features, such as collision-proof hashes, to archive interfaces and software.

    'It will not be long until your genome is tracked from birth to death. I am sure we do not want to have genome objects hacked or changed via silent corruption, yet this data will need to be kept maybe a hundred or more years through a huge number of technology changes. The big problem with archiving data today is not really the media, though that too is a problem. The big problem is the software that is needed and the standards that do not yet exist to manage and control long-term data.'
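
    To make the integrity problem concrete, here is a minimal Python sketch of the kind of fixity check such standards would formalize, using SHA-256 from the standard library (the file name is a placeholder):

        import hashlib

        def sha256_of(path, chunk_size=1 << 20):
            # Stream the file through SHA-256 so even very large archived
            # objects never need to fit in memory.
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            return digest.hexdigest()

        # Record the digest when the object is ingested...
        recorded = sha256_of("genome.dat")

        # ...then recompute it at every audit: a mismatch reveals silent
        # corruption or tampering.
        assert sha256_of("genome.dat") == recorded

    A real archive would store the digests separately from the data and re-verify them on a schedule; the point of a collision-proof hash is that nobody can substitute a different file that yields the same digest.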
    http://www.enterprisestorageforum.com...g-the-future-of-data-archiving-1.html
  7. We’re going to show you how to set up a mass document scanning system. To accomplish this, we’ll make use of a variety of Linux tools. The advantage of this approach is that you can tailor the process to match your specific requirements. This gives you a system that can handle big jobs and that is open to a lot of customisation.

    We’ll be looking at two possible end products: a PDF file in which each page is a scan of an original page, and a text file that contains the textual content of the original pages. The content of the text file is searchable and we cover a couple of ways of making it into a PDF file.

    This tutorial is modular. For example, if you are dealing with a set of pre-scanned images, you can skip the initial steps and move straight on to OCRing them or converting them into a PDF file. By the same token, if you prefer to use a GUI tool for some parts of the process, there’s nothing to stop you. That said, we’ve tried to make every part of the process scriptable for complete automation.
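
    As a taste of that scriptability, here is a minimal Python sketch of the scan-then-OCR loop (assuming SANE's scanimage and Tesseract are installed; the resolution, page count, and file names are placeholders):

        import subprocess

        # Scan three pages and OCR each one into a searchable PDF.
        for page in range(1, 4):
            tiff = "page-%03d.tiff" % page
            # Acquire one page from the default scanner as a 300 dpi TIFF.
            with open(tiff, "wb") as out:
                subprocess.run(
                    ["scanimage", "--format=tiff", "--resolution", "300"],
                    stdout=out, check=True)
            # Tesseract's "pdf" output mode writes page-NNN.pdf with a
            # hidden text layer, making the scan searchable.
            subprocess.run(
                ["tesseract", tiff, "page-%03d" % page, "pdf"], check=True)

    The per-page PDFs can then be joined with a concatenation tool such as pdfunite.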
    http://www.linuxuser.co.uk/tutorials/...c-mass-scanning-of-documents-tutorial
  8. pax -w writes to standard output. ssh reads standard input and attaches it to whatever utility is invoked, which of course in this case is pax again. pax -r reads from standard input and creates the files from that archive.
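
    Put together, the pipeline looks something like this minimal Python sketch (the host and destination directory are placeholders; pax must be installed on both machines):

        import subprocess

        # pax -w archives the current directory to stdout; ssh streams it to
        # the remote host, where pax -r unpacks it into /backup/dest.
        writer = subprocess.Popen(["pax", "-w", "."], stdout=subprocess.PIPE)
        subprocess.run(
            ["ssh", "user@remotehost", "cd /backup/dest && pax -r"],
            stdin=writer.stdout, check=True)
        writer.stdout.close()
        if writer.wait() != 0:
            raise RuntimeError("pax -w failed")

    The shell equivalent is a single pipeline; the point is simply that pax on one end writes the archive stream and pax on the other end reads it.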

    pax is one of the lesser known utilities in a typical Linux installation. But it's both simple and versatile, well worth the time it takes to learn—recommended.
    http://www.linuxjournal.com/content/make-peace-pax
  9. I took along a powerful camera, believing, as I always have, that it would be an indispensable creative tool. But I returned with the unshakeable feeling that I’m done with cameras, and that most of us are, if we weren’t already.
    http://www.newyorker.com/online/blogs/elements/2013/12/goodbye-cameras.html
  10. ISO 19005-2:2011 specifies the use of the Portable Document Format (PDF) 1.7, as formalized in ISO 32000-1, for preserving the static visual representation of page-based electronic documents over time.
    http://www.iso.org/iso/home/store/cat...c/catalogue_detail.htm?csnumber=50655
