#341304 - 18/01/2011 19:26
directory with zillions of files
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
Here's an oddball one.
I just got a giant Bluray in the mail with a large number of scanned ballots (TIFF files). So far as I can tell, they're all in one single directory, hundreds of thousands of them, and the disk holds 21GB of data in total. Tools like 'ls' and whatnot just don't work because they take too long to run. Just to make things more fun, there appear to be some hard errors at the end of the Bluray disk.
After much pain, 'dd conv=noerror' appeared to let me cleanly extract an ISO from the Bluray, so I've got that on my hard drive and I'm now dealing with a local ISO image. Clearly, I need to extract files into a bunch of subdirectories.
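For readers wondering what that dd step amounts to, here is a rough C sketch of the same idea: copy sector by sector and keep going past unreadable sectors. It is not what dd does internally. The name rescue_copy and the 2048-byte sector size are assumptions for illustration. One thing worth knowing is that plain 'conv=noerror' drops the failed block from the output, so 'conv=noerror,sync' is usually the safer form because it pads the bad block and keeps later data at its original offset. The sketch does the padding itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SECTOR 2048  /* assumed sector size for optical media */

/*
 * Hypothetical helper: copy src to dst one sector at a time. A sector that
 * fails with EIO is written out as zeros so later data keeps its offset.
 * Returns the number of unreadable sectors, or -1 on a fatal error.
 */
long rescue_copy(const char *src, const char *dst)
{
    unsigned char buf[SECTOR];
    long bad = 0;
    off_t off = 0;

    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    for (;;) {
        ssize_t n = pread(in, buf, SECTOR, off);
        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry same sector */
            if (errno != EIO) {
                bad = -1;               /* anything but a media error is fatal */
                break;
            }
            memset(buf, 0, SECTOR);     /* unreadable sector: substitute zeros */
            n = SECTOR;
            bad++;
        }
        if (pwrite(out, buf, (size_t)n, off) != n) {
            bad = -1;                   /* short or failed write is fatal */
            break;
        }
        off += n;
    }

    close(in);
    if (close(out) != 0)
        bad = -1;
    return bad;
}
```

Calling rescue_copy with the raw device path and an output file name would give an image plus a count of zero-filled sectors, which at least tells you how much of the disc was damaged.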
The problem is getting to the point where I even know the damn file names. I've mounted the ISO, I go into the directory, and 'ls' again just sits and spins, completely consuming one core of my CPU, in kernel space. Surprisingly, even 'find . -print' is unhelpful. All it prints is "." before the kernel again gets slammed. After ten minutes, it hasn't printed a single file name.
So... any advice on how I can extract these files from the ISO image?
|
#341305 - 18/01/2011 19:32
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
Recent libcdio comes with iso-info and iso-read tools that might help. Even if you end up needing to debug your image, at least using them means you'd be debugging in userland.
Peter
|
#341306 - 18/01/2011 19:41
Re: directory with zillions of files
[Re: peter]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
iso-read seems to insist that you know the filename you're trying to get out. It doesn't have a 'dump everything' option. Grumble. I really didn't want to be hacking libcdio just to get at my damn files.
|
#341307 - 18/01/2011 19:41
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
|
ls on many platforms sorts all entries by default. Try ls -U, maybe?
|
#341308 - 18/01/2011 19:43
Re: directory with zillions of files
[Re: peter]
|
carpal tunnel
Registered: 24/12/2001
Posts: 5528
|
What OS are you doing this on?
I assume the incredibly long wait for nothing is caused by the errors.
The UDF Tool package is unmaintained now and there is a complete lack of a fsck.udf tool as well. There was a UDF Verifier tool by Philips Research but that seems to have vanished and it doesn't fix anything anyway.
|
#341309 - 18/01/2011 19:44
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14496
Loc: Canada
|
Instead of ls, try this: echo *
-ml
|
#341310 - 18/01/2011 19:51
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
This is on my iMac (Core 2 Duo, Mac OS X 10.6.5). I mounted the file with the default options (i.e., I just ran "open" on the ISO file). I'm trying "echo *" right now and it's just sitting there, again with one core slammed to 100% in the kernel, the other core idle.
Any thoughts on tooling to extract the files within?
|
#341311 - 18/01/2011 19:52
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 18/06/2001
Posts: 2504
Loc: Roma, Italy
|
... or, you may use RAR command line, in any OS, without the need to mount the ISO, but by simply having RAR operate on the .ISO file itself, in the file system.
Edit: I was assuming a *NIX OS. I don't know if you have RAR on a Mac. If not, you may just copy the .ISO to any Windows box that you have around.
Edited by taym (18/01/2011 19:55)
_________________________
= Taym = MK2a #040103216 * 100Gb *All/Colors* Radio * 3.0a11 * Hijack = taympeg
|
#341312 - 18/01/2011 19:57
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14496
Loc: Canada
|
MMmm.. yes, I suppose "echo *" is really no improvement, since it wants to find them all before printing the line. Duh. Best bet is a simple C program to break them up, using readdir() to walk the directory. I can give you a basic readdir program that you can hack away at, if you like.
|
#341313 - 18/01/2011 20:17
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14496
Loc: Canada
|
Quote: I can give you a basic readdir program that you can hack away at, if you like.
This could be done more robustly, using execve() rather than system(), but it's probably good enough:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <string.h>
#include <errno.h>

static void do_something (const char *dpath, const char *this_name)
{
    char tmp[16384]; /* BIG dumb buffer, rather than trying to be clever */
    /*
     * Insert code here to do whatever with "this_name".
     * Here is an example of how to do something:
     */
    char destination[] = "/tmp";
    int n;

    /*
     * Build a shell command. snprintf() cannot overrun tmp; a name too
     * long to fit is skipped. Names containing '"', '$' or '`' will still
     * confuse the shell, which is why execve() would be more robust.
     */
    n = snprintf(tmp, sizeof(tmp), "/bin/cp \"%s/%s\" \"%s\"", dpath, this_name, destination);
    if (n < 0 || (size_t)n >= sizeof(tmp)) {
        fprintf(stderr, "%s/%s: name too long, skipped\n", dpath, this_name);
        return;
    }
    printf("===> %s\n", tmp);
    fflush(stdout);
    if (-1 == system(tmp))
        perror(tmp);
}

int main (int argc, char *argv[])
{
    const char *dpath;
    struct dirent *de;
    DIR *dp;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <dir>\n", argv[0]);
        return 1;
    }
    dpath = argv[1];
    dp = opendir(dpath);
    if (dp == NULL) {
        perror(dpath);
        return 1;
    }

    /* readdir() hands back one entry at a time; nothing is sorted or buffered here */
    errno = 0;
    while ((de = readdir(dp)) != NULL) {
        const char *this_name = de->d_name;
        if (strcmp(this_name, ".") && strcmp(this_name, "..")) {
            printf("%s\n", this_name);
            fflush(stdout);
            do_something(dpath, this_name);
        }
        errno = 0; /* for next readdir() call */
    }
    if (errno) { /* NULL with errno set means an error, not end of directory */
        perror(dpath);
        closedir(dp);
        return 1;
    }
    closedir(dp);
    return 0; /* all done */
}
Edited by mlord (18/01/2011 20:22)
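The "more robust" variant mentioned above, skipping the shell entirely, might look something like the sketch below. This is an illustration only, not code from the thread: the helper name copy_file is made up, and it assumes cp lives at /bin/cp as in the program above. Because the file name is passed as its own argv entry, quotes, dollar signs and spaces in names need no escaping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run "/bin/cp -- src dst" without going through a shell.
 * Returns cp's exit status (0 on success), or -1 if the child could not be
 * started, could not be waited for, or was killed by a signal.
 */
int copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    fflush(NULL);                       /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: "--" stops option parsing, so a name starting with '-' is safe. */
        char *const args[] = { "cp", "--", (char *)src, (char *)dst, NULL };
        execv("/bin/cp", args);
        _exit(127);                     /* only reached if execv() itself failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts the wait. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the program above, the body of do_something() could build the source path with snprintf() and call copy_file() instead of system(). With a couple of hundred thousand files, one fork per file is still slow, so doing the copy in-process with read() and write() would be the next step if speed matters.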
|
#341314 - 18/01/2011 20:18
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14496
Loc: Canada
|
Actually, that's more or less how the find command does things.. I wonder if a plain old find . works?
|
#341315 - 18/01/2011 20:20
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
Based on your idea, I wrote my own C program to do the same basic thing. Now I've got all 223K filenames. Next task is to see what I want to do about them.
As I said up top, "find . -print" failed. I have no idea why.
|
#341316 - 18/01/2011 20:23
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
|
Quote: iso-read seems to insist that you know the filename you're trying to get out. It doesn't have a 'dump everything' option. Grumble. I really didn't want to be hacking libcdio just to get at my damn files.
Use iso-info with -f or -l to get the filenames, then iso-read to get them out.
Peter
|
#341317 - 18/01/2011 20:27
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14496
Loc: Canada
|
|
#341318 - 18/01/2011 20:33
Re: directory with zillions of files
[Re: mlord]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
The extraction is running now. I'm modestly curious why "find" didn't work, but that's a question for another day.
|
#341323 - 18/01/2011 21:39
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
"dmesg" might have provided some reasonably useful output. Also the other syslog-type logs. An "strace" or equivalent ("dtruss" in MacOS X 10.6) might have also been helpful.
_________________________
Bitt Faulk
|
#341324 - 18/01/2011 22:25
Re: directory with zillions of files
[Re: wfaulk]
|
carpal tunnel
Registered: 24/12/2001
Posts: 5528
|
Who on earth made this disc anyway? They use some sort of packet writing software and just kept dumping files onto it?
|
#341325 - 19/01/2011 01:02
Re: directory with zillions of files
[Re: tman]
|
carpal tunnel
Registered: 17/12/2000
Posts: 2665
Loc: Manteca, California
|
Quote: Who on earth made this disc anyway? They use some sort of packet writing software and just kept dumping files onto it?
The first post said "ballots" and "tiff" so it's likely the take from a scanner-type electronic voting machine.
_________________________
Glenn
|
#341337 - 19/01/2011 14:39
Re: directory with zillions of files
[Re: gbeer]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
Syslogs were not very helpful. They indicated I/O errors where there were unreadable bits of the disk. That's about it.
Who made this? A municipality that was providing me with copies of its ballots. I'm grateful for the data (now nicely segregated into hundreds of subdirectories). But what a pain to get at it.
For what it's worth, there's absolutely no standards document of any kind that indicates how somebody might give you digital scans of a million ballots.
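The thread never shows how the files were split up, so the following is only a guess at one simple way to do it: number the files in the order readdir() returns them and put a fixed count in each numbered subdirectory. The function name bucket_path, the four-digit directory names and the per_dir parameter are all assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build "<root>/<NNNN>/<name>" for the index-th file
 * seen, creating the numbered subdirectory if it does not exist yet.
 * root must already exist. Returns 0 on success, -1 on failure.
 */
int bucket_path(const char *root, const char *name, unsigned long index,
                unsigned long per_dir, char *out, size_t outsz)
{
    char dir[4096];
    int n;

    assert(per_dir > 0);

    /* Only plain file names are accepted; anything with a slash is rejected. */
    if (strchr(name, '/') != NULL)
        return -1;

    /* Integer division picks the bucket: files 0..per_dir-1 go to 0000, etc. */
    n = snprintf(dir, sizeof dir, "%s/%04lu", root, index / per_dir);
    if (n < 0 || (size_t)n >= sizeof dir)
        return -1;

    /* An already existing directory is fine; any other mkdir() error is not. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    n = snprintf(out, outsz, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return 0;
}
```

With the 223K files mentioned earlier and per_dir set to 1000, this would produce 223 subdirectories, which keeps every directory small enough for ordinary tools like 'ls' to cope with.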
|
#341341 - 19/01/2011 18:15
Re: directory with zillions of files
[Re: gbeer]
|
carpal tunnel
Registered: 20/12/1999
Posts: 31602
Loc: Seattle, WA
|
Quote: The first post said "ballots" and "tiff" so it's likely the take from a scanner type electronic voting machine.
Ah, I see that the security at Diebold is up to its usual high standards of quality.
|
#341342 - 19/01/2011 18:29
Re: directory with zillions of files
[Re: tfabris]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
I'm honestly not sure what vendor this particular data set came from, but I'm pretty sure it's not Diebold. What's amazing is that the scans are one-bit (black and white) and fairly low resolution (you can't read most of the text printed on the ballot). This makes it difficult if you want to write a post-facto ballot scanner.
|
#341356 - 20/01/2011 02:18
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 17/12/2000
Posts: 2665
Loc: Manteca, California
|
Sounds like what you have been given is designed to meet the letter of the law.
That could make it tough if they were handing out randomized ballots, like when Schwarzenegger was first elected, and there were something like 50 names on the ballot for Governor.
Are you auditing these for your own entertainment or for someone else?
_________________________
Glenn
|
Top
|
|
|
|
#341374 - 20/01/2011 16:39
Re: directory with zillions of files
[Re: gbeer]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
I'm working with some attorneys who are in the midst of a public interest lawsuit. I can't really get into the details right now. Suffice it to say that I may need to crunch my way through a whole lot of these ballots.
|
#341380 - 20/01/2011 21:57
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
|
Quote: scans are one-bit (black and white) and fairly low resolution (you can't read most of the text printed on the ballot).
Can you post one?
Edited by canuckInOR (20/01/2011 21:58) Edit Reason: fixed tag.
|
#341398 - 21/01/2011 03:50
Re: directory with zillions of files
[Re: canuckInOR]
|
carpal tunnel
Registered: 30/04/2000
Posts: 3810
|
Not right now. Maybe later.
|
#341416 - 21/01/2011 16:05
Re: directory with zillions of files
[Re: DWallach]
|
carpal tunnel
Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
|
No prob. I figured you might be under an NDA, or something, but was curious...
|