IronBits
04-29-2006, 04:26 PM
Took me a while to figure out how to strip Unicode characters. :bang:
If any of you work with XML and have Perl installed, I'd appreciate it if you could give this a whirl and see how it handles 'real' XML files.
Should be easy to parse DATA out of the resulting output file ;)
The XML file needs to be named backup.xml
Then type: whateveryounamedthisperlfilename.pl backup.xml > test.txt
Then, notepad test.txt - should be easily readable...
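#!/usr/bin/perl -w
use strict;
# Open file, read one line at a time
my $data_file = 'backup.xml';
open my $in, '<', $data_file or die "can't open $data_file: $!";
while (<$in>) {
# Strip the UTF-16 byte-order mark (bytes FF FE, which display as ÿþ).
s/\xFF\xFE//g;
# Strip invisible control characters and unused code points. This also
# removes the NUL bytes UTF-16 puts between ASCII characters, as well as
# carriage returns (\cM) and newlines, so no separate passes are needed.
s/\p{C}//g;
# Put each tag on its own line (this covers closing tags too).
s/></>\n</g;
print;
}
close $in;
# Note: this byte-stripping trick only works when the text itself is plain
# ASCII stored as UTF-16; other characters will come out mangled.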
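For anyone who wants to try a different route: rather than stripping bytes, Perl can decode the UTF-16 itself through an I/O layer. This is only a rough sketch, not the script above. The helper name split_tags is made up for illustration, and it assumes the file starts with a byte-order mark (the ÿþ bytes), which the UTF-16 decoder needs in order to work out the byte order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes text that
# has already been decoded and puts each tag on its own line.
sub split_tags {
    my ($text) = @_;
    $text =~ s/\p{C}//g;      # drop control characters, including CR and LF
    $text =~ s/></>\n</g;     # break the line between adjacent tags
    return $text;
}

# Only touch the file system when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    # :raw switches off CRLF translation; :encoding(UTF-16) decodes the
    # file and consumes the byte-order mark.
    open my $in, '<:raw:encoding(UTF-16)', $file
        or die "can't open $file: $!";
    my $xml = do { local $/; <$in> };   # slurp the whole file
    close $in;
    binmode STDOUT, ':encoding(UTF-8)';
    print split_tags($xml), "\n";
}
```

Run it the same way as before, e.g. whateveryounamedthisperlfilename.pl backup.xml > test.txt. Since the text is decoded properly first, non-ASCII characters should survive, though the output is written as UTF-8 rather than plain ASCII.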