Find strings in a large file and write each one to a second file in Python
I'm new to Python (And programming generally). To make a work project
easier, I am trying to write some code that searches an XML file for
certain tags and copies the contents to a second file. The file I need to
read from is about 165MB, and will have 10s of thousands of entries to
pull out.
I have successfully made it work for small files (Working from example
code on forums such as this one), but it falls apart above a certain size
(It starts copying large portions of the XML, instead of just the required
strings). I imagine this is because of how I have defined my variables.
Can someone give me a pointer or sample code, to fix this? I'm surprised
it works as far as it does!
This is the code I have now:
text = open("UPC_Small.xml", "r")
lines = text.read()
fo = open("output.log", "wt")
crid1 = 0
while True:
crid1 = lines.find('<ProgramInformation programId="crid://bds.tv/',crid1)
crid2 = lines.find('">',crid1)
crid_string = (lines[crid1+45:crid2])
if crid1 == -1:
fo.write("End of File")
fo.close()
break
title1 = lines.find('<Title xml:lang="EN" type="main">',crid2)
title2 = lines.find('</Title>',title1)
title_string = (lines[title1+33:title2])
genre1 = lines.find('<Name xml:lang="EN">',title2)
genre2 = lines.find('</Name>',genre1)
genre_string = (lines[genre1+20:genre2])
fo.write(crid_string + "|" + title_string + "|" + genre_string + "\n")
No comments:
Post a Comment