
User:Gadfium/scripts


Electorate.py


This is a Python 3.x script which reads the 2011 New Zealand electorate result pages and outputs Wikipedia tables for the results.

To run it, use

python electorate.py [electorate number]

The output goes to a file named "Electorate-nn.txt", where nn is the electorate number given on the command line.

The script has been modified minimally from a 2008 version. Most of the changes were required to run under Python 3.x.

The interpreter used was Python 3.2 under Windows 7. If you want a version which works under Python 2.x, see the history of this page. Although I was a professional programmer in a past life, I've never worked professionally with Python and I don't know the language well. I tend to use it purely as a procedural language, although it has far more powerful concepts.

The Windows command prompt has trouble with Python producing the macron in the name of the Māori Party in its stdout stream. While it is possible to circumvent this using "SET PYTHONIOENCODING=UTF8" at the command prompt (with no spaces around the equals sign, since the command prompt treats them as part of the variable name and value), it seems unreasonable to expect any potential end user of this program to do that, so I have written results directly to a file rather than to stdout as in previous versions. Candidates with accented characters in their names may still need adjustment. There is a crude method of converting candidate surnames to mixed case form, which mostly works for candidates with British-style names but fails for at least one Pacific Island candidate.
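
The surname conversion can be tried in isolation. The sketch below restates the recapitalisation rules from the script as a standalone function; the name `mixed_case_name` and all of the sample names are invented for illustration. It shows where the heuristic works and where the Mac rule overreaches.

```python
def mixed_case_name(person):
    """Turn 'SURNAME, Given' into 'Given Surname' using the same crude rules."""
    comma = person.find(",")
    if comma == -1:
        return person
    surname = person[0] + person[1:comma].lower()
    # Recapitalise the letter after a space, apostrophe or hyphen.
    for i in range(len(surname)):
        if surname[i] in " '-":
            surname = surname[:i + 1] + surname[i + 1:].capitalize()
    # Capitalise after Mac/Mc, which is wrong for a name such as "Mace".
    if surname[:3] == "Mac":
        surname = "Mac" + surname[3:].capitalize()
    if surname[:2] == "Mc":
        surname = "Mc" + surname[2:].capitalize()
    return person[comma + 2:] + " " + surname


# Invented sample names, not taken from any results page.
for sample in ["SMITH, John", "O'CONNOR, Mary", "MACDONALD, Ann", "MACE, Tom"]:
    print(mixed_case_name(sample))
# John Smith
# Mary O'Connor
# Ann MacDonald
# Tom MacE
```

The last line is the kind of result that needs correcting by hand after a run.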

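Changes from the previous election are calculated by fetching the 2008 result page for the same electorate number and matching candidates by party code. Where a party had no candidate in the electorate in 2008, both change fields are left as "-". Parties without a local candidate in the current results always get "-" for the party change.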

After a run, the electorate name usually needs fixing by hand, both to match the Wikipedia naming of articles and to repair problems caused by macrons in names. It is also necessary to add the {{MMP election box majority hold}} or equivalent template at the end of the results, and to alter the incumbent template if the incumbent did not win the seat (i.e., change {{MMP election box incumbent win}} to {{MMP election box candidate win}} and, if the incumbent was a candidate, also change their template to {{MMP election box candidate lose}}).
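
The first of those template changes is mechanical enough to sketch. The helper below is not part of the script; its name and the sample wikitext are invented, and it only swaps the winner's template. Choosing the majority template and marking a losing incumbent still have to be done by hand, since the script has no knowledge of who the incumbent was.

```python
INCUMBENT_WIN = "{{MMP election box incumbent win|"
CANDIDATE_WIN = "{{MMP election box candidate win|"


def winner_was_not_incumbent(wikitext):
    """Swap the winner's template; only the first occurrence is touched."""
    return wikitext.replace(INCUMBENT_WIN, CANDIDATE_WIN, 1)


# Invented fragment in the shape the script writes out.
sample = "{{MMP election box incumbent win|\n |party         =Example Party\n}}\n"
print(winner_was_not_incumbent(sample))
```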

If adapting for similarly structured 2014 results, add parties with a list to the longPartyName structure, and parties with local candidates to the partyName structure. Parties with local candidates not added to partyName will cause this script to abort.
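
As a rough illustration of that adaptation, the sketch below adds one party to both structures. The code 'XYZ' and every string attached to it are placeholders, not a real party, and `lookup_party` is a hypothetical helper: the script itself indexes partyName directly and stops with a KeyError when a code is missing.

```python
# Excerpt of the table: code -> (Wikipedia name, results-site name, use local template)
partyName = {
    'NAT': ("New Zealand National Party", "National Party", False),
    'LAB': ("New Zealand Labour Party", "Labour Party", False),
}
longPartyName = {"National Party": "New Zealand National Party"}

# Hypothetical new party: every string below is a placeholder.
partyName['XYZ'] = ("Example Party (New Zealand)", "Example Party", True)
longPartyName["Example Party"] = "Example Party (New Zealand)"


def lookup_party(code):
    """Fetch a party entry, failing with a readable message instead of a traceback."""
    try:
        return partyName[code]
    except KeyError:
        raise SystemExit("Unknown party code %r: add it to partyName" % code)


print(lookup_party('XYZ')[0])
```

The third element of the tuple should be True only when Wikipedia has no short name and colour templates for the party, as described in the comment above partyName.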


"""electorate.py

A python program to parse the contents of New Zealand electorate result pages
and produce wiki tables from them.

Accepts a parameter of the electorate number to process, and writes to a file called
Electorate-nn.txt. The contents of this file can be manually pasted into the appropriate
Wikipedia entry. This program does not alter Wikipedia itself.
"""

__author__ = "gadfium"
__version__ = "$Revision: 2.1 $"
__date__ = "$Date: 2012/01/03 17:00 $"
__copyright__ = "None"
__license__ = "public domain"

currentUrl = "http://electionresults.govt.nz/electionresults_2011/electorate-"
previousUrl = "http://electionresults.govt.nz/electionresults_2008/electorate-"

#partyName contains the Wikipedia name of the party, the long name used by ElectionResults.govt.nz, and a bool to use 
#{{MMP election box local party candidate}} instead of {{MMP election box candidate}} - use this when Wikipedia 
#doesn't have templates set up for party shortname, party colour etc.
partyName = {'NAT': ("New Zealand National Party", "National Party", False),
             'LAB': ("New Zealand Labour Party", "Labour Party", False),
             'GP': ("Green Party of Aotearoa New Zealand", "Green Party", False),
             'RONZP': ("The Republic of New Zealand Party", "The Republic of New Zealand Party", False),
             'ALL': ("Alliance (New Zealand political party)", "Alliance", False),
             'IND': ("Independent (politician)", "Independent", False),
             'UFNZ': ("United Future New Zealand", "United Future", False),
             'ALCP': ("Aotearoa Legalise Cannabis Party", "Aotearoa Legalise Cannabis Party", False),
             'MAOR': ("Māori Party", "Māori Party", False),
             'HR': ("[[Human Rights Party (New Zealand)|Human Rights]]", "Independent", True),
             'JAP': ("New Zealand Progressive Party", "Jim Anderton's Progressive", False),
             'ACT': ("ACT New Zealand", "ACT New Zealand", False),
             'RAM': ("Residents Action Movement", "RAM - Residents Action Movement", False),
             'NZF': ("New Zealand First", "New Zealand First Party", False),
             'KIWI': ("The Kiwi Party", "Kiwi Party", False),
             'WP': ("Workers Party of New Zealand", "Workers Party", False),
             'NZDSC': ("New Zealand Democratic Party", "Democrats for Social Credit", False),
             'FAM': ("Family Party", "Family Party", False),
             'RATC': ("Restore All Things In Christ", "Independent", True),
             'NCAWAP': ("[[No Commercial Airport at Whenuapai Airbase Party|No Commercial Airport at Whenuapai]]", "Independent", True),
             'LIB': ("Libertarianz", "Libertarianz", False),
             'NZPP': ("New Zealand Pacific Party", "New Zealand Pacific Party", False),
             'CL': ("[[Communist League (New Zealand)|Communist League]]", "Independent", True),
             'DDP': ("Direct Democracy Party of New Zealand", "Independent", False),
             'McG': ("McGillicuddy Serious Party", "Independent", True),
             'NZRP': ("New Zealand Representative Party", "Independent", True),
             'ANYIPP': ("Aotearoa NZ Youth Party", "Independent", True),
             'NZEE': ("Economic Euthenics", "Independent", True),
             'HP': ("Hapu Party", "Independent", False),
             'CNSP': ("Conservative Party of New Zealand", "Conservative Party", True),
             'MANA': ("Mana Party (New Zealand)", "Mana", True),
             'NZSP': ("New Zealand Sovereignty Party", "Independent", False),
             'PIR': ("Pirate Party of New Zealand", "Independent", True),
             'NEP': ("New Economics Party", "Independent", False),
             'ANYP': ("Youth", "Independent", False),
             'NI': ("Nga Iwi Morehu Movement", "Independent", False)
}

#longPartyName is used to translate between the long name of the party used by ElectionResults.govt.nz
#and the Wikipedia name. We could use the table above, but this is easier.
longPartyName = {"National Party": "New Zealand National Party",
                 "Labour Party": "New Zealand Labour Party",
                 "Green Party": "Green Party of Aotearoa New Zealand",
                 "Alliance": "Alliance (New Zealand political party)",
                 "Independent": "Independent (politician)",
                 "United Future": "United Future New Zealand",
                 "Jim Anderton's Progressive": "New Zealand Progressive Party",
                 "RAM - Residents Action Movement": "Residents Action Movement",
                 "Democrats for Social Credit": "New Zealand Democratic Party",
                 "Kiwi Party": "The Kiwi Party",
                 "New Zealand First Party": "New Zealand First",
                 "The Bill and Ben Party": "Bill and Ben Party",
                 "Workers Party": "Workers Party of New Zealand",
                 "Conservative Party": "Conservative Party of New Zealand",
                 "Mana": "Mana Party (New Zealand)"
}

import sys
import urllib.request, urllib.error
import locale

class electorateResults():
    def __init__(self, url):
        sock = urllib.request.urlopen(url)
        rawhtml = sock.read()
        sock.close()
        try:
            html = rawhtml.decode("UTF-8")
        except UnicodeDecodeError:
            html = rawhtml.decode("ISO-8859-1")
        self.candidateList = []
        start = html.find("Candidates")
        start = html.find("&nbsp;</td><td>", start)+15 # non-breaking space before each candidate entry
        partystart = start
        independent = False
        while start>15:
            candidate = html[start:html[start:].find("<")+start]
            if (candidate == "&nbsp;") or (candidate == ""): #independent candidate, or party with no local candidate
                start += 15
                independent = True
            else:
                partystart= html.find("<td>", start) + 4
                party = html[partystart: html.find("<", partystart)]
                votes = getNumeric(html[partystart:], party)
                partyVotes = getNumeric(html, partyName[party][1]) 
                if partyVotes == '': partyVotes = '0'
                record = (candidate, votes, party, independent, partyVotes)
                self.candidateList.append(record)
                independent = False
            start = html.find("&nbsp;</td><td>", start)+15 # find next candidate

        self.candidateList.sort(key=lambda record: asInt(record[1]), reverse=True) 
        
        #construct a list of parties without local candidates
        self.partyList = []
        start = html.find("<tr class=", partystart) + 22
        end = html.find("TOTAL")
        while start > 22 and start < end:
            party = html[start: html.find("<", start)]
            votes = getNumeric(html[start:], party)
            record = (party, votes)
            self.partyList.append(record)
            start = html.find("<tr class=", start) + 22

        self.partyList.sort(key=lambda record: asInt(record[1]), reverse=True)

        start = html.find("Official Count Results -- ") + 26
        self.electorateName = html[start:html.find("<", start)]

        #get totals
        self.partyInformals = getNumeric(html, "Party Informals")
        self.partyTotal = getNumeric(html, "TOTAL") # get the first "TOTAL" figure
        self.partyVotes = asInt(self.partyTotal) - asInt(self.partyInformals)
        self.candidateInformals = getNumeric(html, "Candidate Informals")
        self.candidateTotal = getNumeric(html[html.find("TOTAL")+5:], "TOTAL") # get the second "TOTAL" figure
        self.candidateVotes = asInt(self.candidateTotal) - asInt(self.candidateInformals)
    
def constructName (person):
    #given a name in the format "SMITH, John", return "John Smith"
    comma = person.find(",")
    if comma == -1: return person
    surname = person[0] + person[1:comma].lower()
    for i in range(0, len(surname)): #recapitalise after a space, apostrophe or hyphen
        if surname[i] in [' ', "'", '-']:
            surname = surname[:i+1] + surname[i+1:].capitalize()
    if surname[:3] == "Mac": surname = "Mac" + surname[3:].capitalize() #Handle Mac and Mc, although there are exceptions
    if surname[:2] == "Mc": surname = "Mc" + surname[2:].capitalize()   # to capitalising after these prefixes
    return person[comma+2:] + ' ' + surname

def getNumeric (html, key):
    #find and return the first right-aligned string following the given key
    start = html.find(key) #find section containing the key
    start = html.find('right">', start)+7 #now find the number following it
    return html[start:html.find("<", start)].strip()

def asInt (figure):
    #remove any commas and return an integer
    clean = figure.replace(",", "")
    return int(clean)

def printElectorate (currentResults, previousResults, url, electorate):
    #construct the MMP election box begin template
    output = "{{MMP election box begin |title=[[New Zealand general election, 2011|General Election 2011]]: "
    output += "[[" + currentResults.electorateName + "]]<ref>[" + url + " 2011 election results]</ref>}}\n"

    #construct MMP election box candidate templates
    winner = True
    
    for candidate in currentResults.candidateList:  # (candidate, votes, party, independent, partyVotes)
        if winner: #first candidate is the one with the most votes, so gets a special template
            output += "{{MMP election box incumbent win|" #change manually if not the incumbent
        elif partyName[candidate[2]][2]: # special template for some small parties
            output += "{{MMP election box local party candidate|"
            output += "\n |color = #7F8E89"
        else:
            output += "{{MMP election box candidate|"
        output += "\n |party         =" + partyName[candidate[2]][0]
        output += "\n |candidate     =" 
        if winner: output += "[[" #winner gets wikilinked
        output += constructName(candidate[0])
        if winner:
            output += "]]" #winner gets wikilinked
            winner = False
        output += "\n |votes         =" + candidate[1]
        percentage = asInt(candidate[1]) * 100.0 / currentResults.candidateVotes
        output += "\n |percentage    = %2.2f" % percentage
        partyPercentage = asInt(candidate[4]) * 100.0 / currentResults.partyVotes
        change = partyChange = "-" #placeholders, kept if the party had no candidate in the previous results
        for prevCandidate in previousResults.candidateList:
            if candidate[2] == prevCandidate[2]:
                prevPercentage = asInt(prevCandidate[1]) * 100.0 / previousResults.candidateVotes
                change = "%2.2f" % (percentage - prevPercentage)
                prevPartyPercentage = asInt(prevCandidate[4]) * 100.0 / previousResults.partyVotes
                partyChange = "%2.2f" % (partyPercentage - prevPartyPercentage)
                break
        output += "\n |change        = " + change
        if candidate[3]:  # Independent candidate
            output += "\n |party votes   = -"
            output += "\n |party percent = -"
        else:
            output += "\n |party votes   =" + candidate[4]
            output += "\n |party percent = %2.2f" % partyPercentage

        output += "\n |party change  = " + partyChange
        output += "\n}}\n"

    #construct MMP election box candidate template for parties without a local candidate
    for party in currentResults.partyList:  # (party, votes)
        output += "{{MMP election box candidate|"
        output += "\n |party = " + longPartyName.get(party[0],party[0])
        output += "\n |candidate = -"
        output += "\n |votes ="
        output += "\n |percentage ="
        output += "\n |change ="
        output += "\n |party votes = " + party[1]
        percentage = asInt(party[1]) * 100.0 / currentResults.partyVotes
        output += "\n |party percent = %2.2f" % percentage
        output += "\n |party change = -"
        output += "\n}}\n"
        
    #construct totals templates
    output += "{{MMP election box informal vote|"
    output += "\n |votes = " + currentResults.candidateInformals
    output += "\n |party votes = " + currentResults.partyInformals
    output += "\n}}\n"

    output += "{{MMP election box total vote|"
    locale.setlocale(locale.LC_ALL, "")
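    #locale.format was removed in Python 3.12; format_string with grouping=True is the equivalent call
    output += "\n |votes = " + locale.format_string("%d", currentResults.candidateVotes, grouping=True)
    output += "\n |party votes = " + locale.format_string("%d", currentResults.partyVotes, grouping=True)
    output += "\n}}\n"
    #write UTF-8 bytes explicitly so the result does not depend on the console or locale encoding
    with open("Electorate-"+electorate+".txt","wb") as outfile:
        outfile.write(output.encode("utf-8"))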
    return

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print ("Usage: python electorate.py electorate-number")
    else:
        electorate = sys.argv[1]
        url= currentUrl + electorate + ".html"
        currentResults = electorateResults(url)
        previousResults = electorateResults(previousUrl + electorate + ".html")
        printElectorate(currentResults, previousResults, url, electorate)
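
The scraping helpers are easiest to follow on a small input. The sketch below restates getNumeric and asInt under different names and runs them on an invented HTML fragment; only the `right">` marker and the row labels are taken from the script, and the real result pages contain far more markup. It also shows how the valid-vote total and a candidate's percentage are derived.

```python
def get_numeric(html, key):
    """Return the text of the first right-aligned cell that follows key."""
    start = html.find(key)
    start = html.find('right">', start) + 7  # 7 == len('right">')
    return html[start:html.find("<", start)].strip()


def as_int(figure):
    """Strip thousands separators and convert to int."""
    return int(figure.replace(",", ""))


# Invented fragment, not copied from a real results page.
sample = (
    '<tr><td>Example Candidate</td><td align="right">5,000</td></tr>'
    '<tr><td>Candidate Informals</td><td align="right"> 312 </td></tr>'
    '<tr><td>TOTAL</td><td align="right">20,312</td></tr>'
)

informals = get_numeric(sample, "Candidate Informals")
total = get_numeric(sample, "TOTAL")
valid_votes = as_int(total) - as_int(informals)  # informal votes are excluded
share = as_int(get_numeric(sample, "Example Candidate")) * 100.0 / valid_votes
print(valid_votes, "%2.2f" % share)  # 20000 25.00
```

Note that this approach does not detect a missing key, so a change in the page layout tends to show up as nonsense figures or a ValueError from the integer conversion rather than as a clear error.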

school.py


This is a Python 2.x script (it uses the print statement, so it will not run unmodified under Python 3.x) which reads the wikitext from an article such as "List of schools in Southland, New Zealand" saved in the file "input.txt" and outputs boilerplate Wikipedia markup suitable for pasting into "Education" sections in the English and Māori Wikipedia articles for the relevant locality or suburb. If there is more than one school in a given locality, the results should be hand-massaged to improve the flow of text. I generally try to add at least one more fact about each school anyway, most commonly the date of its founding.
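
To make the input format concrete, here is a sketch of how one table row is split into fields, written for Python 3. The row, the school, the locality and both URLs are invented, and the content of the seventh field is an assumption (the script never reads it); the field positions and slicing follow the script.

```python
# Invented row in the older list layout that the script expects.
row = ("| [[Example School]] || [http://example.org/ ] || "
       "[http://example.org/school/1234 1234] || 1-8 || Coed || "
       "[[Exampleton]] || - || 5 || 120 ")

elements = row[2:].split("||")   # drop the leading "| " and split on cell separators
name = elements[0][2:-3]         # strip "[[" and "]] " from the school link

# The reference id is the label of the external link in the third field.
link = elements[2]
id_start = link.find(" ", link.find("http")) + 1
tki_id = link[id_start:link.find("]", id_start)]

print(name, tki_id)                   # Example School 1234
print(elements[3], elements[4])       # years and gender, still padded with spaces
print(elements[7], elements[8])       # decile and roll
```

The comparisons in the script rely on the padding spaces around each field, which is one reason the newer list layout does not work with it.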

The layout of the lists used as input is being changed as I complete each one, with one field being dropped and the order of the fields being changed. The newer format is not compatible with this script. The lists for Northland, Taranaki, Marlborough and West Coast are in the newer format.

To run it, use

python school.py > [text file]

This script is older than electorate.py above, but was modified to use {{TKI}} rather than hard-coding the equivalent references.


# -*- coding: latin-1 -*-
"""tki.py

A python program to parse the wiki text of "List of schools in xxx, New Zealand"
articles, and produce boilerplate text from them.
"""

__author__ = "gadfium"
__version__ = "$Revision: 1.1 $"
__date__ = "$Date: 2009/01/10 $"
__copyright__ = "None"
__license__ = "public domain"

import urllib
if __name__ == "__main__":
    from time import sleep
    infile = open("input.txt", "rt")
    wiki = infile.read()
    infile.close()
    lines = wiki.splitlines()
    for line in lines:
        if line.find("| ")==0:
            elements = line[2:].split("||")
            name = elements[0][2:-3]
            bar = name.find("|")
            if bar > 0:
                name = name[bar+1:]
            printline = "==Education==\n" + elements[0] + "is a" 
            if elements[4] == " Coed ":
                printline += " coeducational"
            elif elements[4] == " Girls ":
                printline += " girls'"
            elif elements[4] == " Boys ":
                printline += " boys'"
            else:
                printline += elements[4]
            if elements[3] == " 1-6 ":
                printline += " contributing primary (years 1-6)"
            elif elements[3] == " 1-8 ":
                printline += " full primary (years 1-8)"
            elif elements[3] == " 7-8 ":
                printline += " intermediate (years 7-8)"
            elif elements[3] == " 7-15 ":
                printline += " secondary (years 7-15)"
            elif elements[3] == " 9-15 ":
                printline += " secondary (years 9-15)"
            elif elements[3] == " 9-13 ":
                printline += " secondary (years 9-13)"
            elif elements[3] == " 1-15 ":
                printline += " composite (years 1-15)"
            printline += " school with a [[Socio-Economic Decile|decile rating]] of" + elements[7] + "and a roll of" +elements[8]+'.'
            element = elements[2]
            url = element.find("http")
            url = element.find(" ",url)+1
            urlend = element.find("]", url)
            printline += "<ref>{{TKI|" + element[url:urlend] + "|" + name + "}}</ref>"
            printline += "\n\n==Notes==\n{{reflist}}"
            if elements[1] != " - ":
                printline += "\n\n==External links==\n*"
                printline += elements[1][0:-2] + " " + name + " website]"
            print printline + "\n"

            #and now in Maori. This section relies on some of the variables set up in the previous section.
            town = elements[5]
            bar = town.find("|")  #find any pipe in name, and take only the piped value
            if bar > 0:
                town = town[bar+1:]
            name = "te kura o " + town.strip('[] ')
            printline = "==Ko te kura==\nKo " + name + " he kura "
            first = 1
            last  = 15
            if elements[3] == " 1-6 ":
                printline += "tuatahi "
                last = 6
            elif elements[3] == " 1-8 ":
                printline += "tuatahi "
                last = 8
            elif elements[3] == " 7-8 ":
                printline += "takawaenga "
                first = 7
                last = 8
            elif elements[3] == " 7-15 ":
                printline += "tuarua "
                first = 7
            elif elements[3] == " 9-15 ":
                printline += "tuarua "
                first = 9
            elif elements[3] == " 9-13 ":
                printline += "tuarua "
                first = 9
                last = 13
            elif elements[3] == " 1-15 ":
                printline += "hiato "
            printline += "mō ngā "
            if elements[4] == " Coed ":
                printline += "taitama me ngā kōhine "
            elif elements[4] == " Girls ":
                printline += "kōhine "
            elif elements[4] == " Boys ":
                printline += "taitama "
            else:
                printline += "***UNKNOWN GENDER*** "
            printline += "o ngā taumata %(#)d tae noa ki te %(#2)d" % {"#":first, "#2":last}
            printline += ". Ko" + elements[7] + "te whakatauranga ōtekau; ā, e" + elements[8] + " te tokomaha o te rārangi ingoa."
            printline += "<ref>{{TKI|" + element[url:urlend] + "|" + name.replace("te kura o", "Te Kura o") + "}}</ref>"
            printline += "\n\n==Kupu tautoko==\n{{reflist}}"
            print printline + "\n"
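
The long if/elif chains on the year range could also be expressed as a lookup table. The following is a sketch of that alternative for the English text only, written for Python 3; `SCHOOL_TYPES` and `describe_years` are names invented here, and this is not how the script above is written.

```python
# Year range -> school type, taken from the branches in the script.
SCHOOL_TYPES = {
    "1-6": "contributing primary",
    "1-8": "full primary",
    "7-8": "intermediate",
    "7-15": "secondary",
    "9-15": "secondary",
    "9-13": "secondary",
    "1-15": "composite",
}


def describe_years(field):
    """Return the phrase for a padded year-range field, or '' if unrecognised."""
    years = field.strip()
    kind = SCHOOL_TYPES.get(years)
    if kind is None:
        return ""  # the script also adds nothing for an unknown range
    return " %s (years %s)" % (kind, years)


print(describe_years(" 1-6 "))   # " contributing primary (years 1-6)"
print(describe_years(" 9-13 "))  # " secondary (years 9-13)"
```

A table like this keeps the supported ranges in one place, so adding a new range would be a one-line change.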