wm_metrics package¶
wm_metrics - A set of metrics tools for Wikimedia program leaders.
Main modules¶
fdc module¶
analyse_commons_dump module¶
Analysing a Commons collection to retrieve fancy statistics.
-
class
wm_metrics.analyse_commons_dump.CommonsPage(title=None, revisions=None)¶ Bases: object
Represent a page.
-
get_top_revision()¶ Return the most recent CommonsRevision.
This assumes the revisions list is ordered by time, oldest first (which is the case when the page is initialized from a dump).
-
-
class
wm_metrics.analyse_commons_dump.CommonsRevision(timestamp=None, username=None, wikitext=None)¶ Bases: object
Representation of a Revision (timestamp + username + wikitext).
-
get_categories()¶ Return the categories in the given revision.
-
is_valued_image()¶ Return whether the given revision is a Valued Image.
-
-
class
wm_metrics.analyse_commons_dump.DumpMediaCollection¶ Bases: dict
Representation of a MediaCollection, dump style.
-
categorisation_report()¶ Return a text categorisation report.
Iterate over the pages of the media collection, get the top revision of each, and collect the categories in two Counters: one indexed by category and the other by file.
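The two-Counter bookkeeping described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `collect_categories` is a hypothetical helper, and the input is assumed to be a plain mapping from file title to the categories of its top revision.

```python
from collections import Counter


def collect_categories(categories_by_file):
    """Build the two counters from {file title: [category, ...]}.

    Hypothetical helper: the real method works on CommonsPage objects.
    """
    per_category = Counter()  # category name -> number of files carrying it
    per_file = Counter()      # file title -> number of categories on the file
    for title, categories in categories_by_file.items():
        per_category.update(categories)
        per_file[title] = len(categories)
    return per_category, per_file


sample = {
    "File:A.jpg": ["Cats", "Animals"],
    "File:B.jpg": ["Cats"],
    "File:C.jpg": [],
}
per_category, per_file = collect_categories(sample)
```

Assigning `per_file[title]` explicitly keeps uncategorised files in the counter with a count of zero, which is useful when reporting how many files lack categories.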
-
get_differential(start_date, end_date)¶ Return a difference between two dates.
-
get_initial_state()¶ Return a Collection in its initial state.
-
get_state(target_datetime)¶ Return a Collection at the time given.
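One plausible reading of get_state is "keep only the revisions made up to the target time". The sketch below illustrates that idea under stated assumptions: `state_at` is a hypothetical function, revisions are simplified to (timestamp, username) tuples, and pages with no revision before the target are dropped. The actual method may behave differently.

```python
from datetime import datetime


def state_at(collection, target_datetime):
    """Return {page_id: revisions} restricted to revisions up to target_datetime."""
    state = {}
    for page_id, revisions in collection.items():
        kept = [rev for rev in revisions if rev[0] <= target_datetime]
        if kept:  # assumption: a page that did not exist yet is left out
            state[page_id] = kept
    return state


collection = {
    1: [(datetime(2013, 1, 1), "Alice"), (datetime(2013, 3, 1), "Bob")],
    2: [(datetime(2013, 6, 1), "Carol")],
}
snapshot = state_at(collection, datetime(2013, 2, 1))
```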
-
get_valued_images()¶ Return a list of valued images in the collection.
-
init_from_xml_dump(xml_dump)¶ Initialise the object using an XML dump.
-
simple_all_time_report()¶ Return a text activity report covering the period from the beginning until now.
The report covers the number of edits, editors and files touched over the whole history of the collection.
-
simple_diff_report(start_date, end_date)¶ Return an activity text report in a given timeframe.
The report covers the number of edits, editors and files touched between the two given dates.
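The three figures in such a report can be computed in a single pass over the revisions. The sketch below is illustrative only: `activity_counts` is a hypothetical name, revisions are simplified to (timestamp, username) tuples, and both bounds are assumed to be inclusive.

```python
from datetime import datetime


def activity_counts(collection, start_date, end_date):
    """Return (edits, editors, files touched) within [start_date, end_date]."""
    edits = 0
    editors = set()
    files = set()
    for title, revisions in collection.items():
        for timestamp, username in revisions:
            if start_date <= timestamp <= end_date:
                edits += 1
                editors.add(username)
                files.add(title)
    return edits, len(editors), len(files)


collection = {
    "File:A.jpg": [(datetime(2013, 1, 1), "Alice"), (datetime(2013, 3, 1), "Bob")],
    "File:B.jpg": [(datetime(2013, 6, 1), "Alice")],
}
counts = activity_counts(collection, datetime(2013, 2, 1), datetime(2013, 12, 31))
```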
-
-
wm_metrics.analyse_commons_dump.get_categories_from_text(edit)¶ Return the categories contained in a given wikitext.
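Category links in wikitext have the form [[Category:Name]] or [[Category:Name|sort key]], so a regular expression is one way to extract them. The sketch below is an assumption about the approach, not the package's code; `categories_in` and `CATEGORY_PATTERN` are hypothetical names, and localised namespace aliases are not handled.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sort key]], capturing only Name.
CATEGORY_PATTERN = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def categories_in(wikitext):
    """Return the category names found in the wikitext, in order of appearance."""
    return [name.strip() for name in CATEGORY_PATTERN.findall(wikitext)]


text = "== Summary ==\n[[Category:Paris]]\n[[Category:Museums in France|Louvre]]"
found = categories_in(text)
```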
-
wm_metrics.analyse_commons_dump.handle_node(node, tag_name)¶ Return the contents of the tag with the given name inside the given node.
-
wm_metrics.analyse_commons_dump.main()¶
-
wm_metrics.analyse_commons_dump.parse_xml_dump(xml_dump)¶ Return a dictionary from the given dump.
The dictionary is structured as follows: {page_id => CommonsPage(title => “Some title”, revisions => [CommonsRevision, ...])}
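To make the target structure concrete, here is a minimal sketch that parses a tiny export-like document with the standard library. Everything in it is an assumption for illustration: `parse_dump` and `SAMPLE` are hypothetical, plain dicts stand in for CommonsPage and CommonsRevision, and the sample omits the XML namespace that real MediaWiki dumps declare.

```python
import xml.etree.ElementTree as ET

# Simplified sample; a real dump carries an xmlns attribute and more fields.
SAMPLE = """<mediawiki>
  <page>
    <title>File:Example.jpg</title>
    <id>42</id>
    <revision>
      <timestamp>2013-05-01T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>[[Category:Examples]]</text>
    </revision>
  </page>
</mediawiki>"""


def parse_dump(xml_text):
    """Return {page_id: {"title": ..., "revisions": [...]}} from the XML text."""
    pages = {}
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        revisions = []
        for rev in page.iter("revision"):
            revisions.append({
                "timestamp": rev.findtext("timestamp"),
                "username": rev.findtext("contributor/username"),
                "wikitext": rev.findtext("text"),
            })
        # findtext("id") only looks at direct children, so it reads the page id.
        pages[page.findtext("id")] = {
            "title": page.findtext("title"),
            "revisions": revisions,
        }
    return pages


parsed = parse_dump(SAMPLE)
```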
-
wm_metrics.analyse_commons_dump.timestamp_to_date(date)¶ Return a datetime object representing the given MediaWiki timestamp.
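MediaWiki XML dumps and API responses write timestamps in ISO 8601 form, such as 2013-05-01T10:00:00Z. Assuming that is the format handled here, the conversion can be sketched with `datetime.strptime`; `parse_mw_timestamp` is a hypothetical name, and the result is a naive datetime to be read as UTC.

```python
from datetime import datetime

MW_TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed input format


def parse_mw_timestamp(timestamp):
    """Convert an ISO 8601 MediaWiki timestamp string to a naive datetime."""
    return datetime.strptime(timestamp, MW_TIMESTAMP_FORMAT)


parsed_date = parse_mw_timestamp("2013-05-01T10:00:00Z")
```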
cat2cohort module¶
Export a Wiki category into a cohort.
The aim of this script is to allow program leaders to export a category filled with User pages into a WikiMetrics cohort CSV file in order to perform their evaluation analysis.
- Test:
- python cat2cohort.py -l fr -c "Utilisateur participant au projet Afripédia"
-
wm_metrics.cat2cohort.api_url(lang)¶ Return the URL of the API based on the language of Wikipedia.
-
wm_metrics.cat2cohort.cat_to_cohort(language, category)¶ Return the CSV cohort from the given category and language.
-
wm_metrics.cat2cohort.list_users(mw, category, lang)¶ List users from a wiki category and print lines of the cohort CSV.
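The overall flow (build the API URL for a language, then turn user page titles into cohort lines) can be sketched as below. The details are assumptions rather than the script's actual behaviour: `wikipedia_api_url` and `cohort_csv` are hypothetical names, the URL scheme follows the usual Wikipedia pattern, and the cohort format is assumed to be one "username,project" row per user.

```python
import csv
import io


def wikipedia_api_url(lang):
    """Return the api.php endpoint of the Wikipedia in the given language."""
    return "https://%s.wikipedia.org/w/api.php" % lang


def cohort_csv(user_page_titles, lang):
    """Return CSV text with one 'username,project' row per user page title."""
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for title in user_page_titles:
        # Drop the namespace prefix, e.g. "Utilisateur:Alice" -> "Alice".
        username = title.split(":", 1)[-1]
        writer.writerow([username, "%swiki" % lang])
    return output.getvalue()


rows = cohort_csv(["Utilisateur:Alice", "Utilisateur:Bob"], "fr")
```

Using the csv module rather than string concatenation keeps usernames that contain commas or quotes correctly escaped.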
-
wm_metrics.cat2cohort.main()¶ Main function of the script cat2cohort.
categorisation_statistics module¶
Categorisation statistics.
-
wm_metrics.categorisation_statistics.make_categorisation_report(all_categories, categories_count_per_file)¶ Compute statistics on the categorisation.
Return a text report on the categorisation.
commons_cat_metrics module¶
Metrics for FDC on an image category of Wikimedia Commons.
-
class
wm_metrics.commons_cat_metrics.CommonsCatMetrics(category, period, cursor=None)¶ Bases: object
Wrapper class for the category metrics.
-
close()¶ Close the MariaDB connection.
-
get_global_usage(main=False)¶ Get global usage metrics (total usages, number of images used, number of wikis) for the files in the category.
Parameters: main (boolean) – whether to count only usages in main namespaces.
-
get_nb_featured_files()¶ Number of files that are either Featured Pictures (FP), Valued Images (VI) or Quality Images (QI) on Wikimedia Commons.
-
get_nb_files()¶ Number of files uploaded during the period.
-
get_nb_files_alltime()¶ Number of files in the category.
-
get_nb_uploaders()¶ Number of uploaders during the period.
-
get_pixel_count()¶
-
make_report()¶ Return a text report with all metrics.
-
mw_util module¶
- mw_util.py
Set of helper functions while dealing with MediaWiki.
- str2cat
- Adds prefix Category if string doesn’t have it.
-
wm_metrics.mw_util.str2cat(category)¶ Return a category name starting with Category.
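The behaviour described (add the Category prefix only when it is missing) amounts to a short check. The sketch below uses a hypothetical name, `with_category_prefix`, and assumes an exact, case-sensitive match on the "Category:" prefix; the package's function may normalise input differently.

```python
def with_category_prefix(name):
    """Return name prefixed with 'Category:' unless it already has the prefix."""
    prefix = "Category:"
    if name.startswith(prefix):
        return name
    return prefix + name
```

For example, both "Paris" and "Category:Paris" yield "Category:Paris".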
wmflabs_queries module¶
wmflabs_queries.py groups query builder functions that generate queries for the wmflabs databases.
-
wm_metrics.wmflabs_queries.count_featured_files_in_category()¶ Count featured pictures in the category uploaded between timestamp t1 and t2.
-
wm_metrics.wmflabs_queries.count_files_in_category()¶ Count the files in the category uploaded between timestamps t1 and t2.
-
wm_metrics.wmflabs_queries.count_files_in_category_alltime()¶ Count files in the category (without limit on upload date) at the time of the query.
-
wm_metrics.wmflabs_queries.count_uploaders_in_category()¶ Count the distinct users who uploaded a file belonging to the category between timestamps t1 and t2.
-
wm_metrics.wmflabs_queries.global_usage_count(main=False)¶ Return the global usage query.
Parameters: - category (str) – category name
- main (bool) – optional; when true, count only files used in main namespaces
-
wm_metrics.wmflabs_queries.list_files_in_category(category, t1, t2)¶ List all files in category uploaded between timestamp t1 and t2
-
wm_metrics.wmflabs_queries.pixel_count()¶
External services¶
glamorous module¶
A GLAMorous parser to retrieve file usage across the Wikimedia projects.
-
class
wm_metrics.glamorous.GlamorousParser(category)¶ Bases: HTMLParser.HTMLParser, object
HTML parser for GLAMorous.
-
handle_data(data)¶ Parse data inside an HTML tag.
-
handle_endtag(tag)¶ Parse end of an HTML tag.
-
handle_starttag(tag, attrs)¶ Parse start of an HTML tag.
-
statistics()¶ Print GLAMorous statistics for the category.
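The three handle_* methods follow the standard HTMLParser callback pattern: track which tag is open, then collect the text seen inside it. The sketch below illustrates that pattern only. `TagTextCollector` and the sample markup are hypothetical, the real GLAMorous page layout is not reproduced, and the import uses the Python 3 module name (`html.parser`) where the package itself refers to Python 2's `HTMLParser`.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the non-empty text found inside every occurrence of one tag."""

    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.wanted_tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.values.append(data.strip())


collector = TagTextCollector("td")
collector.feed("<table><tr><td>fr.wikipedia</td><td>12</td></tr></table>")
```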
-
-
wm_metrics.glamorous.main()¶ Main function of the script glamorous.py.
mw_api module¶
mw_api.py is a simple client for the MediaWiki API.
-
class
wm_metrics.mw_api.MwApi(action, properties=None, format='json')¶ Bases: object
Access to API
-
class
wm_metrics.mw_api.MwApiQuery(properties=None, format='json')¶ Bases: wm_metrics.mw_api.MwApi
Query actions to the API.
-
exception
wm_metrics.mw_api.MwQueryError(value)¶ Bases: exceptions.Exception
Exception raised when the client encounters a problem.
-
class
wm_metrics.mw_api.MwWiki(url_api='https://commons.wikimedia.org/w/api.php')¶ Bases: object
Wiki API
-
process_prop_query(request, titles)¶ Quick and dirty prop query support.
-
process_prop_query_results(url_req, results)¶ Process the result of a prop query.
-
process_query(request, previous_result=None)¶ Quick and dirty continue support for list query.
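Continuation for list queries usually means repeating the request with the parameters returned in the response's "continue" object until that object is absent. The sketch below shows this loop under stated assumptions: `query_all` is a hypothetical function, the HTTP call is injected as `fetch` so that no network access is needed, and responses are assumed to use the modern "continue" key rather than the older "query-continue" form.

```python
def query_all(fetch, params, list_name):
    """Collect every item of a list query, following API continuation.

    fetch: callable taking a parameter dict and returning the decoded JSON.
    list_name: key under "query" holding the items, e.g. "categorymembers".
    """
    request = dict(params)
    results = []
    while True:
        response = fetch(request)
        results.extend(response.get("query", {}).get(list_name, []))
        if "continue" not in response:
            return results
        # Start again from the original parameters and add the continuation values.
        request = dict(params)
        request.update(response["continue"])
```

Passing `fetch` as an argument keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.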
-