youtube-dl/youtube_dl/extractor/iqm2.py

# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import smuggle_url

# IQM2 aka Accela is a municipal meeting management platform that
# (among other things) stores livestreamed video from municipal
# meetings.  In some cases (e.g. cambridgema.iqm2.com), after a hefty
# (several-hour) processing time, that video is available in easily
# downloadable form from their web portal, but prior to that, the
# video can only be watched in realtime through JWPlayer. Other
# instances (e.g. somervillecityma.iqm2.com) never seem to offer a
# downloadable form. This extractor is designed to download the
# realtime video without the download links being available. For more
# info on Accela, see:
#   http://www.iqm2.com/About/Accela.aspx
#   http://www.accela.com/
#   https://github.com/Accela-Inc/leg-man-api-docs

# This processing makes the extractor hard to test, since there's only
# a narrow window when it matters. After that, the extractor finds
# links to the processed video instead.

# No metadata is retrieved, as that would require finding a metadata
# URL and retrieving a third HTTP resource.

# Contributed by John Hawkinson <jhawk@mit.edu>, 6 Oct 2016.

class IQM2IE(InfoExtractor):

    # We commonly see both iqm2.com and IQM2.com.
    _VALID_URL = r'(?i)https?://(?:\w+\.)?iqm2\.com/Citizens/\w+\.aspx\?.*MeetingID=(?P<id>[0-9]+)'
    _TESTS = [
        {
            'url': 'http://somervillecityma.iqm2.com/Citizens/SplitView.aspx?Mode=Video&MeetingID=2308',
            'md5': '9ef458ff6c93f8b9323cf79db4ede9cf',
            'info_dict': {
                'id': '70472_480',
                'ext': 'mp4',
                'title': 'City of Somerville, Massachusetts',
                'uploader': 'somervillecityma.iqm2.com',
            }},
        {
            'url': 'http://cambridgema.iqm2.com/Citizens/SplitView.aspx?Mode=Video&MeetingID=1679#',
            'md5': '478ea30eee1966f7be0d8dd623122148',
            'info_dict': {
                'id': '1563_720',
                'ext': 'mp4',
                'title': 'Cambridge, MA (2)',
                'uploader': 'cambridgema.iqm2.com',
            }},
        {
            'url': 'https://CambridgeMA.IQM2.com/Citizens/VideoMain.aspx?MeetingID=1679',
            'only_matching': True,
        }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        # Simple extractor: take, e.g.
        #   http://cambridgema.iqm2.com/Citizens/SplitView.aspx?Mode=Video&MeetingID=1679
        # and look for
        #   <div id="VideoPanel" class="LeftTopContent">
        #     <div id="VideoPanelInner" ... src="/Citizens/VideoScreen.aspx?MediaID=1563&Frame=SplitView">
        # and feed the canonicalized src element to the generic extractor
        # [^>]* keeps the match inside the VideoPanelInner tag itself.
        inner_url_rel = self._html_search_regex(
            r'<div id="VideoPanelInner"[^>]*\bsrc="([^"]+)"',
            webpage, 'url')

        inner_url = compat_urlparse.urljoin(url, inner_url_rel)

        # Generic extractor matches this under the "Broaden the
        # findall a little bit: JWPlayer JS loader" (line 2372 as of 6
        # Oct 2016, dcdb292fddc82ae11f4c0b647815a45c88a6b6d5).
        return self.url_result(smuggle_url(inner_url, {'to_generic': True}),
                               'Generic')
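
The two steps the extractor performs before handing off to the generic extractor (matching the meeting ID, then resolving the relative `src` against the page URL) can be tried outside youtube-dl. The following is only a minimal sketch using the Python 3 standard library: `SAMPLE_PAGE` is a hypothetical fragment modelled on the markup quoted in the comments above, and `meeting_id` and `inner_player_url` are illustrative helper names, not part of youtube-dl.

```python
import re
from urllib.parse import urljoin

# Same shape as _VALID_URL above, with the dot before "aspx" escaped.
VALID_URL = (r'(?i)https?://(?:\w+\.)?iqm2\.com/Citizens/\w+\.aspx'
             r'\?.*MeetingID=(?P<id>[0-9]+)')

# Hypothetical page fragment (an assumption, based on the markup
# quoted in the extractor's comments).
SAMPLE_PAGE = (
    '<div id="VideoPanel" class="LeftTopContent">'
    '<div id="VideoPanelInner" class="x" '
    'src="/Citizens/VideoScreen.aspx?MediaID=1563&Frame=SplitView">'
    '</div></div>'
)


def meeting_id(url):
    """Return the MeetingID from an IQM2 URL, or None if it does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


def inner_player_url(page_url, html):
    """Find the VideoPanelInner src attribute and make it absolute."""
    m = re.search(r'<div id="VideoPanelInner"[^>]*\bsrc="([^"]+)"', html)
    if m is None:
        return None
    return urljoin(page_url, m.group(1))


if __name__ == '__main__':
    page = ('http://cambridgema.iqm2.com/Citizens/SplitView.aspx'
            '?Mode=Video&MeetingID=1679')
    print(meeting_id(page))
    print(inner_player_url(page, SAMPLE_PAGE))
```

Unlike `_html_search_regex`, this sketch does not unescape HTML entities in the captured value, so it is suitable only for checking the patterns, not as a replacement for the extractor.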