release 2018.03.20

[ChangeLog] Actualize
[ci skip]
2026-06-12 15:40:15 +00:00 · 2018-03-20 01:55:48 +07:00 · 2018-03-20 01:54:35 +07:00 · 2018-03-20 01:40:53 +07:00 · 2018-03-20 01:08:03 +07:00 · 2018-03-20 00:27:39 +07:00
17 changed files with 448 additions and 194 deletions
@@ -6,8 +6,8 @@

 ---

-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.03.14*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.03.14**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.03.20*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.03.20**

 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.03.14
+[debug] youtube-dl version 2018.03.20
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
@@ -1,3 +1,25 @@
+version 2018.03.20
+
+Core
+* [extractor/common] Improve thumbnail extraction for HTML5 entries
+* Generalize XML manifest processing code and improve XSPF parsing
++ [extractor/common] Add _download_xml_handle
++ [extractor/common] Add support for relative URIs in _parse_xspf (#15794)
+
+Extractors
++ [7plus] Extract series metadata (#15862, #15906)
+* [9now] Bypass geo restriction (#15920)
+* [cbs] Skip unavailable assets (#13490, #13506, #15776)
++ [canalc2] Add support for HTML5 videos (#15916, #15919)
++ [ceskatelevize] Add support for iframe embeds (#15918)
++ [prosiebensat1] Add support for galileo.tv (#15894)
++ [generic] Add support for xfileshare embeds (#15879)
+* [bilibili] Switch to v2 playurl API
+* [bilibili] Fix and improve extraction (#15048, #15430, #15622, #15863)
+* [heise] Improve extraction (#15496, #15784, #15026)
+* [instagram] Fix user videos extraction (#15858)
+
+
 version 2018.03.14

 Extractors
@@ -694,6 +694,55 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
                self.ie._sort_formats(formats)
                expect_value(self, formats, expected_formats, None)

+    def test_parse_xspf(self):
+        _TEST_CASES = [
+            (
+                'foo_xspf',
+                'https://example.org/src/foo_xspf.xspf',
+                [{
+                    'id': 'foo_xspf',
+                    'title': 'Pandemonium',
+                    'description': 'Visit http://bigbrother404.bandcamp.com',
+                    'duration': 202.416,
+                    'formats': [{
+                        'manifest_url': 'https://example.org/src/foo_xspf.xspf',
+                        'url': 'https://example.org/src/cd1/track%201.mp3',
+                    }],
+                }, {
+                    'id': 'foo_xspf',
+                    'title': 'Final Cartridge (Nichico Twelve Remix)',
+                    'description': 'Visit http://bigbrother404.bandcamp.com',
+                    'duration': 255.857,
+                    'formats': [{
+                        'manifest_url': 'https://example.org/src/foo_xspf.xspf',
+                        'url': 'https://example.org/%E3%83%88%E3%83%A9%E3%83%83%E3%82%AF%E3%80%80%EF%BC%92.mp3',
+                    }],
+                }, {
+                    'id': 'foo_xspf',
+                    'title': 'Rebuilding Nightingale',
+                    'description': 'Visit http://bigbrother404.bandcamp.com',
+                    'duration': 287.915,
+                    'formats': [{
+                        'manifest_url': 'https://example.org/src/foo_xspf.xspf',
+                        'url': 'https://example.org/src/track3.mp3',
+                    }, {
+                        'manifest_url': 'https://example.org/src/foo_xspf.xspf',
+                        'url': 'https://example.com/track3.mp3',
+                    }]
+                }]
+            ),
+        ]
+
+        for xspf_file, xspf_url, expected_entries in _TEST_CASES:
+            with io.open('./test/testdata/xspf/%s.xspf' % xspf_file,
+                         mode='r', encoding='utf-8') as f:
+                entries = self.ie._parse_xspf(
+                    compat_etree_fromstring(f.read().encode('utf-8')),
+                    xspf_file, xspf_url=xspf_url, xspf_base_url=xspf_url)
+                expect_value(self, entries, expected_entries, None)
+                for i in range(len(entries)):
+                    expect_dict(self, entries[i], expected_entries[i])
+

 if __name__ == '__main__':
    unittest.main()
@@ -0,0 +1,34 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<playlist version="1" xmlns="http://xspf.org/ns/0/">
+    <date>2018-03-09T18:01:43Z</date>
+    <trackList>
+        <track>
+            <location>cd1/track%201.mp3</location>
+            <title>Pandemonium</title>
+            <creator>Foilverb</creator>
+            <annotation>Visit http://bigbrother404.bandcamp.com</annotation>
+            <album>Pandemonium EP</album>
+            <trackNum>1</trackNum>
+            <duration>202416</duration>
+        </track>
+        <track>
+            <location>../%E3%83%88%E3%83%A9%E3%83%83%E3%82%AF%E3%80%80%EF%BC%92.mp3</location>
+            <title>Final Cartridge (Nichico Twelve Remix)</title>
+            <annotation>Visit http://bigbrother404.bandcamp.com</annotation>
+            <creator>Foilverb</creator>
+            <album>Pandemonium EP</album>
+            <trackNum>2</trackNum>
+            <duration>255857</duration>
+        </track>
+        <track>
+            <location>track3.mp3</location>
+            <location>https://example.com/track3.mp3</location>
+            <title>Rebuilding Nightingale</title>
+            <annotation>Visit http://bigbrother404.bandcamp.com</annotation>
+            <creator>Foilverb</creator>
+            <album>Pandemonium EP</album>
+            <trackNum>3</trackNum>
+            <duration>287915</duration>
+        </track>
+    </trackList>
+</playlist>
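
Reviewer note: the fixture above exercises relative `<location>` values. The following is a minimal standalone sketch of that resolution step, assuming the standard library's `urljoin` behaves like the `urljoin` helper used in `_parse_xspf`; the function name `resolve_xspf_locations` and the shortened `SAMPLE` document are illustrative and not part of this commit.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSPF_NS = {'xspf': 'http://xspf.org/ns/0/'}


def resolve_xspf_locations(xspf_text, base_url):
    """Return one list of absolute URLs per <track>, in document order."""
    # Parse from bytes so the XML declaration's encoding is honoured.
    doc = ET.fromstring(xspf_text.encode('utf-8'))
    tracks = []
    for track in doc.findall('./xspf:trackList/xspf:track', XSPF_NS):
        urls = []
        for location in track.findall('./xspf:location', XSPF_NS):
            if not location.text:
                continue  # skip empty <location> elements
            # Relative locations are resolved against the playlist URL;
            # absolute ones pass through unchanged.
            urls.append(urljoin(base_url, location.text.strip()))
        tracks.append(urls)
    return tracks


# Hypothetical, shortened variant of the fixture above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
    <trackList>
        <track><location>cd1/track%201.mp3</location></track>
        <track><location>../other.mp3</location></track>
        <track>
            <location>track3.mp3</location>
            <location>https://example.com/track3.mp3</location>
        </track>
    </trackList>
</playlist>"""

resolved = resolve_xspf_locations(
    SAMPLE, 'https://example.org/src/foo_xspf.xspf')
for urls in resolved:
    print(urls)
```

Passing the playlist URL itself as the base is enough here, because URL joining discards the last path segment of the base before appending a relative reference.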
@@ -27,14 +27,14 @@ class BiliBiliIE(InfoExtractor):

    _TESTS = [{
        'url': 'http://www.bilibili.tv/video/av1074402/',
-        'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
+        'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
        'info_dict': {
            'id': '1074402',
-            'ext': 'mp4',
+            'ext': 'flv',
            'title': '【金坷垃】金泡沫',
            'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
-            'duration': 308.315,
-            'timestamp': 1398012660,
+            'duration': 308.067,
+            'timestamp': 1398012678,
            'upload_date': '20140420',
            'thumbnail': r're:^https?://.+\.jpg',
            'uploader': '菊子桑',
@@ -59,17 +59,38 @@ class BiliBiliIE(InfoExtractor):
        'url': 'http://www.bilibili.com/video/av8903802/',
        'info_dict': {
            'id': '8903802',
-            'ext': 'mp4',
            'title': '阿滴英文｜英文歌分享#6 "Closer',
            'description': '滴妹今天唱Closer給你聽! 有史以来，被推最多次也是最久的歌曲，其实歌词跟我原本想像差蛮多的，不过还是好听！ 微博@阿滴英文',
-            'uploader': '阿滴英文',
-            'uploader_id': '65880958',
-            'timestamp': 1488382620,
-            'upload_date': '20170301',
-        },
-        'params': {
-            'skip_download': True,  # Test metadata only
        },
+        'playlist': [{
+            'info_dict': {
+                'id': '8903802_part1',
+                'ext': 'flv',
+                'title': '阿滴英文｜英文歌分享#6 "Closer',
+                'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
+                'uploader': '阿滴英文',
+                'uploader_id': '65880958',
+                'timestamp': 1488382634,
+                'upload_date': '20170301',
+            },
+            'params': {
+                'skip_download': True,  # Test metadata only
+            },
+        }, {
+            'info_dict': {
+                'id': '8903802_part2',
+                'ext': 'flv',
+                'title': '阿滴英文｜英文歌分享#6 "Closer',
+                'description': 'md5:3b1b9e25b78da4ef87e9b548b88ee76a',
+                'uploader': '阿滴英文',
+                'uploader_id': '65880958',
+                'timestamp': 1488382634,
+                'upload_date': '20170301',
+            },
+            'params': {
+                'skip_download': True,  # Test metadata only
+            },
+        }]
    }]

    _APP_KEY = '84956560bc028eb7'
@@ -92,9 +113,13 @@ class BiliBiliIE(InfoExtractor):
        webpage = self._download_webpage(url, video_id)

        if 'anime/' not in url:
-            cid = compat_parse_qs(self._search_regex(
-                [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
-                 r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
+            cid = self._search_regex(
+                r'cid(?:["\']:|=)(\d+)', webpage, 'cid',
+                default=None
+            ) or compat_parse_qs(self._search_regex(
+                [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
+                 r'EmbedPlayer\([^)]+,\s*\\"([^"]+)\\"\)',
+                 r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
                webpage, 'player parameters'))['cid'][0]
        else:
            if 'no_bangumi_tip' not in smuggled_data:
@@ -114,53 +139,66 @@ class BiliBiliIE(InfoExtractor):
                self._report_error(js)
            cid = js['result']['cid']

-        payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
-        sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
-
        headers = {
            'Referer': url
        }
        headers.update(self.geo_verification_headers())

-        video_info = self._download_json(
-            'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
-            video_id, note='Downloading video info page',
-            headers=headers)
-
-        if 'durl' not in video_info:
-            self._report_error(video_info)
-
        entries = []

-        for idx, durl in enumerate(video_info['durl']):
-            formats = [{
-                'url': durl['url'],
-                'filesize': int_or_none(durl['size']),
-            }]
-            for backup_url in durl.get('backup_url', []):
-                formats.append({
-                    'url': backup_url,
-                    # backup URLs have lower priorities
-                    'preference': -2 if 'hd.mp4' in backup_url else -3,
+        RENDITIONS = ('qn=80&quality=80&type=', 'quality=2&type=mp4')
+        for num, rendition in enumerate(RENDITIONS, start=1):
+            payload = 'appkey=%s&cid=%s&otype=json&%s' % (self._APP_KEY, cid, rendition)
+            sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
+
+            video_info = self._download_json(
+                'http://interface.bilibili.com/v2/playurl?%s&sign=%s' % (payload, sign),
+                video_id, note='Downloading video info page',
+                headers=headers, fatal=num == len(RENDITIONS))
+
+            if not video_info:
+                continue
+
+            if 'durl' not in video_info:
+                if num < len(RENDITIONS):
+                    continue
+                self._report_error(video_info)
+
+            for idx, durl in enumerate(video_info['durl']):
+                formats = [{
+                    'url': durl['url'],
+                    'filesize': int_or_none(durl['size']),
+                }]
+                for backup_url in durl.get('backup_url', []):
+                    formats.append({
+                        'url': backup_url,
+                        # backup URLs have lower priorities
+                        'preference': -2 if 'hd.mp4' in backup_url else -3,
+                    })
+
+                for a_format in formats:
+                    a_format.setdefault('http_headers', {}).update({
+                        'Referer': url,
+                    })
+
+                self._sort_formats(formats)
+
+                entries.append({
+                    'id': '%s_part%s' % (video_id, idx),
+                    'duration': float_or_none(durl.get('length'), 1000),
+                    'formats': formats,
                })
+            break

-            for a_format in formats:
-                a_format.setdefault('http_headers', {}).update({
-                    'Referer': url,
-                })
-
-            self._sort_formats(formats)
-
-            entries.append({
-                'id': '%s_part%s' % (video_id, idx),
-                'duration': float_or_none(durl.get('length'), 1000),
-                'formats': formats,
-            })
-
-        title = self._html_search_regex('<h1[^>]*>([^<]+)</h1>', webpage, 'title')
+        title = self._html_search_regex(
+            (r'<h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
+             r'(?s)<h1[^>]*>(?P<title>.+?)</h1>'), webpage, 'title',
+            group='title')
        description = self._html_search_meta('description', webpage)
        timestamp = unified_timestamp(self._html_search_regex(
-            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))
+            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time',
+            default=None) or self._html_search_meta(
+            'uploadDate', webpage, 'timestamp', default=None))
        thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)

        # TODO 'view_count' requires deobfuscating Javascript
@@ -174,13 +212,16 @@ class BiliBiliIE(InfoExtractor):
        }

        uploader_mobj = re.search(
-            r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
+            r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]*>(?P<name>[^<]+)',
            webpage)
        if uploader_mobj:
            info.update({
                'uploader': uploader_mobj.group('name'),
                'uploader_id': uploader_mobj.group('id'),
            })
+        if not info.get('uploader'):
+            info['uploader'] = self._html_search_meta(
+                'author', webpage, 'uploader', default=None)

        for entry in entries:
            entry.update(info)
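
Reviewer note: the BiliBili hunks above sign each query with an MD5 digest of the payload plus a secret, and try a list of renditions in order until one returns `durl`. Below is a rough standalone sketch of that control flow. `sign_query`, `first_working`, `fake_fetch`, and the `APPKEY`/`SECRET` placeholders are hypothetical names for illustration; no network request is made and no real key is used.

```python
import hashlib


def sign_query(payload, secret):
    """Append an MD5 signature computed over payload + secret."""
    digest = hashlib.md5((payload + secret).encode('utf-8')).hexdigest()
    return '%s&sign=%s' % (payload, digest)


def first_working(renditions, fetch):
    """Try each rendition in order; return the first response with 'durl'.

    Only a failure on the last rendition is treated as fatal.
    """
    for num, rendition in enumerate(renditions, start=1):
        info = fetch(rendition)
        if info and 'durl' in info:
            return info
        if num == len(renditions):
            raise RuntimeError('no rendition returned playable URLs')


def fake_fetch(rendition):
    # Hypothetical stand-in for the HTTP request: only the mp4 query works.
    if 'type=mp4' in rendition:
        return {'durl': [{'url': 'https://example.invalid/v.mp4'}]}
    return None


RENDITIONS = ('qn=80&quality=80&type=', 'quality=2&type=mp4')
info = first_working(RENDITIONS, fake_fetch)
signed = sign_query(
    'appkey=APPKEY&cid=123&otype=json&' + RENDITIONS[1], 'SECRET')
print(info['durl'][0]['url'])
print(signed)
```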
@@ -31,6 +31,10 @@ class Canalc2IE(InfoExtractor):
        webpage = self._download_webpage(
            'http://www.canalc2.tv/video/%s' % video_id, video_id)

+        title = self._html_search_regex(
+            r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.+?)</h3>',
+            webpage, 'title')
+
        formats = []
        for _, video_url in re.findall(r'file\s*=\s*(["\'])(.+?)\1', webpage):
            if video_url.startswith('rtmp://'):
@@ -49,17 +53,21 @@ class Canalc2IE(InfoExtractor):
                    'url': video_url,
                    'format_id': 'http',
                })
-        self._sort_formats(formats)

-        title = self._html_search_regex(
-            r'(?s)class="[^"]*col_description[^"]*">.*?<h3>(.*?)</h3>', webpage, 'title')
-        duration = parse_duration(self._search_regex(
-            r'id=["\']video_duree["\'][^>]*>([^<]+)',
-            webpage, 'duration', fatal=False))
+        if formats:
+            info = {
+                'formats': formats,
+            }
+        else:
+            info = self._parse_html5_media_entries(url, webpage, url)[0]

-        return {
+        self._sort_formats(info['formats'])
+
+        info.update({
            'id': video_id,
            'title': title,
-            'duration': duration,
-            'formats': formats,
-        }
+            'duration': parse_duration(self._search_regex(
+                r'id=["\']video_duree["\'][^>]*>([^<]+)',
+                webpage, 'duration', fatal=False)),
+        })
+        return info
@@ -2,6 +2,7 @@ from __future__ import unicode_literals

 from .theplatform import ThePlatformFeedIE
 from ..utils import (
+    ExtractorError,
    int_or_none,
    find_xpath_attr,
    xpath_element,
@@ -61,6 +62,7 @@ class CBSIE(CBSBaseIE):
        asset_types = []
        subtitles = {}
        formats = []
+        last_e = None
        for item in items_data.findall('.//item'):
            asset_type = xpath_text(item, 'assetType')
            if not asset_type or asset_type in asset_types:
@@ -74,11 +76,17 @@ class CBSIE(CBSBaseIE):
                query['formats'] = 'MPEG4,M3U'
            elif asset_type in ('RTMP', 'WIFI', '3G'):
                query['formats'] = 'MPEG4,FLV'
-            tp_formats, tp_subtitles = self._extract_theplatform_smil(
-                update_url_query(tp_release_url, query), content_id,
-                'Downloading %s SMIL data' % asset_type)
+            try:
+                tp_formats, tp_subtitles = self._extract_theplatform_smil(
+                    update_url_query(tp_release_url, query), content_id,
+                    'Downloading %s SMIL data' % asset_type)
+            except ExtractorError as e:
+                last_e = e
+                continue
            formats.extend(tp_formats)
            subtitles = self._merge_subtitles(subtitles, tp_subtitles)
+        if last_e and not formats:
+            raise last_e
        self._sort_formats(formats)

        info = self._extract_theplatform_metadata(tp_path, content_id)
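
Reviewer note: the CBS hunk above keeps going when one asset type fails and re-raises the last error only if no formats were collected at all. A minimal sketch of that pattern, outside youtube-dl, might look like this; `AssetError`, `collect_formats`, and `fake_fetch` are hypothetical stand-ins for `ExtractorError` and the SMIL download.

```python
class AssetError(Exception):
    """Hypothetical stand-in for ExtractorError."""


def collect_formats(asset_types, fetch_formats):
    """Gather formats from every asset type, tolerating partial failures."""
    formats = []
    last_error = None
    for asset_type in asset_types:
        try:
            formats.extend(fetch_formats(asset_type))
        except AssetError as e:
            # Remember the failure, but let the other asset types run.
            last_error = e
            continue
    # Surface the error only when nothing at all could be extracted.
    if last_error and not formats:
        raise last_error
    return formats


def fake_fetch(asset_type):
    # Hypothetical behaviour: the 'RTMP' asset is unavailable.
    if asset_type == 'RTMP':
        raise AssetError('%s asset is unavailable' % asset_type)
    return [{'format_id': asset_type.lower()}]


print(collect_formats(['HLS', 'RTMP'], fake_fetch))
```

The trade-off is that a partial failure is silent: if at least one asset type succeeds, errors from the others are dropped.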
@@ -13,6 +13,7 @@ from ..utils import (
    float_or_none,
    sanitized_Request,
    unescapeHTML,
+    update_url_query,
    urlencode_postdata,
    USER_AGENTS,
 )
@@ -265,6 +266,10 @@ class CeskaTelevizePoradyIE(InfoExtractor):
            # m3u8 download
            'skip_download': True,
        },
+    }, {
+        # iframe embed
+        'url': 'http://www.ceskatelevize.cz/porady/10614999031-neviditelni/21251212048/',
+        'only_matching': True,
    }]

    def _real_extract(self, url):
@@ -272,8 +277,11 @@ class CeskaTelevizePoradyIE(InfoExtractor):

        webpage = self._download_webpage(url, video_id)

-        data_url = unescapeHTML(self._search_regex(
-            r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
-            webpage, 'iframe player url', group='url'))
+        data_url = update_url_query(unescapeHTML(self._search_regex(
+            (r'<span[^>]*\bdata-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+             r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?ceskatelevize\.cz/ivysilani/embed/iFramePlayer\.php.*?)\1'),
+            webpage, 'iframe player url', group='url')), query={
+                'autoStart': 'true',
+        })

        return self.url_result(data_url, ie=CeskaTelevizeIE.ie_key())
@@ -644,19 +644,31 @@ class InfoExtractor(object):
            content, _ = res
            return content

+    def _download_xml_handle(
+            self, url_or_request, video_id, note='Downloading XML',
+            errnote='Unable to download XML', transform_source=None,
+            fatal=True, encoding=None, data=None, headers={}, query={}):
+        """Return a tuple (xml as an xml.etree.ElementTree.Element, URL handle)"""
+        res = self._download_webpage_handle(
+            url_or_request, video_id, note, errnote, fatal=fatal,
+            encoding=encoding, data=data, headers=headers, query=query)
+        if res is False:
+            return res
+        xml_string, urlh = res
+        return self._parse_xml(
+            xml_string, video_id, transform_source=transform_source,
+            fatal=fatal), urlh
+
    def _download_xml(self, url_or_request, video_id,
                      note='Downloading XML', errnote='Unable to download XML',
                      transform_source=None, fatal=True, encoding=None,
                      data=None, headers={}, query={}):
        """Return the xml as an xml.etree.ElementTree.Element"""
-        xml_string = self._download_webpage(
-            url_or_request, video_id, note, errnote, fatal=fatal,
-            encoding=encoding, data=data, headers=headers, query=query)
-        if xml_string is False:
-            return xml_string
-        return self._parse_xml(
-            xml_string, video_id, transform_source=transform_source,
-            fatal=fatal)
+        res = self._download_xml_handle(
+            url_or_request, video_id, note=note, errnote=errnote,
+            transform_source=transform_source, fatal=fatal, encoding=encoding,
+            data=data, headers=headers, query=query)
+        return res if res is False else res[0]

    def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
        if transform_source:
@@ -1694,22 +1706,24 @@ class InfoExtractor(object):
            })
        return subtitles

-    def _extract_xspf_playlist(self, playlist_url, playlist_id, fatal=True):
+    def _extract_xspf_playlist(self, xspf_url, playlist_id, fatal=True):
        xspf = self._download_xml(
-            playlist_url, playlist_id, 'Downloading xpsf playlist',
+            xspf_url, playlist_id, 'Downloading xspf playlist',
            'Unable to download xspf manifest', fatal=fatal)
        if xspf is False:
            return []
-        return self._parse_xspf(xspf, playlist_id)
+        return self._parse_xspf(
+            xspf, playlist_id, xspf_url=xspf_url,
+            xspf_base_url=base_url(xspf_url))

-    def _parse_xspf(self, playlist, playlist_id):
+    def _parse_xspf(self, xspf_doc, playlist_id, xspf_url=None, xspf_base_url=None):
        NS_MAP = {
            'xspf': 'http://xspf.org/ns/0/',
            's1': 'http://static.streamone.nl/player/ns/0',
        }

        entries = []
-        for track in playlist.findall(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP)):
+        for track in xspf_doc.findall(xpath_with_ns('./xspf:trackList/xspf:track', NS_MAP)):
            title = xpath_text(
                track, xpath_with_ns('./xspf:title', NS_MAP), 'title', default=playlist_id)
            description = xpath_text(
@@ -1719,12 +1733,18 @@ class InfoExtractor(object):
            duration = float_or_none(
                xpath_text(track, xpath_with_ns('./xspf:duration', NS_MAP), 'duration'), 1000)

-            formats = [{
-                'url': location.text,
-                'format_id': location.get(xpath_with_ns('s1:label', NS_MAP)),
-                'width': int_or_none(location.get(xpath_with_ns('s1:width', NS_MAP))),
-                'height': int_or_none(location.get(xpath_with_ns('s1:height', NS_MAP))),
-            } for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP))]
+            formats = []
+            for location in track.findall(xpath_with_ns('./xspf:location', NS_MAP)):
+                format_url = urljoin(xspf_base_url, location.text)
+                if not format_url:
+                    continue
+                formats.append({
+                    'url': format_url,
+                    'manifest_url': xspf_url,
+                    'format_id': location.get(xpath_with_ns('s1:label', NS_MAP)),
+                    'width': int_or_none(location.get(xpath_with_ns('s1:width', NS_MAP))),
+                    'height': int_or_none(location.get(xpath_with_ns('s1:height', NS_MAP))),
+                })
            self._sort_formats(formats)

            entries.append({
@@ -1738,18 +1758,18 @@ class InfoExtractor(object):
        return entries

    def _extract_mpd_formats(self, mpd_url, video_id, mpd_id=None, note=None, errnote=None, fatal=True, formats_dict={}):
-        res = self._download_webpage_handle(
+        res = self._download_xml_handle(
            mpd_url, video_id,
            note=note or 'Downloading MPD manifest',
            errnote=errnote or 'Failed to download MPD manifest',
            fatal=fatal)
        if res is False:
            return []
-        mpd, urlh = res
+        mpd_doc, urlh = res
        mpd_base_url = base_url(urlh.geturl())

        return self._parse_mpd_formats(
-            compat_etree_fromstring(mpd.encode('utf-8')), mpd_id, mpd_base_url,
+            mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
            formats_dict=formats_dict, mpd_url=mpd_url)

    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
@@ -2023,17 +2043,16 @@ class InfoExtractor(object):
        return formats

    def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnote=None, fatal=True):
-        res = self._download_webpage_handle(
+        res = self._download_xml_handle(
            ism_url, video_id,
            note=note or 'Downloading ISM manifest',
            errnote=errnote or 'Failed to download ISM manifest',
            fatal=fatal)
        if res is False:
            return []
-        ism, urlh = res
+        ism_doc, urlh = res

-        return self._parse_ism_formats(
-            compat_etree_fromstring(ism.encode('utf-8')), urlh.geturl(), ism_id)
+        return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)

    def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
        """
@@ -2131,8 +2150,8 @@ class InfoExtractor(object):
        return formats

    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None, preference=None):
-        def absolute_url(video_url):
-            return compat_urlparse.urljoin(base_url, video_url)
+        def absolute_url(item_url):
+            return urljoin(base_url, item_url)

        def parse_content_type(content_type):
            if not content_type:
@@ -2189,7 +2208,7 @@ class InfoExtractor(object):
            if src:
                _, formats = _media_formats(src, media_type)
                media_info['formats'].extend(formats)
-            media_info['thumbnail'] = media_attributes.get('poster')
+            media_info['thumbnail'] = absolute_url(media_attributes.get('poster'))
            if media_content:
                for source_tag in re.findall(r'<source[^>]+>', media_content):
                    source_attributes = extract_attributes(source_tag)
@@ -104,6 +104,7 @@ from .mediasite import MediasiteIE
 from .springboardplatform import SpringboardPlatformIE
 from .yapfiles import YapFilesIE
 from .vice import ViceIE
+from .xfileshare import XFileShareIE


 class GenericIE(InfoExtractor):
@@ -2231,7 +2232,11 @@ class GenericIE(InfoExtractor):
                self._sort_formats(smil['formats'])
                return smil
            elif doc.tag == '{http://xspf.org/ns/0/}playlist':
-                return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
+                return self.playlist_result(
+                    self._parse_xspf(
+                        doc, video_id, xspf_url=url,
+                        xspf_base_url=compat_str(full_response.geturl())),
+                    video_id)
            elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
                info_dict['formats'] = self._parse_mpd_formats(
                    doc,
@@ -2971,6 +2976,11 @@ class GenericIE(InfoExtractor):
            return self.playlist_from_matches(
                vice_urls, video_id, video_title, ie=ViceIE.ie_key())

+        xfileshare_urls = XFileShareIE._extract_urls(webpage)
+        if xfileshare_urls:
+            return self.playlist_from_matches(
+                xfileshare_urls, video_id, video_title, ie=XFileShareIE.ie_key())
+
        def merge_dicts(dict1, dict2):
            merged = {}
            for k, v in dict1.items():
@@ -7,6 +7,7 @@ from .youtube import YoutubeIE
 from ..utils import (
    determine_ext,
    int_or_none,
+    NO_DEFAULT,
    parse_iso8601,
    smuggle_url,
    xpath_text,
@@ -16,18 +17,19 @@ from ..utils import (
 class HeiseIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?heise\.de/(?:[^/]+/)+[^/]+-(?P<id>[0-9]+)\.html'
    _TESTS = [{
+        # kaltura embed
        'url': 'http://www.heise.de/video/artikel/Podcast-c-t-uplink-3-3-Owncloud-Tastaturen-Peilsender-Smartphone-2404147.html',
-        'md5': 'ffed432483e922e88545ad9f2f15d30e',
        'info_dict': {
-            'id': '2404147',
+            'id': '1_kkrq94sm',
            'ext': 'mp4',
            'title': "Podcast: c't uplink 3.3 – Owncloud / Tastaturen / Peilsender Smartphone",
-            'format_id': 'mp4_720p',
-            'timestamp': 1411812600,
-            'upload_date': '20140927',
+            'timestamp': 1512734959,
+            'upload_date': '20171208',
            'description': 'md5:c934cbfb326c669c2bcabcbe3d3fcd20',
-            'thumbnail': r're:^https?://.*/gallery/$',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
    }, {
        # YouTube embed
        'url': 'http://www.heise.de/newsticker/meldung/Netflix-In-20-Jahren-vom-Videoverleih-zum-TV-Revolutionaer-3814130.html',
@@ -46,13 +48,26 @@ class HeiseIE(InfoExtractor):
        },
    }, {
        'url': 'https://www.heise.de/video/artikel/nachgehakt-Wie-sichert-das-c-t-Tool-Restric-tor-Windows-10-ab-3700244.html',
-        'md5': '4b58058b46625bdbd841fc2804df95fc',
        'info_dict': {
            'id': '1_ntrmio2s',
+            'ext': 'mp4',
+            'title': "nachgehakt: Wie sichert das c't-Tool Restric'tor Windows 10 ab?",
+            'description': 'md5:47e8ffb6c46d85c92c310a512d6db271',
            'timestamp': 1512470717,
            'upload_date': '20171205',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://www.heise.de/ct/artikel/c-t-uplink-20-8-Staubsaugerroboter-Xiaomi-Vacuum-2-AR-Brille-Meta-2-und-Android-rooten-3959893.html',
+        'info_dict': {
+            'id': '1_59mk80sf',
            'ext': 'mp4',
-            'title': 'ct10 nachgehakt hos restrictor',
+            'title': "c't uplink 20.8: Staubsaugerroboter Xiaomi Vacuum 2, AR-Brille Meta 2 und Android rooten",
+            'description': 'md5:f50fe044d3371ec73a8f79fcebd74afc',
+            'timestamp': 1517567237,
+            'upload_date': '20180202',
        },
        'params': {
            'skip_download': True,
@@ -72,19 +87,40 @@ class HeiseIE(InfoExtractor):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

-        title = self._html_search_meta('fulltitle', webpage, default=None)
-        if not title or title == "c't":
-            title = self._search_regex(
-                r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
-                webpage, 'title')
+        def extract_title(default=NO_DEFAULT):
+            title = self._html_search_meta(
+                ('fulltitle', 'title'), webpage, default=None)
+            if not title or title == "c't":
+                title = self._search_regex(
+                    r'<div[^>]+class="videoplayerjw"[^>]+data-title="([^"]+)"',
+                    webpage, 'title', default=None)
+            if not title:
+                title = self._html_search_regex(
+                    r'<h1[^>]+\bclass=["\']article_page_title[^>]+>(.+?)<',
+                    webpage, 'title', default=default)
+            return title

-        yt_urls = YoutubeIE._extract_urls(webpage)
-        if yt_urls:
-            return self.playlist_from_matches(yt_urls, video_id, title, ie=YoutubeIE.ie_key())
+        title = extract_title(default=None)
+        description = self._og_search_description(
+            webpage, default=None) or self._html_search_meta(
+            'description', webpage)

        kaltura_url = KalturaIE._extract_url(webpage)
        if kaltura_url:
-            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
+            return {
+                '_type': 'url_transparent',
+                'url': smuggle_url(kaltura_url, {'source_url': url}),
+                'ie_key': KalturaIE.ie_key(),
+                'title': title,
+                'description': description,
+            }
+
+        yt_urls = YoutubeIE._extract_urls(webpage)
+        if yt_urls:
+            return self.playlist_from_matches(
+                yt_urls, video_id, title, ie=YoutubeIE.ie_key())
+
+        title = extract_title()

        container_id = self._search_regex(
            r'<div class="videoplayerjw"[^>]+data-container="([0-9]+)"',
@@ -115,10 +151,6 @@ class HeiseIE(InfoExtractor):
            })
        self._sort_formats(formats)

-        description = self._og_search_description(
-            webpage, default=None) or self._html_search_meta(
-            'description', webpage)
-
        return {
            'id': video_id,
            'title': title,
@@ -1,6 +1,6 @@
 from __future__ import unicode_literals

-import itertools
+import json
 import re

 from .common import InfoExtractor
@@ -238,70 +238,58 @@ class InstagramUserIE(InfoExtractor):
    }

    def _entries(self, uploader_id):
-        query = {
-            '__a': 1,
-        }
-
-        def get_count(kind):
+        def get_count(suffix):
            return int_or_none(try_get(
-                node, lambda x: x['%ss' % kind]['count']))
+                node, lambda x: x['edge_media_' + suffix]['count']))

-        for page_num in itertools.count(1):
-            page = self._download_json(
-                'https://instagram.com/%s/' % uploader_id, uploader_id,
-                note='Downloading page %d' % page_num,
-                fatal=False, query=query)
-            if not page:
-                break
-
-            nodes = try_get(page, lambda x: x['user']['media']['nodes'], list)
-            if not nodes:
-                break
-
-            max_id = None
-
-            for node in nodes:
-                node_id = node.get('id')
-                if node_id:
-                    max_id = node_id
-
-                if node.get('__typename') != 'GraphVideo' and node.get('is_video') is not True:
-                    continue
-                video_id = node.get('code')
-                if not video_id:
-                    continue
-
-                info = self.url_result(
-                    'https://instagram.com/p/%s/' % video_id,
-                    ie=InstagramIE.ie_key(), video_id=video_id)
-
-                description = try_get(
-                    node, [lambda x: x['caption'], lambda x: x['text']['id']],
-                    compat_str)
-                thumbnail = node.get('thumbnail_src') or node.get('display_src')
-                timestamp = int_or_none(node.get('date'))
-
-                comment_count = get_count('comment')
-                like_count = get_count('like')
-                view_count = int_or_none(node.get('video_views'))
-
-                info.update({
-                    'description': description,
-                    'thumbnail': thumbnail,
-                    'timestamp': timestamp,
-                    'comment_count': comment_count,
-                    'like_count': like_count,
-                    'view_count': view_count,
+        edges = self._download_json(
+            'https://www.instagram.com/graphql/query/', uploader_id, query={
+                'query_hash': '472f257a40c653c64c666ce877d59d2b',
+                'variables': json.dumps({
+                    'id': uploader_id,
+                    'first': 999999999,
                })
+            })['data']['user']['edge_owner_to_timeline_media']['edges']

-                yield info
+        for edge in edges:
+            node = edge['node']

-            if not max_id:
-                break
+            if node.get('__typename') != 'GraphVideo' and node.get('is_video') is not True:
+                continue
+            video_id = node.get('shortcode')
+            if not video_id:
+                continue

-            query['max_id'] = max_id
+            info = self.url_result(
+                'https://instagram.com/p/%s/' % video_id,
+                ie=InstagramIE.ie_key(), video_id=video_id)
+
+            description = try_get(
+                node, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
+                compat_str)
+            thumbnail = node.get('thumbnail_src') or node.get('display_src')
+            timestamp = int_or_none(node.get('taken_at_timestamp'))
+
+            comment_count = get_count('to_comment')
+            like_count = get_count('preview_like')
+            view_count = int_or_none(node.get('video_view_count'))
+
+            info.update({
+                'description': description,
+                'thumbnail': thumbnail,
+                'timestamp': timestamp,
+                'comment_count': comment_count,
+                'like_count': like_count,
+                'view_count': view_count,
+            })
+
+            yield info

    def _real_extract(self, url):
-        uploader_id = self._match_id(url)
+        username = self._match_id(url)
+        uploader_id = self._download_json(
+            'https://instagram.com/%s/' % username, username, query={
+                '__a': 1,
+            })['graphql']['user']['id']
        return self.playlist_result(
-            self._entries(uploader_id), uploader_id, uploader_id)
+            self._entries(uploader_id), username, username)
@@ -4,15 +4,17 @@ from __future__ import unicode_literals
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
+    ExtractorError,
    int_or_none,
    float_or_none,
-    ExtractorError,
+    smuggle_url,
 )


 class NineNowIE(InfoExtractor):
    IE_NAME = '9now.com.au'
    _VALID_URL = r'https?://(?:www\.)?9now\.com\.au/(?:[^/]+/){2}(?P<id>[^/?#]+)'
+    _GEO_COUNTRIES = ['AU']
    _TESTS = [{
        # clip
        'url': 'https://www.9now.com.au/afl-footy-show/2016/clip-ciql02091000g0hp5oktrnytc',
@@ -75,7 +77,9 @@ class NineNowIE(InfoExtractor):

        return {
            '_type': 'url_transparent',
-            'url': self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            'url': smuggle_url(
+                self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+                {'geo_countries': self._GEO_COUNTRIES}),
            'id': video_id,
            'title': title,
            'description': common_data.get('description'),
@@ -133,7 +133,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                            (?:
                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
                            )\.(?:de|at|ch)|
-                            ran\.de|fem\.com|advopedia\.de
+                            ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
                        )
                        /(?P<id>.+)
                    '''
@@ -326,6 +326,11 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
            'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
            'only_matching': True,
        },
+        {
+            # geo restricted to Germany
+            'url': 'https://www.galileo.tv/video/diese-emojis-werden-oft-missverstanden',
+            'only_matching': True,
+        },
        {
            'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
            'only_matching': True,
@@ -343,7 +348,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
        r'"clip_id"\s*:\s+"(\d+)"',
        r'clipid: "(\d+)"',
        r'clip[iI]d=(\d+)',
-        r'clip[iI]d\s*=\s*["\'](\d+)',
+        r'clip[iI][dD]\s*=\s*["\'](\d+)',
        r"'itemImageUrl'\s*:\s*'/dynamic/thumbnails/full/\d+/(\d+)",
        r'proMamsId&quot;\s*:\s*&quot;(\d+)',
        r'proMamsId"\s*:\s*"(\d+)',
@@ -4,22 +4,30 @@ from __future__ import unicode_literals
 import re

 from .brightcove import BrightcoveNewIE
-from ..utils import update_url_query
+from ..compat import compat_str
+from ..utils import (
+    try_get,
+    update_url_query,
+)


 class SevenPlusIE(BrightcoveNewIE):
    IE_NAME = '7plus'
    _VALID_URL = r'https?://(?:www\.)?7plus\.com\.au/(?P<path>[^?]+\?.*?\bepisode-id=(?P<id>[^&#]+))'
    _TESTS = [{
-        'url': 'https://7plus.com.au/BEAT?episode-id=BEAT-001',
+        'url': 'https://7plus.com.au/MTYS?episode-id=MTYS7-003',
        'info_dict': {
-            'id': 'BEAT-001',
+            'id': 'MTYS7-003',
            'ext': 'mp4',
-            'title': 'S1 E1 - Help / Lucy In The Sky With Diamonds',
-            'description': 'md5:37718bea20a8eedaca7f7361af566131',
+            'title': 'S7 E3 - Wind Surf',
+            'description': 'md5:29c6a69f21accda7601278f81b46483d',
            'uploader_id': '5303576322001',
-            'upload_date': '20171031',
-            'timestamp': 1509440068,
+            'upload_date': '20171201',
+            'timestamp': 1512106377,
+            'series': 'Mighty Ships',
+            'season_number': 7,
+            'episode_number': 3,
+            'episode': 'Wind Surf',
        },
        'params': {
            'format': 'bestvideo',
@@ -63,5 +71,14 @@ class SevenPlusIE(BrightcoveNewIE):
                    value = item.get(src_key)
                    if value:
                        info[dst_key] = value
+                info['series'] = try_get(
+                    item, lambda x: x['seriesLogo']['name'], compat_str)
+                mobj = re.search(r'^S(\d+)\s+E(\d+)\s+-\s+(.+)$', info['title'])
+                if mobj:
+                    info.update({
+                        'season_number': int(mobj.group(1)),
+                        'episode_number': int(mobj.group(2)),
+                        'episode': mobj.group(3),
+                    })

        return info
@@ -118,6 +118,15 @@ class XFileShareIE(InfoExtractor):
        'only_matching': True
    }]

+    @staticmethod
+    def _extract_urls(webpage):
+        return [
+            mobj.group('url')
+            for mobj in re.finditer(
+                r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:%s)/embed-[0-9a-zA-Z]+.*?)\1'
+                % '|'.join(site for site in list(zip(*XFileShareIE._SITES))[0]),
+                webpage)]
+
    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2018.03.14'
+__version__ = '2018.03.20'
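
The season/episode parsing that the diff above adds to `SevenPlusIE` can be tried in isolation. The following is a minimal sketch, not the extractor itself: it reuses the regular expression from the diff, while the helper name `parse_episode_title` and the choice to return an empty dict on a non-matching title are assumptions made for illustration.

```python
import re

# Same pattern as in the SevenPlusIE hunk: "S<season> E<episode> - <episode title>"
TITLE_RE = re.compile(r'^S(\d+)\s+E(\d+)\s+-\s+(.+)$')


def parse_episode_title(title):
    """Hypothetical helper: parse season/episode metadata out of a title.

    Returns an empty dict when the title does not follow the expected layout,
    mirroring the `if mobj:` guard in the diff.
    """
    mobj = TITLE_RE.search(title)
    if not mobj:
        return {}
    return {
        'season_number': int(mobj.group(1)),
        'episode_number': int(mobj.group(2)),
        'episode': mobj.group(3),
    }


# Title taken from the updated test case in the diff
print(parse_episode_title('S7 E3 - Wind Surf'))
# A title without the "S.. E.. -" prefix yields no extra metadata
print(parse_episode_title('Wind Surf'))
```

Because the guard simply skips the update when the pattern does not match, titles in other layouts keep whatever metadata was already extracted.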
Author	SHA1	Message	Date
Sergey M․	a66d1d079a	release 2018.03.20	2018-03-20 01:55:48 +07:00
Sergey M․	c651de39d5	[ChangeLog] Actualize [ci skip]	2018-03-20 01:54:35 +07:00
Sergey M․	d9e2240f7c	[7plus] Extract series metadata (closes #15862, closes #15906)	2018-03-20 01:40:53 +07:00
Sergey M․	832f9d5258	[9now] Bypass geo restriction (closes #15920)	2018-03-20 01:08:03 +07:00
Sergey M․	21dedcb580	[cbs] Skip unavailable assets (closes #13490, closes #13506, closes #15776)	2018-03-20 00:27:39 +07:00
Sergey M․	6780154e6b	[extractor/common] Improve thumbnail extraction for HTML5 entries	2018-03-19 23:43:53 +07:00
Sergey M․	38f59e2793	[canalc2] Add support for HTML5 videos (closes #15916, closes #15919)	2018-03-19 23:40:19 +07:00
Sergey M․	9a054fcbba	[ceskatelevize] Add support for iframe embeds (closes #15918)	2018-03-19 23:29:53 +07:00
kayb94	6e3f23d912	[prosiebensat1] Add support for galileo.tv (closes #15894)	2018-03-19 04:14:33 +07:00
Sergey M․	47a5cb7734	Generalize XML manifest processing code and improve XSPF parsing (closes #15794)	2018-03-18 02:52:17 +07:00
Sergey M․	e0d198c18d	[extractor/common] Add _download_xml_handle	2018-03-18 02:52:01 +07:00
Ricardo Constantino	96b8b9abae	[extractor/generic] Support relative URIs in _parse_xspf <location> can have relative URIs, not just absolute.	2018-03-18 02:48:44 +07:00
Sergey M․	178ee88319	[generic] Add support for xfileshare embeds (closes #15879)	2018-03-17 23:57:07 +07:00
Sergey M․	d123960857	[bilibili] Switch to v2 playurl API	2018-03-16 03:18:53 +07:00
Sergey M․	3526c3043b	[bilibili] Fix and improve extraction (closes #15048, closes #15430, closes #15622, closes #15863)	2018-03-16 00:19:17 +07:00
Sergey M․	8e70c1bfac	[heise] Improve extraction (closes #15496, closes #15784, closes #15026)	2018-03-15 23:09:24 +07:00
Remita Amine	27b1c73f14	[instagram] fix user videos extraction(fixes #15858)	2018-03-15 14:33:36 +01:00
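
For the Instagram fix listed above (27b1c73f14), the new request passes a JSON-encoded `variables` parameter alongside a `query_hash`. The sketch below only shows how such a query string is assembled; it performs no network request. The endpoint, hash, and `first` value are copied from the diff and may no longer be accepted by Instagram. The function name `build_timeline_query_url` is hypothetical, and the sketch assumes Python 3's `urllib.parse`, whereas the extractor goes through youtube-dl's own `_download_json` with a `query` argument.

```python
import json
from urllib.parse import urlencode

# Values copied from the diff; treat them as historical, not as a stable API
GRAPHQL_URL = 'https://www.instagram.com/graphql/query/'
QUERY_HASH = '472f257a40c653c64c666ce877d59d2b'


def build_timeline_query_url(user_id, first=999999999):
    """Hypothetical helper: build the GraphQL timeline URL for a numeric user id."""
    query = {
        'query_hash': QUERY_HASH,
        # The variables are sent as a single JSON-encoded string parameter
        'variables': json.dumps({'id': user_id, 'first': first}),
    }
    return GRAPHQL_URL + '?' + urlencode(query)


# '1234567890' is a placeholder id, not a real account
print(build_timeline_query_url('1234567890'))
```

Note that the diff resolves the numeric id first, by fetching the profile page with `__a=1`, and keeps the username only for the playlist id and title.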