1
0
mirror of https://github.com/ytdl-org/youtube-dl.git synced 2026-06-10 22:50:19 +00:00

Compare commits

..

33 Commits

Author SHA1 Message Date
Sergey M․ 9e18bb4c67 release 2018.05.09 2018-05-09 00:36:47 +07:00
Sergey M․ 44277998ad [ChangeLog] Actualize
[ci skip]
2018-05-09 00:34:39 +07:00
Sergey M․ 05108a496a [YoutubeDL] Ensure ext exists for automatic captions 2018-05-08 22:57:52 +07:00
Sergey M․ 2fbd86352e [udemy] Extract asset captions 2018-05-08 22:57:01 +07:00
Sergey M․ 0ce76801e8 [udemy] Extract stream URLs (closes #16372) 2018-05-08 22:33:35 +07:00
Sergey M․ 789b7774a7 [businessinsider] Add extractor (closes #16387, closes #16388, closes #16389) 2018-05-06 21:58:55 +07:00
Sergey M․ 660a230b2d [cloudflarestream] Add support for cloudflare streams (closes #16375) 2018-05-05 01:21:52 +07:00
Sergey M․ a90a6b54ee [watchbox] Fix extraction (closes #16356) 2018-05-02 20:43:34 +07:00
Remita Amine 3cc0d0b829 [discovery] extract Affiliate/Anonymous Auth Token from cookies(closes #14954) 2018-05-02 09:32:53 +01:00
Sergey M․ ea1f5e5dbd [itv:btcc] Add extractor (closes #16139) 2018-05-02 07:21:24 +07:00
Sergey M․ 5f95927a62 Improve geo bypass mechanism
* Introduce geo bypass context
* Add ability to bypass based on IP blocks in CIDR notation
* Introduce --geo-bypass-ip-block
2018-05-02 07:20:59 +07:00
Sergey M․ a93ce61bd5 [tunein] Use live title for live streams (closes #16347) 2018-05-02 01:29:44 +07:00
Sergey M․ c18142da6e [itv] Improve extraction (closes #16253) 2018-05-01 22:48:08 +07:00
Sergey M․ cc42941390 release 2018.05.01 2018-05-01 03:38:57 +07:00
Sergey M․ cc5772c4f0 [ChangeLog] Actualize
[ci skip]
2018-05-01 03:30:23 +07:00
Sergey M․ c21692fa94 [kaltura] Improve iframe embeds detection (closes #16337) 2018-05-01 03:09:04 +07:00
Sergey M․ 8513963468 [udemy] Extract outputs renditions (closes #16289, closes #16291, closes #16320, closes #16321, closes #16334, closes #16335) 2018-05-01 02:15:43 +07:00
Sergey M․ 67ca1a8ef7 [zattoo] Improve and simplify (closes #14676) 2018-05-01 01:50:30 +07:00
Alex Seiler 4a73354586 [zattoo] Add extractor (closes #14668) 2018-05-01 01:50:07 +07:00
Sergey M․ 796bf9de45 [yandexmusic] Convert release_year to int 2018-04-29 22:56:07 +07:00
Sergey M․ e5eadfa82f [udemy,xiami,yandexmusic] Override _download_webpage_handle instead of _download_webpage 2018-04-29 22:54:52 +07:00
Niklas Haas 30226342ab [youtube] Correctly disable polymer on all requests
Rather than just the one that use the _download_webpage helper. The need
for this was made apparent by 0fe7783e, which refactored
_download_json in a way that completely avoids the use of
_download_webpage, thus breaking youtube.

Fixes #16323
2018-04-29 22:35:16 +07:00
Bastian de Groot 01aec84880 [generic] Prefer enclosures over links in RSS feeds 2018-04-29 22:14:37 +07:00
Meneth32 12b0d4e0e1 [redditr] Add support for old.reddit.com URLs 2018-04-29 21:59:40 +07:00
Sergey M․ 106c8c3edb [nrktv] Update API host (closes #16324) 2018-04-29 19:04:40 +07:00
Sergey M․ 500a86a52e [downloader/fragment] Restart download if .ytdl file is corrupt (closes #16312) 2018-04-29 00:33:31 +07:00
Sergey M․ 7dd6ab4a47 [imdb] Extract all formats (closes #16249) 2018-04-28 04:51:39 +07:00
Sergey M․ ae1c585cee [vimeo] Extract JSON LD (closes #16295) 2018-04-28 02:51:18 +07:00
Sergey M․ e7e4a6e0f9 [extractor/common] Extract interaction statistic 2018-04-28 02:48:03 +07:00
Sergey M․ 6cc622327f [utils] Introduce merge_dicts 2018-04-28 02:47:17 +07:00
Sergey M․ 0fe7783ece [extractor/common] Add _download_json_handle 2018-04-28 01:59:15 +07:00
Sergey M․ c84eae4f66 [funk:channel] Improve extraction (closes #16285) 2018-04-27 03:45:52 +07:00
Sergey M․ d3711b0050 [devscripts/gh-pages/generate-download.py] Use program checksum from versions.json 2018-04-25 02:14:27 +07:00
38 changed files with 926 additions and 166 deletions
+3 -3
View File
@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.04.25*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.04.25**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.05.09*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.05.09**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.04.25
[debug] youtube-dl version 2018.05.09
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
+44
View File
@@ -1,3 +1,47 @@
version 2018.05.09
Core
* [YoutubeDL] Ensure ext exists for automatic captions
* Introduce --geo-bypass-ip-block
Extractors
+ [udemy] Extract asset captions
+ [udemy] Extract stream URLs (#16372)
+ [businessinsider] Add support for businessinsider.com (#16387, #16388, #16389)
+ [cloudflarestream] Add support for cloudflarestream.com (#16375)
* [watchbox] Fix extraction (#16356)
* [discovery] Extract Affiliate/Anonymous Auth Token from cookies (#14954)
+ [itv:btcc] Add support for itv.com/btcc (#16139)
* [tunein] Use live title for live streams (#16347)
* [itv] Improve extraction (#16253)
version 2018.05.01
Core
* [downloader/fragment] Restart download if .ytdl file is corrupt (#16312)
+ [extractor/common] Extract interaction statistic
+ [utils] Add merge_dicts
+ [extractor/common] Add _download_json_handle
Extractors
* [kaltura] Improve iframe embeds detection (#16337)
+ [udemy] Extract outputs renditions (#16289, #16291, #16320, #16321, #16334,
#16335)
+ [zattoo] Add support for zattoo.com and mobiltv.quickline.com (#14668, #14676)
* [yandexmusic] Convert release_year to int
* [udemy] Override _download_webpage_handle instead of _download_webpage
* [xiami] Override _download_webpage_handle instead of _download_webpage
* [yandexmusic] Override _download_webpage_handle instead of _download_webpage
* [youtube] Correctly disable polymer on all requests (#16323, #16326)
* [generic] Prefer enclosures over links in RSS feeds (#16189)
+ [redditr] Add support for old.reddit.com URLs (#16274)
* [nrktv] Update API host (#16324)
+ [imdb] Extract all formats (#16249)
+ [vimeo] Extract JSON-LD (#16295)
* [funk:channel] Improve extraction (#16285)
version 2018.04.25
Core
+3
View File
@@ -116,6 +116,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-2
country code (experimental)
--geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with
explicitly provided IP block in CIDR
notation (experimental)
## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
+7 -12
View File
@@ -1,27 +1,22 @@
#!/usr/bin/env python3
from __future__ import unicode_literals
import hashlib
import urllib.request
import json
versions_info = json.load(open('update/versions.json'))
version = versions_info['latest']
URL = versions_info['versions'][version]['bin'][0]
data = urllib.request.urlopen(URL).read()
version_dict = versions_info['versions'][version]
# Read template page
with open('download.html.in', 'r', encoding='utf-8') as tmplf:
template = tmplf.read()
sha256sum = hashlib.sha256(data).hexdigest()
template = template.replace('@PROGRAM_VERSION@', version)
template = template.replace('@PROGRAM_URL@', URL)
template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
template = template.replace('@TAR_URL@', versions_info['versions'][version]['tar'][0])
template = template.replace('@TAR_SHA256SUM@', versions_info['versions'][version]['tar'][1])
template = template.replace('@PROGRAM_URL@', version_dict['bin'][0])
template = template.replace('@PROGRAM_SHA256SUM@', version_dict['bin'][1])
template = template.replace('@EXE_URL@', version_dict['exe'][0])
template = template.replace('@EXE_SHA256SUM@', version_dict['exe'][1])
template = template.replace('@TAR_URL@', version_dict['tar'][0])
template = template.replace('@TAR_SHA256SUM@', version_dict['tar'][1])
with open('download.html', 'w', encoding='utf-8') as dlf:
dlf.write(template)
+7
View File
@@ -122,6 +122,7 @@
- **BRMediathek**: Bayerischer Rundfunk Mediathek
- **bt:article**: Bergens Tidende Articles
- **bt:vestlendingen**: Bergens Tidende - Vestlendingen
- **BusinessInsider**
- **BuzzFeed**
- **BYUtv**
- **Camdemy**
@@ -163,6 +164,7 @@
- **ClipRs**
- **Clipsyndicate**
- **CloserToTruth**
- **CloudflareStream**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
@@ -373,6 +375,7 @@
- **Ir90Tv**
- **ITTF**
- **ITV**
- **ITVBTCC**
- **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
@@ -667,6 +670,8 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **Quickline**
- **QuicklineLive**
- **R7**
- **R7Article**
- **radio.de**
@@ -1092,6 +1097,8 @@
- **youtube:watchlater**: Youtube watch later list, ":ytwatchlater" for short (requires authentication)
- **Zapiks**
- **Zaq1**
- **Zattoo**
- **ZattooLive**
- **ZDF**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn
+12
View File
@@ -42,6 +42,7 @@ from youtube_dl.utils import (
is_html,
js_to_json,
limit_length,
merge_dicts,
mimetype2ext,
month_by_name,
multipart_encode,
@@ -669,6 +670,17 @@ class TestUtil(unittest.TestCase):
self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
def test_merge_dicts(self):
self.assertEqual(merge_dicts({'a': 1}, {'b': 2}), {'a': 1, 'b': 2})
self.assertEqual(merge_dicts({'a': 1}, {'a': 2}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {'a': None}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {'a': ''}), {'a': 1})
self.assertEqual(merge_dicts({'a': 1}, {}), {'a': 1})
self.assertEqual(merge_dicts({'a': None}, {'a': 1}), {'a': 1})
self.assertEqual(merge_dicts({'a': ''}, {'a': 1}), {'a': ''})
self.assertEqual(merge_dicts({'a': ''}, {'a': 'abc'}), {'a': 'abc'})
self.assertEqual(merge_dicts({'a': None}, {'a': ''}, {'a': 'abc'}), {'a': 'abc'})
def test_encode_compat_str(self):
self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
+18 -10
View File
@@ -286,6 +286,9 @@ class YoutubeDL(object):
Two-letter ISO 3166-2 country code that will be used for
explicit geographic restriction bypassing via faking
X-Forwarded-For HTTP header (experimental)
geo_bypass_ip_block:
IP range in CIDR notation that will be used similarly to
geo_bypass_country (experimental)
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
@@ -1479,23 +1482,28 @@ class YoutubeDL(object):
if info_dict.get('%s_number' % field) is not None and not info_dict.get(field):
info_dict[field] = '%s %d' % (field.capitalize(), info_dict['%s_number' % field])
for cc_kind in ('subtitles', 'automatic_captions'):
cc = info_dict.get(cc_kind)
if cc:
for _, subtitle in cc.items():
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
automatic_captions = info_dict.get('automatic_captions')
subtitles = info_dict.get('subtitles')
if subtitles:
for _, subtitle in subtitles.items():
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if subtitle_format.get('ext') is None:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
if self.params.get('listsubtitles', False):
if 'automatic_captions' in info_dict:
self.list_subtitles(info_dict['id'], info_dict.get('automatic_captions'), 'automatic captions')
self.list_subtitles(
info_dict['id'], automatic_captions, 'automatic captions')
self.list_subtitles(info_dict['id'], subtitles, 'subtitles')
return
info_dict['requested_subtitles'] = self.process_subtitles(
info_dict['id'], subtitles,
info_dict.get('automatic_captions'))
info_dict['id'], subtitles, automatic_captions)
# We now pick which formats have to be downloaded
if info_dict.get('formats') is None:
+1
View File
@@ -430,6 +430,7 @@ def _real_main(argv=None):
'config_location': opts.config_location,
'geo_bypass': opts.geo_bypass,
'geo_bypass_country': opts.geo_bypass_country,
'geo_bypass_ip_block': opts.geo_bypass_ip_block,
# just for deprecation check
'autonumber': opts.autonumber if opts.autonumber is True else None,
'usetitle': opts.usetitle if opts.usetitle is True else None,
+16 -5
View File
@@ -74,9 +74,14 @@ class FragmentFD(FileDownloader):
return not ctx['live'] and not ctx['tmpfilename'] == '-'
def _read_ytdl_file(self, ctx):
assert 'ytdl_corrupt' not in ctx
stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
stream.close()
try:
ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
except Exception:
ctx['ytdl_corrupt'] = True
finally:
stream.close()
def _write_ytdl_file(self, ctx):
frag_index_stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
@@ -158,11 +163,17 @@ class FragmentFD(FileDownloader):
if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
self._read_ytdl_file(ctx)
if ctx['fragment_index'] > 0 and resume_len == 0:
is_corrupt = ctx.get('ytdl_corrupt') is True
is_inconsistent = ctx['fragment_index'] > 0 and resume_len == 0
if is_corrupt or is_inconsistent:
message = (
'.ytdl file is corrupt' if is_corrupt else
'Inconsistent state of incomplete fragment download')
self.report_warning(
'Inconsistent state of incomplete fragment download. '
'Restarting from the beginning...')
'%s. Restarting from the beginning...' % message)
ctx['fragment_index'] = resume_len = 0
if 'ytdl_corrupt' in ctx:
del ctx['ytdl_corrupt']
self._write_ytdl_file(ctx)
else:
self._write_ytdl_file(ctx)
+3 -1
View File
@@ -277,7 +277,9 @@ class AnvatoIE(InfoExtractor):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
mobj = re.match(self._VALID_URL, url)
access_key, video_id = mobj.group('access_key_or_mcp', 'id')
+4 -1
View File
@@ -669,7 +669,10 @@ class BrightcoveNewIE(AdobePassIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
'ip_blocks': smuggled_data.get('geo_ip_blocks'),
})
account_id, player_id, embed, video_id = re.match(self._VALID_URL, url).groups()
+42
View File
@@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .jwplatform import JWPlatformIE
class BusinessInsiderIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
'info_dict': {
'id': 'hZRllCfw',
'ext': 'mp4',
'title': "Here's how much radiation you're exposed to in everyday life",
'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
'upload_date': '20170709',
'timestamp': 1499606400,
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
'only_matching': True,
}, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id')
return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
video_id=video_id)
+60
View File
@@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:
(?:watch\.)?cloudflarestream\.com/|
embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=
)
(?P<id>[\da-f]+)
'''
_TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': {
'id': '31c9291ab41fac05471db4e73aa11717',
'ext': 'mp4',
'title': '31c9291ab41fac05471db4e73aa11717',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://watch.cloudflarestream.com/9df17203414fd1db3e3ed74abbe936c1',
'only_matching': True,
}, {
'url': 'https://cloudflarestream.com/31c9291ab41fac05471db4e73aa11717/manifest/video.mpd',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.cloudflarestream\.com/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
formats = self._extract_m3u8_formats(
'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False)
formats.extend(self._extract_mpd_formats(
'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
'formats': formats,
}
+136 -26
View File
@@ -346,6 +346,11 @@ class InfoExtractor(object):
geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental)
_GEO_IP_BLOCKS attribute may contain a list of presumably geo unrestricted
IP blocks in CIDR notation for this extractor. One of these IP blocks
will be used by geo restriction bypass mechanism similarly
to _GEO_COUNTRIES. (experimental)
NB: both these geo attributes are experimental and may change in future
or be completely removed.
@@ -358,6 +363,7 @@ class InfoExtractor(object):
_x_forwarded_for_ip = None
_GEO_BYPASS = True
_GEO_COUNTRIES = None
_GEO_IP_BLOCKS = None
_WORKING = True
def __init__(self, downloader=None):
@@ -392,12 +398,15 @@ class InfoExtractor(object):
def initialize(self):
"""Initializes an instance (authentication, etc)."""
self._initialize_geo_bypass(self._GEO_COUNTRIES)
self._initialize_geo_bypass({
'countries': self._GEO_COUNTRIES,
'ip_blocks': self._GEO_IP_BLOCKS,
})
if not self._ready:
self._real_initialize()
self._ready = True
def _initialize_geo_bypass(self, countries):
def _initialize_geo_bypass(self, geo_bypass_context):
"""
Initialize geo restriction bypass mechanism.
@@ -408,28 +417,82 @@ class InfoExtractor(object):
HTTP requests.
This method will be used for initial geo bypass mechanism initialization
during the instance initialization with _GEO_COUNTRIES.
during the instance initialization with _GEO_COUNTRIES and
_GEO_IP_BLOCKS.
You may also manually call it from extractor's code if geo countries
You may also manually call it from extractor's code if geo bypass
information is not available beforehand (e.g. obtained during
extraction) or due to some another reason.
extraction) or due to some other reason. In this case you should pass
this information in geo bypass context passed as first argument. It may
contain following fields:
countries: List of geo unrestricted countries (similar
to _GEO_COUNTRIES)
ip_blocks: List of geo unrestricted IP blocks in CIDR notation
(similar to _GEO_IP_BLOCKS)
"""
if not self._x_forwarded_for_ip:
country_code = self._downloader.params.get('geo_bypass_country', None)
# If there is no explicit country for geo bypass specified and
# the extractor is known to be geo restricted let's fake IP
# as X-Forwarded-For right away.
if (not country_code and
self._GEO_BYPASS and
self._downloader.params.get('geo_bypass', True) and
countries):
country_code = random.choice(countries)
if country_code:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country_code)
# Geo bypass mechanism is explicitly disabled by user
if not self._downloader.params.get('geo_bypass', True):
return
if not geo_bypass_context:
geo_bypass_context = {}
# Backward compatibility: previously _initialize_geo_bypass
# expected a list of countries, some 3rd party code may still use
# it this way
if isinstance(geo_bypass_context, (list, tuple)):
geo_bypass_context = {
'countries': geo_bypass_context,
}
# The whole point of geo bypass mechanism is to fake IP
# as X-Forwarded-For HTTP header based on some IP block or
# country code.
# Path 1: bypassing based on IP block in CIDR notation
# Explicit IP block specified by user, use it right away
# regardless of whether extractor is geo bypassable or not
ip_block = self._downloader.params.get('geo_bypass_ip_block', None)
# Otherwise use random IP block from geo bypass context but only
# if extractor is known as geo bypassable
if not ip_block:
ip_blocks = geo_bypass_context.get('ip_blocks')
if self._GEO_BYPASS and ip_blocks:
ip_block = random.choice(ip_blocks)
if ip_block:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(ip_block)
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(
'[debug] Using fake IP %s as X-Forwarded-For.'
% self._x_forwarded_for_ip)
return
# Path 2: bypassing based on country code
# Explicit country code specified by user, use it right away
# regardless of whether extractor is geo bypassable or not
country = self._downloader.params.get('geo_bypass_country', None)
# Otherwise use random country code from geo bypass context but
# only if extractor is known as geo bypassable
if not country:
countries = geo_bypass_context.get('countries')
if self._GEO_BYPASS and countries:
country = random.choice(countries)
if country:
self._x_forwarded_for_ip = GeoUtils.random_ipv4(country)
if self._downloader.params.get('verbose', False):
self._downloader.to_screen(
'[debug] Using fake IP %s (%s) as X-Forwarded-For.'
% (self._x_forwarded_for_ip, country_code.upper()))
% (self._x_forwarded_for_ip, country.upper()))
def extract(self, url):
"""Extracts URL information and returns it in list of dicts."""
@@ -682,18 +745,30 @@ class InfoExtractor(object):
else:
self.report_warning(errmsg + str(ve))
def _download_json(self, url_or_request, video_id,
note='Downloading JSON metadata',
errnote='Unable to download JSON metadata',
transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
json_string = self._download_webpage(
def _download_json_handle(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return a tuple (JSON object, URL handle)"""
res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
if (not fatal) and json_string is False:
return None
if res is False:
return res
json_string, urlh = res
return self._parse_json(
json_string, video_id, transform_source=transform_source, fatal=fatal)
json_string, video_id, transform_source=transform_source,
fatal=fatal), urlh
def _download_json(
self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}):
res = self._download_json_handle(
url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query)
return res if res is False else res[0]
def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
if transform_source:
@@ -1008,6 +1083,40 @@ class InfoExtractor(object):
if isinstance(json_ld, dict):
json_ld = [json_ld]
INTERACTION_TYPE_MAP = {
'CommentAction': 'comment',
'AgreeAction': 'like',
'DisagreeAction': 'dislike',
'LikeAction': 'like',
'DislikeAction': 'dislike',
'ListenAction': 'view',
'WatchAction': 'view',
'ViewAction': 'view',
}
def extract_interaction_statistic(e):
interaction_statistic = e.get('interactionStatistic')
if not isinstance(interaction_statistic, list):
return
for is_e in interaction_statistic:
if not isinstance(is_e, dict):
continue
if is_e.get('@type') != 'InteractionCounter':
continue
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
continue
interaction_count = int_or_none(is_e.get('userInteractionCount'))
if interaction_count is None:
continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
if not count_kind:
continue
count_key = '%s_count' % count_kind
if info.get(count_key) is not None:
continue
info[count_key] = interaction_count
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
@@ -1023,6 +1132,7 @@ class InfoExtractor(object):
'height': int_or_none(e.get('height')),
'view_count': int_or_none(e.get('interactionCount')),
})
extract_interaction_statistic(e)
for e in json_ld:
if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
+26 -11
View File
@@ -5,7 +5,10 @@ import re
import string
from .discoverygo import DiscoveryGoBaseIE
from ..compat import compat_str
from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import (
ExtractorError,
try_get,
@@ -55,15 +58,27 @@ class DiscoveryIE(DiscoveryGoBaseIE):
video = next(cb for cb in content_blocks if cb.get('type') == 'video')['content']['items'][0]
video_id = video['id']
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
access_token = None
cookies = self._get_cookies(url)
# prefer Affiliate Auth Token over Anonymous Auth Token
auth_storage_cookie = cookies.get('eosAf') or cookies.get('eosAn')
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(compat_urllib_parse_unquote(
compat_urllib_parse_unquote(auth_storage_cookie.value)),
video_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
'https://www.%s.com/anonymous' % site, display_id, query={
'authRel': 'authorization',
'client_id': try_get(
react_data, lambda x: x['application']['apiClientId'],
compat_str) or '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
})['access_token']
try:
stream = self._download_json(
@@ -72,7 +87,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
'Authorization': 'Bearer ' + access_token,
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
e_description = self._parse_json(
e.cause.read().decode(), display_id)['description']
if 'resource not available for country' in e_description:
+3 -1
View File
@@ -102,7 +102,9 @@ class DPlayIE(InfoExtractor):
display_id = mobj.group('id')
domain = mobj.group('domain')
self._initialize_geo_bypass([mobj.group('country').upper()])
self._initialize_geo_bypass({
'countries': [mobj.group('country').upper()],
})
webpage = self._download_webpage(url, display_id)
+12 -1
View File
@@ -137,6 +137,7 @@ from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .businessinsider import BusinessInsiderIE
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
@@ -195,6 +196,7 @@ from .clippit import ClippitIE
from .cliprs import ClipRsIE
from .clipsyndicate import ClipsyndicateIE
from .closertotruth import CloserToTruthIE
from .cloudflarestream import CloudflareStreamIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
@@ -477,7 +479,10 @@ from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .itv import ITVIE
from .itv import (
ITVIE,
ITVBTCCIE,
)
from .ivi import (
IviIE,
IviCompilationIE
@@ -1418,5 +1423,11 @@ from .youtube import (
)
from .zapiks import ZapiksIE
from .zaq1 import Zaq1IE
from .zattoo import (
QuicklineIE,
QuicklineLiveIE,
ZattooIE,
ZattooLiveIE,
)
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import ZingMp3IE
+40 -11
View File
@@ -5,7 +5,10 @@ import re
from .common import InfoExtractor
from .nexx import NexxIE
from ..utils import int_or_none
from ..utils import (
int_or_none,
try_get,
)
class FunkBaseIE(InfoExtractor):
@@ -77,6 +80,20 @@ class FunkChannelIE(FunkBaseIE):
'params': {
'skip_download': True,
},
}, {
# only available via byIdList API
'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
'info_dict': {
'id': '205067',
'ext': 'mp4',
'title': 'Martin Sonneborn erklärt die EU',
'description': 'md5:050f74626e4ed87edf4626d2024210c0',
'timestamp': 1494424042,
'upload_date': '20170510',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
'only_matching': True,
@@ -87,16 +104,28 @@ class FunkChannelIE(FunkBaseIE):
channel_id = mobj.group('id')
alias = mobj.group('alias')
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}, query={
'channelId': channel_id,
'size': 100,
})['result']
headers = {
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}
video = next(r for r in results if r.get('alias') == alias)
video = None
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList', channel_id,
headers=headers, query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video:
results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
headers=headers, query={
'channelId': channel_id,
'size': 100,
})['result']
video = next(r for r in results if r.get('alias') == alias)
return self._make_url_result(video)
+55 -36
View File
@@ -23,6 +23,7 @@ from ..utils import (
is_html,
js_to_json,
KNOWN_EXTENSIONS,
merge_dicts,
mimetype2ext,
orderedSet,
sanitized_Request,
@@ -106,6 +107,7 @@ from .springboardplatform import SpringboardPlatformIE
from .yapfiles import YapFilesIE
from .vice import ViceIE
from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE
class GenericIE(InfoExtractor):
@@ -190,6 +192,16 @@ class GenericIE(InfoExtractor):
'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624',
}
},
# RSS feed with enclosures and unsupported link URLs
{
'url': 'http://www.hellointernet.fm/podcast?format=rss',
'info_dict': {
'id': 'http://www.hellointernet.fm/podcast?format=rss',
'description': 'CGP Grey and Brady Haran talk about YouTube, life, work, whatever.',
'title': 'Hello Internet',
},
'playlist_mincount': 100,
},
# SMIL from http://videolectures.net/promogram_igor_mekjavic_eng
{
'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/video/1/smil.xml',
@@ -1271,6 +1283,23 @@ class GenericIE(InfoExtractor):
},
'add_ie': ['Kaltura'],
},
{
# Kaltura iframe embed, more sophisticated
'url': 'http://www.cns.nyu.edu/~eero/math-tools/Videos/lecture-05sep2017.html',
'info_dict': {
'id': '1_9gzouybz',
'ext': 'mp4',
'title': 'lecture-05sep2017',
'description': 'md5:40f347d91fd4ba047e511c5321064b49',
'upload_date': '20170913',
'uploader_id': 'eps2',
'timestamp': 1505340777,
},
'params': {
'skip_download': True,
},
'add_ie': ['Kaltura'],
},
{
# meta twitter:player
'url': 'http://thechive.com/2017/12/08/all-i-want-for-christmas-is-more-twerk/',
@@ -1443,21 +1472,6 @@ class GenericIE(InfoExtractor):
},
'expected_warnings': ['Failed to parse JSON Expecting value'],
},
# Ooyala embed
{
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'info_dict': {
'id': '50YnY4czr4ms1vJ7yz3xzq0excz_pUMs',
'ext': 'mp4',
'description': 'Index/Match versus VLOOKUP.',
'title': 'This is what separates the Excel masters from the wannabes',
'duration': 191.933,
},
'params': {
# m3u8 downloads
'skip_download': True,
}
},
# Brightcove URL in single quotes
{
'url': 'http://www.sportsnet.ca/baseball/mlb/sn-presents-russell-martin-world-citizen/',
@@ -1985,6 +1999,19 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
{
# CloudflareStream embed
'url': 'https://www.cloudflare.com/products/cloudflare-stream/',
'info_dict': {
'id': '31c9291ab41fac05471db4e73aa11717',
'ext': 'mp4',
'title': '31c9291ab41fac05471db4e73aa11717',
},
'add_ie': [CloudflareStreamIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
'url': 'http://share-videos.se/auto/video/83645793?uid=13',
'md5': 'b68d276de422ab07ee1d49388103f457',
@@ -2025,13 +2052,15 @@ class GenericIE(InfoExtractor):
entries = []
for it in doc.findall('./channel/item'):
next_url = xpath_text(it, 'link', fatal=False)
next_url = None
enclosure_nodes = it.findall('./enclosure')
for e in enclosure_nodes:
next_url = e.attrib.get('url')
if next_url:
break
if not next_url:
enclosure_nodes = it.findall('./enclosure')
for e in enclosure_nodes:
next_url = e.attrib.get('url')
if next_url:
break
next_url = xpath_text(it, 'link', fatal=False)
if not next_url:
continue
@@ -2995,6 +3024,11 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
xfileshare_urls, video_id, video_title, ie=XFileShareIE.ie_key())
cloudflarestream_urls = CloudflareStreamIE._extract_urls(webpage)
if cloudflarestream_urls:
return self.playlist_from_matches(
cloudflarestream_urls, video_id, video_title, ie=CloudflareStreamIE.ie_key())
sharevideos_urls = [mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
webpage)]
@@ -3002,21 +3036,6 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches(
sharevideos_urls, video_id, video_title)
def merge_dicts(dict1, dict2):
merged = {}
for k, v in dict1.items():
if v is not None:
merged[k] = v
for k, v in dict2.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged
# Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries:
+1 -1
View File
@@ -123,7 +123,7 @@ class GoIE(AdobePassIE):
'adobe_requestor_id': requestor_id,
})
else:
self._initialize_geo_bypass(['US'])
self._initialize_geo_bypass({'countries': ['US']})
entitlement = self._download_json(
'https://api.entitlement.watchabc.go.com/vp2/ws-secure/entitlement/2020/authorize.json',
video_id, data=urlencode_postdata(data))
+21 -13
View File
@@ -3,7 +3,9 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
determine_ext,
mimetype2ext,
qualities,
remove_end,
@@ -73,19 +75,25 @@ class ImdbIE(InfoExtractor):
video_info_list = format_info.get('videoInfoList')
if not video_info_list or not isinstance(video_info_list, list):
continue
video_info = video_info_list[0]
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url:
continue
format_id = format_info.get('ffname')
formats.append({
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
})
for video_info in video_info_list:
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url or not isinstance(video_url, compat_str):
continue
if (video_info.get('videoMimeType') == 'application/x-mpegURL' or
determine_ext(video_url) == 'm3u8'):
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
format_id = format_info.get('ffname')
formats.append({
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
})
self._sort_formats(formats)
return {
+47 -1
View File
@@ -7,6 +7,7 @@ import json
import re
from .common import InfoExtractor
from .brightcove import BrightcoveNewIE
from ..compat import (
compat_str,
compat_etree_register_namespace,
@@ -18,6 +19,7 @@ from ..utils import (
xpath_text,
int_or_none,
parse_duration,
smuggle_url,
ExtractorError,
determine_ext,
)
@@ -41,6 +43,14 @@ class ITVIE(InfoExtractor):
# unavailable via data-playlist-url
'url': 'https://www.itv.com/hub/through-the-keyhole/2a2271a0033',
'only_matching': True,
}, {
# InvalidVodcrid
'url': 'https://www.itv.com/hub/james-martins-saturday-morning/2a5159a0034',
'only_matching': True,
}, {
# ContentUnavailable
'url': 'https://www.itv.com/hub/whos-doing-the-dishes/2a2898a0024',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -127,7 +137,8 @@ class ITVIE(InfoExtractor):
if fault_code == 'InvalidGeoRegion':
self.raise_geo_restricted(
msg=fault_string, countries=self._GEO_COUNTRIES)
elif fault_code != 'InvalidEntity':
elif fault_code not in (
'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'):
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, fault_string), expected=True)
info.update({
@@ -251,3 +262,38 @@ class ITVIE(InfoExtractor):
'subtitles': subtitles,
})
return info
class ITVBTCCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?itv\.com/btcc/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://www.itv.com/btcc/races/btcc-2018-all-the-action-from-brands-hatch',
'info_dict': {
'id': 'btcc-2018-all-the-action-from-brands-hatch',
'title': 'BTCC 2018: All the action from Brands Hatch',
},
'playlist_mincount': 9,
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1582188683001/HkiHLnNRx_default/index.html?videoId=%s'
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result(
smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {
# ITV does not like some GB IP ranges, so here are some
# IP blocks it accepts
'geo_ip_blocks': [
'193.113.0.0/16', '54.36.162.0/23', '159.65.16.0/21'
],
'referrer': url,
}),
ie=BrightcoveNewIE.ie_key(), video_id=video_id)
for video_id in re.findall(r'data-video-id=["\'](\d+)', webpage)]
title = self._og_search_title(webpage, fatal=False)
return self.playlist_result(entries, playlist_id, title)
+2 -1
View File
@@ -136,9 +136,10 @@ class KalturaIE(InfoExtractor):
re.search(
r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
(?:https?:)?//(?:(?:www|cdnapi)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?:(?!(?P=q1)).)*
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
(?:(?!(?P=q1)).)*
(?P=q1)
''', webpage)
)
+3 -1
View File
@@ -282,7 +282,9 @@ class LimelightMediaIE(LimelightBaseIE):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
pc, mobile, metadata = self._extract(
video_id, 'getPlaylistByMediaId',
+1 -1
View File
@@ -237,7 +237,7 @@ class NRKTVIE(NRKBaseIE):
(?:/\d{2}-\d{2}-\d{4})?
(?:\#del=(?P<part_id>\d+))?
''' % _EPISODE_RE
_API_HOST = 'psapi-ne.nrk.no'
_API_HOST = 'psapi-we.nrk.no'
_TESTS = [{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
+5 -1
View File
@@ -47,7 +47,7 @@ class RedditIE(InfoExtractor):
class RedditRIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
_VALID_URL = r'(?P<url>https?://(?:(?:www|old)\.)?reddit\.com/r/[^/]+/comments/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/',
'info_dict': {
@@ -74,6 +74,10 @@ class RedditRIE(InfoExtractor):
# imgur
'url': 'https://www.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
'only_matching': True,
}, {
# imgur @ old reddit
'url': 'https://old.reddit.com/r/MadeMeSmile/comments/6t7wi5/wait_for_it/',
'only_matching': True,
}, {
# streamable
'url': 'https://www.reddit.com/r/videos/comments/6t7sg9/comedians_hilarious_joke_about_the_guam_flag/',
+1 -1
View File
@@ -62,7 +62,7 @@ class TuneInBaseIE(InfoExtractor):
return {
'id': content_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'formats': formats,
'thumbnail': thumbnail,
'location': location,
+4 -2
View File
@@ -227,14 +227,16 @@ class TVPlayIE(InfoExtractor):
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
self._initialize_geo_bypass(smuggled_data.get('geo_countries'))
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
video_id = self._match_id(url)
geo_country = self._search_regex(
r'https?://[^/]+\.([a-z]{2})', url,
'geo country', default=None)
if geo_country:
self._initialize_geo_bypass([geo_country.upper()])
self._initialize_geo_bypass({'countries': [geo_country.upper()]})
video = self._download_json(
'http://playapi.mtgx.tv/v3/videos/%s' % video_id, video_id, 'Downloading video JSON')
+33 -6
View File
@@ -18,6 +18,7 @@ from ..utils import (
int_or_none,
js_to_json,
sanitized_Request,
try_get,
unescapeHTML,
urlencode_postdata,
)
@@ -58,6 +59,10 @@ class UdemyIE(InfoExtractor):
# no url in outputs format entry
'url': 'https://www.udemy.com/learn-web-development-complete-step-by-step-guide-to-success/learn/v4/t/lecture/4125812',
'only_matching': True,
}, {
# only outputs rendition
'url': 'https://www.udemy.com/how-you-can-help-your-local-community-5-amazing-examples/learn/v4/t/lecture/3225750?start=0',
'only_matching': True,
}]
def _extract_course_info(self, webpage, video_id):
@@ -101,7 +106,7 @@ class UdemyIE(InfoExtractor):
% (course_id, lecture_id),
lecture_id, 'Downloading lecture JSON', query={
'fields[lecture]': 'title,description,view_html,asset',
'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,data',
'fields[asset]': 'asset_type,stream_url,thumbnail_url,download_urls,stream_urls,captions,data',
})
def _handle_error(self, response):
@@ -115,9 +120,9 @@ class UdemyIE(InfoExtractor):
error_str += ' - %s' % error_data.get('formErrors')
raise ExtractorError(error_str, expected=True)
def _download_webpage(self, *args, **kwargs):
def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
return super(UdemyIE, self)._download_webpage(
return super(UdemyIE, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
def _download_json(self, url_or_request, *args, **kwargs):
@@ -299,9 +304,25 @@ class UdemyIE(InfoExtractor):
'url': src,
})
download_urls = asset.get('download_urls')
if isinstance(download_urls, dict):
extract_formats(download_urls.get('Video'))
for url_kind in ('download', 'stream'):
urls = asset.get('%s_urls' % url_kind)
if isinstance(urls, dict):
extract_formats(urls.get('Video'))
captions = asset.get('captions')
if isinstance(captions, list):
for cc in captions:
if not isinstance(cc, dict):
continue
cc_url = cc.get('url')
if not cc_url or not isinstance(cc_url, compat_str):
continue
lang = try_get(cc, lambda x: x['locale']['locale'], compat_str)
sub_dict = (automatic_captions if cc.get('source') == 'auto'
else subtitles)
sub_dict.setdefault(lang or 'en', []).append({
'url': cc_url,
})
view_html = lecture.get('view_html')
if view_html:
@@ -357,6 +378,12 @@ class UdemyIE(InfoExtractor):
fatal=False)
extract_subtitles(text_tracks)
if not formats and outputs:
for format_id, output in outputs.items():
f = extract_output_format(output, format_id)
if f.get('url'):
formats.append(f)
self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
return {
+9 -4
View File
@@ -16,6 +16,7 @@ from ..utils import (
ExtractorError,
InAdvancePagedList,
int_or_none,
merge_dicts,
NO_DEFAULT,
RegexNotFoundError,
sanitized_Request,
@@ -639,16 +640,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
'preference': 1,
})
info_dict = self._parse_config(config, video_id)
formats.extend(info_dict['formats'])
info_dict_config = self._parse_config(config, video_id)
formats.extend(info_dict_config['formats'])
self._vimeo_sort_formats(formats)
json_ld = self._search_json_ld(webpage, video_id, default={})
if not cc_license:
cc_license = self._search_regex(
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
webpage, 'license', default=None, group='license')
info_dict.update({
info_dict = {
'id': video_id,
'formats': formats,
'timestamp': unified_timestamp(timestamp),
@@ -658,7 +661,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': like_count,
'comment_count': comment_count,
'license': cc_license,
})
}
info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
return info_dict
+1 -1
View File
@@ -69,7 +69,7 @@ class WatchBoxIE(InfoExtractor):
source = self._parse_json(
self._search_regex(
r'(?s)source\s*:\s*({.+?})\s*,\s*\n', webpage, 'source',
r'(?s)source["\']?\s*:\s*({.+?})\s*[,}]', webpage, 'source',
default='{}'),
video_id, transform_source=js_to_json, fatal=False) or {}
+2 -2
View File
@@ -9,8 +9,8 @@ from ..utils import int_or_none
class XiamiBaseIE(InfoExtractor):
_API_BASE_URL = 'http://www.xiami.com/song/playlist/cat/json/id'
def _download_webpage(self, *args, **kwargs):
webpage = super(XiamiBaseIE, self)._download_webpage(*args, **kwargs)
def _download_webpage_handle(self, *args, **kwargs):
webpage = super(XiamiBaseIE, self)._download_webpage_handle(*args, **kwargs)
if '>Xiami is currently not available in your country.<' in webpage:
self.raise_geo_restricted('Xiami is currently not available in your country')
return webpage
+6 -6
View File
@@ -34,8 +34,8 @@ class YandexMusicBaseIE(InfoExtractor):
'youtube-dl with --cookies',
expected=True)
def _download_webpage(self, *args, **kwargs):
webpage = super(YandexMusicBaseIE, self)._download_webpage(*args, **kwargs)
def _download_webpage_handle(self, *args, **kwargs):
webpage = super(YandexMusicBaseIE, self)._download_webpage_handle(*args, **kwargs)
if 'Нам очень жаль, но&nbsp;запросы, поступившие с&nbsp;вашего IP-адреса, похожи на&nbsp;автоматические.' in webpage:
self._raise_captcha()
return webpage
@@ -57,14 +57,14 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
'info_dict': {
'id': '4878838',
'ext': 'mp3',
'title': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio - Gypsy Eyes 1',
'title': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari - Gypsy Eyes 1',
'filesize': 4628061,
'duration': 193.04,
'track': 'Gypsy Eyes 1',
'album': 'Gypsy Soul',
'album_artist': 'Carlo Ambrosio',
'artist': 'Carlo Ambrosio & Fabio Di Bari, Carlo Ambrosio',
'release_year': '2009',
'artist': 'Carlo Ambrosio, Carlo Ambrosio & Fabio Di Bari',
'release_year': 2009,
},
'skip': 'Travis CI servers blocked by YandexMusic',
}
@@ -120,7 +120,7 @@ class YandexMusicTrackIE(YandexMusicBaseIE):
track_info.update({
'album': album.get('title'),
'album_artist': extract_artist(album.get('artists')),
'release_year': compat_str(year) if year else None,
'release_year': int_or_none(year),
})
track_artist = extract_artist(track.get('artists'))
+2 -2
View File
@@ -246,9 +246,9 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
return True
def _download_webpage(self, *args, **kwargs):
def _download_webpage_handle(self, *args, **kwargs):
kwargs.setdefault('query', {})['disable_polymer'] = 'true'
return super(YoutubeBaseInfoExtractor, self)._download_webpage(
return super(YoutubeBaseInfoExtractor, self)._download_webpage_handle(
*args, **compat_kwargs(kwargs))
def _real_initialize(self):
+270
View File
@@ -0,0 +1,270 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from uuid import uuid4
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
ExtractorError,
int_or_none,
try_get,
urlencode_postdata,
)
class ZattooBaseIE(InfoExtractor):
_NETRC_MACHINE = 'zattoo'
_HOST_URL = 'https://zattoo.com'
_power_guide_hash = None
def _login(self):
(username, password) = self._get_login_info()
if not username or not password:
self.raise_login_required(
'A valid %s account is needed to access this media.'
% self._NETRC_MACHINE)
try:
data = self._download_json(
'%s/zapi/v2/account/login' % self._HOST_URL, None, 'Logging in',
data=urlencode_postdata({
'login': username,
'password': password,
'remember': 'true',
}), headers={
'Referer': '%s/login' % self._HOST_URL,
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
raise ExtractorError(
'Unable to login: incorrect username and/or password',
expected=True)
raise
self._power_guide_hash = data['session']['power_guide_hash']
def _real_initialize(self):
webpage = self._download_webpage(
self._HOST_URL, None, 'Downloading app token')
app_token = self._html_search_regex(
r'appToken\s*=\s*(["\'])(?P<token>(?:(?!\1).)+?)\1',
webpage, 'app token', group='token')
app_version = self._html_search_regex(
r'<!--\w+-(.+?)-', webpage, 'app version', default='2.8.2')
# Will setup appropriate cookies
self._request_webpage(
'%s/zapi/v2/session/hello' % self._HOST_URL, None,
'Opening session', data=urlencode_postdata({
'client_app_token': app_token,
'uuid': compat_str(uuid4()),
'lang': 'en',
'app_version': app_version,
'format': 'json',
}))
self._login()
def _extract_cid(self, video_id, channel_name):
channel_groups = self._download_json(
'%s/zapi/v2/cached/channels/%s' % (self._HOST_URL,
self._power_guide_hash),
video_id, 'Downloading channel list',
query={'details': False})['channel_groups']
channel_list = []
for chgrp in channel_groups:
channel_list.extend(chgrp['channels'])
try:
return next(
chan['cid'] for chan in channel_list
if chan.get('cid') and (
chan.get('display_alias') == channel_name or
chan.get('cid') == channel_name))
except StopIteration:
raise ExtractorError('Could not extract channel id')
def _extract_cid_and_video_info(self, video_id):
data = self._download_json(
'%s/zapi/program/details' % self._HOST_URL,
video_id,
'Downloading video information',
query={
'program_id': video_id,
'complete': True
})
p = data['program']
cid = p['cid']
info_dict = {
'id': video_id,
'title': p.get('title') or p['episode_title'],
'description': p.get('description'),
'thumbnail': p.get('image_url'),
'creator': p.get('channel_name'),
'episode': p.get('episode_title'),
'episode_number': int_or_none(p.get('episode_number')),
'season_number': int_or_none(p.get('season_number')),
'release_year': int_or_none(p.get('year')),
'categories': try_get(p, lambda x: x['categories'], list),
}
return cid, info_dict
def _extract_formats(self, cid, video_id, record_id=None, is_live=False):
postdata_common = {
'https_watch_urls': True,
}
if is_live:
postdata_common.update({'timeshift': 10800})
url = '%s/zapi/watch/live/%s' % (self._HOST_URL, cid)
elif record_id:
url = '%s/zapi/watch/recording/%s' % (self._HOST_URL, record_id)
else:
url = '%s/zapi/watch/recall/%s/%s' % (self._HOST_URL, cid, video_id)
formats = []
for stream_type in ('dash', 'hls', 'hls5', 'hds'):
postdata = postdata_common.copy()
postdata['stream_type'] = stream_type
data = self._download_json(
url, video_id, 'Downloading %s formats' % stream_type.upper(),
data=urlencode_postdata(postdata), fatal=False)
if not data:
continue
watch_urls = try_get(
data, lambda x: x['stream']['watch_urls'], list)
if not watch_urls:
continue
for watch in watch_urls:
if not isinstance(watch, dict):
continue
watch_url = watch.get('url')
if not watch_url or not isinstance(watch_url, compat_str):
continue
format_id_list = [stream_type]
maxrate = watch.get('maxrate')
if maxrate:
format_id_list.append(compat_str(maxrate))
audio_channel = watch.get('audio_channel')
if audio_channel:
format_id_list.append(compat_str(audio_channel))
preference = 1 if audio_channel == 'A' else None
format_id = '-'.join(format_id_list)
if stream_type in ('dash', 'dash_widevine', 'dash_playready'):
this_formats = self._extract_mpd_formats(
watch_url, video_id, mpd_id=format_id, fatal=False)
elif stream_type in ('hls', 'hls5', 'hls5_fairplay'):
this_formats = self._extract_m3u8_formats(
watch_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id=format_id,
fatal=False)
elif stream_type == 'hds':
this_formats = self._extract_f4m_formats(
watch_url, video_id, f4m_id=format_id, fatal=False)
elif stream_type == 'smooth_playready':
this_formats = self._extract_ism_formats(
watch_url, video_id, ism_id=format_id, fatal=False)
else:
assert False
for this_format in this_formats:
this_format['preference'] = preference
formats.extend(this_formats)
self._sort_formats(formats)
return formats
def _extract_video(self, channel_name, video_id, record_id=None, is_live=False):
if is_live:
cid = self._extract_cid(video_id, channel_name)
info_dict = {
'id': channel_name,
'title': self._live_title(channel_name),
'is_live': True,
}
else:
cid, info_dict = self._extract_cid_and_video_info(video_id)
formats = self._extract_formats(
cid, video_id, record_id=record_id, is_live=is_live)
info_dict['formats'] = formats
return info_dict
class QuicklineBaseIE(ZattooBaseIE):
_NETRC_MACHINE = 'quickline'
_HOST_URL = 'https://mobiltv.quickline.com'
class QuicklineIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<channel>[^/]+)/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
'only_matching': True,
}
def _real_extract(self, url):
channel_name, video_id = re.match(self._VALID_URL, url).groups()
return self._extract_video(channel_name, video_id)
class QuicklineLiveIE(QuicklineBaseIE):
_VALID_URL = r'https?://(?:www\.)?mobiltv\.quickline\.com/watch/(?P<id>[^/]+)'
_TEST = {
'url': 'https://mobiltv.quickline.com/watch/srf1',
'only_matching': True,
}
@classmethod
def suitable(cls, url):
return False if QuicklineIE.suitable(url) else super(QuicklineLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = video_id = self._match_id(url)
return self._extract_video(channel_name, video_id, is_live=True)
class ZattooIE(ZattooBaseIE):
_VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<channel>[^/]+?)/(?P<id>[0-9]+)[^/]+(?:/(?P<recid>[0-9]+))?'
# Since regular videos are only available for 7 days and recorded videos
# are only available for a specific user, we cannot have detailed tests.
_TESTS = [{
'url': 'https://zattoo.com/watch/prosieben/130671867-maze-runner-die-auserwaehlten-in-der-brandwueste',
'only_matching': True,
}, {
'url': 'https://zattoo.com/watch/srf_zwei/132905652-eishockey-spengler-cup/102791477/1512211800000/1514433500000/92000',
'only_matching': True,
}]
def _real_extract(self, url):
channel_name, video_id, record_id = re.match(self._VALID_URL, url).groups()
return self._extract_video(channel_name, video_id, record_id)
class ZattooLiveIE(ZattooBaseIE):
_VALID_URL = r'https?://(?:www\.)?zattoo\.com/watch/(?P<id>[^/]+)'
_TEST = {
'url': 'https://zattoo.com/watch/srf1',
'only_matching': True,
}
@classmethod
def suitable(cls, url):
return False if ZattooIE.suitable(url) else super(ZattooLiveIE, cls).suitable(url)
def _real_extract(self, url):
channel_name = video_id = self._match_id(url)
return self._extract_video(channel_name, video_id, is_live=True)
+4
View File
@@ -249,6 +249,10 @@ def parseOpts(overrideArguments=None):
'--geo-bypass-country', metavar='CODE',
dest='geo_bypass_country', default=None,
help='Force bypass geographic restriction with explicitly provided two-letter ISO 3166-2 country code (experimental)')
geo.add_option(
'--geo-bypass-ip-block', metavar='IP_BLOCK',
dest='geo_bypass_ip_block', default=None,
help='Force bypass geographic restriction with explicitly provided IP block in CIDR notation (experimental)')
selection = optparse.OptionGroup(parser, 'Video Selection')
selection.add_option(
+21 -4
View File
@@ -2225,6 +2225,20 @@ def try_get(src, getter, expected_type=None):
return v
def merge_dicts(*dicts):
merged = {}
for a_dict in dicts:
for k, v in a_dict.items():
if v is None:
continue
if (k not in merged or
(isinstance(v, compat_str) and v and
isinstance(merged[k], compat_str) and
not merged[k])):
merged[k] = v
return merged
def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
@@ -3520,10 +3534,13 @@ class GeoUtils(object):
}
@classmethod
def random_ipv4(cls, code):
block = cls._country_ip_map.get(code.upper())
if not block:
return None
def random_ipv4(cls, code_or_block):
if len(code_or_block) == 2:
block = cls._country_ip_map.get(code_or_block.upper())
if not block:
return None
else:
block = code_or_block
addr, preflen = block.split('/')
addr_min = compat_struct_unpack('!L', socket.inet_aton(addr))[0]
addr_max = addr_min | (0xffffffff >> int(preflen))
+1 -1
View File
@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2018.04.25'
__version__ = '2018.05.09'