1
0
mirror of https://github.com/ytdl-org/youtube-dl.git synced 2026-06-14 16:40:17 +00:00

Compare commits

..

47 Commits

Author SHA1 Message Date
Sergey M․ 6ab35f5e16 release 2018.02.26 2018-02-26 04:23:38 +07:00
Sergey M․ 32ae31847f [ChangeLog] Actualize 2018-02-26 04:19:04 +07:00
Sergey M․ abe8766c35 [udemy] Use custom User-Agent (closes #15571) 2018-02-26 04:12:53 +07:00
Sergey M․ eaa3172672 release 2018.02.25 2018-02-25 20:38:10 +07:00
Sergey M․ 797c9284d6 [ChangeLog] Actualize 2018-02-25 20:35:52 +07:00
Sergey M․ 8c73ef37b6 [vidlii] Add extractor (closes #14472, closes #14512, closes #14779) 2018-02-25 20:28:40 +07:00
Andrew Udvare b5cbe3d652 [postprocessor/embedthumbnail] Skip embedding when there aren't any thumbnails 2018-02-25 19:33:13 +07:00
Sergey M․ ece12e6348 [streamango] Skip dead test 2018-02-25 18:36:25 +07:00
Sergey M․ ff274e3c16 [streamango] Capture and output error messages 2018-02-25 18:34:52 +07:00
Sergey M․ c106237d56 [streamango] Fix formats extraction, improve and simplify (closes #14256) 2018-02-25 18:27:23 +07:00
gfabiano 6e72ea4775 [streamango] Fix extraction (closes #14160) 2018-02-25 18:26:48 +07:00
Sergey M․ d6a0350253 [ard] Remove dead tests 2018-02-25 17:41:12 +07:00
Wandang ad29ef043e [ard] Add alive tests 2018-02-25 17:38:07 +07:00
Sergey M․ f01df14c4f [telequebec:emission] Extend _VALID_URL 2018-02-25 17:05:39 +07:00
Sergey M․ 9306b0c8d9 [telequebec] Add support for emissions and refactor (closes #14649, closes #14655) 2018-02-25 16:54:12 +07:00
Sergey M․ f4b7427279 [extractor/common] Improve jwplayer subtitles extraction (closes #15695) 2018-02-25 00:59:29 +07:00
Sergey M․ 300148b48a [telequebec:live] Add extractor (closes #15688) 2018-02-24 06:17:29 +07:00
Wandang 2d17c63140 [abcnews] Update tests 2018-02-24 05:17:21 +07:00
Sergey M․ f2908d072e [mailru:music] Add extractor (closes #15618) 2018-02-24 04:52:55 +07:00
Remita Amine 5e7841932c [aenetworks] switch to akamai hls formats(closes #15612) 2018-02-23 08:23:55 +01:00
Sergey M․ 870f3bfc63 [ytsearch] Fix flat title extraction (closes #11260, closes #15681) 2018-02-23 03:43:42 +07:00
Sergey M․ 3d977fe4d2 release 2018.02.22 2018-02-22 23:50:35 +07:00
Sergey M․ f075838728 [ChangeLog] Actualize 2018-02-22 23:48:58 +07:00
Sergey M․ 2acc11d771 [vidio] Fix HLS URL extraction (closes #15675) 2018-02-22 22:50:39 +07:00
Sergey M․ 0704306e1d [nexx] Add support for arc.nexx.cloud URLs 2018-02-22 22:31:28 +07:00
Sergey M․ 9dc7ea320d [nexx] Don't capture domain id and add support for domainless shortcuts 2018-02-22 22:27:19 +07:00
Remita Amine e231afb14f [nexx] switch to ark api(closes #15652) 2018-02-22 10:41:47 +01:00
Wandang 12acb9a6fb [zdf] Update tests 2018-02-21 21:57:34 +07:00
Wandang 18ebd1a843 [redtube] Fix duration extraction and update test 2018-02-21 21:55:28 +07:00
Wandang 8315ee6c4c [reddit] Update test 2018-02-21 04:12:56 +07:00
Wandang b9d1a79426 [9gag] Update test 2018-02-20 22:28:54 +07:00
Wandang 09f934b009 [vk] Update test 2018-02-20 22:21:10 +07:00
Wandang 73af6e22fd [vimeo] Update test 2018-02-20 22:20:15 +07:00
Wandang 77e499f95e [xhamster] Update test 2018-02-20 22:18:50 +07:00
Sergey M․ befa4708fd [utils] Fixup some common URL's typos in sanitize_url (closes #15649) 2018-02-19 22:50:23 +07:00
Sergey M․ 90830004c8 [sonyliv] Respect referrer (closes #15648) 2018-02-19 22:29:08 +07:00
Sergey M․ 18d7aa6efa [brightcove:new] Use referrer for formats' HTTP headers 2018-02-19 22:28:27 +07:00
Remita Amine b12cf31bb1 [cbc] add new extractor for olympics.cbc.ca(closes #15535) 2018-02-19 09:02:23 +01:00
Sergey M․ 7d2b4aa047 Respect --prefer-insecure while updating (closes #15497) 2018-02-18 16:43:54 +07:00
VietTPham 38662dfec7 [fusion] Add support for fusion.tv 2018-02-17 20:54:52 +07:00
Sergey M․ ee706f1009 [npo] Improve quality metadata extraction 2018-02-17 20:32:34 +07:00
Sergey M․ c4e7496421 [npo] Relax _VALID_URL (closes #14987, closes #14994) 2018-02-17 20:32:26 +07:00
Sergey M․ b8adcec4ea [npo] Capture and output error message 2018-02-17 20:32:20 +07:00
Sergey M․ 073cca3df8 [downloader/common] Add whitespace 2018-02-17 19:11:46 +07:00
Parmjit Virk f66df20ccd [pornhub] Add support for channels (closes #15613) 2018-02-17 01:17:06 +07:00
Sergey M․ ea69624992 [youtube] Handle shared URLs with generic extractor (closes #14303) 2018-02-15 22:33:11 +07:00
Sergey M․ 49702e3669 [francetv] Fix typo 2018-02-12 00:25:42 +07:00
40 changed files with 785 additions and 219 deletions
+3 -3
View File
@@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.02.11*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.02.11**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.02.26*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.02.26**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.02.11
[debug] youtube-dl version 2018.02.26
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
+46
View File
@@ -1,3 +1,49 @@
version 2018.02.26
Extractors
* [udemy] Use custom User-Agent (#15571)
version 2018.02.25
Core
* [postprocessor/embedthumbnail] Skip embedding when there aren't any
thumbnails (#12573)
* [extractor/common] Improve jwplayer subtitles extraction (#15695)
Extractors
+ [vidlii] Add support for vidlii.com (#14472, #14512, #14779)
+ [streamango] Capture and output error messages
* [streamango] Fix extraction (#14160, #14256)
+ [telequebec] Add support for emissions (#14649, #14655)
+ [telequebec:live] Add support for live streams (#15688)
+ [mailru:music] Add support for mail.ru/music (#15618)
* [aenetworks] Switch to akamai HLS formats (#15612)
* [ytsearch] Fix flat title extraction (#11260, #15681)
version 2018.02.22
Core
+ [utils] Fixup some common URL typos in sanitize_url (#15649)
* Respect --prefer-insecure while updating (#15497)
Extractors
* [vidio] Fix HLS URL extraction (#15675)
+ [nexx] Add support for arc.nexx.cloud URLs
* [nexx] Switch to arc API (#15652)
* [redtube] Fix duration extraction (#15659)
+ [sonyliv] Respect referrer (#15648)
+ [brightcove:new] Use referrer for formats' HTTP headers
+ [cbc] Add support for olympics.cbc.ca (#15535)
+ [fusion] Add support for fusion.tv (#15628)
* [npo] Improve quality metadata extraction
* [npo] Relax URL regular expression (#14987, #14994)
+ [npo] Capture and output error message
+ [pornhub] Add support for channels (#15613)
* [youtube] Handle shared URLs with generic extractor (#14303)
version 2018.02.11
Core
+1 -2
View File
@@ -310,8 +310,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--encoding ENCODING Force the specified encoding (experimental)
--no-check-certificate Suppress HTTPS certificate validation
--prefer-insecure Use an unencrypted connection to retrieve
information about the video. (Currently
supported only for YouTube)
information whenever possible
--user-agent UA Specify a custom user agent
--referer URL Specify a custom referer, use if the video
access is restricted to one domain
+6
View File
@@ -135,6 +135,7 @@
- **CarambaTVPage**
- **CartoonNetwork**
- **cbc.ca**
- **cbc.ca:olympics**
- **cbc.ca:player**
- **cbc.ca:watch**
- **cbc.ca:watch:video**
@@ -439,6 +440,8 @@
- **m6**
- **macgamestore**: MacGameStore trailers
- **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
@@ -819,6 +822,8 @@
- **Telegraaf**
- **TeleMB**
- **TeleQuebec**
- **TeleQuebecEmission**
- **TeleQuebecLive**
- **TeleTask**
- **Telewebion**
- **TF1**
@@ -945,6 +950,7 @@
- **VideoPress**
- **videoweed**: VideoWeed
- **Vidio**
- **VidLii**
- **vidme**
- **vidme:user**
- **vidme:user:likes**
+7
View File
@@ -57,6 +57,7 @@ from youtube_dl.utils import (
read_batch_urls,
sanitize_filename,
sanitize_path,
sanitize_url,
expand_path,
prepend_extension,
replace_extension,
@@ -219,6 +220,12 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
def test_sanitize_url(self):
self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar')
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')
self.assertEqual(sanitize_url('rmtps://foo.bar'), 'rtmps://foo.bar')
self.assertEqual(sanitize_url('https://foo.bar'), 'https://foo.bar')
def test_expand_path(self):
def env(var):
return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
+1 -1
View File
@@ -438,7 +438,7 @@ def _real_main(argv=None):
with YoutubeDL(ydl_opts) as ydl:
# Update version
if opts.update_self:
update_self(ydl.to_screen, opts.verbose, ydl._opener)
update_self(ydl.to_screen, opts.verbose, ydl._opener, opts.prefer_insecure)
# Remove cache dir
if opts.rm_cachedir:
+1 -1
View File
@@ -49,7 +49,7 @@ class FileDownloader(object):
external_downloader_args: A list of additional command-line arguments for the
external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos.
http_chunk_size: Size of a chunk for chunk-based HTTP downloading.May be
http_chunk_size: Size of a chunk for chunk-based HTTP downloading. May be
useful for bypassing bandwidth throttling imposed by
a webserver (experimental)
+2 -2
View File
@@ -66,7 +66,7 @@ class AbcNewsIE(InfoExtractor):
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
'info_dict': {
'id': '10498713',
'id': '10505354',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
@@ -79,7 +79,7 @@ class AbcNewsIE(InfoExtractor):
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '39125818',
'id': '38897857',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
+2 -1
View File
@@ -122,7 +122,8 @@ class AENetworksIE(AENetworksBaseIE):
query = {
'mbr': 'true',
'assetTypes': 'high_video_s3'
'assetTypes': 'high_video_ak',
'switch': 'hls_high_ak',
}
video_id = self._html_search_meta('aetn:VideoID', webpage)
media_url = self._search_regex(
+22 -46
View File
@@ -24,57 +24,30 @@ class ARDMediathekIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
# available till 26.07.2022
'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
'info_dict': {
'id': '29582122',
'id': '44726822',
'ext': 'mp4',
'title': 'Ich liebe das Leben trotzdem',
'description': 'md5:45e4c225c72b27993314b31a84a5261c',
'duration': 4557,
'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
'duration': 1740,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
'info_dict': {
'id': '29522730',
'ext': 'mp4',
'title': 'Tatort: Scheinwelten - Hörfassung (Video tgl. ab 20 Uhr)',
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
'skip': 'HTTP Error 404: Not Found',
}
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
'md5': '219d94d8980b4f538c7fcb0865eb7f2c',
'info_dict': {
'id': '28488308',
'ext': 'mp3',
'title': 'Tod eines Fußballers',
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
'skip': 'HTTP Error 404: Not Found',
'only_matching': True,
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True,
}, {
# audio
'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
'md5': '4e8f00631aac0395fee17368ac0e9867',
'info_dict': {
'id': '30796318',
'ext': 'mp3',
'title': 'Vor dem Fest',
'description': 'md5:c0c1c8048514deaed2a73b3a60eecacb',
'duration': 3287,
},
'skip': 'Video is no longer available',
'only_matching': True,
}]
def _extract_media_info(self, media_info_url, webpage, video_id):
@@ -252,20 +225,23 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',
_TESTS = [{
# available till 14.02.2019
'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
'info_dict': {
'display_id': 'die-story-im-ersten-mission-unter-falscher-flagge',
'id': '100',
'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
'id': '102',
'ext': 'mp4',
'duration': 2600,
'title': 'Die Story im Ersten: Mission unter falscher Flagge',
'upload_date': '20140804',
'duration': 4435.0,
'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
'upload_date': '20180214',
'thumbnail': r're:^https?://.*\.jpg$',
},
'skip': 'HTTP Error 404: Not Found',
}
}, {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
+6 -2
View File
@@ -564,7 +564,7 @@ class BrightcoveNewIE(AdobePassIE):
return entries
def _parse_brightcove_metadata(self, json_data, video_id):
def _parse_brightcove_metadata(self, json_data, video_id, headers={}):
title = json_data['name'].strip()
formats = []
@@ -638,6 +638,9 @@ class BrightcoveNewIE(AdobePassIE):
self._sort_formats(formats)
for f in formats:
f.setdefault('http_headers', {}).update(headers)
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
@@ -724,4 +727,5 @@ class BrightcoveNewIE(AdobePassIE):
'tveToken': tve_token,
})
return self._parse_brightcove_metadata(json_data, video_id)
return self._parse_brightcove_metadata(
json_data, video_id, headers=headers)
+62
View File
@@ -1,6 +1,7 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
@@ -13,6 +14,7 @@ from ..utils import (
xpath_element,
xpath_with_ns,
find_xpath_attr,
parse_duration,
parse_iso8601,
parse_age_limit,
int_or_none,
@@ -359,3 +361,63 @@ class CBCWatchIE(CBCWatchBaseIE):
video_id = self._match_id(url)
rss = self._call_api('web/browse/' + video_id, video_id)
return self._parse_rss_feed(rss)
class CBCOlympicsIE(InfoExtractor):
IE_NAME = 'cbc.ca:olympics'
_VALID_URL = r'https?://olympics\.cbc\.ca/video/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://olympics.cbc.ca/video/whats-on-tv/olympic-morning-featuring-the-opening-ceremony/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._hidden_inputs(webpage)['videoId']
video_doc = self._download_xml(
'https://olympics.cbc.ca/videodata/%s.xml' % video_id, video_id)
title = xpath_text(video_doc, 'title', fatal=True)
is_live = xpath_text(video_doc, 'kind') == 'Live'
if is_live:
title = self._live_title(title)
formats = []
for video_source in video_doc.findall('videoSources/videoSource'):
uri = xpath_text(video_source, 'uri')
if not uri:
continue
tokenize = self._download_json(
'https://olympics.cbc.ca/api/api-akamai/tokenize',
video_id, data=json.dumps({
'VideoSource': uri,
}).encode(), headers={
'Content-Type': 'application/json',
'Referer': url,
# d3.VideoPlayer._init in https://olympics.cbc.ca/components/script/base.js
'Cookie': '_dvp=TK:C0ObxjerU', # AKAMAI CDN cookie
}, fatal=False)
if not tokenize:
continue
content_url = tokenize['ContentUrl']
video_source_format = video_source.get('format')
if video_source_format == 'IIS':
formats.extend(self._extract_ism_formats(
content_url, video_id, ism_id=video_source_format, fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
content_url, video_id, 'mp4',
'm3u8' if is_live else 'm3u8_native',
m3u8_id=video_source_format, fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': xpath_text(video_doc, 'description'),
'thumbnail': xpath_text(video_doc, 'thumbnailUrl'),
'duration': parse_duration(xpath_text(video_doc, 'duration')),
'formats': formats,
'is_live': is_live,
}
+4 -1
View File
@@ -2353,7 +2353,10 @@ class InfoExtractor(object):
for track in tracks:
if not isinstance(track, dict):
continue
if track.get('kind') != 'captions':
track_kind = track.get('kind')
if not track_kind or not isinstance(track_kind, compat_str):
continue
if track_kind.lower() not in ('captions', 'subtitles'):
continue
track_url = urljoin(base_url, track.get('file'))
if not track_url:
+12 -2
View File
@@ -162,6 +162,7 @@ from .cbc import (
CBCPlayerIE,
CBCWatchVideoIE,
CBCWatchIE,
CBCOlympicsIE,
)
from .cbs import CBSIE
from .cbslocal import CBSLocalIE
@@ -565,7 +566,11 @@ from .lynda import (
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .mailru import (
MailRuIE,
MailRuMusicIE,
MailRuMusicSearchIE,
)
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .mangomolo import (
@@ -1044,7 +1049,11 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import TeleQuebecIE
from .telequebec import (
TeleQuebecIE,
TeleQuebecEmissionIE,
TeleQuebecLiveIE,
)
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
@@ -1216,6 +1225,7 @@ from .videomore import (
from .videopremium import VideoPremiumIE
from .videopress import VideoPressIE
from .vidio import VidioIE
from .vidlii import VidLiiIE
from .vidme import (
VidmeIE,
VidmeUserIE,
+4 -4
View File
@@ -78,10 +78,6 @@ class FranceTVIE(InfoExtractor):
}, {
'url': 'francetv:NI_657393@Regions',
'only_matching': True,
}, {
# france-3 live
'url': 'https://www.france.tv/france-3/direct.html',
'only_matching': True,
}, {
# france-3 live
'url': 'francetv:SIM_France3',
@@ -262,6 +258,10 @@ class FranceTVSiteIE(FranceTVBaseInfoExtractor):
}, {
'url': 'https://www.france.tv/142749-rouge-sang.html',
'only_matching': True,
}, {
# france-3 live
'url': 'https://www.france.tv/france-3/direct.html',
'only_matching': True,
}]
def _real_extract(self, url):
+3 -3
View File
@@ -5,9 +5,9 @@ from .ooyala import OoyalaIE
class FusionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fusion\.net/video/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?fusion\.(?:net|tv)/video/(?P<id>\d+)'
_TESTS = [{
'url': 'http://fusion.net/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'url': 'http://fusion.tv/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
'info_dict': {
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
'ext': 'mp4',
@@ -20,7 +20,7 @@ class FusionIE(InfoExtractor):
},
'add_ie': ['Ooyala'],
}, {
'url': 'http://fusion.net/video/201781',
'url': 'http://fusion.tv/video/201781',
'only_matching': True,
}]
+16
View File
@@ -1954,6 +1954,22 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
'add_ie': [SpringboardPlatformIE.ie_key()],
},
{
'url': 'https://www.youtube.com/shared?ci=1nEzmT-M4fU',
'info_dict': {
'id': 'uPDB5I9wfp8',
'ext': 'webm',
'title': 'Pocoyo: 90 minutos de episódios completos Português para crianças - PARTE 3',
'description': 'md5:d9e4d9346a2dfff4c7dc4c8cec0f546d',
'upload_date': '20160219',
'uploader': 'Pocoyo - Português (BR)',
'uploader_id': 'PocoyoBrazil',
},
'add_ie': [YoutubeIE.ie_key()],
'params': {
'skip_download': True,
},
}
# {
# # TODO: find another test
+155
View File
@@ -1,12 +1,17 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import json
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import (
int_or_none,
parse_duration,
remove_end,
try_get,
)
@@ -157,3 +162,153 @@ class MailRuIE(InfoExtractor):
'view_count': view_count,
'formats': formats,
}
class MailRuMusicSearchBaseIE(InfoExtractor):
def _search(self, query, url, audio_id, limit=100, offset=0):
search = self._download_json(
'https://my.mail.ru/cgi-bin/my/ajax', audio_id,
'Downloading songs JSON page %d' % (offset // limit + 1),
headers={
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
}, query={
'xemail': '',
'ajax_call': '1',
'func_name': 'music.search',
'mna': '',
'mnb': '',
'arg_query': query,
'arg_extended': '1',
'arg_search_params': json.dumps({
'music': {
'limit': limit,
'offset': offset,
},
}),
'arg_limit': limit,
'arg_offset': offset,
})
return next(e for e in search if isinstance(e, dict))
@staticmethod
def _extract_track(t, fatal=True):
audio_url = t['URL'] if fatal else t.get('URL')
if not audio_url:
return
audio_id = t['File'] if fatal else t.get('File')
if not audio_id:
return
thumbnail = t.get('AlbumCoverURL') or t.get('FiledAlbumCover')
uploader = t.get('OwnerName') or t.get('OwnerName_Text_HTML')
uploader_id = t.get('UploaderID')
duration = int_or_none(t.get('DurationInSeconds')) or parse_duration(
t.get('Duration') or t.get('DurationStr'))
view_count = int_or_none(t.get('PlayCount') or t.get('PlayCount_hr'))
track = t.get('Name') or t.get('Name_Text_HTML')
artist = t.get('Author') or t.get('Author_Text_HTML')
if track:
title = '%s - %s' % (artist, track) if artist else track
else:
title = audio_id
return {
'extractor_key': MailRuMusicIE.ie_key(),
'id': audio_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': uploader_id,
'duration': duration,
'view_count': view_count,
'vcodec': 'none',
'abr': int_or_none(t.get('BitRate')),
'track': track,
'artist': artist,
'album': t.get('Album'),
'url': audio_url,
}
class MailRuMusicIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
'md5': '0f8c22ef8c5d665b13ac709e63025610',
'info_dict': {
'id': '4e31f7125d0dfaef505d947642366893',
'ext': 'mp3',
'title': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017 - М8Л8ТХ',
'uploader': 'Игорь Мудрый',
'uploader_id': '1459196328',
'duration': 280,
'view_count': int,
'vcodec': 'none',
'abr': 320,
'track': 'L.A.H. (Luciferian Aesthetics of Herrschaft) single, 2017',
'artist': 'М8Л8ТХ',
},
}]
def _real_extract(self, url):
audio_id = self._match_id(url)
webpage = self._download_webpage(url, audio_id)
title = self._og_search_title(webpage)
music_data = self._search(title, url, audio_id)['MusicData']
t = next(t for t in music_data if t.get('File') == audio_id)
info = self._extract_track(t)
info['title'] = title
return info
class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music:search'
IE_DESC = 'Музыка@Mail.Ru'
_VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/search/black%20shadow',
'info_dict': {
'id': 'black shadow',
},
'playlist_mincount': 532,
}]
def _real_extract(self, url):
query = compat_urllib_parse_unquote(self._match_id(url))
entries = []
LIMIT = 100
offset = 0
for _ in itertools.count(1):
search = self._search(query, url, query, LIMIT, offset)
music_data = search.get('MusicData')
if not music_data or not isinstance(music_data, list):
break
for t in music_data:
track = self._extract_track(t, fatal=False)
if track:
entries.append(track)
total = try_get(
search, lambda x: x['Results']['music']['Total'], int)
if total is not None:
if offset > total:
break
offset += LIMIT
return self.playlist_result(entries, query)
+13 -86
View File
@@ -1,27 +1,23 @@
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import random
import re
import time
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
parse_duration,
try_get,
urlencode_postdata,
)
class NexxIE(InfoExtractor):
_VALID_URL = r'''(?x)
(?:
https?://api\.nexx(?:\.cloud|cdn\.com)/v3/(?P<domain_id>\d+)/videos/byid/|
nexx:(?P<domain_id_s>\d+):
https?://api\.nexx(?:\.cloud|cdn\.com)/v3/\d+/videos/byid/|
nexx:(?:\d+:)?|
https?://arc\.nexx\.cloud/api/video/
)
(?P<id>\d+)
'''
@@ -67,6 +63,12 @@ class NexxIE(InfoExtractor):
}, {
'url': 'nexx:748:128907',
'only_matching': True,
}, {
'url': 'nexx:128907',
'only_matching': True,
}, {
'url': 'https://arc.nexx.cloud/api/video/128907.json',
'only_matching': True,
}]
@staticmethod
@@ -101,87 +103,12 @@ class NexxIE(InfoExtractor):
def _extract_url(webpage):
return NexxIE._extract_urls(webpage)[0]
def _handle_error(self, response):
status = int_or_none(try_get(
response, lambda x: x['metadata']['status']) or 200)
if 200 <= status < 300:
return
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, response['metadata']['errorhint']),
expected=True)
def _call_api(self, domain_id, path, video_id, data=None, headers={}):
headers['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
result = self._download_json(
'https://api.nexx.cloud/v3/%s/%s' % (domain_id, path), video_id,
'Downloading %s JSON' % path, data=urlencode_postdata(data),
headers=headers)
self._handle_error(result)
return result['result']
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
domain_id = mobj.group('domain_id') or mobj.group('domain_id_s')
video_id = mobj.group('id')
video_id = self._match_id(url)
# Reverse engineered from JS code (see getDeviceID function)
device_id = '%d:%d:%d%d' % (
random.randint(1, 4), int(time.time()),
random.randint(1e4, 99999), random.randint(1, 9))
result = self._call_api(domain_id, 'session/init', video_id, data={
'nxp_devh': device_id,
'nxp_userh': '',
'precid': '0',
'playlicense': '0',
'screenx': '1920',
'screeny': '1080',
'playerversion': '6.0.00',
'gateway': 'html5',
'adGateway': '',
'explicitlanguage': 'en-US',
'addTextTemplates': '1',
'addDomainData': '1',
'addAdModel': '1',
}, headers={
'X-Request-Enable-Auth-Fallback': '1',
})
cid = result['general']['cid']
# As described in [1] X-Request-Token generation algorithm is
# as follows:
# md5( operation + domain_id + domain_secret )
# where domain_secret is a static value that will be given by nexx.tv
# as per [1]. Here is how this "secret" is generated (reversed
# from _play.api.init function, search for clienttoken). So it's
# actually not static and not that much of a secret.
# 1. https://nexxtvstorage.blob.core.windows.net/files/201610/27.pdf
secret = result['device']['clienttoken'][int(device_id[0]):]
secret = secret[0:len(secret) - int(device_id[-1])]
op = 'byid'
# Reversed from JS code for _play.api.call function (search for
# X-Request-Token)
request_token = hashlib.md5(
''.join((op, domain_id, secret)).encode('utf-8')).hexdigest()
video = self._call_api(
domain_id, 'videos/%s/%s' % (op, video_id), video_id, data={
'additionalfields': 'language,channel,actors,studio,licenseby,slug,subtitle,teaser,description',
'addInteractionOptions': '1',
'addStatusDetails': '1',
'addStreamDetails': '1',
'addCaptions': '1',
'addScenes': '1',
'addHotSpots': '1',
'addBumpers': '1',
'captionFormat': 'data',
}, headers={
'X-Request-CID': cid,
'X-Request-Token': request_token,
})
video = self._download_json(
'https://arc.nexx.cloud/api/video/%s.json' % video_id,
video_id)['result']
general = video['general']
title = general['title']
+1 -1
View File
@@ -13,7 +13,7 @@ class NineGagIE(InfoExtractor):
_TESTS = [{
'url': 'http://9gag.com/tv/p/Kk2X5/people-are-awesome-2013-is-absolutely-awesome',
'info_dict': {
'id': 'Kk2X5',
'id': 'kXzwOKyGlSA',
'ext': 'mp4',
'description': 'This 3-minute video will make you smile and then make you feel untalented and insignificant. Anyway, you should share this awesomeness. (Thanks, Dino!)',
'title': '\"People Are Awesome 2013\" Is Absolutely Awesome',
+33 -5
View File
@@ -11,6 +11,7 @@ from ..utils import (
determine_ext,
ExtractorError,
fix_xml_ampersands,
int_or_none,
orderedSet,
parse_duration,
qualities,
@@ -38,7 +39,7 @@ class NPOIE(NPOBaseIE):
npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__|
(?:zapp|npo3)\.nl/(?:[^/]+/){2}
(?:zapp|npo3)\.nl/(?:[^/]+/){2,}
)
)
(?P<id>[^/?#]+)
@@ -156,6 +157,9 @@ class NPOIE(NPOBaseIE):
}, {
'url': 'http://www.npo.nl/radio-gaga/13-06-2017/BNN_101383373',
'only_matching': True,
}, {
'url': 'https://www.zapp.nl/1803-skelterlab/instructie-video-s/740-instructievideo-s/POMS_AT_11736927',
'only_matching': True,
}]
def _real_extract(self, url):
@@ -170,6 +174,10 @@ class NPOIE(NPOBaseIE):
transform_source=strip_jsonp,
)
error = metadata.get('error')
if error:
raise ExtractorError(error, expected=True)
# For some videos actual video id (prid) is different (e.g. for
# http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698
# video id is POMS_WNL_853698 but prid is POW_00996502)
@@ -187,7 +195,11 @@ class NPOIE(NPOBaseIE):
formats = []
urls = set()
quality = qualities(['adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std'])
QUALITY_LABELS = ('Laag', 'Normaal', 'Hoog')
QUALITY_FORMATS = ('adaptive', 'wmv_sb', 'h264_sb', 'wmv_bb', 'h264_bb', 'wvc1_std', 'h264_std')
quality_from_label = qualities(QUALITY_LABELS)
quality_from_format_id = qualities(QUALITY_FORMATS)
items = self._download_json(
'http://ida.omroep.nl/app.php/%s' % video_id, video_id,
'Downloading formats JSON', query={
@@ -203,11 +215,27 @@ class NPOIE(NPOBaseIE):
r'video/ida/([^/]+)', item_url, 'format id',
default=None)
item_label = item.get('label')
def add_format_url(format_url):
width = int_or_none(self._search_regex(
r'(\d+)[xX]\d+', format_url, 'width', default=None))
height = int_or_none(self._search_regex(
r'\d+[xX](\d+)', format_url, 'height', default=None))
if item_label in QUALITY_LABELS:
quality = quality_from_label(item_label)
f_id = item_label
elif item_label in QUALITY_FORMATS:
quality = quality_from_format_id(format_id)
f_id = format_id
else:
quality, f_id = None
formats.append({
'url': format_url,
'format_id': format_id,
'quality': quality(format_id),
'format_id': f_id,
'width': width,
'height': height,
'quality': quality,
})
# Example: http://www.npo.nl/de-nieuwe-mens-deel-1/21-07-2010/WO_VPRO_043706
@@ -219,7 +247,7 @@ class NPOIE(NPOBaseIE):
stream_info = self._download_json(
item_url + '&type=json', video_id,
'Downloading %s stream JSON'
% item.get('label') or item.get('format') or format_id or num)
% item_label or item.get('format') or format_id or num)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
error = (self._parse_json(
+20 -1
View File
@@ -275,7 +275,7 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/(?:user|channel)s/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
@@ -285,6 +285,25 @@ class PornHubUserVideosIE(PornHubPlaylistBaseIE):
}, {
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'only_matching': True,
}, {
# default sorting as Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos',
'info_dict': {
'id': 'povd',
},
'playlist_mincount': 293,
}, {
# Top Rated Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=ra',
'only_matching': True,
}, {
# Most Recent Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=da',
'only_matching': True,
}, {
# Most Viewed Videos
'url': 'https://www.pornhub.com/channels/povd/videos?o=vi',
'only_matching': True,
}]
def _real_extract(self, url):
+1 -1
View File
@@ -15,7 +15,7 @@ class RedditIE(InfoExtractor):
_TEST = {
# from https://www.reddit.com/r/videos/comments/6rrwyj/that_small_heart_attack/
'url': 'https://v.redd.it/zv89llsvexdz',
'md5': '655d06ace653ea3b87bccfb1b27ec99d',
'md5': '0a070c53eba7ec4534d95a5a1259e253',
'info_dict': {
'id': 'zv89llsvexdz',
'ext': 'mp4',
+5 -4
View File
@@ -16,12 +16,12 @@ class RedTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.redtube.com/66418',
'md5': '7b8c22b5e7098a3e1c09709df1126d2d',
'md5': 'fc08071233725f26b8f014dba9590005',
'info_dict': {
'id': '66418',
'ext': 'mp4',
'title': 'Sucked on a toilet',
'upload_date': '20120831',
'upload_date': '20110811',
'duration': 596,
'view_count': int,
'age_limit': 18,
@@ -90,8 +90,9 @@ class RedTubeIE(InfoExtractor):
upload_date = unified_strdate(self._search_regex(
r'<span[^>]+>ADDED ([^<]+)<',
webpage, 'upload date', fatal=False))
duration = int_or_none(self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
duration = int_or_none(self._og_search_property(
'video:duration', webpage, default=None) or self._search_regex(
r'videoDuration\s*:\s*(\d+)', webpage, 'duration', default=None))
view_count = str_to_int(self._search_regex(
(r'<div[^>]*>Views</div>\s*<div[^>]*>\s*([\d,.]+)',
r'<span[^>]*>VIEWS</span>\s*</td>\s*<td>\s*([\d,.]+)'),
+4 -1
View File
@@ -33,5 +33,8 @@ class SonyLIVIE(InfoExtractor):
def _real_extract(self, url):
brightcove_id = self._match_id(url)
return self.url_result(
smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {'geo_countries': ['IN']}),
smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, {
'geo_countries': ['IN'],
'referrer': url,
}),
'BrightcoveNew', brightcove_id)
+51 -4
View File
@@ -4,8 +4,10 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_chr
from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
js_to_json,
)
@@ -32,12 +34,34 @@ class StreamangoIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'gone',
}, {
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
'only_matching': True,
}]
def _real_extract(self, url):
def decrypt_src(encoded, val):
ALPHABET = '=/+9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA'
encoded = re.sub(r'[^A-Za-z0-9+/=]', '', encoded)
decoded = ''
sm = [None] * 4
i = 0
str_len = len(encoded)
while i < str_len:
for j in range(4):
sm[j % 4] = ALPHABET.index(encoded[i])
i += 1
char_code = ((sm[0] << 0x2) | (sm[1] >> 0x4)) ^ val
decoded += compat_chr(char_code)
if sm[2] != 0x40:
char_code = ((sm[1] & 0xf) << 0x4) | (sm[2] >> 0x2)
decoded += compat_chr(char_code)
if sm[3] != 0x40:
char_code = ((sm[2] & 0x3) << 0x6) | sm[3]
decoded += compat_chr(char_code)
return decoded
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@@ -46,13 +70,26 @@ class StreamangoIE(InfoExtractor):
formats = []
for format_ in re.findall(r'({[^}]*\bsrc\s*:\s*[^}]*})', webpage):
video = self._parse_json(
format_, video_id, transform_source=js_to_json, fatal=False)
if not video:
mobj = re.search(r'(src\s*:\s*[^(]+\(([^)]*)\)[\s,]*)', format_)
if mobj is None:
continue
src = video.get('src')
format_ = format_.replace(mobj.group(0), '')
video = self._parse_json(
format_, video_id, transform_source=js_to_json,
fatal=False) or {}
mobj = re.search(
r'([\'"])(?P<src>(?:(?!\1).)+)\1\s*,\s*(?P<val>\d+)',
mobj.group(1))
if mobj is None:
continue
src = decrypt_src(mobj.group('src'), int_or_none(mobj.group('val')))
if not src:
continue
ext = determine_ext(src, default_ext=None)
if video.get('type') == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
@@ -65,6 +102,16 @@ class StreamangoIE(InfoExtractor):
'height': int_or_none(video.get('height')),
'tbr': int_or_none(video.get('bitrate')),
})
if not formats:
error = self._search_regex(
r'<p[^>]+\bclass=["\']lead[^>]+>(.+?)</p>', webpage,
'error', default=None)
if not error and '>Sorry' in webpage:
error = 'Video %s is not available' % video_id
if error:
raise ExtractorError(error, expected=True)
self._sort_formats(formats)
return {
+119 -17
View File
@@ -10,19 +10,33 @@ from ..utils import (
)
class TeleQuebecIE(InfoExtractor):
class TeleQuebecBaseIE(InfoExtractor):
@staticmethod
def _limelight_result(media_id):
return {
'_type': 'url_transparent',
'url': smuggle_url(
'limelight:media:' + media_id, {'geo_countries': ['CA']}),
'ie_key': 'LimelightMedia',
}
class TeleQuebecIE(TeleQuebecBaseIE):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
_TESTS = [{
'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
'md5': 'fe95a0957e5707b1b01f5013e725c90f',
# available till 01.01.2023
'url': 'http://zonevideo.telequebec.tv/media/37578/un-petit-choc-et-puis-repart/un-chef-a-la-cabane',
'info_dict': {
'id': '20984',
'id': '577116881b4b439084e6b1cf4ef8b1b3',
'ext': 'mp4',
'title': 'Le couronnement de New York',
'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
'upload_date': '20170201',
'timestamp': 1485972222,
}
'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
'upload_date': '20180222',
'timestamp': 1519326631,
},
'params': {
'skip_download': True,
},
}, {
# no description
'url': 'http://zonevideo.telequebec.tv/media/30261',
@@ -31,19 +45,107 @@ class TeleQuebecIE(InfoExtractor):
def _real_extract(self, url):
media_id = self._match_id(url)
media_data = self._download_json(
'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
media_id)['media']
return {
'_type': 'url_transparent',
'id': media_id,
'url': smuggle_url(
'limelight:media:' + media_data['streamInfo']['sourceId'],
{'geo_countries': ['CA']}),
'title': media_data['title'],
info = self._limelight_result(media_data['streamInfo']['sourceId'])
info.update({
'title': media_data.get('title'),
'description': try_get(
media_data, lambda x: x['descriptions'][0]['text'], compat_str),
'duration': int_or_none(
media_data.get('durationInMilliseconds'), 1000),
'ie_key': 'LimelightMedia',
})
return info
class TeleQuebecEmissionIE(TeleQuebecBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:
[^/]+\.telequebec\.tv/emissions/|
(?:www\.)?telequebec\.tv/
)
(?P<id>[^?#&]+)
'''
_TESTS = [{
'url': 'http://lindicemcsween.telequebec.tv/emissions/100430013/des-soins-esthetiques-a-377-d-interets-annuels-ca-vous-tente',
'info_dict': {
'id': '66648a6aef914fe3badda25e81a4d50a',
'ext': 'mp4',
'title': "Des soins esthétiques à 377 % d'intérêts annuels, ça vous tente?",
'description': 'md5:369e0d55d0083f1fc9b71ffb640ea014',
'upload_date': '20171024',
'timestamp': 1508862118,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://bancpublic.telequebec.tv/emissions/emission-49/31986/jeunes-meres-sous-pression',
'only_matching': True,
}, {
'url': 'http://www.telequebec.tv/masha-et-michka/epi059masha-et-michka-3-053-078',
'only_matching': True,
}, {
'url': 'http://www.telequebec.tv/documentaire/bebes-sur-mesure/',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
media_id = self._search_regex(
r'mediaUID\s*:\s*["\'][Ll]imelight_(?P<id>[a-z0-9]{32})', webpage,
'limelight id')
info = self._limelight_result(media_id)
info.update({
'title': self._og_search_title(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
})
return info
class TeleQuebecLiveIE(InfoExtractor):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/(?P<id>endirect)'
_TEST = {
'url': 'http://zonevideo.telequebec.tv/endirect/',
'info_dict': {
'id': 'endirect',
'ext': 'mp4',
'title': 're:^Télé-Québec - En direct [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'is_live': True,
},
'params': {
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
m3u8_url = None
webpage = self._download_webpage(
'https://player.telequebec.tv/Tq_VideoPlayer.js', video_id,
fatal=False)
if webpage:
m3u8_url = self._search_regex(
r'm3U8Url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'm3u8 url', default=None, group='url')
if not m3u8_url:
m3u8_url = 'https://teleqmmd.mmdlive.lldns.net/teleqmmd/f386e3b206814e1f8c8c1c71c0f8e748/manifest.m3u8'
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls')
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title('Télé-Québec - En direct'),
'is_live': True,
'formats': formats,
}
+6
View File
@@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_kwargs,
compat_str,
compat_urllib_request,
compat_urlparse,
@@ -114,6 +115,11 @@ class UdemyIE(InfoExtractor):
error_str += ' - %s' % error_data.get('formErrors')
raise ExtractorError(error_str, expected=True)
def _download_webpage(self, *args, **kwargs):
kwargs.setdefault('headers', {})['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4'
return super(UdemyIE, self)._download_webpage(
*args, **compat_kwargs(kwargs))
def _download_json(self, url_or_request, *args, **kwargs):
headers = {
'X-Udemy-Snail-Case': 'true',
+2 -2
View File
@@ -49,8 +49,8 @@ class VidioIE(InfoExtractor):
thumbnail = clip.get('image')
m3u8_url = m3u8_url or self._search_regex(
r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>(?!\1).+)\1',
webpage, 'hls url')
r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'hls url', group='url')
formats = self._extract_m3u8_formats(
m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
+125
View File
@@ -0,0 +1,125 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
float_or_none,
get_element_by_id,
int_or_none,
strip_or_none,
unified_strdate,
urljoin,
)
class VidLiiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vidlii\.com/(?:watch|embed)\?.*?\bv=(?P<id>[0-9A-Za-z_-]{11})'
_TESTS = [{
'url': 'https://www.vidlii.com/watch?v=tJluaH4BJ3v',
'md5': '9bf7d1e005dfa909b6efb0a1ff5175e2',
'info_dict': {
'id': 'tJluaH4BJ3v',
'ext': 'mp4',
'title': 'Vidlii is against me',
'description': 'md5:fa3f119287a2bfb922623b52b1856145',
'thumbnail': 're:https://.*.jpg',
'uploader': 'APPle5auc31995',
'uploader_url': 'https://www.vidlii.com/user/APPle5auc31995',
'upload_date': '20171107',
'duration': 212,
'view_count': int,
'comment_count': int,
'average_rating': float,
'categories': ['News & Politics'],
'tags': ['Vidlii', 'Jan', 'Videogames'],
}
}, {
'url': 'https://www.vidlii.com/embed?v=tJluaH4BJ3v&a=0',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.vidlii.com/watch?v=%s' % video_id, video_id)
video_url = self._search_regex(
r'src\s*:\s*(["\'])(?P<url>(?:https?://)?(?:(?!\1).)+)\1', webpage,
'video url', group='url')
title = self._search_regex(
(r'<h1>([^<]+)</h1>', r'<title>([^<]+) - VidLii<'), webpage,
'title')
description = self._html_search_meta(
('description', 'twitter:description'), webpage,
default=None) or strip_or_none(
get_element_by_id('des_text', webpage))
thumbnail = self._html_search_meta(
'twitter:image', webpage, default=None)
if not thumbnail:
thumbnail_path = self._search_regex(
r'img\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
if thumbnail_path:
thumbnail = urljoin(url, thumbnail_path)
uploader = self._search_regex(
r'<div[^>]+class=["\']wt_person[^>]+>\s*<a[^>]+\bhref=["\']/user/[^>]+>([^<]+)',
webpage, 'uploader', fatal=False)
uploader_url = 'https://www.vidlii.com/user/%s' % uploader if uploader else None
upload_date = unified_strdate(self._html_search_meta(
'datePublished', webpage, default=None) or self._search_regex(
r'<date>([^<]+)', webpage, 'upload date', fatal=False))
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'duration',
default=None) or self._search_regex(
r'duration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
view_count = int_or_none(self._search_regex(
(r'<strong>(\d+)</strong> views',
r'Views\s*:\s*<strong>(\d+)</strong>'),
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
(r'<span[^>]+id=["\']cmt_num[^>]+>(\d+)',
r'Comments\s*:\s*<strong>(\d+)'),
webpage, 'comment count', fatal=False))
average_rating = float_or_none(self._search_regex(
r'rating\s*:\s*([\d.]+)', webpage, 'average rating', fatal=False))
category = self._html_search_regex(
r'<div>Category\s*:\s*</div>\s*<div>\s*<a[^>]+>([^<]+)', webpage,
'category', fatal=False)
categories = [category] if category else None
tags = [
strip_or_none(tag)
for tag in re.findall(
r'<a[^>]+\bhref=["\']/results\?.*?q=[^>]*>([^<]+)',
webpage) if strip_or_none(tag)
] or None
return {
'id': video_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_url': uploader_url,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'average_rating': average_rating,
'categories': categories,
'tags': tags,
}
+1 -1
View File
@@ -218,7 +218,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'id': '56015672',
'ext': 'mp4',
'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
'description': 'md5:2d3305bad981a06ff79f027f19865021',
'description': 'md5:509a9ad5c9bf97c60faee9203aca4479',
'timestamp': 1355990239,
'upload_date': '20121220',
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user7108434',
+2 -2
View File
@@ -99,10 +99,10 @@ class VKIE(VKBaseIE):
_TESTS = [
{
'url': 'http://vk.com/videos-77521?z=video-77521_162222515%2Fclub77521',
'md5': '0deae91935c54e00003c2a00646315f0',
'md5': '7babad3b85ea2e91948005b1b8b0cb84',
'info_dict': {
'id': '162222515',
'ext': 'flv',
'ext': 'mp4',
'title': 'ProtivoGunz - Хуёвая песня',
'uploader': 're:(?:Noize MC|Alexander Ilyashenko).*',
'duration': 195,
+1 -1
View File
@@ -39,7 +39,7 @@ class XHamsterIE(InfoExtractor):
'uploader': 'Ruseful2011',
'duration': 893,
'age_limit': 18,
'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy'],
'categories': ['Fake Hub', 'Amateur', 'MILFs', 'POV', 'Beauti', 'Beauties', 'Beautiful', 'Boss', 'Office', 'Oral', 'Reality', 'Sexy', 'Taking'],
},
}, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
+8 -6
View File
@@ -2451,7 +2451,7 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeUserIE(YoutubeChannelIE):
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
IE_NAME = 'youtube:user'
@@ -2583,7 +2583,11 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
}]
class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
class YoutubeSearchBaseInfoExtractor(YoutubePlaylistBaseInfoExtractor):
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
class YoutubeSearchIE(SearchInfoExtractor, YoutubeSearchBaseInfoExtractor):
IE_DESC = 'YouTube.com searches'
# there doesn't appear to be a real limit, for example if you search for
# 'python' you get more than 8.000.000 results
@@ -2617,8 +2621,7 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubePlaylistIE):
raise ExtractorError(
'[youtube] No video results', expected=True)
new_videos = self._ids_to_results(orderedSet(re.findall(
r'href="/watch\?v=(.{11})', html_content)))
new_videos = list(self._process_page(html_content))
videos += new_videos
if not new_videos or len(videos) > limit:
break
@@ -2641,11 +2644,10 @@ class YoutubeSearchDateIE(YoutubeSearchIE):
_EXTRA_QUERY_ARGS = {'search_sort': 'video_date_uploaded'}
class YoutubeSearchURLIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeSearchURLIE(YoutubeSearchBaseInfoExtractor):
IE_DESC = 'YouTube.com search URLs'
IE_NAME = 'youtube:search_url'
_VALID_URL = r'https?://(?:www\.)?youtube\.com/results\?(.*?&)?(?:search_query|q)=(?P<query>[^&]+)(?:[&]|$)'
_VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})(?:[^"]*"[^>]+\btitle="(?P<title>[^"]+))?'
_TESTS = [{
'url': 'https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video',
'playlist_mincount': 5,
+11 -8
View File
@@ -42,16 +42,19 @@ class ZDFIE(ZDFBaseIE):
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh')
_TESTS = [{
'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
'info_dict': {
'id': 'zdfmediathek-trailer-100',
'id': 'die-magie-der-farben-von-koenigspurpur-und-jeansblau-100',
'ext': 'mp4',
'title': 'Die neue ZDFmediathek',
'description': 'md5:3003d36487fb9a5ea2d1ff60beb55e8d',
'duration': 30,
'timestamp': 1477627200,
'upload_date': '20161028',
}
'title': 'Die Magie der Farben (2/2)',
'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
'duration': 2615,
'timestamp': 1465021200,
'upload_date': '20160604',
},
}, {
'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
'only_matching': True,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
'only_matching': True,
+1 -1
View File
@@ -534,7 +534,7 @@ def parseOpts(overrideArguments=None):
workarounds.add_option(
'--prefer-insecure',
'--prefer-unsecure', action='store_true', dest='prefer_insecure',
help='Use an unencrypted connection to retrieve information about the video. (Currently supported only for YouTube)')
help='Use an unencrypted connection to retrieve information whenever possible')
workarounds.add_option(
'--user-agent',
metavar='UA', dest='user_agent',
+2 -1
View File
@@ -31,7 +31,8 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
temp_filename = prepend_extension(filename, 'temp')
if not info.get('thumbnails'):
raise EmbedThumbnailPPError('Thumbnail was not found. Nothing to do.')
self._downloader.to_screen('[embedthumbnail] There aren\'t any thumbnails to embed')
return [], info
thumbnail_filename = info['thumbnails'][-1]['filename']
+9 -4
View File
@@ -28,10 +28,10 @@ def rsa_verify(message, signature, key):
return expected == signature
def update_self(to_screen, verbose, opener):
def update_self(to_screen, verbose, opener, prefer_insecure=False):
"""Update the program file with the latest version from the repository"""
UPDATE_URL = 'https://rg3.github.io/youtube-dl/update/'
UPDATE_URL = '//rg3.github.io/youtube-dl/update/'
VERSION_URL = UPDATE_URL + 'LATEST_VERSION'
JSON_URL = UPDATE_URL + 'versions.json'
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
@@ -40,9 +40,13 @@ def update_self(to_screen, verbose, opener):
to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
def guess_scheme(url, insecure=False):
return 'http%s:%s' % ('' if insecure is True else 's', url)
# Check if there is a new version
try:
newversion = opener.open(VERSION_URL).read().decode('utf-8').strip()
newversion = opener.open(guess_scheme(
VERSION_URL, prefer_insecure)).read().decode('utf-8').strip()
except Exception:
if verbose:
to_screen(encode_compat_str(traceback.format_exc()))
@@ -54,7 +58,8 @@ def update_self(to_screen, verbose, opener):
# Download and check versions info
try:
versions_info = opener.open(JSON_URL).read().decode('utf-8')
versions_info = opener.open(guess_scheme(
JSON_URL, prefer_insecure)).read().decode('utf-8')
versions_info = json.loads(versions_info)
except Exception:
if verbose:
+16 -4
View File
@@ -82,7 +82,7 @@ def register_socks_protocols():
compiled_regex_type = type(re.compile(''))
std_headers = {
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)',
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0 (Chrome)',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate',
@@ -538,10 +538,22 @@ def sanitize_path(s):
return os.path.join(*sanitized_path)
# Prepend protocol-less URLs with `http:` scheme in order to mitigate the number of
# unwanted failures due to missing protocol
def sanitize_url(url):
return 'http:%s' % url if url.startswith('//') else url
# Prepend protocol-less URLs with `http:` scheme in order to mitigate
# the number of unwanted failures due to missing protocol
if url.startswith('//'):
return 'http:%s' % url
# Fix some common typos seen so far
COMMON_TYPOS = (
# https://github.com/rg3/youtube-dl/issues/15649
(r'^httpss://', r'https://'),
# https://bx1.be/lives/direct-tv/
(r'^rmtp([es]?)://', r'rtmp\1://'),
)
for mistake, fixup in COMMON_TYPOS:
if re.match(mistake, url):
return re.sub(mistake, fixup, url)
return url
def sanitized_Request(url, *args, **kwargs):
+1 -1
View File
@@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2018.02.11'
__version__ = '2018.02.26'