mirror of
https://github.com/ytdl-org/youtube-dl.git
synced 2025-01-20 16:38:48 +00:00
Merge branch 'master' into mediathekviewweb
commit
47f1d70149
6
.github/ISSUE_TEMPLATE/1_broken_site.md
vendored
@@ -18,7 +18,7 @@ title: ''
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2021.02.10
+[debug] youtube-dl version 2021.04.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
@@ -19,7 +19,7 @@ labels: 'site-support-request'
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones
@@ -18,13 +18,13 @@ title: ''
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones
 
 
6
.github/ISSUE_TEMPLATE/4_bug_report.md
vendored
@@ -18,7 +18,7 @@ title: ''
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
 -->
 
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2021.02.10
+[debug] youtube-dl version 2021.04.26
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
4
.github/ISSUE_TEMPLATE/5_feature_request.md
vendored
@@ -19,13 +19,13 @@ labels: 'request'
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.02.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2021.04.26. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2021.02.10**
+- [ ] I've verified that I'm running youtube-dl version **2021.04.26**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones
 
 
9
.github/workflows/ci.yml
vendored
@@ -49,11 +49,18 @@ jobs:
     - name: Install Jython
       if: ${{ matrix.python-impl == 'jython' }}
       run: |
-        wget http://search.maven.org/remotecontent?filepath=org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
+        wget https://repo1.maven.org/maven2/org/python/jython-installer/2.7.1/jython-installer-2.7.1.jar -O jython-installer.jar
         java -jar jython-installer.jar -s -d "$HOME/jython"
         echo "$HOME/jython/bin" >> $GITHUB_PATH
     - name: Install nose
+      if: ${{ matrix.python-impl != 'jython' }}
       run: pip install nose
+    - name: Install nose (Jython)
+      if: ${{ matrix.python-impl == 'jython' }}
+      # Working around deprecation of support for non-SNI clients at PyPI CDN (see https://status.python.org/incidents/hzmjhqsdjqgb)
+      run: |
+        wget https://files.pythonhosted.org/packages/99/4f/13fb671119e65c4dce97c60e67d3fd9e6f7f809f2b307e2611f4701205cb/nose-1.3.7-py2-none-any.whl
+        pip install nose-1.3.7-py2-none-any.whl
     - name: Run tests
       continue-on-error: ${{ matrix.ytdl-test-set == 'download' || matrix.python-impl == 'jython' }}
       env:
199
ChangeLog
@@ -1,3 +1,202 @@
+version 2021.04.26
+
+Extractors
++ [xfileshare] Add support for wolfstream.tv (#28858)
+* [francetvinfo] Improve video id extraction (#28792)
+* [medaltv] Fix extraction (#28807)
+* [tver] Redirect all downloads to Brightcove (#28849)
+* [go] Improve video id extraction (#25207, #25216, #26058)
+* [youtube] Fix lazy extractors (#28780)
++ [bbc] Extract description and timestamp from __INITIAL_DATA__ (#28774)
+* [cbsnews] Fix extraction for python <3.6 (#23359)
+
+
+version 2021.04.17
+
+Core
++ [utils] Add support for experimental HTTP response status code
+  308 Permanent Redirect (#27877, #28768)
+
+Extractors
++ [lbry] Add support for HLS videos (#27877, #28768)
+* [youtube] Fix stretched ratio calculation
+* [youtube] Improve stretch extraction (#28769)
+* [youtube:tab] Improve grid extraction (#28725)
++ [youtube:tab] Detect series playlist on playlists page (#28723)
++ [youtube] Add more invidious instances (#28706)
+* [pluralsight] Extend anti-throttling timeout (#28712)
+* [youtube] Improve URL to extractor routing (#27572, #28335, #28742)
++ [maoritv] Add support for maoritelevision.com (#24552)
++ [youtube:tab] Pass innertube context and x-goog-visitor-id header along with
+  continuation requests (#28702)
+* [mtv] Fix Viacom A/B Testing Video Player extraction (#28703)
++ [pornhub] Extract DASH and HLS formats from get_media end point (#28698)
+* [cbssports] Fix extraction (#28682)
+* [jamendo] Fix track extraction (#28686)
+* [curiositystream] Fix format extraction (#26845, #28668)
+
+
+version 2021.04.07
+
+Core
+* [extractor/common] Use compat_cookies_SimpleCookie for _get_cookies
++ [compat] Introduce compat_cookies_SimpleCookie
+* [extractor/common] Improve JSON-LD author extraction
+* [extractor/common] Fix _get_cookies on python 2 (#20673, #23256, #20326,
+  #28640)
+
+Extractors
+* [youtube] Fix extraction of videos with restricted location (#28685)
++ [line] Add support for live.line.me (#17205, #28658)
+* [vimeo] Improve extraction (#28591)
+* [youku] Update ccode (#17852, #28447, #28460, #28648)
+* [youtube] Prefer direct entry metadata over entry metadata from playlist
+  (#28619, #28636)
+* [screencastomatic] Fix extraction (#11976, #24489)
++ [palcomp3] Add support for palcomp3.com (#13120)
++ [arnes] Add support for video.arnes.si (#28483)
++ [youtube:tab] Add support for hashtags (#28308)
+
+
+version 2021.04.01
+
+Extractors
+* [youtube] Setup CONSENT cookie when needed (#28604)
+* [vimeo] Fix password protected review extraction (#27591)
+* [youtube] Improve age-restricted video extraction (#28578)
+
+
+version 2021.03.31
+
+Extractors
+* [vlive] Fix inkey request (#28589)
+* [francetvinfo] Improve video id extraction (#28584)
++ [instagram] Extract duration (#28469)
+* [instagram] Improve title extraction (#28469)
++ [sbs] Add support for ondemand watch URLs (#28566)
+* [youtube] Fix video's channel extraction (#28562)
+* [picarto] Fix live stream extraction (#28532)
+* [vimeo] Fix unlisted video extraction (#28414)
+* [youtube:tab] Fix playlist/community continuation items extraction (#28266)
+* [ard] Improve clip id extraction (#22724, #28528)
+
+
+version 2021.03.25
+
+Extractors
++ [zoom] Add support for zoom.us (#16597, #27002, #28531)
+* [bbc] Fix BBC IPlayer Episodes/Group extraction (#28360)
+* [youtube] Fix default value for youtube_include_dash_manifest (#28523)
+* [zingmp3] Fix extraction (#11589, #16409, #16968, #27205)
++ [vgtv] Add support for new tv.aftonbladet.se URL schema (#28514)
++ [tiktok] Detect private videos (#28453)
+* [vimeo:album] Fix extraction for albums with number of videos multiple
+  to page size (#28486)
+* [vvvvid] Fix kenc format extraction (#28473)
+* [mlb] Fix video extraction (#21241)
+* [svtplay] Improve extraction (#28448)
+* [applepodcasts] Fix extraction (#28445)
+* [rtve] Improve extraction
+    + Extract all formats
+    * Fix RTVE Infantil extraction (#24851)
+    + Extract is_live and series
+
+
+version 2021.03.14
+
+Core
++ Introduce release_timestamp meta field (#28386)
+
+Extractors
++ [southpark] Add support for southparkstudios.com (#28413)
+* [southpark] Fix extraction (#26763, #28413)
+* [sportdeutschland] Fix extraction (#21856, #28425)
+* [pinterest] Reduce the number of HLS format requests
+* [peertube] Improve thumbnail extraction (#28419)
+* [tver] Improve title extraction (#28418)
+* [fujitv] Fix HLS formats extension (#28416)
+* [shahid] Fix format extraction (#28383)
++ [lbry] Add support for channel filters (#28385)
++ [bandcamp] Extract release timestamp
++ [lbry] Extract release timestamp (#28386)
+* [pornhub] Detect flagged videos
++ [pornhub] Extract formats from get_media end point (#28395)
+* [bilibili] Fix video info extraction (#28341)
++ [cbs] Add support for Paramount+ (#28342)
++ [trovo] Add Origin header to VOD formats (#28346)
+* [voxmedia] Fix volume embed extraction (#28338)
+
+
+version 2021.03.03
+
+Extractors
+* [youtube:tab] Switch continuation to browse API (#28289, #28327)
+* [9c9media] Fix extraction for videos with multiple ContentPackages (#28309)
++ [bbc] Add support for BBC Reel videos (#21870, #23660, #28268)
+
+
+version 2021.03.02
+
+Extractors
+* [zdf] Rework extractors (#11606, #13473, #17354, #21185, #26711, #27068,
+  #27930, #28198, #28199, #28274)
+    * Generalize cross-extractor video ids for zdf based extractors
+    * Improve extraction
+    * Fix 3sat and phoenix
+* [stretchinternet] Fix extraction (#28297)
+* [urplay] Fix episode data extraction (#28292)
++ [bandaichannel] Add support for b-ch.com (#21404)
+* [srgssr] Improve extraction (#14717, #14725, #27231, #28238)
+    + Extract subtitle
+    * Fix extraction for new videos
+    * Update srf download domains
+* [vvvvid] Reduce season request payload size
++ [vvvvid] Extract series sublists playlist title (#27601, #27618)
++ [dplay] Extract Ad-Free uplynk URLs (#28160)
++ [wat] Detect DRM protected videos (#27958)
+* [tf1] Improve extraction (#27980, #28040)
+* [tmz] Fix and improve extraction (#24603, #24687, 28211)
++ [gedidigital] Add support for Gedi group sites (#7347, #26946)
+* [youtube] Fix get_video_info request
+
+
+version 2021.02.22
+
+Core
++ [postprocessor/embedthumbnail] Recognize atomicparsley binary in lowercase
+  (#28112)
+
+Extractors
+* [apa] Fix and improve extraction (#27750)
++ [youporn] Extract duration (#28019)
++ [peertube] Add support for canard.tube (#28190)
+* [youtube] Fixup m4a_dash formats (#28165)
++ [samplefocus] Add support for samplefocus.com (#27763)
++ [vimeo] Add support for unlisted video source format extraction
+* [viki] Improve extraction (#26522, #28203)
+    * Extract uploader URL and episode number
+    * Report login required error
+    + Extract 480p formats
+    * Fix API v4 calls
+* [ninegag] Unescape title (#28201)
+* [youtube] Improve URL regular expression (#28193)
++ [youtube] Add support for redirect.invidious.io (#28193)
++ [dplay] Add support for de.hgtv.com (#28182)
++ [dplay] Add support for discoveryplus.com (#24698)
++ [simplecast] Add support for simplecast.com (#24107)
+* [youtube] Fix uploader extraction in flat playlist mode (#28045)
+* [yandexmusic:playlist] Request missing tracks in chunks (#27355, #28184)
++ [storyfire] Add support for storyfire.com (#25628, #26349)
++ [zhihu] Add support for zhihu.com (#28177)
+* [youtube] Fix controversial videos when authenticated with cookies (#28174)
+* [ccma] Fix timestamp parsing in python 2
++ [videopress] Add support for video.wordpress.com
+* [kakao] Improve info extraction and detect geo restriction (#26577)
+* [xboxclips] Fix extraction (#27151)
+* [ard] Improve formats extraction (#28155)
++ [canvas] Add support for dagelijksekost.een.be (#28119)
+
+
 version 2021.02.10
 
 Extractors
@@ -3,6 +3,7 @@
 - **20min**
 - **220.ro**
 - **23video**
+- **247sports**
 - **24video**
 - **3qsdn**: 3Q SDN
 - **3sat**
@@ -82,6 +83,7 @@
 - **awaan:video**
 - **AZMedien**: AZ Medien videos
 - **BaiduVideo**: 百度视频
+- **bandaichannel**
 - **Bandcamp**
 - **Bandcamp:album**
 - **Bandcamp:weekly**
@@ -89,7 +91,8 @@
 - **bbc**: BBC
 - **bbc.co.uk**: BBC iPlayer
 - **bbc.co.uk:article**: BBC articles
-- **bbc.co.uk:iplayer:playlist**
+- **bbc.co.uk:iplayer:episodes**
+- **bbc.co.uk:iplayer:group**
 - **bbc.co.uk:playlist**
 - **BBVTV**
 - **Beatport**
@@ -158,7 +161,8 @@
 - **cbsnews**: CBS News
 - **cbsnews:embed**
 - **cbsnews:livevideo**: CBS News Live Videos
-- **CBSSports**
+- **cbssports**
+- **cbssports:embed**
 - **CCMA**
 - **CCTV**: 央视网
 - **CDA**
@@ -212,6 +216,7 @@
 - **curiositystream**
 - **curiositystream:collection**
 - **CWTV**
+- **DagelijkseKost**: dagelijksekost.een.be
 - **DailyMail**
 - **dailymotion**
 - **dailymotion:playlist**
@@ -233,6 +238,7 @@
 - **DiscoveryGo**
 - **DiscoveryGoPlaylist**
 - **DiscoveryNetworksDe**
+- **DiscoveryPlus**
 - **DiscoveryVR**
 - **Disney**
 - **dlive:stream**
@@ -328,6 +334,7 @@
 - **Gaskrank**
 - **Gazeta**
 - **GDCVault**
+- **GediDigital**
 - **generic**: Generic downloader that works on some sites
 - **Gfycat**
 - **GiantBomb**
@@ -353,6 +360,7 @@
 - **HentaiStigma**
 - **hetklokhuis**
 - **hgtv.com:show**
+- **HGTVDe**
 - **HiDive**
 - **HistoricFilms**
 - **history:player**
@@ -457,6 +465,8 @@
 - **limelight**
 - **limelight:channel**
 - **limelight:channel_list**
+- **LineLive**
+- **LineLiveChannel**
 - **LineTV**
 - **linkedin:learning**
 - **linkedin:learning:course**
@@ -482,6 +492,7 @@
 - **mangomolo:live**
 - **mangomolo:video**
 - **ManyVids**
+- **MaoriTV**
 - **Markiza**
 - **MarkizaPage**
 - **massengeschmack.tv**
@@ -517,6 +528,7 @@
 - **mixcloud:playlist**
 - **mixcloud:user**
 - **MLB**
+- **MLBVideo**
 - **Mnet**
 - **MNetTV**
 - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
@@ -672,6 +684,9 @@
 - **OutsideTV**
 - **PacktPub**
 - **PacktPubCourse**
+- **PalcoMP3:artist**
+- **PalcoMP3:song**
+- **PalcoMP3:video**
 - **pandora.tv**: 판도라TV
 - **ParamountNetwork**
 - **parliamentlive.tv**: UK parliament videos
@@ -803,6 +818,7 @@
 - **safari:course**: safaribooksonline.com online courses
 - **SAKTV**
 - **SaltTV**
+- **SampleFocus**
 - **Sapo**: SAPO Vídeos
 - **savefrom.net**
 - **SBS**: sbs.com.au
@@ -825,6 +841,9 @@
 - **ShahidShow**
 - **Shared**: shared.sx
 - **ShowRoomLive**
+- **simplecast**
+- **simplecast:episode**
+- **simplecast:podcast**
 - **Sina**
 - **sky.it**
 - **sky:news**
@@ -877,6 +896,9 @@
 - **Steam**
 - **Stitcher**
 - **StitcherShow**
+- **StoryFire**
+- **StoryFireSeries**
+- **StoryFireUser**
 - **Streamable**
 - **streamcloud.eu**
 - **StreamCZ**
@@ -1045,6 +1067,7 @@
 - **Vidbit**
 - **Viddler**
 - **Videa**
+- **video.arnes.si**: Arnes Video
 - **video.google:search**: Google Video search
 - **video.sky.it**
 - **video.sky.it:live**
@@ -1139,7 +1162,7 @@
 - **WWE**
 - **XBef**
 - **XboxClips**
-- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, XVideoSharing
+- **XFileShare**: XFileShare based sites: Aparat, ClipWatching, GoUnlimited, GoVid, HolaVid, Streamty, TheVideoBee, Uqload, VidBom, vidlo, VidLocker, VidShare, VUp, WolfStream, XVideoSharing
 - **XHamster**
 - **XHamsterEmbed**
 - **XHamsterUser**
@@ -1198,5 +1221,8 @@
 - **ZattooLive**
 - **ZDF**
 - **ZDFChannel**
+- **Zhihu**
 - **zingmp3**: mp3.zing.vn
+- **zingmp3:album**
+- **zoom**
 - **Zype**
@ -70,15 +70,6 @@ class TestAllURLsMatching(unittest.TestCase):
|
|||||||
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
|
# self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
|
||||||
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
|
# self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
|
||||||
|
|
||||||
def test_youtube_extract(self):
|
|
||||||
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
|
|
||||||
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
|
|
||||||
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
|
|
||||||
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
|
|
||||||
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
|
|
||||||
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
|
|
||||||
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
|
|
||||||
|
|
||||||
def test_facebook_matching(self):
|
def test_facebook_matching(self):
|
||||||
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
|
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/Shiniknoh#!/photo.php?v=10153317450565268'))
|
||||||
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))
|
self.assertTrue(FacebookIE.suitable('https://www.facebook.com/cindyweather?fref=ts#!/photo.php?v=10152183998945793'))
|
||||||
|
@ -39,6 +39,16 @@ class TestExecution(unittest.TestCase):
|
|||||||
_, stderr = p.communicate()
|
_, stderr = p.communicate()
|
||||||
self.assertFalse(stderr)
|
self.assertFalse(stderr)
|
||||||
|
|
||||||
|
def test_lazy_extractors(self):
|
||||||
|
try:
|
||||||
|
subprocess.check_call([sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'], cwd=rootDir, stdout=_DEV_NULL)
|
||||||
|
subprocess.check_call([sys.executable, 'test/test_all_urls.py'], cwd=rootDir, stdout=_DEV_NULL)
|
||||||
|
finally:
|
||||||
|
try:
|
||||||
|
os.remove('youtube_dl/extractor/lazy_extractors.py')
|
||||||
|
except (IOError, OSError):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
unittest.main()
|
unittest.main()
|
||||||
|
26
test/test_youtube_misc.py
Normal file
@ -0,0 +1,26 @@
|
|||||||
|
#!/usr/bin/env python
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
# Allow direct execution
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import unittest
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
|
||||||
|
from youtube_dl.extractor import YoutubeIE
|
||||||
|
|
||||||
|
|
||||||
|
class TestYoutubeMisc(unittest.TestCase):
|
||||||
|
def test_youtube_extract(self):
|
||||||
|
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
|
||||||
|
assertExtractId('http://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
|
||||||
|
assertExtractId('https://www.youtube.com/watch?&v=BaW_jenozKc', 'BaW_jenozKc')
|
||||||
|
assertExtractId('https://www.youtube.com/watch?feature=player_embedded&v=BaW_jenozKc', 'BaW_jenozKc')
|
||||||
|
assertExtractId('https://www.youtube.com/watch_popup?v=BaW_jenozKc', 'BaW_jenozKc')
|
||||||
|
assertExtractId('http://www.youtube.com/watch?v=BaW_jenozKcsharePLED17F32AD9753930', 'BaW_jenozKc')
|
||||||
|
assertExtractId('BaW_jenozKc', 'BaW_jenozKc')
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
unittest.main()
|
@ -773,11 +773,20 @@ class YoutubeDL(object):
|
|||||||
|
|
||||||
def extract_info(self, url, download=True, ie_key=None, extra_info={},
|
def extract_info(self, url, download=True, ie_key=None, extra_info={},
|
||||||
process=True, force_generic_extractor=False):
|
process=True, force_generic_extractor=False):
|
||||||
'''
|
"""
|
||||||
Returns a list with a dictionary for each video we find.
|
Return a list with a dictionary for each video extracted.
|
||||||
If 'download', also downloads the videos.
|
|
||||||
extra_info is a dict containing the extra values to add to each result
|
Arguments:
|
||||||
'''
|
url -- URL to extract
|
||||||
|
|
||||||
|
Keyword arguments:
|
||||||
|
download -- whether to download videos during extraction
|
||||||
|
ie_key -- extractor key hint
|
||||||
|
extra_info -- dictionary containing the extra values to add to each result
|
||||||
|
process -- whether to resolve all unresolved references (URLs, playlist items),
|
||||||
|
must be True for download to work.
|
||||||
|
force_generic_extractor -- force using the generic extractor
|
||||||
|
"""
|
||||||
|
|
||||||
if not ie_key and force_generic_extractor:
|
if not ie_key and force_generic_extractor:
|
||||||
ie_key = 'Generic'
|
ie_key = 'Generic'
|
||||||
@ -1511,12 +1520,16 @@ class YoutubeDL(object):
|
|||||||
if 'display_id' not in info_dict and 'id' in info_dict:
|
if 'display_id' not in info_dict and 'id' in info_dict:
|
||||||
info_dict['display_id'] = info_dict['id']
|
info_dict['display_id'] = info_dict['id']
|
||||||
|
|
||||||
if info_dict.get('upload_date') is None and info_dict.get('timestamp') is not None:
|
for ts_key, date_key in (
|
||||||
|
('timestamp', 'upload_date'),
|
||||||
|
('release_timestamp', 'release_date'),
|
||||||
|
):
|
||||||
|
if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
|
||||||
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
|
# Working around out-of-range timestamp values (e.g. negative ones on Windows,
|
||||||
# see http://bugs.python.org/issue1646728)
|
# see http://bugs.python.org/issue1646728)
|
||||||
try:
|
try:
|
||||||
upload_date = datetime.datetime.utcfromtimestamp(info_dict['timestamp'])
|
upload_date = datetime.datetime.utcfromtimestamp(info_dict[ts_key])
|
||||||
info_dict['upload_date'] = upload_date.strftime('%Y%m%d')
|
info_dict[date_key] = upload_date.strftime('%Y%m%d')
|
||||||
except (ValueError, OverflowError, OSError):
|
except (ValueError, OverflowError, OSError):
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
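Review note: the hunk above generalizes the old `timestamp` to `upload_date` fallback so that `release_timestamp` also fills `release_date`. The following is a minimal standalone sketch of that loop, not the project's code. The helper name `fill_derived_dates` is hypothetical, and it uses the timezone-aware `datetime.fromtimestamp` (Python 3 only) where the hunk uses `utcfromtimestamp`.

```python
import datetime


def fill_derived_dates(info_dict):
    """Fill missing YYYYMMDD date fields from their UNIX timestamp fields."""
    for ts_key, date_key in (
            ('timestamp', 'upload_date'),
            ('release_timestamp', 'release_date'),
    ):
        # Only derive a date when it is absent and a timestamp is available
        if info_dict.get(date_key) is None and info_dict.get(ts_key) is not None:
            try:
                moment = datetime.datetime.fromtimestamp(
                    info_dict[ts_key], datetime.timezone.utc)
                info_dict[date_key] = moment.strftime('%Y%m%d')
            except (ValueError, OverflowError, OSError):
                # Out-of-range timestamps are skipped, as in the hunk above
                pass
    return info_dict


# Timestamps taken from the Bandcamp test case later in this diff
print(fill_derived_dates({'timestamp': 1396508491, 'release_timestamp': 1396483200}))
```

An explicitly provided date is never overwritten, so extractors that already set `release_date` keep their value.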
@ -73,6 +73,15 @@ try:
|
|||||||
except ImportError: # Python 2
|
except ImportError: # Python 2
|
||||||
import Cookie as compat_cookies
|
import Cookie as compat_cookies
|
||||||
|
|
||||||
|
if sys.version_info[0] == 2:
|
||||||
|
class compat_cookies_SimpleCookie(compat_cookies.SimpleCookie):
|
||||||
|
def load(self, rawdata):
|
||||||
|
if isinstance(rawdata, compat_str):
|
||||||
|
rawdata = str(rawdata)
|
||||||
|
return super(compat_cookies_SimpleCookie, self).load(rawdata)
|
||||||
|
else:
|
||||||
|
compat_cookies_SimpleCookie = compat_cookies.SimpleCookie
|
||||||
|
|
||||||
try:
|
try:
|
||||||
import html.entities as compat_html_entities
|
import html.entities as compat_html_entities
|
||||||
except ImportError: # Python 2
|
except ImportError: # Python 2
|
||||||
@ -3000,6 +3009,7 @@ __all__ = [
|
|||||||
'compat_cookiejar',
|
'compat_cookiejar',
|
||||||
'compat_cookiejar_Cookie',
|
'compat_cookiejar_Cookie',
|
||||||
'compat_cookies',
|
'compat_cookies',
|
||||||
|
'compat_cookies_SimpleCookie',
|
||||||
'compat_ctypes_WINFUNCTYPE',
|
'compat_ctypes_WINFUNCTYPE',
|
||||||
'compat_etree_Element',
|
'compat_etree_Element',
|
||||||
'compat_etree_fromstring',
|
'compat_etree_fromstring',
|
||||||
|
@ -6,25 +6,21 @@ import re
|
|||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
determine_ext,
|
determine_ext,
|
||||||
js_to_json,
|
int_or_none,
|
||||||
url_or_none,
|
url_or_none,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class APAIE(InfoExtractor):
|
class APAIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
|
_VALID_URL = r'(?P<base_url>https?://[^/]+\.apa\.at)/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
|
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
|
||||||
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
|
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'jjv85FdZ',
|
'id': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
|
'title': '293f6d17-692a-44e3-9fd5-7b178f3a1029',
|
||||||
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
|
|
||||||
'thumbnail': r're:^https?://.*\.jpg$',
|
'thumbnail': r're:^https?://.*\.jpg$',
|
||||||
'duration': 254,
|
|
||||||
'timestamp': 1519211149,
|
|
||||||
'upload_date': '20180221',
|
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
|
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
|
||||||
@ -46,9 +42,11 @@ class APAIE(InfoExtractor):
|
|||||||
webpage)]
|
webpage)]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
mobj = re.match(self._VALID_URL, url)
|
||||||
|
video_id, base_url = mobj.group('id', 'base_url')
|
||||||
|
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(
|
||||||
|
'%s/player/%s' % (base_url, video_id), video_id)
|
||||||
|
|
||||||
jwplatform_id = self._search_regex(
|
jwplatform_id = self._search_regex(
|
||||||
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
|
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
|
||||||
@ -59,16 +57,18 @@ class APAIE(InfoExtractor):
|
|||||||
'jwplatform:' + jwplatform_id, ie='JWPlatform',
|
'jwplatform:' + jwplatform_id, ie='JWPlatform',
|
||||||
video_id=video_id)
|
video_id=video_id)
|
||||||
|
|
||||||
sources = self._parse_json(
|
def extract(field, name=None):
|
||||||
self._search_regex(
|
return self._search_regex(
|
||||||
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
|
r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % field,
|
||||||
video_id, transform_source=js_to_json)
|
webpage, name or field, default=None, group='value')
|
||||||
|
|
||||||
|
title = extract('title') or video_id
|
||||||
|
description = extract('description')
|
||||||
|
thumbnail = extract('poster', 'thumbnail')
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
for source in sources:
|
for format_id in ('hls', 'progressive'):
|
||||||
if not isinstance(source, dict):
|
source_url = url_or_none(extract(format_id))
|
||||||
continue
|
|
||||||
source_url = url_or_none(source.get('file'))
|
|
||||||
if not source_url:
|
if not source_url:
|
||||||
continue
|
continue
|
||||||
ext = determine_ext(source_url)
|
ext = determine_ext(source_url)
|
||||||
@ -77,18 +77,19 @@ class APAIE(InfoExtractor):
|
|||||||
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||||
m3u8_id='hls', fatal=False))
|
m3u8_id='hls', fatal=False))
|
||||||
else:
|
else:
|
||||||
|
height = int_or_none(self._search_regex(
|
||||||
|
r'(\d+)\.mp4', source_url, 'height', default=None))
|
||||||
formats.append({
|
formats.append({
|
||||||
'url': source_url,
|
'url': source_url,
|
||||||
|
'format_id': format_id,
|
||||||
|
'height': height,
|
||||||
})
|
})
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
thumbnail = self._search_regex(
|
|
||||||
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
|
|
||||||
'thumbnail', fatal=False, group='url')
|
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': video_id,
|
'title': title,
|
||||||
|
'description': description,
|
||||||
'thumbnail': thumbnail,
|
'thumbnail': thumbnail,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
}
|
}
|
||||||
|
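Review note: the rewritten APA extractor reads `title`, `description`, `poster`, `hls` and `progressive` with one quoted-value regex, and takes the height from the digits directly before `.mp4` in the URL. The sketch below exercises the same two patterns with the standard `re` module. The helper name `extract_field` and the `SAMPLE_PAGE` string are invented for illustration and do not come from a real APA player page. `re.escape` is an addition that the hunk does not use.

```python
import re

# Invented sample input, shaped like a JSON-style player config
SAMPLE_PAGE = (
    '{"title": "Sample clip", '
    '"poster": "https://example.com/thumb.jpg", '
    '"progressive": "https://example.com/clip_720.mp4"}'
)


def extract_field(webpage, field):
    """Return the quoted value following `field":` in the page, or None."""
    mobj = re.search(
        r'\b%s["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1' % re.escape(field),
        webpage)
    return mobj.group('value') if mobj else None


def height_from_url(source_url):
    """Return the digits directly before '.mp4' as an int, or None."""
    mobj = re.search(r'(\d+)\.mp4', source_url)
    return int(mobj.group(1)) if mobj else None


title = extract_field(SAMPLE_PAGE, 'title') or 'fallback-id'
progressive_url = extract_field(SAMPLE_PAGE, 'progressive')
print(title, progressive_url, height_from_url(progressive_url))
```

The backreference `\1` makes the value end at the same quote character that opened it, so a value may contain the other quote type.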
@ -42,6 +42,7 @@ class ApplePodcastsIE(InfoExtractor):
|
|||||||
ember_data = self._parse_json(self._search_regex(
|
ember_data = self._parse_json(self._search_regex(
|
||||||
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
|
r'id="shoebox-ember-data-store"[^>]*>\s*({.+?})\s*<',
|
||||||
webpage, 'ember data'), episode_id)
|
webpage, 'ember data'), episode_id)
|
||||||
|
ember_data = ember_data.get(episode_id) or ember_data
|
||||||
episode = ember_data['data']['attributes']
|
episode = ember_data['data']['attributes']
|
||||||
description = episode.get('description') or {}
|
description = episode.get('description') or {}
|
||||||
|
|
||||||
|
@ -335,7 +335,7 @@ class ARDIE(InfoExtractor):
|
|||||||
|
|
||||||
|
|
||||||
class ARDBetaMediathekIE(ARDMediathekBaseIE):
|
class ARDBetaMediathekIE(ARDMediathekBaseIE):
|
||||||
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
|
_VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
|
'url': 'https://www.ardmediathek.de/mdr/video/die-robuste-roswita/Y3JpZDovL21kci5kZS9iZWl0cmFnL2Ntcy84MWMxN2MzZC0wMjkxLTRmMzUtODk4ZS0wYzhlOWQxODE2NGI/',
|
||||||
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
|
'md5': 'a1dc75a39c61601b980648f7c9f9f71d',
|
||||||
@ -365,22 +365,22 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
|
'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
video_id = self._match_id(url)
|
||||||
video_id = mobj.group('video_id')
|
|
||||||
display_id = mobj.group('display_id')
|
|
||||||
if display_id:
|
|
||||||
display_id = display_id.rstrip('/')
|
|
||||||
if not display_id:
|
|
||||||
display_id = video_id
|
|
||||||
|
|
||||||
player_page = self._download_json(
|
player_page = self._download_json(
|
||||||
'https://api.ardmediathek.de/public-gateway',
|
'https://api.ardmediathek.de/public-gateway',
|
||||||
display_id, data=json.dumps({
|
video_id, data=json.dumps({
|
||||||
'query': '''{
|
'query': '''{
|
||||||
playerPage(client:"%s", clipId: "%s") {
|
playerPage(client: "ard", clipId: "%s") {
|
||||||
blockedByFsk
|
blockedByFsk
|
||||||
broadcastedOn
|
broadcastedOn
|
||||||
maturityContentRating
|
maturityContentRating
|
||||||
@ -410,7 +410,7 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}''' % (mobj.group('client'), video_id),
|
}''' % video_id,
|
||||||
}).encode(), headers={
|
}).encode(), headers={
|
||||||
'Content-Type': 'application/json'
|
'Content-Type': 'application/json'
|
||||||
})['data']['playerPage']
|
})['data']['playerPage']
|
||||||
@ -435,7 +435,6 @@ class ARDBetaMediathekIE(ARDMediathekBaseIE):
|
|||||||
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
|
r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
|
||||||
info.update({
|
info.update({
|
||||||
'age_limit': age_limit,
|
'age_limit': age_limit,
|
||||||
'display_id': display_id,
|
|
||||||
'title': title,
|
'title': title,
|
||||||
'description': description,
|
'description': description,
|
||||||
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
|
'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
|
||||||
|
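Review note: the new `_VALID_URL` for `ARDBetaMediathekIE` makes the client segment optional and anchors the ID on the `Y3JpZDovL` prefix, which lets it find the ID whether the slug comes before or after it. The sketch below copies the pattern from the hunk and runs it against the three test URLs added or kept in this diff. The helper name `ard_video_id` is hypothetical.

```python
import re

# Pattern copied from the new side of the hunk above
VALID_URL = (
    r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?:[^/]+/)?'
    r'(?:player|live|video)/(?:[^/]+/)*(?P<id>Y3JpZDovL[a-zA-Z0-9]+)'
)


def ard_video_id(url):
    """Return the clip ID for a matching URL, or None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


for test_url in (
    'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
    'https://www.ardmediathek.de/video/coronavirus-update-ndr-info/astrazeneca-kurz-lockdown-und-pims-syndrom-81/ndr/Y3JpZDovL25kci5kZS84NzE0M2FjNi0wMWEwLTQ5ODEtOTE5NS1mOGZhNzdhOTFmOTI/',
    'https://www.ardmediathek.de/ard/player/Y3JpZDovL3dkci5kZS9CZWl0cmFnLWQ2NDJjYWEzLTMwZWYtNGI4NS1iMTI2LTU1N2UxYTcxOGIzOQ/tatort-duo-koeln-leipzig-ihr-kinderlein-kommet',
):
    print(ard_video_id(test_url))
```

Because the client is no longer captured, the GraphQL query in the hunk hardcodes `client: "ard"`.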
101
youtube_dl/extractor/arnes.py
Normal file
@ -0,0 +1,101 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..compat import (
|
||||||
|
compat_parse_qs,
|
||||||
|
compat_urllib_parse_urlparse,
|
||||||
|
)
|
||||||
|
from ..utils import (
|
||||||
|
float_or_none,
|
||||||
|
int_or_none,
|
||||||
|
parse_iso8601,
|
||||||
|
remove_start,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class ArnesIE(InfoExtractor):
|
||||||
|
IE_NAME = 'video.arnes.si'
|
||||||
|
IE_DESC = 'Arnes Video'
|
||||||
|
_VALID_URL = r'https?://video\.arnes\.si/(?:[a-z]{2}/)?(?:watch|embed|api/(?:asset|public/video))/(?P<id>[0-9a-zA-Z]{12})'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://video.arnes.si/watch/a1qrWTOQfVoU?t=10',
|
||||||
|
'md5': '4d0f4d0a03571b33e1efac25fd4a065d',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'a1qrWTOQfVoU',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Linearna neodvisnost, definicija',
|
||||||
|
'description': 'Linearna neodvisnost, definicija',
|
||||||
|
'license': 'PRIVATE',
|
||||||
|
'creator': 'Polona Oblak',
|
||||||
|
'timestamp': 1585063725,
|
||||||
|
'upload_date': '20200324',
|
||||||
|
'channel': 'Polona Oblak',
|
||||||
|
'channel_id': 'q6pc04hw24cj',
|
||||||
|
'channel_url': 'https://video.arnes.si/?channel=q6pc04hw24cj',
|
||||||
|
'duration': 596.75,
|
||||||
|
'view_count': int,
|
||||||
|
'tags': ['linearna_algebra'],
|
||||||
|
'start_time': 10,
|
||||||
|
}
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.arnes.si/api/asset/s1YjnV7hadlC/play.mp4',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.arnes.si/en/watch/s1YjnV7hadlC',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.arnes.si/embed/s1YjnV7hadlC?t=123&hideRelated=1',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.arnes.si/api/public/video/s1YjnV7hadlC',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
_BASE_URL = 'https://video.arnes.si'
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
|
||||||
|
video = self._download_json(
|
||||||
|
self._BASE_URL + '/api/public/video/' + video_id, video_id)['data']
|
||||||
|
title = video['title']
|
||||||
|
|
||||||
|
formats = []
|
||||||
|
for media in (video.get('media') or []):
|
||||||
|
media_url = media.get('url')
|
||||||
|
if not media_url:
|
||||||
|
continue
|
||||||
|
formats.append({
|
||||||
|
'url': self._BASE_URL + media_url,
|
||||||
|
'format_id': remove_start(media.get('format'), 'FORMAT_'),
|
||||||
|
'format_note': media.get('formatTranslation'),
|
||||||
|
'width': int_or_none(media.get('width')),
|
||||||
|
'height': int_or_none(media.get('height')),
|
||||||
|
})
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
channel = video.get('channel') or {}
|
||||||
|
channel_id = channel.get('url')
|
||||||
|
thumbnail = video.get('thumbnailUrl')
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
'title': title,
|
||||||
|
'formats': formats,
|
||||||
|
'thumbnail': self._BASE_URL + thumbnail,
|
||||||
|
'description': video.get('description'),
|
||||||
|
'license': video.get('license'),
|
||||||
|
'creator': video.get('author'),
|
||||||
|
'timestamp': parse_iso8601(video.get('creationTime')),
|
||||||
|
'channel': channel.get('name'),
|
||||||
|
'channel_id': channel_id,
|
||||||
|
'channel_url': self._BASE_URL + '/?channel=' + channel_id if channel_id else None,
|
||||||
|
'duration': float_or_none(video.get('duration'), 1000),
|
||||||
|
'view_count': int_or_none(video.get('views')),
|
||||||
|
'tags': video.get('hashtags'),
|
||||||
|
'start_time': int_or_none(compat_parse_qs(
|
||||||
|
compat_urllib_parse_urlparse(url).query).get('t', [None])[0]),
|
||||||
|
}
|
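Review note: `ArnesIE` derives `start_time` from the `t` query parameter and converts the API's `duration` by dividing by 1000. The sketch below reproduces both conversions with Python 3's `urllib.parse` in place of the `compat_*` wrappers. The helper names are hypothetical, and the millisecond unit is an assumption inferred from the divisor and the `596.75` test value.

```python
from urllib.parse import parse_qs, urlparse


def start_time_from_url(url):
    """Return the integer value of the 't' query parameter, or None."""
    value = parse_qs(urlparse(url).query).get('t', [None])[0]
    try:
        return int(value)
    except (TypeError, ValueError):
        # Missing or non-numeric 't' yields None, like int_or_none
        return None


def scaled_duration(raw_duration, scale=1000):
    """Divide a raw duration by `scale`, returning None for bad input."""
    try:
        return float(raw_duration) / scale
    except (TypeError, ValueError):
        return None


print(start_time_from_url('https://video.arnes.si/watch/a1qrWTOQfVoU?t=10'))
print(scaled_duration(596750))
```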
37
youtube_dl/extractor/bandaichannel.py
Normal file
@ -0,0 +1,37 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from .brightcove import BrightcoveNewIE
|
||||||
|
from ..utils import extract_attributes
|
||||||
|
|
||||||
|
|
||||||
|
class BandaiChannelIE(BrightcoveNewIE):
|
||||||
|
IE_NAME = 'bandaichannel'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?b-ch\.com/titles/(?P<id>\d+/\d+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.b-ch.com/titles/514/001',
|
||||||
|
'md5': 'a0f2d787baa5729bed71108257f613a4',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '6128044564001',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'メタルファイターMIKU 第1話',
|
||||||
|
'timestamp': 1580354056,
|
||||||
|
'uploader_id': '5797077852001',
|
||||||
|
'upload_date': '20200130',
|
||||||
|
'duration': 1387.733,
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'format': 'bestvideo',
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
attrs = extract_attributes(self._search_regex(
|
||||||
|
r'(<video-js[^>]+\bid="bcplayer"[^>]*>)', webpage, 'player'))
|
||||||
|
bc = self._download_json(
|
||||||
|
'https://pbifcd.b-ch.com/v1/playbackinfo/ST/70/' + attrs['data-info'],
|
||||||
|
video_id, headers={'X-API-KEY': attrs['data-auth'].strip()})['bc']
|
||||||
|
return self._parse_brightcove_metadata(bc, bc['id'])
|
@ -49,6 +49,7 @@ class BandcampIE(InfoExtractor):
|
|||||||
'uploader': 'Ben Prunty',
|
'uploader': 'Ben Prunty',
|
||||||
'timestamp': 1396508491,
|
'timestamp': 1396508491,
|
||||||
'upload_date': '20140403',
|
'upload_date': '20140403',
|
||||||
|
'release_timestamp': 1396483200,
|
||||||
'release_date': '20140403',
|
'release_date': '20140403',
|
||||||
'duration': 260.877,
|
'duration': 260.877,
|
||||||
'track': 'Lanius (Battle)',
|
'track': 'Lanius (Battle)',
|
||||||
@ -69,6 +70,7 @@ class BandcampIE(InfoExtractor):
|
|||||||
'uploader': 'Mastodon',
|
'uploader': 'Mastodon',
|
||||||
'timestamp': 1322005399,
|
'timestamp': 1322005399,
|
||||||
'upload_date': '20111122',
|
'upload_date': '20111122',
|
||||||
|
'release_timestamp': 1076112000,
|
||||||
'release_date': '20040207',
|
'release_date': '20040207',
|
||||||
'duration': 120.79,
|
'duration': 120.79,
|
||||||
'track': 'Hail to Fire',
|
'track': 'Hail to Fire',
|
||||||
@ -197,7 +199,7 @@ class BandcampIE(InfoExtractor):
|
|||||||
'thumbnail': thumbnail,
|
'thumbnail': thumbnail,
|
||||||
'uploader': artist,
|
'uploader': artist,
|
||||||
'timestamp': timestamp,
|
'timestamp': timestamp,
|
||||||
'release_date': unified_strdate(tralbum.get('album_release_date')),
|
'release_timestamp': unified_timestamp(tralbum.get('album_release_date')),
|
||||||
'duration': duration,
|
'duration': duration,
|
||||||
'track': track,
|
'track': track,
|
||||||
'track_number': track_number,
|
'track_number': track_number,
|
||||||
|
@ -1,31 +1,39 @@
|
|||||||
# coding: utf-8
|
# coding: utf-8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import functools
|
||||||
import itertools
|
import itertools
|
||||||
|
import json
|
||||||
import re
|
import re
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
|
from ..compat import (
|
||||||
|
compat_etree_Element,
|
||||||
|
compat_HTTPError,
|
||||||
|
compat_parse_qs,
|
||||||
|
compat_str,
|
||||||
|
compat_urllib_parse_urlparse,
|
||||||
|
compat_urlparse,
|
||||||
|
)
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
|
ExtractorError,
|
||||||
|
OnDemandPagedList,
|
||||||
clean_html,
|
clean_html,
|
||||||
dict_get,
|
dict_get,
|
||||||
ExtractorError,
|
|
||||||
float_or_none,
|
float_or_none,
|
||||||
get_element_by_class,
|
get_element_by_class,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
js_to_json,
|
js_to_json,
|
||||||
parse_duration,
|
parse_duration,
|
||||||
parse_iso8601,
|
parse_iso8601,
|
||||||
|
strip_or_none,
|
||||||
try_get,
|
try_get,
|
||||||
unescapeHTML,
|
unescapeHTML,
|
||||||
|
unified_timestamp,
|
||||||
url_or_none,
|
url_or_none,
|
||||||
urlencode_postdata,
|
urlencode_postdata,
|
||||||
urljoin,
|
urljoin,
|
||||||
)
|
)
|
||||||
from ..compat import (
|
|
||||||
compat_etree_Element,
|
|
||||||
compat_HTTPError,
|
|
||||||
compat_urlparse,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class BBCCoUkIE(InfoExtractor):
|
class BBCCoUkIE(InfoExtractor):
|
||||||
@ -756,8 +764,17 @@ class BBCIE(BBCCoUkIE):
|
|||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}, {
|
}, {
|
||||||
# custom redirection to www.bbc.com
|
# custom redirection to www.bbc.com
|
||||||
|
# also, video with window.__INITIAL_DATA__
|
||||||
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
|
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
|
||||||
'only_matching': True,
|
'info_dict': {
|
||||||
|
'id': 'p02xzws1',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': "Pluto may have 'nitrogen glaciers'",
|
||||||
|
'description': 'md5:6a95b593f528d7a5f2605221bc56912f',
|
||||||
|
'thumbnail': r're:https?://.+/.+\.jpg',
|
||||||
|
'timestamp': 1437785037,
|
||||||
|
'upload_date': '20150725',
|
||||||
|
},
|
||||||
}, {
|
}, {
|
||||||
# single video article embedded with data-media-vpid
|
# single video article embedded with data-media-vpid
|
||||||
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
|
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
|
||||||
@ -793,11 +810,25 @@ class BBCIE(BBCCoUkIE):
|
|||||||
'description': 'Learn English words and phrases from this story',
|
'description': 'Learn English words and phrases from this story',
|
||||||
},
|
},
|
||||||
'add_ie': [BBCCoUkIE.ie_key()],
|
'add_ie': [BBCCoUkIE.ie_key()],
|
||||||
|
}, {
|
||||||
|
# BBC Reel
|
||||||
|
'url': 'https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'p07c6sb9',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'How positive thinking is harming your happiness',
|
||||||
|
'alt_title': 'The downsides of positive thinking',
|
||||||
|
'description': 'md5:fad74b31da60d83b8265954ee42d85b4',
|
||||||
|
'duration': 235,
|
||||||
|
'thumbnail': r're:https?://.+/p07c9dsr.jpg',
|
||||||
|
'upload_date': '20190604',
|
||||||
|
'categories': ['Psychology'],
|
||||||
|
},
|
||||||
}]
|
}]
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def suitable(cls, url):
|
def suitable(cls, url):
|
||||||
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerPlaylistIE, BBCCoUkPlaylistIE)
|
EXCLUDE_IE = (BBCCoUkIE, BBCCoUkArticleIE, BBCCoUkIPlayerEpisodesIE, BBCCoUkIPlayerGroupIE, BBCCoUkPlaylistIE)
|
||||||
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
|
return (False if any(ie.suitable(url) for ie in EXCLUDE_IE)
|
||||||
else super(BBCIE, cls).suitable(url))
|
else super(BBCIE, cls).suitable(url))
|
||||||
|
|
||||||
@ -929,7 +960,7 @@ class BBCIE(BBCCoUkIE):
|
|||||||
else:
|
else:
|
||||||
entry['title'] = info['title']
|
entry['title'] = info['title']
|
||||||
entry['formats'].extend(info['formats'])
|
entry['formats'].extend(info['formats'])
|
||||||
except Exception as e:
|
except ExtractorError as e:
|
||||||
# Some playlist URL may fail with 500, at the same time
|
# Some playlist URL may fail with 500, at the same time
|
||||||
# the other one may work fine (e.g.
|
# the other one may work fine (e.g.
|
||||||
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
|
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
|
||||||
@ -980,6 +1011,37 @@ class BBCIE(BBCCoUkIE):
|
|||||||
'subtitles': subtitles,
|
'subtitles': subtitles,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# bbc reel (e.g. https://www.bbc.com/reel/video/p07c6sb6/how-positive-thinking-is-harming-your-happiness)
|
||||||
|
initial_data = self._parse_json(self._html_search_regex(
|
||||||
|
r'<script[^>]+id=(["\'])initial-data\1[^>]+data-json=(["\'])(?P<json>(?:(?!\2).)+)',
|
||||||
|
webpage, 'initial data', default='{}', group='json'), playlist_id, fatal=False)
|
||||||
|
if initial_data:
|
||||||
|
init_data = try_get(
|
||||||
|
initial_data, lambda x: x['initData']['items'][0], dict) or {}
|
||||||
|
smp_data = init_data.get('smpData') or {}
|
||||||
|
clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
|
||||||
|
version_id = clip_data.get('versionID')
|
||||||
|
if version_id:
|
||||||
|
title = smp_data['title']
|
||||||
|
formats, subtitles = self._download_media_selector(version_id)
|
||||||
|
self._sort_formats(formats)
|
||||||
|
image_url = smp_data.get('holdingImageURL')
|
||||||
|
display_date = init_data.get('displayDate')
|
||||||
|
topic_title = init_data.get('topicTitle')
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': version_id,
|
||||||
|
'title': title,
|
||||||
|
'formats': formats,
|
||||||
|
'alt_title': init_data.get('shortTitle'),
|
||||||
|
'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
|
||||||
|
'description': smp_data.get('summary') or init_data.get('shortSummary'),
|
||||||
|
'upload_date': display_date.replace('-', '') if display_date else None,
|
||||||
|
'subtitles': subtitles,
|
||||||
|
'duration': int_or_none(clip_data.get('duration')),
|
||||||
|
'categories': [topic_title] if topic_title else None,
|
||||||
|
}
|
||||||
|
|
||||||
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
|
# Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
|
||||||
# There are several setPayload calls may be present but the video
|
# There are several setPayload calls may be present but the video
|
||||||
# seems to be always related to the first one
|
# seems to be always related to the first one
|
||||||
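Review note: the BBC Reel branch above walks a nested `initial-data` JSON blob defensively and only returns when a `versionID` is present. The sketch below shows the same traversal on an invented payload. `try_get` here is a simplified stand-in for `youtube_dl.utils.try_get`, `reel_metadata` is a hypothetical name, and the sample values (including the `example.com` image URL) are made up to mirror the field names in the hunk. Format and subtitle download are left out.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in for youtube_dl.utils.try_get."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def reel_metadata(initial_data):
    """Collect the Reel fields that do not need a network request."""
    init_data = try_get(
        initial_data, lambda x: x['initData']['items'][0], dict) or {}
    smp_data = init_data.get('smpData') or {}
    clip_data = try_get(smp_data, lambda x: x['items'][0], dict) or {}
    version_id = clip_data.get('versionID')
    if not version_id:
        return None  # the extractor falls through to the next strategy
    image_url = smp_data.get('holdingImageURL')
    display_date = init_data.get('displayDate')
    return {
        'id': version_id,
        'title': smp_data['title'],
        'thumbnail': image_url.replace('$recipe', 'raw') if image_url else None,
        'upload_date': display_date.replace('-', '') if display_date else None,
    }


# Invented payload using the key names read in the hunk above
SAMPLE = {'initData': {'items': [{
    'displayDate': '2019-06-04',
    'smpData': {
        'title': 'How positive thinking is harming your happiness',
        'holdingImageURL': 'https://example.com/$recipe/p07c9dsr.jpg',
        'items': [{'versionID': 'p07c6sb9'}],
    },
}]}}

print(reel_metadata(SAMPLE))
```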
@ -1041,7 +1103,7 @@ class BBCIE(BBCCoUkIE):
|
|||||||
thumbnail = None
|
thumbnail = None
|
||||||
image_url = current_programme.get('image_url')
|
image_url = current_programme.get('image_url')
|
||||||
if image_url:
|
if image_url:
|
||||||
thumbnail = image_url.replace('{recipe}', '1920x1920')
|
thumbnail = image_url.replace('{recipe}', 'raw')
|
||||||
return {
|
return {
|
||||||
'id': programme_id,
|
'id': programme_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
@ -1114,12 +1176,29 @@ class BBCIE(BBCCoUkIE):
|
|||||||
continue
|
continue
|
||||||
formats, subtitles = self._download_media_selector(item_id)
|
formats, subtitles = self._download_media_selector(item_id)
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
item_desc = None
|
||||||
|
blocks = try_get(media, lambda x: x['summary']['blocks'], list)
|
||||||
|
if blocks:
|
||||||
|
summary = []
|
||||||
|
for block in blocks:
|
||||||
|
text = try_get(block, lambda x: x['model']['text'], compat_str)
|
||||||
|
if text:
|
||||||
|
summary.append(text)
|
||||||
|
if summary:
|
||||||
|
item_desc = '\n\n'.join(summary)
|
||||||
|
item_time = None
|
||||||
|
for meta in try_get(media, lambda x: x['metadata']['items'], list) or []:
|
||||||
|
if try_get(meta, lambda x: x['label']) == 'Published':
|
||||||
|
item_time = unified_timestamp(meta.get('timestamp'))
|
||||||
|
break
|
||||||
entries.append({
|
entries.append({
|
||||||
'id': item_id,
|
'id': item_id,
|
||||||
'title': item_title,
|
'title': item_title,
|
||||||
'thumbnail': item.get('holdingImageUrl'),
|
'thumbnail': item.get('holdingImageUrl'),
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'subtitles': subtitles,
|
'subtitles': subtitles,
|
||||||
|
'timestamp': item_time,
|
||||||
|
'description': strip_or_none(item_desc),
|
||||||
})
|
})
|
||||||
for resp in (initial_data.get('data') or {}).values():
|
for resp in (initial_data.get('data') or {}).values():
|
||||||
name = resp.get('name')
|
name = resp.get('name')
|
||||||
@ -1293,21 +1372,149 @@ class BBCCoUkPlaylistBaseIE(InfoExtractor):
|
|||||||
playlist_id, title, description)
|
playlist_id, title, description)
|
||||||
|
|
||||||
|
|
||||||
class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
|
class BBCCoUkIPlayerPlaylistBaseIE(InfoExtractor):
|
||||||
IE_NAME = 'bbc.co.uk:iplayer:playlist'
|
_VALID_URL_TMPL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/%%s/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
|
||||||
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/iplayer/(?:episodes|group)/(?P<id>%s)' % BBCCoUkIE._ID_REGEX
|
|
||||||
_URL_TEMPLATE = 'http://www.bbc.co.uk/iplayer/episode/%s'
|
@staticmethod
|
||||||
_VIDEO_ID_TEMPLATE = r'data-ip-id=["\'](%s)'
|
def _get_default(episode, key, default_key='default'):
|
||||||
|
return try_get(episode, lambda x: x[key][default_key])
|
||||||
|
|
||||||
|
def _get_description(self, data):
|
||||||
|
synopsis = data.get(self._DESCRIPTION_KEY) or {}
|
||||||
|
return dict_get(synopsis, ('large', 'medium', 'small'))
|
||||||
|
|
||||||
|
def _fetch_page(self, programme_id, per_page, series_id, page):
|
||||||
|
elements = self._get_elements(self._call_api(
|
||||||
|
programme_id, per_page, page + 1, series_id))
|
||||||
|
for element in elements:
|
||||||
|
episode = self._get_episode(element)
|
||||||
|
episode_id = episode.get('id')
|
||||||
|
if not episode_id:
|
||||||
|
continue
|
||||||
|
thumbnail = None
|
||||||
|
image = self._get_episode_image(episode)
|
||||||
|
if image:
|
||||||
|
thumbnail = image.replace('{recipe}', 'raw')
|
||||||
|
category = self._get_default(episode, 'labels', 'category')
|
||||||
|
yield {
|
||||||
|
'_type': 'url',
|
||||||
|
'id': episode_id,
|
||||||
|
'title': self._get_episode_field(episode, 'subtitle'),
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/episode/' + episode_id,
|
||||||
|
'thumbnail': thumbnail,
|
||||||
|
'description': self._get_description(episode),
|
||||||
|
'categories': [category] if category else None,
|
||||||
|
'series': self._get_episode_field(episode, 'title'),
|
||||||
|
'ie_key': BBCCoUkIE.ie_key(),
|
||||||
|
}
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
pid = self._match_id(url)
|
||||||
|
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
|
||||||
|
series_id = qs.get('seriesId', [None])[0]
|
||||||
|
page = qs.get('page', [None])[0]
|
||||||
|
per_page = 36 if page else self._PAGE_SIZE
|
||||||
|
fetch_page = functools.partial(self._fetch_page, pid, per_page, series_id)
|
||||||
|
entries = fetch_page(int(page) - 1) if page else OnDemandPagedList(fetch_page, self._PAGE_SIZE)
|
||||||
|
playlist_data = self._get_playlist_data(self._call_api(pid, 1))
|
||||||
|
return self.playlist_result(
|
||||||
|
entries, pid, self._get_playlist_title(playlist_data),
|
||||||
|
self._get_description(playlist_data))
|
||||||
|
|
||||||
|
|
||||||
|
class BBCCoUkIPlayerEpisodesIE(BBCCoUkIPlayerPlaylistBaseIE):
|
||||||
|
IE_NAME = 'bbc.co.uk:iplayer:episodes'
|
||||||
|
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'episodes'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
|
'url': 'http://www.bbc.co.uk/iplayer/episodes/b05rcz9v',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'b05rcz9v',
|
'id': 'b05rcz9v',
|
||||||
'title': 'The Disappearance',
|
'title': 'The Disappearance',
|
||||||
'description': 'French thriller serial about a missing teenager.',
|
'description': 'md5:58eb101aee3116bad4da05f91179c0cb',
|
||||||
},
|
},
|
||||||
'playlist_mincount': 6,
|
'playlist_mincount': 8,
|
||||||
'skip': 'This programme is not currently available on BBC iPlayer',
|
|
||||||
}, {
|
}, {
|
||||||
|
# all seasons
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'b094m5t9',
|
||||||
|
'title': 'Doctor Foster',
|
||||||
|
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 10,
|
||||||
|
}, {
|
||||||
|
# explicit season
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/episodes/b094m5t9/doctor-foster?seriesId=b094m6nv',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'b094m5t9',
|
||||||
|
'title': 'Doctor Foster',
|
||||||
|
'description': 'md5:5aa9195fad900e8e14b52acd765a9fd6',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 5,
|
||||||
|
}, {
|
||||||
|
# all pages
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'm0004c4v',
|
||||||
|
'title': 'Beechgrove',
|
||||||
|
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 37,
|
||||||
|
}, {
|
||||||
|
# explicit page
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/episodes/m0004c4v/beechgrove?page=2',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'm0004c4v',
|
||||||
|
'title': 'Beechgrove',
|
||||||
|
'description': 'Gardening show that celebrates Scottish horticulture and growing conditions.',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 1,
|
||||||
|
}]
|
||||||
|
_PAGE_SIZE = 100
|
||||||
|
_DESCRIPTION_KEY = 'synopsis'
|
||||||
|
|
||||||
|
def _get_episode_image(self, episode):
|
||||||
|
return self._get_default(episode, 'image')
|
||||||
|
|
||||||
|
def _get_episode_field(self, episode, field):
|
||||||
|
return self._get_default(episode, field)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_elements(data):
|
||||||
|
return data['entities']['results']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_episode(element):
|
||||||
|
return element.get('episode') or {}
|
||||||
|
|
||||||
|
def _call_api(self, pid, per_page, page=1, series_id=None):
|
||||||
|
variables = {
|
||||||
|
'id': pid,
|
||||||
|
'page': page,
|
||||||
|
'perPage': per_page,
|
||||||
|
}
|
||||||
|
if series_id:
|
||||||
|
variables['sliceId'] = series_id
|
||||||
|
return self._download_json(
|
||||||
|
'https://graph.ibl.api.bbc.co.uk/', pid, headers={
|
||||||
|
'Content-Type': 'application/json'
|
||||||
|
}, data=json.dumps({
|
||||||
|
'id': '5692d93d5aac8d796a0305e895e61551',
|
||||||
|
'variables': variables,
|
||||||
|
}).encode('utf-8'))['data']['programme']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_playlist_data(data):
|
||||||
|
return data
|
||||||
|
|
||||||
|
def _get_playlist_title(self, data):
|
||||||
|
return self._get_default(data, 'title')
|
||||||
|
|
||||||
|
|
||||||
|
class BBCCoUkIPlayerGroupIE(BBCCoUkIPlayerPlaylistBaseIE):
|
||||||
|
IE_NAME = 'bbc.co.uk:iplayer:group'
|
||||||
|
_VALID_URL = BBCCoUkIPlayerPlaylistBaseIE._VALID_URL_TMPL % 'group'
|
||||||
|
_TESTS = [{
|
||||||
# Available for over a year unlike 30 days for most other programmes
|
# Available for over a year unlike 30 days for most other programmes
|
||||||
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
|
'url': 'http://www.bbc.co.uk/iplayer/group/p02tcc32',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -1316,14 +1523,56 @@ class BBCCoUkIPlayerPlaylistIE(BBCCoUkPlaylistBaseIE):
|
|||||||
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
|
'description': 'md5:683e901041b2fe9ba596f2ab04c4dbe7',
|
||||||
},
|
},
|
||||||
'playlist_mincount': 10,
|
'playlist_mincount': 10,
|
||||||
|
}, {
|
||||||
|
# all pages
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'p081d7j7',
|
||||||
|
'title': 'Music in Scotland',
|
||||||
|
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 47,
|
||||||
|
}, {
|
||||||
|
# explicit page
|
||||||
|
'url': 'https://www.bbc.co.uk/iplayer/group/p081d7j7?page=2',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'p081d7j7',
|
||||||
|
'title': 'Music in Scotland',
|
||||||
|
'description': 'Perfomances in Scotland and programmes featuring Scottish acts.',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 11,
|
||||||
}]
|
}]
|
||||||
|
_PAGE_SIZE = 200
|
||||||
|
_DESCRIPTION_KEY = 'synopses'
|
||||||
|
|
||||||
def _extract_title_and_description(self, webpage):
|
def _get_episode_image(self, episode):
|
||||||
title = self._search_regex(r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
|
return self._get_default(episode, 'images', 'standard')
|
||||||
description = self._search_regex(
|
|
||||||
r'<p[^>]+class=(["\'])subtitle\1[^>]*>(?P<value>[^<]+)</p>',
|
def _get_episode_field(self, episode, field):
|
||||||
webpage, 'description', fatal=False, group='value')
|
return episode.get(field)
|
||||||
return title, description
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_elements(data):
|
||||||
|
return data['elements']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_episode(element):
|
||||||
|
return element
|
||||||
|
|
||||||
|
def _call_api(self, pid, per_page, page=1, series_id=None):
|
||||||
|
return self._download_json(
|
||||||
|
'http://ibl.api.bbc.co.uk/ibl/v1/groups/%s/episodes' % pid,
|
||||||
|
pid, query={
|
||||||
|
'page': page,
|
||||||
|
'per_page': per_page,
|
||||||
|
})['group_episodes']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_playlist_data(data):
|
||||||
|
return data['group']
|
||||||
|
|
||||||
|
def _get_playlist_title(self, data):
|
||||||
|
return data.get('title')
|
||||||
|
|
||||||
|
|
||||||
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
|
class BBCCoUkPlaylistIE(BBCCoUkPlaylistBaseIE):
|
||||||
|
@ -156,6 +156,7 @@ class BiliBiliIE(InfoExtractor):
|
|||||||
cid = js['result']['cid']
|
cid = js['result']['cid']
|
||||||
|
|
||||||
headers = {
|
headers = {
|
||||||
|
'Accept': 'application/json',
|
||||||
'Referer': url
|
'Referer': url
|
||||||
}
|
}
|
||||||
headers.update(self.geo_verification_headers())
|
headers.update(self.geo_verification_headers())
|
||||||
|
@ -1,86 +0,0 @@
|
|||||||
from __future__ import unicode_literals
|
|
||||||
|
|
||||||
import json
|
|
||||||
|
|
||||||
from .common import InfoExtractor
|
|
||||||
from ..utils import (
|
|
||||||
remove_start,
|
|
||||||
int_or_none,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class BlinkxIE(InfoExtractor):
|
|
||||||
_VALID_URL = r'(?:https?://(?:www\.)blinkx\.com/#?ce/|blinkx:)(?P<id>[^?]+)'
|
|
||||||
IE_NAME = 'blinkx'
|
|
||||||
|
|
||||||
_TEST = {
|
|
||||||
'url': 'http://www.blinkx.com/ce/Da0Gw3xc5ucpNduzLuDDlv4WC9PuI4fDi1-t6Y3LyfdY2SZS5Urbvn-UPJvrvbo8LTKTc67Wu2rPKSQDJyZeeORCR8bYkhs8lI7eqddznH2ofh5WEEdjYXnoRtj7ByQwt7atMErmXIeYKPsSDuMAAqJDlQZ-3Ff4HJVeH_s3Gh8oQ',
|
|
||||||
'md5': '337cf7a344663ec79bf93a526a2e06c7',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'Da0Gw3xc',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'No Daily Show for John Oliver; HBO Show Renewed - IGN News',
|
|
||||||
'uploader': 'IGN News',
|
|
||||||
'upload_date': '20150217',
|
|
||||||
'timestamp': 1424215740,
|
|
||||||
'description': 'HBO has renewed Last Week Tonight With John Oliver for two more seasons.',
|
|
||||||
'duration': 47.743333,
|
|
||||||
},
|
|
||||||
}
|
|
||||||
|
|
||||||
def _real_extract(self, url):
|
|
||||||
video_id = self._match_id(url)
|
|
||||||
display_id = video_id[:8]
|
|
||||||
|
|
||||||
api_url = ('https://apib4.blinkx.com/api.php?action=play_video&'
|
|
||||||
+ 'video=%s' % video_id)
|
|
||||||
data_json = self._download_webpage(api_url, display_id)
|
|
||||||
data = json.loads(data_json)['api']['results'][0]
|
|
||||||
duration = None
|
|
||||||
thumbnails = []
|
|
||||||
formats = []
|
|
||||||
for m in data['media']:
|
|
||||||
if m['type'] == 'jpg':
|
|
||||||
thumbnails.append({
|
|
||||||
'url': m['link'],
|
|
||||||
'width': int(m['w']),
|
|
||||||
'height': int(m['h']),
|
|
||||||
})
|
|
||||||
elif m['type'] == 'original':
|
|
||||||
duration = float(m['d'])
|
|
||||||
elif m['type'] == 'youtube':
|
|
||||||
yt_id = m['link']
|
|
||||||
self.to_screen('Youtube video detected: %s' % yt_id)
|
|
||||||
return self.url_result(yt_id, 'Youtube', video_id=yt_id)
|
|
||||||
elif m['type'] in ('flv', 'mp4'):
|
|
||||||
vcodec = remove_start(m['vcodec'], 'ff')
|
|
||||||
acodec = remove_start(m['acodec'], 'ff')
|
|
||||||
vbr = int_or_none(m.get('vbr') or m.get('vbitrate'), 1000)
|
|
||||||
abr = int_or_none(m.get('abr') or m.get('abitrate'), 1000)
|
|
||||||
tbr = vbr + abr if vbr and abr else None
|
|
||||||
format_id = '%s-%sk-%s' % (vcodec, tbr, m['w'])
|
|
||||||
formats.append({
|
|
||||||
'format_id': format_id,
|
|
||||||
'url': m['link'],
|
|
||||||
'vcodec': vcodec,
|
|
||||||
'acodec': acodec,
|
|
||||||
'abr': abr,
|
|
||||||
'vbr': vbr,
|
|
||||||
'tbr': tbr,
|
|
||||||
'width': int_or_none(m.get('w')),
|
|
||||||
'height': int_or_none(m.get('h')),
|
|
||||||
})
|
|
||||||
|
|
||||||
self._sort_formats(formats)
|
|
||||||
|
|
||||||
return {
|
|
||||||
'id': display_id,
|
|
||||||
'fullid': video_id,
|
|
||||||
'title': data['title'],
|
|
||||||
'formats': formats,
|
|
||||||
'uploader': data['channel_name'],
|
|
||||||
'timestamp': data['pubdate_epoch'],
|
|
||||||
'description': data.get('description'),
|
|
||||||
'thumbnails': thumbnails,
|
|
||||||
'duration': duration,
|
|
||||||
}
|
|
@ -27,7 +27,7 @@ class CBSBaseIE(ThePlatformFeedIE):
|
|||||||
|
|
||||||
|
|
||||||
class CBSIE(CBSBaseIE):
|
class CBSIE(CBSBaseIE):
|
||||||
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
|
_VALID_URL = r'(?:cbs:|https?://(?:www\.)?(?:(?:cbs|paramountplus)\.com/shows/[^/]+/video|colbertlateshow\.com/(?:video|podcasts))/)(?P<id>[\w-]+)'
|
||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
|
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
|
||||||
@ -52,6 +52,9 @@ class CBSIE(CBSBaseIE):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
|
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.paramountplus.com/shows/all-rise/video/QmR1WhNkh1a_IrdHZrbcRklm176X_rVc/all-rise-space/',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):
|
def _extract_video_info(self, content_id, site='cbs', mpx_acc=2198311517):
|
||||||
|
@ -26,7 +26,7 @@ class CBSNewsEmbedIE(CBSIE):
|
|||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
item = self._parse_json(zlib.decompress(compat_b64decode(
|
item = self._parse_json(zlib.decompress(compat_b64decode(
|
||||||
compat_urllib_parse_unquote(self._match_id(url))),
|
compat_urllib_parse_unquote(self._match_id(url))),
|
||||||
-zlib.MAX_WBITS), None)['video']['items'][0]
|
-zlib.MAX_WBITS).decode('utf-8'), None)['video']['items'][0]
|
||||||
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
|
return self._extract_video_info(item['mpxRefId'], 'cbsnews')
|
||||||
|
|
||||||
|
|
||||||
|
@ -1,38 +1,113 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from .cbs import CBSBaseIE
|
import re
|
||||||
|
|
||||||
|
# from .cbs import CBSBaseIE
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import (
|
||||||
|
int_or_none,
|
||||||
|
try_get,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class CBSSportsIE(CBSBaseIE):
|
# class CBSSportsEmbedIE(CBSBaseIE):
|
||||||
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
|
class CBSSportsEmbedIE(InfoExtractor):
|
||||||
|
IE_NAME = 'cbssports:embed'
|
||||||
|
_VALID_URL = r'''(?ix)https?://(?:(?:www\.)?cbs|embed\.247)sports\.com/player/embed.+?
|
||||||
|
(?:
|
||||||
|
ids%3D(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})|
|
||||||
|
pcid%3D(?P<pcid>\d+)
|
||||||
|
)'''
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
|
'url': 'https://www.cbssports.com/player/embed/?args=player_id%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26ids%3Db56c03a6-231a-4bbe-9c55-af3c8a8e9636%26resizable%3D1%26autoplay%3Dtrue%26domain%3Dcbssports.com%26comp_ads_enabled%3Dfalse%26watchAndRead%3D0%26startTime%3D0%26env%3Dprod',
|
||||||
'info_dict': {
|
'only_matching': True,
|
||||||
'id': '1214315075735',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
|
|
||||||
'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
|
|
||||||
'timestamp': 1524111457,
|
|
||||||
'upload_date': '20180419',
|
|
||||||
'uploader': 'CBSI-NEW',
|
|
||||||
},
|
|
||||||
'params': {
|
|
||||||
# m3u8 download
|
|
||||||
'skip_download': True,
|
|
||||||
}
|
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
|
'url': 'https://embed.247sports.com/player/embed/?args=%3fplayer_id%3d1827823171591%26channel%3dcollege-football-recruiting%26pcid%3d1827823171591%26width%3d640%26height%3d360%26autoplay%3dTrue%26comp_ads_enabled%3dFalse%26uvpc%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_v4%2526partner%253d247%26uvpc_m%3dhttps%253a%252f%252fwww.cbssports.com%252fapi%252fcontent%252fvideo%252fconfig%252f%253fcfg%253duvp_247sports_m_v4%2526partner_m%253d247_mobile%26utag%3d247sportssite%26resizable%3dTrue',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _extract_video_info(self, filter_query, video_id):
|
# def _extract_video_info(self, filter_query, video_id):
|
||||||
return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
|
# return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
uuid, pcid = re.match(self._VALID_URL, url).groups()
|
||||||
|
query = {'id': uuid} if uuid else {'pcid': pcid}
|
||||||
|
video = self._download_json(
|
||||||
|
'https://www.cbssports.com/api/content/video/',
|
||||||
|
uuid or pcid, query=query)[0]
|
||||||
|
video_id = video['id']
|
||||||
|
title = video['title']
|
||||||
|
metadata = video.get('metaData') or {}
|
||||||
|
# return self._extract_video_info('byId=%d' % metadata['mpxOutletId'], video_id)
|
||||||
|
# return self._extract_video_info('byGuid=' + metadata['mpxRefId'], video_id)
|
||||||
|
|
||||||
|
formats = self._extract_m3u8_formats(
|
||||||
|
metadata['files'][0]['url'], video_id, 'mp4',
|
||||||
|
'm3u8_native', m3u8_id='hls', fatal=False)
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
image = video.get('image')
|
||||||
|
thumbnails = None
|
||||||
|
if image:
|
||||||
|
image_path = image.get('path')
|
||||||
|
if image_path:
|
||||||
|
thumbnails = [{
|
||||||
|
'url': image_path,
|
||||||
|
'width': int_or_none(image.get('width')),
|
||||||
|
'height': int_or_none(image.get('height')),
|
||||||
|
'filesize': int_or_none(image.get('size')),
|
||||||
|
}]
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
'title': title,
|
||||||
|
'formats': formats,
|
||||||
|
'thumbnails': thumbnails,
|
||||||
|
'description': video.get('description'),
|
||||||
|
'timestamp': int_or_none(try_get(video, lambda x: x['dateCreated']['epoch'])),
|
||||||
|
'duration': int_or_none(metadata.get('duration')),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class CBSSportsBaseIE(InfoExtractor):
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
display_id = self._match_id(url)
|
display_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, display_id)
|
webpage = self._download_webpage(url, display_id)
|
||||||
video_id = self._search_regex(
|
iframe_url = self._search_regex(
|
||||||
[r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
|
r'<iframe[^>]+(?:data-)?src="(https?://[^/]+/player/embed[^"]+)"',
|
||||||
webpage, 'video id')
|
webpage, 'embed url')
|
||||||
return self._extract_video_info('byId=%s' % video_id, video_id)
|
return self.url_result(iframe_url, CBSSportsEmbedIE.ie_key())
|
||||||
|
|
||||||
|
|
||||||
|
class CBSSportsIE(CBSSportsBaseIE):
|
||||||
|
IE_NAME = 'cbssports'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/video/(?P<id>[^/?#&]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.cbssports.com/college-football/video/cover-3-stanford-spring-gleaning/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'b56c03a6-231a-4bbe-9c55-af3c8a8e9636',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Cover 3: Stanford Spring Gleaning',
|
||||||
|
'description': 'The Cover 3 crew break down everything you need to know about the Stanford Cardinal this spring.',
|
||||||
|
'timestamp': 1617218398,
|
||||||
|
'upload_date': '20210331',
|
||||||
|
'duration': 502,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
|
||||||
|
|
||||||
|
class TwentyFourSevenSportsIE(CBSSportsBaseIE):
|
||||||
|
IE_NAME = '247sports'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?247sports\.com/Video/(?:[^/?#&]+-)?(?P<id>\d+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://247sports.com/Video/2021-QB-Jake-Garcia-senior-highlights-through-five-games-10084854/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '4f1265cb-c3b5-44a8-bb1d-1914119a0ccc',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '2021 QB Jake Garcia senior highlights through five games',
|
||||||
|
'description': 'md5:8cb67ebed48e2e6adac1701e0ff6e45b',
|
||||||
|
'timestamp': 1607114223,
|
||||||
|
'upload_date': '20201204',
|
||||||
|
'duration': 208,
|
||||||
|
},
|
||||||
|
}]
|
||||||
|
@ -133,6 +133,8 @@ class CDAIE(InfoExtractor):
|
|||||||
'age_limit': 18 if need_confirm_age else 0,
|
'age_limit': 18 if need_confirm_age else 0,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
info = self._search_json_ld(webpage, video_id, default={})
|
||||||
|
|
||||||
# Source: https://www.cda.pl/js/player.js?t=1606154898
|
# Source: https://www.cda.pl/js/player.js?t=1606154898
|
||||||
def decrypt_file(a):
|
def decrypt_file(a):
|
||||||
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
|
for p in ('_XDDD', '_CDA', '_ADC', '_CXD', '_QWE', '_Q5', '_IKSDE'):
|
||||||
@ -197,7 +199,7 @@ class CDAIE(InfoExtractor):
|
|||||||
handler = self._download_webpage
|
handler = self._download_webpage
|
||||||
|
|
||||||
webpage = handler(
|
webpage = handler(
|
||||||
self._BASE_URL + href, video_id,
|
urljoin(self._BASE_URL, href), video_id,
|
||||||
'Downloading %s version information' % resolution, fatal=False)
|
'Downloading %s version information' % resolution, fatal=False)
|
||||||
if not webpage:
|
if not webpage:
|
||||||
# Manually report warning because empty page is returned when
|
# Manually report warning because empty page is returned when
|
||||||
@ -209,6 +211,4 @@ class CDAIE(InfoExtractor):
|
|||||||
|
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
info = self._search_json_ld(webpage, video_id, default={})
|
|
||||||
|
|
||||||
return merge_dicts(info_dict, info)
|
return merge_dicts(info_dict, info)
|
||||||
|
@ -17,7 +17,7 @@ import math
|
|||||||
|
|
||||||
from ..compat import (
|
from ..compat import (
|
||||||
compat_cookiejar_Cookie,
|
compat_cookiejar_Cookie,
|
||||||
compat_cookies,
|
compat_cookies_SimpleCookie,
|
||||||
compat_etree_Element,
|
compat_etree_Element,
|
||||||
compat_etree_fromstring,
|
compat_etree_fromstring,
|
||||||
compat_getpass,
|
compat_getpass,
|
||||||
@ -230,8 +230,10 @@ class InfoExtractor(object):
|
|||||||
uploader: Full name of the video uploader.
|
uploader: Full name of the video uploader.
|
||||||
license: License name the video is licensed under.
|
license: License name the video is licensed under.
|
||||||
creator: The creator of the video.
|
creator: The creator of the video.
|
||||||
|
release_timestamp: UNIX timestamp of the moment the video was released.
|
||||||
release_date: The date (YYYYMMDD) when the video was released.
|
release_date: The date (YYYYMMDD) when the video was released.
|
||||||
timestamp: UNIX timestamp of the moment the video became available.
|
timestamp: UNIX timestamp of the moment the video became available
|
||||||
|
(uploaded).
|
||||||
upload_date: Video upload date (YYYYMMDD).
|
upload_date: Video upload date (YYYYMMDD).
|
||||||
If not explicitly set, calculated from timestamp.
|
If not explicitly set, calculated from timestamp.
|
||||||
uploader_id: Nickname or id of the video uploader.
|
uploader_id: Nickname or id of the video uploader.
|
||||||
@ -1273,6 +1275,7 @@ class InfoExtractor(object):
|
|||||||
|
|
||||||
def extract_video_object(e):
|
def extract_video_object(e):
|
||||||
assert e['@type'] == 'VideoObject'
|
assert e['@type'] == 'VideoObject'
|
||||||
|
author = e.get('author')
|
||||||
info.update({
|
info.update({
|
||||||
'url': url_or_none(e.get('contentUrl')),
|
'url': url_or_none(e.get('contentUrl')),
|
||||||
'title': unescapeHTML(e.get('name')),
|
'title': unescapeHTML(e.get('name')),
|
||||||
@ -1280,7 +1283,11 @@ class InfoExtractor(object):
|
|||||||
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
|
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
|
||||||
'duration': parse_duration(e.get('duration')),
|
'duration': parse_duration(e.get('duration')),
|
||||||
'timestamp': unified_timestamp(e.get('uploadDate')),
|
'timestamp': unified_timestamp(e.get('uploadDate')),
|
||||||
'uploader': str_or_none(e.get('author')),
|
# author can be an instance of the 'Organization' or 'Person' type.
|
||||||
|
# Both types can have a 'name' property (inherited from the 'Thing' type). [1]
|
||||||
|
# However, some websites use the 'Text' type instead.
|
||||||
|
# 1. https://schema.org/VideoObject
|
||||||
|
'uploader': author.get('name') if isinstance(author, dict) else author if isinstance(author, compat_str) else None,
|
||||||
'filesize': float_or_none(e.get('contentSize')),
|
'filesize': float_or_none(e.get('contentSize')),
|
||||||
'tbr': int_or_none(e.get('bitrate')),
|
'tbr': int_or_none(e.get('bitrate')),
|
||||||
'width': int_or_none(e.get('width')),
|
'width': int_or_none(e.get('width')),
|
||||||
@ -2894,10 +2901,10 @@ class InfoExtractor(object):
|
|||||||
self._downloader.cookiejar.set_cookie(cookie)
|
self._downloader.cookiejar.set_cookie(cookie)
|
||||||
|
|
||||||
def _get_cookies(self, url):
|
def _get_cookies(self, url):
|
||||||
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
|
""" Return a compat_cookies_SimpleCookie with the cookies for the url """
|
||||||
req = sanitized_Request(url)
|
req = sanitized_Request(url)
|
||||||
self._downloader.cookiejar.add_cookie_header(req)
|
self._downloader.cookiejar.add_cookie_header(req)
|
||||||
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
|
return compat_cookies_SimpleCookie(req.get_header('Cookie'))
|
||||||
|
|
||||||
def _apply_first_set_cookie_header(self, url_handle, cookie):
|
def _apply_first_set_cookie_header(self, url_handle, cookie):
|
||||||
"""
|
"""
|
||||||
|
@ -25,12 +25,12 @@ class CuriosityStreamBaseIE(InfoExtractor):
|
|||||||
raise ExtractorError(
|
raise ExtractorError(
|
||||||
'%s said: %s' % (self.IE_NAME, error), expected=True)
|
'%s said: %s' % (self.IE_NAME, error), expected=True)
|
||||||
|
|
||||||
def _call_api(self, path, video_id):
|
def _call_api(self, path, video_id, query=None):
|
||||||
headers = {}
|
headers = {}
|
||||||
if self._auth_token:
|
if self._auth_token:
|
||||||
headers['X-Auth-Token'] = self._auth_token
|
headers['X-Auth-Token'] = self._auth_token
|
||||||
result = self._download_json(
|
result = self._download_json(
|
||||||
self._API_BASE_URL + path, video_id, headers=headers)
|
self._API_BASE_URL + path, video_id, headers=headers, query=query)
|
||||||
self._handle_errors(result)
|
self._handle_errors(result)
|
||||||
return result['data']
|
return result['data']
|
||||||
|
|
||||||
@ -52,27 +52,38 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
|
|||||||
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
|
_VALID_URL = r'https?://(?:app\.)?curiositystream\.com/video/(?P<id>\d+)'
|
||||||
_TEST = {
|
_TEST = {
|
||||||
'url': 'https://app.curiositystream.com/video/2',
|
'url': 'https://app.curiositystream.com/video/2',
|
||||||
'md5': '262bb2f257ff301115f1973540de8983',
|
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '2',
|
'id': '2',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'How Did You Develop The Internet?',
|
'title': 'How Did You Develop The Internet?',
|
||||||
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
|
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
|
||||||
}
|
},
|
||||||
|
'params': {
|
||||||
|
'format': 'bestvideo',
|
||||||
|
# m3u8 download
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
}
|
}
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
media = self._call_api('media/' + video_id, video_id)
|
|
||||||
title = media['title']
|
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
|
for encoding_format in ('m3u8', 'mpd'):
|
||||||
|
media = self._call_api('media/' + video_id, video_id, query={
|
||||||
|
'encodingsNew': 'true',
|
||||||
|
'encodingsFormat': encoding_format,
|
||||||
|
})
|
||||||
for encoding in media.get('encodings', []):
|
for encoding in media.get('encodings', []):
|
||||||
m3u8_url = encoding.get('master_playlist_url')
|
playlist_url = encoding.get('master_playlist_url')
|
||||||
if m3u8_url:
|
if encoding_format == 'm3u8':
|
||||||
|
# use `m3u8` entry_protocol until EXT-X-MAP is properly supported by `m3u8_native` entry_protocol
|
||||||
formats.extend(self._extract_m3u8_formats(
|
formats.extend(self._extract_m3u8_formats(
|
||||||
m3u8_url, video_id, 'mp4', 'm3u8_native',
|
playlist_url, video_id, 'mp4',
|
||||||
m3u8_id='hls', fatal=False))
|
m3u8_id='hls', fatal=False))
|
||||||
|
elif encoding_format == 'mpd':
|
||||||
|
formats.extend(self._extract_mpd_formats(
|
||||||
|
playlist_url, video_id, mpd_id='dash', fatal=False))
|
||||||
encoding_url = encoding.get('url')
|
encoding_url = encoding.get('url')
|
||||||
file_url = encoding.get('file_url')
|
file_url = encoding.get('file_url')
|
||||||
if not encoding_url and not file_url:
|
if not encoding_url and not file_url:
|
||||||
@ -108,6 +119,8 @@ class CuriosityStreamIE(CuriosityStreamBaseIE):
|
|||||||
formats.append(fmt)
|
formats.append(fmt)
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
title = media['title']
|
||||||
|
|
||||||
subtitles = {}
|
subtitles = {}
|
||||||
for closed_caption in media.get('closed_captions', []):
|
for closed_caption in media.get('closed_captions', []):
|
||||||
sub_url = closed_caption.get('file')
|
sub_url = closed_caption.get('file')
|
||||||
@ -140,7 +153,7 @@ class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
|
|||||||
'title': 'Curious Minds: The Internet',
|
'title': 'Curious Minds: The Internet',
|
||||||
'description': 'How is the internet shaping our lives in the 21st Century?',
|
'description': 'How is the internet shaping our lives in the 21st Century?',
|
||||||
},
|
},
|
||||||
'playlist_mincount': 17,
|
'playlist_mincount': 16,
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://curiositystream.com/series/2',
|
'url': 'https://curiositystream.com/series/2',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
@ -32,6 +32,18 @@ class DigitallySpeakingIE(InfoExtractor):
|
|||||||
# From http://www.gdcvault.com/play/1013700/Advanced-Material
|
# From http://www.gdcvault.com/play/1013700/Advanced-Material
|
||||||
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
|
'url': 'http://sevt.dispeak.com/ubm/gdc/eur10/xml/11256_1282118587281VNIT.xml',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# From https://gdcvault.com/play/1016624, empty speakerVideo
|
||||||
|
'url': 'https://sevt.dispeak.com/ubm/gdc/online12/xml/201210-822101_1349794556671DDDD.xml',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '201210-822101_1349794556671DDDD',
|
||||||
|
'ext': 'flv',
|
||||||
|
'title': 'Pre-launch - Preparing to Take the Plunge',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
# From http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru, empty slideVideo
|
||||||
|
'url': 'http://events.digitallyspeaking.com/gdc/project25/xml/p25-miyamoto1999_1282467389849HSVB.xml',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _parse_mp4(self, metadata):
|
def _parse_mp4(self, metadata):
|
||||||
@ -84,25 +96,19 @@ class DigitallySpeakingIE(InfoExtractor):
|
|||||||
'vcodec': 'none',
|
'vcodec': 'none',
|
||||||
'format_id': audio.get('code'),
|
'format_id': audio.get('code'),
|
||||||
})
|
})
|
||||||
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
|
for video_key, format_id, preference in (
|
||||||
|
('slide', 'slides', -2), ('speaker', 'speaker', -1)):
|
||||||
|
video_path = xpath_text(metadata, './%sVideo' % video_key)
|
||||||
|
if not video_path:
|
||||||
|
continue
|
||||||
formats.append({
|
formats.append({
|
||||||
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
|
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
|
||||||
'play_path': remove_end(slide_video_path, '.flv'),
|
'play_path': remove_end(video_path, '.flv'),
|
||||||
'ext': 'flv',
|
'ext': 'flv',
|
||||||
'format_note': 'slide deck video',
|
'format_note': '%s video' % video_key,
|
||||||
'quality': -2,
|
'quality': preference,
|
||||||
'preference': -2,
|
'preference': preference,
|
||||||
'format_id': 'slides',
|
'format_id': format_id,
|
||||||
})
|
|
||||||
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
|
|
||||||
formats.append({
|
|
||||||
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
|
|
||||||
'play_path': remove_end(speaker_video_path, '.flv'),
|
|
||||||
'ext': 'flv',
|
|
||||||
'format_note': 'speaker video',
|
|
||||||
'quality': -1,
|
|
||||||
'preference': -1,
|
|
||||||
'format_id': 'speaker',
|
|
||||||
})
|
})
|
||||||
return formats
|
return formats
|
||||||
|
|
||||||
|
@ -330,6 +330,7 @@ class DiscoveryPlusIE(DPlayIE):
|
|||||||
'videoId': video_id,
|
'videoId': video_id,
|
||||||
'wisteriaProperties': {
|
'wisteriaProperties': {
|
||||||
'platform': 'desktop',
|
'platform': 'desktop',
|
||||||
|
'product': 'dplus_us',
|
||||||
},
|
},
|
||||||
}).encode('utf-8'))['data']['attributes']['streaming']
|
}).encode('utf-8'))['data']['attributes']['streaming']
|
||||||
|
|
||||||
|
@ -1,193 +1,43 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
import re
|
from .zdf import ZDFIE
|
||||||
|
|
||||||
from .common import InfoExtractor
|
|
||||||
from ..utils import (
|
|
||||||
int_or_none,
|
|
||||||
unified_strdate,
|
|
||||||
xpath_text,
|
|
||||||
determine_ext,
|
|
||||||
float_or_none,
|
|
||||||
ExtractorError,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
class DreiSatIE(InfoExtractor):
|
class DreiSatIE(ZDFIE):
|
||||||
IE_NAME = '3sat'
|
IE_NAME = '3sat'
|
||||||
_GEO_COUNTRIES = ['DE']
|
_VALID_URL = r'https?://(?:www\.)?3sat\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
|
||||||
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
|
_TESTS = [{
|
||||||
_TESTS = [
|
# Same as https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html
|
||||||
{
|
'url': 'https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html',
|
||||||
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
|
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
|
||||||
'md5': 'be37228896d30a88f315b638900a026e',
|
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '45918',
|
'id': '141007_ab18_10wochensommer_film',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Ab 18! - 10 Wochen Sommer',
|
||||||
|
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
|
||||||
|
'duration': 2660,
|
||||||
|
'timestamp': 1608604200,
|
||||||
|
'upload_date': '20201222',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.3sat.de/gesellschaft/schweizweit/waidmannsheil-100.html',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '140913_sendung_schweizweit',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Waidmannsheil',
|
'title': 'Waidmannsheil',
|
||||||
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
|
'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
|
||||||
'uploader': 'SCHWEIZWEIT',
|
'timestamp': 1410623100,
|
||||||
'uploader_id': '100000210',
|
|
||||||
'upload_date': '20140913'
|
'upload_date': '20140913'
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
'skip_download': True, # m3u8 downloads
|
'skip_download': True,
|
||||||
}
|
}
|
||||||
},
|
}, {
|
||||||
{
|
# Same as https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html
|
||||||
'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
|
'url': 'https://www.3sat.de/film/spielfilm/der-hauptmann-100.html',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
},
|
}, {
|
||||||
]
|
# Same as https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
|
||||||
|
'url': 'https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html',
|
||||||
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
|
'only_matching': True,
|
||||||
param_groups = {}
|
}]
|
||||||
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
|
|
||||||
group_id = param_group.get(self._xpath_ns(
|
|
||||||
'id', 'http://www.w3.org/XML/1998/namespace'))
|
|
||||||
params = {}
|
|
||||||
for param in param_group:
|
|
||||||
params[param.get('name')] = param.get('value')
|
|
||||||
param_groups[group_id] = params
|
|
||||||
|
|
||||||
formats = []
|
|
||||||
for video in smil.findall(self._xpath_ns('.//video', namespace)):
|
|
||||||
src = video.get('src')
|
|
||||||
if not src:
|
|
||||||
continue
|
|
||||||
bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
|
|
||||||
group_id = video.get('paramGroup')
|
|
||||||
param_group = param_groups[group_id]
|
|
||||||
for proto in param_group['protocols'].split(','):
|
|
||||||
formats.append({
|
|
||||||
'url': '%s://%s' % (proto, param_group['host']),
|
|
||||||
'app': param_group['app'],
|
|
||||||
'play_path': src,
|
|
||||||
'ext': 'flv',
|
|
||||||
'format_id': '%s-%d' % (proto, bitrate),
|
|
||||||
'tbr': bitrate,
|
|
||||||
})
|
|
||||||
self._sort_formats(formats)
|
|
||||||
return formats
|
|
||||||
|
|
||||||
def extract_from_xml_url(self, video_id, xml_url):
|
|
||||||
doc = self._download_xml(
|
|
||||||
xml_url, video_id,
|
|
||||||
note='Downloading video info',
|
|
||||||
errnote='Failed to download video info')
|
|
||||||
|
|
||||||
status_code = xpath_text(doc, './status/statuscode')
|
|
||||||
if status_code and status_code != 'ok':
|
|
||||||
if status_code == 'notVisibleAnymore':
|
|
||||||
message = 'Video %s is not available' % video_id
|
|
||||||
else:
|
|
||||||
message = '%s returned error: %s' % (self.IE_NAME, status_code)
|
|
||||||
raise ExtractorError(message, expected=True)
|
|
||||||
|
|
||||||
title = xpath_text(doc, './/information/title', 'title', True)
|
|
||||||
|
|
||||||
urls = []
|
|
||||||
formats = []
|
|
||||||
for fnode in doc.findall('.//formitaeten/formitaet'):
|
|
||||||
video_url = xpath_text(fnode, 'url')
|
|
||||||
if not video_url or video_url in urls:
|
|
||||||
continue
|
|
||||||
urls.append(video_url)
|
|
||||||
|
|
||||||
is_available = 'http://www.metafilegenerator' not in video_url
|
|
||||||
geoloced = 'static_geoloced_online' in video_url
|
|
||||||
if not is_available or geoloced:
|
|
||||||
continue
|
|
||||||
|
|
||||||
format_id = fnode.attrib['basetype']
|
|
||||||
format_m = re.match(r'''(?x)
|
|
||||||
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
|
|
||||||
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
|
|
||||||
''', format_id)
|
|
||||||
|
|
||||||
ext = determine_ext(video_url, None) or format_m.group('container')
|
|
||||||
|
|
||||||
if ext == 'meta':
|
|
||||||
continue
|
|
||||||
elif ext == 'smil':
|
|
||||||
formats.extend(self._extract_smil_formats(
|
|
||||||
video_url, video_id, fatal=False))
|
|
||||||
elif ext == 'm3u8':
|
|
||||||
# the certificates are misconfigured (see
|
|
||||||
# https://github.com/ytdl-org/youtube-dl/issues/8665)
|
|
||||||
if video_url.startswith('https://'):
|
|
||||||
continue
|
|
||||||
formats.extend(self._extract_m3u8_formats(
|
|
||||||
video_url, video_id, 'mp4', 'm3u8_native',
|
|
||||||
m3u8_id=format_id, fatal=False))
|
|
||||||
elif ext == 'f4m':
|
|
||||||
formats.extend(self._extract_f4m_formats(
|
|
||||||
video_url, video_id, f4m_id=format_id, fatal=False))
|
|
||||||
else:
|
|
||||||
quality = xpath_text(fnode, './quality')
|
|
||||||
if quality:
|
|
||||||
format_id += '-' + quality
|
|
||||||
|
|
||||||
abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
|
|
||||||
vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
|
|
||||||
|
|
||||||
tbr = int_or_none(self._search_regex(
|
|
||||||
r'_(\d+)k', video_url, 'bitrate', None))
|
|
||||||
if tbr and vbr and not abr:
|
|
||||||
abr = tbr - vbr
|
|
||||||
|
|
||||||
formats.append({
|
|
||||||
'format_id': format_id,
|
|
||||||
'url': video_url,
|
|
||||||
'ext': ext,
|
|
||||||
'acodec': format_m.group('acodec'),
|
|
||||||
'vcodec': format_m.group('vcodec'),
|
|
||||||
'abr': abr,
|
|
||||||
'vbr': vbr,
|
|
||||||
'tbr': tbr,
|
|
||||||
'width': int_or_none(xpath_text(fnode, './width')),
|
|
||||||
'height': int_or_none(xpath_text(fnode, './height')),
|
|
||||||
'filesize': int_or_none(xpath_text(fnode, './filesize')),
|
|
||||||
'protocol': format_m.group('proto').lower(),
|
|
||||||
})
|
|
||||||
|
|
||||||
geolocation = xpath_text(doc, './/details/geolocation')
|
|
||||||
if not formats and geolocation and geolocation != 'none':
|
|
||||||
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
|
|
||||||
|
|
||||||
self._sort_formats(formats)
|
|
||||||
|
|
||||||
thumbnails = []
|
|
||||||
for node in doc.findall('.//teaserimages/teaserimage'):
|
|
||||||
thumbnail_url = node.text
|
|
||||||
if not thumbnail_url:
|
|
||||||
continue
|
|
||||||
thumbnail = {
|
|
||||||
'url': thumbnail_url,
|
|
||||||
}
|
|
||||||
thumbnail_key = node.get('key')
|
|
||||||
if thumbnail_key:
|
|
||||||
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
|
|
||||||
if m:
|
|
||||||
thumbnail['width'] = int(m.group(1))
|
|
||||||
thumbnail['height'] = int(m.group(2))
|
|
||||||
thumbnails.append(thumbnail)
|
|
||||||
|
|
||||||
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
|
|
||||||
|
|
||||||
return {
|
|
||||||
'id': video_id,
|
|
||||||
'title': title,
|
|
||||||
'description': xpath_text(doc, './/information/detail'),
|
|
||||||
'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
|
|
||||||
'thumbnails': thumbnails,
|
|
||||||
'uploader': xpath_text(doc, './/details/originChannelTitle'),
|
|
||||||
'uploader_id': xpath_text(doc, './/details/originChannelId'),
|
|
||||||
'upload_date': upload_date,
|
|
||||||
'formats': formats,
|
|
||||||
}
|
|
||||||
|
|
||||||
def _real_extract(self, url):
|
|
||||||
video_id = self._match_id(url)
|
|
||||||
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
|
|
||||||
return self.extract_from_xml_url(video_id, details_url)
|
|
||||||
|
@ -72,6 +72,7 @@ from .arte import (
|
|||||||
ArteTVEmbedIE,
|
ArteTVEmbedIE,
|
||||||
ArteTVPlaylistIE,
|
ArteTVPlaylistIE,
|
||||||
)
|
)
|
||||||
|
from .arnes import ArnesIE
|
||||||
from .asiancrush import (
|
from .asiancrush import (
|
||||||
AsianCrushIE,
|
AsianCrushIE,
|
||||||
AsianCrushPlaylistIE,
|
AsianCrushPlaylistIE,
|
||||||
@ -90,11 +91,13 @@ from .awaan import (
|
|||||||
)
|
)
|
||||||
from .azmedien import AZMedienIE
|
from .azmedien import AZMedienIE
|
||||||
from .baidu import BaiduVideoIE
|
from .baidu import BaiduVideoIE
|
||||||
|
from .bandaichannel import BandaiChannelIE
|
||||||
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
|
from .bandcamp import BandcampIE, BandcampAlbumIE, BandcampWeeklyIE
|
||||||
from .bbc import (
|
from .bbc import (
|
||||||
BBCCoUkIE,
|
BBCCoUkIE,
|
||||||
BBCCoUkArticleIE,
|
BBCCoUkArticleIE,
|
||||||
BBCCoUkIPlayerPlaylistIE,
|
BBCCoUkIPlayerEpisodesIE,
|
||||||
|
BBCCoUkIPlayerGroupIE,
|
||||||
BBCCoUkPlaylistIE,
|
BBCCoUkPlaylistIE,
|
||||||
BBCIE,
|
BBCIE,
|
||||||
)
|
)
|
||||||
@ -129,7 +132,6 @@ from .bleacherreport import (
|
|||||||
BleacherReportIE,
|
BleacherReportIE,
|
||||||
BleacherReportCMSIE,
|
BleacherReportCMSIE,
|
||||||
)
|
)
|
||||||
from .blinkx import BlinkxIE
|
|
||||||
from .bloomberg import BloombergIE
|
from .bloomberg import BloombergIE
|
||||||
from .bokecc import BokeCCIE
|
from .bokecc import BokeCCIE
|
||||||
from .bongacams import BongaCamsIE
|
from .bongacams import BongaCamsIE
|
||||||
@ -188,7 +190,11 @@ from .cbsnews import (
|
|||||||
CBSNewsIE,
|
CBSNewsIE,
|
||||||
CBSNewsLiveVideoIE,
|
CBSNewsLiveVideoIE,
|
||||||
)
|
)
|
||||||
from .cbssports import CBSSportsIE
|
from .cbssports import (
|
||||||
|
CBSSportsEmbedIE,
|
||||||
|
CBSSportsIE,
|
||||||
|
TwentyFourSevenSportsIE,
|
||||||
|
)
|
||||||
from .ccc import (
|
from .ccc import (
|
||||||
CCCIE,
|
CCCIE,
|
||||||
CCCPlaylistIE,
|
CCCPlaylistIE,
|
||||||
@ -421,6 +427,7 @@ from .gamestar import GameStarIE
|
|||||||
from .gaskrank import GaskrankIE
|
from .gaskrank import GaskrankIE
|
||||||
from .gazeta import GazetaIE
|
from .gazeta import GazetaIE
|
||||||
from .gdcvault import GDCVaultIE
|
from .gdcvault import GDCVaultIE
|
||||||
|
from .gedidigital import GediDigitalIE
|
||||||
from .generic import GenericIE
|
from .generic import GenericIE
|
||||||
from .gfycat import GfycatIE
|
from .gfycat import GfycatIE
|
||||||
from .giantbomb import GiantBombIE
|
from .giantbomb import GiantBombIE
|
||||||
@ -591,7 +598,11 @@ from .limelight import (
|
|||||||
LimelightChannelIE,
|
LimelightChannelIE,
|
||||||
LimelightChannelListIE,
|
LimelightChannelListIE,
|
||||||
)
|
)
|
||||||
from .line import LineTVIE
|
from .line import (
|
||||||
|
LineTVIE,
|
||||||
|
LineLiveIE,
|
||||||
|
LineLiveChannelIE,
|
||||||
|
)
|
||||||
from .linkedin import (
|
from .linkedin import (
|
||||||
LinkedInLearningIE,
|
LinkedInLearningIE,
|
||||||
LinkedInLearningCourseIE,
|
LinkedInLearningCourseIE,
|
||||||
@ -628,6 +639,7 @@ from .mangomolo import (
|
|||||||
MangomoloLiveIE,
|
MangomoloLiveIE,
|
||||||
)
|
)
|
||||||
from .manyvids import ManyVidsIE
|
from .manyvids import ManyVidsIE
|
||||||
|
from .maoritv import MaoriTVIE
|
||||||
from .markiza import (
|
from .markiza import (
|
||||||
MarkizaIE,
|
MarkizaIE,
|
||||||
MarkizaPageIE,
|
MarkizaPageIE,
|
||||||
@ -675,7 +687,10 @@ from .mixcloud import (
|
|||||||
MixcloudUserIE,
|
MixcloudUserIE,
|
||||||
MixcloudPlaylistIE,
|
MixcloudPlaylistIE,
|
||||||
)
|
)
|
||||||
from .mlb import MLBIE
|
from .mlb import (
|
||||||
|
MLBIE,
|
||||||
|
MLBVideoIE,
|
||||||
|
)
|
||||||
from .mnet import MnetIE
|
from .mnet import MnetIE
|
||||||
from .moevideo import MoeVideoIE
|
from .moevideo import MoeVideoIE
|
||||||
from .mofosex import (
|
from .mofosex import (
|
||||||
@ -876,6 +891,11 @@ from .packtpub import (
|
|||||||
PacktPubIE,
|
PacktPubIE,
|
||||||
PacktPubCourseIE,
|
PacktPubCourseIE,
|
||||||
)
|
)
|
||||||
|
from .palcomp3 import (
|
||||||
|
PalcoMP3IE,
|
||||||
|
PalcoMP3ArtistIE,
|
||||||
|
PalcoMP3VideoIE,
|
||||||
|
)
|
||||||
from .pandoratv import PandoraTVIE
|
from .pandoratv import PandoraTVIE
|
||||||
from .parliamentliveuk import ParliamentLiveUKIE
|
from .parliamentliveuk import ParliamentLiveUKIE
|
||||||
from .patreon import PatreonIE
|
from .patreon import PatreonIE
|
||||||
@ -1623,5 +1643,9 @@ from .zattoo import (
|
|||||||
)
|
)
|
||||||
from .zdf import ZDFIE, ZDFChannelIE
|
from .zdf import ZDFIE, ZDFChannelIE
|
||||||
from .zhihu import ZhihuIE
|
from .zhihu import ZhihuIE
|
||||||
from .zingmp3 import ZingMp3IE
|
from .zingmp3 import (
|
||||||
|
ZingMp3IE,
|
||||||
|
ZingMp3AlbumIE,
|
||||||
|
)
|
||||||
|
from .zoom import ZoomIE
|
||||||
from .zype import ZypeIE
|
from .zype import ZypeIE
|
||||||
|
@ -383,6 +383,10 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
|
'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# "<figure id=" pattern (#28792)
|
||||||
|
'url': 'https://www.francetvinfo.fr/culture/patrimoine/incendie-de-notre-dame-de-paris/notre-dame-de-paris-de-l-incendie-de-la-cathedrale-a-sa-reconstruction_4372291.html',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
@ -399,7 +403,8 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
|
|||||||
video_id = self._search_regex(
|
video_id = self._search_regex(
|
||||||
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
|
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
|
||||||
r'id-video=([^@]+@[^"]+)',
|
r'id-video=([^@]+@[^"]+)',
|
||||||
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
|
r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
|
||||||
|
r'(?:data-id|<figure[^<]+\bid)=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'),
|
||||||
webpage, 'video id')
|
webpage, 'video id')
|
||||||
|
|
||||||
return self._make_url_result(video_id)
|
return self._make_url_result(video_id)
|
||||||
|
@ -17,7 +17,7 @@ class FujiTVFODPlus7IE(InfoExtractor):
|
|||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
formats = self._extract_m3u8_formats(
|
formats = self._extract_m3u8_formats(
|
||||||
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id)
|
self._BASE_URL + 'abr/pc_html5/%s.m3u8' % video_id, video_id, 'mp4')
|
||||||
for f in formats:
|
for f in formats:
|
||||||
wh = self._BITRATE_MAP.get(f.get('tbr'))
|
wh = self._BITRATE_MAP.get(f.get('tbr'))
|
||||||
if wh:
|
if wh:
|
||||||
|
@ -16,7 +16,7 @@ from ..utils import (
|
|||||||
|
|
||||||
|
|
||||||
class FunimationIE(InfoExtractor):
|
class FunimationIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/shows/[^/]+/(?P<id>[^/?#&]+)'
|
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:[^/]+/)?shows/[^/]+/(?P<id>[^/?#&]+)'
|
||||||
|
|
||||||
_NETRC_MACHINE = 'funimation'
|
_NETRC_MACHINE = 'funimation'
|
||||||
_TOKEN = None
|
_TOKEN = None
|
||||||
@ -51,6 +51,10 @@ class FunimationIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
|
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# with lang code
|
||||||
|
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _login(self):
|
def _login(self):
|
||||||
|
@ -6,6 +6,7 @@ from .common import InfoExtractor
|
|||||||
from .kaltura import KalturaIE
|
from .kaltura import KalturaIE
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
HEADRequest,
|
HEADRequest,
|
||||||
|
remove_start,
|
||||||
sanitized_Request,
|
sanitized_Request,
|
||||||
smuggle_url,
|
smuggle_url,
|
||||||
urlencode_postdata,
|
urlencode_postdata,
|
||||||
@ -102,6 +103,26 @@ class GDCVaultIE(InfoExtractor):
|
|||||||
'format': 'mp4-408',
|
'format': 'mp4-408',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
# Kaltura embed, whitespace between quote and embedded URL in iframe's src
|
||||||
|
'url': 'https://www.gdcvault.com/play/1025699',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '0_zagynv0a',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Tech Toolbox',
|
||||||
|
'upload_date': '20190408',
|
||||||
|
'uploader_id': 'joe@blazestreaming.com',
|
||||||
|
'timestamp': 1554764629,
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
# HTML5 video
|
||||||
|
'url': 'http://www.gdcvault.com/play/1014846/Conference-Keynote-Shigeru',
|
||||||
|
'only_matching': True,
|
||||||
|
},
|
||||||
]
|
]
|
||||||
|
|
||||||
def _login(self, webpage_url, display_id):
|
def _login(self, webpage_url, display_id):
|
||||||
@ -175,7 +196,18 @@ class GDCVaultIE(InfoExtractor):
|
|||||||
|
|
||||||
xml_name = self._html_search_regex(
|
xml_name = self._html_search_regex(
|
||||||
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
|
r'<iframe src=".*?\?xml(?:=|URL=xml/)(.+?\.xml).*?".*?</iframe>',
|
||||||
start_page, 'xml filename')
|
start_page, 'xml filename', default=None)
|
||||||
|
if not xml_name:
|
||||||
|
info = self._parse_html5_media_entries(url, start_page, video_id)[0]
|
||||||
|
info.update({
|
||||||
|
'title': remove_start(self._search_regex(
|
||||||
|
r'>Session Name:\s*<.*?>\s*<td>(.+?)</td>', start_page,
|
||||||
|
'title', default=None) or self._og_search_title(
|
||||||
|
start_page, default=None), 'GDC Vault - '),
|
||||||
|
'id': video_id,
|
||||||
|
'display_id': display_id,
|
||||||
|
})
|
||||||
|
return info
|
||||||
embed_url = '%s/xml/%s' % (xml_root, xml_name)
|
embed_url = '%s/xml/%s' % (xml_root, xml_name)
|
||||||
ie_key = 'DigitallySpeaking'
|
ie_key = 'DigitallySpeaking'
|
||||||
|
|
||||||
|
161
youtube_dl/extractor/gedidigital.py
Normal file
@ -0,0 +1,161 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import re
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import (
|
||||||
|
determine_ext,
|
||||||
|
int_or_none,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class GediDigitalIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'''(?x)https?://video\.
|
||||||
|
(?:
|
||||||
|
(?:
|
||||||
|
(?:espresso\.)?repubblica
|
||||||
|
|lastampa
|
||||||
|
|ilsecoloxix
|
||||||
|
)|
|
||||||
|
(?:
|
||||||
|
iltirreno
|
||||||
|
|messaggeroveneto
|
||||||
|
|ilpiccolo
|
||||||
|
|gazzettadimantova
|
||||||
|
|mattinopadova
|
||||||
|
|laprovinciapavese
|
||||||
|
|tribunatreviso
|
||||||
|
|nuovavenezia
|
||||||
|
|gazzettadimodena
|
||||||
|
|lanuovaferrara
|
||||||
|
|corrierealpi
|
||||||
|
|lasentinella
|
||||||
|
)\.gelocal
|
||||||
|
)\.it(?:/[^/]+){2,3}?/(?P<id>\d+)(?:[/?&#]|$)'''
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://video.lastampa.it/politica/il-paradosso-delle-regionali-la-lega-vince-ma-sembra-aver-perso/121559/121683',
|
||||||
|
'md5': '84658d7fb9e55a6e57ecc77b73137494',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '121559',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Il paradosso delle Regionali: ecco perché la Lega vince ma sembra aver perso',
|
||||||
|
'description': 'md5:de7f4d6eaaaf36c153b599b10f8ce7ca',
|
||||||
|
'thumbnail': r're:^https://www\.repstatic\.it/video/photo/.+?-thumb-full-.+?\.jpg$',
|
||||||
|
'duration': 125,
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.espresso.repubblica.it/embed/tutti-i-video/01-ted-villa/14772/14870&width=640&height=360',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.repubblica.it/motori/record-della-pista-a-spa-francorchamps-la-pagani-huayra-roadster-bc-stupisce/367415/367963',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.ilsecoloxix.it/sport/cassani-e-i-brividi-azzurri-ai-mondiali-di-imola-qui-mi-sono-innamorato-del-ciclismo-da-ragazzino-incredibile-tornarci-da-ct/66184/66267',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.iltirreno.gelocal.it/sport/dentro-la-notizia-ferrari-cosa-succede-a-maranello/141059/142723',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.messaggeroveneto.gelocal.it/locale/maria-giovanna-elmi-covid-vaccino/138155/139268',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.ilpiccolo.gelocal.it/dossier/big-john/dinosauro-big-john-al-via-le-visite-guidate-a-trieste/135226/135751',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.gazzettadimantova.gelocal.it/locale/dal-ponte-visconteo-di-valeggio-l-and-8217sos-dei-ristoratori-aprire-anche-a-cena/137310/137818',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.mattinopadova.gelocal.it/dossier/coronavirus-in-veneto/covid-a-vo-un-anno-dopo-un-cuore-tricolore-per-non-dimenticare/138402/138964',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.laprovinciapavese.gelocal.it/locale/mede-zona-rossa-via-alle-vaccinazioni-per-gli-over-80/137545/138120',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.tribunatreviso.gelocal.it/dossier/coronavirus-in-veneto/ecco-le-prima-vaccinazioni-di-massa-nella-marca/134485/135024',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.nuovavenezia.gelocal.it/locale/camion-troppo-alto-per-il-ponte-ferroviario-perde-il-carico/135734/136266',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.gazzettadimodena.gelocal.it/locale/modena-scoperta-la-proteina-che-predice-il-livello-di-gravita-del-covid/139109/139796',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.lanuovaferrara.gelocal.it/locale/due-bombole-di-gpl-aperte-e-abbandonate-i-vigili-bruciano-il-gas/134391/134957',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.corrierealpi.gelocal.it/dossier/cortina-2021-i-mondiali-di-sci-alpino/mondiali-di-sci-il-timelapse-sulla-splendida-olympia/133760/134331',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.lasentinella.gelocal.it/locale/vestigne-centra-un-auto-e-si-ribalta/138931/139466',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://video.espresso.repubblica.it/tutti-i-video/01-ted-villa/14772',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
|
||||||
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
title = self._html_search_meta(
|
||||||
|
['twitter:title', 'og:title'], webpage, fatal=True)
|
||||||
|
player_data = re.findall(
|
||||||
|
r"PlayerFactory\.setParam\('(?P<type>format|param)',\s*'(?P<name>[^']+)',\s*'(?P<val>[^']+)'\);",
|
||||||
|
webpage)
|
||||||
|
|
||||||
|
formats = []
|
||||||
|
duration = thumb = None
|
||||||
|
for t, n, v in player_data:
|
||||||
|
if t == 'format':
|
||||||
|
if n in ('video-hds-vod-ec', 'video-hls-vod-ec', 'video-viralize', 'video-youtube-pfp'):
|
||||||
|
continue
|
||||||
|
elif n.endswith('-vod-ak'):
|
||||||
|
formats.extend(self._extract_akamai_formats(
|
||||||
|
v, video_id, {'http': 'media.gedidigital.it'}))
|
||||||
|
else:
|
||||||
|
ext = determine_ext(v)
|
||||||
|
if ext == 'm3u8':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
v, video_id, 'mp4', 'm3u8_native', m3u8_id=n, fatal=False))
|
||||||
|
continue
|
||||||
|
f = {
|
||||||
|
'format_id': n,
|
||||||
|
'url': v,
|
||||||
|
}
|
||||||
|
if ext == 'mp3':
|
||||||
|
abr = int_or_none(self._search_regex(
|
||||||
|
r'-mp3-audio-(\d+)', v, 'abr', default=None))
|
||||||
|
f.update({
|
||||||
|
'abr': abr,
|
||||||
|
'tbr': abr,
|
||||||
|
'vcodec': 'none'
|
||||||
|
})
|
||||||
|
else:
|
||||||
|
mobj = re.match(r'^video-rrtv-(\d+)(?:-(\d+))?$', n)
|
||||||
|
if mobj:
|
||||||
|
f.update({
|
||||||
|
'height': int(mobj.group(1)),
|
||||||
|
'vbr': int_or_none(mobj.group(2)),
|
||||||
|
})
|
||||||
|
if not f.get('vbr'):
|
||||||
|
f['vbr'] = int_or_none(self._search_regex(
|
||||||
|
r'-video-rrtv-(\d+)', v, 'vbr', default=None))
|
||||||
|
formats.append(f)
|
||||||
|
elif t == 'param':
|
||||||
|
if n in ['image_full', 'image']:
|
||||||
|
thumb = v
|
||||||
|
elif n == 'videoDuration':
|
||||||
|
duration = int_or_none(v)
|
||||||
|
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
'title': title,
|
||||||
|
'description': self._html_search_meta(
|
||||||
|
['twitter:description', 'og:description', 'description'], webpage),
|
||||||
|
'thumbnail': thumb or self._og_search_thumbnail(webpage),
|
||||||
|
'formats': formats,
|
||||||
|
'duration': duration,
|
||||||
|
}
|
@ -2953,7 +2953,7 @@ class GenericIE(InfoExtractor):
|
|||||||
webpage)
|
webpage)
|
||||||
if not mobj:
|
if not mobj:
|
||||||
mobj = re.search(
|
mobj = re.search(
|
||||||
r'data-video-link=["\'](?P<url>http://m.mlb.com/video/[^"\']+)',
|
r'data-video-link=["\'](?P<url>http://m\.mlb\.com/video/[^"\']+)',
|
||||||
webpage)
|
webpage)
|
||||||
if mobj is not None:
|
if mobj is not None:
|
||||||
return self.url_result(mobj.group('url'), 'MLB')
|
return self.url_result(mobj.group('url'), 'MLB')
|
||||||
|
@ -4,10 +4,12 @@ from __future__ import unicode_literals
|
|||||||
import re
|
import re
|
||||||
|
|
||||||
from .adobepass import AdobePassIE
|
from .adobepass import AdobePassIE
|
||||||
|
from ..compat import compat_str
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
int_or_none,
|
int_or_none,
|
||||||
determine_ext,
|
determine_ext,
|
||||||
parse_age_limit,
|
parse_age_limit,
|
||||||
|
try_get,
|
||||||
urlencode_postdata,
|
urlencode_postdata,
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
)
|
)
|
||||||
@ -116,6 +118,18 @@ class GoIE(AdobePassIE):
|
|||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'VDKA22600213',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Pilot',
|
||||||
|
'description': 'md5:74306df917cfc199d76d061d66bebdb4',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
# m3u8 download
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
|
'url': 'http://abc.go.com/shows/the-catch/episode-guide/season-01/10-the-wedding',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -149,11 +163,27 @@ class GoIE(AdobePassIE):
|
|||||||
brand = site_info.get('brand')
|
brand = site_info.get('brand')
|
||||||
if not video_id or not site_info:
|
if not video_id or not site_info:
|
||||||
webpage = self._download_webpage(url, display_id or video_id)
|
webpage = self._download_webpage(url, display_id or video_id)
|
||||||
|
data = self._parse_json(
|
||||||
|
self._search_regex(
|
||||||
|
r'["\']__abc_com__["\']\s*\]\s*=\s*({.+?})\s*;', webpage,
|
||||||
|
'data', default='{}'),
|
||||||
|
display_id or video_id, fatal=False)
|
||||||
|
# https://abc.com/shows/modern-family/episode-guide/season-01/101-pilot
|
||||||
|
layout = try_get(data, lambda x: x['page']['content']['video']['layout'], dict)
|
||||||
|
video_id = None
|
||||||
|
if layout:
|
||||||
|
video_id = try_get(
|
||||||
|
layout,
|
||||||
|
(lambda x: x['videoid'], lambda x: x['video']['id']),
|
||||||
|
compat_str)
|
||||||
|
if not video_id:
|
||||||
video_id = self._search_regex(
|
video_id = self._search_regex(
|
||||||
(
|
(
|
||||||
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
|
# There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
|
||||||
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
|
# from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
|
||||||
r'data-video-id=["\']*(VDKA\w+)',
|
r'data-video-id=["\']*(VDKA\w+)',
|
||||||
|
# page.analytics.videoIdCode
|
||||||
|
r'\bvideoIdCode["\']\s*:\s*["\']((?:vdka|VDKA)\w+)',
|
||||||
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
|
# https://abc.com/shows/the-rookie/episode-guide/season-02/03-the-bet
|
||||||
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
|
r'\b(?:video)?id["\']\s*:\s*["\'](VDKA\w+)'
|
||||||
), webpage, 'video id', default=video_id)
|
), webpage, 'video id', default=video_id)
|
||||||
|
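The GoIE hunk above passes a tuple of getters to `try_get` so that either `layout['videoid']` or `layout['video']['id']` can supply the ID. The sketch below is a simplified stand-in for that helper, written only to show the fallback behaviour; it is not the library's own implementation, and `str` is used where the extractor uses `compat_str`:

```python
def try_get(src, getters, expected_type=None):
    """Return the first getter result that succeeds and has the expected type."""
    if not isinstance(getters, (list, tuple)):
        getters = [getters]
    for get in getters:
        try:
            value = get(src)
        except (AttributeError, KeyError, TypeError, IndexError):
            continue  # this getter does not apply, try the next one
        if expected_type is None or isinstance(value, expected_type):
            return value
    return None


getters = (lambda x: x['videoid'], lambda x: x['video']['id'])

# First getter raises KeyError, second one succeeds.
nested = try_get({'video': {'id': 'VDKA22600213'}}, getters, str)
# Value exists but is not a string, and the second getter fails: result is None.
wrong_type = try_get({'videoid': 123}, getters, str)
```

When both getters fail, `video_id` stays `None` and the extractor falls back to the regex search that follows in the hunk.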
@ -12,6 +12,7 @@ from ..compat import (
|
|||||||
)
|
)
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
|
float_or_none,
|
||||||
get_element_by_attribute,
|
get_element_by_attribute,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
lowercase_escape,
|
lowercase_escape,
|
||||||
@ -32,6 +33,7 @@ class InstagramIE(InfoExtractor):
|
|||||||
'title': 'Video by naomipq',
|
'title': 'Video by naomipq',
|
||||||
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
|
'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
|
||||||
'thumbnail': r're:^https?://.*\.jpg',
|
'thumbnail': r're:^https?://.*\.jpg',
|
||||||
|
'duration': 0,
|
||||||
'timestamp': 1371748545,
|
'timestamp': 1371748545,
|
||||||
'upload_date': '20130620',
|
'upload_date': '20130620',
|
||||||
'uploader_id': 'naomipq',
|
'uploader_id': 'naomipq',
|
||||||
@ -48,6 +50,7 @@ class InstagramIE(InfoExtractor):
|
|||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Video by britneyspears',
|
'title': 'Video by britneyspears',
|
||||||
'thumbnail': r're:^https?://.*\.jpg',
|
'thumbnail': r're:^https?://.*\.jpg',
|
||||||
|
'duration': 0,
|
||||||
'timestamp': 1453760977,
|
'timestamp': 1453760977,
|
||||||
'upload_date': '20160125',
|
'upload_date': '20160125',
|
||||||
'uploader_id': 'britneyspears',
|
'uploader_id': 'britneyspears',
|
||||||
@ -86,6 +89,24 @@ class InstagramIE(InfoExtractor):
|
|||||||
'title': 'Post by instagram',
|
'title': 'Post by instagram',
|
||||||
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
|
'description': 'md5:0f9203fc6a2ce4d228da5754bcf54957',
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
# IGTV
|
||||||
|
'url': 'https://www.instagram.com/tv/BkfuX9UB-eK/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'BkfuX9UB-eK',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Fingerboarding Tricks with @cass.fb',
|
||||||
|
'thumbnail': r're:^https?://.*\.jpg',
|
||||||
|
'duration': 53.83,
|
||||||
|
'timestamp': 1530032919,
|
||||||
|
'upload_date': '20180626',
|
||||||
|
'uploader_id': 'instagram',
|
||||||
|
'uploader': 'Instagram',
|
||||||
|
'like_count': int,
|
||||||
|
'comment_count': int,
|
||||||
|
'comments': list,
|
||||||
|
'description': 'Meet Cass Hirst (@cass.fb), a fingerboarding pro who can perform tiny ollies and kickflips while blindfolded.',
|
||||||
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://instagram.com/p/-Cmh1cukG2/',
|
'url': 'https://instagram.com/p/-Cmh1cukG2/',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -159,7 +180,9 @@ class InstagramIE(InfoExtractor):
|
|||||||
description = try_get(
|
description = try_get(
|
||||||
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
|
media, lambda x: x['edge_media_to_caption']['edges'][0]['node']['text'],
|
||||||
compat_str) or media.get('caption')
|
compat_str) or media.get('caption')
|
||||||
|
title = media.get('title')
|
||||||
thumbnail = media.get('display_src') or media.get('display_url')
|
thumbnail = media.get('display_src') or media.get('display_url')
|
||||||
|
duration = float_or_none(media.get('video_duration'))
|
||||||
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
|
timestamp = int_or_none(media.get('taken_at_timestamp') or media.get('date'))
|
||||||
uploader = media.get('owner', {}).get('full_name')
|
uploader = media.get('owner', {}).get('full_name')
|
||||||
uploader_id = media.get('owner', {}).get('username')
|
uploader_id = media.get('owner', {}).get('username')
|
||||||
@ -200,9 +223,10 @@ class InstagramIE(InfoExtractor):
|
|||||||
continue
|
continue
|
||||||
entries.append({
|
entries.append({
|
||||||
'id': node.get('shortcode') or node['id'],
|
'id': node.get('shortcode') or node['id'],
|
||||||
'title': 'Video %d' % edge_num,
|
'title': node.get('title') or 'Video %d' % edge_num,
|
||||||
'url': node_video_url,
|
'url': node_video_url,
|
||||||
'thumbnail': node.get('display_url'),
|
'thumbnail': node.get('display_url'),
|
||||||
|
'duration': float_or_none(node.get('video_duration')),
|
||||||
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
|
'width': int_or_none(try_get(node, lambda x: x['dimensions']['width'])),
|
||||||
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
|
'height': int_or_none(try_get(node, lambda x: x['dimensions']['height'])),
|
||||||
'view_count': int_or_none(node.get('video_view_count')),
|
'view_count': int_or_none(node.get('video_view_count')),
|
||||||
@ -239,8 +263,9 @@ class InstagramIE(InfoExtractor):
|
|||||||
'id': video_id,
|
'id': video_id,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Video by %s' % uploader_id,
|
'title': title or 'Video by %s' % uploader_id,
|
||||||
'description': description,
|
'description': description,
|
||||||
|
'duration': duration,
|
||||||
'thumbnail': thumbnail,
|
'thumbnail': thumbnail,
|
||||||
'timestamp': timestamp,
|
'timestamp': timestamp,
|
||||||
'uploader_id': uploader_id,
|
'uploader_id': uploader_id,
|
||||||
|
@ -29,34 +29,51 @@ class JamendoIE(InfoExtractor):
|
|||||||
'id': '196219',
|
'id': '196219',
|
||||||
'display_id': 'stories-from-emona-i',
|
'display_id': 'stories-from-emona-i',
|
||||||
'ext': 'flac',
|
'ext': 'flac',
|
||||||
'title': 'Maya Filipič - Stories from Emona I',
|
# 'title': 'Maya Filipič - Stories from Emona I',
|
||||||
'artist': 'Maya Filipič',
|
'title': 'Stories from Emona I',
|
||||||
|
# 'artist': 'Maya Filipič',
|
||||||
'track': 'Stories from Emona I',
|
'track': 'Stories from Emona I',
|
||||||
'duration': 210,
|
'duration': 210,
|
||||||
'thumbnail': r're:^https?://.*\.jpg',
|
'thumbnail': r're:^https?://.*\.jpg',
|
||||||
'timestamp': 1217438117,
|
'timestamp': 1217438117,
|
||||||
'upload_date': '20080730',
|
'upload_date': '20080730',
|
||||||
|
'license': 'by-nc-nd',
|
||||||
|
'view_count': int,
|
||||||
|
'like_count': int,
|
||||||
|
'average_rating': int,
|
||||||
|
'tags': ['piano', 'peaceful', 'newage', 'strings', 'upbeat'],
|
||||||
}
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
|
'url': 'https://licensing.jamendo.com/en/track/1496667/energetic-rock',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
|
def _call_api(self, resource, resource_id):
|
||||||
|
path = '/api/%ss' % resource
|
||||||
|
rand = compat_str(random.random())
|
||||||
|
return self._download_json(
|
||||||
|
'https://www.jamendo.com' + path, resource_id, query={
|
||||||
|
'id[]': resource_id,
|
||||||
|
}, headers={
|
||||||
|
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
|
||||||
|
})[0]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
track_id, display_id = self._VALID_URL_RE.match(url).groups()
|
track_id, display_id = self._VALID_URL_RE.match(url).groups()
|
||||||
webpage = self._download_webpage(
|
# webpage = self._download_webpage(
|
||||||
'https://www.jamendo.com/track/' + track_id, track_id)
|
# 'https://www.jamendo.com/track/' + track_id, track_id)
|
||||||
models = self._parse_json(self._html_search_regex(
|
# models = self._parse_json(self._html_search_regex(
|
||||||
r"data-bundled-models='([^']+)",
|
# r"data-bundled-models='([^']+)",
|
||||||
webpage, 'bundled models'), track_id)
|
# webpage, 'bundled models'), track_id)
|
||||||
track = models['track']['models'][0]
|
# track = models['track']['models'][0]
|
||||||
|
track = self._call_api('track', track_id)
|
||||||
title = track_name = track['name']
|
title = track_name = track['name']
|
||||||
get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
|
# get_model = lambda x: try_get(models, lambda y: y[x]['models'][0], dict) or {}
|
||||||
artist = get_model('artist')
|
# artist = get_model('artist')
|
||||||
artist_name = artist.get('name')
|
# artist_name = artist.get('name')
|
||||||
if artist_name:
|
# if artist_name:
|
||||||
title = '%s - %s' % (artist_name, title)
|
# title = '%s - %s' % (artist_name, title)
|
||||||
album = get_model('album')
|
# album = get_model('album')
|
||||||
|
|
||||||
formats = [{
|
formats = [{
|
||||||
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
|
'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
|
||||||
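The `_call_api` method added above signs each request with an `X-Jam-Call` header built from the API path and a random string. A standalone sketch of just that header computation follows; the function name `jam_call_header` and the optional `rand` argument are assumptions added here so the result can be reproduced:

```python
import hashlib
import random


def jam_call_header(path, rand=None):
    """Build the header value as '$<sha1(path + rand)>*<rand>~'."""
    if rand is None:
        rand = str(random.random())
    digest = hashlib.sha1((path + rand).encode()).hexdigest()
    return '$%s*%s~' % (digest, rand)


# Fixed 'rand' so the value is deterministic for the example.
header = jam_call_header('/api/tracks', '0.5')
```

The path is derived from the resource name (`'/api/%ss' % resource`), which is why the same method can serve both the track and the album extractor once `JamendoAlbumIE` inherits from `JamendoIE`.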
@ -74,7 +91,7 @@ class JamendoIE(InfoExtractor):
|
|||||||
|
|
||||||
urls = []
|
urls = []
|
||||||
thumbnails = []
|
thumbnails = []
|
||||||
for _, covers in track.get('cover', {}).items():
|
for covers in (track.get('cover') or {}).values():
|
||||||
for cover_id, cover_url in covers.items():
|
for cover_id, cover_url in covers.items():
|
||||||
if not cover_url or cover_url in urls:
|
if not cover_url or cover_url in urls:
|
||||||
continue
|
continue
|
||||||
@ -88,13 +105,14 @@ class JamendoIE(InfoExtractor):
|
|||||||
})
|
})
|
||||||
|
|
||||||
tags = []
|
tags = []
|
||||||
for tag in track.get('tags', []):
|
for tag in (track.get('tags') or []):
|
||||||
tag_name = tag.get('name')
|
tag_name = tag.get('name')
|
||||||
if not tag_name:
|
if not tag_name:
|
||||||
continue
|
continue
|
||||||
tags.append(tag_name)
|
tags.append(tag_name)
|
||||||
|
|
||||||
stats = track.get('stats') or {}
|
stats = track.get('stats') or {}
|
||||||
|
license = track.get('licenseCC') or []
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': track_id,
|
'id': track_id,
|
||||||
@ -103,11 +121,11 @@ class JamendoIE(InfoExtractor):
|
|||||||
'title': title,
|
'title': title,
|
||||||
'description': track.get('description'),
|
'description': track.get('description'),
|
||||||
'duration': int_or_none(track.get('duration')),
|
'duration': int_or_none(track.get('duration')),
|
||||||
'artist': artist_name,
|
# 'artist': artist_name,
|
||||||
'track': track_name,
|
'track': track_name,
|
||||||
'album': album.get('name'),
|
# 'album': album.get('name'),
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'license': '-'.join(track.get('licenseCC', [])) or None,
|
'license': '-'.join(license) if license else None,
|
||||||
'timestamp': int_or_none(track.get('dateCreated')),
|
'timestamp': int_or_none(track.get('dateCreated')),
|
||||||
'view_count': int_or_none(stats.get('listenedAll')),
|
'view_count': int_or_none(stats.get('listenedAll')),
|
||||||
'like_count': int_or_none(stats.get('favorited')),
|
'like_count': int_or_none(stats.get('favorited')),
|
||||||
@ -116,9 +134,9 @@ class JamendoIE(InfoExtractor):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
class JamendoAlbumIE(InfoExtractor):
|
class JamendoAlbumIE(JamendoIE):
|
||||||
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
|
_VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)'
|
||||||
_TEST = {
|
_TESTS = [{
|
||||||
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
|
'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '121486',
|
'id': '121486',
|
||||||
@ -151,17 +169,7 @@ class JamendoAlbumIE(InfoExtractor):
|
|||||||
'params': {
|
'params': {
|
||||||
'playlistend': 2
|
'playlistend': 2
|
||||||
}
|
}
|
||||||
}
|
}]
|
||||||
|
|
||||||
def _call_api(self, resource, resource_id):
|
|
||||||
path = '/api/%ss' % resource
|
|
||||||
rand = compat_str(random.random())
|
|
||||||
return self._download_json(
|
|
||||||
'https://www.jamendo.com' + path, resource_id, query={
|
|
||||||
'id[]': resource_id,
|
|
||||||
}, headers={
|
|
||||||
'X-Jam-Call': '$%s*%s~' % (hashlib.sha1((path + rand).encode()).hexdigest(), rand)
|
|
||||||
})[0]
|
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
album_id = self._match_id(url)
|
album_id = self._match_id(url)
|
||||||
@ -169,7 +177,7 @@ class JamendoAlbumIE(InfoExtractor):
|
|||||||
album_name = album.get('name')
|
album_name = album.get('name')
|
||||||
|
|
||||||
entries = []
|
entries = []
|
||||||
for track in album.get('tracks', []):
|
for track in (album.get('tracks') or []):
|
||||||
track_id = track.get('id')
|
track_id = track.get('id')
|
||||||
if not track_id:
|
if not track_id:
|
||||||
continue
|
continue
|
||||||
|
@ -120,7 +120,7 @@ class KalturaIE(InfoExtractor):
|
|||||||
def _extract_urls(webpage):
|
def _extract_urls(webpage):
|
||||||
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
|
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
|
||||||
finditer = (
|
finditer = (
|
||||||
re.finditer(
|
list(re.finditer(
|
||||||
r"""(?xs)
|
r"""(?xs)
|
||||||
kWidget\.(?:thumb)?[Ee]mbed\(
|
kWidget\.(?:thumb)?[Ee]mbed\(
|
||||||
\{.*?
|
\{.*?
|
||||||
@ -128,8 +128,8 @@ class KalturaIE(InfoExtractor):
|
|||||||
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
|
(?P<q2>['"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
|
||||||
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
|
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
|
||||||
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
|
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
|
||||||
""", webpage)
|
""", webpage))
|
||||||
or re.finditer(
|
or list(re.finditer(
|
||||||
r'''(?xs)
|
r'''(?xs)
|
||||||
(?P<q1>["'])
|
(?P<q1>["'])
|
||||||
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
|
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
|
||||||
@ -142,16 +142,16 @@ class KalturaIE(InfoExtractor):
|
|||||||
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
|
\[\s*(?P<q2_1>["'])entry_?[Ii]d(?P=q2_1)\s*\]\s*=\s*
|
||||||
)
|
)
|
||||||
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
|
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
|
||||||
''', webpage)
|
''', webpage))
|
||||||
or re.finditer(
|
or list(re.finditer(
|
||||||
r'''(?xs)
|
r'''(?xs)
|
||||||
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
|
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])\s*
|
||||||
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
|
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
|
||||||
(?:(?!(?P=q1)).)*
|
(?:(?!(?P=q1)).)*
|
||||||
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
|
[?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
|
||||||
(?:(?!(?P=q1)).)*
|
(?:(?!(?P=q1)).)*
|
||||||
(?P=q1)
|
(?P=q1)
|
||||||
''', webpage)
|
''', webpage))
|
||||||
)
|
)
|
||||||
urls = []
|
urls = []
|
||||||
for mobj in finditer:
|
for mobj in finditer:
|
||||||
|
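The KalturaIE hunk above wraps each `re.finditer` call in `list(...)`. A likely reason is that `re.finditer` returns an iterator object, which is truthy even when it yields nothing, so an `or` chain of bare iterators never reaches its fallbacks. A small sketch with made-up patterns and input:

```python
import re

text = 'entry_id=abc'

# An iterator is truthy even with zero matches, so the right-hand side
# of `or` is never evaluated and the fallback pattern is ignored.
lazy = re.finditer(r'no-such-token', text) or re.finditer(r'entry_id=(\w+)', text)
lazy_ids = [m.group(1) for m in lazy]

# An empty list is falsy, so `or` falls through to the second pattern.
eager = (list(re.finditer(r'no-such-token', text))
         or list(re.finditer(r'entry_id=(\w+)', text)))
eager_ids = [m.group(1) for m in eager]
```

With bare iterators the result is empty; with lists the second pattern is used and the ID is found.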
@ -6,8 +6,10 @@ import json
|
|||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import (
|
from ..compat import (
|
||||||
|
compat_parse_qs,
|
||||||
compat_str,
|
compat_str,
|
||||||
compat_urllib_parse_unquote,
|
compat_urllib_parse_unquote,
|
||||||
|
compat_urllib_parse_urlparse,
|
||||||
)
|
)
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
determine_ext,
|
determine_ext,
|
||||||
@ -60,6 +62,7 @@ class LBRYBaseIE(InfoExtractor):
|
|||||||
'description': stream_value.get('description'),
|
'description': stream_value.get('description'),
|
||||||
'license': stream_value.get('license'),
|
'license': stream_value.get('license'),
|
||||||
'timestamp': int_or_none(stream.get('timestamp')),
|
'timestamp': int_or_none(stream.get('timestamp')),
|
||||||
|
'release_timestamp': int_or_none(stream_value.get('release_time')),
|
||||||
'tags': stream_value.get('tags'),
|
'tags': stream_value.get('tags'),
|
||||||
'duration': int_or_none(media.get('duration')),
|
'duration': int_or_none(media.get('duration')),
|
||||||
'channel': try_get(signing_channel, lambda x: x['value']['title']),
|
'channel': try_get(signing_channel, lambda x: x['value']['title']),
|
||||||
@ -92,6 +95,8 @@ class LBRYIE(LBRYBaseIE):
|
|||||||
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
|
'description': 'md5:f6cb5c704b332d37f5119313c2c98f51',
|
||||||
'timestamp': 1595694354,
|
'timestamp': 1595694354,
|
||||||
'upload_date': '20200725',
|
'upload_date': '20200725',
|
||||||
|
'release_timestamp': 1595340697,
|
||||||
|
'release_date': '20200721',
|
||||||
'width': 1280,
|
'width': 1280,
|
||||||
'height': 720,
|
'height': 720,
|
||||||
}
|
}
|
||||||
@ -106,6 +111,8 @@ class LBRYIE(LBRYBaseIE):
|
|||||||
'description': 'md5:661ac4f1db09f31728931d7b88807a61',
|
'description': 'md5:661ac4f1db09f31728931d7b88807a61',
|
||||||
'timestamp': 1591312601,
|
'timestamp': 1591312601,
|
||||||
'upload_date': '20200604',
|
'upload_date': '20200604',
|
||||||
|
'release_timestamp': 1591312421,
|
||||||
|
'release_date': '20200604',
|
||||||
'tags': list,
|
'tags': list,
|
||||||
'duration': 2570,
|
'duration': 2570,
|
||||||
'channel': 'The LBRY Foundation',
|
'channel': 'The LBRY Foundation',
|
||||||
@ -113,6 +120,26 @@ class LBRYIE(LBRYBaseIE):
|
|||||||
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
|
'channel_url': 'https://lbry.tv/@LBRYFoundation:0ed629d2b9c601300cacf7eabe9da0be79010212',
|
||||||
'vcodec': 'none',
|
'vcodec': 'none',
|
||||||
}
|
}
|
||||||
|
}, {
|
||||||
|
# HLS
|
||||||
|
'url': 'https://odysee.com/@gardeningincanada:b/plants-i-will-never-grow-again.-the:e',
|
||||||
|
'md5': 'fc82f45ea54915b1495dd7cb5cc1289f',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'e51671357333fe22ae88aad320bde2f6f96b1410',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'PLANTS I WILL NEVER GROW AGAIN. THE BLACK LIST PLANTS FOR A CANADIAN GARDEN | Gardening in Canada 🍁',
|
||||||
|
'description': 'md5:9c539c6a03fb843956de61a4d5288d5e',
|
||||||
|
'timestamp': 1618254123,
|
||||||
|
'upload_date': '20210412',
|
||||||
|
'release_timestamp': 1618254002,
|
||||||
|
'release_date': '20210412',
|
||||||
|
'tags': list,
|
||||||
|
'duration': 554,
|
||||||
|
'channel': 'Gardening In Canada',
|
||||||
|
'channel_id': 'b8be0e93b423dad221abe29545fbe8ec36e806bc',
|
||||||
|
'channel_url': 'https://odysee.com/@gardeningincanada:b8be0e93b423dad221abe29545fbe8ec36e806bc',
|
||||||
|
'formats': 'mincount:3',
|
||||||
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
|
'url': 'https://odysee.com/@BrodieRobertson:5/apple-is-tracking-everything-you-do-on:e',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -156,10 +183,18 @@ class LBRYIE(LBRYBaseIE):
|
|||||||
streaming_url = self._call_api_proxy(
|
streaming_url = self._call_api_proxy(
|
||||||
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
|
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url']
|
||||||
info = self._parse_stream(result, url)
|
info = self._parse_stream(result, url)
|
||||||
|
urlh = self._request_webpage(
|
||||||
|
streaming_url, display_id, note='Downloading streaming redirect url info')
|
||||||
|
if determine_ext(urlh.geturl()) == 'm3u8':
|
||||||
|
info['formats'] = self._extract_m3u8_formats(
|
||||||
|
urlh.geturl(), display_id, 'mp4', entry_protocol='m3u8_native',
|
||||||
|
m3u8_id='hls')
|
||||||
|
self._sort_formats(info['formats'])
|
||||||
|
else:
|
||||||
|
info['url'] = streaming_url
|
||||||
info.update({
|
info.update({
|
||||||
'id': claim_id,
|
'id': claim_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
'url': streaming_url,
|
|
||||||
})
|
})
|
||||||
return info
|
return info
|
||||||
|
|
||||||
@ -181,17 +216,18 @@ class LBRYChannelIE(LBRYBaseIE):
|
|||||||
}]
|
}]
|
||||||
_PAGE_SIZE = 50
|
_PAGE_SIZE = 50
|
||||||
|
|
||||||
def _fetch_page(self, claim_id, url, page):
|
def _fetch_page(self, claim_id, url, params, page):
|
||||||
page += 1
|
page += 1
|
||||||
result = self._call_api_proxy(
|
page_params = {
|
||||||
'claim_search', claim_id, {
|
|
||||||
'channel_ids': [claim_id],
|
'channel_ids': [claim_id],
|
||||||
'claim_type': 'stream',
|
'claim_type': 'stream',
|
||||||
'no_totals': True,
|
'no_totals': True,
|
||||||
'page': page,
|
'page': page,
|
||||||
'page_size': self._PAGE_SIZE,
|
'page_size': self._PAGE_SIZE,
|
||||||
'stream_types': self._SUPPORTED_STREAM_TYPES,
|
}
|
||||||
}, 'page %d' % page)
|
page_params.update(params)
|
||||||
|
result = self._call_api_proxy(
|
||||||
|
'claim_search', claim_id, page_params, 'page %d' % page)
|
||||||
for item in (result.get('items') or []):
|
for item in (result.get('items') or []):
|
||||||
stream_claim_name = item.get('name')
|
stream_claim_name = item.get('name')
|
||||||
stream_claim_id = item.get('claim_id')
|
stream_claim_id = item.get('claim_id')
|
||||||
@ -212,8 +248,31 @@ class LBRYChannelIE(LBRYBaseIE):
|
|||||||
result = self._resolve_url(
|
result = self._resolve_url(
|
||||||
'lbry://' + display_id, display_id, 'channel')
|
'lbry://' + display_id, display_id, 'channel')
|
||||||
claim_id = result['claim_id']
|
claim_id = result['claim_id']
|
||||||
|
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
|
||||||
|
content = qs.get('content', [None])[0]
|
||||||
|
params = {
|
||||||
|
'fee_amount': qs.get('fee_amount', ['>=0'])[0],
|
||||||
|
'order_by': {
|
||||||
|
'new': ['release_time'],
|
||||||
|
'top': ['effective_amount'],
|
||||||
|
'trending': ['trending_group', 'trending_mixed'],
|
||||||
|
}[qs.get('order', ['new'])[0]],
|
||||||
|
'stream_types': [content] if content in ['audio', 'video'] else self._SUPPORTED_STREAM_TYPES,
|
||||||
|
}
|
||||||
|
duration = qs.get('duration', [None])[0]
|
||||||
|
if duration:
|
||||||
|
params['duration'] = {
|
||||||
|
'long': '>=1200',
|
||||||
|
'short': '<=240',
|
||||||
|
}[duration]
|
||||||
|
language = qs.get('language', ['all'])[0]
|
||||||
|
if language != 'all':
|
||||||
|
languages = [language]
|
||||||
|
if language == 'en':
|
||||||
|
languages.append('none')
|
||||||
|
params['any_languages'] = languages
|
||||||
entries = OnDemandPagedList(
|
entries = OnDemandPagedList(
|
||||||
functools.partial(self._fetch_page, claim_id, url),
|
functools.partial(self._fetch_page, claim_id, url, params),
|
||||||
self._PAGE_SIZE)
|
self._PAGE_SIZE)
|
||||||
result_value = result.get('value') or {}
|
result_value = result.get('value') or {}
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
|
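The LBRYChannelIE hunk above translates the channel URL's query string into `claim_search` parameters. The sketch below mirrors that mapping outside the extractor, assuming Python 3's `urllib.parse` in place of the `compat_*` helpers; the function name, the value of `SUPPORTED_STREAM_TYPES`, and the example URL are assumptions for illustration:

```python
from urllib.parse import parse_qs, urlparse

# Assumed default; the extractor reads this from self._SUPPORTED_STREAM_TYPES.
SUPPORTED_STREAM_TYPES = ['video', 'audio']

ORDER_BY = {
    'new': ['release_time'],
    'top': ['effective_amount'],
    'trending': ['trending_group', 'trending_mixed'],
}
DURATION = {'long': '>=1200', 'short': '<=240'}


def build_claim_search_params(url):
    """Map ?order=, ?content=, ?duration= and ?language= to API parameters."""
    qs = parse_qs(urlparse(url).query)
    content = qs.get('content', [None])[0]
    params = {
        'fee_amount': qs.get('fee_amount', ['>=0'])[0],
        'order_by': ORDER_BY[qs.get('order', ['new'])[0]],
        'stream_types': ([content] if content in ('audio', 'video')
                         else SUPPORTED_STREAM_TYPES),
    }
    duration = qs.get('duration', [None])[0]
    if duration:
        params['duration'] = DURATION[duration]
    language = qs.get('language', ['all'])[0]
    if language != 'all':
        # English also includes claims with no language set.
        params['any_languages'] = [language] + (['none'] if language == 'en' else [])
    return params


# Hypothetical channel URL.
filtered = build_claim_search_params(
    'https://odysee.com/@chan:b?order=top&content=audio&duration=long&language=en')
default = build_claim_search_params('https://odysee.com/@chan:b')
```

As in the hunk, an unrecognised `order` or `duration` value raises `KeyError` in this sketch rather than falling back to a default.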
@ -4,7 +4,13 @@ from __future__ import unicode_literals
|
|||||||
import re
|
import re
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import js_to_json
|
from ..compat import compat_str
|
||||||
|
from ..utils import (
|
||||||
|
ExtractorError,
|
||||||
|
int_or_none,
|
||||||
|
js_to_json,
|
||||||
|
str_or_none,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class LineTVIE(InfoExtractor):
|
class LineTVIE(InfoExtractor):
|
||||||
@ -88,3 +94,137 @@ class LineTVIE(InfoExtractor):
|
|||||||
for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
|
for thumbnail in video_info.get('thumbnails', {}).get('list', [])],
|
||||||
'view_count': video_info.get('meta', {}).get('count'),
|
'view_count': video_info.get('meta', {}).get('count'),
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class LineLiveBaseIE(InfoExtractor):
|
||||||
|
_API_BASE_URL = 'https://live-api.line-apps.com/web/v4.0/channel/'
|
||||||
|
|
||||||
|
def _parse_broadcast_item(self, item):
|
||||||
|
broadcast_id = compat_str(item['id'])
|
||||||
|
title = item['title']
|
||||||
|
is_live = item.get('isBroadcastingNow')
|
||||||
|
|
||||||
|
thumbnails = []
|
||||||
|
for thumbnail_id, thumbnail_url in (item.get('thumbnailURLs') or {}).items():
|
||||||
|
if not thumbnail_url:
|
||||||
|
continue
|
||||||
|
thumbnails.append({
|
||||||
|
'id': thumbnail_id,
|
||||||
|
'url': thumbnail_url,
|
||||||
|
})
|
||||||
|
|
||||||
|
channel = item.get('channel') or {}
|
||||||
|
channel_id = str_or_none(channel.get('id'))
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': broadcast_id,
|
||||||
|
'title': self._live_title(title) if is_live else title,
|
||||||
|
'thumbnails': thumbnails,
|
||||||
|
'timestamp': int_or_none(item.get('createdAt')),
|
||||||
|
'channel': channel.get('name'),
|
||||||
|
'channel_id': channel_id,
|
||||||
|
'channel_url': 'https://live.line.me/channels/' + channel_id if channel_id else None,
|
||||||
|
'duration': int_or_none(item.get('archiveDuration')),
|
||||||
|
'view_count': int_or_none(item.get('viewerCount')),
|
||||||
|
'comment_count': int_or_none(item.get('chatCount')),
|
||||||
|
'is_live': is_live,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class LineLiveIE(LineLiveBaseIE):
|
||||||
|
_VALID_URL = r'https?://live\.line\.me/channels/(?P<channel_id>\d+)/broadcast/(?P<id>\d+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://live.line.me/channels/4867368/broadcast/16331360',
|
||||||
|
'md5': 'bc931f26bf1d4f971e3b0982b3fab4a3',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '16331360',
|
||||||
|
'title': '振りコピ講座😙😙😙',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'timestamp': 1617095132,
|
||||||
|
'upload_date': '20210330',
|
||||||
|
'channel': '白川ゆめか',
|
||||||
|
'channel_id': '4867368',
|
||||||
|
'view_count': int,
|
||||||
|
'comment_count': int,
|
||||||
|
'is_live': False,
|
||||||
|
}
|
||||||
|
}, {
|
||||||
|
# archiveStatus == 'DELETED'
|
||||||
|
'url': 'https://live.line.me/channels/4778159/broadcast/16378488',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
channel_id, broadcast_id = re.match(self._VALID_URL, url).groups()
|
||||||
|
broadcast = self._download_json(
|
||||||
|
self._API_BASE_URL + '%s/broadcast/%s' % (channel_id, broadcast_id),
|
||||||
|
broadcast_id)
|
||||||
|
item = broadcast['item']
|
||||||
|
info = self._parse_broadcast_item(item)
|
||||||
|
protocol = 'm3u8' if info['is_live'] else 'm3u8_native'
|
||||||
|
formats = []
|
||||||
|
for k, v in (broadcast.get(('live' if info['is_live'] else 'archived') + 'HLSURLs') or {}).items():
|
||||||
|
if not v:
|
||||||
|
continue
|
||||||
|
if k == 'abr':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
v, broadcast_id, 'mp4', protocol,
|
||||||
|
m3u8_id='hls', fatal=False))
|
||||||
|
continue
|
||||||
|
f = {
|
||||||
|
'ext': 'mp4',
|
||||||
|
'format_id': 'hls-' + k,
|
||||||
|
'protocol': protocol,
|
||||||
|
'url': v,
|
||||||
|
}
|
||||||
|
if not k.isdigit():
|
||||||
|
f['vcodec'] = 'none'
|
||||||
|
formats.append(f)
|
||||||
|
if not formats:
|
||||||
|
archive_status = item.get('archiveStatus')
|
||||||
|
if archive_status != 'ARCHIVED':
|
||||||
|
raise ExtractorError('this video has been ' + archive_status.lower(), expected=True)
|
||||||
|
self._sort_formats(formats)
|
||||||
|
info['formats'] = formats
|
||||||
|
return info
|
||||||
|
|
||||||
|
|
||||||
|
class LineLiveChannelIE(LineLiveBaseIE):
|
||||||
|
_VALID_URL = r'https?://live\.line\.me/channels/(?P<id>\d+)(?!/broadcast/\d+)(?:[/?&#]|$)'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://live.line.me/channels/5893542',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '5893542',
|
||||||
|
'title': 'いくらちゃん',
|
||||||
|
'description': 'md5:c3a4af801f43b2fac0b02294976580be',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 29
|
||||||
|
}
|
||||||
|
|
||||||
|
def _archived_broadcasts_entries(self, archived_broadcasts, channel_id):
|
||||||
|
while True:
|
||||||
|
for row in (archived_broadcasts.get('rows') or []):
|
||||||
|
share_url = str_or_none(row.get('shareURL'))
|
||||||
|
if not share_url:
|
||||||
|
continue
|
||||||
|
info = self._parse_broadcast_item(row)
|
||||||
|
info.update({
|
||||||
|
'_type': 'url',
|
||||||
|
'url': share_url,
|
||||||
|
'ie_key': LineLiveIE.ie_key(),
|
||||||
|
})
|
||||||
|
yield info
|
||||||
|
if not archived_broadcasts.get('hasNextPage'):
|
||||||
|
return
|
||||||
|
archived_broadcasts = self._download_json(
|
||||||
|
self._API_BASE_URL + channel_id + '/archived_broadcasts',
|
||||||
|
channel_id, query={
|
||||||
|
'lastId': info['id'],
|
||||||
|
})
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
channel_id = self._match_id(url)
|
||||||
|
channel = self._download_json(self._API_BASE_URL + channel_id, channel_id)
|
||||||
|
return self.playlist_result(
|
||||||
|
self._archived_broadcasts_entries(channel.get('archivedBroadcasts') or {}, channel_id),
|
||||||
|
channel_id, channel.get('title'), channel.get('information'))
|
||||||
|
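`_archived_broadcasts_entries` above pages through a channel's archive by passing the ID of the last row seen as `lastId`. The sketch below shows the same cursor pattern as a plain generator. `iter_archived` and `fetch_page` are hypothetical names, the pages are fake in-memory data, and the sketch additionally stops when a page has no rows, a guard the extractor code above does not have:

```python
def iter_archived(first_page, fetch_page):
    """Yield rows page by page, using the last row's id as the next cursor."""
    page = first_page
    while True:
        last_id = None
        for row in (page.get('rows') or []):
            last_id = row['id']
            yield row
        # Stop when the API reports no further page, or when there is
        # no row to take a cursor from (assumption added in this sketch).
        if not page.get('hasNextPage') or last_id is None:
            return
        page = fetch_page(last_id)


# Fake data standing in for the API responses.
first = {'rows': [{'id': 3}, {'id': 2}], 'hasNextPage': True}
later = {2: {'rows': [{'id': 1}], 'hasNextPage': False}}

ids = [row['id'] for row in iter_archived(first, lambda last_id: later[last_id])]
```

Because the function is a generator, later pages are requested only when the consumer actually iterates that far, which suits options such as `playlistend`.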
31
youtube_dl/extractor/maoritv.py
Normal file
@ -0,0 +1,31 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
|
||||||
|
|
||||||
|
class MaoriTVIE(InfoExtractor):
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?maoritelevision\.com/shows/(?:[^/]+/)+(?P<id>[^/?&#]+)'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://www.maoritelevision.com/shows/korero-mai/S01E054/korero-mai-series-1-episode-54',
|
||||||
|
'md5': '5ade8ef53851b6a132c051b1cd858899',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '4774724855001',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Kōrero Mai, Series 1 Episode 54',
|
||||||
|
'upload_date': '20160226',
|
||||||
|
'timestamp': 1456455018,
|
||||||
|
'description': 'md5:59bde32fd066d637a1a55794c56d8dcb',
|
||||||
|
'uploader_id': '1614493167001',
|
||||||
|
},
|
||||||
|
}
|
||||||
|
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1614493167001/HJlhIQhQf_default/index.html?videoId=%s'
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._match_id(url)
|
||||||
|
webpage = self._download_webpage(url, display_id)
|
||||||
|
brightcove_id = self._search_regex(
|
||||||
|
r'data-main-video-id=["\'](\d+)', webpage, 'brightcove id')
|
||||||
|
return self.url_result(
|
||||||
|
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
|
||||||
|
'BrightcoveNew', brightcove_id)
|
@ -15,33 +15,39 @@ from ..utils import (
|
|||||||
|
|
||||||
|
|
||||||
class MedalTVIE(InfoExtractor):
|
class MedalTVIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[0-9]+)'
|
_VALID_URL = r'https?://(?:www\.)?medal\.tv/clips/(?P<id>[^/?#&]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://medal.tv/clips/34934644/3Is9zyGMoBMr',
|
'url': 'https://medal.tv/clips/2mA60jWAGQCBH',
|
||||||
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
|
'md5': '7b07b064331b1cf9e8e5c52a06ae68fa',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '34934644',
|
'id': '2mA60jWAGQCBH',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Quad Cold',
|
'title': 'Quad Cold',
|
||||||
'description': 'Medal,https://medal.tv/desktop/',
|
'description': 'Medal,https://medal.tv/desktop/',
|
||||||
'uploader': 'MowgliSB',
|
'uploader': 'MowgliSB',
|
||||||
'timestamp': 1603165266,
|
'timestamp': 1603165266,
|
||||||
'upload_date': '20201020',
|
'upload_date': '20201020',
|
||||||
'uploader_id': 10619174,
|
'uploader_id': '10619174',
|
||||||
}
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://medal.tv/clips/36787208',
|
'url': 'https://medal.tv/clips/2um24TWdty0NA',
|
||||||
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
|
'md5': 'b6dc76b78195fff0b4f8bf4a33ec2148',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '36787208',
|
'id': '2um24TWdty0NA',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'u tk me i tk u bigger',
|
'title': 'u tk me i tk u bigger',
|
||||||
'description': 'Medal,https://medal.tv/desktop/',
|
'description': 'Medal,https://medal.tv/desktop/',
|
||||||
'uploader': 'Mimicc',
|
'uploader': 'Mimicc',
|
||||||
'timestamp': 1605580939,
|
'timestamp': 1605580939,
|
||||||
'upload_date': '20201117',
|
'upload_date': '20201117',
|
||||||
'uploader_id': 5156321,
|
'uploader_id': '5156321',
|
||||||
}
|
}
|
||||||
|
}, {
|
||||||
|
'url': 'https://medal.tv/clips/37rMeFpryCC-9',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://medal.tv/clips/2WRj40tpY_EU9',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
|
@ -1,15 +1,91 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from .nhl import NHLBaseIE
|
import re
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import (
|
||||||
|
determine_ext,
|
||||||
|
int_or_none,
|
||||||
|
parse_duration,
|
||||||
|
parse_iso8601,
|
||||||
|
try_get,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class MLBIE(NHLBaseIE):
|
class MLBBaseIE(InfoExtractor):
|
||||||
|
def _real_extract(self, url):
|
||||||
|
display_id = self._match_id(url)
|
||||||
|
video = self._download_video_data(display_id)
|
||||||
|
video_id = video['id']
|
||||||
|
title = video['title']
|
||||||
|
feed = self._get_feed(video)
|
||||||
|
|
||||||
|
formats = []
|
||||||
|
for playback in (feed.get('playbacks') or []):
|
||||||
|
playback_url = playback.get('url')
|
||||||
|
if not playback_url:
|
||||||
|
continue
|
||||||
|
name = playback.get('name')
|
||||||
|
ext = determine_ext(playback_url)
|
||||||
|
if ext == 'm3u8':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
playback_url, video_id, 'mp4',
|
||||||
|
'm3u8_native', m3u8_id=name, fatal=False))
|
||||||
|
else:
|
||||||
|
f = {
|
||||||
|
'format_id': name,
|
||||||
|
'url': playback_url,
|
||||||
|
}
|
||||||
|
mobj = re.search(r'_(\d+)K_(\d+)X(\d+)', name or '')
|
||||||
|
if mobj:
|
||||||
|
f.update({
|
||||||
|
'height': int(mobj.group(3)),
|
||||||
|
'tbr': int(mobj.group(1)),
|
||||||
|
'width': int(mobj.group(2)),
|
||||||
|
})
|
||||||
|
mobj = re.search(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4', playback_url)
|
||||||
|
if mobj:
|
||||||
|
f.update({
|
||||||
|
'fps': int(mobj.group(3)),
|
||||||
|
'height': int(mobj.group(2)),
|
||||||
|
'tbr': int(mobj.group(4)),
|
||||||
|
'width': int(mobj.group(1)),
|
||||||
|
})
|
||||||
|
formats.append(f)
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
thumbnails = []
|
||||||
|
for cut in (try_get(feed, lambda x: x['image']['cuts'], list) or []):
|
||||||
|
src = cut.get('src')
|
||||||
|
if not src:
|
||||||
|
continue
|
||||||
|
thumbnails.append({
|
||||||
|
'height': int_or_none(cut.get('height')),
|
||||||
|
'url': src,
|
||||||
|
'width': int_or_none(cut.get('width')),
|
||||||
|
})
|
||||||
|
|
||||||
|
language = (video.get('language') or 'EN').lower()
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
'title': title,
|
||||||
|
'formats': formats,
|
||||||
|
'description': video.get('description'),
|
||||||
|
'duration': parse_duration(feed.get('duration')),
|
||||||
|
'thumbnails': thumbnails,
|
||||||
|
'timestamp': parse_iso8601(video.get(self._TIMESTAMP_KEY)),
|
||||||
|
'subtitles': self._extract_mlb_subtitles(feed, language),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class MLBIE(MLBBaseIE):
|
||||||
_VALID_URL = r'''(?x)
|
_VALID_URL = r'''(?x)
|
||||||
https?://
|
https?://
|
||||||
(?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
|
(?:[\da-z_-]+\.)*mlb\.com/
|
||||||
(?:
|
(?:
|
||||||
(?:
|
(?:
|
||||||
(?:[^/]+/)*c-|
|
(?:[^/]+/)*video/[^/]+/c-|
|
||||||
(?:
|
(?:
|
||||||
shared/video/embed/(?:embed|m-internal-embed)\.html|
|
shared/video/embed/(?:embed|m-internal-embed)\.html|
|
||||||
(?:[^/]+/)+(?:play|index)\.jsp|
|
(?:[^/]+/)+(?:play|index)\.jsp|
|
||||||
@ -18,7 +94,6 @@ class MLBIE(NHLBaseIE):
|
|||||||
(?P<id>\d+)
|
(?P<id>\d+)
|
||||||
)
|
)
|
||||||
'''
|
'''
|
||||||
_CONTENT_DOMAIN = 'content.mlb.com'
|
|
||||||
_TESTS = [
|
_TESTS = [
|
||||||
{
|
{
|
||||||
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
|
'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
|
||||||
@ -76,18 +151,6 @@ class MLBIE(NHLBaseIE):
|
|||||||
'thumbnail': r're:^https?://.*\.jpg$',
|
'thumbnail': r're:^https?://.*\.jpg$',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
{
|
|
||||||
'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
|
|
||||||
'md5': 'e09e37b552351fddbf4d9e699c924d68',
|
|
||||||
'info_dict': {
|
|
||||||
'id': '75609783',
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': 'Must C: Pillar climbs for catch',
|
|
||||||
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
|
|
||||||
'timestamp': 1429139220,
|
|
||||||
'upload_date': '20150415',
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
|
'url': 'https://www.mlb.com/video/hargrove-homers-off-caldwell/c-1352023483?tid=67793694',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -113,8 +176,92 @@ class MLBIE(NHLBaseIE):
|
|||||||
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
|
'url': 'http://mlb.mlb.com/shared/video/embed/m-internal-embed.html?content_id=75609783&property=mlb&autoplay=true&hashmode=false&siteSection=mlb/multimedia/article_118550098/article_embed&club=mlb',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
},
|
},
|
||||||
{
|
|
||||||
'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
|
|
||||||
'only_matching': True,
|
|
||||||
}
|
|
||||||
]
|
]
|
||||||
|
_TIMESTAMP_KEY = 'date'
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_feed(video):
|
||||||
|
return video
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _extract_mlb_subtitles(feed, language):
|
||||||
|
subtitles = {}
|
||||||
|
for keyword in (feed.get('keywordsAll') or []):
|
||||||
|
keyword_type = keyword.get('type')
|
||||||
|
if keyword_type and keyword_type.startswith('closed_captions_location_'):
|
||||||
|
cc_location = keyword.get('value')
|
||||||
|
if cc_location:
|
||||||
|
subtitles.setdefault(language, []).append({
|
||||||
|
'url': cc_location,
|
||||||
|
})
|
||||||
|
return subtitles
|
||||||
|
|
||||||
|
def _download_video_data(self, display_id):
|
||||||
|
return self._download_json(
|
||||||
|
'http://content.mlb.com/mlb/item/id/v1/%s/details/web-v1.json' % display_id,
|
||||||
|
display_id)
|
||||||
|
|
||||||
|
|
||||||
|
class MLBVideoIE(MLBBaseIE):
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?mlb\.com/(?:[^/]+/)*video/(?P<id>[^/?&#]+)'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://www.mlb.com/mariners/video/ackley-s-spectacular-catch-c34698933',
|
||||||
|
'md5': '632358dacfceec06bad823b83d21df2d',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'c04a8863-f569-42e6-9f87-992393657614',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': "Ackley's spectacular catch",
|
||||||
|
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
|
||||||
|
'duration': 66,
|
||||||
|
'timestamp': 1405995000,
|
||||||
|
'upload_date': '20140722',
|
||||||
|
'thumbnail': r're:^https?://.+',
|
||||||
|
},
|
||||||
|
}
|
||||||
|
_TIMESTAMP_KEY = 'timestamp'
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def suitable(cls, url):
|
||||||
|
return False if MLBIE.suitable(url) else super(MLBVideoIE, cls).suitable(url)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _get_feed(video):
|
||||||
|
return video['feeds'][0]
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _extract_mlb_subtitles(feed, language):
|
||||||
|
subtitles = {}
|
||||||
|
for cc_location in (feed.get('closedCaptions') or []):
|
||||||
|
subtitles.setdefault(language, []).append({
|
||||||
|
'url': cc_location,
|
||||||
|
})
|
||||||
|
return subtitles
|
||||||
|
|
||||||
|
def _download_video_data(self, display_id):
|
||||||
|
# https://www.mlb.com/data-service/en/videos/[SLUG]
|
||||||
|
return self._download_json(
|
||||||
|
'https://fastball-gateway.mlb.com/graphql',
|
||||||
|
display_id, query={
|
||||||
|
'query': '''{
|
||||||
|
mediaPlayback(ids: "%s") {
|
||||||
|
description
|
||||||
|
feeds(types: CMS) {
|
||||||
|
closedCaptions
|
||||||
|
duration
|
||||||
|
image {
|
||||||
|
cuts {
|
||||||
|
width
|
||||||
|
height
|
||||||
|
src
|
||||||
|
}
|
||||||
|
}
|
||||||
|
playbacks {
|
||||||
|
name
|
||||||
|
url
|
||||||
|
}
|
||||||
|
}
|
||||||
|
id
|
||||||
|
timestamp
|
||||||
|
title
|
||||||
|
}
|
||||||
|
}''' % display_id,
|
||||||
|
})['data']['mediaPlayback'][0]
|
||||||
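Note on the non-HLS branch of `MLBBaseIE._real_extract` above: bitrate and resolution are recovered from the playback name, then overridden by values parsed from the URL when the URL carries them. A minimal standalone sketch of that parsing follows. The two regular expressions are copied from the diff; the sample name and URL are invented for illustration and are not real MLB data.

```python
import re

NAME_RE = re.compile(r'_(\d+)K_(\d+)X(\d+)')
URL_RE = re.compile(r'_(\d+)x(\d+)_(\d+)_(\d+)K\.mp4')


def parse_playback(name, url):
    """Build a format dict, filling tbr/width/height/fps when present."""
    f = {'format_id': name, 'url': url}
    mobj = NAME_RE.search(name or '')  # tolerate a missing name
    if mobj:
        f.update({
            'tbr': int(mobj.group(1)),
            'width': int(mobj.group(2)),
            'height': int(mobj.group(3)),
        })
    mobj = URL_RE.search(url)
    if mobj:
        # URL-derived values take precedence over name-derived ones.
        f.update({
            'width': int(mobj.group(1)),
            'height': int(mobj.group(2)),
            'fps': int(mobj.group(3)),
            'tbr': int(mobj.group(4)),
        })
    return f


# Hypothetical inputs.
from_name = parse_playback('FLASH_1200K_640X360', 'http://example.com/clip.mp4')
from_url = parse_playback(
    'FLASH_1200K_640X360', 'http://example.com/clip_1280x720_59_4000K.mp4')
```

With these inputs, `from_name` carries only the name-derived fields, while `from_url` ends up with the URL-derived width, height, fps and bitrate.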
|
@ -255,7 +255,9 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
|||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _extract_child_with_type(parent, t):
|
def _extract_child_with_type(parent, t):
|
||||||
return next(c for c in parent['children'] if c.get('type') == t)
|
for c in parent['children']:
|
||||||
|
if c.get('type') == t:
|
||||||
|
return c
|
||||||
|
|
||||||
def _extract_mgid(self, webpage):
|
def _extract_mgid(self, webpage):
|
||||||
try:
|
try:
|
||||||
@ -286,7 +288,8 @@ class MTVServicesInfoExtractor(InfoExtractor):
|
|||||||
data = self._parse_json(self._search_regex(
|
data = self._parse_json(self._search_regex(
|
||||||
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
|
r'__DATA__\s*=\s*({.+?});', webpage, 'data'), None)
|
||||||
main_container = self._extract_child_with_type(data, 'MainContainer')
|
main_container = self._extract_child_with_type(data, 'MainContainer')
|
||||||
video_player = self._extract_child_with_type(main_container, 'VideoPlayer')
|
ab_testing = self._extract_child_with_type(main_container, 'ABTesting')
|
||||||
|
video_player = self._extract_child_with_type(ab_testing or main_container, 'VideoPlayer')
|
||||||
mgid = video_player['props']['media']['video']['config']['uri']
|
mgid = video_player['props']['media']['video']['config']['uri']
|
||||||
|
|
||||||
return mgid
|
return mgid
|
||||||
@ -320,7 +323,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
|
|||||||
@staticmethod
|
@staticmethod
|
||||||
def _extract_url(webpage):
|
def _extract_url(webpage):
|
||||||
mobj = re.search(
|
mobj = re.search(
|
||||||
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media.mtvnservices.com/embed/.+?)\1', webpage)
|
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//media\.mtvnservices\.com/embed/.+?)\1', webpage)
|
||||||
if mobj:
|
if mobj:
|
||||||
return mobj.group('url')
|
return mobj.group('url')
|
||||||
|
|
||||||
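Note on the `_extract_child_with_type` change in `MTVServicesInfoExtractor` above: the old `next(...)` form raises `StopIteration` when no child has the requested type, whereas the new loop falls through and returns `None`. That is what makes the `ab_testing or main_container` fallback possible. The sketch below shows the difference on a hand-written dictionary; the data and the function names are illustrative assumptions only.

```python
def child_with_type_strict(parent, t):
    # Old behaviour: raises StopIteration if nothing matches.
    return next(c for c in parent['children'] if c.get('type') == t)


def child_with_type(parent, t):
    # New behaviour: implicitly returns None if nothing matches.
    for c in parent['children']:
        if c.get('type') == t:
            return c


# Hypothetical page data without an 'ABTesting' wrapper.
main_container = {'children': [{'type': 'VideoPlayer', 'props': {}}]}

ab_testing = child_with_type(main_container, 'ABTesting')
video_player = child_with_type(ab_testing or main_container, 'VideoPlayer')

try:
    child_with_type_strict(main_container, 'ABTesting')
    strict_raised = False
except StopIteration:
    strict_raised = True
```

An alternative with the same effect would be `next((c for ...), None)`; the explicit loop used in the diff avoids relying on the default argument.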
|
@ -23,11 +23,9 @@ class NineCNineMediaIE(InfoExtractor):
|
|||||||
destination_code, content_id = re.match(self._VALID_URL, url).groups()
|
destination_code, content_id = re.match(self._VALID_URL, url).groups()
|
||||||
api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id)
|
api_base_url = self._API_BASE_TEMPLATE % (destination_code, content_id)
|
||||||
content = self._download_json(api_base_url, content_id, query={
|
content = self._download_json(api_base_url, content_id, query={
|
||||||
'$include': '[Media,Season,ContentPackages]',
|
'$include': '[Media.Name,Season,ContentPackages.Duration,ContentPackages.Id]',
|
||||||
})
|
})
|
||||||
title = content['Name']
|
title = content['Name']
|
||||||
if len(content['ContentPackages']) > 1:
|
|
||||||
raise ExtractorError('multiple content packages')
|
|
||||||
content_package = content['ContentPackages'][0]
|
content_package = content['ContentPackages'][0]
|
||||||
package_id = content_package['Id']
|
package_id = content_package['Id']
|
||||||
content_package_url = api_base_url + 'contentpackages/%s/' % package_id
|
content_package_url = api_base_url + 'contentpackages/%s/' % package_id
|
||||||
|
148
youtube_dl/extractor/palcomp3.py
Normal file
@ -0,0 +1,148 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import re
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..compat import compat_str
|
||||||
|
from ..utils import (
|
||||||
|
int_or_none,
|
||||||
|
str_or_none,
|
||||||
|
try_get,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class PalcoMP3BaseIE(InfoExtractor):
|
||||||
|
_GQL_QUERY_TMPL = '''{
|
||||||
|
artist(slug: "%s") {
|
||||||
|
%s
|
||||||
|
}
|
||||||
|
}'''
|
||||||
|
_ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
|
||||||
|
%s
|
||||||
|
}'''
|
||||||
|
_MUSIC_FIELDS = '''duration
|
||||||
|
hls
|
||||||
|
mp3File
|
||||||
|
musicID
|
||||||
|
plays
|
||||||
|
title'''
|
||||||
|
|
||||||
|
def _call_api(self, artist_slug, artist_fields):
|
||||||
|
return self._download_json(
|
||||||
|
'https://www.palcomp3.com.br/graphql/', artist_slug, query={
|
||||||
|
'query': self._GQL_QUERY_TMPL % (artist_slug, artist_fields),
|
||||||
|
})['data']
|
||||||
|
|
||||||
|
def _parse_music(self, music):
|
||||||
|
music_id = compat_str(music['musicID'])
|
||||||
|
title = music['title']
|
||||||
|
|
||||||
|
formats = []
|
||||||
|
hls_url = music.get('hls')
|
||||||
|
if hls_url:
|
||||||
|
formats.append({
|
||||||
|
'url': hls_url,
|
||||||
|
'protocol': 'm3u8_native',
|
||||||
|
'ext': 'mp4',
|
||||||
|
})
|
||||||
|
mp3_file = music.get('mp3File')
|
||||||
|
if mp3_file:
|
||||||
|
formats.append({
|
||||||
|
'url': mp3_file,
|
||||||
|
})
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': music_id,
|
||||||
|
'title': title,
|
||||||
|
'formats': formats,
|
||||||
|
'duration': int_or_none(music.get('duration')),
|
||||||
|
'view_count': int_or_none(music.get('plays')),
|
||||||
|
}
|
||||||
|
|
||||||
|
def _real_initialize(self):
|
||||||
|
self._ARTIST_FIELDS_TMPL = self._ARTIST_FIELDS_TMPL % self._MUSIC_FIELDS
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
artist_slug, music_slug = re.match(self._VALID_URL, url).groups()
|
||||||
|
artist_fields = self._ARTIST_FIELDS_TMPL % music_slug
|
||||||
|
music = self._call_api(artist_slug, artist_fields)['artist']['music']
|
||||||
|
return self._parse_music(music)
|
||||||
|
|
||||||
|
|
||||||
|
class PalcoMP3IE(PalcoMP3BaseIE):
|
||||||
|
IE_NAME = 'PalcoMP3:song'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/nossas-composicoes-cuida-bem-dela/',
|
||||||
|
'md5': '99fd6405b2d8fd589670f6db1ba3b358',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '3162927',
|
||||||
|
'ext': 'mp3',
|
||||||
|
'title': 'Nossas Composições - CUIDA BEM DELA',
|
||||||
|
'duration': 210,
|
||||||
|
'view_count': int,
|
||||||
|
}
|
||||||
|
}]
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def suitable(cls, url):
|
||||||
|
return False if PalcoMP3VideoIE.suitable(url) else super(PalcoMP3IE, cls).suitable(url)
|
||||||
|
|
||||||
|
|
||||||
|
class PalcoMP3ArtistIE(PalcoMP3BaseIE):
|
||||||
|
IE_NAME = 'PalcoMP3:artist'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<id>[^/?&#]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.palcomp3.com.br/condedoforro/',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '358396',
|
||||||
|
'title': 'Conde do Forró',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 188,
|
||||||
|
}]
|
||||||
|
_ARTIST_FIELDS_TMPL = '''artistID
|
||||||
|
musics {
|
||||||
|
nodes {
|
||||||
|
%s
|
||||||
|
}
|
||||||
|
}
|
||||||
|
name'''
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def suitable(cls, url):
|
||||||
|
return False if re.match(PalcoMP3IE._VALID_URL, url) else super(PalcoMP3ArtistIE, cls).suitable(url)
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
artist_slug = self._match_id(url)
|
||||||
|
artist = self._call_api(artist_slug, self._ARTIST_FIELDS_TMPL)['artist']
|
||||||
|
|
||||||
|
def entries():
|
||||||
|
for music in (try_get(artist, lambda x: x['musics']['nodes'], list) or []):
|
||||||
|
yield self._parse_music(music)
|
||||||
|
|
||||||
|
return self.playlist_result(
|
||||||
|
entries(), str_or_none(artist.get('artistID')), artist.get('name'))
|
||||||
|
|
||||||
|
|
||||||
|
class PalcoMP3VideoIE(PalcoMP3BaseIE):
|
||||||
|
IE_NAME = 'PalcoMP3:video'
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?palcomp3\.com(?:\.br)?/(?P<artist>[^/]+)/(?P<id>[^/?&#]+)/?#clipe'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.palcomp3.com/maiaraemaraisaoficial/maiara-e-maraisa-voce-faz-falta-aqui-ao-vivo-em-vicosa-mg/#clipe',
|
||||||
|
'add_ie': ['Youtube'],
|
||||||
|
'info_dict': {
|
||||||
|
'id': '_pD1nR2qqPg',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Maiara e Maraisa - Você Faz Falta Aqui - DVD Ao Vivo Em Campo Grande',
|
||||||
|
'description': 'md5:7043342c09a224598e93546e98e49282',
|
||||||
|
'upload_date': '20161107',
|
||||||
|
'uploader_id': 'maiaramaraisaoficial',
|
||||||
|
'uploader': 'Maiara e Maraisa',
|
||||||
|
}
|
||||||
|
}]
|
||||||
|
_MUSIC_FIELDS = 'youtubeID'
|
||||||
|
|
||||||
|
def _parse_music(self, music):
|
||||||
|
youtube_id = music['youtubeID']
|
||||||
|
return self.url_result(youtube_id, 'Youtube', youtube_id)
|
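Note on the PalcoMP3 query templates above: `_ARTIST_FIELDS_TMPL` is filled in two stages. `_real_initialize` substitutes the music fields first, and the escaped `%%s` survives that step as a plain `%s`, which `_real_extract` later fills with the music slug. A small sketch of the same two-stage `%` formatting follows; the shortened field list and the slugs are placeholders chosen for illustration.

```python
GQL_QUERY_TMPL = '''{
  artist(slug: "%s") {
    %s
  }
}'''
ARTIST_FIELDS_TMPL = '''music(slug: "%%s") {
  %s
}'''
MUSIC_FIELDS = '''duration
title'''

# Stage 1: insert the field list; '%%s' collapses to a literal '%s'.
artist_fields_tmpl = ARTIST_FIELDS_TMPL % MUSIC_FIELDS

# Stage 2: insert the music slug into the remaining placeholder.
artist_fields = artist_fields_tmpl % 'some-song'  # hypothetical slug

# Final query, as it would be passed in the 'query' parameter.
query = GQL_QUERY_TMPL % ('some-artist', artist_fields)  # hypothetical slug
```

This works only while the substituted values contain no `%` characters themselves; a slug containing `%` would need escaping before formatting.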
@ -413,7 +413,8 @@ class PeerTubeIE(InfoExtractor):
|
|||||||
peertube3\.cpy\.re|
|
peertube3\.cpy\.re|
|
||||||
peertube2\.cpy\.re|
|
peertube2\.cpy\.re|
|
||||||
videos\.tcit\.fr|
|
videos\.tcit\.fr|
|
||||||
peertube\.cpy\.re
|
peertube\.cpy\.re|
|
||||||
|
canard\.tube
|
||||||
)'''
|
)'''
|
||||||
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
|
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
|
||||||
_API_BASE = 'https://%s/api/v1/videos/%s/%s'
|
_API_BASE = 'https://%s/api/v1/videos/%s/%s'
|
||||||
@ -598,11 +599,13 @@ class PeerTubeIE(InfoExtractor):
|
|||||||
else:
|
else:
|
||||||
age_limit = None
|
age_limit = None
|
||||||
|
|
||||||
|
webpage_url = 'https://%s/videos/watch/%s' % (host, video_id)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
'description': description,
|
'description': description,
|
||||||
'thumbnail': urljoin(url, video.get('thumbnailPath')),
|
'thumbnail': urljoin(webpage_url, video.get('thumbnailPath')),
|
||||||
'timestamp': unified_timestamp(video.get('publishedAt')),
|
'timestamp': unified_timestamp(video.get('publishedAt')),
|
||||||
'uploader': account_data('displayName', compat_str),
|
'uploader': account_data('displayName', compat_str),
|
||||||
'uploader_id': str_or_none(account_data('id', int)),
|
'uploader_id': str_or_none(account_data('id', int)),
|
||||||
@ -620,5 +623,6 @@ class PeerTubeIE(InfoExtractor):
|
|||||||
'tags': try_get(video, lambda x: x['tags'], list),
|
'tags': try_get(video, lambda x: x['tags'], list),
|
||||||
'categories': categories,
|
'categories': categories,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'subtitles': subtitles
|
'subtitles': subtitles,
|
||||||
|
'webpage_url': webpage_url,
|
||||||
}
|
}
|
||||||
|
@ -1,45 +1,128 @@
|
|||||||
|
# coding: utf-8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from .dreisat import DreiSatIE
|
import re
|
||||||
|
|
||||||
|
from .youtube import YoutubeIE
|
||||||
|
from .zdf import ZDFBaseIE
|
||||||
|
from ..compat import compat_str
|
||||||
|
from ..utils import (
|
||||||
|
int_or_none,
|
||||||
|
merge_dicts,
|
||||||
|
unified_timestamp,
|
||||||
|
xpath_text,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class PhoenixIE(DreiSatIE):
|
class PhoenixIE(ZDFBaseIE):
|
||||||
IE_NAME = 'phoenix.de'
|
IE_NAME = 'phoenix.de'
|
||||||
_VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/
|
_VALID_URL = r'https?://(?:www\.)?phoenix\.de/(?:[^/]+/)*[^/?#&]*-a-(?P<id>\d+)\.html'
|
||||||
(?:
|
_TESTS = [{
|
||||||
phoenix/die_sendungen/(?:[^/]+/)?
|
# Same as https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html
|
||||||
)?
|
'url': 'https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html',
|
||||||
(?P<id>[0-9]+)'''
|
'md5': '34ec321e7eb34231fd88616c65c92db0',
|
||||||
_TESTS = [
|
|
||||||
{
|
|
||||||
'url': 'http://www.phoenix.de/content/884301',
|
|
||||||
'md5': 'ed249f045256150c92e72dbb70eadec6',
|
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '884301',
|
'id': '210222_phx_nachgehakt_corona_protest',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Michael Krons mit Hans-Werner Sinn',
|
'title': 'Wohin führt der Protest in der Pandemie?',
|
||||||
'description': 'Im Dialog - Sa. 25.10.14, 00.00 - 00.35 Uhr',
|
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
|
||||||
'upload_date': '20141025',
|
'duration': 1691,
|
||||||
'uploader': 'Im Dialog',
|
'timestamp': 1613906100,
|
||||||
}
|
'upload_date': '20210221',
|
||||||
|
'uploader': 'Phoenix',
|
||||||
|
'channel': 'corona nachgehakt',
|
||||||
},
|
},
|
||||||
{
|
}, {
|
||||||
'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/869815',
|
# Youtube embed
|
||||||
|
'url': 'https://www.phoenix.de/sendungen/gespraeche/phoenix-streitgut-brennglas-corona-a-1965505.html',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'hMQtqFYjomk',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'phoenix streitgut: Brennglas Corona - Wie gerecht ist unsere Gesellschaft?',
|
||||||
|
'description': 'md5:ac7a02e2eb3cb17600bc372e4ab28fdd',
|
||||||
|
'duration': 3509,
|
||||||
|
'upload_date': '20201219',
|
||||||
|
'uploader': 'phoenix',
|
||||||
|
'uploader_id': 'phoenix',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.phoenix.de/entwicklungen-in-russland-a-2044720.html',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
},
|
}, {
|
||||||
{
|
# no media
|
||||||
'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/diskussionen/928234',
|
'url': 'https://www.phoenix.de/sendungen/dokumentationen/mit-dem-jumbo-durch-die-nacht-a-89625.html',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
},
|
}, {
|
||||||
]
|
# Same as https://www.zdf.de/politik/phoenix-sendungen/die-gesten-der-maechtigen-100.html
|
||||||
|
'url': 'https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
article_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, video_id)
|
|
||||||
|
|
||||||
internal_id = self._search_regex(
|
article = self._download_json(
|
||||||
r'<div class="phx_vod" id="phx_vod_([0-9]+)"',
|
'https://www.phoenix.de/response/id/%s' % article_id, article_id,
|
||||||
webpage, 'internal video ID')
|
'Downloading article JSON')
|
||||||
|
|
||||||
api_url = 'http://www.phoenix.de/php/mediaplayer/data/beitrags_details.php?ak=web&id=%s' % internal_id
|
video = article['absaetze'][0]
|
||||||
return self.extract_from_xml_url(video_id, api_url)
|
title = video.get('titel') or article.get('subtitel')
|
||||||
|
|
||||||
|
if video.get('typ') == 'video-youtube':
|
||||||
|
video_id = video['id']
|
||||||
|
return self.url_result(
|
||||||
|
video_id, ie=YoutubeIE.ie_key(), video_id=video_id,
|
||||||
|
video_title=title)
|
||||||
|
|
||||||
|
video_id = compat_str(video.get('basename') or video.get('content'))
|
||||||
|
|
||||||
|
details = self._download_xml(
|
||||||
|
'https://www.phoenix.de/php/mediaplayer/data/beitrags_details.php',
|
||||||
|
video_id, 'Downloading details XML', query={
|
||||||
|
'ak': 'web',
|
||||||
|
'ptmd': 'true',
|
||||||
|
'id': video_id,
|
||||||
|
'profile': 'player2',
|
||||||
|
})
|
||||||
|
|
||||||
|
title = title or xpath_text(
|
||||||
|
details, './/information/title', 'title', fatal=True)
|
||||||
|
content_id = xpath_text(
|
||||||
|
details, './/video/details/basename', 'content id', fatal=True)
|
||||||
|
|
||||||
|
info = self._extract_ptmd(
|
||||||
|
'https://tmd.phoenix.de/tmd/2/ngplayer_2_3/vod/ptmd/phoenix/%s' % content_id,
|
||||||
|
content_id, None, url)
|
||||||
|
|
||||||
|
timestamp = unified_timestamp(xpath_text(details, './/details/airtime'))
|
||||||
|
|
||||||
|
thumbnails = []
|
||||||
|
for node in details.findall('.//teaserimages/teaserimage'):
|
||||||
|
thumbnail_url = node.text
|
||||||
|
if not thumbnail_url:
|
||||||
|
continue
|
||||||
|
thumbnail = {
|
||||||
|
'url': thumbnail_url,
|
||||||
|
}
|
||||||
|
thumbnail_key = node.get('key')
|
||||||
|
if thumbnail_key:
|
||||||
|
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
|
||||||
|
if m:
|
||||||
|
thumbnail['width'] = int(m.group(1))
|
||||||
|
thumbnail['height'] = int(m.group(2))
|
||||||
|
thumbnails.append(thumbnail)
|
||||||
|
|
||||||
|
return merge_dicts(info, {
|
||||||
|
'id': content_id,
|
||||||
|
'title': title,
|
||||||
|
'description': xpath_text(details, './/information/detail'),
|
||||||
|
'duration': int_or_none(xpath_text(details, './/details/lengthSec')),
|
||||||
|
'thumbnails': thumbnails,
|
||||||
|
'timestamp': timestamp,
|
||||||
|
'uploader': xpath_text(details, './/details/channel'),
|
||||||
|
'uploader_id': xpath_text(details, './/details/originChannelId'),
|
||||||
|
'channel': xpath_text(details, './/details/originChannelTitle'),
|
||||||
|
})
|
||||||
|
@ -1,22 +1,15 @@
|
|||||||
# coding: utf-8
|
# coding: utf-8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
import re
|
|
||||||
import time
|
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import compat_str
|
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
js_to_json,
|
js_to_json,
|
||||||
try_get,
|
|
||||||
update_url_query,
|
|
||||||
urlencode_postdata,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class PicartoIE(InfoExtractor):
|
class PicartoIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)(?:/(?P<token>[a-zA-Z0-9]+))?'
|
_VALID_URL = r'https?://(?:www\.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
|
||||||
_TEST = {
|
_TEST = {
|
||||||
'url': 'https://picarto.tv/Setz',
|
'url': 'https://picarto.tv/Setz',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -34,65 +27,46 @@ class PicartoIE(InfoExtractor):
|
|||||||
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
|
return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
channel_id = self._match_id(url)
|
||||||
channel_id = mobj.group('id')
|
|
||||||
|
|
||||||
metadata = self._download_json(
|
data = self._download_json(
|
||||||
'https://api.picarto.tv/v1/channel/name/' + channel_id,
|
'https://ptvintern.picarto.tv/ptvapi', channel_id, query={
|
||||||
channel_id)
|
'query': '''{
|
||||||
|
channel(name: "%s") {
|
||||||
|
adult
|
||||||
|
id
|
||||||
|
online
|
||||||
|
stream_name
|
||||||
|
title
|
||||||
|
}
|
||||||
|
getLoadBalancerUrl(channel_name: "%s") {
|
||||||
|
url
|
||||||
|
}
|
||||||
|
}''' % (channel_id, channel_id),
|
||||||
|
})['data']
|
||||||
|
metadata = data['channel']
|
||||||
|
|
||||||
if metadata.get('online') is False:
|
if metadata.get('online') == 0:
|
||||||
raise ExtractorError('Stream is offline', expected=True)
|
raise ExtractorError('Stream is offline', expected=True)
|
||||||
|
title = metadata['title']
|
||||||
|
|
||||||
cdn_data = self._download_json(
|
cdn_data = self._download_json(
|
||||||
'https://picarto.tv/process/channel', channel_id,
|
data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
|
||||||
data=urlencode_postdata({'loadbalancinginfo': channel_id}),
|
channel_id, 'Downloading load balancing info')
|
||||||
note='Downloading load balancing info')
|
|
||||||
|
|
||||||
token = mobj.group('token') or 'public'
|
|
||||||
params = {
|
|
||||||
'con': int(time.time() * 1000),
|
|
||||||
'token': token,
|
|
||||||
}
|
|
||||||
|
|
||||||
prefered_edge = cdn_data.get('preferedEdge')
|
|
||||||
formats = []
|
formats = []
|
||||||
|
for source in (cdn_data.get('source') or []):
|
||||||
for edge in cdn_data['edges']:
|
source_url = source.get('url')
|
||||||
edge_ep = edge.get('ep')
|
if not source_url:
|
||||||
if not edge_ep or not isinstance(edge_ep, compat_str):
|
|
||||||
continue
|
continue
|
||||||
edge_id = edge.get('id')
|
source_type = source.get('type')
|
||||||
for tech in cdn_data['techs']:
|
if source_type == 'html5/application/vnd.apple.mpegurl':
|
||||||
tech_label = tech.get('label')
|
|
||||||
tech_type = tech.get('type')
|
|
||||||
preference = 0
|
|
||||||
if edge_id == prefered_edge:
|
|
||||||
preference += 1
|
|
||||||
format_id = []
|
|
||||||
if edge_id:
|
|
||||||
format_id.append(edge_id)
|
|
||||||
if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
|
|
||||||
format_id.append('hls')
|
|
||||||
formats.extend(self._extract_m3u8_formats(
|
formats.extend(self._extract_m3u8_formats(
|
||||||
update_url_query(
|
source_url, channel_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||||
'https://%s/hls/%s/index.m3u8'
|
elif source_type == 'html5/video/mp4':
|
||||||
% (edge_ep, channel_id), params),
|
|
||||||
channel_id, 'mp4', preference=preference,
|
|
||||||
m3u8_id='-'.join(format_id), fatal=False))
|
|
||||||
continue
|
|
||||||
elif tech_type == 'video/mp4' or tech_label == 'MP4':
|
|
||||||
format_id.append('mp4')
|
|
||||||
formats.append({
|
formats.append({
|
||||||
'url': update_url_query(
|
'url': source_url,
|
||||||
'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
|
|
||||||
params),
|
|
||||||
'format_id': '-'.join(format_id),
|
|
||||||
'preference': preference,
|
|
||||||
})
|
})
|
||||||
else:
|
|
||||||
# rtmp format does not seem to work
|
|
||||||
continue
|
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
mature = metadata.get('adult')
|
mature = metadata.get('adult')
|
||||||
@ -103,10 +77,10 @@ class PicartoIE(InfoExtractor):
|
|||||||
|
|
||||||
return {
|
return {
|
||||||
'id': channel_id,
|
'id': channel_id,
|
||||||
'title': self._live_title(metadata.get('title') or channel_id),
|
'title': self._live_title(title.strip()),
|
||||||
'is_live': True,
|
'is_live': True,
|
||||||
'thumbnail': try_get(metadata, lambda x: x['thumbnails']['web']),
|
|
||||||
'channel': channel_id,
|
'channel': channel_id,
|
||||||
|
'channel_id': metadata.get('id'),
|
||||||
'channel_url': 'https://picarto.tv/%s' % channel_id,
|
'channel_url': 'https://picarto.tv/%s' % channel_id,
|
||||||
'age_limit': age_limit,
|
'age_limit': age_limit,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
|
@ -31,6 +31,7 @@ class PinterestBaseIE(InfoExtractor):
|
|||||||
|
|
||||||
title = (data.get('title') or data.get('grid_title') or video_id).strip()
|
title = (data.get('title') or data.get('grid_title') or video_id).strip()
|
||||||
|
|
||||||
|
urls = []
|
||||||
formats = []
|
formats = []
|
||||||
duration = None
|
duration = None
|
||||||
if extract_formats:
|
if extract_formats:
|
||||||
@ -38,8 +39,9 @@ class PinterestBaseIE(InfoExtractor):
|
|||||||
if not isinstance(format_dict, dict):
|
if not isinstance(format_dict, dict):
|
||||||
continue
|
continue
|
||||||
format_url = url_or_none(format_dict.get('url'))
|
format_url = url_or_none(format_dict.get('url'))
|
||||||
if not format_url:
|
if not format_url or format_url in urls:
|
||||||
continue
|
continue
|
||||||
|
urls.append(format_url)
|
||||||
duration = float_or_none(format_dict.get('duration'), scale=1000)
|
duration = float_or_none(format_dict.get('duration'), scale=1000)
|
||||||
ext = determine_ext(format_url)
|
ext = determine_ext(format_url)
|
||||||
if 'hls' in format_id.lower() or ext == 'm3u8':
|
if 'hls' in format_id.lower() or ext == 'm3u8':
|
||||||
|
@ -393,7 +393,7 @@ query viewClip {
|
|||||||
# To somewhat reduce the probability of these consequences
|
# To somewhat reduce the probability of these consequences
|
||||||
# we will sleep random amount of time before each call to ViewClip.
|
# we will sleep random amount of time before each call to ViewClip.
|
||||||
self._sleep(
|
self._sleep(
|
||||||
random.randint(2, 5), display_id,
|
random.randint(5, 10), display_id,
|
||||||
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
|
'%(video_id)s: Waiting for %(timeout)s seconds to avoid throttling')
|
||||||
|
|
||||||
if not viewclip:
|
if not viewclip:
|
||||||
|
@ -167,6 +167,7 @@ class PornHubIE(PornHubBaseIE):
|
|||||||
'params': {
|
'params': {
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'skip': 'Video has been flagged for verification in accordance with our trust and safety policy',
|
||||||
}, {
|
}, {
|
||||||
# subtitles
|
# subtitles
|
||||||
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7',
|
'url': 'https://www.pornhub.com/view_video.php?viewkey=ph5af5fef7c2aa7',
|
||||||
@ -265,7 +266,8 @@ class PornHubIE(PornHubBaseIE):
|
|||||||
webpage = dl_webpage('pc')
|
webpage = dl_webpage('pc')
|
||||||
|
|
||||||
error_msg = self._html_search_regex(
|
error_msg = self._html_search_regex(
|
||||||
r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
|
(r'(?s)<div[^>]+class=(["\'])(?:(?!\1).)*\b(?:removed|userMessageSection)\b(?:(?!\1).)*\1[^>]*>(?P<error>.+?)</div>',
|
||||||
|
r'(?s)<section[^>]+class=["\']noVideo["\'][^>]*>(?P<error>.+?)</section>'),
|
||||||
webpage, 'error message', default=None, group='error')
|
webpage, 'error message', default=None, group='error')
|
||||||
if error_msg:
|
if error_msg:
|
||||||
error_msg = re.sub(r'\s+', ' ', error_msg)
|
error_msg = re.sub(r'\s+', ' ', error_msg)
|
||||||
@ -394,34 +396,50 @@ class PornHubIE(PornHubBaseIE):
|
|||||||
|
|
||||||
upload_date = None
|
upload_date = None
|
||||||
formats = []
|
formats = []
|
||||||
|
|
||||||
|
def add_format(format_url, height=None):
|
||||||
|
ext = determine_ext(format_url)
|
||||||
|
if ext == 'mpd':
|
||||||
|
formats.extend(self._extract_mpd_formats(
|
||||||
|
format_url, video_id, mpd_id='dash', fatal=False))
|
||||||
|
return
|
||||||
|
if ext == 'm3u8':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
format_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||||
|
m3u8_id='hls', fatal=False))
|
||||||
|
return
|
||||||
|
tbr = None
|
||||||
|
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', format_url)
|
||||||
|
if mobj:
|
||||||
|
if not height:
|
||||||
|
height = int(mobj.group('height'))
|
||||||
|
tbr = int(mobj.group('tbr'))
|
||||||
|
formats.append({
|
||||||
|
'url': format_url,
|
||||||
|
'format_id': '%dp' % height if height else None,
|
||||||
|
'height': height,
|
||||||
|
'tbr': tbr,
|
||||||
|
})
|
||||||
|
|
||||||
for video_url, height in video_urls:
|
for video_url, height in video_urls:
|
||||||
if not upload_date:
|
if not upload_date:
|
||||||
upload_date = self._search_regex(
|
upload_date = self._search_regex(
|
||||||
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
|
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
|
||||||
if upload_date:
|
if upload_date:
|
||||||
upload_date = upload_date.replace('/', '')
|
upload_date = upload_date.replace('/', '')
|
||||||
ext = determine_ext(video_url)
|
if '/video/get_media' in video_url:
|
||||||
if ext == 'mpd':
|
medias = self._download_json(video_url, video_id, fatal=False)
|
||||||
formats.extend(self._extract_mpd_formats(
|
if isinstance(medias, list):
|
||||||
video_url, video_id, mpd_id='dash', fatal=False))
|
for media in medias:
|
||||||
|
if not isinstance(media, dict):
|
||||||
continue
|
continue
|
||||||
elif ext == 'm3u8':
|
video_url = url_or_none(media.get('videoUrl'))
|
||||||
formats.extend(self._extract_m3u8_formats(
|
if not video_url:
|
||||||
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
|
||||||
m3u8_id='hls', fatal=False))
|
|
||||||
continue
|
continue
|
||||||
tbr = None
|
height = int_or_none(media.get('quality'))
|
||||||
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
|
add_format(video_url, height)
|
||||||
if mobj:
|
continue
|
||||||
if not height:
|
add_format(video_url)
|
||||||
height = int(mobj.group('height'))
|
|
||||||
tbr = int(mobj.group('tbr'))
|
|
||||||
formats.append({
|
|
||||||
'url': video_url,
|
|
||||||
'format_id': '%dp' % height if height else None,
|
|
||||||
'height': height,
|
|
||||||
'tbr': tbr,
|
|
||||||
})
|
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
video_uploader = self._html_search_regex(
|
video_uploader = self._html_search_regex(
|
||||||
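Review note: the `add_format` helper introduced in this hunk derives `height` and `tbr` from the URL only when a `<height>[p]_<tbr>k` token is present, and it never overrides a height that the caller already supplied. The following is a minimal standalone sketch of that fallback logic, not the extractor code itself. The function name `parse_height_tbr` and the sample URLs are hypothetical; only the regular expression is taken from the diff.

```python
import re

# Pattern copied from the diff; everything else here is illustrative.
_HEIGHT_TBR_RE = re.compile(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]')


def parse_height_tbr(format_url, height=None):
    """Return (height, tbr), keeping an explicitly supplied height."""
    tbr = None
    mobj = _HEIGHT_TBR_RE.search(format_url)
    if mobj:
        # Only fall back to the URL-derived height when none was given.
        if not height:
            height = int(mobj.group('height'))
        tbr = int(mobj.group('tbr'))
    return height, tbr


# Hypothetical URLs, used only to exercise the pattern.
from_url = parse_height_tbr('https://example.invalid/videos/720P_4000K_123.mp4')
explicit = parse_height_tbr('https://example.invalid/videos/720P_4000K_123.mp4', height=480)
no_match = parse_height_tbr('https://example.invalid/video.mp4')
```

In this sketch a height passed by the caller (for instance the `quality` value from the `get_media` JSON) wins over the one embedded in the URL, while `tbr` always comes from the URL when the token is present.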
|
@ -15,17 +15,17 @@ class RDSIE(InfoExtractor):
|
|||||||
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<id>[^/]+)-\d+\.\d+'
|
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<id>[^/]+)-\d+\.\d+'
|
||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.rds.ca/videos/football/nfl/fowler-jr-prend-la-direction-de-jacksonville-3.1132799',
|
# has two 9c9media ContentPackages, the web player selects the first ContentPackage
|
||||||
|
'url': 'https://www.rds.ca/videos/Hockey/NationalHockeyLeague/teams/9/forum-du-5-a-7-jesperi-kotkaniemi-de-retour-de-finlande-3.1377606',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '604333',
|
'id': '2083309',
|
||||||
'display_id': 'fowler-jr-prend-la-direction-de-jacksonville',
|
'display_id': 'forum-du-5-a-7-jesperi-kotkaniemi-de-retour-de-finlande',
|
||||||
'ext': 'flv',
|
'ext': 'flv',
|
||||||
'title': 'Fowler Jr. prend la direction de Jacksonville',
|
'title': 'Forum du 5 à 7 : Kotkaniemi de retour de Finlande',
|
||||||
'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ',
|
'description': 'md5:83fa38ecc4a79b19e433433254077f25',
|
||||||
'timestamp': 1430397346,
|
'timestamp': 1606129030,
|
||||||
'upload_date': '20150430',
|
'upload_date': '20201123',
|
||||||
'duration': 154.354,
|
'duration': 773.039,
|
||||||
'age_limit': 0,
|
|
||||||
}
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934',
|
'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934',
|
||||||
|
@ -6,11 +6,12 @@ import re
|
|||||||
from .srgssr import SRGSSRIE
|
from .srgssr import SRGSSRIE
|
||||||
from ..compat import compat_str
|
from ..compat import compat_str
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
|
determine_ext,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
parse_duration,
|
parse_duration,
|
||||||
parse_iso8601,
|
parse_iso8601,
|
||||||
unescapeHTML,
|
unescapeHTML,
|
||||||
determine_ext,
|
urljoin,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@ -21,7 +22,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
_TESTS = [
|
_TESTS = [
|
||||||
{
|
{
|
||||||
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
|
'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
|
||||||
'md5': 'ff7f8450a90cf58dacb64e29707b4a8e',
|
'md5': '753b877968ad8afaeddccc374d4256a5',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '3449373',
|
'id': '3449373',
|
||||||
'display_id': 'les-enfants-terribles',
|
'display_id': 'les-enfants-terribles',
|
||||||
@ -35,6 +36,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
'thumbnail': r're:^https?://.*\.image',
|
'thumbnail': r're:^https?://.*\.image',
|
||||||
'view_count': int,
|
'view_count': int,
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
|
'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
|
||||||
@ -63,11 +65,12 @@ class RTSIE(SRGSSRIE):
|
|||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
|
||||||
'skip': 'Blocked outside Switzerland',
|
'skip': 'Blocked outside Switzerland',
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
|
'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
|
||||||
'md5': '1bae984fe7b1f78e94abc74e802ed99f',
|
'md5': '9bb06503773c07ce83d3cbd793cebb91',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '5745356',
|
'id': '5745356',
|
||||||
'display_id': 'londres-cachee-par-un-epais-smog',
|
'display_id': 'londres-cachee-par-un-epais-smog',
|
||||||
@ -81,6 +84,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
'thumbnail': r're:^https?://.*\.image',
|
'thumbnail': r're:^https?://.*\.image',
|
||||||
'view_count': int,
|
'view_count': int,
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Unable to download f4m manifest', 'Failed to download m3u8 information'],
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
|
'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
|
||||||
@ -160,7 +164,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
media_type = 'video' if 'video' in all_info else 'audio'
|
media_type = 'video' if 'video' in all_info else 'audio'
|
||||||
|
|
||||||
# check for errors
|
# check for errors
|
||||||
self.get_media_data('rts', media_type, media_id)
|
self._get_media_data('rts', media_type, media_id)
|
||||||
|
|
||||||
info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
|
info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
|
||||||
|
|
||||||
@ -194,6 +198,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
'tbr': extract_bitrate(format_url),
|
'tbr': extract_bitrate(format_url),
|
||||||
})
|
})
|
||||||
|
|
||||||
|
download_base = 'http://rtsww%s-d.rts.ch/' % ('-a' if media_type == 'audio' else '')
|
||||||
for media in info.get('media', []):
|
for media in info.get('media', []):
|
||||||
media_url = media.get('url')
|
media_url = media.get('url')
|
||||||
if not media_url or re.match(r'https?://', media_url):
|
if not media_url or re.match(r'https?://', media_url):
|
||||||
@ -205,7 +210,7 @@ class RTSIE(SRGSSRIE):
|
|||||||
format_id += '-%dk' % rate
|
format_id += '-%dk' % rate
|
||||||
formats.append({
|
formats.append({
|
||||||
'format_id': format_id,
|
'format_id': format_id,
|
||||||
'url': 'http://download-video.rts.ch/' + media_url,
|
'url': urljoin(download_base, media_url),
|
||||||
'tbr': rate or extract_bitrate(media_url),
|
'tbr': rate or extract_bitrate(media_url),
|
||||||
})
|
})
|
||||||
|
|
||||||
|
@ -2,8 +2,9 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
import base64
|
import base64
|
||||||
|
import io
|
||||||
import re
|
import re
|
||||||
import time
|
import sys
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import (
|
from ..compat import (
|
||||||
@ -14,56 +15,13 @@ from ..utils import (
|
|||||||
determine_ext,
|
determine_ext,
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
|
qualities,
|
||||||
remove_end,
|
remove_end,
|
||||||
remove_start,
|
remove_start,
|
||||||
sanitized_Request,
|
|
||||||
std_headers,
|
std_headers,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
_bytes_to_chr = (lambda x: x) if sys.version_info[0] == 2 else (lambda x: map(chr, x))
|
||||||
def _decrypt_url(png):
|
|
||||||
encrypted_data = compat_b64decode(png)
|
|
||||||
text_index = encrypted_data.find(b'tEXt')
|
|
||||||
text_chunk = encrypted_data[text_index - 4:]
|
|
||||||
length = compat_struct_unpack('!I', text_chunk[:4])[0]
|
|
||||||
# Use bytearray to get integers when iterating in both python 2.x and 3.x
|
|
||||||
data = bytearray(text_chunk[8:8 + length])
|
|
||||||
data = [chr(b) for b in data if b != 0]
|
|
||||||
hash_index = data.index('#')
|
|
||||||
alphabet_data = data[:hash_index]
|
|
||||||
url_data = data[hash_index + 1:]
|
|
||||||
if url_data[0] == 'H' and url_data[3] == '%':
|
|
||||||
# remove useless HQ%% at the start
|
|
||||||
url_data = url_data[4:]
|
|
||||||
|
|
||||||
alphabet = []
|
|
||||||
e = 0
|
|
||||||
d = 0
|
|
||||||
for l in alphabet_data:
|
|
||||||
if d == 0:
|
|
||||||
alphabet.append(l)
|
|
||||||
d = e = (e + 1) % 4
|
|
||||||
else:
|
|
||||||
d -= 1
|
|
||||||
url = ''
|
|
||||||
f = 0
|
|
||||||
e = 3
|
|
||||||
b = 1
|
|
||||||
for letter in url_data:
|
|
||||||
if f == 0:
|
|
||||||
l = int(letter) * 10
|
|
||||||
f = 1
|
|
||||||
else:
|
|
||||||
if e == 0:
|
|
||||||
l += int(letter)
|
|
||||||
url += alphabet[l]
|
|
||||||
e = (b + 3) % 4
|
|
||||||
f = 0
|
|
||||||
b += 1
|
|
||||||
else:
|
|
||||||
e -= 1
|
|
||||||
|
|
||||||
return url
|
|
||||||
|
|
||||||
|
|
||||||
class RTVEALaCartaIE(InfoExtractor):
|
class RTVEALaCartaIE(InfoExtractor):
|
||||||
@ -79,28 +37,31 @@ class RTVEALaCartaIE(InfoExtractor):
|
|||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
|
'title': 'Balonmano - Swiss Cup masculina. Final: España-Suecia',
|
||||||
'duration': 5024.566,
|
'duration': 5024.566,
|
||||||
|
'series': 'Balonmano',
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
|
||||||
}, {
|
}, {
|
||||||
'note': 'Live stream',
|
'note': 'Live stream',
|
||||||
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
|
'url': 'http://www.rtve.es/alacarta/videos/television/24h-live/1694255/',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '1694255',
|
'id': '1694255',
|
||||||
'ext': 'flv',
|
'ext': 'mp4',
|
||||||
'title': 'TODO',
|
'title': 're:^24H LIVE [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
|
||||||
|
'is_live': True,
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'skip_download': 'live stream',
|
||||||
},
|
},
|
||||||
'skip': 'The f4m manifest can\'t be used yet',
|
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
|
'url': 'http://www.rtve.es/alacarta/videos/servir-y-proteger/servir-proteger-capitulo-104/4236788/',
|
||||||
'md5': 'e55e162379ad587e9640eda4f7353c0f',
|
'md5': 'd850f3c8731ea53952ebab489cf81cbf',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '4236788',
|
'id': '4236788',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Servir y proteger - Capítulo 104',
|
'title': 'Servir y proteger - Capítulo 104',
|
||||||
'duration': 3222.0,
|
'duration': 3222.0,
|
||||||
},
|
},
|
||||||
'params': {
|
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
|
||||||
'skip_download': True, # requires ffmpeg
|
|
||||||
},
|
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
|
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -111,58 +72,102 @@ class RTVEALaCartaIE(InfoExtractor):
|
|||||||
|
|
||||||
def _real_initialize(self):
|
def _real_initialize(self):
|
||||||
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
|
user_agent_b64 = base64.b64encode(std_headers['User-Agent'].encode('utf-8')).decode('utf-8')
|
||||||
manager_info = self._download_json(
|
self._manager = self._download_json(
|
||||||
'http://www.rtve.es/odin/loki/' + user_agent_b64,
|
'http://www.rtve.es/odin/loki/' + user_agent_b64,
|
||||||
None, 'Fetching manager info')
|
None, 'Fetching manager info')['manager']
|
||||||
self._manager = manager_info['manager']
|
|
||||||
|
@staticmethod
|
||||||
|
def _decrypt_url(png):
|
||||||
|
encrypted_data = io.BytesIO(compat_b64decode(png)[8:])
|
||||||
|
while True:
|
||||||
|
length = compat_struct_unpack('!I', encrypted_data.read(4))[0]
|
||||||
|
chunk_type = encrypted_data.read(4)
|
||||||
|
if chunk_type == b'IEND':
|
||||||
|
break
|
||||||
|
data = encrypted_data.read(length)
|
||||||
|
if chunk_type == b'tEXt':
|
||||||
|
alphabet_data, text = data.split(b'\0')
|
||||||
|
quality, url_data = text.split(b'%%')
|
||||||
|
alphabet = []
|
||||||
|
e = 0
|
||||||
|
d = 0
|
||||||
|
for l in _bytes_to_chr(alphabet_data):
|
||||||
|
if d == 0:
|
||||||
|
alphabet.append(l)
|
||||||
|
d = e = (e + 1) % 4
|
||||||
|
else:
|
||||||
|
d -= 1
|
||||||
|
url = ''
|
||||||
|
f = 0
|
||||||
|
e = 3
|
||||||
|
b = 1
|
||||||
|
for letter in _bytes_to_chr(url_data):
|
||||||
|
if f == 0:
|
||||||
|
l = int(letter) * 10
|
||||||
|
f = 1
|
||||||
|
else:
|
||||||
|
if e == 0:
|
||||||
|
l += int(letter)
|
||||||
|
url += alphabet[l]
|
||||||
|
e = (b + 3) % 4
|
||||||
|
f = 0
|
||||||
|
b += 1
|
||||||
|
else:
|
||||||
|
e -= 1
|
||||||
|
|
||||||
|
yield quality.decode(), url
|
||||||
|
encrypted_data.read(4) # CRC
|
||||||
|
|
||||||
|
def _extract_png_formats(self, video_id):
|
||||||
|
png = self._download_webpage(
|
||||||
|
'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id),
|
||||||
|
video_id, 'Downloading url information', query={'q': 'v2'})
|
||||||
|
q = qualities(['Media', 'Alta', 'HQ', 'HD_READY', 'HD_FULL'])
|
||||||
|
formats = []
|
||||||
|
for quality, video_url in self._decrypt_url(png):
|
||||||
|
ext = determine_ext(video_url)
|
||||||
|
if ext == 'm3u8':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
video_url, video_id, 'mp4', 'm3u8_native',
|
||||||
|
m3u8_id='hls', fatal=False))
|
||||||
|
elif ext == 'mpd':
|
||||||
|
formats.extend(self._extract_mpd_formats(
|
||||||
|
video_url, video_id, 'dash', fatal=False))
|
||||||
|
else:
|
||||||
|
formats.append({
|
||||||
|
'format_id': quality,
|
||||||
|
'quality': q(quality),
|
||||||
|
'url': video_url,
|
||||||
|
})
|
||||||
|
self._sort_formats(formats)
|
||||||
|
return formats
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
video_id = self._match_id(url)
|
||||||
video_id = mobj.group('id')
|
|
||||||
info = self._download_json(
|
info = self._download_json(
|
||||||
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
|
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
|
||||||
video_id)['page']['items'][0]
|
video_id)['page']['items'][0]
|
||||||
if info['state'] == 'DESPU':
|
if info['state'] == 'DESPU':
|
||||||
raise ExtractorError('The video is no longer available', expected=True)
|
raise ExtractorError('The video is no longer available', expected=True)
|
||||||
title = info['title']
|
title = info['title'].strip()
|
||||||
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/%s/videos/%s.png' % (self._manager, video_id)
|
formats = self._extract_png_formats(video_id)
|
||||||
png_request = sanitized_Request(png_url)
|
|
||||||
png_request.add_header('Referer', url)
|
|
||||||
png = self._download_webpage(png_request, video_id, 'Downloading url information')
|
|
||||||
video_url = _decrypt_url(png)
|
|
||||||
ext = determine_ext(video_url)
|
|
||||||
|
|
||||||
formats = []
|
|
||||||
if not video_url.endswith('.f4m') and ext != 'm3u8':
|
|
||||||
if '?' not in video_url:
|
|
||||||
video_url = video_url.replace('resources/', 'auth/resources/')
|
|
||||||
video_url = video_url.replace('.net.rtve', '.multimedia.cdn.rtve')
|
|
||||||
|
|
||||||
if ext == 'm3u8':
|
|
||||||
formats.extend(self._extract_m3u8_formats(
|
|
||||||
video_url, video_id, ext='mp4', entry_protocol='m3u8_native',
|
|
||||||
m3u8_id='hls', fatal=False))
|
|
||||||
elif ext == 'f4m':
|
|
||||||
formats.extend(self._extract_f4m_formats(
|
|
||||||
video_url, video_id, f4m_id='hds', fatal=False))
|
|
||||||
else:
|
|
||||||
formats.append({
|
|
||||||
'url': video_url,
|
|
||||||
})
|
|
||||||
self._sort_formats(formats)
|
|
||||||
|
|
||||||
subtitles = None
|
subtitles = None
|
||||||
if info.get('sbtFile') is not None:
|
sbt_file = info.get('sbtFile')
|
||||||
subtitles = self.extract_subtitles(video_id, info['sbtFile'])
|
if sbt_file:
|
||||||
|
subtitles = self.extract_subtitles(video_id, sbt_file)
|
||||||
|
|
||||||
|
is_live = info.get('live') is True
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': title,
|
'title': self._live_title(title) if is_live else title,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'thumbnail': info.get('image'),
|
'thumbnail': info.get('image'),
|
||||||
'page_url': url,
|
|
||||||
'subtitles': subtitles,
|
'subtitles': subtitles,
|
||||||
'duration': float_or_none(info.get('duration'), scale=1000),
|
'duration': float_or_none(info.get('duration'), 1000),
|
||||||
|
'is_live': is_live,
|
||||||
|
'series': info.get('programTitle'),
|
||||||
}
|
}
|
||||||
|
|
||||||
def _get_subtitles(self, video_id, sub_file):
|
def _get_subtitles(self, video_id, sub_file):
|
||||||
@ -174,48 +179,26 @@ class RTVEALaCartaIE(InfoExtractor):
|
|||||||
for s in subs)
|
for s in subs)
|
||||||
|
|
||||||
|
|
||||||
class RTVEInfantilIE(InfoExtractor):
|
class RTVEInfantilIE(RTVEALaCartaIE):
|
||||||
IE_NAME = 'rtve.es:infantil'
|
IE_NAME = 'rtve.es:infantil'
|
||||||
IE_DESC = 'RTVE infantil'
|
IE_DESC = 'RTVE infantil'
|
||||||
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/(?P<show>[^/]*)/video/(?P<short_title>[^/]*)/(?P<id>[0-9]+)/'
|
_VALID_URL = r'https?://(?:www\.)?rtve\.es/infantil/serie/[^/]+/video/[^/]+/(?P<id>[0-9]+)/'
|
||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
|
'url': 'http://www.rtve.es/infantil/serie/cleo/video/maneras-vivir/3040283/',
|
||||||
'md5': '915319587b33720b8e0357caaa6617e6',
|
'md5': '5747454717aedf9f9fdf212d1bcfc48d',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '3040283',
|
'id': '3040283',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Maneras de vivir',
|
'title': 'Maneras de vivir',
|
||||||
'thumbnail': 'http://www.rtve.es/resources/jpg/6/5/1426182947956.JPG',
|
'thumbnail': r're:https?://.+/1426182947956\.JPG',
|
||||||
'duration': 357.958,
|
'duration': 357.958,
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Failed to download MPD manifest', 'Failed to download m3u8 information'],
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
|
||||||
video_id = self._match_id(url)
|
|
||||||
info = self._download_json(
|
|
||||||
'http://www.rtve.es/api/videos/%s/config/alacarta_videos.json' % video_id,
|
|
||||||
video_id)['page']['items'][0]
|
|
||||||
|
|
||||||
webpage = self._download_webpage(url, video_id)
|
class RTVELiveIE(RTVEALaCartaIE):
|
||||||
vidplayer_id = self._search_regex(
|
|
||||||
r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
|
|
||||||
|
|
||||||
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
|
|
||||||
png = self._download_webpage(png_url, video_id, 'Downloading url information')
|
|
||||||
video_url = _decrypt_url(png)
|
|
||||||
|
|
||||||
return {
|
|
||||||
'id': video_id,
|
|
||||||
'ext': 'mp4',
|
|
||||||
'title': info['title'],
|
|
||||||
'url': video_url,
|
|
||||||
'thumbnail': info.get('image'),
|
|
||||||
'duration': float_or_none(info.get('duration'), scale=1000),
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class RTVELiveIE(InfoExtractor):
|
|
||||||
IE_NAME = 'rtve.es:live'
|
IE_NAME = 'rtve.es:live'
|
||||||
IE_DESC = 'RTVE.es live streams'
|
IE_DESC = 'RTVE.es live streams'
|
||||||
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
|
_VALID_URL = r'https?://(?:www\.)?rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'
|
||||||
@ -225,7 +208,7 @@ class RTVELiveIE(InfoExtractor):
|
|||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'la-1',
|
'id': 'la-1',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
|
'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
'skip_download': 'live stream',
|
'skip_download': 'live stream',
|
||||||
@ -234,29 +217,22 @@ class RTVELiveIE(InfoExtractor):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
mobj = re.match(self._VALID_URL, url)
|
||||||
start_time = time.gmtime()
|
|
||||||
video_id = mobj.group('id')
|
video_id = mobj.group('id')
|
||||||
|
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(url, video_id)
|
||||||
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
|
title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
|
||||||
title = remove_start(title, 'Estoy viendo ')
|
title = remove_start(title, 'Estoy viendo ')
|
||||||
title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
|
|
||||||
|
|
||||||
vidplayer_id = self._search_regex(
|
vidplayer_id = self._search_regex(
|
||||||
(r'playerId=player([0-9]+)',
|
(r'playerId=player([0-9]+)',
|
||||||
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
|
r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
|
||||||
r'data-id=["\'](\d+)'),
|
r'data-id=["\'](\d+)'),
|
||||||
webpage, 'internal video ID')
|
webpage, 'internal video ID')
|
||||||
png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
|
|
||||||
png = self._download_webpage(png_url, video_id, 'Downloading url information')
|
|
||||||
m3u8_url = _decrypt_url(png)
|
|
||||||
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
|
|
||||||
self._sort_formats(formats)
|
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': title,
|
'title': self._live_title(title),
|
||||||
'formats': formats,
|
'formats': self._extract_png_formats(vidplayer_id),
|
||||||
'is_live': True,
|
'is_live': True,
|
||||||
}
|
}
|
||||||
|
|
||||||
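Review note: the rewritten `_decrypt_url` no longer searches for the `tEXt` marker with `find()`. It skips the 8-byte PNG signature and then reads chunk by chunk (4-byte big-endian length, 4-byte type, payload, 4-byte CRC) until `IEND`. The sketch below shows only that chunk walk, assuming a well-formed PNG layout. `iter_png_chunks` and `make_chunk` are hypothetical names, the sample data is synthetic, and the early exit on truncated input is an addition that the diff does not have.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def iter_png_chunks(png_bytes):
    """Yield (chunk_type, data) for every chunk that precedes IEND."""
    stream = io.BytesIO(png_bytes[len(PNG_SIGNATURE):])
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # truncated input: stop quietly (not in the diff)
        length, chunk_type = struct.unpack('!I4s', header)
        if chunk_type == b'IEND':
            break
        data = stream.read(length)
        stream.read(4)  # skip the CRC, as the diff does
        yield chunk_type, data


def make_chunk(chunk_type, data):
    """Build one PNG chunk; used here only to create test input."""
    crc = zlib.crc32(chunk_type + data) & 0xffffffff
    return struct.pack('!I', len(data)) + chunk_type + data + struct.pack('!I', crc)


# Synthetic PNG-like byte string with a single tEXt chunk.
sample = PNG_SIGNATURE + make_chunk(b'tEXt', b'key\0value') + make_chunk(b'IEND', b'')
text_chunks = [
    data.split(b'\0', 1)
    for chunk_type, data in iter_png_chunks(sample)
    if chunk_type == b'tEXt'
]
```

Walking the chunks this way lets one PNG carry several `tEXt` entries, which appears to be why the new `_decrypt_url` is a generator yielding one `(quality, url)` pair per entry.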
|
@ -10,7 +10,7 @@ from ..utils import (
|
|||||||
|
|
||||||
class SBSIE(InfoExtractor):
|
class SBSIE(InfoExtractor):
|
||||||
IE_DESC = 'sbs.com.au'
|
IE_DESC = 'sbs.com.au'
|
||||||
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
|
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand(?:/video/(?:single/)?|.*?\bplay=|/watch/)|news/(?:embeds/)?video/)(?P<id>[0-9]+)'
|
||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
# Original URL is handled by the generic IE which finds the iframe:
|
# Original URL is handled by the generic IE which finds the iframe:
|
||||||
@ -43,6 +43,9 @@ class SBSIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
|
'url': 'https://www.sbs.com.au/news/embeds/video/1840778819866',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.sbs.com.au/ondemand/watch/1698704451971',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
|
@ -2,12 +2,18 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import js_to_json
|
from ..utils import (
|
||||||
|
get_element_by_class,
|
||||||
|
int_or_none,
|
||||||
|
remove_start,
|
||||||
|
strip_or_none,
|
||||||
|
unified_strdate,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
class ScreencastOMaticIE(InfoExtractor):
|
class ScreencastOMaticIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
|
_VALID_URL = r'https?://screencast-o-matic\.com/(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
|
||||||
_TEST = {
|
_TESTS = [{
|
||||||
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
|
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
|
||||||
'md5': '483583cb80d92588f15ccbedd90f0c18',
|
'md5': '483583cb80d92588f15ccbedd90f0c18',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -16,22 +22,30 @@ class ScreencastOMaticIE(InfoExtractor):
|
|||||||
'title': 'Welcome to 3-4 Philosophy @ DECV!',
|
'title': 'Welcome to 3-4 Philosophy @ DECV!',
|
||||||
'thumbnail': r're:^https?://.*\.jpg$',
|
'thumbnail': r're:^https?://.*\.jpg$',
|
||||||
'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
|
'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
|
||||||
'duration': 369.163,
|
'duration': 369,
|
||||||
}
|
'upload_date': '20141216',
|
||||||
}
|
}
|
||||||
|
}, {
|
||||||
|
'url': 'http://screencast-o-matic.com/player/c2lD3BeOPl',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'http://screencast-o-matic.com/embed?ff=true&sc=cbV2r4Q5TL&fromPH=true&a=1',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(url, video_id)
|
webpage = self._download_webpage(
|
||||||
|
'https://screencast-o-matic.com/player/' + video_id, video_id)
|
||||||
jwplayer_data = self._parse_json(
|
info = self._parse_html5_media_entries(url, webpage, video_id)[0]
|
||||||
self._search_regex(
|
info.update({
|
||||||
r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
|
'id': video_id,
|
||||||
video_id, transform_source=js_to_json)
|
'title': get_element_by_class('overlayTitle', webpage),
|
||||||
|
'description': strip_or_none(get_element_by_class('overlayDescription', webpage)) or None,
|
||||||
info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
|
'duration': int_or_none(self._search_regex(
|
||||||
info_dict.update({
|
r'player\.duration\s*=\s*function\(\)\s*{\s*return\s+(\d+);\s*};',
|
||||||
'title': self._og_search_title(webpage),
|
webpage, 'duration', default=None)),
|
||||||
'description': self._og_search_description(webpage),
|
'upload_date': unified_strdate(remove_start(
|
||||||
|
get_element_by_class('overlayPublished', webpage), 'Published: ')),
|
||||||
})
|
})
|
||||||
return info_dict
|
return info
|
||||||
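The widened `_VALID_URL` above accepts `watch`, `player`, and `embed?...sc=` links, and the extractor then always fetches the `/player/` page. A minimal standalone sketch of that behaviour follows. The pattern is copied from the diff; `match_id` and `player_url` are hypothetical helper names used only for illustration, not youtube-dl APIs.

```python
import re

# Pattern copied verbatim from the new _VALID_URL in the diff.
VALID_URL = (
    r'https?://screencast-o-matic\.com/'
    r'(?:(?:watch|player)/|embed\?.*?\bsc=)(?P<id>[0-9a-zA-Z]+)'
)


def match_id(url):
    """Return the captured video id, or None when the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


def player_url(video_id):
    """Build the player page URL that the updated extractor downloads."""
    return 'https://screencast-o-matic.com/player/' + video_id
```

All three URL shapes listed in `_TESTS` resolve to an id this way, so the extractor can normalise them to a single player page before parsing.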
|
@ -51,13 +51,16 @@ class ShahidIE(ShahidBaseIE):
|
|||||||
_NETRC_MACHINE = 'shahid'
|
_NETRC_MACHINE = 'shahid'
|
||||||
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
|
_VALID_URL = r'https?://shahid\.mbc\.net/ar/(?:serie|show|movie)s/[^/]+/(?P<type>episode|clip|movie)-(?P<id>\d+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AC%D9%84%D8%B3-%D8%A7%D9%84%D8%B4%D8%A8%D8%A7%D8%A8-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-275286',
|
'url': 'https://shahid.mbc.net/ar/shows/%D9%85%D8%AA%D8%AD%D9%81-%D8%A7%D9%84%D8%AF%D8%AD%D9%8A%D8%AD-%D8%A7%D9%84%D9%85%D9%88%D8%B3%D9%85-1-%D9%83%D9%84%D9%8A%D8%A8-1/clip-816924',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '275286',
|
'id': '816924',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'مجلس الشباب الموسم 1 كليب 1',
|
'title': 'متحف الدحيح الموسم 1 كليب 1',
|
||||||
'timestamp': 1506988800,
|
'timestamp': 1602806400,
|
||||||
'upload_date': '20171003',
|
'upload_date': '20201016',
|
||||||
|
'description': 'برومو',
|
||||||
|
'duration': 22,
|
||||||
|
'categories': ['كوميديا'],
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
# m3u8 download
|
# m3u8 download
|
||||||
@ -109,12 +112,15 @@ class ShahidIE(ShahidBaseIE):
|
|||||||
page_type = 'episode'
|
page_type = 'episode'
|
||||||
|
|
||||||
playout = self._call_api(
|
playout = self._call_api(
|
||||||
'playout/url/' + video_id, video_id)['playout']
|
'playout/new/url/' + video_id, video_id)['playout']
|
||||||
|
|
||||||
if playout.get('drm'):
|
if playout.get('drm'):
|
||||||
raise ExtractorError('This video is DRM protected.', expected=True)
|
raise ExtractorError('This video is DRM protected.', expected=True)
|
||||||
|
|
||||||
formats = self._extract_m3u8_formats(playout['url'], video_id, 'mp4')
|
formats = self._extract_m3u8_formats(re.sub(
|
||||||
|
# https://docs.aws.amazon.com/mediapackage/latest/ug/manifest-filtering.html
|
||||||
|
r'aws\.manifestfilter=[\w:;,-]+&?',
|
||||||
|
'', playout['url']), video_id, 'mp4')
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
# video = self._call_api(
|
# video = self._call_api(
|
||||||
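The Shahid change strips the `aws.manifestfilter` query parameter from the playout URL before requesting the HLS manifest. The substitution can be tried in isolation as sketched below. The regex is the one from the diff; `strip_manifest_filter` is a hypothetical name and the example URLs in the usage note are made up.

```python
import re


def strip_manifest_filter(url):
    """Drop an aws.manifestfilter=... parameter (and a following '&') from url."""
    return re.sub(r'aws\.manifestfilter=[\w:;,-]+&?', '', url)
```

For instance, `strip_manifest_filter('https://example.com/master.m3u8?aws.manifestfilter=video_height:1-720;audio_language:ar&hdnts=abc')` leaves only the `hdnts` parameter. When the filter is the last parameter, the preceding `&` stays in place, because the pattern only consumes a trailing one.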
|
@ -6,9 +6,9 @@ from .mtv import MTVServicesInfoExtractor
|
|||||||
|
|
||||||
class SouthParkIE(MTVServicesInfoExtractor):
|
class SouthParkIE(MTVServicesInfoExtractor):
|
||||||
IE_NAME = 'southpark.cc.com'
|
IE_NAME = 'southpark.cc.com'
|
||||||
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
|
_VALID_URL = r'https?://(?:www\.)?(?P<url>southpark(?:\.cc|studios)\.com/(?:clips|(?:full-)?episodes|collections)/(?P<id>.+?)(\?|#|$))'
|
||||||
|
|
||||||
_FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
|
_FEED_URL = 'http://feeds.mtvnservices.com/od/feed/intl-mrss-player-feed'
|
||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
|
'url': 'http://southpark.cc.com/clips/104437/bat-daded#tab=featured',
|
||||||
@ -23,8 +23,20 @@ class SouthParkIE(MTVServicesInfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
|
'url': 'http://southpark.cc.com/collections/7758/fan-favorites/1',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.southparkstudios.com/episodes/h4o269/south-park-stunning-and-brave-season-19-ep-1',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
|
def _get_feed_query(self, uri):
|
||||||
|
return {
|
||||||
|
'accountOverride': 'intl.mtvi.com',
|
||||||
|
'arcEp': 'shared.southpark.global',
|
||||||
|
'ep': '90877963',
|
||||||
|
'imageEp': 'shared.southpark.global',
|
||||||
|
'mgid': uri,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
class SouthParkEsIE(SouthParkIE):
|
class SouthParkEsIE(SouthParkIE):
|
||||||
IE_NAME = 'southpark.cc.com:español'
|
IE_NAME = 'southpark.cc.com:español'
|
||||||
|
@ -1,82 +1,105 @@
|
|||||||
# coding: utf-8
|
# coding: utf-8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
import re
|
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
|
from ..compat import (
|
||||||
|
compat_parse_qs,
|
||||||
|
compat_urllib_parse_urlparse,
|
||||||
|
)
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
|
clean_html,
|
||||||
|
float_or_none,
|
||||||
|
int_or_none,
|
||||||
parse_iso8601,
|
parse_iso8601,
|
||||||
sanitized_Request,
|
strip_or_none,
|
||||||
|
try_get,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class SportDeutschlandIE(InfoExtractor):
|
class SportDeutschlandIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
|
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<id>(?:[^/]+/)?[^?#/&]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
|
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
|
'id': '5318cac0275701382770543d7edaf0a0',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
|
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals - Teil 1',
|
||||||
'categories': ['Badminton-Deutschland'],
|
'duration': 16106.36,
|
||||||
'view_count': int,
|
|
||||||
'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
|
|
||||||
'timestamp': int,
|
|
||||||
'upload_date': '20200201',
|
|
||||||
'description': 're:.*', # meaningless description for THIS video
|
|
||||||
},
|
},
|
||||||
|
'params': {
|
||||||
|
'noplaylist': True,
|
||||||
|
# m3u8 download
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'c6e2fdd01f63013854c47054d2ab776f',
|
||||||
|
'title': 'Re-live: Deutsche Meisterschaften 2020 - Halbfinals',
|
||||||
|
'description': 'md5:5263ff4c31c04bb780c9f91130b48530',
|
||||||
|
'duration': 31397,
|
||||||
|
},
|
||||||
|
'playlist_count': 2,
|
||||||
|
}, {
|
||||||
|
'url': 'https://sportdeutschland.tv/freeride-world-tour-2021-fieberbrunn-oesterreich',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
display_id = self._match_id(url)
|
||||||
video_id = mobj.group('id')
|
data = self._download_json(
|
||||||
sport_id = mobj.group('sport')
|
'https://backend.sportdeutschland.tv/api/permalinks/' + display_id,
|
||||||
|
display_id, query={'access_token': 'true'})
|
||||||
api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
|
|
||||||
sport_id, video_id)
|
|
||||||
req = sanitized_Request(api_url, headers={
|
|
||||||
'Accept': 'application/vnd.vidibus.v2.html+json',
|
|
||||||
'Referer': url,
|
|
||||||
})
|
|
||||||
data = self._download_json(req, video_id)
|
|
||||||
|
|
||||||
asset = data['asset']
|
asset = data['asset']
|
||||||
categories = [data['section']['title']]
|
title = (asset.get('title') or asset['label']).strip()
|
||||||
|
asset_id = asset.get('id') or asset.get('uuid')
|
||||||
formats = []
|
info = {
|
||||||
smil_url = asset['video']
|
'id': asset_id,
|
||||||
if '.smil' in smil_url:
|
'title': title,
|
||||||
m3u8_url = smil_url.replace('.smil', '.m3u8')
|
'description': clean_html(asset.get('body') or asset.get('description')) or asset.get('teaser'),
|
||||||
formats.extend(
|
'duration': int_or_none(asset.get('seconds')),
|
||||||
self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'))
|
}
|
||||||
|
videos = asset.get('videos') or []
|
||||||
smil_doc = self._download_xml(
|
if len(videos) > 1:
|
||||||
smil_url, video_id, note='Downloading SMIL metadata')
|
playlist_id = compat_parse_qs(compat_urllib_parse_urlparse(url).query).get('playlistId', [None])[0]
|
||||||
base_url_el = smil_doc.find('./head/meta')
|
if playlist_id:
|
||||||
if base_url_el:
|
if self._downloader.params.get('noplaylist'):
|
||||||
base_url = base_url_el.attrib['base']
|
videos = [videos[int(playlist_id)]]
|
||||||
formats.extend([{
|
self.to_screen('Downloading just a single video because of --no-playlist')
|
||||||
'format_id': 'rmtp',
|
|
||||||
'url': base_url if base_url_el else n.attrib['src'],
|
|
||||||
'play_path': n.attrib['src'],
|
|
||||||
'ext': 'flv',
|
|
||||||
'preference': -100,
|
|
||||||
'format_note': 'Seems to fail at example stream',
|
|
||||||
} for n in smil_doc.findall('./body/video')])
|
|
||||||
else:
|
else:
|
||||||
formats.append({'url': smil_url})
|
self.to_screen('Downloading playlist %s - add --no-playlist to just download video' % asset_id)
|
||||||
|
|
||||||
self._sort_formats(formats)
|
def entries():
|
||||||
|
for i, video in enumerate(videos, 1):
|
||||||
return {
|
video_id = video.get('uuid')
|
||||||
|
video_url = video.get('url')
|
||||||
|
if not (video_id and video_url):
|
||||||
|
continue
|
||||||
|
formats = self._extract_m3u8_formats(
|
||||||
|
video_url.replace('.smil', '.m3u8'), video_id, 'mp4', fatal=False)
|
||||||
|
if not formats:
|
||||||
|
continue
|
||||||
|
yield {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'title': asset['title'],
|
'title': title + ' - ' + (video.get('label') or 'Teil %d' % i),
|
||||||
'thumbnail': asset.get('image'),
|
'duration': float_or_none(video.get('duration')),
|
||||||
'description': asset.get('teaser'),
|
|
||||||
'duration': asset.get('duration'),
|
|
||||||
'categories': categories,
|
|
||||||
'view_count': asset.get('views'),
|
|
||||||
'rtmp_live': asset.get('live'),
|
|
||||||
'timestamp': parse_iso8601(asset.get('date')),
|
|
||||||
}
|
}
|
||||||
|
info.update({
|
||||||
|
'_type': 'multi_video',
|
||||||
|
'entries': entries(),
|
||||||
|
})
|
||||||
|
else:
|
||||||
|
formats = self._extract_m3u8_formats(
|
||||||
|
videos[0]['url'].replace('.smil', '.m3u8'), asset_id, 'mp4')
|
||||||
|
section_title = strip_or_none(try_get(data, lambda x: x['section']['title']))
|
||||||
|
info.update({
|
||||||
|
'formats': formats,
|
||||||
|
'display_id': asset.get('permalink'),
|
||||||
|
'thumbnail': try_get(asset, lambda x: x['images'][0]),
|
||||||
|
'categories': [section_title] if section_title else None,
|
||||||
|
'view_count': int_or_none(asset.get('views')),
|
||||||
|
'is_live': asset.get('is_live') is True,
|
||||||
|
'timestamp': parse_iso8601(asset.get('date') or asset.get('published_at')),
|
||||||
|
})
|
||||||
|
return info
|
||||||
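The new SportDeutschland code narrows a multi-part asset to a single video only when the URL carries `playlistId` and `noplaylist` is set. The selection logic is sketched below outside the extractor, assuming Python 3's `urllib.parse` functions behave like the `compat_parse_qs` and `compat_urllib_parse_urlparse` wrappers used in the diff. `select_videos` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlparse


def select_videos(url, videos, noplaylist=False):
    """Mirror the playlistId / noplaylist branch from the diff.

    Returns a one-element list when a single part was requested,
    otherwise the full list of parts.
    """
    if len(videos) > 1:
        # parse_qs maps each key to a list of values; take the first one.
        playlist_id = parse_qs(urlparse(url).query).get('playlistId', [None])[0]
        if playlist_id and noplaylist:
            return [videos[int(playlist_id)]]
    return videos
```

Note that `playlistId=0` yields the string `'0'`, which is truthy, so the first part is still selectable.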
|
@ -4,16 +4,32 @@ from __future__ import unicode_literals
|
|||||||
import re
|
import re
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import compat_urllib_parse_urlparse
|
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
|
float_or_none,
|
||||||
|
int_or_none,
|
||||||
parse_iso8601,
|
parse_iso8601,
|
||||||
qualities,
|
qualities,
|
||||||
|
try_get,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class SRGSSRIE(InfoExtractor):
|
class SRGSSRIE(InfoExtractor):
|
||||||
_VALID_URL = r'(?:https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|srgssr):(?P<bu>srf|rts|rsi|rtr|swi):(?:[^:]+:)?(?P<type>video|audio):(?P<id>[0-9a-f\-]{36}|\d+)'
|
_VALID_URL = r'''(?x)
|
||||||
|
(?:
|
||||||
|
https?://tp\.srgssr\.ch/p(?:/[^/]+)+\?urn=urn|
|
||||||
|
srgssr
|
||||||
|
):
|
||||||
|
(?P<bu>
|
||||||
|
srf|rts|rsi|rtr|swi
|
||||||
|
):(?:[^:]+:)?
|
||||||
|
(?P<type>
|
||||||
|
video|audio
|
||||||
|
):
|
||||||
|
(?P<id>
|
||||||
|
[0-9a-f\-]{36}|\d+
|
||||||
|
)
|
||||||
|
'''
|
||||||
_GEO_BYPASS = False
|
_GEO_BYPASS = False
|
||||||
_GEO_COUNTRIES = ['CH']
|
_GEO_COUNTRIES = ['CH']
|
||||||
|
|
||||||
@ -25,25 +41,39 @@ class SRGSSRIE(InfoExtractor):
|
|||||||
'LEGAL': 'The video cannot be transmitted for legal reasons.',
|
'LEGAL': 'The video cannot be transmitted for legal reasons.',
|
||||||
'STARTDATE': 'This video is not yet available. Please try again later.',
|
'STARTDATE': 'This video is not yet available. Please try again later.',
|
||||||
}
|
}
|
||||||
|
_DEFAULT_LANGUAGE_CODES = {
|
||||||
|
'srf': 'de',
|
||||||
|
'rts': 'fr',
|
||||||
|
'rsi': 'it',
|
||||||
|
'rtr': 'rm',
|
||||||
|
'swi': 'en',
|
||||||
|
}
|
||||||
|
|
||||||
def _get_tokenized_src(self, url, video_id, format_id):
|
def _get_tokenized_src(self, url, video_id, format_id):
|
||||||
sp = compat_urllib_parse_urlparse(url).path.split('/')
|
|
||||||
token = self._download_json(
|
token = self._download_json(
|
||||||
'http://tp.srgssr.ch/akahd/token?acl=/%s/%s/*' % (sp[1], sp[2]),
|
'http://tp.srgssr.ch/akahd/token?acl=*',
|
||||||
video_id, 'Downloading %s token' % format_id, fatal=False) or {}
|
video_id, 'Downloading %s token' % format_id, fatal=False) or {}
|
||||||
auth_params = token.get('token', {}).get('authparams')
|
auth_params = try_get(token, lambda x: x['token']['authparams'])
|
||||||
if auth_params:
|
if auth_params:
|
||||||
url += '?' + auth_params
|
url += ('?' if '?' not in url else '&') + auth_params
|
||||||
return url
|
return url
|
||||||
|
|
||||||
def get_media_data(self, bu, media_type, media_id):
|
def _get_media_data(self, bu, media_type, media_id):
|
||||||
media_data = self._download_json(
|
query = {'onlyChapters': True} if media_type == 'video' else {}
|
||||||
'http://il.srgssr.ch/integrationlayer/1.0/ue/%s/%s/play/%s.json' % (bu, media_type, media_id),
|
full_media_data = self._download_json(
|
||||||
media_id)[media_type.capitalize()]
|
'https://il.srgssr.ch/integrationlayer/2.0/%s/mediaComposition/%s/%s.json'
|
||||||
|
% (bu, media_type, media_id),
|
||||||
|
media_id, query=query)['chapterList']
|
||||||
|
try:
|
||||||
|
media_data = next(
|
||||||
|
x for x in full_media_data if x.get('id') == media_id)
|
||||||
|
except StopIteration:
|
||||||
|
raise ExtractorError('No media information found')
|
||||||
|
|
||||||
if media_data.get('block') and media_data['block'] in self._ERRORS:
|
block_reason = media_data.get('blockReason')
|
||||||
message = self._ERRORS[media_data['block']]
|
if block_reason and block_reason in self._ERRORS:
|
||||||
if media_data['block'] == 'GEOBLOCK':
|
message = self._ERRORS[block_reason]
|
||||||
|
if block_reason == 'GEOBLOCK':
|
||||||
self.raise_geo_restricted(
|
self.raise_geo_restricted(
|
||||||
msg=message, countries=self._GEO_COUNTRIES)
|
msg=message, countries=self._GEO_COUNTRIES)
|
||||||
raise ExtractorError(
|
raise ExtractorError(
|
||||||
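Two small pieces of this hunk are easy to check on their own: appending the token's auth parameters with the correct separator, and picking the chapter whose `id` equals the requested media id. The sketch below is a simplified stand-in. `append_auth_params` and `find_chapter` are hypothetical names, and `LookupError` replaces youtube-dl's `ExtractorError` so the snippet has no dependencies.

```python
def append_auth_params(url, auth_params):
    """Append auth_params using '?' or '&' depending on whether url has a query."""
    if auth_params:
        url += ('?' if '?' not in url else '&') + auth_params
    return url


def find_chapter(chapter_list, media_id):
    """Return the first chapter dict whose 'id' equals media_id."""
    try:
        return next(x for x in chapter_list if x.get('id') == media_id)
    except StopIteration:
        # The extractor raises ExtractorError here; LookupError is a stand-in.
        raise LookupError('No media information found')
```

The separator check matters because the previous code always appended `?`, which would produce a malformed URL if the source URL already carried a query string.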
@ -53,53 +83,75 @@ class SRGSSRIE(InfoExtractor):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
|
bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
|
||||||
|
media_data = self._get_media_data(bu, media_type, media_id)
|
||||||
|
title = media_data['title']
|
||||||
|
|
||||||
media_data = self.get_media_data(bu, media_type, media_id)
|
|
||||||
|
|
||||||
metadata = media_data['AssetMetadatas']['AssetMetadata'][0]
|
|
||||||
title = metadata['title']
|
|
||||||
description = metadata.get('description')
|
|
||||||
created_date = media_data.get('createdDate') or metadata.get('createdDate')
|
|
||||||
timestamp = parse_iso8601(created_date)
|
|
||||||
|
|
||||||
thumbnails = [{
|
|
||||||
'id': image.get('id'),
|
|
||||||
'url': image['url'],
|
|
||||||
} for image in media_data.get('Image', {}).get('ImageRepresentations', {}).get('ImageRepresentation', [])]
|
|
||||||
|
|
||||||
preference = qualities(['LQ', 'MQ', 'SD', 'HQ', 'HD'])
|
|
||||||
formats = []
|
formats = []
|
||||||
for source in media_data.get('Playlists', {}).get('Playlist', []) + media_data.get('Downloads', {}).get('Download', []):
|
q = qualities(['SD', 'HD'])
|
||||||
protocol = source.get('@protocol')
|
for source in (media_data.get('resourceList') or []):
|
||||||
for asset in source['url']:
|
format_url = source.get('url')
|
||||||
asset_url = asset['text']
|
if not format_url:
|
||||||
quality = asset['@quality']
|
continue
|
||||||
format_id = '%s-%s' % (protocol, quality)
|
protocol = source.get('protocol')
|
||||||
if protocol.startswith('HTTP-HDS') or protocol.startswith('HTTP-HLS'):
|
quality = source.get('quality')
|
||||||
asset_url = self._get_tokenized_src(asset_url, media_id, format_id)
|
format_id = []
|
||||||
if protocol.startswith('HTTP-HDS'):
|
for e in (protocol, source.get('encoding'), quality):
|
||||||
formats.extend(self._extract_f4m_formats(
|
if e:
|
||||||
asset_url + ('?' if '?' not in asset_url else '&') + 'hdcore=3.4.0',
|
format_id.append(e)
|
||||||
media_id, f4m_id=format_id, fatal=False))
|
format_id = '-'.join(format_id)
|
||||||
elif protocol.startswith('HTTP-HLS'):
|
|
||||||
|
if protocol in ('HDS', 'HLS'):
|
||||||
|
if source.get('tokenType') == 'AKAMAI':
|
||||||
|
format_url = self._get_tokenized_src(
|
||||||
|
format_url, media_id, format_id)
|
||||||
|
formats.extend(self._extract_akamai_formats(
|
||||||
|
format_url, media_id))
|
||||||
|
elif protocol == 'HLS':
|
||||||
formats.extend(self._extract_m3u8_formats(
|
formats.extend(self._extract_m3u8_formats(
|
||||||
asset_url, media_id, 'mp4', 'm3u8_native',
|
format_url, media_id, 'mp4', 'm3u8_native',
|
||||||
m3u8_id=format_id, fatal=False))
|
m3u8_id=format_id, fatal=False))
|
||||||
else:
|
elif protocol in ('HTTP', 'HTTPS'):
|
||||||
formats.append({
|
formats.append({
|
||||||
'format_id': format_id,
|
'format_id': format_id,
|
||||||
'url': asset_url,
|
'url': format_url,
|
||||||
'preference': preference(quality),
|
'quality': q(quality),
|
||||||
'ext': 'flv' if protocol == 'RTMP' else None,
|
})
|
||||||
|
|
||||||
|
# This is needed because, for audio media, the podcast URL is usually
|
||||||
|
# included even if the item is only an audio segment and not the
|
||||||
|
# whole episode.

|
||||||
|
if int_or_none(media_data.get('position')) == 0:
|
||||||
|
for p in ('S', 'H'):
|
||||||
|
podcast_url = media_data.get('podcast%sdUrl' % p)
|
||||||
|
if not podcast_url:
|
||||||
|
continue
|
||||||
|
quality = p + 'D'
|
||||||
|
formats.append({
|
||||||
|
'format_id': 'PODCAST-' + quality,
|
||||||
|
'url': podcast_url,
|
||||||
|
'quality': q(quality),
|
||||||
})
|
})
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
subtitles = {}
|
||||||
|
if media_type == 'video':
|
||||||
|
for sub in (media_data.get('subtitleList') or []):
|
||||||
|
sub_url = sub.get('url')
|
||||||
|
if not sub_url:
|
||||||
|
continue
|
||||||
|
lang = sub.get('locale') or self._DEFAULT_LANGUAGE_CODES[bu]
|
||||||
|
subtitles.setdefault(lang, []).append({
|
||||||
|
'url': sub_url,
|
||||||
|
})
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': media_id,
|
'id': media_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
'description': description,
|
'description': media_data.get('description'),
|
||||||
'timestamp': timestamp,
|
'timestamp': parse_iso8601(media_data.get('date')),
|
||||||
'thumbnails': thumbnails,
|
'thumbnail': media_data.get('imageUrl'),
|
||||||
|
'duration': float_or_none(media_data.get('duration'), 1000),
|
||||||
|
'subtitles': subtitles,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
}
|
}
|
||||||
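The rewritten format loop builds `format_id` from whichever of protocol, encoding, and quality are present, and ranks formats with `qualities(['SD', 'HD'])`. A dependency-free sketch of both ideas follows. `build_format_id` is a hypothetical name, and `qualities` here is a simplified stand-in assumed to behave like the youtube-dl helper of the same name (list index for known ids, `-1` otherwise).

```python
def build_format_id(protocol, encoding, quality):
    """Join the non-empty parts with '-', as the loop in the diff does."""
    return '-'.join(e for e in (protocol, encoding, quality) if e)


def qualities(quality_ids):
    """Return a function mapping a quality id to its rank, or -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q
```

With `q = qualities(['SD', 'HD'])`, an `HD` source ranks above `SD`, and any unlisted or missing quality falls below both.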
|
|
||||||
@ -119,26 +171,17 @@ class SRGSSRPlayIE(InfoExtractor):
|
|||||||
|
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
|
'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
|
||||||
'md5': 'da6b5b3ac9fa4761a942331cef20fcb3',
|
'md5': '6db2226ba97f62ad42ce09783680046c',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
|
'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'upload_date': '20130701',
|
'upload_date': '20130701',
|
||||||
'title': 'Snowden beantragt Asyl in Russland',
|
'title': 'Snowden beantragt Asyl in Russland',
|
||||||
'timestamp': 1372713995,
|
'timestamp': 1372708215,
|
||||||
}
|
'duration': 113.827,
|
||||||
}, {
|
'thumbnail': r're:^https?://.*1383719781\.png$',
|
||||||
# No Speichern (Save) button
|
|
||||||
'url': 'http://www.srf.ch/play/tv/top-gear/video/jaguar-xk120-shadow-und-tornado-dampflokomotive?id=677f5829-e473-4823-ac83-a1087fe97faa',
|
|
||||||
'md5': '0a274ce38fda48c53c01890651985bc6',
|
|
||||||
'info_dict': {
|
|
||||||
'id': '677f5829-e473-4823-ac83-a1087fe97faa',
|
|
||||||
'ext': 'flv',
|
|
||||||
'upload_date': '20130710',
|
|
||||||
'title': 'Jaguar XK120, Shadow und Tornado-Dampflokomotive',
|
|
||||||
'description': 'md5:88604432b60d5a38787f152dec89cd56',
|
|
||||||
'timestamp': 1373493600,
|
|
||||||
},
|
},
|
||||||
|
'expected_warnings': ['Unable to download f4m manifest'],
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc',
|
'url': 'http://www.rtr.ch/play/radio/actualitad/audio/saira-tujetsch-tuttina-cuntinuar-cun-sedrun-muster-turissem?id=63cb0778-27f8-49af-9284-8c7a8c6d15fc',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -146,7 +189,8 @@ class SRGSSRPlayIE(InfoExtractor):
|
|||||||
'ext': 'mp3',
|
'ext': 'mp3',
|
||||||
'upload_date': '20151013',
|
'upload_date': '20151013',
|
||||||
'title': 'Saira: Tujetsch - tuttina cuntinuar cun Sedrun Mustér Turissem',
|
'title': 'Saira: Tujetsch - tuttina cuntinuar cun Sedrun Mustér Turissem',
|
||||||
'timestamp': 1444750398,
|
'timestamp': 1444709160,
|
||||||
|
'duration': 336.816,
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
# rtmp download
|
# rtmp download
|
||||||
@ -159,19 +203,32 @@ class SRGSSRPlayIE(InfoExtractor):
|
|||||||
'id': '6348260',
|
'id': '6348260',
|
||||||
'display_id': '6348260',
|
'display_id': '6348260',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'duration': 1796,
|
'duration': 1796.76,
|
||||||
'title': 'Le 19h30',
|
'title': 'Le 19h30',
|
||||||
'description': '',
|
|
||||||
'uploader': '19h30',
|
|
||||||
'upload_date': '20141201',
|
'upload_date': '20141201',
|
||||||
'timestamp': 1417458600,
|
'timestamp': 1417458600,
|
||||||
'thumbnail': r're:^https?://.*\.image',
|
'thumbnail': r're:^https?://.*\.image',
|
||||||
'view_count': int,
|
|
||||||
},
|
},
|
||||||
'params': {
|
'params': {
|
||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
}
|
}
|
||||||
|
}, {
|
||||||
|
'url': 'http://play.swissinfo.ch/play/tv/business/video/why-people-were-against-tax-reforms?id=42960270',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '42960270',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Why people were against tax reforms',
|
||||||
|
'description': 'md5:7ac442c558e9630e947427469c4b824d',
|
||||||
|
'duration': 94.0,
|
||||||
|
'upload_date': '20170215',
|
||||||
|
'timestamp': 1487173560,
|
||||||
|
'thumbnail': r're:https?://www\.swissinfo\.ch/srgscalableimage/42961964',
|
||||||
|
'subtitles': 'count:9',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'skip_download': True,
|
||||||
|
}
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
|
'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -181,6 +238,10 @@ class SRGSSRPlayIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://www.rts.ch/play/tv/19h30/video/le-19h30?urn=urn:rts:video:6348260',
|
'url': 'https://www.rts.ch/play/tv/19h30/video/le-19h30?urn=urn:rts:video:6348260',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# audio segment, has podcastSdUrl of the full episode
|
||||||
|
'url': 'https://www.srf.ch/play/radio/popupaudioplayer?id=50b20dc8-f05b-4972-bf03-e438ff2833eb',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
@ -188,5 +249,4 @@ class SRGSSRPlayIE(InfoExtractor):
|
|||||||
bu = mobj.group('bu')
|
bu = mobj.group('bu')
|
||||||
media_type = mobj.group('type') or mobj.group('type_2')
|
media_type = mobj.group('type') or mobj.group('type_2')
|
||||||
media_id = mobj.group('id')
|
media_id = mobj.group('id')
|
||||||
# other info can be extracted from url + '&layout=json'
|
|
||||||
return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')
|
return self.url_result('srgssr:%s:%s:%s' % (bu[:3], media_type, media_id), 'SRGSSR')
|
||||||
|
@ -1,7 +1,6 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import int_or_none
|
|
||||||
|
|
||||||
|
|
||||||
class StretchInternetIE(InfoExtractor):
|
class StretchInternetIE(InfoExtractor):
|
||||||
@ -11,22 +10,28 @@ class StretchInternetIE(InfoExtractor):
|
|||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': '573272',
|
'id': '573272',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'University of Mary Wrestling vs. Upper Iowa',
|
'title': 'UNIVERSITY OF MARY WRESTLING VS UPPER IOWA',
|
||||||
'timestamp': 1575668361,
|
# 'timestamp': 1575668361,
|
||||||
'upload_date': '20191206',
|
# 'upload_date': '20191206',
|
||||||
|
'uploader_id': '99997',
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
|
|
||||||
|
media_url = self._download_json(
|
||||||
|
'https://core.stretchlive.com/trinity/event/tcg/' + video_id,
|
||||||
|
video_id)[0]['media'][0]['url']
|
||||||
event = self._download_json(
|
event = self._download_json(
|
||||||
'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
|
'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
|
||||||
video_id)[0]
|
video_id, query={'eventID': video_id, 'token': 'asdf'})['event']
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': event['title'],
|
'title': event['title'],
|
||||||
'timestamp': int_or_none(event.get('dateCreated'), 1000),
|
# TODO: parse US timezone abbreviations
|
||||||
'url': 'https://' + event['media'][0]['url'],
|
# 'timestamp': event.get('dateTimeString'),
|
||||||
|
'url': 'https://' + media_url,
|
||||||
|
'uploader_id': event.get('ownerID'),
|
||||||
}
|
}
|
||||||
|
@ -146,18 +146,19 @@ class SVTPlayIE(SVTPlayBaseIE):
|
|||||||
)
|
)
|
||||||
(?P<svt_id>[^/?#&]+)|
|
(?P<svt_id>[^/?#&]+)|
|
||||||
https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
|
https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)
|
||||||
|
(?:.*?(?:modalId|id)=(?P<modal_id>[\da-zA-Z-]+))?
|
||||||
)
|
)
|
||||||
'''
|
'''
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://www.svtplay.se/video/26194546/det-har-ar-himlen',
|
'url': 'https://www.svtplay.se/video/30479064',
|
||||||
'md5': '2382036fd6f8c994856c323fe51c426e',
|
'md5': '2382036fd6f8c994856c323fe51c426e',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'jNwpV9P',
|
'id': '8zVbDPA',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Det här är himlen',
|
'title': 'Designdrömmar i Stenungsund',
|
||||||
'timestamp': 1586044800,
|
'timestamp': 1615770000,
|
||||||
'upload_date': '20200405',
|
'upload_date': '20210315',
|
||||||
'duration': 3515,
|
'duration': 3519,
|
||||||
'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
|
'thumbnail': r're:^https?://(?:.*[\.-]jpg|www.svtstatic.se/image/.*)$',
|
||||||
'age_limit': 0,
|
'age_limit': 0,
|
||||||
'subtitles': {
|
'subtitles': {
|
||||||
@ -173,6 +174,12 @@ class SVTPlayIE(SVTPlayBaseIE):
|
|||||||
# AssertionError: Expected test_SVTPlay_jNwpV9P.mp4 to be at least 9.77KiB, but it's only 864.00B
|
# AssertionError: Expected test_SVTPlay_jNwpV9P.mp4 to be at least 9.77KiB, but it's only 864.00B
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.svtplay.se/video/30479064/husdrommar/husdrommar-sasong-8-designdrommar-i-stenungsund?modalId=8zVbDPA',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.svtplay.se/video/30684086/rapport/rapport-24-apr-18-00-7?id=e72gVpa',
|
||||||
|
'only_matching': True,
|
||||||
}, {
|
}, {
|
||||||
# geo restricted to Sweden
|
# geo restricted to Sweden
|
||||||
'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
|
'url': 'http://www.oppetarkiv.se/video/5219710/trollflojten',
|
||||||
@ -219,7 +226,8 @@ class SVTPlayIE(SVTPlayBaseIE):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
mobj = re.match(self._VALID_URL, url)
|
mobj = re.match(self._VALID_URL, url)
|
||||||
video_id, svt_id = mobj.group('id', 'svt_id')
|
video_id = mobj.group('id')
|
||||||
|
svt_id = mobj.group('svt_id') or mobj.group('modal_id')
|
||||||
|
|
||||||
if svt_id:
|
if svt_id:
|
||||||
return self._extract_by_video_id(svt_id)
|
return self._extract_by_video_id(svt_id)
|
||||||
@@ -254,6 +262,7 @@ class SVTPlayIE(SVTPlayBaseIE):
         if not svt_id:
             svt_id = self._search_regex(
                 (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+                 r'<[^>]+\bdata-rt=["\']top-area-play-button["\'][^>]+\bhref=["\'][^"\']*video/%s/[^"\']*\b(?:modalId|id)=([\da-zA-Z-]+)' % re.escape(video_id),
                  r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
                  r'["\']videoSvtId\\?["\']\s*:\s*\\?["\']([\da-zA-Z-]+)',
                  r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',
@@ -1,92 +1,87 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import json
+import re
+
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    try_get,
+)
 
 
 class TF1IE(InfoExtractor):
-    """TF1 uses the wat.tv player."""
-    _VALID_URL = r'https?://(?:(?:videos|www|lci)\.tf1|(?:www\.)?(?:tfou|ushuaiatv|histoire|tvbreizh))\.fr/(?:[^/]+/)*(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?tf1\.fr/[^/]+/(?P<program_slug>[^/]+)/videos/(?P<id>[^/?&#]+)\.html'
     _TESTS = [{
-        'url': 'http://videos.tf1.fr/auto-moto/citroen-grand-c4-picasso-2013-presentation-officielle-8062060.html',
-        'info_dict': {
-            'id': '10635995',
-            'ext': 'mp4',
-            'title': 'Citroën Grand C4 Picasso 2013 : présentation officielle',
-            'description': 'Vidéo officielle du nouveau Citroën Grand C4 Picasso, lancé à l\'automne 2013.',
-        },
-        'params': {
-            # Sometimes wat serves the whole file with the --test option
-            'skip_download': True,
-        },
-        'expected_warnings': ['HTTP Error 404'],
-    }, {
-        'url': 'http://www.tfou.fr/chuggington/videos/le-grand-mysterioso-chuggington-7085291-739.html',
-        'info_dict': {
-            'id': 'le-grand-mysterioso-chuggington-7085291-739',
-            'ext': 'mp4',
-            'title': 'Le grand Mystérioso - Chuggington',
-            'description': 'Le grand Mystérioso - Emery rêve qu\'un article lui soit consacré dans le journal.',
-            'upload_date': '20150103',
-        },
-        'params': {
-            # Sometimes wat serves the whole file with the --test option
-            'skip_download': True,
-        },
-        'skip': 'HTTP Error 410: Gone',
-    }, {
-        'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://lci.tf1.fr/sept-a-huit/videos/sept-a-huit-du-24-mai-2015-8611550.html',
-        'only_matching': True,
-    }, {
-        'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
-        'only_matching': True,
-    }, {
         'url': 'https://www.tf1.fr/tmc/quotidien-avec-yann-barthes/videos/quotidien-premiere-partie-11-juin-2019.html',
         'info_dict': {
             'id': '13641379',
             'ext': 'mp4',
             'title': 'md5:f392bc52245dc5ad43771650c96fb620',
-            'description': 'md5:44bc54f0a21322f5b91d68e76a544eae',
+            'description': 'md5:a02cdb217141fb2d469d6216339b052f',
             'upload_date': '20190611',
+            'timestamp': 1560273989,
+            'duration': 1738,
+            'series': 'Quotidien avec Yann Barthès',
+            'tags': ['intégrale', 'quotidien', 'Replay'],
         },
         'params': {
             # Sometimes wat serves the whole file with the --test option
             'skip_download': True,
+            'format': 'bestvideo',
         },
+    }, {
+        'url': 'http://www.tf1.fr/tf1/koh-lanta/videos/replay-koh-lanta-22-mai-2015.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.tf1.fr/hd1/documentaire/videos/mylene-farmer-d-une-icone.html',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        program_slug, slug = re.match(self._VALID_URL, url).groups()
+        video = self._download_json(
+            'https://www.tf1.fr/graphql/web', slug, query={
+                'id': '9b80783950b85247541dd1d851f9cc7fa36574af015621f853ab111a679ce26f',
+                'variables': json.dumps({
+                    'programSlug': program_slug,
+                    'slug': slug,
+                })
+            })['data']['videoBySlug']
+        wat_id = video['streamId']
 
-        webpage = self._download_webpage(url, video_id)
+        tags = []
+        for tag in (video.get('tags') or []):
+            label = tag.get('label')
+            if not label:
+                continue
+            tags.append(label)
 
-        wat_id = None
+        decoration = video.get('decoration') or {}
 
-        data = self._parse_json(
-            self._search_regex(
-                r'__APOLLO_STATE__\s*=\s*({.+?})\s*(?:;|</script>)', webpage,
-                'data', default='{}'), video_id, fatal=False)
+        thumbnails = []
+        for source in (try_get(decoration, lambda x: x['image']['sources'], list) or []):
+            source_url = source.get('url')
+            if not source_url:
+                continue
+            thumbnails.append({
+                'url': source_url,
+                'width': int_or_none(source.get('width')),
+            })
 
-        if data:
-            try:
-                wat_id = next(
-                    video.get('streamId')
-                    for key, video in data.items()
-                    if isinstance(video, dict)
-                    and video.get('slug') == video_id)
-                if not isinstance(wat_id, compat_str) or not wat_id.isdigit():
-                    wat_id = None
-            except StopIteration:
-                pass
-
-        if not wat_id:
-            wat_id = self._html_search_regex(
-                (r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
-                 r'(["\']?)streamId\1\s*:\s*(["\']?)(?P<id>\d+)\2'),
-                webpage, 'wat id', group='id')
-
-        return self.url_result('wat:%s' % wat_id, 'Wat')
+        return {
+            '_type': 'url_transparent',
+            'id': wat_id,
+            'url': 'wat:' + wat_id,
+            'title': video.get('title'),
+            'thumbnails': thumbnails,
+            'description': decoration.get('description'),
+            'timestamp': parse_iso8601(video.get('date')),
+            'duration': int_or_none(try_get(video, lambda x: x['publicPlayingInfos']['duration'])),
+            'tags': tags,
+            'series': decoration.get('programLabel'),
+            'season_number': int_or_none(video.get('season')),
+            'episode_number': int_or_none(video.get('episode')),
+        }
@@ -107,9 +107,12 @@ class TikTokIE(TikTokBaseIE):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        data = self._parse_json(self._search_regex(
+        page_props = self._parse_json(self._search_regex(
             r'<script[^>]+\bid=["\']__NEXT_DATA__[^>]+>\s*({.+?})\s*</script',
-            webpage, 'data'), video_id)['props']['pageProps']['itemInfo']['itemStruct']
+            webpage, 'data'), video_id)['props']['pageProps']
+        data = try_get(page_props, lambda x: x['itemInfo']['itemStruct'], dict)
+        if not data and page_props.get('statusCode') == 10216:
+            raise ExtractorError('This video is private', expected=True)
         return self._extract_video(data, video_id)
 
 
@@ -2,55 +2,110 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
+from .kaltura import KalturaIE
+from ..utils import (
+    int_or_none,
+    unified_timestamp,
+)
 
 
 class TMZIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?tmz\.com/videos/(?P<id>[^/?#&]+)'
     _TESTS = [{
-        'url': 'http://www.tmz.com/videos/0_okj015ty/',
-        'md5': '4d22a51ef205b6c06395d8394f72d560',
-        'info_dict': {
-            'id': '0_okj015ty',
-            'ext': 'mp4',
-            'title': 'Kim Kardashian\'s Boobs Unlock a Mystery!',
-            'description': 'Did Kim Kardasain try to one-up Khloe by one-upping Kylie??? Or is she just showing off her amazing boobs?',
-            'timestamp': 1394747163,
-            'uploader_id': 'batchUser',
-            'upload_date': '20140313',
-        }
-    }, {
         'url': 'http://www.tmz.com/videos/0-cegprt2p/',
+        'md5': '31f9223e20eef55954973359afa61a20',
+        'info_dict': {
+            'id': 'P6YjLBLk',
+            'ext': 'mp4',
+            'title': "No Charges Against Hillary Clinton? Harvey Says It Ain't Over Yet",
+            'description': 'md5:b714359fc18607715ebccbd2da8ff488',
+            'timestamp': 1467831837,
+            'upload_date': '20160706',
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
+    }, {
+        'url': 'http://www.tmz.com/videos/0_okj015ty/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tmz.com/videos/071119-chris-morgan-women-4590005-0-zcsejvcr/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.tmz.com/videos/2021-02-19-021921-floyd-mayweather-1043872/',
         'only_matching': True,
     }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url).replace('-', '_')
-        return self.url_result('kaltura:591531:%s' % video_id, 'Kaltura', video_id)
+        webpage = self._download_webpage(url, video_id, fatal=False)
+        if webpage:
+            tmz_video_id = self._search_regex(
+                r'nodeRef\s*:\s*["\']tmz:video:([\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12})',
+                webpage, 'video id', default=None)
+            video = self._download_json(
+                'https://www.tmz.com/_/video/%s' % tmz_video_id, video_id,
+                fatal=False)
+            if video:
+                message = video['message']
+                info = {
+                    '_type': 'url_transparent',
+                    'title': message.get('title'),
+                    'description': message.get('description'),
+                    'timestamp': unified_timestamp(message.get('published_at')),
+                    'duration': int_or_none(message.get('duration')),
+                }
+                jwplatform_id = message.get('jwplayer_media_id')
+                if jwplatform_id:
+                    info.update({
+                        'url': 'jwplatform:%s' % jwplatform_id,
+                        'ie_key': JWPlatformIE.ie_key(),
+                    })
+                else:
+                    kaltura_entry_id = message.get('kaltura_entry_id') or video_id
+                    kaltura_partner_id = message.get('kaltura_partner_id') or '591531'
+                    info.update({
+                        'url': 'kaltura:%s:%s' % (kaltura_partner_id, kaltura_entry_id),
+                        'ie_key': KalturaIE.ie_key(),
+                    })
+                return info
+
+        return self.url_result(
+            'kaltura:591531:%s' % video_id, KalturaIE.ie_key(), video_id)
 
 
 class TMZArticleIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/]+)/?'
+    _VALID_URL = r'https?://(?:www\.)?tmz\.com/\d{4}/\d{2}/\d{2}/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'http://www.tmz.com/2015/04/19/bobby-brown-bobbi-kristina-awake-video-concert',
-        'md5': '3316ff838ae5bb7f642537825e1e90d2',
         'info_dict': {
-            'id': '0_6snoelag',
-            'ext': 'mov',
+            'id': 'PAKZa97W',
+            'ext': 'mp4',
             'title': 'Bobby Brown Tells Crowd ... Bobbi Kristina is Awake',
             'description': 'Bobby Brown stunned his audience during a concert Saturday night, when he told the crowd, "Bobbi is awake. She\'s watching me."',
-            'timestamp': 1429467813,
+            'timestamp': 1429466400,
             'upload_date': '20150419',
-            'uploader_id': 'batchUser',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         webpage = self._download_webpage(url, video_id)
+
+        tmz_url = self._search_regex(
+            r'clickLink\s*\(\s*["\'](?P<url>%s)' % TMZIE._VALID_URL, webpage,
+            'video id', default=None, group='url')
+        if tmz_url:
+            return self.url_result(tmz_url, ie=TMZIE.ie_key())
+
         embedded_video_info = self._parse_json(self._html_search_regex(
             r'tmzVideoEmbed\(({.+?})\);', webpage, 'embedded video info'),
             video_id)
 
         return self.url_result(
-            'http://www.tmz.com/videos/%s/' % embedded_video_info['id'])
+            'http://www.tmz.com/videos/%s/' % embedded_video_info['id'],
+            ie=TMZIE.ie_key())
@@ -153,6 +153,7 @@ class TrovoVodIE(TrovoBaseIE):
                 'protocol': 'm3u8_native',
                 'tbr': int_or_none(play_info.get('bitrate')),
                 'url': play_url,
+                'http_headers': {'Origin': 'https://trovo.live'},
             })
         self._sort_formats(formats)
 
@@ -74,6 +74,12 @@ class TV2DKIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)
 
         entries = []
+
+        def add_entry(partner_id, kaltura_id):
+            entries.append(self.url_result(
+                'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
+                video_id=kaltura_id))
+
         for video_el in re.findall(r'(?s)<[^>]+\bdata-entryid\s*=[^>]*>', webpage):
             video = extract_attributes(video_el)
             kaltura_id = video.get('data-entryid')
@@ -82,9 +88,14 @@ class TV2DKIE(InfoExtractor):
             partner_id = video.get('data-partnerid')
             if not partner_id:
                 continue
-            entries.append(self.url_result(
-                'kaltura:%s:%s' % (partner_id, kaltura_id), 'Kaltura',
-                video_id=kaltura_id))
+            add_entry(partner_id, kaltura_id)
+        if not entries:
+            kaltura_id = self._search_regex(
+                r'entry_id\s*:\s*["\']([0-9a-z_]+)', webpage, 'kaltura id')
+            partner_id = self._search_regex(
+                (r'\\u002Fp\\u002F(\d+)\\u002F', r'/p/(\d+)/'), webpage,
+                'partner id')
+            add_entry(partner_id, kaltura_id)
         return self.playlist_result(entries)
 
 
@@ -25,6 +25,10 @@ class TVerIE(InfoExtractor):
     }, {
         'url': 'https://tver.jp/episode/79622438',
         'only_matching': True,
+    }, {
+        # subtitle = ' '
+        'url': 'https://tver.jp/corner/f0068870',
+        'only_matching': True,
     }]
     _TOKEN = None
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
@@ -40,28 +44,18 @@ class TVerIE(InfoExtractor):
             query={'token': self._TOKEN})['main']
         p_id = main['publisher_id']
         service = remove_start(main['service'], 'ts_')
-        info = {
-            '_type': 'url_transparent',
-            'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
-            'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
-        }
 
-        if service == 'cx':
-            info.update({
-                'title': main.get('subtitle') or main['title'],
-                'url': 'https://i.fod.fujitv.co.jp/plus7/web/%s/%s.html' % (p_id[:4], p_id),
-                'ie_key': 'FujiTVFODPlus7',
-            })
-        else:
         r_id = main['reference_id']
         if service not in ('tx', 'russia2018', 'sebare2018live', 'gorin'):
             r_id = 'ref:' + r_id
         bc_url = smuggle_url(
             self.BRIGHTCOVE_URL_TEMPLATE % (p_id, r_id),
             {'geo_countries': ['JP']})
-            info.update({
+
+        return {
+            '_type': 'url_transparent',
+            'description': try_get(main, lambda x: x['note'][0]['text'], compat_str),
+            'episode_number': int_or_none(try_get(main, lambda x: x['ext']['episode_number'])),
             'url': bc_url,
             'ie_key': 'BrightcoveNew',
-            })
-
-        return info
+        }
@@ -19,6 +19,7 @@ from ..utils import (
     strip_or_none,
     unified_timestamp,
     update_url_query,
+    url_or_none,
     xpath_text,
 )
 
@@ -52,6 +53,9 @@ class TwitterBaseIE(InfoExtractor):
             return [f]
 
     def _extract_formats_from_vmap_url(self, vmap_url, video_id):
+        vmap_url = url_or_none(vmap_url)
+        if not vmap_url:
+            return []
         vmap_data = self._download_xml(vmap_url, video_id)
         formats = []
         urls = []
@@ -21,6 +21,11 @@ class URPlayIE(InfoExtractor):
             'description': 'md5:5344508a52aa78c1ced6c1b8b9e44e9a',
             'timestamp': 1513292400,
             'upload_date': '20171214',
+            'series': 'UR Samtiden - Livet, universum och rymdens märkliga musik',
+            'duration': 2269,
+            'categories': ['Kultur & historia'],
+            'tags': ['Kritiskt tänkande', 'Vetenskap', 'Vetenskaplig verksamhet'],
+            'episode': 'Om vetenskap, kritiskt tänkande och motstånd',
         },
     }, {
         'url': 'https://urskola.se/Produkter/190031-Tripp-Trapp-Trad-Sovkudde',
@@ -31,6 +36,10 @@ class URPlayIE(InfoExtractor):
             'description': 'md5:b86bffdae04a7e9379d1d7e5947df1d1',
             'timestamp': 1440086400,
             'upload_date': '20150820',
+            'series': 'Tripp, Trapp, Träd',
+            'duration': 865,
+            'tags': ['Sova'],
+            'episode': 'Sovkudde',
         },
     }, {
         'url': 'http://urskola.se/Produkter/155794-Smasagor-meankieli-Grodan-i-vida-varlden',
@@ -41,9 +50,11 @@ class URPlayIE(InfoExtractor):
         video_id = self._match_id(url)
         url = url.replace('skola.se/Produkter', 'play.se/program')
         webpage = self._download_webpage(url, video_id)
-        urplayer_data = self._parse_json(self._html_search_regex(
+        vid = int(video_id)
+        accessible_episodes = self._parse_json(self._html_search_regex(
             r'data-react-class="routes/Product/components/ProgramContainer/ProgramContainer"[^>]+data-react-props="({.+?})"',
-            webpage, 'urplayer data'), video_id)['accessibleEpisodes'][0]
+            webpage, 'urplayer data'), video_id)['accessibleEpisodes']
+        urplayer_data = next(e for e in accessible_episodes if e.get('id') == vid)
         episode = urplayer_data['title']
         raw_streaming_info = urplayer_data['streamingInfo']['raw']
         host = self._download_json(
@@ -23,6 +23,8 @@ class VGTVIE(XstreamIE):
         'fvn.no/fvntv': 'fvntv',
         'aftenposten.no/webtv': 'aptv',
         'ap.vgtv.no/webtv': 'aptv',
+        'tv.aftonbladet.se': 'abtv',
+        # obsolete URL schemas, kept in order to save one HTTP redirect
         'tv.aftonbladet.se/abtv': 'abtv',
         'www.aftonbladet.se/tv': 'abtv',
     }
@@ -140,6 +142,10 @@ class VGTVIE(XstreamIE):
             'url': 'http://www.vgtv.no/#!/video/127205/inside-the-mind-of-favela-funk',
             'only_matching': True,
         },
+        {
+            'url': 'https://tv.aftonbladet.se/video/36015/vulkanutbrott-i-rymden-nu-slapper-nasa-bilderna',
+            'only_matching': True,
+        },
         {
             'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
             'only_matching': True,
@@ -3,7 +3,6 @@ from __future__ import unicode_literals
 
 import base64
 import functools
-import json
 import re
 import itertools
 
@@ -17,14 +16,14 @@ from ..compat import (
 from ..utils import (
     clean_html,
     determine_ext,
-    dict_get,
     ExtractorError,
+    get_element_by_class,
     js_to_json,
     int_or_none,
     merge_dicts,
     OnDemandPagedList,
     parse_filesize,
-    RegexNotFoundError,
+    parse_iso8601,
     sanitized_Request,
     smuggle_url,
     std_headers,
@@ -74,25 +73,28 @@ class VimeoBaseInfoExtractor(InfoExtractor):
                     expected=True)
             raise ExtractorError('Unable to log in')
 
-    def _verify_video_password(self, url, video_id, webpage):
+    def _get_video_password(self):
         password = self._downloader.params.get('videopassword')
         if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        data = urlencode_postdata({
-            'password': password,
-            'token': token,
-        })
+            raise ExtractorError(
+                'This video is protected by a password, use the --video-password option',
+                expected=True)
+        return password
+
+    def _verify_video_password(self, url, video_id, password, token, vuid):
         if url.startswith('http://'):
             # vimeo only supports https now, but the user can give an http url
             url = url.replace('http://', 'https://')
-        password_request = sanitized_Request(url + '/password', data)
-        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
-        password_request.add_header('Referer', url)
         self._set_vimeo_cookie('vuid', vuid)
         return self._download_webpage(
-            password_request, video_id,
-            'Verifying the password', 'Wrong password')
+            url + '/password', video_id, 'Verifying the password',
+            'Wrong password', data=urlencode_postdata({
+                'password': password,
+                'token': token,
+            }), headers={
+                'Content-Type': 'application/x-www-form-urlencoded',
+                'Referer': url,
+            })
 
     def _extract_xsrft_and_vuid(self, webpage):
         xsrft = self._search_regex(
@@ -123,10 +125,11 @@ class VimeoBaseInfoExtractor(InfoExtractor):
         video_title = video_data['title']
         live_event = video_data.get('live_event') or {}
         is_live = live_event.get('status') == 'started'
+        request = config.get('request') or {}
 
         formats = []
-        config_files = video_data.get('files') or config['request'].get('files', {})
-        for f in config_files.get('progressive', []):
+        config_files = video_data.get('files') or request.get('files') or {}
+        for f in (config_files.get('progressive') or []):
             video_url = f.get('url')
             if not video_url:
                 continue
@@ -142,7 +145,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
         # TODO: fix handling of 308 status code returned for live archive manifest requests
         sep_pattern = r'/sep/video/'
         for files_type in ('hls', 'dash'):
-            for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
+            for cdn_name, cdn_data in (try_get(config_files, lambda x: x[files_type]['cdns']) or {}).items():
                 manifest_url = cdn_data.get('url')
                 if not manifest_url:
                     continue
@@ -188,9 +191,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
                 f['preference'] = -40
 
         subtitles = {}
-        text_tracks = config['request'].get('text_tracks')
-        if text_tracks:
-            for tt in text_tracks:
+        for tt in (request.get('text_tracks') or []):
             subtitles[tt['lang']] = [{
                 'ext': 'vtt',
                 'url': urljoin('https://vimeo.com', tt['url']),
@@ -198,7 +199,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
 
         thumbnails = []
         if not is_live:
-            for key, thumb in video_data.get('thumbs', {}).items():
+            for key, thumb in (video_data.get('thumbs') or {}).items():
                 thumbnails.append({
                     'id': key,
                     'width': int_or_none(key),
@@ -278,7 +279,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                             )?
                         (?:videos?/)?
                         (?P<id>[0-9]+)
-                        (?:/[\da-f]+)?
+                        (?:/(?P<unlisted_hash>[\da-f]{10}))?
                         /?(?:[?&].*)?(?:[#].*)?$
                     '''
     IE_NAME = 'vimeo'
@@ -318,6 +319,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'duration': 1595,
                 'upload_date': '20130610',
                 'timestamp': 1370893156,
+                'license': 'by',
             },
             'params': {
                 'format': 'best[protocol=https]',
@@ -331,9 +333,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'id': '54469442',
                 'ext': 'mp4',
                 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
-                'uploader': 'The BLN & Business of Software',
-                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
-                'uploader_id': 'theblnbusinessofsoftware',
+                'uploader': 'Business of Software',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
+                'uploader_id': 'businessofsoftware',
                 'duration': 3610,
                 'description': None,
             },
@@ -396,6 +398,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'uploader_id': 'staff',
                 'uploader': 'Vimeo Staff',
                 'duration': 62,
+                'subtitles': {
+                    'de': [{'ext': 'vtt'}],
+                    'en': [{'ext': 'vtt'}],
+                    'es': [{'ext': 'vtt'}],
+                    'fr': [{'ext': 'vtt'}],
+                },
             }
         },
         {
@@ -468,6 +476,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'skip_download': True,
             },
             'expected_warnings': ['Unable to download JSON metadata'],
+            'skip': 'this page is no longer available.',
         },
         {
             'url': 'http://player.vimeo.com/video/68375962',
@@ -550,9 +559,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
         return urls[0] if urls else None
 
     def _verify_player_video_password(self, url, video_id, headers):
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This video is protected by a password, use the --video-password option', expected=True)
+        password = self._get_video_password()
         data = urlencode_postdata({
             'password': base64.b64encode(password.encode()),
         })
@ -569,6 +576,37 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
def _real_initialize(self):
|
def _real_initialize(self):
|
||||||
self._login()
|
self._login()
|
||||||
|
|
||||||
|
def _extract_from_api(self, video_id, unlisted_hash=None):
|
||||||
|
token = self._download_json(
|
||||||
|
'https://vimeo.com/_rv/jwt', video_id, headers={
|
||||||
|
'X-Requested-With': 'XMLHttpRequest'
|
||||||
|
})['token']
|
||||||
|
api_url = 'https://api.vimeo.com/videos/' + video_id
|
||||||
|
if unlisted_hash:
|
||||||
|
api_url += ':' + unlisted_hash
|
||||||
|
video = self._download_json(
|
||||||
|
api_url, video_id, headers={
|
||||||
|
'Authorization': 'jwt ' + token,
|
||||||
|
}, query={
|
||||||
|
'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
|
||||||
|
})
|
||||||
|
info = self._parse_config(self._download_json(
|
||||||
|
video['config_url'], video_id), video_id)
|
||||||
|
self._vimeo_sort_formats(info['formats'])
|
||||||
|
get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
|
||||||
|
info.update({
|
||||||
|
'description': video.get('description'),
|
||||||
|
'license': video.get('license'),
|
||||||
|
'release_timestamp': get_timestamp('release'),
|
||||||
|
'timestamp': get_timestamp('created'),
|
||||||
|
'view_count': int_or_none(try_get(video, lambda x: x['stats']['plays'])),
|
||||||
|
})
|
||||||
|
connections = try_get(
|
||||||
|
video, lambda x: x['metadata']['connections'], dict) or {}
|
||||||
|
for k in ('comment', 'like'):
|
||||||
|
info[k + '_count'] = int_or_none(try_get(connections, lambda x: x[k + 's']['total']))
|
||||||
|
return info
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
url, data = unsmuggle_url(url, {})
|
url, data = unsmuggle_url(url, {})
|
||||||
headers = std_headers.copy()
|
headers = std_headers.copy()
|
||||||
@ -577,22 +615,19 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
if 'Referer' not in headers:
|
if 'Referer' not in headers:
|
||||||
headers['Referer'] = url
|
headers['Referer'] = url
|
||||||
|
|
||||||
channel_id = self._search_regex(
|
mobj = re.match(self._VALID_URL, url).groupdict()
|
||||||
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
|
video_id, unlisted_hash = mobj['id'], mobj.get('unlisted_hash')
|
||||||
|
if unlisted_hash:
|
||||||
|
return self._extract_from_api(video_id, unlisted_hash)
|
||||||
|
|
||||||
# Extract ID from URL
|
|
||||||
video_id = self._match_id(url)
|
|
||||||
orig_url = url
|
orig_url = url
|
||||||
is_pro = 'vimeopro.com/' in url
|
is_pro = 'vimeopro.com/' in url
|
||||||
is_player = '://player.vimeo.com/video/' in url
|
|
||||||
if is_pro:
|
if is_pro:
|
||||||
# some videos require portfolio_id to be present in player url
|
# some videos require portfolio_id to be present in player url
|
||||||
# https://github.com/ytdl-org/youtube-dl/issues/20070
|
# https://github.com/ytdl-org/youtube-dl/issues/20070
|
||||||
url = self._extract_url(url, self._download_webpage(url, video_id))
|
url = self._extract_url(url, self._download_webpage(url, video_id))
|
||||||
if not url:
|
if not url:
|
||||||
url = 'https://vimeo.com/' + video_id
|
url = 'https://vimeo.com/' + video_id
|
||||||
elif is_player:
|
|
||||||
url = 'https://player.vimeo.com/video/' + video_id
|
|
||||||
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
|
elif any(p in url for p in ('play_redirect_hls', 'moogaloop.swf')):
|
||||||
url = 'https://vimeo.com/' + video_id
|
url = 'https://vimeo.com/' + video_id
|
||||||
|
|
||||||
@ -612,14 +647,25 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
expected=True)
|
expected=True)
|
||||||
raise
|
raise
|
||||||
|
|
||||||
# Now we begin extracting as much information as we can from what we
|
if '://player.vimeo.com/video/' in url:
|
||||||
# retrieved. First we extract the information common to all extractors,
|
config = self._parse_json(self._search_regex(
|
||||||
# and latter we extract those that are Vimeo specific.
|
r'\bconfig\s*=\s*({.+?})\s*;', webpage, 'info section'), video_id)
|
||||||
self.report_extraction(video_id)
|
if config.get('view') == 4:
|
||||||
|
config = self._verify_player_video_password(
|
||||||
|
redirect_url, video_id, headers)
|
||||||
|
info = self._parse_config(config, video_id)
|
||||||
|
self._vimeo_sort_formats(info['formats'])
|
||||||
|
return info
|
||||||
|
|
||||||
|
if re.search(r'<form[^>]+?id="pw_form"', webpage):
|
||||||
|
video_password = self._get_video_password()
|
||||||
|
token, vuid = self._extract_xsrft_and_vuid(webpage)
|
||||||
|
webpage = self._verify_video_password(
|
||||||
|
redirect_url, video_id, video_password, token, vuid)
|
||||||
|
|
||||||
vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
|
vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
|
||||||
if vimeo_config:
|
if vimeo_config:
|
||||||
seed_status = vimeo_config.get('seed_status', {})
|
seed_status = vimeo_config.get('seed_status') or {}
|
||||||
if seed_status.get('state') == 'failed':
|
if seed_status.get('state') == 'failed':
|
||||||
raise ExtractorError(
|
raise ExtractorError(
|
||||||
'%s said: %s' % (self.IE_NAME, seed_status['title']),
|
'%s said: %s' % (self.IE_NAME, seed_status['title']),
|
||||||
@ -628,67 +674,40 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
cc_license = None
|
cc_license = None
|
||||||
timestamp = None
|
timestamp = None
|
||||||
video_description = None
|
video_description = None
|
||||||
|
info_dict = {}
|
||||||
|
|
||||||
# Extract the config JSON
|
channel_id = self._search_regex(
|
||||||
try:
|
r'vimeo\.com/channels/([^/]+)', url, 'channel id', default=None)
|
||||||
try:
|
if channel_id:
|
||||||
config_url = self._html_search_regex(
|
config_url = self._html_search_regex(
|
||||||
r' data-config-url="(.+?)"', webpage,
|
r'\bdata-config-url="([^"]+)"', webpage, 'config URL')
|
||||||
'config URL', default=None)
|
video_description = clean_html(get_element_by_class('description', webpage))
|
||||||
if not config_url:
|
info_dict.update({
|
||||||
# Sometimes new react-based page is served instead of old one that require
|
'channel_id': channel_id,
|
||||||
# different config URL extraction approach (see
|
'channel_url': 'https://vimeo.com/channels/' + channel_id,
|
||||||
# https://github.com/ytdl-org/youtube-dl/pull/7209)
|
})
|
||||||
|
else:
|
||||||
page_config = self._parse_json(self._search_regex(
|
page_config = self._parse_json(self._search_regex(
|
||||||
r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
|
r'vimeo\.(?:clip|vod_title)_page_config\s*=\s*({.+?});',
|
||||||
webpage, 'page config'), video_id)
|
webpage, 'page config', default='{}'), video_id, fatal=False)
|
||||||
|
if not page_config:
|
||||||
|
return self._extract_from_api(video_id)
|
||||||
config_url = page_config['player']['config_url']
|
config_url = page_config['player']['config_url']
|
||||||
cc_license = page_config.get('cc_license')
|
cc_license = page_config.get('cc_license')
|
||||||
timestamp = try_get(
|
clip = page_config.get('clip') or {}
|
||||||
page_config, lambda x: x['clip']['uploaded_on'],
|
timestamp = clip.get('uploaded_on')
|
||||||
compat_str)
|
video_description = clean_html(
|
||||||
video_description = clean_html(dict_get(
|
clip.get('description') or page_config.get('description_html_escaped'))
|
||||||
page_config, ('description', 'description_html_escaped')))
|
|
||||||
config = self._download_json(config_url, video_id)
|
config = self._download_json(config_url, video_id)
|
||||||
except RegexNotFoundError:
|
|
||||||
# For pro videos or player.vimeo.com urls
|
|
||||||
# We try to find out to which variable is assigned the config dic
|
|
||||||
m_variable_name = re.search(r'(\w)\.video\.id', webpage)
|
|
||||||
if m_variable_name is not None:
|
|
||||||
config_re = [r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))]
|
|
||||||
else:
|
|
||||||
config_re = [r' = {config:({.+?}),assets:', r'(?:[abc])=({.+?});']
|
|
||||||
config_re.append(r'\bvar\s+r\s*=\s*({.+?})\s*;')
|
|
||||||
config_re.append(r'\bconfig\s*=\s*({.+?})\s*;')
|
|
||||||
config = self._search_regex(config_re, webpage, 'info section',
|
|
||||||
flags=re.DOTALL)
|
|
||||||
config = json.loads(config)
|
|
||||||
except Exception as e:
|
|
||||||
if re.search('The creator of this video has not given you permission to embed it on this domain.', webpage):
|
|
||||||
raise ExtractorError('The author has restricted the access to this video, try with the "--referer" option')
|
|
||||||
|
|
||||||
if re.search(r'<form[^>]+?id="pw_form"', webpage) is not None:
|
|
||||||
if '_video_password_verified' in data:
|
|
||||||
raise ExtractorError('video password verification failed!')
|
|
||||||
self._verify_video_password(redirect_url, video_id, webpage)
|
|
||||||
return self._real_extract(
|
|
||||||
smuggle_url(redirect_url, {'_video_password_verified': 'verified'}))
|
|
||||||
else:
|
|
||||||
raise ExtractorError('Unable to extract info section',
|
|
||||||
cause=e)
|
|
||||||
else:
|
|
||||||
if config.get('view') == 4:
|
|
||||||
config = self._verify_player_video_password(redirect_url, video_id, headers)
|
|
||||||
|
|
||||||
video = config.get('video') or {}
|
video = config.get('video') or {}
|
||||||
vod = video.get('vod') or {}
|
vod = video.get('vod') or {}
|
||||||
|
|
||||||
def is_rented():
|
def is_rented():
|
||||||
if '>You rented this title.<' in webpage:
|
if '>You rented this title.<' in webpage:
|
||||||
return True
|
return True
|
||||||
if config.get('user', {}).get('purchased'):
|
if try_get(config, lambda x: x['user']['purchased']):
|
||||||
return True
|
return True
|
||||||
for purchase_option in vod.get('purchase_options', []):
|
for purchase_option in (vod.get('purchase_options') or []):
|
||||||
if purchase_option.get('purchased'):
|
if purchase_option.get('purchased'):
|
||||||
return True
|
return True
|
||||||
label = purchase_option.get('label_string')
|
label = purchase_option.get('label_string')
|
||||||
@ -703,14 +722,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
'https://player.vimeo.com/player/%s' % feature_id,
|
'https://player.vimeo.com/player/%s' % feature_id,
|
||||||
{'force_feature_id': True}), 'Vimeo')
|
{'force_feature_id': True}), 'Vimeo')
|
||||||
|
|
||||||
# Extract video description
|
|
||||||
if not video_description:
|
|
||||||
video_description = self._html_search_regex(
|
|
||||||
r'(?s)<div\s+class="[^"]*description[^"]*"[^>]*>(.*?)</div>',
|
|
||||||
webpage, 'description', default=None)
|
|
||||||
if not video_description:
|
if not video_description:
|
||||||
video_description = self._html_search_meta(
|
video_description = self._html_search_meta(
|
||||||
'description', webpage, default=None)
|
['description', 'og:description', 'twitter:description'],
|
||||||
|
webpage, default=None)
|
||||||
if not video_description and is_pro:
|
if not video_description and is_pro:
|
||||||
orig_webpage = self._download_webpage(
|
orig_webpage = self._download_webpage(
|
||||||
orig_url, video_id,
|
orig_url, video_id,
|
||||||
@ -719,25 +734,14 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
if orig_webpage:
|
if orig_webpage:
|
||||||
video_description = self._html_search_meta(
|
video_description = self._html_search_meta(
|
||||||
'description', orig_webpage, default=None)
|
'description', orig_webpage, default=None)
|
||||||
if not video_description and not is_player:
|
if not video_description:
|
||||||
self._downloader.report_warning('Cannot find video description')
|
self._downloader.report_warning('Cannot find video description')
|
||||||
|
|
||||||
# Extract upload date
|
|
||||||
if not timestamp:
|
if not timestamp:
|
||||||
timestamp = self._search_regex(
|
timestamp = self._search_regex(
|
||||||
r'<time[^>]+datetime="([^"]+)"', webpage,
|
r'<time[^>]+datetime="([^"]+)"', webpage,
|
||||||
'timestamp', default=None)
|
'timestamp', default=None)
|
||||||
|
|
||||||
try:
|
|
||||||
view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
|
|
||||||
like_count = int(self._search_regex(r'UserLikes:(\d+)', webpage, 'like count'))
|
|
||||||
comment_count = int(self._search_regex(r'UserComments:(\d+)', webpage, 'comment count'))
|
|
||||||
except RegexNotFoundError:
|
|
||||||
# This info is only available in vimeo.com/{id} urls
|
|
||||||
view_count = None
|
|
||||||
like_count = None
|
|
||||||
comment_count = None
|
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
|
|
||||||
source_format = self._extract_original_format(
|
source_format = self._extract_original_format(
|
||||||
@ -756,29 +760,20 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
|
r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
|
||||||
webpage, 'license', default=None, group='license')
|
webpage, 'license', default=None, group='license')
|
||||||
|
|
||||||
channel_url = 'https://vimeo.com/channels/%s' % channel_id if channel_id else None
|
info_dict.update({
|
||||||
|
|
||||||
info_dict = {
|
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'timestamp': unified_timestamp(timestamp),
|
'timestamp': unified_timestamp(timestamp),
|
||||||
'description': video_description,
|
'description': video_description,
|
||||||
'webpage_url': url,
|
'webpage_url': url,
|
||||||
'view_count': view_count,
|
|
||||||
'like_count': like_count,
|
|
||||||
'comment_count': comment_count,
|
|
||||||
'license': cc_license,
|
'license': cc_license,
|
||||||
'channel_id': channel_id,
|
})
|
||||||
'channel_url': channel_url,
|
|
||||||
}
|
|
||||||
|
|
||||||
info_dict = merge_dicts(info_dict, info_dict_config, json_ld)
|
return merge_dicts(info_dict, info_dict_config, json_ld)
|
||||||
|
|
||||||
return info_dict
|
|
||||||
|
|
||||||
|
|
||||||
class VimeoOndemandIE(VimeoIE):
|
class VimeoOndemandIE(VimeoIE):
|
||||||
IE_NAME = 'vimeo:ondemand'
|
IE_NAME = 'vimeo:ondemand'
|
||||||
_VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/([^/]+/)?(?P<id>[^/?#&]+)'
|
_VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?:[^/]+/)?(?P<id>[^/?#&]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
# ondemand video not available via https://vimeo.com/id
|
# ondemand video not available via https://vimeo.com/id
|
||||||
'url': 'https://vimeo.com/ondemand/20704',
|
'url': 'https://vimeo.com/ondemand/20704',
|
||||||
@ -939,11 +934,15 @@ class VimeoAlbumIE(VimeoBaseInfoExtractor):
|
|||||||
}
|
}
|
||||||
if hashed_pass:
|
if hashed_pass:
|
||||||
query['_hashed_pass'] = hashed_pass
|
query['_hashed_pass'] = hashed_pass
|
||||||
|
try:
|
||||||
videos = self._download_json(
|
videos = self._download_json(
|
||||||
'https://api.vimeo.com/albums/%s/videos' % album_id,
|
'https://api.vimeo.com/albums/%s/videos' % album_id,
|
||||||
album_id, 'Downloading page %d' % api_page, query=query, headers={
|
album_id, 'Downloading page %d' % api_page, query=query, headers={
|
||||||
'Authorization': 'jwt ' + authorization,
|
'Authorization': 'jwt ' + authorization,
|
||||||
})['data']
|
})['data']
|
||||||
|
except ExtractorError as e:
|
||||||
|
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
|
||||||
|
return
|
||||||
for video in videos:
|
for video in videos:
|
||||||
link = video.get('link')
|
link = video.get('link')
|
||||||
if not link:
|
if not link:
|
||||||
@ -1058,9 +1057,22 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
page_url, video_id = re.match(self._VALID_URL, url).groups()
|
page_url, video_id = re.match(self._VALID_URL, url).groups()
|
||||||
clip_data = self._download_json(
|
data = self._download_json(
|
||||||
page_url.replace('/review/', '/review/data/'),
|
page_url.replace('/review/', '/review/data/'), video_id)
|
||||||
video_id)['clipData']
|
if data.get('isLocked') is True:
|
||||||
|
video_password = self._get_video_password()
|
||||||
|
viewer = self._download_json(
|
||||||
|
'https://vimeo.com/_rv/viewer', video_id)
|
||||||
|
webpage = self._verify_video_password(
|
||||||
|
'https://vimeo.com/' + video_id, video_id,
|
||||||
|
video_password, viewer['xsrft'], viewer['vuid'])
|
||||||
|
clip_page_config = self._parse_json(self._search_regex(
|
||||||
|
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
|
||||||
|
webpage, 'clip page config'), video_id)
|
||||||
|
config_url = clip_page_config['player']['config_url']
|
||||||
|
clip_data = clip_page_config.get('clip') or {}
|
||||||
|
else:
|
||||||
|
clip_data = data['clipData']
|
||||||
config_url = clip_data['configUrl']
|
config_url = clip_data['configUrl']
|
||||||
config = self._download_json(config_url, video_id)
|
config = self._download_json(config_url, video_id)
|
||||||
info_dict = self._parse_config(config, video_id)
|
info_dict = self._parse_config(config, video_id)
|
||||||
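Review note on the Vimeo hunks above: several lookups change from `.get(key, {})` to `.get(key) or {}` (for example `seed_status` and `purchase_options`). This is presumably a guard against keys that are present but null. The sketch below shows the difference using a made-up payload; `config` and the `purchased` helper are hypothetical and only stand in for the extractor's own `try_get` lookup.

```python
# Illustrative only: `config` is a made-up payload, not real Vimeo data.
config = {'seed_status': None, 'user': None}

# dict.get() falls back to the default only when the key is missing,
# so an explicit None (JSON null) is returned unchanged.
assert config.get('seed_status', {}) is None

# `or {}` also replaces None (and any other falsy value) with a dict,
# which keeps the chained .get() call safe.
seed_status = config.get('seed_status') or {}
assert seed_status.get('state') is None


def purchased(cfg):
    """Hypothetical helper: null-safe check of cfg['user']['purchased']."""
    user = cfg.get('user') or {}
    return bool(user.get('purchased'))
```

With the old form, `config.get('user', {}).get('purchased')` would raise `AttributeError` on this payload, because the default is never used when the key exists.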
|
@ -106,7 +106,7 @@ class VLiveIE(VLiveBaseIE):
|
|||||||
raise ExtractorError('Unable to log in', expected=True)
|
raise ExtractorError('Unable to log in', expected=True)
|
||||||
|
|
||||||
def _call_api(self, path_template, video_id, fields=None):
|
def _call_api(self, path_template, video_id, fields=None):
|
||||||
query = {'appId': self._APP_ID, 'gcc': 'KR'}
|
query = {'appId': self._APP_ID, 'gcc': 'KR', 'platformType': 'PC'}
|
||||||
if fields:
|
if fields:
|
||||||
query['fields'] = fields
|
query['fields'] = fields
|
||||||
try:
|
try:
|
||||||
|
@ -7,6 +7,8 @@ from ..compat import compat_urllib_parse_unquote
|
|||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
|
try_get,
|
||||||
|
unified_timestamp,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@ -19,14 +21,17 @@ class VoxMediaVolumeIE(OnceIE):
|
|||||||
|
|
||||||
setup = self._parse_json(self._search_regex(
|
setup = self._parse_json(self._search_regex(
|
||||||
r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
|
r'setup\s*=\s*({.+});', webpage, 'setup'), video_id)
|
||||||
video_data = setup.get('video') or {}
|
player_setup = setup.get('player_setup') or setup
|
||||||
|
video_data = player_setup.get('video') or {}
|
||||||
|
formatted_metadata = video_data.get('formatted_metadata') or {}
|
||||||
info = {
|
info = {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': video_data.get('title_short'),
|
'title': player_setup.get('title') or video_data.get('title_short'),
|
||||||
'description': video_data.get('description_long') or video_data.get('description_short'),
|
'description': video_data.get('description_long') or video_data.get('description_short'),
|
||||||
'thumbnail': video_data.get('brightcove_thumbnail')
|
'thumbnail': formatted_metadata.get('thumbnail') or video_data.get('brightcove_thumbnail'),
|
||||||
|
'timestamp': unified_timestamp(formatted_metadata.get('video_publish_date')),
|
||||||
}
|
}
|
||||||
asset = setup.get('asset') or setup.get('params') or {}
|
asset = try_get(setup, lambda x: x['embed_assets']['chorus'], dict) or {}
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
hls_url = asset.get('hls_url')
|
hls_url = asset.get('hls_url')
|
||||||
@ -47,6 +52,7 @@ class VoxMediaVolumeIE(OnceIE):
|
|||||||
if formats:
|
if formats:
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
info['formats'] = formats
|
info['formats'] = formats
|
||||||
|
info['duration'] = int_or_none(asset.get('duration'))
|
||||||
return info
|
return info
|
||||||
|
|
||||||
for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
|
for provider_video_type in ('ooyala', 'youtube', 'brightcove'):
|
||||||
@ -84,7 +90,7 @@ class VoxMediaIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
# Volume embed, Youtube
|
# Volume embed, Youtube
|
||||||
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
|
'url': 'http://www.theverge.com/2014/10/21/7025853/google-nexus-6-hands-on-photos-video-android-phablet',
|
||||||
'md5': '4c8f4a0937752b437c3ebc0ed24802b5',
|
'md5': 'fd19aa0cf3a0eea515d4fd5c8c0e9d68',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'Gy8Md3Eky38',
|
'id': 'Gy8Md3Eky38',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
@ -93,6 +99,7 @@ class VoxMediaIE(InfoExtractor):
|
|||||||
'uploader_id': 'TheVerge',
|
'uploader_id': 'TheVerge',
|
||||||
'upload_date': '20141021',
|
'upload_date': '20141021',
|
||||||
'uploader': 'The Verge',
|
'uploader': 'The Verge',
|
||||||
|
'timestamp': 1413907200,
|
||||||
},
|
},
|
||||||
'add_ie': ['Youtube'],
|
'add_ie': ['Youtube'],
|
||||||
'skip': 'similar to the previous test',
|
'skip': 'similar to the previous test',
|
||||||
@ -100,13 +107,13 @@ class VoxMediaIE(InfoExtractor):
|
|||||||
# Volume embed, Youtube
|
# Volume embed, Youtube
|
||||||
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
|
'url': 'http://www.vox.com/2016/3/31/11336640/mississippi-lgbt-religious-freedom-bill',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'YCjDnX-Xzhg',
|
'id': '22986359b',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
|
'title': "Mississippi's laws are so bad that its anti-LGBTQ law isn't needed to allow discrimination",
|
||||||
'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
|
'description': 'md5:fc1317922057de31cd74bce91eb1c66c',
|
||||||
'uploader_id': 'voxdotcom',
|
|
||||||
'upload_date': '20150915',
|
'upload_date': '20150915',
|
||||||
'uploader': 'Vox',
|
'timestamp': 1442332800,
|
||||||
|
'duration': 285,
|
||||||
},
|
},
|
||||||
'add_ie': ['Youtube'],
|
'add_ie': ['Youtube'],
|
||||||
'skip': 'similar to the previous test',
|
'skip': 'similar to the previous test',
|
||||||
@ -160,6 +167,9 @@ class VoxMediaIE(InfoExtractor):
|
|||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
'title': 'Post-Post-PC CEO: The Full Code Conference Video of Microsoft\'s Satya Nadella',
|
'title': 'Post-Post-PC CEO: The Full Code Conference Video of Microsoft\'s Satya Nadella',
|
||||||
'description': 'The longtime veteran was chosen earlier this year as the software giant\'s third leader in its history.',
|
'description': 'The longtime veteran was chosen earlier this year as the software giant\'s third leader in its history.',
|
||||||
|
'timestamp': 1402938000,
|
||||||
|
'upload_date': '20140616',
|
||||||
|
'duration': 4114,
|
||||||
},
|
},
|
||||||
'add_ie': ['VoxMediaVolume'],
|
'add_ie': ['VoxMediaVolume'],
|
||||||
}]
|
}]
|
||||||
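Review note on the VoxMediaVolume hunk above: the new code reads metadata from `setup['player_setup']` when it exists and otherwise falls back to `setup` itself. A minimal sketch of that fallback, assuming made-up payloads; `pick_title` is a hypothetical helper, not part of the extractor.

```python
def pick_title(setup):
    """Hypothetical helper mirroring the title lookup order in the hunk."""
    # Use the nested 'player_setup' object when present, else the top level.
    player_setup = setup.get('player_setup') or setup
    video_data = player_setup.get('video') or {}
    # Prefer the player-level title, then the short video title.
    return player_setup.get('title') or video_data.get('title_short')


# Made-up payloads for illustration only.
nested = {'player_setup': {'title': 'Nested title', 'video': {'title_short': 'Short'}}}
flat = {'video': {'title_short': 'Short'}}
```

Here `pick_title(nested)` yields the player-level title, while `pick_title(flat)` falls through to `title_short`.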
|
@ -75,12 +75,15 @@ class VVVVIDIE(InfoExtractor):
|
|||||||
'https://www.vvvvid.it/user/login',
|
'https://www.vvvvid.it/user/login',
|
||||||
None, headers=self.geo_verification_headers())['data']['conn_id']
|
None, headers=self.geo_verification_headers())['data']['conn_id']
|
||||||
|
|
||||||
def _download_info(self, show_id, path, video_id, fatal=True):
|
def _download_info(self, show_id, path, video_id, fatal=True, query=None):
|
||||||
|
q = {
|
||||||
|
'conn_id': self._conn_id,
|
||||||
|
}
|
||||||
|
if query:
|
||||||
|
q.update(query)
|
||||||
response = self._download_json(
|
response = self._download_json(
|
||||||
'https://www.vvvvid.it/vvvvid/ondemand/%s/%s' % (show_id, path),
|
'https://www.vvvvid.it/vvvvid/ondemand/%s/%s' % (show_id, path),
|
||||||
video_id, headers=self.geo_verification_headers(), query={
|
video_id, headers=self.geo_verification_headers(), query=q, fatal=fatal)
|
||||||
'conn_id': self._conn_id,
|
|
||||||
}, fatal=fatal)
|
|
||||||
if not (response or fatal):
|
if not (response or fatal):
|
||||||
return
|
return
|
||||||
if response.get('result') == 'error':
|
if response.get('result') == 'error':
|
||||||
@ -98,7 +101,8 @@ class VVVVIDIE(InfoExtractor):
|
|||||||
show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
|
show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
|
||||||
|
|
||||||
response = self._download_info(
|
response = self._download_info(
|
||||||
show_id, 'season/%s' % season_id, video_id)
|
show_id, 'season/%s' % season_id,
|
||||||
|
video_id, query={'video_id': video_id})
|
||||||
|
|
||||||
vid = int(video_id)
|
vid = int(video_id)
|
||||||
video_data = list(filter(
|
video_data = list(filter(
|
||||||
@ -178,8 +182,8 @@ class VVVVIDIE(InfoExtractor):
|
|||||||
if not embed_code:
|
if not embed_code:
|
||||||
continue
|
continue
|
||||||
embed_code = ds(embed_code)
|
embed_code = ds(embed_code)
|
||||||
if video_type in ('video/rcs', 'video/kenc'):
|
|
||||||
if video_type == 'video/kenc':
|
if video_type == 'video/kenc':
|
||||||
|
embed_code = re.sub(r'https?(://[^/]+)/z/', r'https\1/i/', embed_code).replace('/manifest.f4m', '/master.m3u8')
|
||||||
kenc = self._download_json(
|
kenc = self._download_json(
|
||||||
'https://www.vvvvid.it/kenc', video_id, query={
|
'https://www.vvvvid.it/kenc', video_id, query={
|
||||||
'action': 'kt',
|
'action': 'kt',
|
||||||
@ -189,6 +193,9 @@ class VVVVIDIE(InfoExtractor):
|
|||||||
kenc_message = kenc.get('message')
|
kenc_message = kenc.get('message')
|
||||||
if kenc_message:
|
if kenc_message:
|
||||||
embed_code += '?' + ds(kenc_message)
|
embed_code += '?' + ds(kenc_message)
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
embed_code, video_id, 'mp4', m3u8_id='hls', fatal=False))
|
||||||
|
elif video_type == 'video/rcs':
|
||||||
formats.extend(self._extract_akamai_formats(embed_code, video_id))
|
formats.extend(self._extract_akamai_formats(embed_code, video_id))
|
||||||
elif video_type == 'video/youtube':
|
elif video_type == 'video/youtube':
|
||||||
info.update({
|
info.update({
|
||||||
@ -247,9 +254,13 @@ class VVVVIDShowIE(VVVVIDIE):
|
|||||||
show_info = self._download_info(
|
show_info = self._download_info(
|
||||||
show_id, 'info/', show_title, fatal=False)
|
show_id, 'info/', show_title, fatal=False)
|
||||||
|
|
||||||
|
if not show_title:
|
||||||
|
base_url += "/title"
|
||||||
|
|
||||||
entries = []
|
entries = []
|
||||||
for season in (seasons or []):
|
for season in (seasons or []):
|
||||||
episodes = season.get('episodes') or []
|
episodes = season.get('episodes') or []
|
||||||
|
playlist_title = season.get('name') or show_info.get('title')
|
||||||
for episode in episodes:
|
for episode in episodes:
|
||||||
if episode.get('playable') is False:
|
if episode.get('playable') is False:
|
||||||
continue
|
continue
|
||||||
@ -259,12 +270,13 @@ class VVVVIDShowIE(VVVVIDIE):
|
|||||||
continue
|
continue
|
||||||
info = self._extract_common_video_info(episode)
|
info = self._extract_common_video_info(episode)
|
||||||
info.update({
|
info.update({
|
||||||
'_type': 'url',
|
'_type': 'url_transparent',
|
||||||
'ie_key': VVVVIDIE.ie_key(),
|
'ie_key': VVVVIDIE.ie_key(),
|
||||||
'url': '/'.join([base_url, season_id, video_id]),
|
'url': '/'.join([base_url, season_id, video_id]),
|
||||||
'title': episode.get('title'),
|
'title': episode.get('title'),
|
||||||
'description': episode.get('description'),
|
'description': episode.get('description'),
|
||||||
'season_id': season_id,
|
'season_id': season_id,
|
||||||
|
'playlist_title': playlist_title,
|
||||||
})
|
})
|
||||||
entries.append(info)
|
entries.append(info)
|
||||||
|
|
||||||
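Review note on the VVVVID hunks above: two small transformations are introduced, the rewrite of an Akamai-style `/z/.../manifest.f4m` URL into an `/i/.../master.m3u8` URL for `video/kenc`, and the merge of an optional `query` into the default `conn_id` query. The sketch below reproduces both in isolation. The function names and the example URL are assumptions for illustration; only the regex, the replacement strings and the merge logic come from the diff.

```python
import re


def to_hls_url(embed_code):
    """Hypothetical wrapper around the URL rewrite used for 'video/kenc'."""
    # Force https, swap the '/z/' path prefix for '/i/',
    # then point at the HLS master playlist instead of the HDS manifest.
    return re.sub(
        r'https?(://[^/]+)/z/', r'https\1/i/', embed_code
    ).replace('/manifest.f4m', '/master.m3u8')


def build_query(conn_id, query=None):
    """Hypothetical wrapper around the query merge in _download_info."""
    q = {'conn_id': conn_id}
    if query:
        q.update(query)
    return q


# Made-up URL for illustration only.
example = 'http://example.com/z/path/video/manifest.f4m'
```

Note that caller-supplied keys win over the defaults in `build_query`, since `update` overwrites existing entries.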
|
@ -4,9 +4,10 @@ from __future__ import unicode_literals
|
|||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import compat_str
|
from ..compat import compat_str
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
unified_strdate,
|
ExtractorError,
|
||||||
HEADRequest,
|
|
||||||
int_or_none,
|
int_or_none,
|
||||||
|
try_get,
|
||||||
|
unified_strdate,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@ -29,6 +30,7 @@ class WatIE(InfoExtractor):
|
|||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
'expected_warnings': ['HTTP Error 404'],
|
'expected_warnings': ['HTTP Error 404'],
|
||||||
|
'skip': 'This content is no longer available',
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
'url': 'http://www.wat.tv/video/gregory-lemarchal-voix-ange-6z1v7_6ygkj_.html',
|
'url': 'http://www.wat.tv/video/gregory-lemarchal-voix-ange-6z1v7_6ygkj_.html',
|
||||||
@ -40,8 +42,10 @@ class WatIE(InfoExtractor):
|
|||||||
'upload_date': '20140816',
|
'upload_date': '20140816',
|
||||||
},
|
},
|
||||||
'expected_warnings': ["Ce contenu n'est pas disponible pour l'instant."],
|
'expected_warnings': ["Ce contenu n'est pas disponible pour l'instant."],
|
||||||
|
'skip': 'This content is no longer available',
|
||||||
},
|
},
|
||||||
]
|
]
|
||||||
|
_GEO_BYPASS = False
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
@ -49,71 +53,54 @@ class WatIE(InfoExtractor):
|
|||||||
|
|
||||||
# 'contentv4' is used in the website, but it also returns the related
|
# 'contentv4' is used in the website, but it also returns the related
|
||||||
# videos, we don't need them
|
# videos, we don't need them
|
||||||
|
# video_data = self._download_json(
|
||||||
|
# 'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
|
||||||
video_data = self._download_json(
|
video_data = self._download_json(
|
||||||
'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
|
'https://mediainfo.tf1.fr/mediainfocombo/' + video_id,
|
||||||
|
video_id, query={'context': 'MYTF1'})
|
||||||
video_info = video_data['media']
|
video_info = video_data['media']
|
||||||
|
|
||||||
error_desc = video_info.get('error_desc')
|
error_desc = video_info.get('error_desc')
|
||||||
if error_desc:
|
if error_desc:
|
||||||
self.report_warning(
|
if video_info.get('error_code') == 'GEOBLOCKED':
|
||||||
'%s returned error: %s' % (self.IE_NAME, error_desc))
|
self.raise_geo_restricted(error_desc, video_info.get('geoList'))
|
||||||
|
raise ExtractorError(error_desc, expected=True)
|
||||||
|
|
||||||
chapters = video_info['chapters']
|
title = video_info['title']
|
||||||
if chapters:
|
|
||||||
first_chapter = chapters[0]
|
|
||||||
|
|
||||||
def video_id_for_chapter(chapter):
|
|
||||||
return chapter['tc_start'].split('-')[0]
|
|
||||||
|
|
||||||
if video_id_for_chapter(first_chapter) != video_id:
|
|
||||||
self.to_screen('Multipart video detected')
|
|
||||||
entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
|
|
||||||
return self.playlist_result(entries, video_id, video_info['title'])
|
|
||||||
# Otherwise we can continue and extract just one part, we have to use
|
|
||||||
# the video id for getting the video url
|
|
||||||
else:
|
|
||||||
first_chapter = video_info
|
|
||||||
|
|
||||||
title = first_chapter['title']
|
|
||||||
|
|
||||||
def extract_url(path_template, url_type):
|
|
||||||
req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
|
|
||||||
head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type, fatal=False)
|
|
||||||
if head:
|
|
||||||
red_url = head.geturl()
|
|
||||||
if req_url != red_url:
|
|
||||||
return red_url
|
|
||||||
return None
|
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
manifest_urls = self._download_json(
|
|
||||||
'http://www.wat.tv/get/webhtml/' + video_id, video_id)
|
|
||||||
m3u8_url = manifest_urls.get('hls')
|
|
||||||
if m3u8_url:
|
|
||||||
formats.extend(self._extract_m3u8_formats(
|
|
||||||
m3u8_url, video_id, 'mp4',
|
|
||||||
'm3u8_native', m3u8_id='hls', fatal=False))
|
|
||||||
mpd_url = manifest_urls.get('mpd')
|
|
||||||
if mpd_url:
|
|
||||||
formats.extend(self._extract_mpd_formats(
|
|
||||||
mpd_url.replace('://das-q1.tf1.fr/', '://das-q1-ssl.tf1.fr/'),
|
|
||||||
video_id, mpd_id='dash', fatal=False))
|
|
||||||
self._sort_formats(formats)
|
|
||||||
|
|
||||||
date_diffusion = first_chapter.get('date_diffusion') or video_data.get('configv4', {}).get('estatS4')
|
def extract_formats(manifest_urls):
|
||||||
upload_date = unified_strdate(date_diffusion) if date_diffusion else None
|
for f, f_url in manifest_urls.items():
|
||||||
duration = None
|
if not f_url:
|
||||||
files = video_info['files']
|
continue
|
||||||
if files:
|
if f in ('dash', 'mpd'):
|
||||||
duration = int_or_none(files[0].get('duration'))
|
formats.extend(self._extract_mpd_formats(
|
||||||
|
f_url.replace('://das-q1.tf1.fr/', '://das-q1-ssl.tf1.fr/'),
|
||||||
|
video_id, mpd_id='dash', fatal=False))
|
||||||
|
elif f == 'hls':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
f_url, video_id, 'mp4',
|
||||||
|
'm3u8_native', m3u8_id='hls', fatal=False))
|
||||||
|
|
||||||
|
delivery = video_data.get('delivery') or {}
|
||||||
|
extract_formats({delivery.get('format'): delivery.get('url')})
|
||||||
|
if not formats:
|
||||||
|
if delivery.get('drm'):
|
||||||
|
raise ExtractorError('This video is DRM protected.', expected=True)
|
||||||
|
manifest_urls = self._download_json(
|
||||||
|
'http://www.wat.tv/get/webhtml/' + video_id, video_id, fatal=False)
|
||||||
|
if manifest_urls:
|
||||||
|
extract_formats(manifest_urls)
|
||||||
|
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
'thumbnail': first_chapter.get('preview'),
|
'thumbnail': video_info.get('preview'),
|
||||||
'description': first_chapter.get('description'),
|
'upload_date': unified_strdate(try_get(
|
||||||
'view_count': int_or_none(video_info.get('views')),
|
video_data, lambda x: x['mediametrie']['chapters'][0]['estatS4'])),
|
||||||
'upload_date': upload_date,
|
'duration': int_or_none(video_info.get('duration')),
|
||||||
'duration': duration,
|
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
}
|
}
|
||||||
|
@ -58,6 +58,7 @@ class XFileShareIE(InfoExtractor):
|
|||||||
(r'vidlocker\.xyz', 'VidLocker'),
|
(r'vidlocker\.xyz', 'VidLocker'),
|
||||||
(r'vidshare\.tv', 'VidShare'),
|
(r'vidshare\.tv', 'VidShare'),
|
||||||
(r'vup\.to', 'VUp'),
|
(r'vup\.to', 'VUp'),
|
||||||
|
(r'wolfstream\.tv', 'WolfStream'),
|
||||||
(r'xvideosharing\.com', 'XVideoSharing'),
|
(r'xvideosharing\.com', 'XVideoSharing'),
|
||||||
)
|
)
|
||||||
|
|
||||||
@ -82,6 +83,9 @@ class XFileShareIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://aparat.cam/n4d6dh0wvlpr',
|
'url': 'https://aparat.cam/n4d6dh0wvlpr',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://wolfstream.tv/nthme29v9u2x',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
|
@ -11,6 +11,7 @@ from ..utils import (
|
|||||||
parse_duration,
|
parse_duration,
|
||||||
sanitized_Request,
|
sanitized_Request,
|
||||||
str_to_int,
|
str_to_int,
|
||||||
|
url_or_none,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
@ -87,10 +88,10 @@ class XTubeIE(InfoExtractor):
|
|||||||
'Cookie': 'age_verified=1; cookiesAccepted=1',
|
'Cookie': 'age_verified=1; cookiesAccepted=1',
|
||||||
})
|
})
|
||||||
|
|
||||||
title, thumbnail, duration = [None] * 3
|
title, thumbnail, duration, sources, media_definition = [None] * 5
|
||||||
|
|
||||||
config = self._parse_json(self._search_regex(
|
config = self._parse_json(self._search_regex(
|
||||||
r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf)', webpage, 'config',
|
r'playerConf\s*=\s*({.+?})\s*,\s*(?:\n|loaderConf|playerWrapper)', webpage, 'config',
|
||||||
default='{}'), video_id, transform_source=js_to_json, fatal=False)
|
default='{}'), video_id, transform_source=js_to_json, fatal=False)
|
||||||
if config:
|
if config:
|
||||||
config = config.get('mainRoll')
|
config = config.get('mainRoll')
|
||||||
@ -99,20 +100,52 @@ class XTubeIE(InfoExtractor):
|
|||||||
thumbnail = config.get('poster')
|
thumbnail = config.get('poster')
|
||||||
duration = int_or_none(config.get('duration'))
|
duration = int_or_none(config.get('duration'))
|
||||||
sources = config.get('sources') or config.get('format')
|
sources = config.get('sources') or config.get('format')
|
||||||
|
media_definition = config.get('mediaDefinition')
|
||||||
|
|
||||||
if not isinstance(sources, dict):
|
if not isinstance(sources, dict) and not media_definition:
|
||||||
sources = self._parse_json(self._search_regex(
|
sources = self._parse_json(self._search_regex(
|
||||||
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
|
r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
|
||||||
webpage, 'sources', group='sources'), video_id,
|
webpage, 'sources', group='sources'), video_id,
|
||||||
transform_source=js_to_json)
|
transform_source=js_to_json)
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
|
format_urls = set()
|
||||||
|
|
||||||
|
if isinstance(sources, dict):
|
||||||
for format_id, format_url in sources.items():
|
for format_id, format_url in sources.items():
|
||||||
|
format_url = url_or_none(format_url)
|
||||||
|
if not format_url:
|
||||||
|
continue
|
||||||
|
if format_url in format_urls:
|
||||||
|
continue
|
||||||
|
format_urls.add(format_url)
|
||||||
formats.append({
|
formats.append({
|
||||||
'url': format_url,
|
'url': format_url,
|
||||||
'format_id': format_id,
|
'format_id': format_id,
|
||||||
'height': int_or_none(format_id),
|
'height': int_or_none(format_id),
|
||||||
})
|
})
|
||||||
|
|
||||||
|
if isinstance(media_definition, list):
|
||||||
|
for media in media_definition:
|
||||||
|
video_url = url_or_none(media.get('videoUrl'))
|
||||||
|
if not video_url:
|
||||||
|
continue
|
||||||
|
if video_url in format_urls:
|
||||||
|
continue
|
||||||
|
format_urls.add(video_url)
|
||||||
|
format_id = media.get('format')
|
||||||
|
if format_id == 'hls':
|
||||||
|
formats.extend(self._extract_m3u8_formats(
|
||||||
|
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
|
||||||
|
m3u8_id='hls', fatal=False))
|
||||||
|
elif format_id == 'mp4':
|
||||||
|
height = int_or_none(media.get('quality'))
|
||||||
|
formats.append({
|
||||||
|
'url': video_url,
|
||||||
|
'format_id': '%s-%d' % (format_id, height) if height else format_id,
|
||||||
|
'height': height,
|
||||||
|
})
|
||||||
|
|
||||||
self._remove_duplicate_formats(formats)
|
self._remove_duplicate_formats(formats)
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
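The XTube hunk above tracks already-seen URLs in a `format_urls` set so that the `sources` dict and the `mediaDefinition` list cannot contribute the same URL twice. A minimal standalone sketch of that pattern follows. `is_plausible_url` is a hypothetical, simplified stand-in for youtube-dl's `url_or_none`, and `collect_formats` is an illustrative name that does not appear in the commit.

```python
def is_plausible_url(value):
    # Hypothetical, simplified stand-in for youtube-dl's url_or_none():
    # accept only strings that look like http(s) or protocol-relative URLs.
    return isinstance(value, str) and value.startswith(('http://', 'https://', '//'))


def collect_formats(sources):
    """Build a format list from a {format_id: url} dict, skipping bad or repeated URLs."""
    formats = []
    seen_urls = set()
    if isinstance(sources, dict):
        for format_id, format_url in sources.items():
            if not is_plausible_url(format_url):
                continue
            if format_url in seen_urls:
                continue
            seen_urls.add(format_url)
            formats.append({
                'url': format_url,
                'format_id': format_id,
                # Numeric format ids double as the frame height, as in the diff.
                'height': int(format_id) if str(format_id).isdigit() else None,
            })
    return formats
```

Keeping the set outside the loop is what lets a second source of URLs (here it would be the media definition list) share the same duplicate check.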
|
@ -154,7 +154,7 @@ class YoukuIE(InfoExtractor):
|
|||||||
# request basic data
|
# request basic data
|
||||||
basic_data_params = {
|
basic_data_params = {
|
||||||
'vid': video_id,
|
'vid': video_id,
|
||||||
'ccode': '0590',
|
'ccode': '0532',
|
||||||
'client_ip': '192.168.1.1',
|
'client_ip': '192.168.1.1',
|
||||||
'utid': cna,
|
'utid': cna,
|
||||||
'client_ts': time.time() / 1000,
|
'client_ts': time.time() / 1000,
|
||||||
|
@ -25,6 +25,7 @@ class YouPornIE(InfoExtractor):
|
|||||||
'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
|
'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
|
||||||
'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
|
'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
|
||||||
'thumbnail': r're:^https?://.*\.jpg$',
|
'thumbnail': r're:^https?://.*\.jpg$',
|
||||||
|
'duration': 210,
|
||||||
'uploader': 'Ask Dan And Jennifer',
|
'uploader': 'Ask Dan And Jennifer',
|
||||||
'upload_date': '20101217',
|
'upload_date': '20101217',
|
||||||
'average_rating': int,
|
'average_rating': int,
|
||||||
@ -54,6 +55,7 @@ class YouPornIE(InfoExtractor):
|
|||||||
'params': {
|
'params': {
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
'skip': '404',
|
||||||
}, {
|
}, {
|
||||||
'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
|
'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
@ -153,6 +155,8 @@ class YouPornIE(InfoExtractor):
|
|||||||
thumbnail = self._search_regex(
|
thumbnail = self._search_regex(
|
||||||
r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
|
r'(?:imageurl\s*=|poster\s*:)\s*(["\'])(?P<thumbnail>.+?)\1',
|
||||||
webpage, 'thumbnail', fatal=False, group='thumbnail')
|
webpage, 'thumbnail', fatal=False, group='thumbnail')
|
||||||
|
duration = int_or_none(self._html_search_meta(
|
||||||
|
'video:duration', webpage, 'duration', fatal=False))
|
||||||
|
|
||||||
uploader = self._html_search_regex(
|
uploader = self._html_search_regex(
|
||||||
r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
|
r'(?s)<div[^>]+class=["\']submitByLink["\'][^>]*>(.+?)</div>',
|
||||||
@ -194,6 +198,7 @@ class YouPornIE(InfoExtractor):
|
|||||||
'title': title,
|
'title': title,
|
||||||
'description': description,
|
'description': description,
|
||||||
'thumbnail': thumbnail,
|
'thumbnail': thumbnail,
|
||||||
|
'duration': duration,
|
||||||
'uploader': uploader,
|
'uploader': uploader,
|
||||||
'upload_date': upload_date,
|
'upload_date': upload_date,
|
||||||
'average_rating': average_rating,
|
'average_rating': average_rating,
|
||||||
|
@ -24,6 +24,7 @@ from ..jsinterp import JSInterpreter
|
|||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
clean_html,
|
clean_html,
|
||||||
|
dict_get,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
mimetype2ext,
|
mimetype2ext,
|
||||||
@ -45,6 +46,10 @@ from ..utils import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def parse_qs(url):
|
||||||
|
return compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
|
||||||
|
|
||||||
|
|
||||||
class YoutubeBaseInfoExtractor(InfoExtractor):
|
class YoutubeBaseInfoExtractor(InfoExtractor):
|
||||||
"""Provide base functions for Youtube extractors"""
|
"""Provide base functions for Youtube extractors"""
|
||||||
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
|
_LOGIN_URL = 'https://accounts.google.com/ServiceLogin'
|
||||||
@ -60,11 +65,6 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
|
|||||||
|
|
||||||
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)'
|
_PLAYLIST_ID_RE = r'(?:(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}|RDMM)'
|
||||||
|
|
||||||
def _ids_to_results(self, ids):
|
|
||||||
return [
|
|
||||||
self.url_result(vid_id, 'Youtube', video_id=vid_id)
|
|
||||||
for vid_id in ids]
|
|
||||||
|
|
||||||
def _login(self):
|
def _login(self):
|
||||||
"""
|
"""
|
||||||
Attempt to log in to YouTube.
|
Attempt to log in to YouTube.
|
||||||
@ -248,7 +248,23 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
|
|||||||
|
|
||||||
return True
|
return True
|
||||||
|
|
||||||
|
def _initialize_consent(self):
|
||||||
|
cookies = self._get_cookies('https://www.youtube.com/')
|
||||||
|
if cookies.get('__Secure-3PSID'):
|
||||||
|
return
|
||||||
|
consent_id = None
|
||||||
|
consent = cookies.get('CONSENT')
|
||||||
|
if consent:
|
||||||
|
if 'YES' in consent.value:
|
||||||
|
return
|
||||||
|
consent_id = self._search_regex(
|
||||||
|
r'PENDING\+(\d+)', consent.value, 'consent', default=None)
|
||||||
|
if not consent_id:
|
||||||
|
consent_id = random.randint(100, 999)
|
||||||
|
self._set_cookie('.youtube.com', 'CONSENT', 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id)
|
||||||
|
|
||||||
def _real_initialize(self):
|
def _real_initialize(self):
|
||||||
|
self._initialize_consent()
|
||||||
if self._downloader is None:
|
if self._downloader is None:
|
||||||
return
|
return
|
||||||
if not self._login():
|
if not self._login():
|
||||||
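The decision made by `_initialize_consent` in the hunk above can be isolated as a pure function. This is a sketch under assumptions: `consent_cookie_value` is a hypothetical name, the `__Secure-3PSID` early return is left out, and the standard `re` module replaces `_search_regex`.

```python
import random
import re


def consent_cookie_value(current_value, rng=random):
    """Return the CONSENT value to set, or None if consent is already granted."""
    consent_id = None
    if current_value:
        if 'YES' in current_value:
            # Consent already given; nothing to overwrite.
            return None
        match = re.search(r'PENDING\+(\d+)', current_value)
        if match:
            # Reuse the id from the pending cookie, as the diff does.
            consent_id = match.group(1)
    if not consent_id:
        consent_id = rng.randint(100, 999)
    return 'YES+cb.20210328-17-p0.en+FX+%s' % consent_id
```

Passing `rng` as a parameter is only a convenience for testing; the commit calls `random.randint` directly.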
@ -289,7 +305,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
|
|||||||
return self._parse_json(
|
return self._parse_json(
|
||||||
self._search_regex(
|
self._search_regex(
|
||||||
r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
|
r'ytcfg\.set\s*\(\s*({.+?})\s*\)\s*;', webpage, 'ytcfg',
|
||||||
default='{}'), video_id, fatal=False)
|
default='{}'), video_id, fatal=False) or {}
|
||||||
|
|
||||||
def _extract_video(self, renderer):
|
def _extract_video(self, renderer):
|
||||||
video_id = renderer['videoId']
|
video_id = renderer['videoId']
|
||||||
@ -312,7 +328,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
|
|||||||
(lambda x: x['ownerText']['runs'][0]['text'],
|
(lambda x: x['ownerText']['runs'][0]['text'],
|
||||||
lambda x: x['shortBylineText']['runs'][0]['text']), compat_str)
|
lambda x: x['shortBylineText']['runs'][0]['text']), compat_str)
|
||||||
return {
|
return {
|
||||||
'_type': 'url_transparent',
|
'_type': 'url',
|
||||||
'ie_key': YoutubeIE.ie_key(),
|
'ie_key': YoutubeIE.ie_key(),
|
||||||
'id': video_id,
|
'id': video_id,
|
||||||
'url': video_id,
|
'url': video_id,
|
||||||
@ -338,21 +354,28 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
r'(?:www\.)?invidious\.mastodon\.host',
|
r'(?:www\.)?invidious\.mastodon\.host',
|
||||||
r'(?:www\.)?invidious\.zapashcanon\.fr',
|
r'(?:www\.)?invidious\.zapashcanon\.fr',
|
||||||
r'(?:www\.)?invidious\.kavin\.rocks',
|
r'(?:www\.)?invidious\.kavin\.rocks',
|
||||||
|
r'(?:www\.)?invidious\.tinfoil-hat\.net',
|
||||||
|
r'(?:www\.)?invidious\.himiko\.cloud',
|
||||||
|
r'(?:www\.)?invidious\.reallyancient\.tech',
|
||||||
r'(?:www\.)?invidious\.tube',
|
r'(?:www\.)?invidious\.tube',
|
||||||
r'(?:www\.)?invidiou\.site',
|
r'(?:www\.)?invidiou\.site',
|
||||||
r'(?:www\.)?invidious\.site',
|
r'(?:www\.)?invidious\.site',
|
||||||
r'(?:www\.)?invidious\.xyz',
|
r'(?:www\.)?invidious\.xyz',
|
||||||
r'(?:www\.)?invidious\.nixnet\.xyz',
|
r'(?:www\.)?invidious\.nixnet\.xyz',
|
||||||
|
r'(?:www\.)?invidious\.048596\.xyz',
|
||||||
r'(?:www\.)?invidious\.drycat\.fr',
|
r'(?:www\.)?invidious\.drycat\.fr',
|
||||||
|
r'(?:www\.)?inv\.skyn3t\.in',
|
||||||
r'(?:www\.)?tube\.poal\.co',
|
r'(?:www\.)?tube\.poal\.co',
|
||||||
r'(?:www\.)?tube\.connect\.cafe',
|
r'(?:www\.)?tube\.connect\.cafe',
|
||||||
r'(?:www\.)?vid\.wxzm\.sx',
|
r'(?:www\.)?vid\.wxzm\.sx',
|
||||||
r'(?:www\.)?vid\.mint\.lgbt',
|
r'(?:www\.)?vid\.mint\.lgbt',
|
||||||
|
r'(?:www\.)?vid\.puffyan\.us',
|
||||||
r'(?:www\.)?yewtu\.be',
|
r'(?:www\.)?yewtu\.be',
|
||||||
r'(?:www\.)?yt\.elukerio\.org',
|
r'(?:www\.)?yt\.elukerio\.org',
|
||||||
r'(?:www\.)?yt\.lelux\.fi',
|
r'(?:www\.)?yt\.lelux\.fi',
|
||||||
r'(?:www\.)?invidious\.ggc-project\.de',
|
r'(?:www\.)?invidious\.ggc-project\.de',
|
||||||
r'(?:www\.)?yt\.maisputain\.ovh',
|
r'(?:www\.)?yt\.maisputain\.ovh',
|
||||||
|
r'(?:www\.)?ytprivate\.com',
|
||||||
r'(?:www\.)?invidious\.13ad\.de',
|
r'(?:www\.)?invidious\.13ad\.de',
|
||||||
r'(?:www\.)?invidious\.toot\.koeln',
|
r'(?:www\.)?invidious\.toot\.koeln',
|
||||||
r'(?:www\.)?invidious\.fdn\.fr',
|
r'(?:www\.)?invidious\.fdn\.fr',
|
||||||
@ -397,15 +420,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
)
|
)
|
||||||
)? # all until now is optional -> you can pass the naked ID
|
)? # all until now is optional -> you can pass the naked ID
|
||||||
(?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
|
(?P<id>[0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
|
||||||
(?!.*?\blist=
|
|
||||||
(?:
|
|
||||||
%(playlist_id)s| # combined list/video URLs are handled by the playlist IE
|
|
||||||
WL # WL are handled by the watch later IE
|
|
||||||
)
|
|
||||||
)
|
|
||||||
(?(1).+)? # if we found the ID, everything can follow
|
(?(1).+)? # if we found the ID, everything can follow
|
||||||
$""" % {
|
$""" % {
|
||||||
'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE,
|
|
||||||
'invidious': '|'.join(_INVIDIOUS_SITES),
|
'invidious': '|'.join(_INVIDIOUS_SITES),
|
||||||
}
|
}
|
||||||
_PLAYER_INFO_RE = (
|
_PLAYER_INFO_RE = (
|
||||||
@ -791,6 +807,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
},
|
},
|
||||||
'skip': 'This video does not exist.',
|
'skip': 'This video does not exist.',
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
# Video with incomplete 'yt:stretch=16:'
|
||||||
|
'url': 'https://www.youtube.com/watch?v=FRhJzUSJbGI',
|
||||||
|
'only_matching': True,
|
||||||
|
},
|
||||||
{
|
{
|
||||||
# Video licensed under Creative Commons
|
# Video licensed under Creative Commons
|
||||||
'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
|
'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
|
||||||
@ -1067,6 +1088,23 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'url': 'https://www.youtube.com/watch?v=nGC3D_FkCmg',
|
'url': 'https://www.youtube.com/watch?v=nGC3D_FkCmg',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
# restricted location, https://github.com/ytdl-org/youtube-dl/issues/28685
|
||||||
|
'url': 'cBvYw8_A0vQ',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'cBvYw8_A0vQ',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '4K Ueno Okachimachi Street Scenes 上野御徒町歩き',
|
||||||
|
'description': 'md5:ea770e474b7cd6722b4c95b833c03630',
|
||||||
|
'upload_date': '20201120',
|
||||||
|
'uploader': 'Walk around Japan',
|
||||||
|
'uploader_id': 'UC3o_t8PzBmXf5S9b7GLx1Mw',
|
||||||
|
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UC3o_t8PzBmXf5S9b7GLx1Mw',
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'skip_download': True,
|
||||||
|
},
|
||||||
|
},
|
||||||
]
|
]
|
||||||
_formats = {
|
_formats = {
|
||||||
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
|
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
|
||||||
@ -1174,6 +1212,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
|
'397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def suitable(cls, url):
|
||||||
|
# Hack for lazy extractors until more generic solution is implemented
|
||||||
|
# (see #28780)
|
||||||
|
from .youtube import parse_qs
|
||||||
|
qs = parse_qs(url)
|
||||||
|
if qs.get('list', [None])[0]:
|
||||||
|
return False
|
||||||
|
return super(YoutubeIE, cls).suitable(url)
|
||||||
|
|
||||||
def __init__(self, *args, **kwargs):
|
def __init__(self, *args, **kwargs):
|
||||||
super(YoutubeIE, self).__init__(*args, **kwargs)
|
super(YoutubeIE, self).__init__(*args, **kwargs)
|
||||||
self._code_cache = {}
|
self._code_cache = {}
|
||||||
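The new `suitable` classmethod rejects any URL that carries a non-empty `list` query parameter, which replaces the negative lookahead removed from `_VALID_URL`. A minimal sketch of that check, assuming Python 3's `urllib.parse` in place of `compat_urlparse`; `has_playlist_id` is a hypothetical name:

```python
from urllib.parse import parse_qs, urlparse


def has_playlist_id(url):
    """True when the URL's query string carries a non-empty 'list' parameter."""
    qs = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; blank values are dropped by default.
    return bool(qs.get('list', [None])[0])
```

A naked video ID has no query string at all, so it falls through to the regular `_VALID_URL` match.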
@ -1431,7 +1479,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
base_url = self.http_scheme() + '//www.youtube.com/'
|
base_url = self.http_scheme() + '//www.youtube.com/'
|
||||||
webpage_url = base_url + 'watch?v=' + video_id
|
webpage_url = base_url + 'watch?v=' + video_id
|
||||||
webpage = self._download_webpage(
|
webpage = self._download_webpage(
|
||||||
webpage_url + '&bpctr=9999999999', video_id, fatal=False)
|
webpage_url + '&bpctr=9999999999&has_verified=1', video_id, fatal=False)
|
||||||
|
|
||||||
player_response = None
|
player_response = None
|
||||||
if webpage:
|
if webpage:
|
||||||
@ -1450,7 +1498,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
'Refetching age-gated info webpage',
|
'Refetching age-gated info webpage',
|
||||||
'unable to download video info webpage', query={
|
'unable to download video info webpage', query={
|
||||||
'video_id': video_id,
|
'video_id': video_id,
|
||||||
'eurl': 'https://www.youtube.com/embed/' + video_id,
|
'eurl': 'https://youtube.googleapis.com/v/' + video_id,
|
||||||
}, fatal=False)),
|
}, fatal=False)),
|
||||||
lambda x: x['player_response'][0],
|
lambda x: x['player_response'][0],
|
||||||
compat_str) or '{}', video_id)
|
compat_str) or '{}', video_id)
|
||||||
@ -1468,7 +1516,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
def get_text(x):
|
def get_text(x):
|
||||||
if not x:
|
if not x:
|
||||||
return
|
return
|
||||||
return x.get('simpleText') or ''.join([r['text'] for r in x['runs']])
|
text = x.get('simpleText')
|
||||||
|
if text and isinstance(text, compat_str):
|
||||||
|
return text
|
||||||
|
runs = x.get('runs')
|
||||||
|
if not isinstance(runs, list):
|
||||||
|
return
|
||||||
|
return ''.join([r['text'] for r in runs if isinstance(r.get('text'), compat_str)])
|
||||||
|
|
||||||
search_meta = (
|
search_meta = (
|
||||||
lambda x: self._html_search_meta(x, webpage, default=None)) \
|
lambda x: self._html_search_meta(x, webpage, default=None)) \
|
||||||
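The rewritten `get_text` helper no longer assumes that `runs` exists or that every run carries a string. A standalone sketch, assuming Python 3 `str` in place of `compat_str`:

```python
def get_text(x):
    """Extract display text from a YouTube text object (simpleText or runs)."""
    if not x:
        return None
    text = x.get('simpleText')
    if text and isinstance(text, str):
        return text
    runs = x.get('runs')
    if not isinstance(runs, list):
        return None
    # Skip runs whose 'text' is missing or not a string instead of raising.
    return ''.join(r['text'] for r in runs if isinstance(r.get('text'), str))
```

Like the code in the diff, this still expects every element of `runs` to be a dict.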
@ -1617,7 +1671,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
f['format_id'] = itag
|
f['format_id'] = itag
|
||||||
formats.append(f)
|
formats.append(f)
|
||||||
|
|
||||||
if self._downloader.params.get('youtube_include_dash_manifest'):
|
if self._downloader.params.get('youtube_include_dash_manifest', True):
|
||||||
dash_manifest_url = streaming_data.get('dashManifestUrl')
|
dash_manifest_url = streaming_data.get('dashManifestUrl')
|
||||||
if dash_manifest_url:
|
if dash_manifest_url:
|
||||||
for f in self._extract_mpd_formats(
|
for f in self._extract_mpd_formats(
|
||||||
@ -1666,13 +1720,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
for m in re.finditer(self._meta_regex('og:video:tag'), webpage)]
|
for m in re.finditer(self._meta_regex('og:video:tag'), webpage)]
|
||||||
for keyword in keywords:
|
for keyword in keywords:
|
||||||
if keyword.startswith('yt:stretch='):
|
if keyword.startswith('yt:stretch='):
|
||||||
w, h = keyword.split('=')[1].split(':')
|
mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
|
||||||
w, h = int(w), int(h)
|
if mobj:
|
||||||
|
# NB: float is intentional for forcing float division
|
||||||
|
w, h = (float(v) for v in mobj.groups())
|
||||||
if w > 0 and h > 0:
|
if w > 0 and h > 0:
|
||||||
ratio = w / h
|
ratio = w / h
|
||||||
for f in formats:
|
for f in formats:
|
||||||
if f.get('vcodec') != 'none':
|
if f.get('vcodec') != 'none':
|
||||||
f['stretched_ratio'] = ratio
|
f['stretched_ratio'] = ratio
|
||||||
|
break
|
||||||
|
|
||||||
thumbnails = []
|
thumbnails = []
|
||||||
for container in (video_details, microformat):
|
for container in (video_details, microformat):
|
||||||
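The `yt:stretch` change swaps a bare `split(':')` for a regex, so an incomplete keyword such as `yt:stretch=16:` no longer raises. A sketch of the parsing step alone, assuming (since indentation is not recoverable from this flattened view) that scanning stops at the first keyword yielding a valid ratio; `stretched_ratio` is a hypothetical name:

```python
import re


def stretched_ratio(keywords):
    """Return width/height from the first usable 'yt:stretch=W:H' keyword, else None."""
    for keyword in keywords:
        if not keyword.startswith('yt:stretch='):
            continue
        mobj = re.search(r'(\d+)\s*:\s*(\d+)', keyword)
        if not mobj:
            # Incomplete value such as 'yt:stretch=16:' is ignored.
            continue
        # float() forces true division on Python 2 as well, as the diff notes.
        w, h = (float(v) for v in mobj.groups())
        if w > 0 and h > 0:
            return w / h
    return None
```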
@ -1895,7 +1952,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
|
|||||||
info['channel'] = get_text(try_get(
|
info['channel'] = get_text(try_get(
|
||||||
vsir,
|
vsir,
|
||||||
lambda x: x['owner']['videoOwnerRenderer']['title'],
|
lambda x: x['owner']['videoOwnerRenderer']['title'],
|
||||||
compat_str))
|
dict))
|
||||||
rows = try_get(
|
rows = try_get(
|
||||||
vsir,
|
vsir,
|
||||||
lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
|
lambda x: x['metadataRowContainer']['metadataRowContainerRenderer']['rows'],
|
||||||
@ -1942,7 +1999,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
invidio\.us
|
invidio\.us
|
||||||
)/
|
)/
|
||||||
(?:
|
(?:
|
||||||
(?:channel|c|user|feed)/|
|
(?:channel|c|user|feed|hashtag)/|
|
||||||
(?:playlist|watch)\?.*?\blist=|
|
(?:playlist|watch)\?.*?\blist=|
|
||||||
(?!(?:watch|embed|v|e)\b)
|
(?!(?:watch|embed|v|e)\b)
|
||||||
)
|
)
|
||||||
@ -1968,6 +2025,15 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
'title': 'Игорь Клейнер - Playlists',
|
'title': 'Игорь Клейнер - Playlists',
|
||||||
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
|
'description': 'md5:be97ee0f14ee314f1f002cf187166ee2',
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
# playlists, series
|
||||||
|
'url': 'https://www.youtube.com/c/3blue1brown/playlists?view=50&sort=dd&shelf_id=3',
|
||||||
|
'playlist_mincount': 5,
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'UCYO_jab_esuFRV4b17AJtAw',
|
||||||
|
'title': '3Blue1Brown - Playlists',
|
||||||
|
'description': 'md5:e1384e8a133307dd10edee76e875d62f',
|
||||||
|
},
|
||||||
}, {
|
}, {
|
||||||
# playlists, singlepage
|
# playlists, singlepage
|
||||||
'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
|
'url': 'https://www.youtube.com/user/ThirstForScience/playlists',
|
||||||
@ -2228,6 +2294,16 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://www.youtube.com/TheYoungTurks/live',
|
'url': 'https://www.youtube.com/TheYoungTurks/live',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.youtube.com/hashtag/cctv9',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'cctv9',
|
||||||
|
'title': '#cctv9',
|
||||||
|
},
|
||||||
|
'playlist_mincount': 350,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.youtube.com/watch?list=PLW4dVinRY435CBE_JD3t-0SRXKfnZHS1P&feature=youtu.be&v=M9cJMXmQ_ZU',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@ -2250,9 +2326,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _extract_grid_item_renderer(item):
|
def _extract_grid_item_renderer(item):
|
||||||
for item_kind in ('Playlist', 'Video', 'Channel'):
|
assert isinstance(item, dict)
|
||||||
renderer = item.get('grid%sRenderer' % item_kind)
|
for key, renderer in item.items():
|
||||||
if renderer:
|
if not key.startswith('grid') or not key.endswith('Renderer'):
|
||||||
|
continue
|
||||||
|
if not isinstance(renderer, dict):
|
||||||
|
continue
|
||||||
return renderer
|
return renderer
|
||||||
|
|
||||||
def _grid_entries(self, grid_renderer):
|
def _grid_entries(self, grid_renderer):
|
||||||
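`_extract_grid_item_renderer` now accepts any `grid…Renderer` key rather than a fixed list of three kinds. The matching rule can be sketched in isolation; the function name below drops the leading underscore and is otherwise taken from the diff, and the explicit `return None` is added for clarity:

```python
def extract_grid_item_renderer(item):
    """Return the first dict stored under a key of the form 'grid<Kind>Renderer'."""
    for key, renderer in item.items():
        if not key.startswith('grid') or not key.endswith('Renderer'):
            continue
        if not isinstance(renderer, dict):
            continue
        return renderer
    return None
```

Matching on prefix and suffix means renderer kinds the site introduces later are picked up without a code change, at the cost of returning renderers the caller may not know how to handle.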
@ -2263,7 +2342,8 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
if not isinstance(renderer, dict):
|
if not isinstance(renderer, dict):
|
||||||
continue
|
continue
|
||||||
title = try_get(
|
title = try_get(
|
||||||
renderer, lambda x: x['title']['runs'][0]['text'], compat_str)
|
renderer, (lambda x: x['title']['runs'][0]['text'],
|
||||||
|
lambda x: x['title']['simpleText']), compat_str)
|
||||||
# playlist
|
# playlist
|
||||||
playlist_id = renderer.get('playlistId')
|
playlist_id = renderer.get('playlistId')
|
||||||
if playlist_id:
|
if playlist_id:
|
||||||
@ -2271,10 +2351,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
'https://www.youtube.com/playlist?list=%s' % playlist_id,
|
'https://www.youtube.com/playlist?list=%s' % playlist_id,
|
||||||
ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
|
ie=YoutubeTabIE.ie_key(), video_id=playlist_id,
|
||||||
video_title=title)
|
video_title=title)
|
||||||
|
continue
|
||||||
# video
|
# video
|
||||||
video_id = renderer.get('videoId')
|
video_id = renderer.get('videoId')
|
||||||
if video_id:
|
if video_id:
|
||||||
yield self._extract_video(renderer)
|
yield self._extract_video(renderer)
|
||||||
|
continue
|
||||||
# channel
|
# channel
|
||||||
channel_id = renderer.get('channelId')
|
channel_id = renderer.get('channelId')
|
||||||
if channel_id:
|
if channel_id:
|
||||||
@ -2283,6 +2365,17 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
yield self.url_result(
|
yield self.url_result(
|
||||||
'https://www.youtube.com/channel/%s' % channel_id,
|
'https://www.youtube.com/channel/%s' % channel_id,
|
||||||
ie=YoutubeTabIE.ie_key(), video_title=title)
|
ie=YoutubeTabIE.ie_key(), video_title=title)
|
||||||
|
continue
|
||||||
|
# generic endpoint URL support
|
||||||
|
ep_url = urljoin('https://www.youtube.com/', try_get(
|
||||||
|
renderer, lambda x: x['navigationEndpoint']['commandMetadata']['webCommandMetadata']['url'],
|
||||||
|
compat_str))
|
||||||
|
if ep_url:
|
||||||
|
for ie in (YoutubeTabIE, YoutubePlaylistIE, YoutubeIE):
|
||||||
|
if ie.suitable(ep_url):
|
||||||
|
yield self.url_result(
|
||||||
|
ep_url, ie=ie.ie_key(), video_id=ie._match_id(ep_url), video_title=title)
|
||||||
|
break
|
||||||
|
|
||||||
def _shelf_entries_from_content(self, shelf_renderer):
|
def _shelf_entries_from_content(self, shelf_renderer):
|
||||||
content = shelf_renderer.get('content')
|
content = shelf_renderer.get('content')
|
||||||
@ -2375,6 +2468,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
for entry in self._post_thread_entries(renderer):
|
for entry in self._post_thread_entries(renderer):
|
||||||
yield entry
|
yield entry
|
||||||
|
|
||||||
|
def _rich_grid_entries(self, contents):
|
||||||
|
for content in contents:
|
||||||
|
video_renderer = try_get(content, lambda x: x['richItemRenderer']['content']['videoRenderer'], dict)
|
||||||
|
if video_renderer:
|
||||||
|
entry = self._video_entry(video_renderer)
|
||||||
|
if entry:
|
||||||
|
yield entry
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _build_continuation_query(continuation, ctp=None):
|
def _build_continuation_query(continuation, ctp=None):
|
||||||
query = {
|
query = {
|
||||||
@ -2420,13 +2521,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
ctp = continuation_ep.get('clickTrackingParams')
|
ctp = continuation_ep.get('clickTrackingParams')
|
||||||
return YoutubeTabIE._build_continuation_query(continuation, ctp)
|
return YoutubeTabIE._build_continuation_query(continuation, ctp)
|
||||||
|
|
||||||
def _entries(self, tab, identity_token):
|
def _entries(self, tab, item_id, webpage):
|
||||||
tab_content = try_get(tab, lambda x: x['content'], dict)
|
tab_content = try_get(tab, lambda x: x['content'], dict)
|
||||||
if not tab_content:
|
if not tab_content:
|
||||||
return
|
return
|
||||||
slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict)
|
slr_renderer = try_get(tab_content, lambda x: x['sectionListRenderer'], dict)
|
||||||
if not slr_renderer:
|
if slr_renderer:
|
||||||
return
|
|
||||||
is_channels_tab = tab.get('title') == 'Channels'
|
is_channels_tab = tab.get('title') == 'Channels'
|
||||||
continuation = None
|
continuation = None
|
||||||
slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
|
slr_contents = try_get(slr_renderer, lambda x: x['contents'], list) or []
|
||||||
@ -2471,31 +2571,61 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
|
|
||||||
if not continuation:
|
if not continuation:
|
||||||
continuation = self._extract_continuation(is_renderer)
|
continuation = self._extract_continuation(is_renderer)
|
||||||
|
|
||||||
if not continuation:
|
if not continuation:
|
||||||
continuation = self._extract_continuation(slr_renderer)
|
continuation = self._extract_continuation(slr_renderer)
|
||||||
|
else:
|
||||||
|
rich_grid_renderer = tab_content.get('richGridRenderer')
|
||||||
|
if not rich_grid_renderer:
|
||||||
|
return
|
||||||
|
for entry in self._rich_grid_entries(rich_grid_renderer.get('contents') or []):
|
||||||
|
yield entry
|
||||||
|
continuation = self._extract_continuation(rich_grid_renderer)
|
||||||
|
|
||||||
|
ytcfg = self._extract_ytcfg(item_id, webpage)
|
||||||
|
client_version = try_get(
|
||||||
|
ytcfg, lambda x: x['INNERTUBE_CLIENT_VERSION'], compat_str) or '2.20210407.08.00'
|
||||||
|
|
||||||
headers = {
|
headers = {
|
||||||
'x-youtube-client-name': '1',
|
'x-youtube-client-name': '1',
|
||||||
'x-youtube-client-version': '2.20201112.04.01',
|
'x-youtube-client-version': client_version,
|
||||||
|
'content-type': 'application/json',
|
||||||
}
|
}
|
||||||
|
|
||||||
|
context = try_get(ytcfg, lambda x: x['INNERTUBE_CONTEXT'], dict) or {
|
||||||
|
'client': {
|
||||||
|
'clientName': 'WEB',
|
||||||
|
'clientVersion': client_version,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
visitor_data = try_get(context, lambda x: x['client']['visitorData'], compat_str)
|
||||||
|
|
||||||
|
identity_token = self._extract_identity_token(ytcfg, webpage)
|
||||||
if identity_token:
|
if identity_token:
|
||||||
headers['x-youtube-identity-token'] = identity_token
|
headers['x-youtube-identity-token'] = identity_token
|
||||||
|
|
||||||
|
data = {
|
||||||
|
'context': context,
|
||||||
|
}
|
||||||
|
|
||||||
for page_num in itertools.count(1):
|
for page_num in itertools.count(1):
|
||||||
if not continuation:
|
if not continuation:
|
||||||
break
|
break
|
||||||
|
if visitor_data:
|
||||||
|
headers['x-goog-visitor-id'] = visitor_data
|
||||||
|
data['continuation'] = continuation['continuation']
|
||||||
|
data['clickTracking'] = {
|
||||||
|
'clickTrackingParams': continuation['itct']
|
||||||
|
}
|
||||||
count = 0
|
count = 0
|
||||||
retries = 3
|
retries = 3
|
||||||
while count <= retries:
|
while count <= retries:
|
||||||
try:
|
try:
|
||||||
# Downloading page may result in intermittent 5xx HTTP error
|
# Downloading page may result in intermittent 5xx HTTP error
|
||||||
# that is usually worked around with a retry
|
# that is usually worked around with a retry
|
||||||
browse = self._download_json(
|
response = self._download_json(
|
||||||
'https://www.youtube.com/browse_ajax', None,
|
'https://www.youtube.com/youtubei/v1/browse?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
|
||||||
'Downloading page %d%s'
|
None, 'Downloading page %d%s' % (page_num, ' (retry #%d)' % count if count else ''),
|
||||||
% (page_num, ' (retry #%d)' % count if count else ''),
|
headers=headers, data=json.dumps(data).encode('utf8'))
|
||||||
headers=headers, query=continuation)
|
|
||||||
break
|
break
|
||||||
except ExtractorError as e:
|
except ExtractorError as e:
|
||||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
|
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
|
||||||
@ -2503,12 +2633,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
if count <= retries:
|
if count <= retries:
|
||||||
continue
|
continue
|
||||||
raise
|
raise
|
||||||
if not browse:
|
|
||||||
break
|
|
||||||
response = try_get(browse, lambda x: x[1]['response'], dict)
|
|
||||||
if not response:
|
if not response:
|
||||||
break
|
break
|
||||||
|
|
||||||
|
visitor_data = try_get(
|
||||||
|
response, lambda x: x['responseContext']['visitorData'], compat_str) or visitor_data
|
||||||
|
|
||||||
continuation_contents = try_get(
|
continuation_contents = try_get(
|
||||||
response, lambda x: x['continuationContents'], dict)
|
response, lambda x: x['continuationContents'], dict)
|
||||||
if continuation_contents:
|
if continuation_contents:
|
||||||
@ -2531,13 +2661,14 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
continuation = self._extract_continuation(continuation_renderer)
|
continuation = self._extract_continuation(continuation_renderer)
|
||||||
continue
|
continue
|
||||||
|
|
||||||
|
on_response_received = dict_get(response, ('onResponseReceivedActions', 'onResponseReceivedEndpoints'))
|
||||||
continuation_items = try_get(
|
continuation_items = try_get(
|
||||||
response, lambda x: x['onResponseReceivedActions'][0]['appendContinuationItemsAction']['continuationItems'], list)
|
on_response_received, lambda x: x[0]['appendContinuationItemsAction']['continuationItems'], list)
|
||||||
if continuation_items:
|
if continuation_items:
|
||||||
continuation_item = continuation_items[0]
|
continuation_item = continuation_items[0]
|
||||||
if not isinstance(continuation_item, dict):
|
if not isinstance(continuation_item, dict):
|
||||||
continue
|
continue
|
||||||
renderer = continuation_item.get('gridVideoRenderer')
|
renderer = self._extract_grid_item_renderer(continuation_item)
|
||||||
if renderer:
|
if renderer:
|
||||||
grid_renderer = {'items': continuation_items}
|
grid_renderer = {'items': continuation_items}
|
||||||
for entry in self._grid_entries(grid_renderer):
|
for entry in self._grid_entries(grid_renderer):
|
||||||
@ -2551,6 +2682,19 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
yield entry
|
yield entry
|
||||||
continuation = self._extract_continuation(video_list_renderer)
|
continuation = self._extract_continuation(video_list_renderer)
|
||||||
continue
|
continue
|
||||||
|
renderer = continuation_item.get('backstagePostThreadRenderer')
|
||||||
|
if renderer:
|
||||||
|
continuation_renderer = {'contents': continuation_items}
|
||||||
|
for entry in self._post_thread_continuation_entries(continuation_renderer):
|
||||||
|
yield entry
|
||||||
|
continuation = self._extract_continuation(continuation_renderer)
|
||||||
|
continue
|
||||||
|
renderer = continuation_item.get('richItemRenderer')
|
||||||
|
if renderer:
|
||||||
|
for entry in self._rich_grid_entries(continuation_items):
|
||||||
|
yield entry
|
||||||
|
continuation = self._extract_continuation({'contents': continuation_items})
|
||||||
|
continue
|
||||||
|
|
||||||
break
|
break
|
||||||
|
|
||||||
@ -2603,11 +2747,12 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
alerts.append(text)
|
alerts.append(text)
|
||||||
return '\n'.join(alerts)
|
return '\n'.join(alerts)
|
||||||
|
|
||||||
def _extract_from_tabs(self, item_id, webpage, data, tabs, identity_token):
|
def _extract_from_tabs(self, item_id, webpage, data, tabs):
|
||||||
selected_tab = self._extract_selected_tab(tabs)
|
selected_tab = self._extract_selected_tab(tabs)
|
||||||
renderer = try_get(
|
renderer = try_get(
|
||||||
data, lambda x: x['metadata']['channelMetadataRenderer'], dict)
|
data, lambda x: x['metadata']['channelMetadataRenderer'], dict)
|
||||||
playlist_id = title = description = None
|
playlist_id = item_id
|
||||||
|
title = description = None
|
||||||
if renderer:
|
if renderer:
|
||||||
channel_title = renderer.get('title') or item_id
|
channel_title = renderer.get('title') or item_id
|
||||||
tab_title = selected_tab.get('title')
|
tab_title = selected_tab.get('title')
|
||||||
@ -2616,14 +2761,18 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
title += ' - %s' % tab_title
|
title += ' - %s' % tab_title
|
||||||
description = renderer.get('description')
|
description = renderer.get('description')
|
||||||
playlist_id = renderer.get('externalId')
|
playlist_id = renderer.get('externalId')
|
||||||
|
else:
|
||||||
renderer = try_get(
|
renderer = try_get(
|
||||||
data, lambda x: x['metadata']['playlistMetadataRenderer'], dict)
|
data, lambda x: x['metadata']['playlistMetadataRenderer'], dict)
|
||||||
if renderer:
|
if renderer:
|
||||||
title = renderer.get('title')
|
title = renderer.get('title')
|
||||||
description = None
|
else:
|
||||||
playlist_id = item_id
|
renderer = try_get(
|
||||||
|
data, lambda x: x['header']['hashtagHeaderRenderer'], dict)
|
||||||
|
if renderer:
|
||||||
|
title = try_get(renderer, lambda x: x['hashtag']['simpleText'])
|
||||||
playlist = self.playlist_result(
|
playlist = self.playlist_result(
|
||||||
self._entries(selected_tab, identity_token),
|
self._entries(selected_tab, item_id, webpage),
|
||||||
playlist_id=playlist_id, playlist_title=title,
|
playlist_id=playlist_id, playlist_title=title,
|
||||||
playlist_description=description)
|
playlist_description=description)
|
||||||
playlist.update(self._extract_uploader(data))
|
playlist.update(self._extract_uploader(data))
|
||||||
@ -2647,8 +2796,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
self._playlist_entries(playlist), playlist_id=playlist_id,
|
self._playlist_entries(playlist), playlist_id=playlist_id,
|
||||||
playlist_title=title)
|
playlist_title=title)
|
||||||
|
|
||||||
def _extract_identity_token(self, webpage, item_id):
|
def _extract_identity_token(self, ytcfg, webpage):
|
||||||
ytcfg = self._extract_ytcfg(item_id, webpage)
|
|
||||||
if ytcfg:
|
if ytcfg:
|
||||||
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
|
token = try_get(ytcfg, lambda x: x['ID_TOKEN'], compat_str)
|
||||||
if token:
|
if token:
|
||||||
@ -2662,7 +2810,7 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
url = compat_urlparse.urlunparse(
|
url = compat_urlparse.urlunparse(
|
||||||
compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
|
compat_urlparse.urlparse(url)._replace(netloc='www.youtube.com'))
|
||||||
# Handle both video/playlist URLs
|
# Handle both video/playlist URLs
|
||||||
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
|
qs = parse_qs(url)
|
||||||
video_id = qs.get('v', [None])[0]
|
video_id = qs.get('v', [None])[0]
|
||||||
playlist_id = qs.get('list', [None])[0]
|
playlist_id = qs.get('list', [None])[0]
|
||||||
if video_id and playlist_id:
|
if video_id and playlist_id:
|
||||||
@ -2671,12 +2819,11 @@ class YoutubeTabIE(YoutubeBaseInfoExtractor):
|
|||||||
return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
|
return self.url_result(video_id, ie=YoutubeIE.ie_key(), video_id=video_id)
|
||||||
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
|
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
|
||||||
webpage = self._download_webpage(url, item_id)
|
webpage = self._download_webpage(url, item_id)
|
||||||
identity_token = self._extract_identity_token(webpage, item_id)
|
|
||||||
data = self._extract_yt_initial_data(item_id, webpage)
|
data = self._extract_yt_initial_data(item_id, webpage)
|
||||||
tabs = try_get(
|
tabs = try_get(
|
||||||
data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
|
data, lambda x: x['contents']['twoColumnBrowseResultsRenderer']['tabs'], list)
|
||||||
if tabs:
|
if tabs:
|
||||||
return self._extract_from_tabs(item_id, webpage, data, tabs, identity_token)
|
return self._extract_from_tabs(item_id, webpage, data, tabs)
|
||||||
playlist = try_get(
|
playlist = try_get(
|
||||||
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
|
data, lambda x: x['contents']['twoColumnWatchNextResults']['playlist']['playlist'], dict)
|
||||||
if playlist:
|
if playlist:
|
||||||
@ -2759,12 +2906,19 @@ class YoutubePlaylistIE(InfoExtractor):
|
|||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def suitable(cls, url):
|
def suitable(cls, url):
|
||||||
return False if YoutubeTabIE.suitable(url) else super(
|
if YoutubeTabIE.suitable(url):
|
||||||
YoutubePlaylistIE, cls).suitable(url)
|
return False
|
||||||
|
# Hack for lazy extractors until more generic solution is implemented
|
||||||
|
# (see #28780)
|
||||||
|
from .youtube import parse_qs
|
||||||
|
qs = parse_qs(url)
|
||||||
|
if qs.get('v', [None])[0]:
|
||||||
|
return False
|
||||||
|
return super(YoutubePlaylistIE, cls).suitable(url)
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
playlist_id = self._match_id(url)
|
playlist_id = self._match_id(url)
|
||||||
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
|
qs = parse_qs(url)
|
||||||
if not qs:
|
if not qs:
|
||||||
qs = {'list': playlist_id}
|
qs = {'list': playlist_id}
|
||||||
return self.url_result(
|
return self.url_result(
|
||||||
|
@ -7,7 +7,9 @@ from .common import InfoExtractor
|
|||||||
from ..compat import compat_str
|
from ..compat import compat_str
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
determine_ext,
|
determine_ext,
|
||||||
|
float_or_none,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
|
merge_dicts,
|
||||||
NO_DEFAULT,
|
NO_DEFAULT,
|
||||||
orderedSet,
|
orderedSet,
|
||||||
parse_codecs,
|
parse_codecs,
|
||||||
@ -21,49 +23,17 @@ from ..utils import (
|
|||||||
|
|
||||||
|
|
||||||
class ZDFBaseIE(InfoExtractor):
|
class ZDFBaseIE(InfoExtractor):
|
||||||
def _call_api(self, url, player, referrer, video_id, item):
|
|
||||||
return self._download_json(
|
|
||||||
url, video_id, 'Downloading JSON %s' % item,
|
|
||||||
headers={
|
|
||||||
'Referer': referrer,
|
|
||||||
'Api-Auth': 'Bearer %s' % player['apiToken'],
|
|
||||||
})
|
|
||||||
|
|
||||||
def _extract_player(self, webpage, video_id, fatal=True):
|
|
||||||
return self._parse_json(
|
|
||||||
self._search_regex(
|
|
||||||
r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
|
|
||||||
'player JSON', default='{}' if not fatal else NO_DEFAULT,
|
|
||||||
group='json'),
|
|
||||||
video_id)
|
|
||||||
|
|
||||||
|
|
||||||
class ZDFIE(ZDFBaseIE):
|
|
||||||
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
|
|
||||||
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd')
|
|
||||||
_GEO_COUNTRIES = ['DE']
|
_GEO_COUNTRIES = ['DE']
|
||||||
|
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd')
|
||||||
|
|
||||||
_TESTS = [{
|
def _call_api(self, url, video_id, item, api_token=None, referrer=None):
|
||||||
'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
|
headers = {}
|
||||||
'info_dict': {
|
if api_token:
|
||||||
'id': 'die-magie-der-farben-von-koenigspurpur-und-jeansblau-100',
|
headers['Api-Auth'] = 'Bearer %s' % api_token
|
||||||
'ext': 'mp4',
|
if referrer:
|
||||||
'title': 'Die Magie der Farben (2/2)',
|
headers['Referer'] = referrer
|
||||||
'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
|
return self._download_json(
|
||||||
'duration': 2615,
|
url, video_id, 'Downloading JSON %s' % item, headers=headers)
|
||||||
'timestamp': 1465021200,
|
|
||||||
'upload_date': '20160604',
|
|
||||||
},
|
|
||||||
}, {
|
|
||||||
'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
|
|
||||||
'only_matching': True,
|
|
||||||
}, {
|
|
||||||
'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
|
|
||||||
'only_matching': True,
|
|
||||||
}, {
|
|
||||||
'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
|
|
||||||
'only_matching': True,
|
|
||||||
}]
|
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def _extract_subtitles(src):
|
def _extract_subtitles(src):
|
||||||
@ -109,20 +79,11 @@ class ZDFIE(ZDFBaseIE):
|
|||||||
})
|
})
|
||||||
formats.append(f)
|
formats.append(f)
|
||||||
|
|
||||||
def _extract_entry(self, url, player, content, video_id):
|
def _extract_ptmd(self, ptmd_url, video_id, api_token, referrer):
|
||||||
title = content.get('title') or content['teaserHeadline']
|
|
||||||
|
|
||||||
t = content['mainVideoContent']['http://zdf.de/rels/target']
|
|
||||||
|
|
||||||
ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
|
|
||||||
|
|
||||||
if not ptmd_path:
|
|
||||||
ptmd_path = t[
|
|
||||||
'http://zdf.de/rels/streams/ptmd-template'].replace(
|
|
||||||
'{playerId}', 'ngplayer_2_4')
|
|
||||||
|
|
||||||
ptmd = self._call_api(
|
ptmd = self._call_api(
|
||||||
urljoin(url, ptmd_path), player, url, video_id, 'metadata')
|
ptmd_url, video_id, 'metadata', api_token, referrer)
|
||||||
|
|
||||||
|
content_id = ptmd.get('basename') or ptmd_url.split('/')[-1]
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
track_uris = set()
|
track_uris = set()
|
||||||
@ -140,7 +101,7 @@ class ZDFIE(ZDFBaseIE):
|
|||||||
continue
|
continue
|
||||||
for track in tracks:
|
for track in tracks:
|
||||||
self._extract_format(
|
self._extract_format(
|
||||||
video_id, formats, track_uris, {
|
content_id, formats, track_uris, {
|
||||||
'url': track.get('uri'),
|
'url': track.get('uri'),
|
||||||
'type': f.get('type'),
|
'type': f.get('type'),
|
||||||
'mimeType': f.get('mimeType'),
|
'mimeType': f.get('mimeType'),
|
||||||
@ -149,6 +110,103 @@ class ZDFIE(ZDFBaseIE):
|
|||||||
})
|
})
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
duration = float_or_none(try_get(
|
||||||
|
ptmd, lambda x: x['attributes']['duration']['value']), scale=1000)
|
||||||
|
|
||||||
|
return {
|
||||||
|
'extractor_key': ZDFIE.ie_key(),
|
||||||
|
'id': content_id,
|
||||||
|
'duration': duration,
|
||||||
|
'formats': formats,
|
||||||
|
'subtitles': self._extract_subtitles(ptmd),
|
||||||
|
}
|
||||||
|
|
||||||
|
def _extract_player(self, webpage, video_id, fatal=True):
|
||||||
|
return self._parse_json(
|
||||||
|
self._search_regex(
|
||||||
|
r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
|
||||||
|
'player JSON', default='{}' if not fatal else NO_DEFAULT,
|
||||||
|
group='json'),
|
||||||
|
video_id)
|
||||||
|
|
||||||
|
|
||||||
|
class ZDFIE(ZDFBaseIE):
|
||||||
|
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)\.html'
|
||||||
|
_TESTS = [{
|
||||||
|
# Same as https://www.phoenix.de/sendungen/ereignisse/corona-nachgehakt/wohin-fuehrt-der-protest-in-der-pandemie-a-2050630.html
|
||||||
|
'url': 'https://www.zdf.de/politik/phoenix-sendungen/wohin-fuehrt-der-protest-in-der-pandemie-100.html',
|
||||||
|
'md5': '34ec321e7eb34231fd88616c65c92db0',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '210222_phx_nachgehakt_corona_protest',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Wohin führt der Protest in der Pandemie?',
|
||||||
|
'description': 'md5:7d643fe7f565e53a24aac036b2122fbd',
|
||||||
|
'duration': 1691,
|
||||||
|
'timestamp': 1613948400,
|
||||||
|
'upload_date': '20210221',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
# Same as https://www.3sat.de/film/ab-18/10-wochen-sommer-108.html
|
||||||
|
'url': 'https://www.zdf.de/dokumentation/ab-18/10-wochen-sommer-102.html',
|
||||||
|
'md5': '0aff3e7bc72c8813f5e0fae333316a1d',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '141007_ab18_10wochensommer_film',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Ab 18! - 10 Wochen Sommer',
|
||||||
|
'description': 'md5:8253f41dc99ce2c3ff892dac2d65fe26',
|
||||||
|
'duration': 2660,
|
||||||
|
'timestamp': 1608604200,
|
||||||
|
'upload_date': '20201222',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '151025_magie_farben2_tex',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'Die Magie der Farben (2/2)',
|
||||||
|
'description': 'md5:a89da10c928c6235401066b60a6d5c1a',
|
||||||
|
'duration': 2615,
|
||||||
|
'timestamp': 1465021200,
|
||||||
|
'upload_date': '20160604',
|
||||||
|
},
|
||||||
|
}, {
|
||||||
|
# Same as https://www.phoenix.de/sendungen/dokumentationen/gesten-der-maechtigen-i-a-89468.html?ref=suche
|
||||||
|
'url': 'https://www.zdf.de/politik/phoenix-sendungen/die-gesten-der-maechtigen-100.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# Same as https://www.3sat.de/film/spielfilm/der-hauptmann-100.html
|
||||||
|
'url': 'https://www.zdf.de/filme/filme-sonstige/der-hauptmann-112.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# Same as https://www.3sat.de/wissen/nano/nano-21-mai-2019-102.html, equal media ids
|
||||||
|
'url': 'https://www.zdf.de/wissen/nano/nano-21-mai-2019-102.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
|
||||||
|
def _extract_entry(self, url, player, content, video_id):
|
||||||
|
title = content.get('title') or content['teaserHeadline']
|
||||||
|
|
||||||
|
t = content['mainVideoContent']['http://zdf.de/rels/target']
|
||||||
|
|
||||||
|
ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
|
||||||
|
|
||||||
|
if not ptmd_path:
|
||||||
|
ptmd_path = t[
|
||||||
|
'http://zdf.de/rels/streams/ptmd-template'].replace(
|
||||||
|
'{playerId}', 'ngplayer_2_4')
|
||||||
|
|
||||||
|
info = self._extract_ptmd(
|
||||||
|
urljoin(url, ptmd_path), video_id, player['apiToken'], url)
|
||||||
|
|
||||||
thumbnails = []
|
thumbnails = []
|
||||||
layouts = try_get(
|
layouts = try_get(
|
||||||
content, lambda x: x['teaserImageRef']['layouts'], dict)
|
content, lambda x: x['teaserImageRef']['layouts'], dict)
|
||||||
@ -169,33 +227,33 @@ class ZDFIE(ZDFBaseIE):
|
|||||||
})
|
})
|
||||||
thumbnails.append(thumbnail)
|
thumbnails.append(thumbnail)
|
||||||
|
|
||||||
return {
|
return merge_dicts(info, {
|
||||||
'id': video_id,
|
|
||||||
'title': title,
|
'title': title,
|
||||||
'description': content.get('leadParagraph') or content.get('teasertext'),
|
'description': content.get('leadParagraph') or content.get('teasertext'),
|
||||||
'duration': int_or_none(t.get('duration')),
|
'duration': int_or_none(t.get('duration')),
|
||||||
'timestamp': unified_timestamp(content.get('editorialDate')),
|
'timestamp': unified_timestamp(content.get('editorialDate')),
|
||||||
'thumbnails': thumbnails,
|
'thumbnails': thumbnails,
|
||||||
'subtitles': self._extract_subtitles(ptmd),
|
})
|
||||||
'formats': formats,
|
|
||||||
}
|
|
||||||
|
|
||||||
def _extract_regular(self, url, player, video_id):
|
def _extract_regular(self, url, player, video_id):
|
||||||
content = self._call_api(
|
content = self._call_api(
|
||||||
player['content'], player, url, video_id, 'content')
|
player['content'], video_id, 'content', player['apiToken'], url)
|
||||||
return self._extract_entry(player['content'], player, content, video_id)
|
return self._extract_entry(player['content'], player, content, video_id)
|
||||||
|
|
||||||
def _extract_mobile(self, video_id):
|
def _extract_mobile(self, video_id):
|
||||||
document = self._download_json(
|
video = self._download_json(
|
||||||
'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
|
'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
|
||||||
video_id)['document']
|
video_id)
|
||||||
|
|
||||||
|
document = video['document']
|
||||||
|
|
||||||
title = document['titel']
|
title = document['titel']
|
||||||
|
content_id = document['basename']
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
format_urls = set()
|
format_urls = set()
|
||||||
for f in document['formitaeten']:
|
for f in document['formitaeten']:
|
||||||
self._extract_format(video_id, formats, format_urls, f)
|
self._extract_format(content_id, formats, format_urls, f)
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
thumbnails = []
|
thumbnails = []
|
||||||
@ -213,12 +271,12 @@ class ZDFIE(ZDFBaseIE):
|
|||||||
})
|
})
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'id': video_id,
|
'id': content_id,
|
||||||
'title': title,
|
'title': title,
|
||||||
'description': document.get('beschreibung'),
|
'description': document.get('beschreibung'),
|
||||||
'duration': int_or_none(document.get('length')),
|
'duration': int_or_none(document.get('length')),
|
||||||
'timestamp': unified_timestamp(try_get(
|
'timestamp': unified_timestamp(document.get('date')) or unified_timestamp(
|
||||||
document, lambda x: x['meta']['editorialDate'], compat_str)),
|
try_get(video, lambda x: x['meta']['editorialDate'], compat_str)),
|
||||||
'thumbnails': thumbnails,
|
'thumbnails': thumbnails,
|
||||||
'subtitles': self._extract_subtitles(document),
|
'subtitles': self._extract_subtitles(document),
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
|
@ -1,93 +1,94 @@
|
|||||||
# coding: utf-8
|
# coding: utf-8
|
||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
import re
|
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
update_url_query,
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class ZingMp3BaseInfoExtractor(InfoExtractor):
|
class ZingMp3BaseIE(InfoExtractor):
|
||||||
|
_VALID_URL_TMPL = r'https?://(?:mp3\.zing|zingmp3)\.vn/(?:%s)/[^/]+/(?P<id>\w+)\.html'
|
||||||
|
_GEO_COUNTRIES = ['VN']
|
||||||
|
|
||||||
def _extract_item(self, item, page_type, fatal=True):
|
def _extract_item(self, item, fatal):
|
||||||
error_message = item.get('msg')
|
item_id = item['id']
|
||||||
if error_message:
|
title = item.get('name') or item['title']
|
||||||
if not fatal:
|
|
||||||
return
|
|
||||||
raise ExtractorError(
|
|
||||||
'%s returned error: %s' % (self.IE_NAME, error_message),
|
|
||||||
expected=True)
|
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
for quality, source_url in zip(item.get('qualities') or item.get('quality', []), item.get('source_list') or item.get('source', [])):
|
for k, v in (item.get('source') or {}).items():
|
||||||
if not source_url or source_url == 'require vip':
|
if not v:
|
||||||
continue
|
continue
|
||||||
if not re.match(r'https?://', source_url):
|
if k in ('mp4', 'hls'):
|
||||||
source_url = '//' + source_url
|
for res, video_url in v.items():
|
||||||
source_url = self._proto_relative_url(source_url, 'http:')
|
if not video_url:
|
||||||
quality_num = int_or_none(quality)
|
continue
|
||||||
f = {
|
if k == 'hls':
|
||||||
'format_id': quality,
|
formats.extend(self._extract_m3u8_formats(
|
||||||
'url': source_url,
|
video_url, item_id, 'mp4',
|
||||||
}
|
'm3u8_native', m3u8_id=k, fatal=False))
|
||||||
if page_type == 'video':
|
elif k == 'mp4':
|
||||||
f.update({
|
formats.append({
|
||||||
'height': quality_num,
|
'format_id': 'mp4-' + res,
|
||||||
'ext': 'mp4',
|
'url': video_url,
|
||||||
|
'height': int_or_none(self._search_regex(
|
||||||
|
r'^(\d+)p', res, 'resolution', default=None)),
|
||||||
})
|
})
|
||||||
else:
|
else:
|
||||||
f.update({
|
formats.append({
|
||||||
'abr': quality_num,
|
|
||||||
'ext': 'mp3',
|
'ext': 'mp3',
|
||||||
|
'format_id': k,
|
||||||
|
'tbr': int_or_none(k),
|
||||||
|
'url': self._proto_relative_url(v),
|
||||||
|
'vcodec': 'none',
|
||||||
})
|
})
|
||||||
formats.append(f)
|
if not formats:
|
||||||
|
if not fatal:
|
||||||
|
return
|
||||||
|
msg = item['msg']
|
||||||
|
if msg == 'Sorry, this content is not available in your country.':
|
||||||
|
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
|
||||||
|
raise ExtractorError(msg, expected=True)
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
cover = item.get('cover')
|
subtitles = None
|
||||||
|
lyric = item.get('lyric')
|
||||||
|
if lyric:
|
||||||
|
subtitles = {
|
||||||
|
'origin': [{
|
||||||
|
'url': lyric,
|
||||||
|
}],
|
||||||
|
}
|
||||||
|
|
||||||
|
album = item.get('album') or {}
|
||||||
|
|
||||||
return {
|
return {
|
||||||
'title': (item.get('name') or item.get('title')).strip(),
|
'id': item_id,
|
||||||
|
'title': title,
|
||||||
'formats': formats,
|
'formats': formats,
|
||||||
'thumbnail': 'http:/' + cover if cover else None,
|
'thumbnail': item.get('thumbnail'),
|
||||||
'artist': item.get('artist'),
|
'subtitles': subtitles,
|
||||||
|
'duration': int_or_none(item.get('duration')),
|
||||||
|
'track': title,
|
||||||
|
'artist': item.get('artists_names'),
|
||||||
|
'album': album.get('name') or album.get('title'),
|
||||||
|
'album_artist': album.get('artists_names'),
|
||||||
}
|
}
|
||||||
|
|
||||||
def _extract_player_json(self, player_json_url, id, page_type, playlist_title=None):
|
def _real_extract(self, url):
|
||||||
player_json = self._download_json(player_json_url, id, 'Downloading Player JSON')
|
page_id = self._match_id(url)
|
||||||
items = player_json['data']
|
webpage = self._download_webpage(
|
||||||
if 'item' in items:
|
url.replace('://zingmp3.vn/', '://mp3.zing.vn/'),
|
||||||
items = items['item']
|
page_id, query={'play_song': 1})
|
||||||
|
data_path = self._search_regex(
|
||||||
if len(items) == 1:
|
r'data-xml="([^"]+)', webpage, 'data path')
|
||||||
# one single song
|
return self._process_data(self._download_json(
|
||||||
data = self._extract_item(items[0], page_type)
|
'https://mp3.zing.vn/xhr' + data_path, page_id)['data'])
|
||||||
data['id'] = id
|
|
||||||
|
|
||||||
return data
|
|
||||||
else:
|
|
||||||
# playlist of songs
|
|
||||||
entries = []
|
|
||||||
|
|
||||||
for i, item in enumerate(items, 1):
|
|
||||||
entry = self._extract_item(item, page_type, fatal=False)
|
|
||||||
if not entry:
|
|
||||||
continue
|
|
||||||
entry['id'] = '%s-%d' % (id, i)
|
|
||||||
entries.append(entry)
|
|
||||||
|
|
||||||
return {
|
|
||||||
'_type': 'playlist',
|
|
||||||
'id': id,
|
|
||||||
'title': playlist_title,
|
|
||||||
'entries': entries,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
class ZingMp3IE(ZingMp3BaseInfoExtractor):
|
class ZingMp3IE(ZingMp3BaseIE):
|
||||||
_VALID_URL = r'https?://mp3\.zing\.vn/(?:bai-hat|album|playlist|video-clip)/[^/]+/(?P<id>\w+)\.html'
|
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'bai-hat|video-clip'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
|
'url': 'http://mp3.zing.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
|
||||||
'md5': 'ead7ae13693b3205cbc89536a077daed',
|
'md5': 'ead7ae13693b3205cbc89536a077daed',
|
||||||
@ -95,49 +96,66 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor):
|
|||||||
'id': 'ZWZB9WAB',
|
'id': 'ZWZB9WAB',
|
||||||
'title': 'Xa Mãi Xa',
|
'title': 'Xa Mãi Xa',
|
||||||
'ext': 'mp3',
|
'ext': 'mp3',
|
||||||
'thumbnail': r're:^https?://.*\.jpg$',
|
'thumbnail': r're:^https?://.+\.jpg',
|
||||||
|
'subtitles': {
|
||||||
|
'origin': [{
|
||||||
|
'ext': 'lrc',
|
||||||
|
}]
|
||||||
|
},
|
||||||
|
'duration': 255,
|
||||||
|
'track': 'Xa Mãi Xa',
|
||||||
|
'artist': 'Bảo Thy',
|
||||||
|
'album': 'Special Album',
|
||||||
|
'album_artist': 'Bảo Thy',
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
|
'url': 'https://mp3.zing.vn/video-clip/Suong-Hoa-Dua-Loi-K-ICM-RYO/ZO8ZF7C7.html',
|
||||||
'md5': '870295a9cd8045c0e15663565902618d',
|
'md5': 'e9c972b693aa88301ef981c8151c4343',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
'id': 'ZW6BAEA0',
|
'id': 'ZO8ZF7C7',
|
||||||
'title': 'Let It Go (Frozen OST)',
|
'title': 'Sương Hoa Đưa Lối',
|
||||||
'ext': 'mp4',
|
'ext': 'mp4',
|
||||||
|
'thumbnail': r're:^https?://.+\.jpg',
|
||||||
|
'duration': 207,
|
||||||
|
'track': 'Sương Hoa Đưa Lối',
|
||||||
|
'artist': 'K-ICM, RYO',
|
||||||
},
|
},
|
||||||
}, {
|
}, {
|
||||||
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
|
'url': 'https://zingmp3.vn/bai-hat/Xa-Mai-Xa-Bao-Thy/ZWZB9WAB.html',
|
||||||
'info_dict': {
|
|
||||||
'_type': 'playlist',
|
|
||||||
'id': 'ZWZBWDAF',
|
|
||||||
'title': 'Lâu Đài Tình Ái - Bằng Kiều,Minh Tuyết | Album 320 lossless',
|
|
||||||
},
|
|
||||||
'playlist_count': 10,
|
|
||||||
'skip': 'removed at the request of the owner',
|
|
||||||
}, {
|
|
||||||
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
|
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
IE_NAME = 'zingmp3'
|
IE_NAME = 'zingmp3'
|
||||||
IE_DESC = 'mp3.zing.vn'
|
IE_DESC = 'mp3.zing.vn'
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _process_data(self, data):
|
||||||
page_id = self._match_id(url)
|
return self._extract_item(data, True)
|
||||||
|
|
||||||
webpage = self._download_webpage(url, page_id)
|
|
||||||
|
|
||||||
player_json_url = self._search_regex([
|
class ZingMp3AlbumIE(ZingMp3BaseIE):
|
||||||
r'data-xml="([^"]+)',
|
_VALID_URL = ZingMp3BaseIE._VALID_URL_TMPL % 'album|playlist'
|
||||||
r'&xmlURL=([^&]+)&'
|
_TESTS = [{
|
||||||
], webpage, 'player xml url')
|
'url': 'http://mp3.zing.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
|
||||||
|
'info_dict': {
|
||||||
|
'_type': 'playlist',
|
||||||
|
'id': 'ZWZBWDAF',
|
||||||
|
'title': 'Lâu Đài Tình Ái',
|
||||||
|
},
|
||||||
|
'playlist_count': 10,
|
||||||
|
}, {
|
||||||
|
'url': 'http://mp3.zing.vn/playlist/Duong-Hong-Loan-apollobee/IWCAACCB.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://zingmp3.vn/album/Lau-Dai-Tinh-Ai-Bang-Kieu-Minh-Tuyet/ZWZBWDAF.html',
|
||||||
|
'only_matching': True,
|
||||||
|
}]
|
||||||
|
IE_NAME = 'zingmp3:album'
|
||||||
|
|
||||||
playlist_title = None
|
def _process_data(self, data):
|
||||||
page_type = self._search_regex(r'/(?:html5)?xml/([^/-]+)', player_json_url, 'page type')
|
def entries():
|
||||||
if page_type == 'video':
|
for item in (data.get('items') or []):
|
||||||
player_json_url = update_url_query(player_json_url, {'format': 'json'})
|
entry = self._extract_item(item, False)
|
||||||
else:
|
if entry:
|
||||||
player_json_url = player_json_url.replace('/xml/', '/html5xml/')
|
yield entry
|
||||||
if page_type == 'album':
|
info = data.get('info') or {}
|
||||||
playlist_title = self._og_search_title(webpage)
|
return self.playlist_result(
|
||||||
|
entries(), info.get('id'), info.get('name') or info.get('title'))
|
||||||
return self._extract_player_json(player_json_url, page_id, page_type, playlist_title)
|
|
||||||
|
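The new `ZingMp3AlbumIE._process_data` shown above builds a playlist lazily and skips items that fail extraction. Below is a minimal standalone sketch of that pattern. `extract_item` and `build_playlist` are hypothetical stand-ins for the extractor's `_extract_item` and `playlist_result`, and the shape of `data` is assumed from the diff only.

```python
def extract_item(item):
    """Hypothetical stand-in for _extract_item: None when an item is unusable."""
    if not item.get('id'):
        return None
    return {'id': item['id'], 'title': item.get('title')}


def build_playlist(data):
    """Collect usable entries and take the playlist id/title from 'info'."""
    def entries():
        # 'items' may be missing or None, so fall back to an empty list.
        for item in (data.get('items') or []):
            entry = extract_item(item)
            if entry:
                yield entry

    info = data.get('info') or {}
    # Simplified stand-in for playlist_result(); the entries are
    # materialised here only to keep the sketch easy to inspect.
    return {
        '_type': 'playlist',
        'entries': list(entries()),
        'id': info.get('id'),
        'title': info.get('name') or info.get('title'),
    }
```

Using `or []` and `or {}` guards against keys that are present but set to `None`, which a plain `dict.get` default would not cover.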
68
youtube_dl/extractor/zoom.py
Normal file
@ -0,0 +1,68 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
import re
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..utils import (
|
||||||
|
ExtractorError,
|
||||||
|
int_or_none,
|
||||||
|
js_to_json,
|
||||||
|
parse_filesize,
|
||||||
|
urlencode_postdata,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class ZoomIE(InfoExtractor):
|
||||||
|
IE_NAME = 'zoom'
|
||||||
|
_VALID_URL = r'(?P<base_url>https?://(?:[^.]+\.)?zoom.us/)rec(?:ording)?/(?:play|share)/(?P<id>[A-Za-z0-9_.-]+)'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://economist.zoom.us/rec/play/dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
|
||||||
|
'md5': 'ab445e8c911fddc4f9adc842c2c5d434',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'dUk_CNBETmZ5VA2BwEl-jjakPpJ3M1pcfVYAPRsoIbEByGsLjUZtaa4yCATQuOL3der8BlTwxQePl_j0.EImBkXzTIaPvdZO5',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': 'China\'s "two sessions" and the new five-year plan',
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
base_url, play_id = re.match(self._VALID_URL, url).groups()
|
||||||
|
webpage = self._download_webpage(url, play_id)
|
||||||
|
|
||||||
|
try:
|
||||||
|
form = self._form_hidden_inputs('password_form', webpage)
|
||||||
|
except ExtractorError:
|
||||||
|
form = None
|
||||||
|
if form:
|
||||||
|
password = self._downloader.params.get('videopassword')
|
||||||
|
if not password:
|
||||||
|
raise ExtractorError(
|
||||||
|
'This video is protected by a passcode, use the --video-password option', expected=True)
|
||||||
|
is_meeting = form.get('useWhichPasswd') == 'meeting'
|
||||||
|
validation = self._download_json(
|
||||||
|
base_url + 'rec/validate%s_passwd' % ('_meet' if is_meeting else ''),
|
||||||
|
play_id, 'Validating passcode', 'Wrong passcode', data=urlencode_postdata({
|
||||||
|
'id': form[('meet' if is_meeting else 'file') + 'Id'],
|
||||||
|
'passwd': password,
|
||||||
|
'action': form.get('action'),
|
||||||
|
}))
|
||||||
|
if not validation.get('status'):
|
||||||
|
raise ExtractorError(validation['errorMessage'], expected=True)
|
||||||
|
webpage = self._download_webpage(url, play_id)
|
||||||
|
|
||||||
|
data = self._parse_json(self._search_regex(
|
||||||
|
r'(?s)window\.__data__\s*=\s*({.+?});',
|
||||||
|
webpage, 'data'), play_id, js_to_json)
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': play_id,
|
||||||
|
'title': data['topic'],
|
||||||
|
'url': data['viewMp4Url'],
|
||||||
|
'width': int_or_none(data.get('viewResolvtionsWidth')),
|
||||||
|
'height': int_or_none(data.get('viewResolvtionsHeight')),
|
||||||
|
'http_headers': {
|
||||||
|
'Referer': base_url,
|
||||||
|
},
|
||||||
|
'filesize_approx': parse_filesize(data.get('fileSize')),
|
||||||
|
}
|
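The Zoom extractor above does two things that can be tried in isolation: it picks the passcode validation endpoint from the form's `useWhichPasswd` field, and it pulls the `window.__data__` object out of the page. Below is a rough sketch under stated assumptions. `SAMPLE_PAGE`, `validation_path` and `parse_data` are hypothetical names, and the sample is strict JSON so that `json.loads` can stand in for youtube-dl's `js_to_json` step.

```python
import json
import re

# Hypothetical sample page. Real pages embed a JavaScript object literal,
# which the extractor converts with js_to_json before parsing.
SAMPLE_PAGE = '''
<script>
window.__data__ = {"topic": "Weekly sync", "viewMp4Url": "https://example.com/v.mp4", "viewResolvtionsWidth": "1280"};
</script>
'''


def validation_path(use_which_passwd):
    """Mirror the endpoint choice made in the diff for passcode validation."""
    is_meeting = use_which_passwd == 'meeting'
    return 'rec/validate%s_passwd' % ('_meet' if is_meeting else '')


def parse_data(webpage):
    """Pull the window.__data__ object out of the page and decode it."""
    m = re.search(r'(?s)window\.__data__\s*=\s*({.+?});', webpage)
    if not m:
        raise ValueError('window.__data__ not found')
    # Assumption: the sample is already strict JSON, so json.loads suffices.
    return json.loads(m.group(1))
```

The non-greedy `.+?` stops at the first `};`, which is enough for a flat object like the sample but is an assumption about the page layout rather than a guarantee.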
@ -39,6 +39,7 @@ import zlib
|
|||||||
from .compat import (
|
from .compat import (
|
||||||
compat_HTMLParseError,
|
compat_HTMLParseError,
|
||||||
compat_HTMLParser,
|
compat_HTMLParser,
|
||||||
|
compat_HTTPError,
|
||||||
compat_basestring,
|
compat_basestring,
|
||||||
compat_chr,
|
compat_chr,
|
||||||
compat_cookiejar,
|
compat_cookiejar,
|
||||||
@ -2879,12 +2880,60 @@ class YoutubeDLCookieProcessor(compat_urllib_request.HTTPCookieProcessor):
|
|||||||
|
|
||||||
|
|
||||||
class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
|
class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
|
||||||
if sys.version_info[0] < 3:
|
"""YoutubeDL redirect handler
|
||||||
|
|
||||||
|
The code is based on HTTPRedirectHandler implementation from CPython [1].
|
||||||
|
|
||||||
|
This redirect handler solves two issues:
|
||||||
|
- ensures redirect URL is always unicode under python 2
|
||||||
|
- introduces support for experimental HTTP response status code
|
||||||
|
308 Permanent Redirect [2] used by some sites [3]
|
||||||
|
|
||||||
|
1. https://github.com/python/cpython/blob/master/Lib/urllib/request.py
|
||||||
|
2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/308
|
||||||
|
3. https://github.com/ytdl-org/youtube-dl/issues/28768
|
||||||
|
"""
|
||||||
|
|
||||||
|
http_error_301 = http_error_303 = http_error_307 = http_error_308 = compat_urllib_request.HTTPRedirectHandler.http_error_302
|
||||||
|
|
||||||
def redirect_request(self, req, fp, code, msg, headers, newurl):
|
def redirect_request(self, req, fp, code, msg, headers, newurl):
|
||||||
|
"""Return a Request or None in response to a redirect.
|
||||||
|
|
||||||
|
This is called by the http_error_30x methods when a
|
||||||
|
redirection response is received. If a redirection should
|
||||||
|
take place, return a new Request to allow http_error_30x to
|
||||||
|
perform the redirect. Otherwise, raise HTTPError if no-one
|
||||||
|
else should try to handle this url. Return None if you can't
|
||||||
|
but another Handler might.
|
||||||
|
"""
|
||||||
|
m = req.get_method()
|
||||||
|
if (not (code in (301, 302, 303, 307, 308) and m in ("GET", "HEAD")
|
||||||
|
or code in (301, 302, 303) and m == "POST")):
|
||||||
|
raise compat_HTTPError(req.full_url, code, msg, headers, fp)
|
||||||
|
# Strictly (according to RFC 2616), 301 or 302 in response to
|
||||||
|
# a POST MUST NOT cause a redirection without confirmation
|
||||||
|
# from the user (of urllib.request, in this case). In practice,
|
||||||
|
# essentially all clients do redirect in this case, so we do
|
||||||
|
# the same.
|
||||||
|
|
||||||
# On python 2 urlh.geturl() may sometimes return redirect URL
|
# On python 2 urlh.geturl() may sometimes return redirect URL
|
||||||
# as byte string instead of unicode. This workaround allows
|
# as byte string instead of unicode. This workaround allows
|
||||||
# to force it always return unicode.
|
# to force it always return unicode.
|
||||||
return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
|
if sys.version_info[0] < 3:
|
||||||
|
newurl = compat_str(newurl)
|
||||||
|
|
||||||
|
# Be conciliant with URIs containing a space. This is mainly
|
||||||
|
# redundant with the more complete encoding done in http_error_302(),
|
||||||
|
# but it is kept for compatibility with other callers.
|
||||||
|
newurl = newurl.replace(' ', '%20')
|
||||||
|
|
||||||
|
CONTENT_HEADERS = ("content-length", "content-type")
|
||||||
|
# NB: don't use dict comprehension for python 2.6 compatibility
|
||||||
|
newheaders = dict((k, v) for k, v in req.headers.items()
|
||||||
|
if k.lower() not in CONTENT_HEADERS)
|
||||||
|
return compat_urllib_request.Request(
|
||||||
|
newurl, headers=newheaders, origin_req_host=req.origin_req_host,
|
||||||
|
unverifiable=True)
|
||||||
|
|
||||||
|
|
||||||
def extract_timezone(date_str):
|
def extract_timezone(date_str):
|
||||||
|
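The redirect handler change above comes down to two rules: which status code and method combinations are followed, and which request headers are dropped on the redirected request. Below is a minimal sketch of both rules outside `urllib`. `should_follow` and `strip_content_headers` are hypothetical helper names, not part of youtube-dl.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def should_follow(code, method):
    """Return True when the handler in the diff would build a new request."""
    return bool(
        code in REDIRECT_CODES and method in ('GET', 'HEAD')
        or code in (301, 302, 303) and method == 'POST')


def strip_content_headers(headers):
    """Drop body-describing headers, as done before the redirected request."""
    content_headers = ('content-length', 'content-type')
    # dict() over a generator keeps Python 2.6 compatibility, as in the diff.
    return dict((k, v) for k, v in headers.items()
                if k.lower() not in content_headers)
```

Under these rules a 307 or 308 response to a POST is not followed. The handler in the diff raises `compat_HTTPError` in that case rather than returning a new request.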
@ -1,3 +1,3 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
__version__ = '2021.02.10'
|
__version__ = '2021.04.26'
|
||||||
|